Basic Problem Settings

We assume readers already have a basic knowledge of active learning (AL). In this chapter, we identify and describe the different types of basic active learning problem settings.

Basic Problem Settings
Taxonomy
Pool-based Scenario
- Classification
- Regression
Stream-based Scenario
Query Synthesis Scenario

Taxonomy

In this chapter, we consider three types of scenarios and two types of tasks in the basic problem setting.

Scenarios: (where the queried instances come from)
- pool-based: select from a pre-collected data pool
- stream-based: select from a stream of incoming data
- query synthesis: generate new instances to query instead of selecting existing data
Tasks: (what we are going to accomplish)
- classification
- regression

Combining scenarios and tasks, AL works can be divided into the following sub-problem settings.

	Pool-based	Stream-based	Query synthesis
Classification	PB-classification (most works)	SB-classification	-
Regression	PB-regression	SB-regression	-

Pool-based Scenario

In the pool-based setting, a set of unlabeled data is collected in advance as a data pool. The goal of pool-based active learning is to learn a model on the current data pool with as few labeled instances as possible. The instances to be annotated are selected iteratively in the active learning loop by the corresponding query strategy.

An instance selection strategy evaluates how useful each instance is by assigning it a score. The score usually indicates how much information the instance carries for the task at hand, and the instances with the highest scores are selected. Different strategies calculate the scores in different ways.
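
The score-then-select step can be sketched as follows. This is a minimal illustration, not a specific method from the listed works: it assumes prediction entropy as the score, and the names `entropy_score`, `select_top_k`, and the probabilities in `pool_probs` are hypothetical.

```python
import math

def entropy_score(probs):
    """Shannon entropy of one predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_top_k(pool_probs, k):
    """Return the indices of the k pool instances with the highest scores."""
    scores = [entropy_score(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical predicted class probabilities for four unlabeled pool instances.
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
selected = select_top_k(pool_probs, k=2)
print(selected)  # the two instances whose predictions are closest to uniform
```

Swapping `entropy_score` for another scoring function changes the strategy without changing the selection loop.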

In the pool-based scenario, batch-mode selection is also important, i.e. selecting a batch of instances that jointly carries the maximum information. Selecting in batches makes querying more efficient and avoids querying redundant information. We summarize the idea of batch-mode selection here.
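
To illustrate why a batch should not simply be the top-scoring instances, here is a small sketch under stated assumptions: a greedy rule that walks the instances in descending score order and skips any instance closer than `min_dist` (Euclidean) to one already in the batch. The function `select_batch`, the `min_dist` parameter, and the toy data are hypothetical and stand in for the more elaborate batch-mode criteria in the literature.

```python
import math

def select_batch(features, scores, batch_size, min_dist):
    """Greedily pick high-scoring instances, skipping near-duplicates of earlier picks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    batch = []
    for i in order:
        if len(batch) >= batch_size:
            break
        # Keep instance i only if it is far enough from everything already chosen.
        if all(math.dist(features[i], features[j]) >= min_dist for j in batch):
            batch.append(i)
    return batch

# Toy pool: instances 0 and 1 are almost identical and both score highly.
features = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (2.0, 0.0)]
scores = [0.9, 0.85, 0.6, 0.3]
batch = select_batch(features, scores, batch_size=2, min_dist=0.5)
print(batch)
```

Picking purely by score would return the two near-duplicate instances 0 and 1; the distance check replaces the redundant one with a lower-scoring but different instance.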

Classification

Note that a large number of works focus on pool-based classification. We categorize the current pool-based classification strategies by how they calculate the scores.

Intuition	Description	Comments
Informativeness	Uncertainty of the model prediction	Usually refers to how much information instances would bring to the model.
Representativeness-impart	Represent the underlying distribution	Normally used with informativeness. This type of method may overlap with batch-mode selection.
Expected Improvements	The improvement of the model's performance	The evaluations usually take more time.
Learn to score	Learn an evaluation function directly.
Others	Cannot be classified into the previous categories

For more details, see the list of works with short introductions here.

Regression

For active learning for regression (ALR), there are two problem settings. Supervised ALR is similar to conventional pool-based AL, where the selection proceeds interactively. Unsupervised ALR (sometimes called passive sampling) assumes that no labeled instances are available when the data are selected, so the selection happens only once, at the beginning. In this case, "active" refers to the way the instances are selected. We list several representative methods in the following table.

Active learning for Regression	Supervised	Unsupervised
Non-batch mode	QBC/EMCM/RSAL/GSy/iGS	P-ALICE/GSx/iRDM
Batch mode	EBMALR	-
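
As a rough illustration of the unsupervised setting, the sketch below selects instances using feature-space distances only, with no labels and no model. It is written in the spirit of greedy sampling on the inputs (as in GSx), but the details are assumptions made here: the first pick is the instance closest to the pool centroid, later picks maximize the distance to the nearest already-selected instance, and distances are Euclidean. The name `greedy_feature_sampling` and the toy pool are hypothetical.

```python
import math

def greedy_feature_sampling(pool, n_select):
    """Pick up to n_select indices from pool using only feature-space distances."""
    if n_select <= 0 or not pool:
        return []
    dim = len(pool[0])
    centroid = [sum(x[d] for x in pool) / len(pool) for d in range(dim)]
    # Assumed initialization: start from the instance closest to the pool centroid.
    selected = [min(range(len(pool)), key=lambda i: math.dist(pool[i], centroid))]
    while len(selected) < min(n_select, len(pool)):
        remaining = [i for i in range(len(pool)) if i not in selected]
        # Next pick: the instance farthest from its nearest already-selected instance.
        nxt = max(
            remaining,
            key=lambda i: min(math.dist(pool[i], pool[j]) for j in selected),
        )
        selected.append(nxt)
    return selected

# Toy one-dimensional pool; the whole selection is done once, before any labeling.
pool = [(0.0,), (1.0,), (2.0,), (4.0,), (10.0,)]
chosen = greedy_feature_sampling(pool, n_select=3)
print(chosen)
```

Because no label is needed at any step, the whole selection can be done once before annotation starts, which matches the unsupervised column of the table.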

For more details, see the list of works here.

Stream-based Scenario

In stream-based AL, the unlabeled data arrive as a stream, and the AL module decides whether to annotate each incoming instance to update the model. This setting is also called online AL. Instances are not compared with one another; each is judged on its own as it arrives. This setting is less popular than pool-based active learning. It often has to account for data drift, where the underlying distribution varies over time.

For both classification and regression tasks, the common methodology is to define an information measurement score and set a threshold; an incoming instance whose score is above the threshold is queried. The corresponding works can be found in the following links:

Stream-based Classification.
Stream-based Regression.
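
The threshold rule described above can be sketched as follows. This is a simplified illustration under stated assumptions: the score is a margin-based uncertainty (one minus the gap between the two largest class probabilities), the threshold is fixed, and the model is not updated between instances. The names `margin_uncertainty`, `stream_queries`, and the probabilities in `stream` are hypothetical.

```python
def margin_uncertainty(probs):
    """One minus the gap between the two largest class probabilities."""
    top_two = sorted(probs, reverse=True)[:2]
    return 1.0 - (top_two[0] - top_two[1])

def stream_queries(prob_stream, threshold):
    """Yield the positions of stream instances whose score exceeds the threshold."""
    for t, probs in enumerate(prob_stream):
        # Each instance is judged on its own; nothing is compared across the stream.
        if margin_uncertainty(probs) > threshold:
            yield t

# Hypothetical predicted class probabilities, in arrival order.
stream = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.51, 0.49]]
queried = list(stream_queries(stream, threshold=0.8))
print(queried)
```

In practice the model would be retrained after each queried label, and the threshold may need to adapt under data drift; both are omitted here to keep the decision rule visible.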

Query Synthesis Scenario

Instead of selecting instances, another type of work tries to generate new instances to query, which is called query synthesis. This field has not been well developed in recent years, but there are still several works that focus on it. For more details, see the list of works here.
