
Pool-Based Active Learning for Classification

In this chapter, we use our taxonomy to classify different types of AL strategies. Each section covers one type of strategy: we begin with a short description, then go into more detail, and end with a list of representative works in that category (each with a short note).

Note that we do not treat batch mode as a dimension in our taxonomy. If you only care about how to apply batch selection, please check here. The classification problems covered here include both binary and multi-class classification (although some works only apply to binary classification). There are also works that focus specifically on multi-class settings; please check here.

Taxonomy

In pool-based AL, the strategy is in fact a scoring function that judges how much information each instance contains for the current task. Previous works calculate their scores in different ways. We summarize them into the following categories.

| Intuition | Description | Comments |
| --- | --- | --- |
| Informativeness | Uncertainty by the model prediction | Usually refers to how much information instances would bring to the model. |
| Representativeness-impart | Represent the underlying distribution | Normally used together with informativeness. This type of method may overlap with batch-mode selection. |
| Expected Improvements | The improvement of the model's performance | The evaluations usually take more time. |
| Learn to score | Learn an evaluation function directly. | |
| Others | Cannot be classified into the previous categories. | |

Categories

1. Informativeness

Informativeness usually refers to how much information an instance would bring to the model. Thus, the evaluation usually depends on the currently trained model.

1.1. Uncertainty-based sampling

This is the most basic AL strategy. It selects the instances the current model is most uncertain about. There are essentially three sub-strategies:

  • Classification uncertainty
    • Select the instance closest to the decision boundary.
  • Classification margin
    • Select the instance whose probabilities of belonging to the two most likely classes are closest.
  • Classification entropy
    • Select the instance with the largest classification entropy over all classes.

The equations and details can be found here.
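As a concrete illustration, the three measures can be sketched in a few lines of NumPy. This is a toy sketch; the probability matrix is assumed to come from any trained probabilistic classifier:

```python
import numpy as np

def least_confidence(probs):
    # 1 - max class probability: high when the model is unsure of its top choice
    return 1.0 - probs.max(axis=1)

def margin(probs):
    # gap between the top-2 class probabilities: a SMALL margin means uncertain
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def entropy(probs):
    # predictive entropy over all classes: high means uncertain
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# toy predicted probabilities for 3 unlabeled instances over 3 classes
probs = np.array([[0.90, 0.05, 0.05],   # confident
                  [0.40, 0.35, 0.25],   # spread out -> high entropy
                  [0.48, 0.47, 0.05]])  # tiny top-2 margin

query_lc = int(np.argmax(least_confidence(probs)))   # instance 1
query_margin = int(np.argmin(margin(probs)))         # instance 2
query_entropy = int(np.argmax(entropy(probs)))       # instance 1
```

Note that the three measures can disagree, as above: the smallest margin and the highest entropy pick different instances.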

Works:

1.2. Disagreement-based sampling

This type of method requires a group of models, and the sampling strategy is based on their outputs. The group of models is called a committee, so these works are also known as Query-By-Committee (QBC). The intuition is that if the committee members disagree on the label of an unlabeled instance, it should be informative at the current stage.

  • Disagreement measurement
    • Vote entropy
    • Consensus entropy
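The two disagreement measures can be sketched as follows (a toy sketch; the committee's hard votes and soft predictions are assumed to come from any ensemble of trained models):

```python
import numpy as np

def vote_entropy(votes, n_classes):
    # votes: (n_models, n_instances) hard labels from each committee member
    n_models, n_inst = votes.shape
    counts = np.zeros((n_inst, n_classes))
    for m in range(n_models):
        counts[np.arange(n_inst), votes[m]] += 1
    p = counts / n_models                       # empirical vote distribution
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def consensus_entropy(probs):
    # probs: (n_models, n_instances, n_classes) soft committee predictions
    mean_p = probs.mean(axis=0)                 # average the members' distributions
    return -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)

votes = np.array([[0, 0, 1],
                  [0, 1, 2],
                  [0, 2, 1]])                   # 3 models x 3 instances
ve = vote_entropy(votes, n_classes=3)
query = int(np.argmax(ve))                      # instance 1: full disagreement

probs = np.array([[[0.9, 0.1, 0.0], [0.3, 0.4, 0.3]],
                  [[0.8, 0.2, 0.0], [0.4, 0.3, 0.3]]])
ce = consensus_entropy(probs)                   # instance 1 is more contested
```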

Works:

1.3. Model-change-based

If an instance would bring the largest model change once labeled, it can be considered the most informative instance for the current task.
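One common instantiation of model change is expected gradient length (EGL). A minimal sketch for binary logistic regression, assuming the model's own predictive distribution is used to take the expectation over the unknown label:

```python
import numpy as np

def expected_gradient_length(w, X_unlab):
    # binary logistic regression: grad of the log-loss w.r.t. w for label y is (p - y) x
    p = 1.0 / (1.0 + np.exp(-X_unlab @ w))                  # P(y=1 | x)
    norms = np.linalg.norm(X_unlab, axis=1)
    # expectation over the model's own predictive distribution:
    # p * ||(p-1) x|| + (1-p) * ||(p-0) x||  =  2 p (1-p) ||x||
    return p * np.abs(p - 1.0) * norms + (1.0 - p) * p * norms

w = np.array([1.0, -1.0])
X_unlab = np.array([[3.0, 0.0],     # confident prediction -> small expected change
                    [1.0, 1.0]])    # on the decision boundary -> large expected change
scores = expected_gradient_length(w, X_unlab)
query = int(np.argmax(scores))      # selects the boundary instance
```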

Works:

1.4. Other informativeness measurement

The informativeness of instances could be defined in many other ways.

Works:

  • Optimizing Active Learning for Low Annotation Budgets [2021]: Select the samples with the maximum shift from certainty to uncertainty.
  • Active Learning by Acquiring Contrastive Examples [2021, EMNLP]: CAL. Take the inconsistency of predictions with the neighbors as the selection criterion. Believe that data points that are similar in the model feature space and yet receive maximally different predictive likelihoods from the model should be queried.
  • On The Effectiveness of Active Learning by Uncertainty Sampling in Classification of High-Dimensional Gaussian Mixture Data [2022, PerCom Workshops]: Add area-under-margin as informative measurements.
  • ALLSH: Active Learning Guided by Local Sensitivity and Hardness [2022]: Select the instances whose predictive likelihoods diverge the most from their perturbations.
  • Active Learning by Feature Mixing [2022, CVPR]: The instance with the representation which could maximally influence the output of the anchor labeled instance (by feature mixing) could be informative.
  • Gaussian Switch Sampling: A Second Order Approach to Active Learning [2023, TAI]: The forgettable data (classified correctly at time t and subsequently misclassified at a later time) should be informative.
  • Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning [2023, TPAMI]

2. Representativeness-impart sampling

The previously introduced works seldom consider the data distribution: those strategies focus on the decision boundary, and the representativeness of the data is neglected. Therefore, many works take the representativeness of the data into account. Basically, it measures how well the labeled instances are aligned with the unlabeled instances in distribution. We note that few works consider only the representativeness of the data; more commonly, representativeness and informativeness are considered together when sampling instances.

2.1. Cluster-based sampling

The simplest idea is to use the cluster structure to guide the selection. Clustering can be applied either to the original features or to learned embeddings.

  • Cluster-based sampling:
    • Pre-cluster
    • Hierarchical sampling
    • Cluster on other types of embedding
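A minimal sketch of pre-cluster selection, assuming a plain k-means on raw features (with a simplified deterministic init for brevity) and querying the instance nearest to each cluster centre:

```python
import numpy as np

def kmeans(X, k, iters=20):
    # plain k-means with a simplified deterministic init (evenly spaced pool points)
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

def cluster_based_query(X_pool, k):
    # one query per cluster: the instance nearest to its cluster centre
    centers, assign = kmeans(X_pool, k)
    queries = []
    for j in range(k):
        idx = np.where(assign == j)[0]
        d = np.linalg.norm(X_pool[idx] - centers[j], axis=1)
        queries.append(int(idx[d.argmin()]))
    return queries

# two well-separated blobs -> one representative from each
X_pool = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (10, 2)),
                    np.random.default_rng(2).normal(5.0, 0.1, (10, 2))])
queries = cluster_based_query(X_pool, k=2)
```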

Works:

2.2. Density-based sampling

These strategies take the distribution and local density into account. The intuition is that instances in denser regions are more likely to be queried, i.e. the selected instances and the unlabeled instances should have similar distributions.

  • Density-based sampling:
    • Information density
    • RALF
    • k-Center-Greedy (Core-set): only considers representativeness.
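The k-Center-Greedy (Core-set) idea can be sketched directly: repeatedly query the pool point farthest from its nearest already-covered point (a toy sketch in raw feature space; Core-set originally works on learned embeddings):

```python
import numpy as np

def k_center_greedy(X_pool, labeled_idx, budget):
    # greedily query the pool point farthest from its nearest covered point
    dist = np.linalg.norm(X_pool[:, None] - X_pool[labeled_idx][None], axis=2).min(axis=1)
    queries = []
    for _ in range(budget):
        i = int(dist.argmax())
        queries.append(i)
        # the newly covered point shrinks everyone's distance-to-cover
        dist = np.minimum(dist, np.linalg.norm(X_pool - X_pool[i], axis=1))
    return queries

X_pool = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [10.0, 0.0]])
queries = k_center_greedy(X_pool, labeled_idx=[0], budget=2)  # -> [3, 2]
```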

Works:

2.3. Alignment-based sampling

This type of work directly measures the distribution alignment between the labeled/selected data and the unlabeled data, i.e. the labeled and unlabeled instances should be hard to distinguish. There are adversarial and non-adversarial works.

Types:

  • Adversarial based
  • Non-adversarial based
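A minimal non-deep sketch of the discriminative flavour of this idea, assuming a from-scratch logistic regression as the labeled-vs-unlabeled discriminator: if the discriminator can confidently tell a pool point is "unlabeled", that point is poorly covered by the labeled set.

```python
import numpy as np

def discriminative_query(X_lab, X_unlab, steps=500, lr=0.1):
    # train a logistic regression to separate labeled (0) from unlabeled (1),
    # then query the unlabeled point the model is most sure is "unlabeled"
    X = np.vstack([X_lab, X_unlab])
    y = np.r_[np.zeros(len(X_lab)), np.ones(len(X_unlab))]
    Xb = np.hstack([X, np.ones((len(X), 1))])       # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)           # gradient step on the log-loss
    p_unlab = 1.0 / (1.0 + np.exp(-Xb[len(X_lab):] @ w))
    return int(p_unlab.argmax())

X_lab = np.array([[0.0, 0.0], [0.2, 0.1]])
X_unlab = np.array([[0.1, 0.0],              # already looks like the labeled data
                    [4.0, 4.0]])             # outside the labeled coverage
query = discriminative_query(X_lab, X_unlab)  # index into X_unlab
```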

Works:

  • Exploring Representativeness and Informativeness for Active Learning [2017, IEEE Transactions on Cybernetics]: Optimization based. The representativeness is measured by fully investigating the triple similarities: between a query sample and the unlabeled set, between a query sample and the labeled set, and between any two candidate query samples. The goal is to find the sample that minimizes the distribution discrepancy between the unlabeled and labeled data. For informativeness, BvSB is used. (85 citations)
  • Discriminative Active Learning [2019, Arxiv]: Make the labeled and unlabeled pool indistinguishable.
  • Agreement-Discrepancy-Selection: Active Learning with Progressive Distribution Alignment [2021]
  • Dual Adversarial Network for Deep Active Learning [2021, ECCV]: DAAL.
  • Multi-Classifier Adversarial Optimization for Active Learning [2023, AAAI]

2.4. Expected loss on unlabeled data

Many works score an instance only by the expected performance on the labeled data plus the selected data. Some other works also take the expected loss on the remaining unlabeled data into account as a measurement of representativeness.

  • Expected loss on unlabeled data:
    • QUIRE
    • ALDR+

Works:

2.5. Divide and Select

Pre-divide the pool into batches by a certain rule, then select from each batch. Besides pre-clustering, there are other criteria for preparing the batches:

  • Divide by the loss on auxiliary (self-supervised) tasks.
  • Divide by a certain distance.
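A sketch of the divide-by-distance variant, assuming quantile bins over each pool point's distance to the labeled set and an arbitrary per-instance score (e.g. uncertainty) used within each batch — the binning rule and score are illustrative choices, not a specific published method:

```python
import numpy as np

def divide_and_select(X_pool, X_lab, n_batches, scores):
    # divide the pool into quantile bins of distance-to-labeled-set,
    # then take the highest-scoring instance within each bin
    d = np.linalg.norm(X_pool[:, None] - X_lab[None], axis=2).min(axis=1)
    edges = np.quantile(d, np.linspace(0.0, 1.0, n_batches + 1))
    bins = np.digitize(d, edges[1:-1])          # bin index in 0..n_batches-1
    queries = []
    for b in range(n_batches):
        idx = np.where(bins == b)[0]
        if len(idx):
            queries.append(int(idx[scores[idx].argmax()]))
    return queries

X_lab = np.array([[0.0, 0.0]])
X_pool = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
scores = np.array([0.1, 0.9, 0.8, 0.2])        # any per-instance score, e.g. uncertainty
queries = divide_and_select(X_pool, X_lab, n_batches=2, scores=scores)  # -> [1, 2]
```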

Works:

  • Using Self-Supervised Pretext Tasks for Active Learning [2022]
  • BAL: Balancing Diversity and Novelty for Active Learning [2024, TPAMI]

3. Expected Improvements

Our learning purpose is to reduce the generalization error in the end (in other words, to achieve better final performance). From this perspective, we can select the instances that would best improve the performance at each selection stage. Because we don't know the true label of the instance we are about to select, the expected performance is normally calculated over all possible labels for each instance. These methods usually require retraining the model for each unlabeled instance in the pool, so they can be really time consuming.

  • Expected improvement
    • Error Reduction: Most directly, reduce the generalization error.
    • Variance Reduction: We can still reduce generalization error indirectly by minimizing output variance.
    • Entropy Change: The reduction of prediction entropy on the evaluation set after adding a new item.
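A toy sketch of expected error reduction, assuming a soft nearest-centroid classifier as the (cheaply retrainable) model and expected entropy on the remaining pool as the surrogate for generalization error — both are illustrative simplifications of the general recipe:

```python
import numpy as np

def predict_proba(X, centroids):
    # soft nearest-centroid: p(c | x) proportional to exp(-distance to centroid c)
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def fit_centroids(X, y, n_classes):
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def expected_error_reduction(X_lab, y_lab, X_pool, n_classes=2):
    centroids = fit_centroids(X_lab, y_lab, n_classes)
    p_pool = predict_proba(X_pool, centroids)
    scores = []
    for i in range(len(X_pool)):
        rest = np.delete(np.arange(len(X_pool)), i)
        exp_ent = 0.0
        for c in range(n_classes):
            # retrain as if x_i had label c, weighted by the current P(c | x_i)
            cen = fit_centroids(np.vstack([X_lab, X_pool[i:i + 1]]),
                                np.r_[y_lab, c], n_classes)
            p = predict_proba(X_pool[rest], cen)
            exp_ent += p_pool[i, c] * -(p * np.log(p + 1e-12)).sum(axis=1).mean()
        scores.append(exp_ent)
    return int(np.argmin(scores))   # lowest expected future entropy

X_lab = np.array([[0.0, 0.0], [4.0, 0.0]])
y_lab = np.array([0, 1])
X_pool = np.array([[2.0, 0.0], [0.1, 0.0]])
query = expected_error_reduction(X_lab, y_lab, X_pool)  # the boundary point wins
```

Note the nested loop over candidates and labels: this retrain-per-candidate structure is exactly why these methods are expensive.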

Works:

4. Learn to Score

All the sampling strategies mentioned above are based on heuristics. Their intuitions are clear, but they may perform differently on different datasets. So some researchers proposed that we can learn a sampling strategy from the sampling process itself.

  • Learn to score
    • Learn a strategy-selection method: select from existing heuristics
    • Learn a score function directly
    • Learn an AL policy (as an MDP)
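The simplest flavour, strategy selection, can be framed as a multi-armed bandit over heuristics (ALBL-style). In this sketch the reward is simulated with fixed means, whereas ALBL estimates importance-weighted accuracy on the fly; the strategy names and reward values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
strategies = ["uncertainty", "random", "density"]
counts = np.zeros(3)
values = np.zeros(3)                 # running mean reward per strategy

def choose(eps=0.1):
    # epsilon-greedy bandit over the candidate heuristics
    if rng.random() < eps:
        return int(rng.integers(3))
    return int(values.argmax())

for t in range(200):
    arm = choose()
    # reward = performance gain after querying with this strategy; simulated
    # here with fixed means, whereas ALBL estimates it on the fly
    reward = rng.normal([0.5, 0.1, 0.2][arm], 0.05)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

best = strategies[int(values.argmax())]   # converges to the best-paying heuristic
```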

Works:

  • Active learning by learning [2015, AAAI]: ALBL. A single human-designed philosophy is unlikely to work in all scenarios. Given an appropriate choice of multi-armed bandit learner, take the importance-weighted accuracy as the reward function (an unbiased estimator of the test accuracy); it is then possible to estimate the performance of different strategies on the fly. SVM as the underlying classifier. (41 citations)
  • Learning active learning from data [2017, NeurIPS]: LAL. Train a random forest regressor that predicts the expected error reduction for a candidate sample in a particular learning state. Previous works cannot go beyond combining pre-existing hand-designed heuristics. Random forest as the basic classifier. (It is not clear how to obtain the test classification loss l; this is explained in neither the paper nor the code.) (73 citations)
  • Learning how to Active Learn: A Deep Reinforcement Learning Approach [2017, Arxiv]: PAL. Use RL to learn how to select instances. Even though the strategy is learned and applied in a stream manner, the stream is built from the data pool, so from this angle it can be considered a pool-based method. (92 citations)
  • Learning How to Actively Learn: A Deep Imitation Learning Approach [2018, ACL]: Learn an AL policy using imitation learning, mapping situations to most informative query datapoints. (8 citations)
  • Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning [2018, Arxiv]
  • Learning Loss for Active Learning [2019, CVPR]: Attach a small parametric module, named “loss prediction module,” to a target network, and learn it to predict target losses of unlabeled inputs.
  • Learning to Rank for Active Learning: A Listwise Approach [2020]: Have an additional loss prediction model to predict the loss of instances beside the classification model. Then the loss is calculated by the ranking instead of the ground truth loss of the classifier.
  • Deep Reinforcement Active Learning for Medical Image Classification [2020, MICCAI]: Takes the prediction probabilities on the whole unlabeled set as the state. The action (the strategy) is to produce a ranking of the unlabeled set with an actor network. The reward is the difference between the predicted value and the true label of the selected instances. Adopts a critic network with parameters θ_c to approximate the Q-value function.
  • ImitAL: Learning Active Learning Strategies from Synthetic Data [2021]: An imitation learning approach.
  • Cartography Active Learning [2021, EMNLP]: CAL. Select the instances that are the closest to the decision boundary between ambiguous and hard-to-learn instances.
  • Deep reinforced active learning for multi-class image classification [2022]
  • ImitAL: Learned Active Learning Strategy on Synthetic Data [2022]
  • Algorithm Selection for Deep Active Learning with Imbalanced Datasets [2023]
  • Reinforced Active Learning for Low-Resource, Domain-Specific, Multi-Label Text Classification [2023, ACL]
  • BatchGFN: Generative Flow Networks for Batch Active Learning [2023, ICML workshop]
  • Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [2023, ECML PKDD]

5. Others

There are still other works that use innovative heuristics which are hard to classify for now, so we put them under this section. They might be classified later.

Self-paced:

Utilize historical evaluation results:

Hybrid:

  • HAL: Hybrid active learning for efficient labeling in medical domain [2021, Neurocomputing]
  • How to Select Which Active Learning Strategy is Best Suited for Your Specific Problem and Budget [2023]: SelectAL. Combine representative and uncertainty under different budgets.
  • A More Robust Baseline for Active Learning by Injecting Randomness to Uncertainty Sampling [2023, ICML workshop]