skactiveml.pool.UncertaintySampling

class skactiveml.pool.UncertaintySampling(method='least_confident', cost_matrix=None, missing_label=nan, random_state=None)

Bases: SingleAnnotatorPoolQueryStrategy

Uncertainty Sampling (US)

This class implements various uncertainty-based query strategies: the standard uncertainty measures [1], cost-sensitive variants [2], and one that optimizes expected average precision [3].
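As a rough illustration of the standard measures from [1], the following NumPy sketch scores rows of class probabilities so that a higher score means more uncertainty. The helper name `uncertainty_scores` is hypothetical, and the exact scaling used inside the library may differ; `expected_average_precision` is not covered here.

```python
import numpy as np


def uncertainty_scores(probas, method="least_confident"):
    """Score each row of class probabilities; higher means more uncertain."""
    probas = np.asarray(probas, dtype=float)
    if method == "least_confident":
        # One minus the probability of the most likely class.
        return 1.0 - probas.max(axis=1)
    if method == "margin":
        # One minus the gap between the two most likely classes.
        top_two = np.sort(probas, axis=1)[:, -2:]
        return 1.0 - (top_two[:, 1] - top_two[:, 0])
    if method == "entropy":
        # Shannon entropy, treating 0 * log(0) as 0.
        logs = np.log(probas, where=probas > 0, out=np.zeros_like(probas))
        return -(probas * logs).sum(axis=1)
    raise ValueError(f"unknown method: {method!r}")


probas = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
most_uncertain = int(np.argmax(uncertainty_scores(probas, "entropy")))
```

For this toy input, all three measures rank the sample with probabilities (0.5, 0.5) as the most uncertain one.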

Parameters
method : 'least_confident' or 'margin' or 'entropy' or 'expected_average_precision', default='least_confident'

The method to calculate the uncertainty.

cost_matrix : array-like of shape (n_classes, n_classes), default=None

Cost matrix with cost_matrix[i, j] defining the cost of predicting class j for a sample whose actual class is i. Only supported for the 'least_confident' and 'margin' methods.

missing_label : scalar or string or np.nan or None, default=np.nan

Value to represent a missing label.

random_state : int or np.random.RandomState, default=None

The random state to use.
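To make the cost_matrix convention concrete, here is a minimal NumPy sketch of the expected cost of each possible prediction given class probabilities. The helper `expected_costs` and the cost values are hypothetical; how the library turns such costs into utilities follows [2] and is not reproduced here.

```python
import numpy as np


def expected_costs(probas, cost_matrix):
    """Expected cost of each possible prediction, per sample.

    Entry [n, j] equals sum_i probas[n, i] * cost_matrix[i, j].
    """
    return np.asarray(probas, dtype=float) @ np.asarray(cost_matrix, dtype=float)


# Hypothetical two-class costs: predicting class 0 for an actual class 1
# sample is five times as expensive as the opposite mistake.
cost_matrix = np.array([[0.0, 1.0],
                        [5.0, 0.0]])
probas = np.array([[0.8, 0.2]])
costs = expected_costs(probas, cost_matrix)
cheapest_prediction = int(np.argmin(costs[0]))
```

In this example class 0 is the more probable class, yet predicting class 1 has the lower expected cost, which is the kind of asymmetry a cost-sensitive strategy takes into account.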

References

[1] B. Settles. Active Learning Literature Survey. University of Wisconsin-Madison, Department of Computer Sciences, 2009.

[2] P.-L. Chen and H.-T. Lin. Active Learning for Multiclass Cost-Sensitive Classification Using Probabilistic Models. In Conf. Technol. Appl. Artif. Intell., pages 13–18, 2013.

[3] H. Wang, X. Chang, L. Shi, Y. Yang, and Y.-D. Shen. Uncertainty Sampling for Action Recognition via Maximizing Expected Average Precision. In Int. Jt. Conf. Artif. Intell., pages 964–970, 2018.

Methods

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

query(X, y, clf[, fit_clf, sample_weight, ...])

Determines for which candidate samples labels are to be queried.

set_params(**params)

Set the parameters of this estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

query(X, y, clf, fit_clf=True, sample_weight=None, utility_weight=None, candidates=None, batch_size=1, return_utilities=False)

Determines for which candidate samples labels are to be queried.

Parameters
X : array-like of shape (n_samples, n_features)

Training data set, usually complete, i.e., including the labeled and unlabeled samples.

y : array-like of shape (n_samples,)

Labels of the training data set (possibly including unlabeled ones indicated by self.missing_label).

clf : skactiveml.base.SkactivemlClassifier

Model implementing the methods fit and predict_proba.

fit_clf : bool, default=True

Defines whether the classifier should be fitted on X, y, and sample_weight.

sample_weight : array-like of shape (n_samples,), default=None

Weights of training samples in X.

utility_weight : array-like, default=None

Weight for each candidate (multiplied with the utilities). Typically, this is the density of a candidate. The length of utility_weight is usually n_samples, except when candidates contains samples (ndim >= 2), in which case the length is n_candidates.

candidates : None or array-like of shape (n_candidates,), dtype=int, or array-like of shape (n_candidates, n_features), default=None
  • If candidates is None, the unlabeled samples from (X,y) are considered as candidates.

  • If candidates is of shape (n_candidates,) and of type int, candidates is considered as the indices of the samples in (X,y).

  • If candidates is of shape (n_candidates, *), the candidate samples are directly given in candidates (not necessarily contained in X).

batch_size : int, default=1

The number of samples to be selected in one active learning (AL) cycle.

return_utilities : bool, default=False

If True, also return the utilities based on the query strategy.
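The interplay of utility_weight and batch_size described above can be sketched in plain NumPy as follows. The helper `pick_batch` is hypothetical and assumes a simple top-k selection; the library's own batch selection may differ, for example in how ties are broken.

```python
import numpy as np


def pick_batch(utilities, utility_weight=None, batch_size=1):
    """Return the indices of the batch_size highest weighted utilities."""
    utilities = np.asarray(utilities, dtype=float)
    if utility_weight is not None:
        # Element-wise weighting, e.g. by an estimated density per candidate.
        utilities = utilities * np.asarray(utility_weight, dtype=float)
    # NaN marks non-candidates (e.g. labeled samples); rank them last.
    ranked = np.argsort(np.where(np.isnan(utilities), -np.inf, utilities))[::-1]
    return ranked[:batch_size]


utilities = np.array([0.5, np.nan, 0.4, 0.1])
density = np.array([0.2, 1.0, 0.9, 0.5])
chosen = pick_batch(utilities, utility_weight=density, batch_size=2)
```

Without weighting, index 0 would be selected first; with the hypothetical density weights, index 2 moves to the front because it lies in a denser region.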

Returns
query_indices : numpy.ndarray of shape (batch_size,)

The query indices indicate for which candidate sample a label is to be queried, e.g., query_indices[0] indicates the first selected sample.

  • If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.

  • If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.

utilities : numpy.ndarray of shape (batch_size, n_samples)

The utilities of samples after each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan.

  • If candidates is None, the indexing refers to the samples in X.

  • If candidates is of shape (n_candidates,) and of type int, utilities refers to the samples in X.

  • If candidates is of shape (n_candidates, *), utilities refers to the indexing in candidates.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

Examples using skactiveml.pool.UncertaintySampling

Sub-sampling Wrapper

Density-weighted Uncertainty Sampling

Dual Strategy for Active Learning

Expected Average Precision

Uncertainty Sampling with Entropy

Uncertainty Sampling with Least-Confidence

Uncertainty Sampling with Margin