skactiveml.pool.Badge

class skactiveml.pool.Badge(clf_embedding_flag_name=None, missing_label=nan, random_state=None)

Bases: SingleAnnotatorPoolQueryStrategy

Batch Active Learning by Diverse Gradient Embedding (BADGE)

This class implements the BADGE algorithm [1]. This query strategy is designed to incorporate both predictive uncertainty and sample diversity into every selected batch.
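The interplay of uncertainty and diversity can be illustrated with a small NumPy sketch. It is not the library's implementation: the helper names `gradient_embeddings` and `kmeanspp_batch` are hypothetical, the embeddings are assumed to be last-layer gradients computed with the predicted class as pseudo-label, and starting from the largest-norm embedding is only one common choice for the first pick.

```python
import numpy as np


def gradient_embeddings(proba, X_emb):
    """Hypothetical helper: last-layer gradient embeddings under pseudo-labels."""
    n_samples = proba.shape[0]
    y_hat = proba.argmax(axis=1)
    # Cross-entropy gradient w.r.t. the logits when the predicted class is
    # used as the label: p - one_hot(y_hat). Confident predictions give
    # small gradients, uncertain ones give large gradients.
    scale = proba.copy()
    scale[np.arange(n_samples), y_hat] -= 1.0
    # Per-sample outer product, flattened to (n_samples, n_classes * n_features).
    return (scale[:, :, None] * X_emb[:, None, :]).reshape(n_samples, -1)


def kmeanspp_batch(G, batch_size, rng):
    """Hypothetical helper: k-means++ style seeding over the embeddings G.

    Assumes the embeddings are not all identical.
    """
    # Assumption: start with the embedding of largest norm.
    selected = [int(np.argmax(np.linalg.norm(G, axis=1)))]
    min_sq_dist = np.sum((G - G[selected[0]]) ** 2, axis=1)
    while len(selected) < batch_size:
        # Sample proportionally to the squared distance to the nearest
        # already selected embedding; selected points have probability 0.
        p = min_sq_dist / min_sq_dist.sum()
        idx = int(rng.choice(len(G), p=p))
        selected.append(idx)
        min_sq_dist = np.minimum(min_sq_dist, np.sum((G - G[idx]) ** 2, axis=1))
    return np.array(selected)


rng = np.random.default_rng(0)
X_emb = rng.normal(size=(20, 4))      # toy sample representations
logits = rng.normal(size=(20, 3))     # toy model outputs for 3 classes
proba = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

G = gradient_embeddings(proba, X_emb)
batch = kmeanspp_batch(G, batch_size=5, rng=rng)
```

Large gradient norms favor uncertain samples, while the distance-based sampling spreads the batch across the embedding space.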

Parameters

missing_label: scalar or string or np.nan or None, default=np.nan: Value to represent a missing label.
random_state: None or int or np.random.RandomState, default=None: The random state to use.
clf_embedding_flag_name: str or None, default=None: Name of the flag that is passed to the predict_proba method to obtain the (learned) sample representations. If clf_embedding_flag_name=None and predict_proba returns only one output, the input samples X are used. If predict_proba returns two outputs or clf_embedding_flag_name is not None, (proba, embeddings) are expected as outputs.
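The output contract described for clf_embedding_flag_name might look as follows. This is a toy sketch: `ToyEmbeddingClassifier` and the flag name `return_embeddings` are hypothetical, and the class is not a SkactivemlClassifier, so it cannot be passed to query as-is. It only shows a predict_proba that returns (proba, embeddings) when the flag is set.

```python
import numpy as np


class ToyEmbeddingClassifier:
    """Hypothetical two-layer model; `return_embeddings` is an assumed flag name."""

    def __init__(self, W_hidden, W_out):
        self.W_hidden = W_hidden  # shape (n_features, n_hidden)
        self.W_out = W_out        # shape (n_hidden, n_classes)

    def predict_proba(self, X, return_embeddings=False):
        # Learned sample representations (hidden-layer activations).
        embeddings = np.tanh(X @ self.W_hidden)
        logits = embeddings @ self.W_out
        # Numerically stable softmax over the class axis.
        logits = logits - logits.max(axis=1, keepdims=True)
        proba = np.exp(logits)
        proba /= proba.sum(axis=1, keepdims=True)
        if return_embeddings:
            return proba, embeddings
        return proba


rng = np.random.default_rng(0)
toy_clf = ToyEmbeddingClassifier(rng.normal(size=(5, 3)), rng.normal(size=(3, 2)))
X_demo = rng.normal(size=(4, 5))

proba_only = toy_clf.predict_proba(X_demo)
proba, emb = toy_clf.predict_proba(X_demo, return_embeddings=True)
```

With a real classifier following this pattern, the strategy would presumably be created as Badge(clf_embedding_flag_name="return_embeddings").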

References

1: Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal, “Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds.” ICLR, 2020.

Methods

`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`query`(X, y, clf[, fit_clf, sample_weight, ...])	Query the next samples to be labeled.
`set_params`(**params)	Set the parameters of this estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routing: MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep: bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict: Parameter names mapped to their values.

query(X, y, clf, fit_clf=True, sample_weight=None, candidates=None, batch_size=1, return_utilities=False)

Query the next samples to be labeled.

Parameters

X: array-like of shape (n_samples, n_features): Training data set, usually complete, i.e., including the labeled and unlabeled samples.
y: array-like of shape (n_samples,): Labels of the training data set (possibly including unlabeled samples, indicated by self.missing_label).
clf: skactiveml.base.SkactivemlClassifier: Model implementing the methods fit and predict_proba.
fit_clf: bool, optional (default=True): Defines whether the classifier should be fitted on X, y, and sample_weight.
sample_weight: array-like of shape (n_samples), optional (default=None): Weights of training samples in X.
candidates: None or array-like of shape (n_candidates), dtype=int or array-like of shape (n_candidates, n_features), optional (default=None): If candidates is None, the unlabeled samples from (X, y) are considered as candidates. If candidates is of shape (n_candidates) and of type int, candidates is considered as the indices of the samples in (X, y). If candidates is of shape (n_candidates, n_features), the candidates are directly given in candidates (not necessarily contained in X). This is not supported by all query strategies.
batch_size: int, optional (default=1): The number of samples to be selected in one active learning (AL) cycle.
return_utilities: bool, optional (default=False): If True, also return the utilities based on the query strategy.

Returns

query_indices: numpy.ndarray of shape (batch_size): The query_indices indicate the candidate samples for which a label is to be queried, e.g., query_indices[0] indicates the first selected sample. If candidates is None or of shape (n_candidates), the indexing refers to samples in X. If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.
utilities: numpy.ndarray of shape (batch_size, n_samples) or numpy.ndarray of shape (batch_size, n_candidates): The utilities of samples before each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan. When samples are selected uniformly at random from the set, the utilities of all samples sum to 1. Here, the utilities represent the probabilities of samples being chosen. If candidates is None or of shape (n_candidates), the indexing refers to samples in X. If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params: dict: Estimator parameters.

Returns

self: estimator instance: Estimator instance.

Examples using `skactiveml.pool.Badge`

Batch Active Learning by Diverse Gradient Embedding (BADGE)
