skactiveml.pool.multiannotator.IntervalEstimationThreshold

class skactiveml.pool.multiannotator.IntervalEstimationThreshold(epsilon=0.9, alpha=0.05, random_state=None, missing_label=nan)

Bases: MultiAnnotatorPoolQueryStrategy

Interval Estimation Threshold (IEThresh)

The strategy ‘Interval Estimation Threshold’ (IEThresh) [1] addresses the exploration vs. exploitation trade-off that arises when multiple error-prone annotators are available in active learning. This class relies on IntervalEstimationAnnotModel to estimate the annotation performances, i.e., label accuracies, of the annotators. Samples are selected based on ‘Uncertainty Sampling’ (US). The selected samples are labeled by the annotators whose estimated annotation performances are greater than or equal to an adaptive threshold. The strategy assumes that all annotators are available and is not defined otherwise. To handle this case nonetheless, value-annotator pairs are first ranked according to the number of annotators available for the given value in candidates and are then ranked according to IntervalEstimationThreshold.
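The annotator selection step can be sketched as follows. This is a minimal illustration, assuming the rule described in [1]: an annotator is selected when the upper bound of its confidence interval is at least epsilon times the largest upper bound among all annotators. The helper `select_annotators` and the example bounds are hypothetical and not part of skactiveml.

```python
def select_annotators(upper_bounds, epsilon=0.9):
    """Return indices of annotators whose upper bound reaches the threshold.

    Assumption: threshold = epsilon * max(upper_bounds), following [1].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    threshold = epsilon * max(upper_bounds)
    return [j for j, ub in enumerate(upper_bounds) if ub >= threshold]


# Hypothetical upper confidence bounds of four annotators.
upper_bounds = [0.95, 0.70, 0.88, 0.50]

# With epsilon=0.9 the threshold is 0.9 * 0.95, so annotators 0 and 2 qualify.
selected = select_annotators(upper_bounds, epsilon=0.9)
```

With epsilon=1 only the annotators with the largest upper bound are selected; smaller values admit more annotators, which favours exploration.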

Parameters
epsilon : float, interval=[0, 1], default=0.9

Parameter for specifying the adaptive threshold used for annotator selection.

alpha : float, interval=(0, 1), default=0.05

Half of the confidence level for Student’s t-distribution.

random_state : int or np.random.RandomState or None, default=None

The random state to use.
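To illustrate how alpha could enter the performance estimate, the sketch below computes the upper bound of a t-based confidence interval for one annotator's accuracy, in the spirit of the interval estimation in [1]. It assumes the quantile is taken at 1 - alpha / 2; the function `upper_confidence_bound` and the data are hypothetical, and the actual computation in IntervalEstimationAnnotModel may differ.

```python
import numpy as np
from scipy.stats import t


def upper_confidence_bound(correct, alpha=0.05):
    """Upper bound of the t-interval for the mean of `correct`.

    correct : 1-D sequence of 0/1 flags, 1 if the annotator's label was
        judged correct. At least two observations are required.
    """
    correct = np.asarray(correct, dtype=float)
    n = correct.size
    if n < 2:
        raise ValueError("need at least two observations")
    mean = correct.mean()
    std = correct.std(ddof=1)  # sample standard deviation
    # Assumption: two-sided interval, so the quantile is 1 - alpha / 2.
    t_crit = t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return mean + t_crit * std / np.sqrt(n)


# Hypothetical record: four of five labels were judged correct.
ub = upper_confidence_bound([1, 1, 1, 0, 1], alpha=0.05)
```

Annotators with few or inconsistent observations get wide intervals and therefore large upper bounds (possibly above 1), which is what gives rarely queried annotators a chance to be selected.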

References

[1] P. Donmez, J. G. Carbonell, and J. Schneider. Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling. In ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pages 259–268, 2009.

Methods

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

query(X, y, clf[, fit_clf, candidates, ...])

Determines which candidate sample is to be annotated by which annotator.

set_params(**params)

Set the parameters of this estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

query(X, y, clf, fit_clf=True, candidates=None, annotators=None, sample_weight=None, batch_size='adaptive', return_utilities=False)

Determines which candidate sample is to be annotated by which annotator.

Parameters
X : array-like of shape (n_samples, n_features)

Training data set, usually complete, i.e., including the labeled and unlabeled samples.

y : array-like of shape (n_samples, n_annotators)

Labels of the training data set for each annotator (possibly including unlabeled ones indicated by self.MISSING_LABEL), meaning that y[i, j] contains the label annotated by annotator j for sample i.

clf : skactiveml.base.SkactivemlClassifier

Model implementing the methods fit and predict_proba.

fit_clf : bool, default=True

Defines whether the classifier should be fitted on X, y, and sample_weight.

candidates : None or array-like of shape (n_candidates,), dtype=int or array-like of shape (n_candidates, n_features), default=None

See parameter annotators.

annotators : None or array-like of shape (n_avl_annotators,), dtype=int or array-like of shape (n_candidates, n_annotators), default=None
  • If candidate samples and annotators are not specified, i.e., candidates=None and annotators=None, the unlabeled entries of the target values y are the candidate annotator-sample pairs.

  • If candidate samples and available annotators are specified: The annotator-sample-pairs, for which the sample is a candidate sample and the annotator is an available annotator are considered as candidate annotator-sample-pairs.

  • If candidates is None, all samples of X are considered as candidate samples. In this case n_candidates equals len(X).

  • If candidates is of shape (n_candidates,) and of type int, candidates is considered as the indices of the sample candidates in (X, y).

  • If candidates is of shape (n_candidates, n_features), the sample candidates are directly given in candidates (not necessarily contained in X). This is not supported by all query strategies.

  • If annotators is None, all annotators are considered as available annotators.

  • If annotators is of shape (n_avl_annotators,) and of type int, annotators is considered as the indices of the available annotators.

  • If annotators is a boolean array of shape (n_candidates, n_annotators), the annotator-sample pairs for which the sample is a candidate sample and the boolean matrix has entry True are considered as candidate annotator-sample pairs.
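The rules above can be sketched as a boolean mask over annotator-sample pairs. This is only an illustration of the documented semantics, not the library's implementation: the helper `candidate_pair_mask` is hypothetical, missing labels are assumed to be NaN, and candidates given as feature vectors of shape (n_candidates, n_features) are not handled.

```python
import numpy as np


def candidate_pair_mask(y, candidates=None, annotators=None):
    """Boolean mask of shape (n_samples, n_annotators) of candidate pairs."""
    y = np.asarray(y, dtype=float)
    n_samples, n_annotators = y.shape

    # Neither specified: every unlabeled entry of y is a candidate pair.
    if candidates is None and annotators is None:
        return np.isnan(y)

    # Rows: all samples of X, or the given sample indices.
    if candidates is None:
        cand_idx = np.arange(n_samples)
    else:
        cand_idx = np.asarray(candidates, dtype=int)

    mask = np.zeros((n_samples, n_annotators), dtype=bool)
    if annotators is None:
        # All annotators are available for every candidate sample.
        mask[cand_idx, :] = True
    else:
        annotators = np.asarray(annotators)
        if annotators.dtype == bool:
            # Boolean matrix of shape (n_candidates, n_annotators).
            mask[cand_idx, :] = annotators
        else:
            # Indices of the available annotators.
            mask[np.ix_(cand_idx, annotators)] = True
    return mask


# y[i, j] is the label given by annotator j for sample i; NaN means missing.
y = np.array([[0, np.nan], [np.nan, np.nan], [1, 0]])
mask_default = candidate_pair_mask(y)
mask_indexed = candidate_pair_mask(y, candidates=[1, 2], annotators=[0])
```

Here `mask_default` marks the three unlabeled entries of y, while `mask_indexed` marks samples 1 and 2 for annotator 0 only.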

sample_weight : array-like of shape (n_samples, n_annotators), default=None

It contains the weights of the training samples’ class labels. It must have the same shape as y.

batch_size : ‘adaptive’ or int, default=‘adaptive’

The number of annotator-sample pairs to be selected in one active learning (AL) cycle. If ‘adaptive’ is set, the batch_size is determined based on the annotation performances and the parameter epsilon.

return_utilities : bool, default=False

If True, also return the utilities based on the query strategy.

Returns
query_indices : np.ndarray of shape (batch_size, 2)

The query_indices indicate which candidate annotator-sample pairs are to be queried, i.e., which candidate sample is to be annotated by which annotator: query_indices[:, 0] indicates the selected candidate samples and query_indices[:, 1] indicates the respectively selected annotators.

  • If candidates is None or of shape (n_candidates,), the indexing refers to samples in X.

  • If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.

utilities : numpy.ndarray of shape (batch_size, n_samples, n_annotators) or numpy.ndarray of shape (batch_size, n_candidates, n_annotators)

The utilities of all candidate samples w.r.t. the available annotators after each selected sample of the batch, e.g., utilities[0, :, j] indicates the utilities used for selecting the first sample-annotator pair (with indices query_indices[0]).

  • If candidates is None or of shape (n_candidates,), the indexing refers to samples in X.

  • If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.
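A short sketch of how the returned query_indices might be consumed, assuming candidates was None or index-based so that the first column indexes rows of X and y. The `query_indices` array and the `ask_annotator` function are hypothetical stand-ins for the result of query and for the real annotation process.

```python
import numpy as np

# y has shape (n_samples, n_annotators); NaN marks a missing label.
y = np.full((4, 3), np.nan)

# Hypothetical result of query(...): each row is (sample index, annotator index).
query_indices = np.array([[2, 0], [2, 1]])


def ask_annotator(sample_idx, annotator_idx):
    """Hypothetical stand-in for a real annotator; always answers class 1."""
    return 1


# Write each obtained label into the matching entry of y.
for sample_idx, annotator_idx in query_indices:
    y[sample_idx, annotator_idx] = ask_annotator(sample_idx, annotator_idx)
```

After the loop, sample 2 carries labels from annotators 0 and 1, and all other entries of y remain missing.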

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.