skactiveml.pool.multiannotator.SingleAnnotatorWrapper

class skactiveml.pool.multiannotator.SingleAnnotatorWrapper(strategy, y_aggregate=None, missing_label=nan, random_state=None)

Bases: MultiAnnotatorPoolQueryStrategy

Wrapper class that turns a pool-based active learning query strategy for a single annotator into a query strategy for multiple annotators. The wrapper chooses annotators either at random or according to the parameter A_perf, and it reduces the label matrix to a label vector with an aggregation function, e.g., majority voting.

Parameters
strategy : SingleAnnotatorPoolQueryStrategy

An active learning strategy for a single annotator.

y_aggregate : callable, optional (default=None)

y_aggregate is used to transform y as a matrix of shape (n_samples, n_annotators) into a vector of shape (n_samples) during the querying process and is then passed to the given strategy. If y_aggregate is None and y is used in the strategy, majority_vote is used as y_aggregate.
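The aggregation step can be pictured with a small standard-library sketch. It is only an illustration of majority voting over a label matrix, not the library's majority_vote implementation; the helper names (is_missing, majority_vote_rows) and the tie-breaking rule (smallest label wins) are assumptions made for this example.

```python
import math
from collections import Counter

MISSING = float("nan")  # stand-in for missing_label=np.nan


def is_missing(label):
    """Return True if the label is the NaN placeholder."""
    return isinstance(label, float) and math.isnan(label)


def majority_vote_rows(y_matrix):
    """Collapse each row of annotator labels to its most frequent observed label."""
    aggregated = []
    for row in y_matrix:
        observed = [label for label in row if not is_missing(label)]
        if not observed:
            # No annotator has labeled this sample yet.
            aggregated.append(MISSING)
            continue
        counts = Counter(observed)
        top = max(counts.values())
        # Assumption: ties are broken by taking the smallest label.
        aggregated.append(min(l for l, c in counts.items() if c == top))
    return aggregated


# Label matrix of shape (n_samples, n_annotators) = (3, 3).
y = [
    [0, 0, 1],
    [1, MISSING, 1],
    [MISSING, MISSING, MISSING],
]
y_vec = majority_vote_rows(y)  # one aggregated label per sample
```

A callable with this shape contract, a matrix of shape (n_samples, n_annotators) in and a vector of length n_samples out, is what y_aggregate is expected to provide.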

missing_label : scalar or string or np.nan or None, optional (default=np.nan)

Value to represent a missing label.

random_state : int or RandomState instance, optional (default=None)

Controls the randomness of the estimator.

Methods

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

query(X, y[, candidates, annotators, ...])

Determines which candidate sample is to be annotated by which annotator.

set_params(**params)

Set the parameters of this estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

query(X, y, candidates=None, annotators=None, batch_size=1, n_annotators_per_sample=1, A_perf=None, return_utilities=False, **query_kwargs)

Determines which candidate sample is to be annotated by which annotator. The samples are first and primarily ranked by the given strategy as if one unspecified annotator were to annotate the sample. Then, for each sample, the sample-annotator pairs are ranked either by previously set preferences or at random.

Parameters
X : array-like of shape (n_samples, n_features)

Training data set, usually complete, i.e., including the labeled and unlabeled samples.

y : array-like of shape (n_samples, n_annotators)

Labels of the training data set for each annotator (possibly including unlabeled ones indicated by self.MISSING_LABEL), meaning that y[i, j] contains the label assigned by annotator j to sample i.

candidates : None or array-like of shape (n_candidates,), dtype=int or array-like of shape (n_candidates, n_features), optional (default=None)

See annotators.

annotators : None or array-like of shape (n_avl_annotators,), dtype=int or array-like of shape (n_candidates, n_annotators), optional (default=None)

- If candidate samples and annotators are not specified, i.e., candidates=None and annotators=None, the unlabeled entries of y are the candidate annotator-sample pairs.
- If candidate samples and available annotators are specified, every annotator-sample pair whose sample is a candidate sample and whose annotator is an available annotator is a candidate annotator-sample pair.
- If candidates is None, all samples of X are considered candidate samples. In this case, n_candidates equals len(X).
- If candidates is of shape (n_candidates,) and of type int, candidates is interpreted as the indices of the candidate samples in (X, y).
- If candidates is of shape (n_candidates, n_features), the candidate samples are given directly in candidates (not necessarily contained in X). This is not supported by all query strategies.
- If annotators is None, all annotators are considered available annotators.
- If annotators is of shape (n_avl_annotators,) and of type int, annotators is interpreted as the indices of the available annotators.
- If annotators is a boolean array of shape (n_candidates, n_annotators), every annotator-sample pair whose sample is a candidate sample and whose entry in the boolean matrix is True is a candidate annotator-sample pair.
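The rules for candidates and annotators can be restated as a short standard-library sketch. It is a reading of the description above, not the library's code: the function name candidate_pairs is hypothetical, only index-based candidates are handled (the feature-matrix form is left out), and pairs are returned as (sample, annotator) tuples.

```python
import math


def is_missing(label):
    """Return True if the label is the NaN placeholder for a missing label."""
    return isinstance(label, float) and math.isnan(label)


def candidate_pairs(y, candidates=None, annotators=None):
    """List candidate (sample index, annotator index) pairs for a label matrix y."""
    n_samples, n_annotators = len(y), len(y[0])

    # Nothing specified: every unlabeled entry of y is a candidate pair.
    if candidates is None and annotators is None:
        return [
            (i, j)
            for i in range(n_samples)
            for j in range(n_annotators)
            if is_missing(y[i][j])
        ]

    # candidates=None means all samples; otherwise candidates holds sample indices.
    sample_idx = list(range(n_samples)) if candidates is None else list(candidates)

    if annotators is None:
        # All annotators are available for every candidate sample.
        allowed = [[True] * n_annotators for _ in sample_idx]
    elif annotators and isinstance(annotators[0], (list, tuple)):
        # Boolean matrix of shape (n_candidates, n_annotators).
        allowed = annotators
    else:
        # Index list of available annotators.
        available = set(annotators)
        allowed = [[j in available for j in range(n_annotators)] for _ in sample_idx]

    return [
        (i, j)
        for row, i in enumerate(sample_idx)
        for j in range(n_annotators)
        if allowed[row][j]
    ]


nan = float("nan")
y = [[0, nan], [nan, nan], [1, 1]]

pairs_default = candidate_pairs(y)
pairs_indexed = candidate_pairs(y, candidates=[1, 2], annotators=[0])
pairs_masked = candidate_pairs(
    y, candidates=[0, 1], annotators=[[False, True], [True, False]]
)
```

Note that in the boolean-matrix case the rows of annotators line up with the entries of candidates, not with the rows of X.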

batch_size : int, optional (default=1)

The number of annotator-sample pairs to be selected in one AL cycle.

A_perf : array-like of shape (n_annotators,) or (n_candidates, n_annotators), optional (default=None)

The performance-based ranking of each annotator.
1.) If A_perf is of shape (n_candidates, n_annotators), for each sample i the sample-annotator pair (i, j) is chosen over the pair (i, k) if A_perf[i, j] is greater than or equal to A_perf[i, k].
2.) If A_perf is of shape (n_annotators,), for each sample i the sample-annotator pair (i, j) is chosen over the pair (i, k) if A_perf[j] is greater than or equal to A_perf[k].
3.) If A_perf is None, the annotators are chosen at random, with a different distribution for each sample.
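The three A_perf cases can be illustrated with a minimal sketch. It is not the library's implementation: the function name rank_annotators is hypothetical, ties keep the lower annotator index first (an assumption), and the random case is modeled as a plain shuffle per sample.

```python
import random


def rank_annotators(sample_pos, n_annotators, A_perf=None, rng=None):
    """Return annotator indices for one candidate sample, most preferred first."""
    if A_perf is None:
        # Case 3: no performance information, so use a random order per sample.
        rng = rng or random.Random()
        order = list(range(n_annotators))
        rng.shuffle(order)
        return order
    if isinstance(A_perf[0], (list, tuple)):
        # Case 1: one row of scores per candidate sample.
        scores = A_perf[sample_pos]
    else:
        # Case 2: one global score per annotator, shared by all samples.
        scores = A_perf
    # Higher score is preferred; sorted() is stable, so ties keep index order.
    return sorted(range(n_annotators), key=lambda j: scores[j], reverse=True)


global_rank = rank_annotators(0, 3, A_perf=[0.6, 0.9, 0.7])

per_sample_perf = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.8]]
rank_sample_0 = rank_annotators(0, 3, A_perf=per_sample_perf)
rank_sample_1 = rank_annotators(1, 3, A_perf=per_sample_perf)

random_rank = rank_annotators(0, 3, rng=random.Random(0))
```

With a per-sample A_perf, two samples can prefer different annotators, which is the practical difference between cases 1.) and 2.).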

return_utilities : bool, optional (default=False)

If True, also returns the utilities based on the query strategy.

n_annotators_per_sample : int or array-like of shape (k,), k <= n_samples, optional (default=1)

If n_annotators_per_sample is an int, the value indicates the number of annotators that are preferably assigned to a candidate sample selected by the query strategy. "Preferably" here means that the actual number depends on how many annotators can be assigned to a given candidate sample and on how many annotator-sample pairs should be assigned given the batch_size. If n_annotators_per_sample is an int array, the value at the i-th index determines the preferred number of annotators for the candidate sample at the i-th position in the ranking of the batch. The ranking of the batch is given by the strategy (SingleAnnotatorPoolQueryStrategy). The last entry of the n_annotators_per_sample array (index k-1) indicates the preferred number of annotators for all candidate samples at a position greater than or equal to k-1.
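How an int or an array of length k maps onto ranked candidate samples can be shown with a few lines of plain Python. This is only a sketch of the rule described above; the helper name preferred_counts is hypothetical, and the adjustment to batch_size and to the annotators actually available is deliberately left out.

```python
def preferred_counts(n_annotators_per_sample, n_ranked_samples):
    """Preferred number of annotators for each position in the sample ranking."""
    if isinstance(n_annotators_per_sample, int):
        # A single int applies to every ranked candidate sample.
        return [n_annotators_per_sample] * n_ranked_samples
    values = list(n_annotators_per_sample)
    last = len(values) - 1
    # Positions beyond the end of the array reuse its last entry (index k-1).
    return [values[min(rank, last)] for rank in range(n_ranked_samples)]


counts_from_int = preferred_counts(2, 3)
counts_from_array = preferred_counts([3, 1], 4)
```

In the second call, the top-ranked sample would preferably get three annotators and every later sample one.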

query_kwargs : dict, optional

Dictionary for the parameters of the query method besides X and the transformed y.

Returns
query_indices : np.ndarray of shape (batch_size, 2)

The query_indices indicate which sample-annotator pairs are to be queried, i.e., which candidate sample is to be annotated by which annotator: query_indices[:, 0] contains the selected candidate samples and query_indices[:, 1] contains the respective selected annotators.

utilities : np.ndarray of shape (batch_size, n_samples, n_annotators) or np.ndarray of shape (batch_size, n_candidates, n_annotators)

The utilities of all candidate samples w.r.t. the available annotators after each selected sample of the batch, e.g., utilities[0, :, j] indicates the utilities used for selecting the first sample-annotator pair (with indices query_indices[0]). If candidates is None or of shape (n_candidates,), the indexing refers to samples in X. If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.
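A typical next step is to write the acquired labels back into y at the returned positions. The sketch below uses plain lists and made-up values: query_indices is hard-coded rather than produced by query, and annotator_labels is a hypothetical oracle holding the label each annotator would give. It also assumes the indices refer to rows of X, i.e., candidates was None or an index array.

```python
import math

nan = float("nan")

# Label matrix of shape (n_samples, n_annotators); NaN marks a missing label.
y = [
    [nan, nan],
    [0, nan],
    [nan, 1],
]

# Hypothetical oracle: the label each annotator would provide for each sample.
annotator_labels = [
    [1, 1],
    [0, 0],
    [1, 1],
]

# Hypothetical result of a query with batch_size=2: rows are (sample, annotator).
query_indices = [[0, 1], [2, 0]]

for sample_idx, annotator_idx in query_indices:
    # Column 0 selects the sample, column 1 the annotator asked to label it.
    y[sample_idx][annotator_idx] = annotator_labels[sample_idx][annotator_idx]
```

After the loop, only the queried entries of y have changed; all other missing labels stay missing.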

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.