GreedySamplingX

class skactiveml.pool.GreedySamplingX(metric=None, metric_dict=None, missing_label=nan, random_state=None)

Bases: SingleAnnotatorPoolQueryStrategy

Greedy Sampling in the Feature Space (GSx)

This class implements the query strategy Greedy Sampling in the Feature Space (GSx) [1], which tries to select the samples that most increase the diversity of the labeled set in the feature space. It does this by selecting the samples that are furthest away from all previously labeled samples.
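The selection rule described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the helper name `gsx_select` is hypothetical, the distance is fixed to Euclidean, and at least one labeled sample is assumed to exist. Each unlabeled sample is scored by its distance to the nearest labeled sample, and the highest-scoring sample is picked. For batches, each pick is treated as labeled before the next one is chosen.

```python
import numpy as np


def gsx_select(X, labeled_idx, batch_size=1):
    """Hypothetical helper: greedily pick samples farthest from the labeled set."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n_samples, n_samples).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    reference = list(labeled_idx)  # labeled samples plus earlier picks
    selected = []
    for _ in range(batch_size):
        # Utility of a sample: distance to its nearest reference sample.
        utility = dist[:, reference].min(axis=1)
        utility[reference] = -np.inf  # never re-select a reference sample
        best = int(np.argmax(utility))
        selected.append(best)
        reference.append(best)
    return selected


X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
print(gsx_select(X, labeled_idx=[0], batch_size=2))  # prints [4, 2]
```

With sample 0 labeled, sample 4 is the farthest point and is picked first. Sample 2 then has the largest distance to its nearest reference sample (0 or 4) and is picked second.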

Parameters:
metric : str, default="euclidean"

Metric used for calculating the distances between samples in the feature space. It must be a valid value for the metric argument of sklearn.metrics.pairwise_distances.

metric_dict : dict, default=None

Any further parameters, which are passed directly to the pairwise_distances function.
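As a rough illustration of how metric and metric_dict relate to sklearn.metrics.pairwise_distances, the sketch below calls that function directly. It assumes that metric_dict is unpacked into keyword arguments; the exact forwarding inside GreedySamplingX is not shown in this page. The variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

metric = "minkowski"
metric_dict = {"p": 1}  # extra keyword arguments for the chosen metric

X_unlabeled = np.array([[1.0, 2.0], [3.0, 4.0]])
X_labeled = np.array([[0.0, 0.0]])

# Assumed forwarding: metric_dict is unpacked into keyword arguments.
dist = pairwise_distances(X_unlabeled, X_labeled, metric=metric, **metric_dict)
print(dist)  # Minkowski distance with p=1: [[3.], [7.]]
```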

missing_label : scalar or string or np.nan or None, default=np.nan

Value to represent a missing label.

random_state : int or np.random.RandomState, default=None

Random state for candidate selection.

References

[1] D. Wu, C.-T. Lin, and J. Huang. Active Learning for Regression using Greedy Sampling. Inf. Sci., 474:90–105, 2019.

Methods

get_metadata_routing()                       Get metadata routing of this object.
get_params([deep])                           Get parameters for this estimator.
query(X, y[, candidates, batch_size, ...])   Query the next samples to be labeled.
set_params(**params)                         Set the parameters of this estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

query(X, y, candidates=None, batch_size=1, return_utilities=False)

Query the next samples to be labeled.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data set, usually complete, i.e., including the labeled and unlabeled samples.

y : array-like of shape (n_samples,)

Labels of the training data set (possibly including unlabeled ones indicated by self.missing_label).

candidates : None or array-like of shape (n_candidates,) of type int or array-like of shape (n_candidates, n_features), default=None
  • If candidates is None, the unlabeled samples from (X,y) are considered as candidates.

  • If candidates is of shape (n_candidates,) and of type int, candidates is considered as the indices of the samples in (X,y).

  • If candidates is of shape (n_candidates, …), candidates is considered as the candidate samples in (X,y).

batch_size : int, default=1

The number of samples to be selected in one active learning (AL) cycle.

return_utilities : bool, default=False

If True, also return the utilities based on the query strategy.

Returns:
query_indices : numpy.ndarray of shape (batch_size,)

The query indices indicate for which candidate sample a label is to be queried, e.g., query_indices[0] indicates the first selected sample.

  • If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.

  • If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.

utilities : numpy.ndarray of shape (batch_size, n_samples)

The utilities of samples after each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan.

  • If candidates is None, the indexing refers to the samples in X.

  • If candidates is of shape (n_candidates,) and of type int, utilities refers to the samples in X.

  • If candidates is of shape (n_candidates, …), utilities refers to the indexing in candidates.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

Examples using skactiveml.pool.GreedySamplingX

Greedy Sampling on the Feature Space (GSx)