skactiveml.pool.GreedySamplingTarget
- class skactiveml.pool.GreedySamplingTarget(x_metric=None, y_metric=None, x_metric_dict=None, y_metric_dict=None, method=None, n_GSx_samples=1, missing_label=nan, random_state=None)
Bases: SingleAnnotatorPoolQueryStrategy
Greedy Sampling on the target space.
This class implements greedy sampling on the target space. The query strategy first selects samples that maximize the diversity in the feature space (GSx). Afterwards, it selects samples that maximize the diversity in both the feature and the target space (GSi) or, optionally, in the target space only (GSy).
- Parameters
- x_metric: str, optional (default=None)
Metric used to compute the distances between samples in the feature space. It must be a valid value for the metric argument of sklearn.metrics.pairwise_distances.
- y_metric: str, optional (default=None)
Metric used to compute the distances between samples in the target space. It must be a valid value for the metric argument of sklearn.metrics.pairwise_distances.
- x_metric_dict: dict, optional (default=None)
Any further parameters for computing the distances between samples in the feature space. They are passed directly to the pairwise_distances function.
- y_metric_dict: dict, optional (default=None)
Any further parameters for computing the distances between samples in the target space. They are passed directly to the pairwise_distances function.
- n_GSx_samples: int, optional (default=1)
Number of selected samples required before the query strategy switches from GSx to the strategy specified by method.
- method: “GSy” or “GSi”, optional (default=“GSi”)
Specifies whether only the diversity in the target space (GSy) or the diversity in both the feature and the target space (GSi) is maximized once the number of selected samples exceeds n_GSx_samples.
- missing_label: scalar or string or np.nan or None, (default=skactiveml.utils.MISSING_LABEL)
Value to represent a missing label.
- random_state: int | np.random.RandomState, optional
Random state for candidate selection.
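The interplay of n_GSx_samples and batch_size can be sketched as follows. This is one plausible reading of the parameter description, not the library's code: split_batch is a hypothetical helper, and it assumes that the "number of selected samples" is the number of already labeled samples.

```python
def split_batch(n_labeled, n_GSx_samples, batch_size):
    """Hypothetical helper: split a batch into GSx picks and `method` picks.

    Assumption: GSx is used until `n_GSx_samples` samples are labeled;
    the remaining picks of the batch use GSy or GSi.
    """
    n_gsx = max(0, min(n_GSx_samples - n_labeled, batch_size))
    return n_gsx, batch_size - n_gsx


# No labels yet, n_GSx_samples=1: one GSx pick, two picks with `method`.
cold_start = split_batch(n_labeled=0, n_GSx_samples=1, batch_size=3)

# Enough labels already: the whole batch uses `method`.
warm = split_batch(n_labeled=5, n_GSx_samples=1, batch_size=3)

# Large n_GSx_samples: the whole batch stays with GSx.
all_gsx = split_batch(n_labeled=0, n_GSx_samples=10, batch_size=3)

print(cold_start, warm, all_gsx)
```

Under this reading, the default n_GSx_samples=1 only affects the very first query, when no labels are available for fitting the regressor.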
References
- [1] Wu, Dongrui, Chin-Teng Lin, and Jian Huang. Active Learning for Regression using Greedy Sampling. Information Sciences, pages 90–105, 2019.
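The three diversity criteria can be illustrated with a small NumPy sketch. It follows the definitions in [1] (minimum distance of each candidate to the labeled set, using the product of feature and target distances for GSi), but it is not the skactiveml implementation. The function names, the Euclidean metric, and the toy data, including the predicted targets, are assumptions made for illustration.

```python
import numpy as np


def pairwise_euclidean(A, B):
    """Euclidean distances between all rows of A and all rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def greedy_utilities(X_cand, X_lbl, y_cand_pred, y_lbl, method):
    """Minimum distance of each candidate to the labeled samples.

    GSx uses feature distances, GSy uses distances between predicted and
    observed targets, and GSi uses the product of both.
    """
    d_x = pairwise_euclidean(X_cand, X_lbl)
    d_y = np.abs(y_cand_pred[:, None] - y_lbl[None, :])
    d = {"GSx": d_x, "GSy": d_y, "GSi": d_x * d_y}[method]
    return d.min(axis=1)


# Toy data: two labeled samples and three candidates in one dimension.
X_lbl = np.array([[0.0], [4.0]])
y_lbl = np.array([0.0, 4.0])
X_cand = np.array([[1.0], [2.0], [3.5]])
y_cand_pred = np.array([2.0, 1.5, 0.0])  # made-up regressor predictions

chosen = {}
for method in ("GSx", "GSy", "GSi"):
    util = greedy_utilities(X_cand, X_lbl, y_cand_pred, y_lbl, method)
    chosen[method] = int(np.argmax(util))  # most diverse candidate
    print(method, util, "-> candidate", chosen[method])
```

The candidate with the largest minimum distance is the one farthest from everything already labeled, which is why the utilities are maximized rather than minimized.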
Methods
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- query(X, y, reg[, fit_reg, sample_weight, ...]): Determines for which candidate samples labels are to be queried.
- set_params(**params): Set the parameters of this estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- query(X, y, reg, fit_reg=True, sample_weight=None, candidates=None, batch_size=1, return_utilities=False)
Determines for which candidate samples labels are to be queried.
- Parameters
- X: array-like of shape (n_samples, n_features)
Training data set, usually complete, i.e. including the labeled and unlabeled samples.
- y: array-like of shape (n_samples)
Labels of the training data set (possibly including unlabeled ones indicated by self.missing_label).
- reg: SkactivemlRegressor
Regressor to predict the data.
- fit_reg: bool, optional (default=True)
Defines whether the regressor should be fitted on X, y, and sample_weight.
- sample_weight: array-like of shape (n_samples), optional (default=None)
Weights of training samples in X.
- candidates: None or array-like of shape (n_candidates), dtype=int or array-like of shape (n_candidates, n_features), optional (default=None)
If candidates is None, the unlabeled samples from (X,y) are considered as candidates. If candidates is of shape (n_candidates) and of type int, candidates is considered as the indices of the samples in (X,y). If candidates is of shape (n_candidates, n_features), the candidates are directly given in candidates (not necessarily contained in X).
- batch_size: int, optional (default=1)
The number of samples to be selected in one AL cycle.
- return_utilities: bool, optional (default=False)
If True, also return the utilities based on the query strategy.
- Returns
- query_indices: numpy.ndarray of shape (batch_size)
The query_indices indicate for which candidate samples a label is to be queried, e.g., query_indices[0] indicates the first selected sample. If candidates is None or of shape (n_candidates), the indexing refers to samples in X. If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.
- utilities: numpy.ndarray of shape (batch_size, n_samples) or numpy.ndarray of shape (batch_size, n_candidates)
The utilities of samples after each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan. If candidates is None or of shape (n_candidates), the indexing refers to samples in X. If candidates is of shape (n_candidates, n_features), the indexing refers to samples in candidates.
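The indexing conventions of the return values can be made concrete with a NumPy-only sketch. It does not call query. The utility values are made up, np.nan is assumed to be the missing label, and picking the maximum utility (without tie-breaking) is a simplification for illustration.

```python
import numpy as np

# Labels with np.nan standing in for missing_label (an assumption here).
y = np.array([1.2, np.nan, np.nan, 0.7, np.nan])

# Case 1: candidates=None, so the unlabeled samples of (X, y) are candidates.
unlabeled_idx = np.flatnonzero(np.isnan(y))

# Made-up utilities of shape (batch_size, n_samples); labeled entries stay NaN.
utilities_X = np.full((1, len(y)), np.nan)
utilities_X[0, unlabeled_idx] = [0.3, 0.9, 0.5]
query_idx_X = np.nanargmax(utilities_X, axis=1)  # indexes rows of X

# Case 2: candidates given as an array of shape (n_candidates, n_features).
candidates = np.array([[0.1, 0.2], [0.8, 0.4]])
utilities_C = np.array([[0.4, 0.6]])  # made-up, shape (batch_size, n_candidates)
query_idx_C = np.argmax(utilities_C, axis=1)  # indexes rows of candidates
selected = candidates[query_idx_C]

print(query_idx_X, query_idx_C, selected)
```

In the first case the returned index can be used directly to write the acquired label into y; in the second case it only identifies a row of candidates.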
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.