KLDivergenceMaximization
- class skactiveml.pool.KLDivergenceMaximization(integration_dict_target_val=None, integration_dict_cross_entropy=None, missing_label=nan, random_state=None)
Bases: SingleAnnotatorPoolQueryStrategy
Regression based Kullback-Leibler Divergence Maximization
This class implements a query strategy [1] that selects the samples maximizing the expected Kullback-Leibler divergence from the new model to the old model. The new model is the one that results from adding the samples to the training set, and the expectation is taken over the model parameters.
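To make the quantity in this description concrete, the following toy sketch computes the closed-form Kullback-Leibler divergence between two univariate Gaussians and averages it over a few hypothetical updated models. It is not the library's implementation: the helper names `gaussian_kl` and `expected_kl`, the Gaussian assumption, the equal weights, and the choice of the new model as the first argument of the divergence are all assumptions made for illustration.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for univariate Gaussians p and q."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q**2)
        - 0.5
    )


def expected_kl(old, hypothetical_new, weights):
    """Weighted average of KL(new || old) over hypothetical updated models.

    `old` and each entry of `hypothetical_new` are (mean, std) pairs.
    """
    return sum(
        w * gaussian_kl(mu, sigma, old[0], old[1])
        for w, (mu, sigma) in zip(weights, hypothetical_new)
    )


# Old predictive distribution at one point: N(0, 1) (illustrative values).
old_model = (0.0, 1.0)
# Two equally likely hypothetical models after labeling one candidate.
new_models = [(1.0, 1.0), (-1.0, 1.0)]

utility = expected_kl(old_model, new_models, weights=[0.5, 0.5])
print(utility)  # 0.5
```

A candidate whose possible labels would move the model further from its current state receives a larger value, which is the sense in which the strategy favors informative samples.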
- Parameters:
- integration_dict_target_val : dict, default=None
Dictionary of integration arguments (e.g., the integration method) used for calculating the expected y value for the candidate samples. For details, see skactiveml.pool.utils.conditional_expect.
- integration_dict_cross_entropy : dict, default=None
Dictionary of integration arguments (e.g., the integration method) used for calculating the cross entropy between the conditional estimator updated with the X_cand value and the old conditional estimator. For details, see skactiveml.pool.utils.conditional_expect.
- missing_label : scalar or string or np.nan or None, default=np.nan
Value to represent a missing label.
- random_state : int or RandomState instance, default=None
Random state for candidate selection.
References
[1] D. Elreedy, A. F. Atiya, and S. I. Shaheen. A Novel Active Learning Regression Framework for Balancing the Exploration-Exploitation Trade-Off. Entropy, 21(7):651, 2019.
Methods
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
query(X, y, reg[, fit_reg, sample_weight, ...]): Determines for which candidate samples labels are to be queried.
set_params(**params): Set the parameters of this estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- query(X, y, reg, fit_reg=True, sample_weight=None, candidates=None, batch_size=1, return_utilities=False)
Determines for which candidate samples labels are to be queried.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Training data set, usually complete, i.e. including the labeled and unlabeled samples.
- y : array-like of shape (n_samples,)
Labels of the training data set (possibly including unlabeled ones indicated by self.missing_label).
- reg : skactiveml.base.ProbabilisticRegressor
Regressor used to predict the entropy, the cross entropy, and the potential y-values for the candidate samples.
- fit_reg : bool, default=True
Defines whether the regressor should be fitted on X, y, and sample_weight.
- sample_weight : array-like of shape (n_samples,), default=None
Weights of training samples in X.
- candidates : None or array-like of shape (n_candidates,), dtype=int or array-like of shape (n_candidates, n_features), default=None
If candidates is None, the unlabeled samples from (X,y) are considered as candidates.
If candidates is of shape (n_candidates,) and of type int, candidates is considered as the indices of the samples in (X,y).
If candidates is of shape (n_candidates, …), the candidate samples are directly given in candidates (not necessarily contained in X). This is not supported by all query strategies.
- batch_size : int, default=1
The number of samples to be selected in one active learning (AL) cycle.
- return_utilities : bool, default=False
If True, also return the utilities based on the query strategy.
- Returns:
- query_indices : numpy.ndarray of shape (batch_size,)
The query indices indicate for which candidate sample a label is to be queried, e.g., query_indices[0] indicates the first selected sample.
If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.
If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.
- utilities : numpy.ndarray of shape (batch_size, n_samples)
The utilities of samples after each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan.
If candidates is None, the indexing refers to the samples in X.
If candidates is of shape (n_candidates,) and of type int, utilities refers to the samples in X.
If candidates is of shape (n_candidates, …), utilities refers to the indexing in candidates.
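The index-based forms of candidates and the NaN convention for utilities described above can be sketched in plain Python. This is a hedged illustration of the documented indexing behavior only, not the library's code: the helpers `is_missing`, `candidate_indices`, and `pad_utilities` are hypothetical, the utility values are made up, and filling every non-candidate position with NaN is an assumption of this sketch.

```python
import math


def is_missing(label, missing_label=math.nan):
    """Return True if `label` equals the missing-label marker (NaN-aware)."""
    if isinstance(missing_label, float) and math.isnan(missing_label):
        return isinstance(label, float) and math.isnan(label)
    return label == missing_label


def candidate_indices(y, candidates=None, missing_label=math.nan):
    """Map the index-based forms of `candidates` to positions in (X, y)."""
    if candidates is None:
        # Default: every unlabeled sample is a candidate.
        return [i for i, label in enumerate(y) if is_missing(label, missing_label)]
    # Otherwise `candidates` is taken as integer indices into (X, y).
    return list(candidates)


def pad_utilities(n_samples, indices, candidate_utilities):
    """Place candidate utilities in a row of length n_samples, NaN elsewhere."""
    row = [math.nan] * n_samples
    for i, u in zip(indices, candidate_utilities):
        row[i] = u
    return row


# Samples 1 and 3 are unlabeled, so they are the default candidates.
y = [1.2, math.nan, 0.7, math.nan]
idx = candidate_indices(y)  # [1, 3]

# Made-up utilities for the two candidates, aligned with the samples in X.
row = pad_utilities(len(y), idx, [0.4, 0.9])

# The selected index refers to a position in X, here sample 3.
best = max(idx, key=lambda i: row[i])
print(idx, best)  # [1, 3] 3
```

When candidates are passed as a feature array of shape (n_candidates, n_features) instead, the returned indices and utilities refer to rows of that array rather than to X, so no such padding applies.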
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
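The <component>__<parameter> naming can be illustrated with two toy classes. This is a minimal sketch of the naming convention only, under the assumption that a nested key is split at its first double underscore. `Component` and `Wrapper` are hypothetical stand-ins, not scikit-learn or scikit-activeml classes, and the real implementation additionally validates parameter names.

```python
class Component:
    """Toy inner object with a single parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class Wrapper:
    """Toy outer object that holds a nested component."""

    def __init__(self, inner, batch_size=1):
        self.inner = inner
        self.batch_size = batch_size

    def set_params(self, **params):
        for key, value in params.items():
            # Split "inner__alpha" into ("inner", "__", "alpha").
            name, delim, sub_key = key.partition("__")
            if delim:
                # Forward the remainder of the key to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


wrapper = Wrapper(inner=Component())
# One call updates both the outer object and the nested component.
wrapper.set_params(batch_size=5, inner__alpha=0.1)
print(wrapper.batch_size, wrapper.inner.alpha)  # 5 0.1
```

Returning self, as documented above, is what allows calls such as `Wrapper(inner=Component()).set_params(batch_size=5)` to be chained.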
Examples using skactiveml.pool.KLDivergenceMaximization
Regression based Kullback Leibler Divergence Maximization