skactiveml.stream.CognitiveDualQueryStrategy

class skactiveml.stream.CognitiveDualQueryStrategy(force_full_budget=False, dist_func=None, dist_func_dict=None, density_threshold=1, cognition_window_size=10, budget_manager=None, budget=None, random_state=None)

Bases: SingleAnnotatorStreamQueryStrategy

This class is the base for the CognitiveDualQueryStrategy query strategy proposed in [1]. To use this strategy, refer to CognitiveDualQueryStrategyRan, CognitiveDualQueryStrategyRanVarUn, CognitiveDualQueryStrategyVarUn, and CognitiveDualQueryStrategyFixUn. The CognitiveDualQueryStrategy strategy is an extension of the uncertainty-based query strategies proposed by Žliobaitė et al. [2] and follows the same idea as StreamDensityBasedAL [3], where a label query is only allowed if the local density around the corresponding sample is sufficiently high. The authors propose the use of a cognitive window that monitors the most representative samples within a data stream.

Parameters
force_full_budget : bool, default=False

If True, tries to utilize the full budget. The article [1] does not update the budget manager if the local density factor is 0.

dist_func : callable, default=None

The distance function used to calculate the distances within the local density window. If it is None, sklearn.metrics.pairwise.pairwise_distances will be used by default.

dist_func_dict : dict, default=None

Additional parameters for dist_func.
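
As a rough illustration of what a custom dist_func might look like, the sketch below defines a pairwise Manhattan distance in plain Python. It assumes the callable is invoked in the same way as sklearn.metrics.pairwise.pairwise_distances, that is as dist_func(X, Y, **dist_func_dict), and that it returns one row per sample in X and one column per sample in Y. The names manhattan_pairwise and scale are hypothetical and not part of the library.

```python
def manhattan_pairwise(X, Y=None, scale=1.0):
    """Pairwise Manhattan distances between the rows of X and Y.

    Hypothetical helper: assumes the strategy calls
    dist_func(X, Y, **dist_func_dict), mirroring pairwise_distances.
    """
    if Y is None:
        Y = X
    return [
        [scale * sum(abs(a - b) for a, b in zip(x, y)) for y in Y]
        for x in X
    ]


# Extra keyword arguments would travel via dist_func_dict (assumption).
dist_func_dict = {"scale": 0.5}
D = manhattan_pairwise([[0.0, 0.0], [1.0, 2.0]], [[1.0, 1.0]], **dist_func_dict)
print(D)
```

In practice such a function would more likely return a NumPy array than nested lists; the plain-Python version only keeps the sketch free of dependencies.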

density_threshold : int, default=1

Determines the local density factor size that needs to be reached in order to query the candidate’s label.

cognition_window_size : int, default=10

Determines the size of the cognition window.
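
The interplay of density_threshold and cognition_window_size can be pictured with the simplified sketch below. It is not the library's implementation: the window is a plain FIFO buffer (the strategy in [1] is described as keeping the most representative samples instead), and the local density factor is assumed to be the number of window samples that lie closer to the candidate than to their nearest neighbour in the window. The function local_density_factor is a hypothetical helper.

```python
from collections import deque
import math


def local_density_factor(candidate, window):
    """Count window samples that are closer to `candidate` than to their
    nearest neighbour inside the window (illustrative definition only)."""
    count = 0
    for i, x in enumerate(window):
        others = [math.dist(x, z) for j, z in enumerate(window) if j != i]
        nearest = min(others) if others else math.inf
        if math.dist(x, candidate) < nearest:
            count += 1
    return count


cognition_window_size = 3  # mirrors the parameter of the same name
density_threshold = 1      # mirrors the parameter of the same name
# FIFO stand-in for the cognition window (assumption, see lead-in).
window = deque(maxlen=cognition_window_size)

decisions = []
for sample in [(0.0, 0.0), (4.0, 0.0), (0.5, 0.0), (10.0, 10.0)]:
    # A query is only considered if the density gate is passed.
    dense_enough = local_density_factor(sample, window) >= density_threshold
    decisions.append(dense_enough)
    window.append(sample)

print(decisions)
```

Here the far-away sample (10.0, 10.0) fails the gate because it is not closer to any window sample than that sample's existing nearest neighbour. In the actual strategy, passing the density gate would only be one condition; the budget manager still decides whether the label is queried.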

budget_manager : BudgetManager, default=None

The BudgetManager which models the budgeting constraint used in the stream-based active learning setting. If set to None, DensityBasedBudgetManager will be used by default. The budget manager will be initialized based on the following conditions:

  • If only a budget is given, the default budget manager is initialized with the given budget.

  • If only a budget manager is given, the given budget manager is used.

  • If neither is given, the default budget manager is initialized with the default budget.

  • If both are given and the budget differs from budget_manager.budget, a warning is issued and the budget manager is used as is.
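
The four cases above can be sketched as a small resolution function. This is an illustration of the documented behavior, not the library's code: StubBudgetManager and resolve_budget_manager are hypothetical names, and the stub only models a budget attribute with the documented default of 0.1.

```python
import warnings


class StubBudgetManager:
    """Hypothetical stand-in for a BudgetManager with a `budget` attribute."""

    def __init__(self, budget=None):
        # Fall back to the documented default budget of 0.1.
        self.budget = 0.1 if budget is None else budget


def resolve_budget_manager(budget=None, budget_manager=None):
    """Mimic the four documented cases (illustrative, not library code)."""
    if budget_manager is None:
        # No manager given: build the default one, with the given
        # budget if there is one, otherwise with the default budget.
        return StubBudgetManager(budget=budget)
    if budget is not None and budget != budget_manager.budget:
        # Both given but conflicting: warn and keep the manager unchanged.
        warnings.warn(
            "budget differs from budget_manager.budget; "
            "the budget manager is used as is."
        )
    # A given manager always takes precedence over a plain budget.
    return budget_manager
```

For example, resolve_budget_manager(budget=0.3) yields a stub manager with budget 0.3, while passing both budget=0.5 and a manager created with budget 0.2 returns that manager untouched and emits a warning.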

budget : float, default=None

Specifies the ratio of samples whose labels are allowed to be queried, with 0 <= budget <= 1. If budget is None, it is replaced with the default budget 0.1.

random_state : int or RandomState instance, default=None

Controls the randomness of the estimator.

See also

CognitiveDualQueryStrategyRan

CognitiveDualQueryStrategy using the RandomBudgetManager.

CognitiveDualQueryStrategyFixUn

CognitiveDualQueryStrategy using the FixedUncertaintyBudgetManager.

CognitiveDualQueryStrategyVarUn

CognitiveDualQueryStrategy using the VariableUncertaintyBudgetManager.

CognitiveDualQueryStrategyRanVarUn

CognitiveDualQueryStrategy using the RandomVariableUncertaintyBudgetManager.

References

[1] S. Liu, S. Xue, J. Wu, C. Zhou, J. Yang, Z. Li, and J. Cao. Online Active Learning for Drifting Data Streams. IEEE Trans. Neural Netw. Learn. Syst., 34(1):186–200, 2023.

[2] I. Žliobaitė, A. Bifet, B. Pfahringer, and G. Holmes. Active Learning With Drifting Streaming Data. IEEE Trans. Neural Netw. Learn. Syst., 25(1):27–39, 2014.

[3] D. Ienco, I. Žliobaitė, and B. Pfahringer. High density-focused uncertainty sampling for active learning over evolving stream data. In Int. Workshop Big Data Streams Heterog. Source Min. Algorithms Syst. Program. Models Appl., pages 133–148, 2014.

Methods

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

query(candidates, clf[, X, y, ...])

Determines for which candidate samples labels are to be queried.

set_params(**params)

Set the parameters of this estimator.

update(candidates, queried_indices[, ...])

Updates the budget manager and the count for seen and queried labels.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

query(candidates, clf, X=None, y=None, sample_weight=None, fit_clf=False, return_utilities=False)

Determines for which candidate samples labels are to be queried.

The query strategy determines the most useful samples in candidates, which can be acquired within the budgeting constraint specified by budget. Please note that this method does not change the internal state of the query strategy. To adapt the query strategy to the selected candidates, use update(…).
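
The separation between query and update suggests a processing loop along the following lines. This is a minimal sketch under stated assumptions: run_stream, EveryOtherStrategy, and NoOpClassifier are hypothetical names, and the two stub classes only imitate the query/update and fit interfaces so that the snippet runs without any dependencies.

```python
def run_stream(qs, clf, X_init, y_init, X_stream, y_stream):
    """Process a stream one sample at a time with a query/update cycle."""
    X_train, y_train = list(X_init), list(y_init)
    n_queried = 0
    for x_t, y_t in zip(X_stream, y_stream):
        candidates = [x_t]  # a single candidate, shape (1, n_features)
        clf.fit(X_train, y_train)
        # query(...) only proposes indices; it leaves `qs` unchanged.
        queried_indices = qs.query(candidates=candidates, clf=clf)
        # update(...) is what adapts the strategy's internal state.
        qs.update(candidates=candidates, queried_indices=queried_indices)
        if len(queried_indices) > 0:
            # The label is acquired only for queried samples.
            X_train.append(x_t)
            y_train.append(y_t)
            n_queried += 1
    return n_queried


class EveryOtherStrategy:
    """Hypothetical stub: queries every second sample it has seen."""

    def __init__(self):
        self.n_seen = 0

    def query(self, candidates, clf):
        return [0] if self.n_seen % 2 == 0 else []

    def update(self, candidates, queried_indices):
        self.n_seen += len(candidates)
        return self


class NoOpClassifier:
    """Hypothetical stub classifier whose fit does nothing."""

    def fit(self, X, y):
        return self


n_queried = run_stream(
    EveryOtherStrategy(), NoOpClassifier(),
    X_init=[[0.0]], y_init=[0],
    X_stream=[[1.0], [2.0], [3.0], [4.0], [5.0]], y_stream=[0, 1, 0, 1, 0],
)
print(n_queried)
```

With scikit-activeml installed, qs could instead be one of the concrete strategies such as CognitiveDualQueryStrategyRan, and clf a SkactivemlClassifier; candidates would then typically be passed as an array of shape (1, n_features).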

Parameters
candidates : {array-like, sparse matrix} of shape (n_candidates, n_features)

The samples which may be queried. Sparse matrices are accepted only if they are supported by the base query strategy.

clf : skactiveml.base.SkactivemlClassifier

Model implementing the methods fit and predict_proba.

X : array-like of shape (n_samples, n_features), default=None

Training data set used to fit the classifier.

y : array-like of shape (n_samples,), default=None

Labels of the training data set (possibly including unlabeled ones indicated by self.missing_label).

sample_weight : array-like of shape (n_samples,), default=None

Weights of training samples in X.

fit_clf : bool, default=False

Defines whether the classifier should be fitted on X, y, and sample_weight.

return_utilities : bool, default=False

If True, also return the utilities based on the query strategy.

Returns
queried_indices : np.ndarray of shape (n_queried_indices,)

The indices of samples in candidates whose labels are queried, with 0 <= queried_indices < n_candidates.

utilities : np.ndarray of shape (n_candidates,)

The utilities based on the query strategy. Only provided if return_utilities is True.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
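
The <component>__<parameter> naming scheme can be illustrated with a few lines of plain Python. The helper split_nested_params is hypothetical and only mimics how such keys are separated at the first double underscore; whether a given component (here budget_manager) actually exposes the named parameter is an assumption of this example.

```python
def split_nested_params(params):
    """Group `<component>__<parameter>` keys by component."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"density_threshold": 2, "budget_manager__budget": 0.2}
)
print(own)
print(nested)
```

Keys without a double underscore address the estimator itself, while the remaining keys are forwarded to the named component.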

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

update(candidates, queried_indices, budget_manager_param_dict=None)

Updates the budget manager and the count for seen and queried labels. This function should be used in conjunction with the query function.

Parameters
candidates : {array-like, sparse matrix} of shape (n_candidates, n_features)

The samples which may be queried. Sparse matrices are accepted only if they are supported by the base query strategy.

queried_indices : np.ndarray of shape (n_queried_indices,)

The indices of samples in candidates whose labels are queried, with 0 <= queried_indices < n_candidates.

budget_manager_param_dict : dict, default=None

Optional kwargs for budget_manager.

Returns
self : CognitiveDualQueryStrategy

The query strategy returns itself, after it is updated.

Examples using skactiveml.stream.CognitiveDualQueryStrategy

Cognitive Dual-Query Strategy with Fixed-Uncertainty

Cognitive Dual-Query Strategy with Random Sampling

Cognitive Dual-Query Strategy with Randomized-Variable-Uncertainty

Cognitive Dual-Query Strategy with Variable-Uncertainty
