skactiveml.pool.k_greedy_center

skactiveml.pool.k_greedy_center(X, y, batch_size=1, random_state=None, missing_label=nan, mapping=None, n_new_cand=None)

An active learning method that greedily forms a batch so as to minimize the maximum distance from any unlabeled data point to its nearest cluster center.
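The greedy idea described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the helper name `greedy_k_center_sketch`, the boolean `is_labeled` argument, the use of Euclidean distance, and the choice of labeled samples as initial centers are all assumptions made for the example. Ties are resolved by taking the first index, so the `random_state` parameter is not modeled.

```python
import numpy as np


def greedy_k_center_sketch(X, is_labeled, batch_size=1):
    """Hypothetical helper: greedy k-center selection on NumPy arrays."""
    X = np.asarray(X, dtype=float)
    is_labeled = np.asarray(is_labeled, dtype=bool)
    available = ~is_labeled  # samples that may still be selected

    if is_labeled.any():
        # Distance of every sample to its nearest labeled sample (center).
        diffs = X[:, None, :] - X[is_labeled][None, :, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    else:
        # No centers yet: treat every sample as infinitely far away.
        min_dist = np.full(len(X), np.inf)

    selected = []
    for _ in range(batch_size):
        # Pick the available sample that is farthest from its nearest center.
        scores = np.where(available, min_dist, -np.inf)
        idx = int(np.argmax(scores))
        selected.append(idx)
        available[idx] = False
        # The newly chosen center can only shrink the nearest-center distances.
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[idx], axis=1))
    return np.array(selected)


X_demo = np.array([[0.0], [1.0], [5.0], [10.0]])
labeled_demo = np.array([True, False, False, False])
picked = greedy_k_center_sketch(X_demo, labeled_demo, batch_size=2)
```

With this toy data, the sample at 10.0 is chosen first because it is farthest from the only center at 0.0; the sample at 5.0 follows because it is then farthest from both centers.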

Parameters
X : array-like of shape (n_samples, n_features)

Training data set, usually complete, i.e., including the labeled and unlabeled samples.

y : np.ndarray of shape (n_samples,)

Labels of the training data set (possibly including unlabeled ones indicated by missing_label).

batch_size : int, default=1

The number of samples to be selected in one active learning (AL) cycle.

random_state : None or int or np.random.RandomState, default=None

Random state for candidate selection.

missing_label : scalar or string or np.nan or None, default=np.nan

Value to represent a missing label.
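As a small illustration of the default `missing_label=np.nan`, the snippet below builds a label array with missing entries and locates the unlabeled samples using plain NumPy. The arrays are made-up toy data. Note that `y == np.nan` would not work here, because NaN never compares equal to itself, so `np.isnan` is used instead.

```python
import numpy as np

# Toy data: four samples, two of which have no label yet.
X = np.array([[0.0], [1.0], [5.0], [10.0]])
y = np.array([0.0, np.nan, np.nan, 1.0])

# NaN never equals itself, so use np.isnan rather than `y == np.nan`.
is_unlabeled = np.isnan(y)
unlabeled_idx = np.flatnonzero(is_unlabeled)
```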

mapping : None or np.ndarray of shape (n_candidates,), default=None

Index array that maps candidates to X (candidates = X[mapping]).
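The relation `candidates = X[mapping]` can be illustrated with a short NumPy example. The data and the variable names `local_idx` and `global_idx` are hypothetical; the point is only that a position within the candidates translates back to a row of `X` through `mapping`.

```python
import numpy as np

# Toy data: five samples with two features each.
X = np.arange(10, dtype=float).reshape(5, 2)

# Only rows 1, 3 and 4 of X are candidates.
mapping = np.array([1, 3, 4])
candidates = X[mapping]

# A position within the candidates maps back to a row of X.
local_idx = 2  # hypothetical position in `candidates`
global_idx = mapping[local_idx]
same_row = np.array_equal(candidates[local_idx], X[global_idx])
```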

n_new_cand : int or None, default=None

The number of new candidates that are additionally added to X. Only used when the candidates passed to the query function have shape (n_candidates, n_features).

Returns
query_indices : numpy.ndarray of shape (batch_size,)

The query_indices indicate for which candidate sample a label is to be queried, e.g., query_indices[0] indicates the first selected sample.

  • If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.

  • If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.

utilities : numpy.ndarray of shape (batch_size, n_samples) or numpy.ndarray of shape (batch_size, n_candidates)

The utilities of samples after each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan.

  • If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.

  • If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.
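To make the layout of the return values more concrete, the following sketch inspects a hand-written utilities array of shape (batch_size, n_samples). The numbers are illustrative and not output of the library, and the sketch assumes that the sample chosen in each step is the one with the highest non-NaN utility in the corresponding row.

```python
import numpy as np

# Hand-written toy utilities for batch_size=2 and four samples.
# Sample 0 is labeled, so its utility is np.nan in every row.
utilities = np.array([
    [np.nan, 1.0, 5.0, 10.0],  # utilities used for the first selection
    [np.nan, 1.0, 5.0, 0.0],   # utilities used for the second selection
])

# Assumption: each step selects the sample with the highest utility.
best_per_step = np.nanargmax(utilities, axis=1)

# Labeled samples can be recovered from the NaN entries of the first row.
labeled_mask = np.isnan(utilities[0])
```

Using `np.nanargmax` instead of `np.argmax` matters here, since `np.argmax` would return the position of the first NaN.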