skactiveml.pool.k_greedy_center
- skactiveml.pool.k_greedy_center(X, y, batch_size=1, random_state=None, missing_label=nan, mapping=None, n_new_cand=None)
An active learning method that greedily forms a batch to minimize the maximum distance to a cluster center among all unlabeled datapoints.
- Parameters
- X : array-like of shape (n_samples, n_features)
Training data set, usually complete, i.e., including the labeled and unlabeled samples.
- y : np.ndarray of shape (n_samples,)
Labels of the training data set (possibly including unlabeled ones indicated by missing_label).
- batch_size : int, default=1
The number of samples to be selected in one AL cycle.
- random_state : None or int or np.random.RandomState, default=None
Random state for candidate selection.
- missing_label : scalar or string or np.nan or None, default=np.nan
Value to represent a missing label.
- mapping : None or np.ndarray of shape (n_candidates,), default=None
Index array that maps candidates to X (candidates = X[mapping]).
- n_new_cand : int or None, default=None
The number of new candidates that are additionally added to X. Only used when candidates in the query function has the shape (n_candidates, n_features).
- Returns
- query_indices : numpy.ndarray of shape (batch_size,)
The query_indices indicate for which candidate sample a label is to be queried, e.g., query_indices[0] indicates the first selected sample.
If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.
If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.
- utilities : numpy.ndarray of shape (batch_size, n_samples) or numpy.ndarray of shape (batch_size, n_candidates)
The utilities of samples after each selected sample of the batch, e.g., utilities[0] indicates the utilities used for selecting the first sample (with index query_indices[0]) of the batch. Utilities for labeled samples will be set to np.nan.
If candidates is None or of shape (n_candidates,), the indexing refers to the samples in X.
If candidates is of shape (n_candidates, n_features), the indexing refers to the samples in candidates.
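The greedy selection described above can be illustrated with a minimal NumPy sketch. This is not the skactiveml implementation: the helper name `greedy_k_center_sketch` is hypothetical, and the sketch assumes Euclidean distances, np.nan as the missing label, labeled samples as the initial cluster centers, and first-index tie-breaking instead of random candidate selection. It also sets the utilities of samples already chosen within the batch to np.nan, which is a choice of this sketch.

```python
import numpy as np


def greedy_k_center_sketch(X, y, batch_size=1):
    """Illustrative greedy k-center selection (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    is_labeled = ~np.isnan(y)  # assumption: missing labels are np.nan

    # Distance of every sample to its nearest labeled sample (center).
    if is_labeled.any():
        diffs = X[:, None, :] - X[is_labeled][None, :, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    else:
        min_dist = np.full(len(X), np.inf)  # no center exists yet

    selected = is_labeled.copy()
    query_indices = []
    utilities = np.full((batch_size, len(X)), np.nan)

    for b in range(batch_size):
        # Utility = distance to the nearest center; NaN for labeled
        # samples and for samples already picked in this batch.
        utilities[b] = np.where(selected, np.nan, min_dist)
        idx = int(np.nanargmax(utilities[b]))  # farthest remaining sample
        query_indices.append(idx)
        selected[idx] = True
        # The picked sample becomes a new center; update the distances.
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[idx], axis=1))

    return np.array(query_indices), utilities


# Four points on a line; only the first one is labeled.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
y = np.array([0.0, np.nan, np.nan, np.nan])
query_indices, utilities = greedy_k_center_sketch(X, y, batch_size=2)
print(query_indices)    # indices into X, in selection order
print(utilities.shape)  # (batch_size, n_samples)
```

In this toy example the sample farthest from the labeled point is chosen first, and the second pick is the sample that is then farthest from both centers. Row b of utilities holds the distances used for the b-th pick.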