AnnotatorLogisticRegression
- class skactiveml.classifier.multiannotator.AnnotatorLogisticRegression(n_annotators=None, tol=0.0001, max_iter=100, fit_intercept=True, annot_prior_full=1, annot_prior_diag=0, weights_prior=1, solver='Newton-CG', solver_dict=None, classes=None, cost_matrix=None, missing_label=nan, random_state=None)
Bases: SkactivemlClassifier
Logistic Regression for Crowds
Logistic regression for crowds, based on Raykar et al. [1], is a classification algorithm that learns from multiple annotators. Besides building a model for the classification task, the algorithm estimates the performance of the annotators. The performance of an annotator is assumed to depend only on the true label of a sample and not on the sample itself. Each annotator is assigned a confusion matrix in which each row is normalized. This matrix captures the bias of the annotator's decisions. The estimated biases are then used to refine the classifier itself.
The classifier also supports a Bayesian view of the problem: a prior distribution over each annotator's confusion matrix is assumed, together with a prior distribution over the classifier's weight vectors, which corresponds to a regularization.
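The interplay between classifier output and annotator confusion matrices can be illustrated with a single expectation step. The following is a minimal NumPy sketch under the model assumptions stated above, not the library's implementation; the names `e_step`, `P`, `Alpha`, and `mu` are illustrative, labels are assumed to be integer class indices, and missing labels are assumed to be np.nan.

```python
import numpy as np


def e_step(P, Alpha, y):
    """Posterior over the true class of each sample (illustrative sketch).

    P     : (n_samples, n_classes) class probabilities of the classifier.
    Alpha : (n_annotators, n_classes, n_classes) row-normalized confusion
            matrices, Alpha[l, k, c] = p(annotator l says c | true class k).
    y     : (n_samples, n_annotators) class indices, np.nan where missing.
    """
    posterior = P.copy()
    for l in range(y.shape[1]):
        provided = ~np.isnan(y[:, l])
        labels = y[provided, l].astype(int)
        # Likelihood of each observed label under every possible true class.
        posterior[provided] *= Alpha[l][:, labels].T
    # Normalize each row so that it sums to one.
    return posterior / posterior.sum(axis=1, keepdims=True)


P = np.array([[0.5, 0.5], [0.5, 0.5]])
Alpha = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # reliable annotator
    [[0.5, 0.5], [0.5, 0.5]],  # annotator who guesses at random
])
y = np.array([[0.0, 1.0], [1.0, np.nan]])
mu = e_step(P, Alpha, y)
```

In this toy setting the random annotator contributes no information, so the posterior follows the reliable annotator's labels.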
- Parameters:
- tol : float, default=0.0001
Threshold for stopping the EM algorithm and the optimization of the logistic regression weights. If the change of the respective value between two steps is smaller than tol, the respective algorithm stops.
- max_iter : int, default=100
The maximum number of iterations of the EM algorithm to be performed.
- fit_intercept : bool, default=True
Specifies whether a constant (a.k.a. bias or intercept) should be added to the input samples.
- annot_prior_full : int or float or array-like, default=1
Determines A as the Dirichlet prior for each annotator l (i.e., A[l] = annot_prior_full * np.ones((n_classes, n_classes)) for a numeric parameter or A[l] = annot_prior_full[l] * np.ones((n_classes, n_classes)) for an array-like parameter). A[l,i,j] is the estimated number of times annotator l has provided label j for a sample of true label i.
- annot_prior_diag : int or float or array-like, default=0
Adds a value to the diagonal of A[l], the Dirichlet prior for annotator l (i.e., A[l] += annot_prior_diag * np.eye(n_classes) for a numeric parameter or A[l] += annot_prior_diag[l] * np.eye(n_classes) for an array-like parameter). A[l,i,j] is the estimated number of times annotator l has provided label j for a sample of true label i.
- weights_prior : int or float, default=1
Determines Gamma as the inverse covariance matrix of the prior distribution for every weight vector (i.e., Gamma = weights_prior * np.eye(n_features)). By default, the identity matrix is used for each weight vector.
- solver : str or callable, default='Newton-CG'
Type of solver. Should be 'Nelder-Mead', 'Powell', 'CG', 'BFGS', 'Newton-CG', 'L-BFGS-B', 'TNC', 'COBYLA', 'SLSQP', 'trust-constr', 'dogleg', 'trust-ncg', 'trust-exact', 'trust-krylov', or a custom callable object. See scipy.optimize.minimize for more information.
- solver_dict : dictionary, default=None
Additional solver options passed to scipy.optimize.minimize. If None, {'maxiter': 100} is passed.
- classes : array-like of shape (n_classes,), default=None
Holds the label for each class. If None, the classes are determined during the fit.
- missing_label : scalar or string or np.nan or None, default=np.nan
Value to represent a missing label.
- cost_matrix : array-like of shape (n_classes, n_classes), default=None
Cost matrix with cost_matrix[i,j] indicating the cost of predicting class classes[j] for a sample of class classes[i]. Can only be set if classes is not None.
- random_state : int or RandomState instance or None, default=None
Determines random number generation for the predict method. Pass an int for reproducible results across multiple method calls.
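The way annot_prior_full and annot_prior_diag combine into the prior A can be written out in a few lines of NumPy. This is a sketch that follows the formulas in the parameter descriptions; the helper name `build_annot_prior` is hypothetical, and the library may validate and construct the prior differently.

```python
import numpy as np


def build_annot_prior(n_annotators, n_classes, annot_prior_full=1, annot_prior_diag=0):
    """Build A of shape (n_annotators, n_classes, n_classes) (illustrative)."""
    # Scalars are broadcast to one value per annotator; array-likes are
    # expected to already hold one value per annotator.
    full = np.broadcast_to(np.asarray(annot_prior_full, dtype=float), (n_annotators,))
    diag = np.broadcast_to(np.asarray(annot_prior_diag, dtype=float), (n_annotators,))
    A = full[:, None, None] * np.ones((n_classes, n_classes))
    A = A + diag[:, None, None] * np.eye(n_classes)
    return A


# Annotator 0 gets a flat prior; annotator 1 gets extra mass on the diagonal,
# i.e., a prior belief that this annotator tends to be correct.
A = build_annot_prior(n_annotators=2, n_classes=3, annot_prior_full=1, annot_prior_diag=[0, 2])
```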
- Attributes:
- n_annotators_ : int
Number of annotators.
- W_ : numpy.ndarray of shape (n_features, n_classes)
The weight vectors of the logistic regression model.
- Alpha_ : numpy.ndarray of shape (n_annotators, n_classes, n_classes)
A confusion matrix for each annotator, where each row is normalized. Alpha_[l,k,c] describes the probability that annotator l provides the class label c for a sample belonging to class k.
- classes_ : array-like of shape (n_classes,)
Holds the label for each class after fitting.
- cost_matrix_ : array-like of shape (n_classes, n_classes)
Cost matrix with cost_matrix_[i,j] indicating the cost of predicting class classes_[j] for a sample of class classes_[i].
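To make the indexing convention of the cost matrix concrete, the sketch below turns class probabilities into a minimum-expected-cost decision. It assumes the standard Bayes decision rule and ignores tie-breaking; the arrays `classes`, `P`, and `cost_matrix` are made-up illustration data rather than library output.

```python
import numpy as np

classes = np.array(["cat", "dog"])
P = np.array([[0.6, 0.4], [0.2, 0.8]])  # class probabilities per sample

# cost_matrix[i, j]: cost of predicting classes[j] for a sample of classes[i].
# Here, mistaking a dog for a cat is five times as costly as the reverse.
cost_matrix = np.array([[0.0, 1.0],
                        [5.0, 0.0]])

# Expected cost of predicting class j for sample n:
# sum_i P[n, i] * cost_matrix[i, j]
expected_cost = P @ cost_matrix
y_pred = classes[np.argmin(expected_cost, axis=1)]
```

The first sample is assigned to "dog" although "cat" is more probable, because the asymmetric costs outweigh the difference in probability.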
References
[1] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from Crowds. J. Mach. Learn. Res., 11(4):1297–1322, 2010.
Methods
fit(X, y[, sample_weight]): Fit the model using X as samples and y as class labels.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
predict(X[, extra_outputs]): Return class predictions for the test samples X.
predict_proba(X[, extra_outputs]): Return class probability estimates for the test samples X.
score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_fit_request(*[, sample_weight]): Configure whether metadata should be requested to be passed to the fit method.
set_params(**params): Set the parameters of this estimator.
set_predict_proba_request(*[, extra_outputs]): Configure whether metadata should be requested to be passed to the predict_proba method.
set_predict_request(*[, extra_outputs]): Configure whether metadata should be requested to be passed to the predict method.
set_score_request(*[, sample_weight]): Configure whether metadata should be requested to be passed to the score method.
- fit(X, y, sample_weight=None)
Fit the model using X as samples and y as class labels.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Feature matrix representing the samples.
- y : array-like of shape (n_samples, n_annotators)
Class labels of the training samples, where missing labels are represented via missing_label. Specifically, label y[n, m] refers to the label of sample X[n] from annotator m.
- sample_weight : array-like of shape (n_samples, n_annotators), default=None
Weights of the training samples' class labels. It must have the same shape as y. The sample weights are used only for the initialization via majority vote and for the computation of the confusion matrices. They are not supported for the update of the logistic regression weights or for the expectation computation.
- Returns:
- self : AnnotatorLogisticRegression
The AnnotatorLogisticRegression is fitted on the training data.
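The expected layout of y and sample_weight, and the weighted majority vote mentioned for the initialization, can be sketched as follows. This is an illustration under the assumption that labels are class indices and missing labels are np.nan; it is not the library's code, and `votes` and `mu_init` are hypothetical names.

```python
import numpy as np

# Three samples labeled by three annotators; np.nan marks a missing label.
y = np.array([
    [0, 0, 1],
    [1, np.nan, 1],
    [np.nan, 0, np.nan],
])
sample_weight = np.ones_like(y)  # must have the same shape as y
n_classes = 2

# Weighted vote counts per sample and class. Comparisons with np.nan are
# False, so missing labels contribute nothing.
votes = np.zeros((y.shape[0], n_classes))
for c in range(n_classes):
    votes[:, c] = np.where(y == c, sample_weight, 0.0).sum(axis=1)

# Normalized votes as an initial guess of the true-label distribution.
# Every sample here has at least one label, so no row sums to zero.
mu_init = votes / votes.sum(axis=1, keepdims=True)
```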
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X, extra_outputs=None)
Return class predictions for the test samples X.
By default, this method returns only the class predictions y_pred. If extra_outputs is provided, a tuple is returned whose first element is y_pred and whose remaining elements are the requested additional forward outputs, in the order specified by extra_outputs.
- Parameters:
- X : array-like of shape (n_samples, …)
Test samples.
- extra_outputs : None or str or sequence of str, default=None
Names of additional outputs to return next to y_pred. The names must be a subset of the following keys:
“logits” : Additionally return the class-membership logits L_class for the samples in X.
“annotator_perf” : Additionally return the estimated annotator performance probabilities P_perf for each sample–annotator pair.
“annotator_class” : Additionally return the annotator–class probability estimates P_annot for each sample, class, and annotator.
- Returns:
- y_pred : numpy.ndarray of shape (n_samples,)
Class predictions of the test samples.
- *extras : numpy.ndarray, optional
Only returned if extra_outputs is not None. In that case, the method returns a tuple whose first element is y_pred and whose remaining elements correspond to the requested forward outputs in the order given by extra_outputs. Potential outputs are:
L_class : np.ndarray of shape (n_samples, n_classes), where L_class[n, c] is the logit for the class classes_[c] of sample X[n].
P_perf : np.ndarray of shape (n_samples, n_annotators), where P_perf[n, m] refers to the estimated label correctness probability (performance) of annotator m when labeling sample X[n].
P_annot : np.ndarray of shape (n_samples, n_annotators, n_classes), where P_annot[n, m, c] refers to the probability that annotator m provides the class label c for sample X[n].
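The shapes and meaning of P_perf and P_annot follow from the model assumption that an annotator's behavior depends only on the true class. The sketch below derives both quantities from class probabilities and confusion matrices; whether the library computes them in exactly this way is an assumption, and the arrays `P` and `Alpha` are made-up illustration data.

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.3, 0.7]])  # (n_samples, n_classes)
Alpha = np.array([                        # (n_annotators, n_classes, n_classes)
    [[0.8, 0.2], [0.4, 0.6]],
    [[1.0, 0.0], [0.0, 1.0]],             # a perfect annotator
])

# P_annot[n, m, c] = sum_k P[n, k] * Alpha[m, k, c]
P_annot = np.einsum("nk,mkc->nmc", P, Alpha)

# P_perf[n, m] = sum_k P[n, k] * Alpha[m, k, k], i.e., the probability that
# annotator m provides the correct label for sample n.
P_perf = P @ np.diagonal(Alpha, axis1=1, axis2=2).T
```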
- predict_proba(X, extra_outputs=None)
Return class probability estimates for the test samples X.
By default, this method returns only the class probabilities P. If extra_outputs is provided, a tuple is returned whose first element is P and whose remaining elements are the requested additional forward outputs, in the order specified by extra_outputs.
- Parameters:
- X : array-like of shape (n_samples, …)
Test samples.
- extra_outputs : None or str or sequence of str, default=None
Names of additional outputs to return next to P. The names must be a subset of the following keys:
“logits” : Additionally return the class-membership logits L_class for the samples in X.
“annotator_perf” : Additionally return the estimated annotator performance probabilities P_perf for each sample–annotator pair.
“annotator_class” : Additionally return the annotator–class probability estimates P_annot for each sample, class, and annotator.
- Returns:
- P : numpy.ndarray of shape (n_samples, n_classes)
Class probabilities of the test samples. Classes are ordered according to self.classes_.
- *extras : numpy.ndarray, optional
Only returned if extra_outputs is not None. In that case, the method returns a tuple whose first element is P and whose remaining elements correspond to the requested forward outputs in the order given by extra_outputs. Potential outputs are:
L_class : np.ndarray of shape (n_samples, n_classes), where L_class[n, c] is the logit for the class classes_[c] of sample X[n].
P_perf : np.ndarray of shape (n_samples, n_annotators), where P_perf[n, m] refers to the estimated label correctness probability (performance) of annotator m when labeling sample X[n].
P_annot : np.ndarray of shape (n_samples, n_annotators, n_classes), where P_annot[n, m, c] refers to the probability that annotator m provides the class label c for sample X[n]. Only returned if “annotator_class” is included in extra_outputs.
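The relation between the logits L_class and the probabilities P can be sketched with a row-wise softmax. This assumes the standard multinomial logistic link and omits intercept handling; the weight matrix `W` below is hypothetical illustration data, not a fitted W_.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


X = np.array([[1.0, 2.0], [0.5, -1.0]])   # (n_samples, n_features)
W = np.array([[0.5, -0.2], [0.1, 0.3]])   # hypothetical, (n_features, n_classes)

L_class = X @ W       # logits, shape (n_samples, n_classes)
P = softmax(L_class)  # probabilities, each row sums to one
```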
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) with respect to y.
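For reference, the (optionally weighted) mean accuracy can be computed by hand as in the sketch below. The arrays are made-up illustration data, with `y_pred` standing in for the output of predict.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])  # stand-in for the output of predict
sample_weight = np.array([1.0, 1.0, 2.0, 1.0])

correct = y_pred == y_true
accuracy = correct.mean()
# With sample weights, each sample counts in proportion to its weight.
weighted_accuracy = np.average(correct, weights=sample_weight)
```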
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> AnnotatorLogisticRegression
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_predict_proba_request(*, extra_outputs: bool | None | str = '$UNCHANGED$') -> AnnotatorLogisticRegression
Configure whether metadata should be requested to be passed to the predict_proba method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- extra_outputs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the extra_outputs parameter in predict_proba.
- Returns:
- self : object
The updated object.
- set_predict_request(*, extra_outputs: bool | None | str = '$UNCHANGED$') -> AnnotatorLogisticRegression
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- extra_outputs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the extra_outputs parameter in predict.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> AnnotatorLogisticRegression
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.