skactiveml.classifier.multiannotator.AnnotatorLogisticRegression

class skactiveml.classifier.multiannotator.AnnotatorLogisticRegression(tol=0.01, max_iter=100, fit_intercept=True, annot_prior_full=1, annot_prior_diag=0, weights_prior=1, solver='Newton-CG', solver_dict=None, classes=None, cost_matrix=None, missing_label=nan, random_state=None)

Bases: SkactivemlClassifier, AnnotatorModelMixin

Logistic regression based on Raykar et al. [1] is a classification algorithm that learns from multiple annotators. Besides building a model for the classification task, the algorithm estimates the performance of the annotators. The performance of an annotator is assumed to depend only on the true label of a sample and not on the sample itself. Each annotator is assigned a confusion matrix whose rows are normalized. This matrix captures the bias of the annotator's decisions. The estimated biases are then used to refine the classifier itself.

The classifier also supports a Bayesian view of the problem: a prior distribution over each annotator's confusion matrix is assumed, together with a prior distribution over the classifier's weight vectors, which corresponds to a regularization.

Parameters
tol : float, default=1e-2

Threshold for stopping the EM algorithm. If the change of the expectation value between two steps is smaller than tol, fitting stops.

max_iter : int, default=100

The maximum number of iterations of the EM-algorithm to be performed.
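The interplay of tol and max_iter can be sketched as a generic stopping rule. This is a minimal illustration, not the library's implementation: the helper run_until_converged and the callable step are hypothetical, and comparing the absolute change against tol is an assumption.

```python
def run_until_converged(step, tol=1e-2, max_iter=100):
    """Call `step` until its return value changes by less than `tol`.

    `step` is a hypothetical callable that performs one EM iteration and
    returns the current expectation value as a float.
    """
    previous = float("-inf")
    for n_iter in range(1, max_iter + 1):
        current = step()
        # Stop early once the change between two steps falls below tol.
        if abs(current - previous) < tol:
            break
        previous = current
    return current, n_iter


# Toy stand-in for an EM step: values approaching 1.0 geometrically.
values = iter(1.0 - 0.5 ** k for k in range(1, 50))
final_value, n_iter = run_until_converged(lambda: next(values))
```

With a smaller max_iter the loop ends after that many steps even if the change is still larger than tol.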

fit_intercept : bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to input samples.

annot_prior_full : int or float or array-like, default=1

This parameter determines A as the Dirichlet prior for each annotator l (i.e., A[l] = annot_prior_full * np.ones((n_classes, n_classes)) for a numeric value or A[l] = annot_prior_full[l] * np.ones((n_classes, n_classes)) for an array-like value). A[l,i,j] is the estimated number of times annotator l has provided label j for an instance of true label i.

annot_prior_diag : int or float or array-like, default=0

This parameter adds a value to the diagonal of A[l], the Dirichlet prior for annotator l (i.e., A[l] += annot_prior_diag * np.eye(n_classes) for a numeric value or A[l] += annot_prior_diag[l] * np.eye(n_classes) for an array-like value). A[l,i,j] is the estimated number of times annotator l has provided label j for an instance of true label i.
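The two prior parameters combine into one pseudo-count matrix per annotator. The following sketch builds A for both the numeric and the array-like case. The variable names and example values are illustrative, and it assumes that the array-like form of annot_prior_diag also affects only the diagonal.

```python
import numpy as np

n_annotators, n_classes = 3, 2

# Numeric case: every annotator receives the same prior.
annot_prior_full, annot_prior_diag = 1, 2
A = np.empty((n_annotators, n_classes, n_classes))
for l in range(n_annotators):
    A[l] = annot_prior_full * np.ones((n_classes, n_classes))
    A[l] += annot_prior_diag * np.eye(n_classes)

# Array-like case: one value per annotator (illustrative values).
full_per_annot = [1.0, 1.0, 0.5]
diag_per_annot = [0.0, 4.0, 1.0]
A_per_annot = np.stack([
    full_per_annot[l] * np.ones((n_classes, n_classes))
    + diag_per_annot[l] * np.eye(n_classes)
    for l in range(n_annotators)
])
```

A larger diagonal value encodes the prior belief that an annotator tends to provide the true label.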

weights_prior : int or float, default=1

Determines Gamma as the inverse covariance matrix of the prior distribution for every weight vector (i.e., Gamma = weights_prior * np.eye(n_features)). By default, the identity matrix is used for each weight vector.
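To see why this prior acts as a regularizer, consider a zero-mean Gaussian prior with inverse covariance Gamma: its negative log-density is, up to a constant, a quadratic penalty on each weight vector. The sketch below computes that penalty for an illustrative weight matrix W. The zero-mean assumption and the example values are not taken from the library.

```python
import numpy as np

n_features, n_classes = 3, 2
weights_prior = 1
Gamma = weights_prior * np.eye(n_features)

# Illustrative weight matrix with one column per class.
W = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, -1.0]])

# Sum over classes c of 0.5 * w_c^T Gamma w_c.
penalty = 0.5 * np.einsum("fc,fg,gc->", W, Gamma, W)
```

With the default identity matrix this reduces to half the squared L2 norm of the weights.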

solver : str or callable, default=’Newton-CG’

Type of solver. Should be ‘Nelder-Mead’, ‘Powell’, ‘CG’, ‘BFGS’, ‘Newton-CG’, ‘L-BFGS-B’, ‘TNC’, ‘COBYLA’, ‘SLSQP’, ‘trust-constr’, ‘dogleg’, ‘trust-ncg’, ‘trust-exact’, ‘trust-krylov’, or custom - a callable object. See scipy.optimize.minimize for more information.

solver_dict : dict, default=None

Additional solver options passed to scipy.optimize.minimize. If None, {‘maxiter’: 5} is passed.
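As a rough illustration of how solver and solver_dict map onto scipy.optimize.minimize, the sketch below minimizes a toy quadratic. The functions objective and gradient are hypothetical stand-ins for the function the classifier optimizes internally, and passing solver_dict as the options argument is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

target = np.array([1.0, -2.0])


def objective(w):
    # Toy objective with its minimum at `target`.
    return 0.5 * np.sum((w - target) ** 2)


def gradient(w):
    # Newton-CG requires the gradient of the objective.
    return w - target


solver = "Newton-CG"
solver_dict = None
# Fall back to the documented default options if none are given.
options = {"maxiter": 5} if solver_dict is None else solver_dict

result = minimize(objective, x0=np.zeros(2), method=solver,
                  jac=gradient, options=options)
```

The small default of maxiter limits the work done per call, so result.success may be False even when result.x is already close to the optimum.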

classes : array-like of shape (n_classes,), default=None

Holds the label for each class. If None, the classes are determined during the fit.

missing_label : scalar or string or np.nan or None, default=np.nan

Value to represent a missing label.

cost_matrix : array-like of shape (n_classes, n_classes), default=None

Cost matrix with cost_matrix[i,j] indicating the cost of predicting class classes[j] for a sample of class classes[i]. Can only be set if classes is not None.
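One common way to use such a cost matrix is to predict the class with the lowest expected cost. The sketch below illustrates this decision rule on made-up probabilities. The helper min_expected_cost is hypothetical, and whether the classifier applies exactly this rule is an assumption.

```python
import numpy as np

# Hypothetical class-membership probabilities: 2 samples, 3 classes.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3]])

# cost_matrix[i, j]: cost of predicting class j when the true class is i.
zero_one_cost = 1 - np.eye(3)
skewed_cost = np.array([[0.0, 1.0, 1.0],
                        [5.0, 0.0, 1.0],
                        [5.0, 1.0, 0.0]])


def min_expected_cost(P, cost_matrix):
    # expected_cost[n, j] = sum_i P[n, i] * cost_matrix[i, j]
    expected_cost = P @ cost_matrix
    return expected_cost.argmin(axis=1)


pred_zero_one = min_expected_cost(P, zero_one_cost)
pred_skewed = min_expected_cost(P, skewed_cost)
```

With the zero-one cost matrix the rule reduces to picking the most probable class, whereas the skewed matrix makes predicting class 0 expensive whenever another class is plausible.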

random_state : int or RandomState instance or None, default=None

Determines random number generation for the predict method. Pass an int for reproducible results across multiple method calls.

References

[1] Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., & Moy, L. (2010). Learning from crowds. Journal of Machine Learning Research, 11(4).

Attributes
n_annotators_ : int

Number of annotators.

W_ : numpy.ndarray of shape (n_features, n_classes)

The weight vectors of the logistic regression model.

Alpha_ : numpy.ndarray of shape (n_annotators, n_classes, n_classes)

This is a confusion matrix for each annotator, where each row is normalized. Alpha_[l,k,c] describes the probability that annotator l provides the class label c for a sample belonging to class k.
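Under this model, the distribution of the label an annotator provides follows from mixing the rows of the confusion matrix with the class-membership probabilities of a sample. The arrays below are made-up values standing in for a fitted Alpha_ and for the output of predict_proba.

```python
import numpy as np

# Made-up confusion matrices for 2 annotators and 2 classes.
Alpha = np.array([
    [[0.9, 0.1],   # annotator 0: mostly correct
     [0.2, 0.8]],
    [[0.5, 0.5],   # annotator 1: guesses at random
     [0.5, 0.5]],
])
# Each row of each confusion matrix is a probability distribution.
assert np.allclose(Alpha.sum(axis=2), 1.0)

# Made-up class-membership probabilities, shape (n_samples, n_classes).
P_class = np.array([[1.0, 0.0],
                    [0.25, 0.75]])

# P_label[i, l, c] = sum_k P_class[i, k] * Alpha[l, k, c]
P_label = np.einsum("ik,lkc->ilc", P_class, Alpha)
```

P_label[i, l] is then the distribution over the labels annotator l would provide for sample i.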

classes_ : array-like of shape (n_classes,)

Holds the label for each class after fitting.

cost_matrix_ : array-like of shape (n_classes, n_classes)

Cost matrix with cost_matrix_[i,j] indicating the cost of predicting class classes_[j] for a sample of class classes_[i].

Methods

fit(X, y[, sample_weight])

Fit the model using X as training data and y as class labels.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Return class label predictions for the test samples X.

predict_annotator_perf(X)

Calculates the probability that an annotator provides the true label for a given sample.

predict_proba(X)

Return probability estimates for the test data X.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

fit(X, y, sample_weight=None)

Fit the model using X as training data and y as class labels.

Parameters
X : matrix-like, shape (n_samples, n_features)

The sample matrix X is the feature matrix representing the samples.

y : array-like, shape (n_samples,) or (n_samples, n_outputs)

It contains the class labels of the training samples. The number of class labels may vary between samples, where missing labels are represented by the attribute ‘missing_label’.
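A label matrix with a varying number of labels per sample might look as follows. The layout with one column per annotator is an assumption for this multi-annotator setting, and the weights are placeholders.

```python
import numpy as np

# Rows are samples, columns are annotators (assumed layout).
# np.nan marks a label that the annotator did not provide.
y = np.array([
    [0, 0, np.nan],
    [1, np.nan, np.nan],
    [np.nan, 1, 0],
])

# Valid because the missing label is np.nan in this example.
is_labeled = ~np.isnan(y)
labels_per_sample = is_labeled.sum(axis=1)
labels_per_annotator = is_labeled.sum(axis=0)

# sample_weight must have the same shape as y (placeholder weights).
sample_weight = np.where(is_labeled, 1.0, 0.0)
```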

sample_weight : array-like, shape (n_samples,) or (n_samples, n_outputs)

It contains the weights of the training samples’ class labels. It must have the same shape as y.

Returns
self : AnnotatorLogisticRegression

The AnnotatorLogisticRegression is fitted on the training data.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

predict(X)

Return class label predictions for the test samples X.

Parameters
X : array-like of shape (n_samples, n_features)

Input samples.

Returns
y : numpy.ndarray of shape (n_samples,)

Predicted class labels of the test samples X. Classes are ordered according to classes_.

predict_annotator_perf(X)

Calculates the probability that an annotator provides the true label for a given sample. Here, the true label is given by the classification model. The label provided by an annotator l is modeled by that annotator's confusion matrix (i.e., attribute Alpha_[l]).

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

Returns
P_annot : numpy.ndarray of shape (n_samples, n_annotators)

P_annot[i,l] is the probability that annotator l provides the correct class label for sample X[i].
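One way to obtain such a performance estimate is to weight the diagonal of each confusion matrix by the predicted class-membership probabilities. The sketch below uses made-up arrays in place of Alpha_ and of the output of predict_proba, and it is an assumption that the method computes exactly this quantity.

```python
import numpy as np

# Made-up confusion matrices for 2 annotators and 2 classes.
Alpha = np.array([
    [[0.9, 0.1],
     [0.2, 0.8]],
    [[0.5, 0.5],
     [0.5, 0.5]],
])

# Made-up class-membership probabilities, shape (n_samples, n_classes).
P_class = np.array([[1.0, 0.0],
                    [0.25, 0.75]])

# diag_Alpha[l, k]: probability that annotator l is correct for true class k.
diag_Alpha = np.diagonal(Alpha, axis1=1, axis2=2)

# P_annot[i, l] = sum_k P_class[i, k] * Alpha[l, k, k]
P_annot = P_class @ diag_Alpha.T
```

The result has one row per sample and one column per annotator.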

predict_proba(X)

Return probability estimates for the test data X.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

Returns
P : numpy.ndarray of shape (n_samples, n_classes)

The class probabilities of the test samples. Classes are ordered according to classes_.
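For a multinomial logistic regression model, class probabilities are typically a softmax over linear scores. The sketch below shows that computation with a hypothetical helper softmax_proba. Appending the constant column as the last feature when fit_intercept is True is an assumption, as is the exact form used by the library.

```python
import numpy as np


def softmax_proba(X, W, fit_intercept=True):
    """Softmax over linear scores; W has one column per class."""
    X = np.asarray(X, dtype=float)
    if fit_intercept:
        # Add a constant feature so that the last row of W acts as the bias.
        X = np.hstack([X, np.ones((X.shape[0], 1))])
    scores = X @ W
    # Subtract the row-wise maximum for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


X = np.array([[0.0, 0.0],
              [2.0, -1.0]])
# Illustrative weights: two input features plus one intercept row.
W = np.array([[1.0, -1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
P = softmax_proba(X, W)
```

Each row of P sums to one, and its columns follow the order of the weight columns.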

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

Mean accuracy of self.predict(X) regarding y.
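Mean accuracy, with and without sample weights, can be reproduced by hand as follows. The arrays are made-up values standing in for the true labels and for the output of predict.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])   # stand-in for self.predict(X)

is_correct = y_pred == y_true

# Unweighted mean accuracy: fraction of correct predictions.
accuracy = np.mean(is_correct)

# Weighted mean accuracy: each sample counts according to its weight.
sample_weight = np.array([1.0, 1.0, 2.0, 1.0])
weighted_accuracy = np.average(is_correct, weights=sample_weight)
```

Here the misclassified sample carries twice the weight of the others, which lowers the weighted accuracy below the unweighted one.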

set_fit_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') -> AnnotatorLogisticRegression

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

set_score_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') -> AnnotatorLogisticRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns
self : object

The updated object.