
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated/sphinx_gallery_examples/1-pool-classification/plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_generated_sphinx_gallery_examples_1-pool-classification_plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_sphinx_gallery_examples_1-pool-classification_plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.py:


Query-by-Committee (QBC) with Kullback-Leibler Divergence
=========================================================

.. GENERATED FROM PYTHON SOURCE LINES 7-8

**Idea:** QBC maintains a committee of models and selects the unlabeled
samples on which the committee disagrees most, targeting epistemic
uncertainty. In batch mode, it ranks points by a disagreement score and takes
the top ``batch_size`` samples. With KL-divergence disagreement
(classification), we compute, for each model, the Kullback–Leibler divergence
between its predictive distribution and the committee's average distribution,
and then average these divergences across models. Larger values indicate
stronger distributional disagreement.

.. GENERATED FROM PYTHON SOURCE LINES 10-20

| **Google Colab Note**: If the notebook fails to run after installing the
  needed packages, try restarting the runtime (Ctrl + M) under
  Runtime -> Restart session.

.. image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/scikit-activeml/scikit-activeml.github.io/blob/gh-pages/latest/generated/sphinx_gallery_notebooks//1-pool-classification/plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.ipynb

| **Notebook Dependencies**
| Uncomment the following cell to install all dependencies for this
  tutorial.

.. GENERATED FROM PYTHON SOURCE LINES 20-23

.. code-block:: Python


    # !pip install scikit-activeml


.. GENERATED FROM PYTHON SOURCE LINES 24-127

.. code-block:: Python


    import numpy as np
    from matplotlib import pyplot as plt, animation
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split

    from skactiveml.utils import MISSING_LABEL, labeled_indices
    from skactiveml.visualization import plot_utilities, plot_decision_boundary

    from skactiveml.classifier import ParzenWindowClassifier
    from sklearn.ensemble import BaggingClassifier
    from skactiveml.pool import QueryByCommittee

    random_state = np.random.RandomState(0)

    # Build a dataset.
    X_true, y_clusters = make_blobs(
        n_samples=400,
        n_features=2,
        centers=[[0, 1], [-3, 0.5], [-1, -1], [2, 1], [1, -0.5]],
        cluster_std=0.7,
        random_state=random_state,
    )
    y_true = y_clusters % 2
    X_pool, X_test, y_pool, y_test = train_test_split(
        X_true, y_true, test_size=0.25, random_state=random_state
    )
    X = X_pool
    y = np.full(shape=y_pool.shape, fill_value=MISSING_LABEL)

    # Initialise the classifier.
    clf = ParzenWindowClassifier(classes=np.unique(y_true), class_prior=0.1)
    # Initialise the query strategy.
    qs = QueryByCommittee(
        method='KL_divergence',
        sample_predictions_method_name='sample_proba',
        sample_predictions_dict={'n_samples': 100},
    )

    # Preparation for plotting.
    fig, ax = plt.subplots()
    feature_bound = [
        [min(X[:, 0]), min(X[:, 1])],
        [max(X[:, 0]), max(X[:, 1])]
    ]
    artists = []

    # Active learning cycle:
    n_cycles = 20
    for c in range(n_cycles):
        # Fit the classifier with current labels.
        clf.fit(X, y)

        # Query the next sample(s).
        query_idx = qs.query(X=X, y=y, ensemble=clf)

        # Capture the current plot state.
        coll_old = list(ax.collections)
        title = ax.text(
            0.5, 1.05,
            f"Decision boundary after acquiring {c} labels\n"
            f"Test Accuracy: {clf.score(X_test, y_test):.4f}",
            size=plt.rcParams["axes.titlesize"],
            ha="center",
            transform=ax.transAxes,
        )

        # Update plot with utility values, samples, and decision boundary.
        X_labeled = X[labeled_indices(y)]
        ax = plot_utilities(
            qs,
            X=X,
            y=y,
            ensemble=clf,
            candidates=None,
            res=25,
            feature_bound=feature_bound,
            ax=ax,
        )
        ax.scatter(
            X[:, 0], X[:, 1], c=y_pool, cmap="coolwarm", marker=".", zorder=2
        )
        ax.scatter(
            X_labeled[:, 0], X_labeled[:, 1], c="grey", alpha=0.8,
            marker=".", s=300,
        )
        ax = plot_decision_boundary(clf, feature_bound, ax=ax)
        ax.set_xlabel('Feature 1')
        ax.set_ylabel('Feature 2')

        coll_new = list(ax.collections)
        coll_new.append(title)
        artists.append([x for x in coll_new if x not in coll_old])

        # Update labels based on query.
        y[query_idx] = y_pool[query_idx]

    ani = animation.ArtistAnimation(fig, artists, interval=1000, blit=True)


.. container:: sphx-glr-animation

    .. raw:: html
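
The KL-divergence disagreement score and the batch selection described in the
idea paragraph at the top of this page can be illustrated with plain NumPy.
The following is a minimal sketch, not the code used inside
``QueryByCommittee``: the helper names ``kl_disagreement`` and ``top_batch``,
the ``eps`` clipping constant, and the toy committee predictions are
assumptions made for illustration only.

.. code-block:: Python

    import numpy as np


    def kl_disagreement(probas, eps=1e-12):
        """Mean KL divergence of each member from the committee average.

        probas has shape (n_members, n_samples, n_classes); the result has
        shape (n_samples,). The name and signature are hypothetical.
        """
        # Clip to avoid log(0); eps is an illustrative choice.
        probas = np.clip(np.asarray(probas, dtype=float), eps, 1.0)
        # Committee average (consensus) distribution per sample.
        consensus = probas.mean(axis=0, keepdims=True)
        # KL(member || consensus) per member and sample.
        kl = np.sum(probas * np.log(probas / consensus), axis=-1)
        # Average the divergences across committee members.
        return kl.mean(axis=0)


    def top_batch(scores, batch_size):
        """Indices of the batch_size highest scores (hypothetical helper)."""
        return np.argsort(-scores, kind="stable")[:batch_size]


    # Toy predictions of a two-member committee for three samples.
    committee_probas = np.array([
        [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]],
        [[0.5, 0.5], [0.1, 0.9], [0.3, 0.7]],
    ])
    scores = kl_disagreement(committee_probas)
    query_idx = top_batch(scores, batch_size=2)

In this toy data, both members agree exactly on the first sample, so its score
is zero. They contradict each other on the second sample, which therefore
receives the largest score and is ranked first.
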

.. GENERATED FROM PYTHON SOURCE LINES 128-129

.. image:: ../../examples/pool_classification_legend.png

.. GENERATED FROM PYTHON SOURCE LINES 131-136

.. rubric:: References:

The implementation of this strategy is based on :footcite:t:`seung1992query`
and :footcite:t:`mccallum1998employing`.

.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.112 seconds)


.. _sphx_glr_download_generated_sphinx_gallery_examples_1-pool-classification_plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.ipynb <plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.py <plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.zip <plot-QueryByCommittee-Query-by-Committee_(QBC)_with_Kullback-Leibler_Divergence.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_