{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": true,
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Sample Annotating"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **_Google Colab Note:_** If the notebook fails to run after installing the needed packages, try to restart the runtime (Ctrl + M) under Runtime -> Restart session.\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scikit-activeml/scikit-activeml.github.io/blob/gh-pages/latest/generated/tutorials_colab//03_pool_oracle_annotations.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Notebook Dependencies**\n",
    "\n",
    "Uncomment the following cells to install all dependencies for this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install scikit-activeml\n",
    "# !pip install ipyannotations\n",
    "# !pip install superintendent\n",
    "# !jupyter nbextension install --user --py ipyannotations\n",
    "# !jupyter nbextension enable --user --py ipyannotations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr style=\"border-style: solid; border-top: 1px solid; border-right: 0; border-bottom: 0; border-left: 0;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In supervised and semi-supervised machine learning it is necessary to label data after it was selected by an active learning algorithm. This tutorial shows how to make a simple annotation tool using [ipyannotations](https://ipyannotations.readthedocs.io/en/latest/index.html) and [superintendent](https://superintendent.readthedocs.io/en/latest/index.html). This tutorial requires prior knowledge of our framework. If you are not familiar with it, try some basic [tutorials](https://scikit-activeml.github.io/latest/tutorials.html).\n",
    "\n",
    "> **_NOTE:_** For testing execute this notebook on your local machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import math\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.datasets import load_digits\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "from skactiveml.utils import is_labeled\n",
    "from skactiveml.classifier import SklearnClassifier\n",
    "from skactiveml.pool import UncertaintySampling\n",
    "\n",
    "from superintendent import Superintendent\n",
    "from ipywidgets import widgets\n",
    "from ipyannotations.images import ClassLabeller\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## The Annotation Widget Class\n",
    "\n",
    "At first we define the class `DataLabeler`, which inherits from `Superintendent`. To adapt it to our framework, we have to overwrite the constructor and the methods `_annotation_iterator`, `retrain`, and `_undo`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from skactiveml.utils import unlabeled_indices, call_func\n",
    "\n",
    "\n",
    "class DataLabeler(Superintendent):\n",
    "    \"\"\"DataLabeler\n",
    "\n",
    "    This class creates a widget for label assignments.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    X : array-like of shape (n_samples, *)\n",
    "        Training data set, usually complete, i.e. including the labeled and\n",
    "        unlabeled samples.\n",
    "    y : array-like of shape (n_samples,)\n",
    "        Labels of the training data set (possibly including unlabeled ones\n",
    "        indicated by self.MISSING_LABEL).\n",
    "    clf : skactiveml.base.SkactivemlClassifier\n",
    "        Model implementing the method `fit`. Check `query_strategy` for\n",
    "        compatibility.\n",
    "    query_strategy : skactiveml.base.QueryStrategy\n",
    "        Query strategy used to select the next sample(s) to be labeled.\n",
    "    labelling_widget : Optional (widgets.Widget)\n",
    "        An input widget. This needs to follow the interface of the class\n",
    "        `superintendent.controls.base.SubmissionWidgetMixin`.\n",
    "    query_dict : dict, default=None\n",
    "        A dictionary with additional arguments past to `query_strategy`.\n",
    "    shape_query : Tuple, default=None\n",
    "        The shape of `X` that is expected of `query_strategy`.\n",
    "    shape_clf : tuple, default=None\n",
    "        The shape of `X` that is expected of `clf.fit`.\n",
    "    batch_size : int, default=1\n",
    "        The number of samples to be selected in one AL cycle.\n",
    "    n_cycles : int, default=None\n",
    "        `n_cycles`*`batch_size` is the maximum number of samples you want to\n",
    "        annotate. If `None`, the entire dataset is requested for labeling.\n",
    "    X_eval : array-like of shape (n_eval_samples, n_features), default=None\n",
    "        Evaluation data set that is used by the `eval_method`. Only used if\n",
    "        `y_eval` is specified.\n",
    "    y_eval : array-like of shape (n_eval_samples), default=None\n",
    "        Labels for the evaluation data set. Only used if `X_eval` is\n",
    "        specified.\n",
    "    clf_eval : skactiveml.base.SkactivemlClassifier\n",
    "        Model implementing the method `fit`, passed to the `eval_method`.\n",
    "        If None, `clf` is used.\n",
    "    eval_method : callable\n",
    "        A function that accepts three arguments - `clf`, `X`, and `y` - and\n",
    "        returns a validation score of the `clf`. If None,\n",
    "        `sklearn.model_selection.cross_val_score` is used.\n",
    "        \"\"\"\n",
    "    def __init__(\n",
    "            self,\n",
    "            X,\n",
    "            y,\n",
    "            clf,\n",
    "            query_strategy,\n",
    "            labelling_widget,\n",
    "            query_dict=None,\n",
    "            shape_query=None,\n",
    "            shape_clf=None,\n",
    "            batch_size=1,\n",
    "            n_cycles=None,\n",
    "            X_eval=None,\n",
    "            y_eval=None,\n",
    "            clf_eval=None,\n",
    "            eval_method=None,\n",
    "            **kwargs,\n",
    "    ):\n",
    "        # Call the super constructor.\n",
    "        try:\n",
    "            super().__init__(\n",
    "                labelling_widget=labelling_widget,\n",
    "                eval_method=eval_method,\n",
    "                **kwargs\n",
    "            )\n",
    "        except AttributeError:\n",
    "            pass\n",
    "\n",
    "        # Assign parameters.\n",
    "        self.X = X\n",
    "        self.y = y\n",
    "        self.clf = clf\n",
    "        self.shape_query = shape_query\n",
    "        self.shape_clf = shape_clf\n",
    "        self.X_eval = X_eval\n",
    "        self.y_eval = y_eval\n",
    "        self.clf_eval = clf_eval or clf\n",
    "        self.query_dict = query_dict or {}\n",
    "        self.batch_size = batch_size\n",
    "        self.query_strategy = query_strategy\n",
    "        self.n_cycles = n_cycles or math.ceil(len(X)/batch_size)\n",
    "\n",
    "        self.labeled_indices = []\n",
    "        self.labels = []\n",
    "        self.candidates = unlabeled_indices(y)\n",
    "\n",
    "        # Generate the widgets.\n",
    "        self.model_performance = widgets.HTML(value=\"\")\n",
    "        self.top_bar = widgets.HBox(\n",
    "            [\n",
    "                widgets.HBox(\n",
    "                    [self.progressbar],\n",
    "                    layout=widgets.Layout(width=\"50%\",\n",
    "                                          justify_content=\"space-between\"),\n",
    "                ),\n",
    "                widgets.HBox(\n",
    "                    [self.model_performance],\n",
    "                    layout=widgets.Layout(width=\"50%\"),\n",
    "                ),\n",
    "            ]\n",
    "        )\n",
    "        self.children = [self.top_bar, self.labelling_widget]\n",
    "\n",
    "        # Start the annotation loop.\n",
    "        self._begin_annotation()\n",
    "\n",
    "    def _annotation_iterator(self):\n",
    "        \"\"\"The annotation loop.\"\"\"\n",
    "        self.children = [self.top_bar, self.labelling_widget]\n",
    "        self.progressbar.bar_style = \"\"\n",
    "        # Fit the clf\n",
    "        self.retrain()\n",
    "        i = 0\n",
    "        y = None\n",
    "        while i < self.n_cycles:\n",
    "            # Query the next batch of samples.\n",
    "            self.query_dict[\"X\"] = self.X.reshape(self.shape_query)\n",
    "            idx = call_func(self.query_strategy.query,\n",
    "                            y=self.y,\n",
    "                            clf=self.clf,\n",
    "                            reg=self.clf,\n",
    "                            ensemble=self.clf,\n",
    "                            candidates=self.candidates,\n",
    "                            batch_size=self.batch_size,\n",
    "                            **self.query_dict)\n",
    "            j = 0\n",
    "            if y == 'undo':\n",
    "                j = self.batch_size-1\n",
    "                if self.batch_size != 1:\n",
    "                    self.y[self.labeled_indices[-j:]] = self.labels[-j:]\n",
    "                    self.candidates = np.delete(\n",
    "                        self.candidates,\n",
    "                        self.labeled_indices[-j:]\n",
    "                    )\n",
    "            while j<len(idx):\n",
    "                # Display and label the next sample.\n",
    "                with self._render_hold_message(\"Loading...\"):\n",
    "                    self.labelling_widget.display(self.X[idx[j]])\n",
    "                y = yield\n",
    "                if y is None:\n",
    "                    y = self.clf.missing_label\n",
    "                if y == 'undo':\n",
    "                    # If the undo button is press, the previous step is undone.\n",
    "                    if j==0:\n",
    "                        i -= 2\n",
    "                        self.y[self.labeled_indices[-self.batch_size:]] = \\\n",
    "                            self.clf.missing_label\n",
    "                        self.candidates = np.sort(np.append(\n",
    "                            self.candidates,\n",
    "                            self.labeled_indices[-self.batch_size:]\n",
    "                        ))\n",
    "                        self.labels.pop()\n",
    "                        self.labeled_indices.pop()\n",
    "                        self.progressbar.value = \\\n",
    "                            np.sum(is_labeled(self.y))\\\n",
    "                            /(min(self.n_cycles*self.batch_size, len(self.X)))\n",
    "                        break\n",
    "                    self.y[self.labeled_indices[-1]] = self.clf.missing_label\n",
    "                    self.labels.pop()\n",
    "                    self.candidates = np.append(\n",
    "                        self.candidates, self.labeled_indices.pop()\n",
    "                    )\n",
    "                    j -= 2\n",
    "                else:\n",
    "                    # Assigning the label.\n",
    "                    self.labeled_indices.append(idx[j])\n",
    "                    self.labels.append(y)\n",
    "                    self.y[idx[j]] = y\n",
    "                    self.candidates = self.candidates[self.candidates!=idx[j]]\n",
    "\n",
    "                self.progressbar.value = \\\n",
    "                    ((j+i*self.batch_size)\n",
    "                       /(min(self.n_cycles*self.batch_size, len(self.X))))\n",
    "                j += 1\n",
    "            # Fit the clf.\n",
    "            self.retrain()\n",
    "            # Brake if all samples are labeled.\n",
    "            if len(self.candidates) == 0:\n",
    "                break\n",
    "            i += 1\n",
    "\n",
    "        yield self._render_finished()\n",
    "\n",
    "    def _undo(self):\n",
    "        if len(self.labeled_indices) > 0:\n",
    "            self._annotation_loop.send('undo')  # Advance next item\n",
    "\n",
    "\n",
    "    def retrain(self, button=None):\n",
    "        \"\"\"Re-train the `clf` you passed when creating this widget.\n",
    "\n",
    "        This calls the `fit` method of your `clf` with the data that you've\n",
    "        labeled. It will also score the classifier and display the\n",
    "        performance.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        button : widget.Widget, optional\n",
    "            Optional & ignored; this is passed when invoked by a button.\n",
    "        \"\"\"\n",
    "        with self._render_hold_message(\"Retraining...\"):\n",
    "            if self.X_eval is not None:\n",
    "                X_eval = self.X_eval\n",
    "                y_eval = self.y_eval\n",
    "            else:\n",
    "                X_eval = self.X[is_labeled(self.y)]\n",
    "                y_eval = self.y[is_labeled(self.y)]\n",
    "            shape_clf = (len(X_eval), *self.shape_clf[1:])\n",
    "\n",
    "            # Fit the clf.\n",
    "            try:\n",
    "                self.clf.fit(self.X.reshape(self.shape_clf), self.y)\n",
    "            except ValueError as e:\n",
    "                if str(e).startswith(\n",
    "                        \"This solver needs samples of at least 2\"\n",
    "                ):\n",
    "                    self.model_performance.value = \\\n",
    "                        \"Not enough classes to retrain.\"\n",
    "                    return\n",
    "                else:\n",
    "                    raise\n",
    "\n",
    "            # Evaluate the clf. By default, using cross validation.\n",
    "            # In sklearn this clones the clf, so it's OK to do after the clf\n",
    "            # fit.\n",
    "            try:\n",
    "                if self.eval_method is not None:\n",
    "                    performance = np.mean(\n",
    "                        self.eval_method(\n",
    "                            self.clf_eval,\n",
    "                            X_eval.reshape(shape_clf),\n",
    "                            y_eval\n",
    "                        )\n",
    "                    )\n",
    "                else:\n",
    "                    performance = np.mean(\n",
    "                        cross_val_score(\n",
    "                            self.clf_eval,\n",
    "                            X_eval.reshape(shape_clf),\n",
    "                            y_eval,\n",
    "                            cv=3,\n",
    "                            error_score=np.nan\n",
    "                        )\n",
    "                    )\n",
    "            except ValueError as e:\n",
    "                if \"n_splits=\" in str(e) \\\n",
    "                        or \"Found array with 0 sample(s)\" in str(e) \\\n",
    "                        or \"cannot reshape array of size 0\" in str(e):\n",
    "                    self.model_performance.value = \\\n",
    "                        \"Not enough labels to evaluate.\"\n",
    "                    return\n",
    "                else:\n",
    "                    raise\n",
    "\n",
    "            self.model_performance.value = f\"Score: {performance:.3f}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Create Dataset\n",
    "For this tutorial we use the digit data set available through the `sklearn` package. The 8x8 images show handwritten digits from 0 to 9."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = load_digits().data.reshape(-1, 8, 8)\n",
    "y = np.full(shape=len(X), fill_value=np.nan)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Create and Start Annotation Process\n",
    "\n",
    "As classifier, `MLPClassifier` and `StandardScaler` by `sklearn` is used in a pipeline and `UncertaintySampling` from our framework `Skactiveml` as query strategy. `ClassLabeller` creates a ipywidget, which displays the selected sample and provides the labelling interface for the user. This class can be exchanged by [other widgets](https://ipyannotations.readthedocs.io/en/latest/widget-list.html) to support different types of data. The `DataLabeler`-widget manages the iteration over the data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "random_state = 42\n",
    "pipe = Pipeline([('scaler', StandardScaler()), ('MLP', MLPClassifier(random_state=random_state))])\n",
    "clf = SklearnClassifier(pipe, classes=range(10), random_state=random_state)\n",
    "\n",
    "qs = UncertaintySampling(random_state=random_state)\n",
    "\n",
    "labelling_widget = ClassLabeller(\n",
    "    options=list(range(0, 10)), image_size=(100, 100)\n",
    ")\n",
    "\n",
    "data_labeler = DataLabeler(\n",
    "    X=X,\n",
    "    y=y,\n",
    "    clf=clf,\n",
    "    shape_query=(len(X), -1),\n",
    "    shape_clf=(len(X), -1),\n",
    "    query_strategy=qs,\n",
    "    labelling_widget=labelling_widget,\n",
    "    batch_size=2,\n",
    "    n_cycles=50\n",
    ")\n",
    "data_labeler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The cell above produces an output which looks like the following image.\n",
    "\n",
    "![Annotating widget](https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/03_pool_oracle_annotations.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "skaml",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}