Spaces:

AashishAIHub
/

DataScience

Running

File size: 5,760 Bytes

854c114

{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# ML Practice Series: Module 23 - Model Explainability (SHAP)\n",
                "\n",
                "Welcome to the final \"Industry-Grade\" module! **Model Explainability** is about knowing *why* your model made a decision. This is critical for building trust, especially in sensitive areas like finance or medicine.\n",
                "\n",
                "### Objectives:\n",
                "1. **Global Interpretability**: Which features matter most across the whole dataset?\n",
                "2. **Local Interpretability**: Why was *this specific person* denied a loan?\n",
                "3. **SHAP values**: Game-theoretic approach to feature contribution.\n",
                "\n",
                "---"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Setup\n",
                "We will use a small Random Forest classifier on the **Breast Cancer** dataset."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "import numpy as np\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.ensemble import RandomForestClassifier\n",
                "from sklearn.model_selection import train_test_split\n",
                "\n",
                "# Note: You will need to install shap: pip install shap\n",
                "import shap\n",
                "\n",
                "# Load data\n",
                "data = load_breast_cancer()\n",
                "X = pd.DataFrame(data.data, columns=data.feature_names)\n",
                "y = data.target\n",
                "\n",
                "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
                "\n",
                "# Train a model\n",
                "model = RandomForestClassifier(n_estimators=100, random_state=42)\n",
                "model.fit(X_train, y_train)\n",
                "\n",
                "print(\"Model trained!\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Using SHAP (Global)\n",
                "\n",
                "### Task 1: Summary Plot\n",
                "Create a SHAP Tree Explainer and plot a summary of the feature importances. This is more detailed than standard feature importance as it shows the direction (positive/negative) of the impact."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "explainer = shap.TreeExplainer(model)\n",
                "shap_values = explainer.shap_values(X_test)\n",
                "\n",
                "# For binary classification, use [1] for the positive class\n",
                "shap.summary_plot(shap_values[1], X_test)\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. Local Performance\n",
                "\n",
                "### Task 2: Force Plot\n",
                "Pick the first person in the test set and explain the model's prediction for them specifically using a force plot."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "# Plot for the first record in the test set\n",
                "shap.initjs()\n",
                "shap.force_plot(explainer.expected_value[1], shap_values[1][0,:], X_test.iloc[0,:])\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "--- \n",
                "### The Ultimate Skill Unlocked! \n",
                "You can now explain black-box models to humans. This is the mark of a top-tier Data Scientist.\n",
                "You have completed all 23 modules of the master series!"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.12.7"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}