File size: 5,760 Bytes
854c114
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# ML Practice Series: Module 23 - Model Explainability (SHAP)\n",
                "\n",
                "Welcome to the final \"Industry-Grade\" module! **Model Explainability** is about knowing *why* your model made a decision. This is critical for building trust, especially in sensitive areas like finance or medicine.\n",
                "\n",
                "### Objectives:\n",
                "1. **Global Interpretability**: Which features matter most across the whole dataset?\n",
                "2. **Local Interpretability**: Why was *this specific person* denied a loan?\n",
                "3. **SHAP values**: Game-theoretic approach to feature contribution.\n",
                "\n",
                "---"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Setup\n",
                "We will use a small Random Forest classifier on the **Breast Cancer** dataset."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "import numpy as np\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.ensemble import RandomForestClassifier\n",
                "from sklearn.model_selection import train_test_split\n",
                "\n",
                "# Note: You will need to install shap: pip install shap\n",
                "import shap\n",
                "\n",
                "# Load data\n",
                "data = load_breast_cancer()\n",
                "X = pd.DataFrame(data.data, columns=data.feature_names)\n",
                "y = data.target\n",
                "\n",
                "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
                "\n",
                "# Train a model\n",
                "model = RandomForestClassifier(n_estimators=100, random_state=42)\n",
                "model.fit(X_train, y_train)\n",
                "\n",
                "print(\"Model trained!\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Using SHAP (Global)\n",
                "\n",
                "### Task 1: Summary Plot\n",
                "Create a SHAP Tree Explainer and plot a summary of the feature importances. This is more detailed than standard feature importance as it shows the direction (positive/negative) of the impact."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "explainer = shap.TreeExplainer(model)\n",
                "shap_values = explainer.shap_values(X_test)\n",
                "\n",
                "# For binary classification, use [1] for the positive class\n",
                "shap.summary_plot(shap_values[1], X_test)\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. Local Performance\n",
                "\n",
                "### Task 2: Force Plot\n",
                "Pick the first person in the test set and explain the model's prediction for them specifically using a force plot."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "# Plot for the first record in the test set\n",
                "shap.initjs()\n",
                "shap.force_plot(explainer.expected_value[1], shap_values[1][0,:], X_test.iloc[0,:])\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "--- \n",
                "### The Ultimate Skill Unlocked! \n",
                "You can now explain black-box models to humans. This is the mark of a top-tier Data Scientist.\n",
                "You have completed all 23 modules of the master series!"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.12.7"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}