{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 23 - Model Explainability (SHAP)\n", "\n", "Welcome to the final \"Industry-Grade\" module! **Model Explainability** is about knowing *why* your model made a decision. This is critical for building trust, especially in sensitive areas like finance or medicine.\n", "\n", "### Objectives:\n", "1. **Global Interpretability**: Which features matter most across the whole dataset?\n", "2. **Local Interpretability**: Why was *this specific person* denied a loan?\n", "3. **SHAP values**: Game-theoretic approach to feature contribution.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup\n", "We will use a small Random Forest classifier on the **Breast Cancer** dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.datasets import load_breast_cancer\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Note: You will need to install shap: pip install shap\n", "import shap\n", "\n", "# Load data\n", "data = load_breast_cancer()\n", "X = pd.DataFrame(data.data, columns=data.feature_names)\n", "y = data.target\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "\n", "# Train a model\n", "model = RandomForestClassifier(n_estimators=100, random_state=42)\n", "model.fit(X_train, y_train)\n", "\n", "print(\"Model trained!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Using SHAP (Global)\n", "\n", "### Task 1: Summary Plot\n", "Create a SHAP Tree Explainer and plot a summary of the feature importances. This is more detailed than standard feature importance as it shows the direction (positive/negative) of the impact." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
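\n", "Click to see Solution\n", "\n", "```python\n", "explainer = shap.TreeExplainer(model)\n", "shap_values = explainer.shap_values(X_test)\n", "\n", "# Select the values for class 1 (benign in this dataset).\n", "# Older shap releases return a list with one array per class; newer ones\n", "# return a single array of shape (n_samples, n_features, n_classes).\n", "class1_values = shap_values[1] if isinstance(shap_values, list) else shap_values[:, :, 1]\n", "shap.summary_plot(class1_values, X_test)\n", "```\n", "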
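
The points in a summary plot are Shapley values, a concept from cooperative game theory. As a rough sketch of the idea (not how `TreeExplainer` computes them internally), the snippet below solves a tiny made-up model exactly by enumerating every feature coalition, then ranks features by mean absolute contribution. The names `toy_model` and `shapley_values`, the three rows, and the all-zero baseline are hypothetical choices for illustration only.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(model, x, baseline):
    # Exact Shapley values for a single row. Features outside the
    # coalition are replaced by their baseline value (an assumption).
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Classic Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


# Made-up rows and baseline, for illustration only
rows = [[1.0, 2.0, 3.0], [0.0, 1.0, 1.0], [2.0, 0.0, 4.0]]
baseline = [0.0, 0.0, 0.0]

all_phi = [shapley_values(toy_model, row, baseline) for row in rows]

# Global importance: mean absolute Shapley value per feature
n_features = len(baseline)
global_importance = [
    sum(abs(phi[i]) for phi in all_phi) / len(all_phi)
    for i in range(n_features)
]
ranking = sorted(range(n_features), key=lambda i: global_importance[i], reverse=True)

print(all_phi[0])
print(ranking)
```

For the first row the contributions are 2, 3 and 3, which add up to the gap between the model output (8) and the baseline output (0). The interaction term is split evenly between the two features that take part in it.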
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Local Performance\n", "\n", "### Task 2: Force Plot\n", "Pick the first person in the test set and explain the model's prediction for them specifically using a force plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
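\n", "Click to see Solution\n", "\n", "```python\n", "# Plot for the first record in the test set\n", "shap.initjs()\n", "# shap_values is a list of per-class arrays in older shap releases and one 3-D array in newer ones\n", "first_row_values = shap_values[1][0] if isinstance(shap_values, list) else shap_values[0, :, 1]\n", "shap.force_plot(explainer.expected_value[1], first_row_values, X_test.iloc[0, :])\n", "```\n", "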
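
A force plot starts at the base value (the expected model output) and adds each feature's SHAP value until it reaches the prediction for that record. The sketch below prints the same breakdown as text. The helper `describe_force`, the feature names, and every number are hypothetical and for illustration only. They are not results from the Breast Cancer model.

```python
def describe_force(base_value, contributions):
    # contributions: dict mapping feature name -> SHAP value for one record
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    running = base_value
    lines = []
    for name, value in ordered:
        running += value
        direction = 'raises' if value > 0 else 'lowers'
        lines.append(f'{name} {direction} the output by {abs(value):.2f} -> {running:.2f}')
    return running, lines


# Made-up values, for illustration only
base_value = 0.60
contributions = {'feature_a': 0.25, 'feature_b': -0.10, 'feature_c': 0.05}

prediction, lines = describe_force(base_value, contributions)
for line in lines:
    print(line)
print(f'final output: {prediction:.2f}')
```

With these made-up numbers the running total ends at 0.80. The same check on real output (base value plus the sum of one row's SHAP values, compared with the predicted probability for that row) is a quick sanity test for a local explanation.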
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### The Ultimate Skill Unlocked! \n", "You can now explain black-box models to humans. This is the mark of a top-tier Data Scientist.\n", "You have completed all 23 modules of the master series!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }