{ "cells": [ { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "a6a69339-37e8-4cb6-863a-858e6df8872f", "showTitle": false, "title": "" }, "id": "o5Iixw4vHWG9" }, "source": [ "# Problem Statement" ] }, { "cell_type": "markdown", "metadata": { "id": "daBFxPKyAhrl" }, "source": [ "## Business Context" ] }, { "cell_type": "markdown", "metadata": { "id": "ucRv3UZ6zed-" }, "source": [ "In the competitive landscape of retail banking, customer retention is critical for ensuring sustainable growth and profitability. A prominent retail banking institution in Europe provides a range of financial products, including credit cards, loans, and savings accounts, and has been rapidly expanding its customer base across multiple countries. However, with a growing customer base, it faces an increasingly pressing challenge: customer churn. A significant number of customers are closing their accounts and switching to competitors. This decline in customer retention is impacting revenue and long-term customer relationships.\n", "\n", "Understanding the reasons behind customer attrition (or churn) is essential for the bank to devise effective retention strategies to minimize churn and enhance customer loyalty and satisfaction. The Customer Analytics & Retention Department has been diligently collecting and analyzing historical customer data. Despite the valuable insights provided by historical data, the department grapples with several challenges:\n", "\n", "1. **Complex Customer Behavior**: The diverse nature of the bank's offerings and the varying customer preferences across different countries complicate the identification of factors that lead to churn.\n", "2. **Proactive Retention**: The current processes for identifying at-risk customers are reactive rather than proactive, leading to missed opportunities for timely interventions that could prevent churn."
] }, { "cell_type": "markdown", "metadata": { "id": "o7nc6WZUvX3Z" }, "source": [ "## Objective" ] }, { "cell_type": "markdown", "metadata": { "id": "GllLo7aI02Ep" }, "source": [ "The Customer Analytics & Retention Department has successfully developed a machine learning model that identifies patterns indicative of churn risk and predicts the likelihood of customer churn. They recognize the potential of this model to significantly contribute to reducing churn rates by identifying at-risk customers before they decide to leave.\n", "\n", "However, to harness the value of this model, the team seeks to deploy it as a web application to allow for broader use across departments, enabling customer service representatives, marketing teams, and management to access churn predictions in real time. The primary objective is to create an intuitive web app with the ML model under the hood to identify customers at risk of churn. The successful deployment of this web application will facilitate timely interventions, improve customer retention strategies, and ultimately work towards enhancing customer satisfaction and loyalty." ] }, { "cell_type": "markdown", "metadata": { "id": "fgwYhsXzvbnH" }, "source": [ "## Data Dictionary" ] }, { "cell_type": "markdown", "metadata": { "id": "vf5bB5IsveCg" }, "source": [ "- **CustomerId**: Unique identifier for each customer. \n", "- **Surname**: Customer's last name. \n", "- **CreditScore**: Customer's credit score. \n", "- **Geography**: Country where the customer resides. \n", "- **Age**: Customer's age in years. \n", "- **Tenure**: Number of years the customer has been with the bank. \n", "- **Balance**: Customer’s account balance. \n", "- **NumOfProducts**: Number of products the customer has with the bank. \n", "- **HasCrCard**: Indicates if the customer has a credit card (1 = Yes, 0 = No). \n", "- **IsActiveMember**: Indicates if the customer is an active member (1 = Yes, 0 = No). 
\n", "- **EstimatedSalary**: Customer’s estimated salary. \n", "- **Exited**: Indicates whether the customer churned (1 = Yes, 0 = No). " ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "f3a9806a-355a-41fe-9b2a-0bda43aedd4e", "showTitle": false, "title": "" }, "id": "niLZjnkCHWG_" }, "source": [ "# Installing and Importing Necessary Libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: Skipping pandas as it is not installed.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Collecting pandas==2.2.2\n", " Downloading pandas-2.2.2.tar.gz (4.4 MB)\n", " ---------------------------------------- 0.0/4.4 MB ? eta -:--:--\n", " ------- -------------------------------- 0.8/4.4 MB 4.4 MB/s eta 0:00:01\n", " ---------------- ----------------------- 1.8/4.4 MB 4.8 MB/s eta 0:00:01\n", " -------------------------- ------------- 2.9/4.4 MB 4.7 MB/s eta 0:00:01\n", " ----------------------------------- ---- 3.9/4.4 MB 4.8 MB/s eta 0:00:01\n", " ---------------------------------------- 4.4/4.4 MB 4.9 MB/s eta 0:00:00\n", " Installing build dependencies: started\n", " Installing build dependencies: finished with status 'done'\n", " Getting requirements to build wheel: started\n", " Getting requirements to build wheel: finished with status 'done'\n", " Installing backend dependencies: started\n", " Installing backend dependencies: finished with status 'done'\n", " Preparing metadata (pyproject.toml): started\n", " Preparing metadata (pyproject.toml): still running...\n", " Preparing metadata (pyproject.toml): still running...\n", " Preparing metadata (pyproject.toml): still running...\n", " Preparing metadata (pyproject.toml): still running...\n", " Preparing metadata (pyproject.toml): still running...\n", " Preparing metadata (pyproject.toml): finished with status 'done'\n", 
"Collecting numpy>=1.26.0 (from pandas==2.2.2)\n", " Downloading numpy-2.3.0-cp313-cp313-win_amd64.whl.metadata (60 kB)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\adity\\miniconda3\\envs\\md\\lib\\site-packages (from pandas==2.2.2) (2.9.0.post0)\n", "Collecting pytz>=2020.1 (from pandas==2.2.2)\n", " Downloading pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)\n", "Collecting tzdata>=2022.7 (from pandas==2.2.2)\n", " Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)\n", "Requirement already satisfied: six>=1.5 in c:\\users\\adity\\miniconda3\\envs\\md\\lib\\site-packages (from python-dateutil>=2.8.2->pandas==2.2.2) (1.17.0)\n", "Downloading numpy-2.3.0-cp313-cp313-win_amd64.whl (12.7 MB)\n", " ---------------------------------------- 0.0/12.7 MB ? eta -:--:--\n", " - -------------------------------------- 0.5/12.7 MB 3.6 MB/s eta 0:00:04\n", " ---- ----------------------------------- 1.6/12.7 MB 4.3 MB/s eta 0:00:03\n", " --------- ------------------------------ 2.9/12.7 MB 4.9 MB/s eta 0:00:03\n", " ------------- -------------------------- 4.2/12.7 MB 5.3 MB/s eta 0:00:02\n", " ------------------ --------------------- 5.8/12.7 MB 5.7 MB/s eta 0:00:02\n", " ----------------------- ---------------- 7.6/12.7 MB 6.1 MB/s eta 0:00:01\n", " ----------------------------- ---------- 9.4/12.7 MB 6.5 MB/s eta 0:00:01\n", " ------------------------------------ --- 11.5/12.7 MB 7.0 MB/s eta 0:00:01\n", " ---------------------------------------- 12.7/12.7 MB 7.0 MB/s eta 0:00:00\n", "Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)\n", "Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)\n", "Building wheels for collected packages: pandas\n", " Building wheel for pandas (pyproject.toml): started\n", " Building wheel for pandas (pyproject.toml): finished with status 'done'\n", " Created wheel for pandas: filename=pandas-2.2.2-cp313-cp313-win_amd64.whl size=39042675 
sha256=d76901219749142dcbe4f1008b018c888c21ecc93502b56688b43199ada5790c\n", " Stored in directory: C:\\Users\\adity\\AppData\\Local\\Temp\\pip-ephem-wheel-cache-abhz96hy\\wheels\\d3\\d5\\d9\\b9df883b9242aa4091bb9baf55fceac592c4175236b44d0515\n", "Successfully built pandas\n", "Installing collected packages: pytz, tzdata, numpy, pandas\n", "\n", "Successfully installed numpy-2.3.0 pandas-2.2.2 pytz-2025.2 tzdata-2025.2\n" ] } ], "source": [ "!pip uninstall -y pandas\n", "!pip install --no-cache-dir pandas==2.2.2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "9JZt3Or8v2dQ" }, "outputs": [], "source": [ "!pip install scikit-learn==1.6.1 xgboost==2.1.4 joblib==1.4.2 streamlit==1.43.2 huggingface_hub==0.29.3 -q" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {
"application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "69c6950d-190a-4c7d-a5a2-a42164f4833b", "showTitle": false, "title": "" }, "id": "SEBLSdL-HWHA" }, "outputs": [], "source": [ "# for data manipulation\n", "import pandas as pd\n", "\n", "import sklearn\n", "\n", "# for data preprocessing and pipeline creation\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", "from sklearn.compose import make_column_transformer\n", "from sklearn.pipeline import make_pipeline\n", "\n", "# for model training, tuning, and evaluation\n", "import xgboost as xgb\n", "from sklearn.model_selection import GridSearchCV\n", "from sklearn.metrics import accuracy_score, classification_report, recall_score\n", "\n", "# for model serialization\n", "import joblib\n", "\n", "# for creating a folder\n", "import os\n", "\n", "# for hugging face space authentication to upload files\n", "from huggingface_hub import login, HfApi" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "d006e342-db1e-42d7-95e1-7cfed889cb59", "showTitle": false, "title": "" }, "id": "kzG9kVi4HWHB" }, "outputs": [], "source": [ "# Set scikit-learn's display mode to 'diagram' for better visualization of pipelines and estimators\n", "sklearn.set_config(display='diagram')" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "28fdb1c5-213d-4bfc-bf78-c14c60f0f365", "showTitle": false, "title": "" }, "id": "bQrbzi5RHWHC" }, "source": [ "# Data Loading and Overview" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": 
"de73d795-30fe-4891-8a22-2356863aaef3", "showTitle": false, "title": "" }, "id": "h5ZOCZjlHWHC" }, "outputs": [], "source": [ "# Load the dataset from a CSV file into a Pandas DataFrame\n", "bank_churn = pd.read_csv(\"C:\\\\Users\\\\adity\\\\OneDrive\\\\Desktop\\\\GL\\\\Model_deployment\\\\bank_customer_churn.csv\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "auPAKY7ZaMKp" }, "outputs": [], "source": [ "# Create a copy of the dataframe\n", "dataset = bank_churn.copy()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 226 }, "id": "QrJ4xetpUwlI", "outputId": "2fac47f9-7fe1-4894-8c94-16324a06d90f" }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>CustomerId</th><th>Surname</th><th>CreditScore</th><th>Geography</th><th>Age</th><th>Tenure</th><th>Balance</th><th>NumOfProducts</th><th>HasCrCard</th><th>IsActiveMember</th><th>EstimatedSalary</th><th>Exited</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>15634602</td><td>Hargrave</td><td>619</td><td>France</td><td>42.0</td><td>2</td><td>0.00</td><td>1</td><td>1.0</td><td>1.0</td><td>101348.88</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>15647311</td><td>Hill</td><td>608</td><td>Spain</td><td>41.0</td><td>1</td><td>83807.86</td><td>1</td><td>0.0</td><td>1.0</td><td>112542.58</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>15619304</td><td>Onio</td><td>502</td><td>France</td><td>42.0</td><td>8</td><td>159660.80</td><td>3</td><td>1.0</td><td>0.0</td><td>113931.57</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>15701354</td><td>Boni</td><td>699</td><td>France</td><td>39.0</td><td>1</td><td>0.00</td><td>2</td><td>0.0</td><td>0.0</td><td>93826.63</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>15737888</td><td>Mitchell</td><td>850</td><td>Spain</td><td>43.0</td><td>2</td><td>125510.82</td><td>1</td><td>NaN</td><td>1.0</td><td>79084.10</td><td>0</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " CustomerId Surname CreditScore Geography Age Tenure Balance \\\n", "0 15634602 Hargrave 619 France 42.0 2 0.00 \n", "1 15647311 Hill 608 Spain 41.0 1 83807.86 \n", "2 15619304 Onio 502 France 42.0 8 159660.80 \n", "3 15701354 Boni 699 France 39.0 1 0.00 \n", "4 15737888 Mitchell 850 Spain 43.0 2 125510.82 \n", "\n", " NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited \n", "0 1 1.0 1.0 101348.88 1 \n", "1 1 0.0 1.0 112542.58 0 \n", "2 3 1.0 0.0 113931.57 1 \n", "3 2 0.0 0.0 93826.63 0 \n", "4 1 NaN 1.0 79084.10 0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first five rows of the dataset\n", "dataset.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Nz87EbMeUr1F", "outputId": "832d3352-6aa6-4f89-8be0-37cae2159761" }, "outputs": [ { "data": { "text/plain": [ "(10002, 12)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the number of rows and columns in the dataset\n", "dataset.shape" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ICRnFc0EUuFi", "outputId": "5c03468e-9b8d-4aaa-b96e-761585e0d6e1" }, "outputs": [ { "data": { "text/plain": [ "Index(['CustomerId', 'Surname', 'CreditScore', 'Geography', 'Age', 'Tenure',\n", " 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember',\n", " 'EstimatedSalary', 'Exited'],\n", " dtype='object')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the column names of the dataset\n", "dataset.columns" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "c1f6d4b1-8956-4c8c-99c6-f616b6ffbbb7", "showTitle": false, "title": "" }, "id": "iT25LmvoHWHD" }, "source": [ "# EDA" ] }, { "cell_type": 
"markdown", "metadata": { "id": "1jOAC1TGpXOe" }, "source": [ "Let's start by defining the target and predictor (numerical and categorical) variables.\n", "\n", "- We'll exclude the `CustomerId` and `Surname` attributes, as they add no value to the analysis or the subsequent modeling.\n", "\n", "- Although the `HasCrCard` and `IsActiveMember` attributes are categorical (binary), we'll treat them as numerical because they're already encoded as 0/1." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "20762caf-3731-4797-9fd2-8400061aae8a", "showTitle": false, "title": "" }, "id": "Y_tEmVPFHWHD" }, "outputs": [], "source": [ "# Define the target variable for the classification task\n", "target = 'Exited'\n", "\n", "# List of numerical features in the dataset\n", "numeric_features = [\n", "    'CreditScore', # Customer's credit score\n", "    'Age', # Customer's age\n", "    'Tenure', # Number of years the customer has been with the bank\n", "    'Balance', # Customer’s account balance\n", "    'NumOfProducts', # Number of products the customer has with the bank\n", "    'HasCrCard', # Whether the customer has a credit card (binary: 0 or 1)\n", "    'IsActiveMember', # Whether the customer is an active member (binary: 0 or 1)\n", "    'EstimatedSalary' # Customer’s estimated salary\n", "]\n", "\n", "# List of categorical features in the dataset\n", "categorical_features = [\n", "    'Geography', # Country where the customer resides\n", "]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "13f93578-a8e7-4cd1-b599-af01cde98947", "showTitle": false, "title": "" }, "colab": { "base_uri": "https://localhost:8080/", "height": 320 }, "id": "pRTCdlYTHWHD", "outputId": "8f8dd5bf-0027-4929-a249-85ddda622217" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CreditScore Age Tenure Balance NumOfProducts \\\n", "count 10002.000000 10001.000000 10002.000000 10002.000000 10002.000000 \n", "mean 650.555089 38.922311 5.012498 76491.112875 1.530194 \n", "std 96.661615 10.487200 2.891973 62393.474144 0.581639 \n", "min 350.000000 18.000000 0.000000 0.000000 1.000000 \n", "25% 584.000000 32.000000 3.000000 0.000000 1.000000 \n", "50% 652.000000 37.000000 5.000000 97198.540000 1.000000 \n", "75% 718.000000 44.000000 7.000000 127647.840000 2.000000 \n", "max 850.000000 92.000000 10.000000 250898.090000 4.000000 \n", "\n", " HasCrCard IsActiveMember EstimatedSalary \n", "count 10001.000000 10001.000000 10002.000000 \n", "mean 0.705529 0.514949 100083.331145 \n", "std 0.455827 0.499801 57508.117802 \n", "min 0.000000 0.000000 11.580000 \n", "25% 0.000000 0.000000 50983.750000 \n", "50% 1.000000 1.000000 100185.240000 \n", "75% 1.000000 1.000000 149383.652500 \n", "max 1.000000 1.000000 199992.480000 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate summary statistics for numerical features\n", "dataset[numeric_features].describe()" ] }, { "cell_type": "markdown", "metadata": { "id": "tev5Z92d1PIr" }, "source": [ "1. **Balance Distribution:**\n", " - The mean balance is **~76,491**, but the **25th percentile is 0**, indicating that **at least 25% of customers have a balance of zero**.\n", " - Possible explanations:\n", " - Customers may use the account primarily for transactions, withdrawing funds immediately after deposits.\n", " - Some customers might have secondary accounts with zero balance.\n", " - Certain customers may have no savings account but only hold credit cards.\n", "\n", "2. **Age and Credit Score:**\n", " - The average age is **~39 years**, with a max of **92 years**.\n", " - The credit score ranges from **350 to 850**, with an average of **650**.\n", "\n", "3. 
**Products and Activity:**\n", " - Most customers have **1 or 2 products** (median = 1).\n", " - Around **70% have a credit card**, and **51% are active members**.\n", "\n", "4. **Churn Rate:**\n", " - The **Exited (churn) rate is ~20%**, meaning **1 in 5 customers leave** the bank.\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "888a838c-f16c-408c-bcd6-5aa0f13d943c", "showTitle": false, "title": "" }, "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "id": "phM5W-m2HWHE", "outputId": "391deb19-8407-4b35-8a84-15d2db51d503" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Geography\n", "count 10001\n", "unique 3\n", "top France\n", "freq 5014" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[categorical_features].describe()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 178 }, "id": "dVRml2osH8Av", "outputId": "2c32a91a-a176-4e30-f75d-f14fcc187878" }, "outputs": [ { "data": { "text/plain": [ "Exited\n", "0 0.796241\n", "1 0.203759\n", "Name: proportion, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compute the proportion of each class in the target variable\n", "dataset[target].value_counts(normalize=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "Fci59YnNfMF1" }, "source": [ "Dataset contains **10002** total records with the following distribution: \n", "\n", "- **79.6% customers did not churn (No)** \n", "- **20.3% customers churned (Yes)** \n", "\n", "This means the dataset is **imbalanced**" ] }, { "cell_type": "markdown", "metadata": { "id": "B8ebrkKpyfA1" }, "source": [ "### Why does this matter?" ] }, { "cell_type": "markdown", "metadata": { "id": "CcYeiwMYyhYQ" }, "source": [ "- A model trained on this data **may be biased towards predicting \"No\" (non-churn)** because it is the majority class. \n", "- Standard classification models might achieve **high accuracy but low recall for churners**, meaning they fail to detect many actual churn cases." ] }, { "cell_type": "markdown", "metadata": { "id": "azzNAL-HyjFp" }, "source": [ "### What can be done?" ] }, { "cell_type": "markdown", "metadata": { "id": "bIx_6kh4ykvA" }, "source": [ "1. **Resampling techniques**: \n", " - **Oversampling (SMOTE)**: Generate synthetic churn cases to balance the dataset. \n", " - **Undersampling**: Remove some \"No\" cases to balance the dataset. \n", "\n", "2. 
**Adjust `scale_pos_weight` in XGBoost**: \n", " - Set `scale_pos_weight` to the ratio of non-churners to churners (for the full dataset, `7964 / 2038 ≈ 3.91`) to give more importance to the minority class (churn). \n", "\n", "3. **Optimize for recall instead of accuracy**: \n", " - Since churn prediction is a critical business problem, a high recall ensures fewer actual churners are missed." ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "550dfffd-2a1f-44c5-8d31-23a452896bef", "showTitle": false, "title": "" }, "id": "Cl7T7_jFHWHE" }, "source": [ "# Model Training with Hyperparameter Tuning" ] }, { "cell_type": "markdown", "metadata": { "id": "4pbczYKIyQke" }, "source": [ "## Data Preprocessing" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "0c99d9b0-4771-487b-86fd-9d452118b7ab", "showTitle": false, "title": "" }, "id": "YKuRNqVuHWHE" }, "outputs": [], "source": [ "# Define predictor matrix (X) using selected numeric and categorical features\n", "X = dataset[numeric_features + categorical_features]\n", "\n", "# Define target variable\n", "y = dataset[target]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "1e90b308-b374-4f65-81b8-598a0bab1d1d", "showTitle": false, "title": "" }, "id": "tbgt_5ZCHWHF" }, "outputs": [], "source": [ "# Split the dataset into training and test sets\n", "Xtrain, Xtest, ytrain, ytest = train_test_split(\n", " X, y, # Predictors (X) and target variable (y)\n", " test_size=0.2, # 20% of the data is reserved for testing\n", " random_state=42 # Ensures reproducibility by setting a fixed random seed\n", ")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { 
"byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "bffcbcac-0047-4332-b206-9fadb5c78956", "showTitle": false, "title": "" }, "id": "GYnMo95hHWHF" }, "outputs": [], "source": [ "# Create a preprocessing pipeline for numerical and categorical features\n", "preprocessor = make_column_transformer(\n", " (StandardScaler(), numeric_features), # Scale numeric features to have mean equal to 0 and standard deviation equal to 1 (x - mue/sigma)\n", " (OneHotEncoder(handle_unknown='ignore'), categorical_features) # Encode categorical features as one-hot vectors\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "5mttQVhtfgnl" }, "source": [ "- The **`make_column_transformer`** step creates a preprocessing pipeline for your dataset by applying different transformations to numeric and categorical features." ] }, { "cell_type": "markdown", "metadata": { "id": "iUKvlYZBcToC" }, "source": [ "### **Breakdown:**" ] }, { "cell_type": "markdown", "metadata": { "id": "83LsA7cEcVb3" }, "source": [ "1. **`StandardScaler()` for numeric features** \n", " - Scales numerical columns (`tenure`, `MonthlyCharges`, `TotalCharges`, `SeniorCitizen`) to have **zero mean and unit variance**. \n", " - This helps ML models handle features with different scales effectively.\n", "\n", "2. **`OneHotEncoder()` for categorical features** \n", " - Converts categorical columns (`Contract`, `PaperlessBilling`, `PaymentMethod`, etc.) into **one-hot encoded vectors**. " ] }, { "cell_type": "markdown", "metadata": { "id": "gxYjsmiQcXFz" }, "source": [ "### **Why is this important?**" ] }, { "cell_type": "markdown", "metadata": { "id": "D2lsN3ZmcYJJ" }, "source": [ "- **Ensures proper feature scaling** (for better model performance). \n", "- **Encodes categorical variables** into a machine-readable format. \n", "- **Prepares data for ML models** that require numerical inputs." 
] }, { "cell_type": "markdown", "metadata": { "id": "FB4FWU8ScbTG" }, "source": [ "## Creating Model Pipeline" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "76a64681-e124-4eef-ada8-75a8511dc783", "showTitle": false, "title": "" }, "id": "D83BGGp3HWHF" }, "outputs": [], "source": [ "# Initialize an XGBoost classifier\n", "model_xgb = xgb.XGBClassifier(random_state=42)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "7e566699-0812-46e2-a48b-cf8a4f83c7b6", "showTitle": false, "title": "" }, "id": "3HkzYXsPHWHF" }, "outputs": [], "source": [ "# Create a machine learning pipeline with preprocessing and model training steps\n", "model_pipeline = make_pipeline(\n", " preprocessor, # Preprocesses numerical and categorical features\n", " model_xgb # XGBoost classifier for model training\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "9ooL7JNHf8w4" }, "source": [ "This creates a pipeline that\n", "\n", "- **first preprocesses the data** (scaling numeric features and encoding categorical ones), and\n", "- **then trains** the XGBoost **model**.\n", "\n", "It ensures that **all transformations are applied automatically before feeding the data into the model**." 
] }, { "cell_type": "markdown", "metadata": { "id": "XkZ442UA1M8I" }, "source": [ "## Model Training" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "245034db-84c4-406f-b808-dde8860b9084", "showTitle": false, "title": "" }, "colab": { "base_uri": "https://localhost:8080/", "height": 249 }, "id": "E_ZmCrDnHWHG", "outputId": "ddd96ada-94f2-4eb4-c51c-1dd2d7a31fac" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Pipeline(steps=[('columntransformer',\n", " ColumnTransformer(transformers=[('standardscaler',\n", " StandardScaler(),\n", " ['CreditScore', 'Age',\n", " 'Tenure', 'Balance',\n", " 'NumOfProducts', 'HasCrCard',\n", " 'IsActiveMember',\n", " 'EstimatedSalary']),\n", " ('onehotencoder',\n", " OneHotEncoder(handle_unknown='ignore'),\n", " ['Geography'])])),\n", " ('xgbclassifier',\n", " XGBClassifier(base_score=None, booster=None, callbac...\n", " feature_types=None, gamma=None, grow_policy=None,\n", " importance_type=None,\n", " interaction_constraints=None, learning_rate=None,\n", " max_bin=None, max_cat_threshold=None,\n", " max_cat_to_onehot=None, max_delta_step=None,\n", " max_depth=None, max_leaves=None,\n", " min_child_weight=None, missing=nan,\n", " monotone_constraints=None, multi_strategy=None,\n", " n_estimators=None, n_jobs=None,\n", " num_parallel_tree=None, random_state=42, ...))])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Train the model pipeline on the training data\n", "model_pipeline.fit(Xtrain, ytrain)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "brtV1ErPcxCy" }, "outputs": [], "source": [ "# Make predictions on the training data\n", "y_pred_train = model_pipeline.predict(Xtrain)\n", "\n", "# Make predictions on the test data\n", "y_pred_test = model_pipeline.predict(Xtest)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "087eeb21-d77e-4ca1-9f75-414586c18b7d", "showTitle": false, "title": "" }, "colab": { "base_uri": "https://localhost:8080/" }, "id": "yquQSa6KHWHH", "outputId": "cd32292c-7d4c-4ed3-cb0a-27b8132b41fb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.95 0.99 0.97 6365\n", " 1 0.97 0.81 0.88 1636\n", "\n", " 
accuracy 0.96 8001\n", " macro avg 0.96 0.90 0.93 8001\n", "weighted avg 0.96 0.96 0.95 8001\n", "\n" ] } ], "source": [ "# Generate a classification report to evaluate model performance on training set\n", "print(classification_report(ytrain, y_pred_train))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZTwKVOtmHr-M", "outputId": "29a15a9b-3697-4d2f-8df8-c771b9ecdf8a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.89 0.95 0.91 1599\n", " 1 0.71 0.52 0.60 402\n", "\n", " accuracy 0.86 2001\n", " macro avg 0.80 0.73 0.76 2001\n", "weighted avg 0.85 0.86 0.85 2001\n", "\n" ] } ], "source": [ "# Generate a classification report to evaluate model performance on test set\n", "print(classification_report(ytest, y_pred_test))" ] }, { "cell_type": "markdown", "metadata": { "id": "lSAMZzmaduO4" }, "source": [ "- While the model identifies non-churners well (test recall 0.95), it misses nearly half of the customers who churn (test recall 0.52).\n", " - Since far fewer customers churn than stay, the model fails to learn the patterns associated with churn.\n", " - We'll handle this by assigning weights to the different classes using the `scale_pos_weight` parameter of the XGBoost model.\n", "- The model overfits the training data: recall for customers who churn is 0.81 on the training set but 0.52 on the test set, a gap of about 29 percentage points.\n", " - We'll handle this via hyperparameter tuning." 
] }, { "cell_type": "markdown", "metadata": { "id": "UGqiCLRnRx8i" }, "source": [ "## Hyperparameter Tuning" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YunmTqQgiwCd", "outputId": "f56e076a-9c14-45fc-ced9-675adaba928f" }, "outputs": [ { "data": { "text/plain": [ "np.float64(3.890586797066015)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set the class weight (ratio of non-churners to churners in the training set) to handle class imbalance\n", "class_weight = ytrain.value_counts()[0] / ytrain.value_counts()[1]\n", "class_weight" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "bffcbcac-0047-4332-b206-9fadb5c78956", "showTitle": false, "title": "" }, "id": "D8BdWyOUR46x" }, "outputs": [], "source": [ "# Define the preprocessing steps\n", "preprocessor = make_column_transformer(\n", " (StandardScaler(), numeric_features),\n", " (OneHotEncoder(handle_unknown='ignore'), categorical_features)\n", ")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "76a64681-e124-4eef-ada8-75a8511dc783", "showTitle": false, "title": "" }, "id": "creqMx9fR46y" }, "outputs": [], "source": [ "# Define base XGBoost model\n", "xgb_model = xgb.XGBClassifier(scale_pos_weight=class_weight, random_state=42)\n", "\n", "# Define hyperparameter grid\n", "param_grid = {\n", " 'xgbclassifier__n_estimators': [50, 100, 150, 200], # number of trees to build\n", " 'xgbclassifier__max_depth': [2, 3, 4], # maximum depth of each tree\n", " 'xgbclassifier__colsample_bytree': [0.4, 0.5, 0.6], # fraction of features sampled (randomly) for each tree\n", " 'xgbclassifier__colsample_bylevel': [0.4, 0.5, 0.6], # fraction of features sampled (randomly) for each level of a tree\n", " 'xgbclassifier__learning_rate': [0.01, 0.05, 0.1], # learning rate\n", " 'xgbclassifier__reg_lambda': [0.4, 0.5, 0.6], # L2 regularization factor\n", "}" ] }, { "cell_type": "markdown", "metadata": { "id": "WGEoLzkZK2eO" }, "source": [ "The following code creates the model pipeline and tunes its hyperparameters using GridSearchCV. The grid contains 4 × 3 × 3 × 3 × 3 × 3 = 972 parameter combinations, so 5-fold cross-validation requires 4,860 model fits and takes approximately 10-15 minutes to complete. Because no `scoring` argument is passed, `GridSearchCV` ranks candidates by the estimator's default score, which is accuracy for classifiers. Please allow sufficient time for execution." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "7e566699-0812-46e2-a48b-cf8a4f83c7b6", "showTitle": false, "title": "" }, "colab": { "base_uri": "https://localhost:8080/", "height": 281 }, "id": "bwV3VmWrR46y", "outputId": "ad5701c3-f1b2-45b1-8864-5b5aa619e55d" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "GridSearchCV(cv=5,\n", " estimator=Pipeline(steps=[('columntransformer',\n", " ColumnTransformer(transformers=[('standardscaler',\n", " StandardScaler(),\n", " ['CreditScore',\n", " 'Age',\n", " 'Tenure',\n", " 'Balance',\n", " 'NumOfProducts',\n", " 'HasCrCard',\n", " 'IsActiveMember',\n", " 'EstimatedSalary']),\n", " ('onehotencoder',\n", " OneHotEncoder(handle_unknown='ignore'),\n", " ['Geography'])])),\n", " ('xgbclassifier',\n", " XGBClassifier(base_sco...\n", " n_jobs=None,\n", " num_parallel_tree=None,\n", " random_state=42, ...))]),\n", " n_jobs=-1,\n", " param_grid={'xgbclassifier__colsample_bylevel': [0.4, 0.5, 0.6],\n", " 'xgbclassifier__colsample_bytree': [0.4, 0.5, 0.6],\n", " 'xgbclassifier__learning_rate': [0.01, 0.05, 0.1],\n", " 'xgbclassifier__max_depth': [2, 3, 4],\n", " 'xgbclassifier__n_estimators': [50, 100, 150, 200],\n", " 'xgbclassifier__reg_lambda': [0.4, 0.5, 0.6]})" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Model pipeline\n", "model_pipeline = make_pipeline(preprocessor, xgb_model)\n", "\n", "# Hyperparameter tuning with GridSearchCV\n", "grid_search = GridSearchCV(model_pipeline, param_grid, cv=5, n_jobs=-1)\n", "grid_search.fit(Xtrain, ytrain)" ] }, { "cell_type": "markdown", "metadata": { "id": "wxu-KzO21Dj7" }, "source": [ "## Selecting the Best Model" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bezveeT9h_Rj", "outputId": "48e28e8e-e937-463d-ce59-dcf3d2c5a1d4" }, "outputs": [ { "data": { "text/plain": [ "{'xgbclassifier__colsample_bylevel': 0.5,\n", " 'xgbclassifier__colsample_bytree': 0.6,\n", " 'xgbclassifier__learning_rate': 0.1,\n", " 'xgbclassifier__max_depth': 4,\n", " 'xgbclassifier__n_estimators': 200,\n", " 'xgbclassifier__reg_lambda': 0.6}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the parameters of the best 
model\n", "grid_search.best_params_" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 249 }, "id": "fSNxALuGR6iT", "outputId": "fdfb9cc3-4a1d-4ee8-e298-054368bf6962" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Pipeline(steps=[('columntransformer',\n", " ColumnTransformer(transformers=[('standardscaler',\n", " StandardScaler(),\n", " ['CreditScore', 'Age',\n", " 'Tenure', 'Balance',\n", " 'NumOfProducts', 'HasCrCard',\n", " 'IsActiveMember',\n", " 'EstimatedSalary']),\n", " ('onehotencoder',\n", " OneHotEncoder(handle_unknown='ignore'),\n", " ['Geography'])])),\n", " ('xgbclassifier',\n", " XGBClassifier(base_score=None, booster=None, callbac...\n", " feature_types=None, gamma=None, grow_policy=None,\n", " importance_type=None,\n", " interaction_constraints=None, learning_rate=0.1,\n", " max_bin=None, max_cat_threshold=None,\n", " max_cat_to_onehot=None, max_delta_step=None,\n", " max_depth=4, max_leaves=None,\n", " min_child_weight=None, missing=nan,\n", " monotone_constraints=None, multi_strategy=None,\n", " n_estimators=200, n_jobs=None,\n", " num_parallel_tree=None, random_state=42, ...))])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Store the best model\n", "best_model = grid_search.best_estimator_\n", "best_model" ] }, { "cell_type": "markdown", "metadata": { "id": "7K5BB3_M2Bk5" }, "source": [ "The classification threshold is important because it controls precision and recall trade-offs.\n", "- Since customer churn prediction is a retention problem, a higher recall is preferred. We want to correctly identify as many churners as possible, even if it means getting some false positives.\n", "- We'll lower the classification threshold to 0.45 from 0.5 to increase the recall." 
] }, { "cell_type": "code", "execution_count": 30, "metadata": { "id": "rvYz9c-QhWDx" }, "outputs": [], "source": [ "# Set the classification threshold\n", "classification_threshold = 0.45" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "Fi-WqyVOhecQ" }, "outputs": [], "source": [ "# Make predictions on the training data\n", "y_pred_train_proba = best_model.predict_proba(Xtrain)[:, 1]\n", "y_pred_train = (y_pred_train_proba >= classification_threshold).astype(int)\n", "\n", "# Make predictions on the test data\n", "y_pred_test_proba = best_model.predict_proba(Xtest)[:, 1]\n", "y_pred_test = (y_pred_test_proba >= classification_threshold).astype(int)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "DW9AG_ONc7PD", "outputId": "3e5066ae-83d9-48f4-ec0c-91766625ee6c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.96 0.81 0.88 6365\n", " 1 0.54 0.85 0.66 1636\n", "\n", " accuracy 0.82 8001\n", " macro avg 0.75 0.83 0.77 8001\n", "weighted avg 0.87 0.82 0.83 8001\n", "\n" ] } ], "source": [ "# Generate a classification report to evaluate model performance on training set\n", "print(classification_report(ytrain, y_pred_train))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GlOJwOE2SA4m", "outputId": "e587fba0-2f43-49d8-8acd-eab75c4fecd6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.93 0.79 0.86 1599\n", " 1 0.48 0.78 0.60 402\n", "\n", " accuracy 0.79 2001\n", " macro avg 0.71 0.78 0.73 2001\n", "weighted avg 0.84 0.79 0.80 2001\n", "\n" ] } ], "source": [ "# Generate a classification report to evaluate model performance on test set\n", "print(classification_report(ytest, y_pred_test))" ] }, { "cell_type": "markdown", "metadata": { "id": 
"ptaeTmSxqEe5" }, "source": [ "- We can see that the **overfitting has significantly reduced**.\n", "- The **test set recall for the class corresponding to churn** has also **significantly improved** (by ~25%) to 78%.\n", "- As expected, while recall has improved, precision has dropped.\n", "\n", "We'll go ahead with this model as our final model." ] }, { "cell_type": "markdown", "metadata": { "id": "lqkUvTHzO3UH" }, "source": [ "# Model Serialization" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "id": "SwIewqGpiRtW" }, "outputs": [], "source": [ "# Create a folder for storing the files needed for web app deployment\n", "os.makedirs(\"deployment_files\", exist_ok=True)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E5-0ybugEl_5", "outputId": "9815d03e-3894-4015-ab56-eec141e6c468" }, "outputs": [ { "data": { "text/plain": [ "['deployment_files/churn_prediction_model_v1_0.joblib']" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define the file path to save (serialize) the trained model along with the data preprocessing steps\n", "saved_model_path = \"deployment_files/churn_prediction_model_v1_0.joblib\"\n", "\n", "# Save the trained model pipeline using joblib\n", "joblib.dump(best_model, saved_model_path)" ] }, { "cell_type": "markdown", "metadata": { "id": "qATgOdleEwCg" }, "source": [ "This code is used to save a trained machine learning model pipeline using `joblib`, which is a library for efficient object serialization in Python.\n", "\n", "**Breakdown:** \n", "1. **`saved_model_path = \"churn_prediction_model_v1_0.joblib\"`** \n", " - Defines the file path where the model will be saved. \n", " - The model will be stored as a `.joblib` file, a format optimized for large NumPy arrays and machine learning models. 
\n", " - The last part of the filename (`v1_0`) specifies a version number, which is a good practice to track changes and maintain multiple model iterations\n", "\n", "2. **`joblib.dump(model_pipeline, saved_model_path)`** \n", " - Saves the trained `model_pipeline` object to the specified path (`model.joblib`). \n", " - `joblib.dump()` is preferred over `pickle.dump()` for saving large models because it is faster and more efficient with numerical data. \n", " - The saved file can be loaded later using `joblib.load(\"model.joblib\")` for inference or further training. \n", "\n", "This approach ensures that the model pipeline, including preprocessing steps and the trained model, is preserved for later use." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "id": "uNR7Dxj2MTM9" }, "outputs": [], "source": [ "# Load the saved model pipeline from the file\n", "saved_model = joblib.load(\"deployment_files/churn_prediction_model_v1_0.joblib\")" ] }, { "cell_type": "markdown", "metadata": { "id": "FwEnJLtfFEw4" }, "source": [ "1. **`joblib.load(\"churn_prediction_model_v1_0.joblib\")`** \n", " - Loads the previously saved machine learning model (or pipeline) from the `model.joblib` file. \n", " - The model retains all trained parameters, including preprocessing steps and learned patterns.\n", "\n", "2. **`saved_model`** \n", " - This variable stores the deserialized model, allowing it to be used for inference, further training, or evaluation.\n", "\n", "This allows you to reuse the trained model **without retraining it.**" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 249 }, "id": "LoPE6PZUM7Uh", "outputId": "cdfe55e4-520a-428a-ab0e-582e4e16b4e7" }, "outputs": [ { "data": { "text/html": [ "
Pipeline(steps=[('columntransformer',\n",
       "                 ColumnTransformer(transformers=[('standardscaler',\n",
       "                                                  StandardScaler(),\n",
       "                                                  ['CreditScore', 'Age',\n",
       "                                                   'Tenure', 'Balance',\n",
       "                                                   'NumOfProducts', 'HasCrCard',\n",
       "                                                   'IsActiveMember',\n",
       "                                                   'EstimatedSalary']),\n",
       "                                                 ('onehotencoder',\n",
       "                                                  OneHotEncoder(handle_unknown='ignore'),\n",
       "                                                  ['Geography'])])),\n",
       "                ('xgbclassifier',\n",
       "                 XGBClassifier(base_score=None, booster=None, callbac...\n",
       "                               feature_types=None, gamma=None, grow_policy=None,\n",
       "                               importance_type=None,\n",
       "                               interaction_constraints=None, learning_rate=0.1,\n",
       "                               max_bin=None, max_cat_threshold=None,\n",
       "                               max_cat_to_onehot=None, max_delta_step=None,\n",
       "                               max_depth=4, max_leaves=None,\n",
       "                               min_child_weight=None, missing=nan,\n",
       "                               monotone_constraints=None, multi_strategy=None,\n",
       "                               n_estimators=200, n_jobs=None,\n",
       "                               num_parallel_tree=None, random_state=42, ...))])
" ], "text/plain": [ "Pipeline(steps=[('columntransformer',\n", " ColumnTransformer(transformers=[('standardscaler',\n", " StandardScaler(),\n", " ['CreditScore', 'Age',\n", " 'Tenure', 'Balance',\n", " 'NumOfProducts', 'HasCrCard',\n", " 'IsActiveMember',\n", " 'EstimatedSalary']),\n", " ('onehotencoder',\n", " OneHotEncoder(handle_unknown='ignore'),\n", " ['Geography'])])),\n", " ('xgbclassifier',\n", " XGBClassifier(base_score=None, booster=None, callbac...\n", " feature_types=None, gamma=None, grow_policy=None,\n", " importance_type=None,\n", " interaction_constraints=None, learning_rate=0.1,\n", " max_bin=None, max_cat_threshold=None,\n", " max_cat_to_onehot=None, max_delta_step=None,\n", " max_depth=4, max_leaves=None,\n", " min_child_weight=None, missing=nan,\n", " monotone_constraints=None, multi_strategy=None,\n", " n_estimators=200, n_jobs=None,\n", " num_parallel_tree=None, random_state=42, ...))])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "saved_model" ] }, { "cell_type": "markdown", "metadata": { "id": "UJMu1yQztach" }, "source": [ "Let's try making predictions on the test set using the deserialized model.\n", "\n", "- Please ensure that the saved model is loaded before making predictions." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "MzpxBjtSM9f-", "outputId": "90db0d65-d3ba-4254-a8dc-af87c90c5484" }, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, ..., 0, 0, 1], shape=(2001,))" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "saved_model.predict(Xtest)" ] }, { "cell_type": "markdown", "metadata": { "id": "oxj1WOs1uxt_" }, "source": [ "- As we can see, the model can be directly used for making predictions without any retraining." 
] }, { "cell_type": "markdown", "metadata": { "id": "o81KHMBZTwXR" }, "source": [ "# Creating a Web App using Streamlit" ] }, { "cell_type": "markdown", "metadata": { "id": "U3x2eyoA49uL" }, "source": [ "We want to create a web app using Streamlit that can do the following:\n", "1. Create a UI for users to provide their input\n", "2. Load a serialized ML model\n", "3. Take the user input and loaded model to make a prediction\n", "4. Display the prediction from the model to the user" ] }, { "cell_type": "markdown", "metadata": { "id": "Jdck5Mso5R_1" }, "source": [ "For this, we write an **`app.py`** script that'll do all the above steps in one shot." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3oT-EsXmTvbu", "outputId": "197a323b-0a8a-48fa-8188-9545a6512250" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing deployment_files/app.py\n" ] } ], "source": [ "%%writefile deployment_files/app.py\n", "\n", "import streamlit as st\n", "import pandas as pd\n", "import joblib\n", "\n", "# Load the trained model\n", "def load_model():\n", " return joblib.load(\"churn_prediction_model_v1_0.joblib\")\n", "\n", "model = load_model()\n", "\n", "# Streamlit UI for Customer Churn Prediction\n", "st.title(\"Customer Churn Prediction App\")\n", "st.write(\"The Customer Churn Prediction App is an internal tool for bank staff that predicts whether customers are at risk of churning based on their details.\")\n", "st.write(\"Kindly enter the customer details to check whether they are likely to churn.\")\n", "\n", "# Collect user input\n", "CreditScore = st.number_input(\"Credit Score (customer's credit score)\", min_value=300, max_value=900, value=650)\n", "Geography = st.selectbox(\"Geography (country where the customer resides)\", [\"France\", \"Germany\", \"Spain\"])\n", "Age = st.number_input(\"Age (customer's age in years)\", min_value=18, max_value=100, value=30)\n", "Tenure 
= st.number_input(\"Tenure (number of years the customer has been with the bank)\", min_value=0, value=12)\n", "Balance = st.number_input(\"Account Balance (customer’s account balance)\", min_value=0.0, value=10000.0)\n", "NumOfProducts = st.number_input(\"Number of Products (number of products the customer has with the bank)\", min_value=1, value=1)\n", "HasCrCard = st.selectbox(\"Has Credit Card?\", [\"Yes\", \"No\"])\n", "IsActiveMember = st.selectbox(\"Is Active Member?\", [\"Yes\", \"No\"])\n", "EstimatedSalary = st.number_input(\"Estimated Salary (customer’s estimated salary)\", min_value=0.0, value=50000.0)\n", "\n", "# Build a single-row DataFrame with the column names used in training (Yes/No encoded as 1/0)\n", "input_data = pd.DataFrame([{\n", "    'CreditScore': CreditScore,\n", "    'Geography': Geography,\n", "    'Age': Age,\n", "    'Tenure': Tenure,\n", "    'Balance': Balance,\n", "    'NumOfProducts': NumOfProducts,\n", "    'HasCrCard': 1 if HasCrCard == \"Yes\" else 0,\n", "    'IsActiveMember': 1 if IsActiveMember == \"Yes\" else 0,\n", "    'EstimatedSalary': EstimatedSalary\n", "}])\n", "\n", "# Set the classification threshold\n", "classification_threshold = 0.45\n", "\n", "# Predict button\n", "if st.button(\"Predict\"):\n", "    # Probability of the positive (churn) class for the single input row\n", "    prediction_proba = model.predict_proba(input_data)[0, 1]\n", "    prediction = int(prediction_proba >= classification_threshold)\n", "    result = \"churn\" if prediction == 1 else \"not churn\"\n", "    st.write(f\"Based on the information provided, the customer is likely to {result}.\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Dh7JAZWJu9UA" }, "source": [ "- Note that the script must contain its own library imports: the hosting platform runs `app.py` as a standalone script, so nothing imported in this notebook is available to it."
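, "\n",
"\n",
"The input row built in `app.py` only works if its column names match the ones the pipeline was fitted on. The following is a minimal sketch of validating the form values before calling the model, assuming the nine feature columns shown in the pipeline output earlier; the helper `build_input_frame` is hypothetical and not part of `app.py`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Feature columns the fitted pipeline selects by name (read off the pipeline output in this notebook)\n",
"EXPECTED_COLUMNS = [\n",
"    'CreditScore', 'Geography', 'Age', 'Tenure', 'Balance',\n",
"    'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',\n",
"]\n",
"\n",
"\n",
"def build_input_frame(form_values):\n",
"    # Hypothetical helper: turn raw form values into a one-row DataFrame\n",
"    row = dict(form_values)\n",
"    # Map the Yes/No select boxes to the 1/0 encoding used in the data\n",
"    for flag in ('HasCrCard', 'IsActiveMember'):\n",
"        if flag in row:\n",
"            row[flag] = 1 if row[flag] == 'Yes' else 0\n",
"    missing = sorted(set(EXPECTED_COLUMNS) - set(row))\n",
"    if missing:\n",
"        raise ValueError(f'Missing input columns: {missing}')\n",
"    # Keep only the expected columns, in a fixed order\n",
"    return pd.DataFrame([row])[EXPECTED_COLUMNS]\n",
"\n",
"\n",
"example = build_input_frame({\n",
"    'CreditScore': 650, 'Geography': 'France', 'Age': 30, 'Tenure': 5,\n",
"    'Balance': 10000.0, 'NumOfProducts': 1, 'HasCrCard': 'Yes',\n",
"    'IsActiveMember': 'No', 'EstimatedSalary': 50000.0,\n",
"})\n",
"print(example.iloc[0].to_dict())\n",
"```\n",
"\n",
"A check like this fails early with a readable message instead of surfacing a column error from inside the pipeline."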
] }, { "cell_type": "markdown", "metadata": { "id": "luytCgCGXYdb" }, "source": [ "# Creating a Dependencies File" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VllfgdDo4kAA", "outputId": "2a3c247a-3aa4-4624-fa42-68c3d8e0efe4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing deployment_files/requirements.txt\n" ] } ], "source": [ "%%writefile deployment_files/requirements.txt\n", "pandas==2.2.2\n", "numpy==2.0.2\n", "scikit-learn==1.6.1\n", "xgboost==2.1.4\n", "joblib==1.4.2\n", "streamlit==1.43.2" ] }, { "cell_type": "markdown", "metadata": { "id": "Ou9pmcKNRgte" }, "source": [ "A **`requirements.txt`** file is essential for ensuring that your project runs smoothly across different environments. It's like a **blueprint** for setting up your ML project!" ] }, { "cell_type": "markdown", "metadata": { "id": "Vh6L3EqqiDbw" }, "source": [ "# Dockerfile" ] }, { "cell_type": "markdown", "metadata": { "id": "af2KN5Zxibak" }, "source": [ "**Note for Learners**\n", "\n", "In the case study recording on **Introduction to Model Deployment**, we deployed a Streamlit app using the **Hugging Face Spaces template**. At that time, Hugging Face allowed direct deployment using the **Streamlit SDK template**, and a `Dockerfile` was *not required*.\n", "\n", "However, Hugging Face has since updated their platform, and now **Streamlit apps must be deployed using the Docker template**, which requires a valid `Dockerfile`. 
While the recording does not show the `Dockerfile` creation, we have included the necessary `Dockerfile` code in this notebook for your reference.\n", "\n", "Don't worry, we'll cover the `Dockerfile` structure and containerization **in depth** in the upcoming week on **Containerization**.\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "id": "5xgEHj3fiGHq" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing deployment_files/Dockerfile\n" ] } ], "source": [ "%%writefile deployment_files/Dockerfile\n", "# Use a minimal base image with Python 3.9 installed\n", "FROM python:3.9-slim\n", "\n", "# Set the working directory inside the container to /app\n", "WORKDIR /app\n", "\n", "# Copy all files from the current directory on the host to the container's /app directory\n", "COPY . .\n", "\n", "# Install Python dependencies listed in requirements.txt\n", "RUN pip3 install -r requirements.txt\n", "\n", "# Define the command to run the Streamlit app on port 8501 and make it accessible externally\n", "CMD [\"streamlit\", \"run\", \"app.py\", \"--server.port=8501\", \"--server.address=0.0.0.0\", \"--server.enableXsrfProtection=false\"]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'C:\\\\Users\\\\adity\\\\Model_Deployment'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "markdown", "metadata": { "id": "GChQ7KMTXmfa" }, "source": [ "# Uploading Files to Hugging Face Repository" ] }, { "cell_type": "markdown", "metadata": { "id": "FPsXhZQ7YIyc" }, "source": [ "Once the following files have been created in the notebook, let's upload them to the Hugging Face Space:\n", "- **`churn_prediction_model_v1_0.joblib`**\n", "- **`requirements.txt`**\n", "- **`Dockerfile`**\n", "- **`app.py`**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "TsW6UmL9GIFL" }, "outputs": [ { "ename": "HTTPError", 
"evalue": "Invalid user token.", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mHTTPError\u001b[39m Traceback (most recent call last)", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\utils\\_http.py:409\u001b[39m, in \u001b[36mhf_raise_for_status\u001b[39m\u001b[34m(response, endpoint_name)\u001b[39m\n\u001b[32m 408\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m409\u001b[39m response.raise_for_status()\n\u001b[32m 410\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m HTTPError \u001b[38;5;28;01mas\u001b[39;00m e:\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\requests\\models.py:1024\u001b[39m, in \u001b[36mResponse.raise_for_status\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 1023\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m http_error_msg:\n\u001b[32m-> \u001b[39m\u001b[32m1024\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m HTTPError(http_error_msg, response=\u001b[38;5;28mself\u001b[39m)\n", "\u001b[31mHTTPError\u001b[39m: 401 Client Error: Unauthorized for url: https://huggingface.co/api/whoami-v2", "\nThe above exception was the direct cause of the following exception:\n", "\u001b[31mHfHubHTTPError\u001b[39m Traceback (most recent call last)", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\hf_api.py:1664\u001b[39m, in \u001b[36mHfApi.whoami\u001b[39m\u001b[34m(self, token)\u001b[39m\n\u001b[32m 1663\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1664\u001b[39m hf_raise_for_status(r)\n\u001b[32m 1665\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m HTTPError \u001b[38;5;28;01mas\u001b[39;00m e:\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\utils\\_http.py:481\u001b[39m, in 
\u001b[36mhf_raise_for_status\u001b[39m\u001b[34m(response, endpoint_name)\u001b[39m\n\u001b[32m 479\u001b[39m \u001b[38;5;66;03m# Convert `HTTPError` into a `HfHubHTTPError` to display request information\u001b[39;00m\n\u001b[32m 480\u001b[39m \u001b[38;5;66;03m# as well (request id and/or server error message)\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m481\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m _format(HfHubHTTPError, \u001b[38;5;28mstr\u001b[39m(e), response) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01me\u001b[39;00m\n", "\u001b[31mHfHubHTTPError\u001b[39m: 401 Client Error: Unauthorized for url: https://huggingface.co/api/whoami-v2 (Request ID: Root=1-6847119b-459f1d1f4a27a0e8241704c7;3e4f0b8f-9f77-41a9-8ffe-df042ec507bd)\n\nInvalid credentials in Authorization header", "\nThe above exception was the direct cause of the following exception:\n", "\u001b[31mHTTPError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[14]\u001b[39m\u001b[32m, line 5\u001b[39m\n\u001b[32m 2\u001b[39m repo_id = \u001b[33m\"\u001b[39m\u001b[33madityasharma0511/gldeploy\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;66;03m# Your Hugging Face space id\u001b[39;00m\n\u001b[32m 4\u001b[39m \u001b[38;5;66;03m# Login to Hugging Face platform with the access token\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m5\u001b[39m login(token=access_key)\n\u001b[32m 7\u001b[39m \u001b[38;5;66;03m# Initialize the API\u001b[39;00m\n\u001b[32m 8\u001b[39m api = HfApi()\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\utils\\_deprecation.py:101\u001b[39m, in \u001b[36m_deprecate_arguments.._inner_deprecate_positional_args..inner_f\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m 99\u001b[39m message += \u001b[33m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[33m\"\u001b[39m + custom_message\n\u001b[32m 
100\u001b[39m warnings.warn(message, \u001b[38;5;167;01mFutureWarning\u001b[39;00m)\n\u001b[32m--> \u001b[39m\u001b[32m101\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m f(*args, **kwargs)\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\utils\\_deprecation.py:31\u001b[39m, in \u001b[36m_deprecate_positional_args.._inner_deprecate_positional_args..inner_f\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m 29\u001b[39m extra_args = \u001b[38;5;28mlen\u001b[39m(args) - \u001b[38;5;28mlen\u001b[39m(all_args)\n\u001b[32m 30\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m extra_args <= \u001b[32m0\u001b[39m:\n\u001b[32m---> \u001b[39m\u001b[32m31\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m f(*args, **kwargs)\n\u001b[32m 32\u001b[39m \u001b[38;5;66;03m# extra_args > 0\u001b[39;00m\n\u001b[32m 33\u001b[39m args_msg = [\n\u001b[32m 34\u001b[39m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mname\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m=\u001b[39m\u001b[33m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00marg\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m'\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(arg, \u001b[38;5;28mstr\u001b[39m) \u001b[38;5;28;01melse\u001b[39;00m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mname\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00marg\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m\n\u001b[32m 35\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m name, arg \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mzip\u001b[39m(kwonly_args[:extra_args], args[-extra_args:])\n\u001b[32m 36\u001b[39m ]\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\_login.py:126\u001b[39m, in \u001b[36mlogin\u001b[39m\u001b[34m(token, add_to_git_credential, new_session, 
write_permission)\u001b[39m\n\u001b[32m 119\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m add_to_git_credential:\n\u001b[32m 120\u001b[39m logger.info(\n\u001b[32m 121\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mThe token has not been saved to the git credentials helper. Pass \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 122\u001b[39m \u001b[33m\"\u001b[39m\u001b[33m`add_to_git_credential=True` in this function directly or \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 123\u001b[39m \u001b[33m\"\u001b[39m\u001b[33m`--add-to-git-credential` if using via `huggingface-cli` if \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 124\u001b[39m \u001b[33m\"\u001b[39m\u001b[33myou want to set the git credential as well.\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 125\u001b[39m )\n\u001b[32m--> \u001b[39m\u001b[32m126\u001b[39m _login(token, add_to_git_credential=add_to_git_credential)\n\u001b[32m 127\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m is_notebook():\n\u001b[32m 128\u001b[39m notebook_login(new_session=new_session)\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\_login.py:404\u001b[39m, in \u001b[36m_login\u001b[39m\u001b[34m(token, add_to_git_credential)\u001b[39m\n\u001b[32m 401\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m token.startswith(\u001b[33m\"\u001b[39m\u001b[33mapi_org\u001b[39m\u001b[33m\"\u001b[39m):\n\u001b[32m 402\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33mYou must use your personal account token, not an organization token.\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m--> \u001b[39m\u001b[32m404\u001b[39m token_info = whoami(token)\n\u001b[32m 405\u001b[39m permission = 
token_info[\u001b[33m\"\u001b[39m\u001b[33mauth\u001b[39m\u001b[33m\"\u001b[39m][\u001b[33m\"\u001b[39m\u001b[33maccessToken\u001b[39m\u001b[33m\"\u001b[39m][\u001b[33m\"\u001b[39m\u001b[33mrole\u001b[39m\u001b[33m\"\u001b[39m]\n\u001b[32m 406\u001b[39m logger.info(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mToken is valid (permission: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mpermission\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m).\u001b[39m\u001b[33m\"\u001b[39m)\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\utils\\_validators.py:114\u001b[39m, in \u001b[36mvalidate_hf_hub_args.._inner_fn\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m 111\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m check_use_auth_token:\n\u001b[32m 112\u001b[39m kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.\u001b[34m__name__\u001b[39m, has_token=has_token, kwargs=kwargs)\n\u001b[32m--> \u001b[39m\u001b[32m114\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m fn(*args, **kwargs)\n", "\u001b[36mFile \u001b[39m\u001b[32m~\\miniconda3\\envs\\md\\Lib\\site-packages\\huggingface_hub\\hf_api.py:1677\u001b[39m, in \u001b[36mHfApi.whoami\u001b[39m\u001b[34m(self, token)\u001b[39m\n\u001b[32m 1675\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m effective_token == _get_token_from_file():\n\u001b[32m 1676\u001b[39m error_message += \u001b[33m\"\u001b[39m\u001b[33m The token stored is invalid. Please run `huggingface-cli login` to update it.\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m-> \u001b[39m\u001b[32m1677\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m HTTPError(error_message, request=e.request, response=e.response) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01me\u001b[39;00m\n\u001b[32m 1678\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m r.json()\n", "\u001b[31mHTTPError\u001b[39m: Invalid user token." 
] } ], "source": [ "access_key = \"hf_token\" # Placeholder: replace with your Hugging Face access token (write permission)\n", "repo_id = \"adityasharma0511/gldeploy\" # Your Hugging Face Space id\n", "\n", "# Log in to the Hugging Face platform with the access token\n", "login(token=access_key)\n", "\n", "# Initialize the API\n", "api = HfApi()\n", "\n", "# Upload Streamlit app files stored in the folder called deployment_files\n", "api.upload_folder(\n", "    folder_path=\"deployment_files\", # Local folder containing app.py, Dockerfile, requirements.txt and the model file\n", "    repo_id=repo_id, # Hugging Face Space id\n", "    repo_type=\"space\", # Hugging Face repo type \"space\"\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "_i3RxMaYNDtX" }, "source": [ "1. **Define authentication and repository details:** \n", " - `access_key` stores the Hugging Face API token for authentication. The string `\"hf_token\"` is only a placeholder; with it left unchanged, `login` fails with an `Invalid user token` error. \n", " - `repo_id` specifies the Hugging Face **Space** repository where files will be uploaded. \n", "\n", "2. **Authenticate with Hugging Face:** \n", " - The `login(token=access_key)` function logs into Hugging Face using the provided API token. \n", "\n", "3. **Initialize Hugging Face API object:** \n", " - `api = HfApi()` creates an instance of `HfApi`, which allows interaction with the Hugging Face Hub. \n", "\n", "4. **Upload files from the local folder to Hugging Face Space:** \n", " - `api.upload_folder()` uploads all files from the `deployment_files` folder to the specified Hugging Face repository. \n", " - `folder_path=\"deployment_files\"` specifies the local directory containing the files, relative to the notebook's working directory. \n", " - `repo_id=repo_id` sets the target Hugging Face **Space** repository. \n", " - `repo_type=\"space\"` ensures that the upload is directed to a **Space** repository, which is used for hosting applications such as Streamlit apps. \n" ] }, { "cell_type": "markdown", "metadata": { "id": "cfrgxVM66grM" }, "source": [ "Here's what the web app looks like."
] }, { "cell_type": "markdown", "metadata": { "id": "98iM0_0w6kPU" }, "source": [ "![image.png]()" ] }, { "cell_type": "markdown", "metadata": { "id": "LswDabmTHofd" }, "source": [ "Power Ahead!\n", "___" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "application/vnd.databricks.v1+notebook": { "dashboards": [], "language": "python", "notebookMetadata": { "pythonIndentUnit": 4 }, "notebookName": "machine_failure_prediction", "widgets": {} }, "colab": { "collapsed_sections": [ "o5Iixw4vHWG9", "niLZjnkCHWG_", "bQrbzi5RHWHC", "iT25LmvoHWHD", "Cl7T7_jFHWHE", "lqkUvTHzO3UH", "o81KHMBZTwXR", "luytCgCGXYdb", "Vh6L3EqqiDbw", "GChQ7KMTXmfa" ], "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.4" } }, "nbformat": 4, "nbformat_minor": 4 }