diff --git "a/1_Churn_Data_Creation_and_Processing.ipynb" "b/1_Churn_Data_Creation_and_Processing.ipynb"
new file mode 100644
--- /dev/null
+++ "b/1_Churn_Data_Creation_and_Processing.ipynb"
@@ -0,0 +1,2342 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "pAyn432WulFm"
+      },
+      "source": [
+        "# 🔍 Notebook 1: Churn Data Creation and Processing\n",
+        "## AI for Big Data Management — ESCP Business School\n",
+        "### Final Group Project\n",
+        "\n",
+        "---\n",
+        "\n",
+        "## 📌 Problem Statement\n",
+        "> **\"How can a company predict customer churn based on support interactions and proactively adapt its retention strategy?\"**\n",
+        "\n",
+        "We aim to predict and understand customer churn by combining structured telecom data with synthetic behavioral signals derived from customer support interactions.\n",
+        "\n",
+        "---\n",
+        "\n",
+        "## 🗺️ Project Pipeline\n",
+        "```\n",
+        "PROBLEM CREATION → REAL-WORLD DATA PROCESSING → SYNTHETIC DATASET GENERATION → AUTOMATION → WRAP-UP\n",
+        "```\n",
+        "\n",
+        "---\n",
+        "\n",
+        "## 📋 What This Notebook Does\n",
+        "1. **[REAL-WORLD]** Loads the Telco Customer Churn dataset (a CSV downloaded from Kaggle)\n",
+        "2. **[REAL-WORLD]** Cleans and preprocesses the real data\n",
+        "3. **[SYNTHETIC]** Generates realistic support interaction variables\n",
+        "4. **[SYNTHETIC]** Creates a merged, enriched final dataset\n",
+        "5. Exports `customer_churn_support_dataset.csv` for Notebook 2\n",
+        "\n",
+        "---\n",
+        "\n",
+        "### ⚠️ Before Running\n",
+        "You need to upload one file:\n",
+        "- `WA_Fn-UseC_-Telco-Customer-Churn.csv` (downloaded from Kaggle)\n",
+        "\n",
+        "Upload it using the 📁 Files panel on the left sidebar in Google Colab.\n",
+        "All other data is generated synthetically in this notebook."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "acacAe8GulFp"
+      },
+      "source": [
+        "---\n",
+        "## 📦 SECTION 1: Install & Import Libraries\n",
+        "Run this cell first. It installs VADER for sentiment analysis and imports all necessary libraries."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "p5lEOq-1ulFq",
+        "outputId": "09999301-c22f-49db-896f-5344d4c0322d"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ All libraries imported successfully!\n",
+            "   pandas  : 2.2.2\n",
+            "   numpy   : 2.0.2\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Install required packages ──────────────────────────────────────────────────\n",
+        "!pip install vaderSentiment --quiet\n",
+        "\n",
+        "# ── Standard imports ──────────────────────────────────────────────────────────\n",
+        "import pandas as pd\n",
+        "import numpy as np\n",
+        "import matplotlib.pyplot as plt\n",
+        "import seaborn as sns\n",
+        "import random\n",
+        "from datetime import datetime, timedelta\n",
+        "from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer\n",
+        "\n",
+        "# ── Settings ──────────────────────────────────────────────────────────────────\n",
+        "np.random.seed(42)          # reproducibility\n",
+        "random.seed(42)\n",
+        "pd.set_option('display.max_columns', None)\n",
+        "\n",
+        "print('✅ All libraries imported successfully!')\n",
+        "print(f'   pandas  : {pd.__version__}')\n",
+        "print(f'   numpy   : {np.__version__}')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hrZYSJs4ulFr"
+      },
+      "source": [
+        "---\n",
+        "## 📥 SECTION 2: Load the Real-World Dataset\n",
+        "\n",
+        "### [REAL-WORLD DATA PROCESSING]\n",
+        "\n",
+        "**Where to download the dataset:**\n",
+        "1. Go to: https://www.kaggle.com/datasets/blastchar/telco-customer-churn\n",
+        "2. Click the **Download** button (top right)\n",
+        "3. Unzip the file — you will get: `WA_Fn-UseC_-Telco-Customer-Churn.csv`\n",
+        "4. In Google Colab, click the 📁 folder icon on the left sidebar\n",
+        "5. Click the ⬆️ Upload button and select the CSV file\n",
+        "6. Wait for the upload to finish, then run the cell below\n",
+        "\n",
+        "**Dataset info:** IBM Telco Customer Churn: 7,043 customer records across 21 columns, covering billing, contract, and service information."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 382
+        },
+        "id": "2BkVJSBzulFr",
+        "outputId": "a11033cd-94ba-4ba8-9387-edfdb4ca27bb"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ Dataset loaded successfully!\n",
+            "   Shape: 7043 rows × 21 columns\n",
+            "\n",
+            "📊 First 5 rows:\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \\\n",
+              "0  7590-VHVEG  Female              0     Yes         No       1           No   \n",
+              "1  5575-GNVDE    Male              0      No         No      34          Yes   \n",
+              "2  3668-QPYBK    Male              0      No         No       2          Yes   \n",
+              "3  7795-CFOCW    Male              0      No         No      45           No   \n",
+              "4  9237-HQITU  Female              0      No         No       2          Yes   \n",
+              "\n",
+              "      MultipleLines InternetService OnlineSecurity OnlineBackup  \\\n",
+              "0  No phone service             DSL             No          Yes   \n",
+              "1                No             DSL            Yes           No   \n",
+              "2                No             DSL            Yes          Yes   \n",
+              "3  No phone service             DSL            Yes           No   \n",
+              "4                No     Fiber optic             No           No   \n",
+              "\n",
+              "  DeviceProtection TechSupport StreamingTV StreamingMovies        Contract  \\\n",
+              "0               No          No          No              No  Month-to-month   \n",
+              "1              Yes          No          No              No        One year   \n",
+              "2               No          No          No              No  Month-to-month   \n",
+              "3              Yes         Yes          No              No        One year   \n",
+              "4               No          No          No              No  Month-to-month   \n",
+              "\n",
+              "  PaperlessBilling              PaymentMethod  MonthlyCharges TotalCharges  \\\n",
+              "0              Yes           Electronic check           29.85        29.85   \n",
+              "1               No               Mailed check           56.95       1889.5   \n",
+              "2              Yes               Mailed check           53.85       108.15   \n",
+              "3               No  Bank transfer (automatic)           42.30      1840.75   \n",
+              "4              Yes           Electronic check           70.70       151.65   \n",
+              "\n",
+              "  Churn  \n",
+              "0    No  \n",
+              "1    No  \n",
+              "2   Yes  \n",
+              "3    No  \n",
+              "4   Yes  "
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-810f7e37-1f04-4ec4-b86d-f9c91950bd13\" class=\"colab-df-container\">\n",
+              "    <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>customerID</th>\n",
+              "      <th>gender</th>\n",
+              "      <th>SeniorCitizen</th>\n",
+              "      <th>Partner</th>\n",
+              "      <th>Dependents</th>\n",
+              "      <th>tenure</th>\n",
+              "      <th>PhoneService</th>\n",
+              "      <th>MultipleLines</th>\n",
+              "      <th>InternetService</th>\n",
+              "      <th>OnlineSecurity</th>\n",
+              "      <th>OnlineBackup</th>\n",
+              "      <th>DeviceProtection</th>\n",
+              "      <th>TechSupport</th>\n",
+              "      <th>StreamingTV</th>\n",
+              "      <th>StreamingMovies</th>\n",
+              "      <th>Contract</th>\n",
+              "      <th>PaperlessBilling</th>\n",
+              "      <th>PaymentMethod</th>\n",
+              "      <th>MonthlyCharges</th>\n",
+              "      <th>TotalCharges</th>\n",
+              "      <th>Churn</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>7590-VHVEG</td>\n",
+              "      <td>Female</td>\n",
+              "      <td>0</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>1</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No phone service</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>Electronic check</td>\n",
+              "      <td>29.85</td>\n",
+              "      <td>29.85</td>\n",
+              "      <td>No</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>5575-GNVDE</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>34</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>One year</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Mailed check</td>\n",
+              "      <td>56.95</td>\n",
+              "      <td>1889.5</td>\n",
+              "      <td>No</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>3668-QPYBK</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>Mailed check</td>\n",
+              "      <td>53.85</td>\n",
+              "      <td>108.15</td>\n",
+              "      <td>Yes</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>7795-CFOCW</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>45</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No phone service</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>One year</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Bank transfer (automatic)</td>\n",
+              "      <td>42.30</td>\n",
+              "      <td>1840.75</td>\n",
+              "      <td>No</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>9237-HQITU</td>\n",
+              "      <td>Female</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Fiber optic</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>Electronic check</td>\n",
+              "      <td>70.70</td>\n",
+              "      <td>151.65</td>\n",
+              "      <td>Yes</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "  </div>\n"
+            ],
+            "application/vnd.google.colaboratory.intrinsic+json": {
+              "type": "dataframe"
+            }
+          },
+          "metadata": {}
+        }
+      ],
+      "source": [
+        "# ── Load the real-world Telco Churn dataset ────────────────────────────────────\n",
+        "# If you renamed your file differently, change the filename below\n",
+        "DATASET_FILENAME = 'WA_Fn-UseC_-Telco-Customer-Churn.csv'\n",
+        "\n",
+        "try:\n",
+        "    df_real = pd.read_csv(DATASET_FILENAME)\n",
+        "    print('✅ Dataset loaded successfully!')\n",
+        "    print(f'   Shape: {df_real.shape[0]} rows × {df_real.shape[1]} columns')\n",
+        "    print('\\n📊 First 5 rows:')\n",
+        "    display(df_real.head())\n",
+        "except FileNotFoundError:\n",
+        "    print('❌ ERROR: File not found!')\n",
+        "    print('   Please upload the CSV file to Colab (see instructions above).')\n",
+        "    print('   File expected:', DATASET_FILENAME)\n",
+        "    raise  # stop here: every later cell needs df_real"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "0ZxflowUulFr",
+        "outputId": "d7356987-2ce1-4eff-cf0e-8263f2c1942d"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "📋 Column names:\n",
+            "['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']\n",
+            "\n",
+            "📋 Data types:\n",
+            "customerID           object\n",
+            "gender               object\n",
+            "SeniorCitizen         int64\n",
+            "Partner              object\n",
+            "Dependents           object\n",
+            "tenure                int64\n",
+            "PhoneService         object\n",
+            "MultipleLines        object\n",
+            "InternetService      object\n",
+            "OnlineSecurity       object\n",
+            "OnlineBackup         object\n",
+            "DeviceProtection     object\n",
+            "TechSupport          object\n",
+            "StreamingTV          object\n",
+            "StreamingMovies      object\n",
+            "Contract             object\n",
+            "PaperlessBilling     object\n",
+            "PaymentMethod        object\n",
+            "MonthlyCharges      float64\n",
+            "TotalCharges         object\n",
+            "Churn                object\n",
+            "dtype: object\n",
+            "\n",
+            "📋 Missing values per column:\n",
+            "customerID          0\n",
+            "gender              0\n",
+            "SeniorCitizen       0\n",
+            "Partner             0\n",
+            "Dependents          0\n",
+            "tenure              0\n",
+            "PhoneService        0\n",
+            "MultipleLines       0\n",
+            "InternetService     0\n",
+            "OnlineSecurity      0\n",
+            "OnlineBackup        0\n",
+            "DeviceProtection    0\n",
+            "TechSupport         0\n",
+            "StreamingTV         0\n",
+            "StreamingMovies     0\n",
+            "Contract            0\n",
+            "PaperlessBilling    0\n",
+            "PaymentMethod       0\n",
+            "MonthlyCharges      0\n",
+            "TotalCharges        0\n",
+            "Churn               0\n",
+            "dtype: int64\n",
+            "\n",
+            "📋 Churn distribution (real data):\n",
+            "Churn\n",
+            "No     5174\n",
+            "Yes    1869\n",
+            "Name: count, dtype: int64\n",
+            "\n",
+            "📋 Churn rate (real data): 26.54 %\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Basic exploration of real dataset ─────────────────────────────────────────\n",
+        "print('📋 Column names:')\n",
+        "print(df_real.columns.tolist())\n",
+        "print('\\n📋 Data types:')\n",
+        "print(df_real.dtypes)\n",
+        "print('\\n📋 Missing values per column:')\n",
+        "print(df_real.isnull().sum())\n",
+        "print('\\n📋 Churn distribution (real data):')\n",
+        "print(df_real['Churn'].value_counts())\n",
+        "print('\\n📋 Churn rate (real data):', round(df_real['Churn'].value_counts(normalize=True)['Yes'] * 100, 2), '%')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Qtj_34hvulFr"
+      },
+      "source": [
+        "---\n",
+        "## 🧹 SECTION 3: Real-World Data Cleaning\n",
+        "\n",
+        "### [REAL-WORLD DATA PROCESSING — continued]\n",
+        "\n",
+        "This section handles:\n",
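+        "- Converting `TotalCharges` to numeric (it loads as text because some rows contain blank strings)\n",
+        "- Filling the resulting missing `TotalCharges` values with the column median\n",
+        "- Encoding the target variable `Churn` as 0/1 in a new `Churn_binary` column\n",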
+        "- Selecting the columns we need"
+      ]
+    },
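+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "The cleaning steps listed above can be sketched on a toy frame. This is a minimal illustration, not the notebook's own cell: `df_demo` and its three rows are made up for the example, and the blank `TotalCharges` entry stands in for the blank strings in the raw CSV. Assigning the result of `fillna` back to the column avoids the chained-assignment `FutureWarning` that pandas 2.x emits for `df[col].fillna(value, inplace=True)`.\n",
+        "\n",
+        "```python\n",
+        "import pandas as pd\n",
+        "\n",
+        "# Hypothetical toy frame standing in for df_real (values are illustrative).\n",
+        "df_demo = pd.DataFrame({\n",
+        "    'TotalCharges': ['29.85', ' ', '108.15'],\n",
+        "    'Churn': ['No', 'No', 'Yes'],\n",
+        "})\n",
+        "\n",
+        "# Blank strings cannot be parsed, so errors='coerce' turns them into NaN.\n",
+        "df_demo['TotalCharges'] = pd.to_numeric(df_demo['TotalCharges'], errors='coerce')\n",
+        "\n",
+        "# Fill the NaN with the column median, assigning back rather than using inplace=True.\n",
+        "median_total = df_demo['TotalCharges'].median()\n",
+        "df_demo['TotalCharges'] = df_demo['TotalCharges'].fillna(median_total)\n",
+        "\n",
+        "# Encode the target as 0/1 in a separate column.\n",
+        "df_demo['Churn_binary'] = df_demo['Churn'].map({'Yes': 1, 'No': 0})\n",
+        "```\n",
+        "\n",
+        "The same three operations apply to `df_real`; only the toy data is specific to this sketch."
+      ]
+    },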
+    {
+      "cell_type": "code",
+      "execution_count": 8,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 729
+        },
+        "id": "jpJKPzueulFs",
+        "outputId": "99e08519-c0e8-495c-eb84-09af613c0a4c"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ TotalCharges NaN values filled with median: 1397.47\n",
+            "✅ Churn encoded: Yes=1, No=0\n",
+            "\n",
+            "✅ SeniorCitizen unique values: [0 1]\n",
+            "\n",
+            "✅ Cleaned dataset shape: (7043, 14)\n",
+            "   Missing values after cleaning:\n",
+            "customerID         0\n",
+            "gender             0\n",
+            "SeniorCitizen      0\n",
+            "Partner            0\n",
+            "Dependents         0\n",
+            "tenure             0\n",
+            "Contract           0\n",
+            "PaymentMethod      0\n",
+            "MonthlyCharges     0\n",
+            "TotalCharges       0\n",
+            "InternetService    0\n",
+            "TechSupport        0\n",
+            "Churn              0\n",
+            "Churn_binary       0\n",
+            "dtype: int64\n"
+          ]
+        },
+        {
+          "output_type": "stream",
+          "name": "stderr",
+          "text": [
+            "/tmp/ipykernel_2585/1930681133.py:6: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
+            "The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
+            "\n",
+            "For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
+            "\n",
+            "\n",
+            "  df_real['TotalCharges'].fillna(median_total, inplace=True)\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "   customerID  gender  SeniorCitizen Partner Dependents  tenure  \\\n",
+              "0  7590-VHVEG  Female              0     Yes         No       1   \n",
+              "1  5575-GNVDE    Male              0      No         No      34   \n",
+              "2  3668-QPYBK    Male              0      No         No       2   \n",
+              "3  7795-CFOCW    Male              0      No         No      45   \n",
+              "4  9237-HQITU  Female              0      No         No       2   \n",
+              "\n",
+              "         Contract              PaymentMethod  MonthlyCharges  TotalCharges  \\\n",
+              "0  Month-to-month           Electronic check           29.85         29.85   \n",
+              "1        One year               Mailed check           56.95       1889.50   \n",
+              "2  Month-to-month               Mailed check           53.85        108.15   \n",
+              "3        One year  Bank transfer (automatic)           42.30       1840.75   \n",
+              "4  Month-to-month           Electronic check           70.70        151.65   \n",
+              "\n",
+              "  InternetService TechSupport Churn  Churn_binary  \n",
+              "0             DSL          No    No             0  \n",
+              "1             DSL          No    No             0  \n",
+              "2             DSL          No   Yes             1  \n",
+              "3             DSL         Yes    No             0  \n",
+              "4     Fiber optic          No   Yes             1  "
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-02f93de8-8ca9-4487-b357-345a6f768ecd\" class=\"colab-df-container\">\n",
+              "    <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>customerID</th>\n",
+              "      <th>gender</th>\n",
+              "      <th>SeniorCitizen</th>\n",
+              "      <th>Partner</th>\n",
+              "      <th>Dependents</th>\n",
+              "      <th>tenure</th>\n",
+              "      <th>Contract</th>\n",
+              "      <th>PaymentMethod</th>\n",
+              "      <th>MonthlyCharges</th>\n",
+              "      <th>TotalCharges</th>\n",
+              "      <th>InternetService</th>\n",
+              "      <th>TechSupport</th>\n",
+              "      <th>Churn</th>\n",
+              "      <th>Churn_binary</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>7590-VHVEG</td>\n",
+              "      <td>Female</td>\n",
+              "      <td>0</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>1</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Electronic check</td>\n",
+              "      <td>29.85</td>\n",
+              "      <td>29.85</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>0</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>5575-GNVDE</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>34</td>\n",
+              "      <td>One year</td>\n",
+              "      <td>Mailed check</td>\n",
+              "      <td>56.95</td>\n",
+              "      <td>1889.50</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>0</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>3668-QPYBK</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Mailed check</td>\n",
+              "      <td>53.85</td>\n",
+              "      <td>108.15</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>1</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>7795-CFOCW</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>45</td>\n",
+              "      <td>One year</td>\n",
+              "      <td>Bank transfer (automatic)</td>\n",
+              "      <td>42.30</td>\n",
+              "      <td>1840.75</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>0</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>9237-HQITU</td>\n",
+              "      <td>Female</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Electronic check</td>\n",
+              "      <td>70.70</td>\n",
+              "      <td>151.65</td>\n",
+              "      <td>Fiber optic</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>1</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "  </div>\n"
+            ]
+          },
+          "metadata": {}
+        }
+      ],
+      "source": [
+        "# ── Step 1: Fix TotalCharges column (arrives as string) ──────────────────────\n",
+        "df_real['TotalCharges'] = pd.to_numeric(df_real['TotalCharges'], errors='coerce')\n",
+        "\n",
+        "# ── Step 2: Fill missing TotalCharges with median ────────────────────────────\n",
+        "median_total = df_real['TotalCharges'].median()\n",
+        "df_real['TotalCharges'] = df_real['TotalCharges'].fillna(median_total)\n",
+        "print(f'✅ TotalCharges NaN values filled with median: {median_total:.2f}')\n",
+        "\n",
+        "# ── Step 3: Encode Churn as 0 / 1 ────────────────────────────────────────────\n",
+        "df_real['Churn_binary'] = df_real['Churn'].map({'Yes': 1, 'No': 0})\n",
+        "print('✅ Churn encoded: Yes=1, No=0')\n",
+        "\n",
+        "# ── Step 4: Verify SeniorCitizen is already encoded as 0/1 ───────────────────\n",
+        "print(f'\\n✅ SeniorCitizen unique values: {df_real[\"SeniorCitizen\"].unique()}')\n",
+        "\n",
+        "# ── Step 5: Select core columns for our analysis ─────────────────────────────\n",
+        "core_cols = [\n",
+        "    'customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',\n",
+        "    'tenure', 'Contract', 'PaymentMethod', 'MonthlyCharges',\n",
+        "    'TotalCharges', 'InternetService', 'TechSupport',\n",
+        "    'Churn', 'Churn_binary'\n",
+        "]\n",
+        "df_clean = df_real[core_cols].copy()\n",
+        "\n",
+        "print(f'\\n✅ Cleaned dataset shape: {df_clean.shape}')\n",
+        "print('   Missing values after cleaning:')\n",
+        "print(df_clean.isnull().sum())\n",
+        "display(df_clean.head())"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 9,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 317
+        },
+        "id": "ySxvvy5hulFs",
+        "outputId": "5c552cda-2f7f-4088-e11e-0d07e0886538"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "📊 Descriptive statistics (numeric columns):\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "       SeniorCitizen       tenure  MonthlyCharges  TotalCharges  Churn_binary\n",
+              "count    7043.000000  7043.000000     7043.000000   7043.000000   7043.000000\n",
+              "mean        0.162147    32.371149       64.761692   2281.916928      0.265370\n",
+              "std         0.368612    24.559481       30.090047   2265.270398      0.441561\n",
+              "min         0.000000     0.000000       18.250000     18.800000      0.000000\n",
+              "25%         0.000000     9.000000       35.500000    402.225000      0.000000\n",
+              "50%         0.000000    29.000000       70.350000   1397.475000      0.000000\n",
+              "75%         0.000000    55.000000       89.850000   3786.600000      1.000000\n",
+              "max         1.000000    72.000000      118.750000   8684.800000      1.000000"
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-8890e666-5ae4-4934-bf59-dbf2a8fcabe5\" class=\"colab-df-container\">\n",
+              "    <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>SeniorCitizen</th>\n",
+              "      <th>tenure</th>\n",
+              "      <th>MonthlyCharges</th>\n",
+              "      <th>TotalCharges</th>\n",
+              "      <th>Churn_binary</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>count</th>\n",
+              "      <td>7043.000000</td>\n",
+              "      <td>7043.000000</td>\n",
+              "      <td>7043.000000</td>\n",
+              "      <td>7043.000000</td>\n",
+              "      <td>7043.000000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>mean</th>\n",
+              "      <td>0.162147</td>\n",
+              "      <td>32.371149</td>\n",
+              "      <td>64.761692</td>\n",
+              "      <td>2281.916928</td>\n",
+              "      <td>0.265370</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>std</th>\n",
+              "      <td>0.368612</td>\n",
+              "      <td>24.559481</td>\n",
+              "      <td>30.090047</td>\n",
+              "      <td>2265.270398</td>\n",
+              "      <td>0.441561</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>min</th>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>18.250000</td>\n",
+              "      <td>18.800000</td>\n",
+              "      <td>0.000000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>25%</th>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>9.000000</td>\n",
+              "      <td>35.500000</td>\n",
+              "      <td>402.225000</td>\n",
+              "      <td>0.000000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>50%</th>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>29.000000</td>\n",
+              "      <td>70.350000</td>\n",
+              "      <td>1397.475000</td>\n",
+              "      <td>0.000000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>75%</th>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>55.000000</td>\n",
+              "      <td>89.850000</td>\n",
+              "      <td>3786.600000</td>\n",
+              "      <td>1.000000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>max</th>\n",
+              "      <td>1.000000</td>\n",
+              "      <td>72.000000</td>\n",
+              "      <td>118.750000</td>\n",
+              "      <td>8684.800000</td>\n",
+              "      <td>1.000000</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "  </div>\n"
+            ]
+          },
+          "metadata": {}
+        }
+      ],
+      "source": [
+        "# ── Descriptive statistics on real data ──────────────────────────────────────\n",
+        "print('📊 Descriptive statistics (numeric columns):')\n",
+        "display(df_clean.describe())"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "RpDjH_T6ulFs"
+      },
+      "source": [
+        "---\n",
+        "## 🤖 SECTION 4: Synthetic Support Interaction Data Generation\n",
+        "\n",
+        "### [SYNTHETIC DATASET GENERATION]\n",
+        "\n",
+        "**Why synthetic data?**  \n",
+        "The Telco Customer Churn dataset does not include support call logs. We therefore simulate support interaction variables that are **correlated with churn by construction**, approximating the patterns a real company's CRM data might show.\n",
+        "\n",
+        "**Variables we create:**\n",
+        "| Variable | Description |\n",
+        "|---|---|\n",
+        "| `support_calls` | Number of support calls made in the last 6 months |\n",
+        "| `avg_call_duration` | Average call duration in minutes |\n",
+        "| `complaint_type` | Type of most frequent complaint |\n",
+        "| `days_since_last_contact` | Days since the customer last contacted support |\n",
+        "| `last_contact_sentiment` | Text of the last customer feedback |\n",
+        "| `sentiment_score` | VADER compound sentiment score |\n",
+        "| `support_churn_risk` | Composite risk score (Low / Medium / High) |"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 10,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "NK03tdXdulFs",
+        "outputId": "6a003b55-cad7-4b8e-9ddb-426a7d5f9135"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ Feedback templates and complaint types defined.\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Helper: feedback phrase templates grouped by sentiment ───────────────────\n",
+        "POSITIVE_FEEDBACK = [\n",
+        "    \"The support agent was very helpful and solved my issue quickly.\",\n",
+        "    \"Great service, no complaints at all!\",\n",
+        "    \"Fast resolution. I am happy with the service.\",\n",
+        "    \"The team was professional and friendly. Very satisfied.\",\n",
+        "    \"Everything was resolved in one call. Excellent experience.\",\n",
+        "    \"I love this company, always responsive and caring.\",\n",
+        "    \"No issues, service works perfectly. Very happy customer.\"\n",
+        "]\n",
+        "\n",
+        "NEUTRAL_FEEDBACK = [\n",
+        "    \"The wait time was long but the issue was eventually resolved.\",\n",
+        "    \"Average experience. Could be better.\",\n",
+        "    \"Service is okay. Nothing special.\",\n",
+        "    \"The agent was polite but the problem took two calls to fix.\",\n",
+        "    \"Acceptable support, but I expected faster resolution.\"\n",
+        "]\n",
+        "\n",
+        "NEGATIVE_FEEDBACK = [\n",
+        "    \"I have called five times and the problem is still not fixed!\",\n",
+        "    \"Terrible service. I am thinking of switching providers.\",\n",
+        "    \"The agents are unhelpful and the wait times are ridiculous.\",\n",
+        "    \"I am very frustrated. Nobody seems to care about my problem.\",\n",
+        "    \"Worst customer service I have ever experienced. Cancelling soon.\",\n",
+        "    \"My bill is wrong again. This is the third time this month!\",\n",
+        "    \"I am extremely disappointed. No follow-up, no resolution.\"\n",
+        "]\n",
+        "\n",
+        "COMPLAINT_TYPES = [\n",
+        "    'Billing Issue', 'Service Outage', 'Speed/Performance',\n",
+        "    'Contract Dispute', 'Technical Failure', 'Overcharge'\n",
+        "]\n",
+        "\n",
+        "print('✅ Feedback templates and complaint types defined.')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "EIzwa0U3ulFs",
+        "outputId": "0aa0c944-1e0d-4df3-cc48-15acc7a590ae"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ Synthetic support variables generated!\n",
+            "   support_calls range    : 1 – 15\n",
+            "   avg_call_duration range: 3.0 – 35.0 min\n",
+            "   Sample complaint types : {'Contract Dispute', 'Billing Issue', 'Technical Failure', 'Service Outage'}\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Generate synthetic support variables ──────────────────────────────────────\n",
+        "n = len(df_clean)\n",
+        "churn_flag = df_clean['Churn_binary'].values\n",
+        "\n",
+        "# support_calls: churners call more (5-15 calls), non-churners call less (1-6)\n",
+        "support_calls = np.where(\n",
+        "    churn_flag == 1,\n",
+        "    np.random.randint(5, 16, n),\n",
+        "    np.random.randint(1, 7,  n)\n",
+        ")\n",
+        "\n",
+        "# avg_call_duration: churners have longer calls (frustration)\n",
+        "avg_call_duration = np.where(\n",
+        "    churn_flag == 1,\n",
+        "    np.round(np.random.uniform(12, 35, n), 1),\n",
+        "    np.round(np.random.uniform(3,  15, n), 1)\n",
+        ")\n",
+        "\n",
+        "# complaint_type: random, but churners have heavier billing/contract issues\n",
+        "def pick_complaint(is_churner):\n",
+        "    if is_churner:\n",
+        "        # weight toward billing and contract disputes\n",
+        "        weights = [0.30, 0.15, 0.15, 0.20, 0.10, 0.10]\n",
+        "    else:\n",
+        "        weights = [0.20, 0.20, 0.20, 0.10, 0.20, 0.10]\n",
+        "    return random.choices(COMPLAINT_TYPES, weights=weights, k=1)[0]\n",
+        "\n",
+        "complaint_type = [pick_complaint(c) for c in churn_flag]\n",
+        "\n",
+        "# days_since_last_contact: churners contacted recently (about to leave)\n",
+        "days_since_last_contact = np.where(\n",
+        "    churn_flag == 1,\n",
+        "    np.random.randint(1,  30, n),\n",
+        "    np.random.randint(15, 90, n)\n",
+        ")\n",
+        "\n",
+        "# last_contact_sentiment: text phrase matching churn likelihood\n",
+        "def pick_sentiment_text(is_churner):\n",
+        "    if is_churner:\n",
+        "        # ~74% negative, ~15% neutral, ~11% positive (pool sizes 49/10/7)\n",
+        "        pool = (NEGATIVE_FEEDBACK * 7) + (NEUTRAL_FEEDBACK * 2) + (POSITIVE_FEEDBACK * 1)\n",
+        "    else:\n",
+        "        # ~11% negative, ~15% neutral, ~74% positive (pool sizes 7/10/49)\n",
+        "        pool = (POSITIVE_FEEDBACK * 7) + (NEUTRAL_FEEDBACK * 2) + (NEGATIVE_FEEDBACK * 1)\n",
+        "    return random.choice(pool)\n",
+        "\n",
+        "last_contact_sentiment = [pick_sentiment_text(c) for c in churn_flag]\n",
+        "\n",
+        "print('✅ Synthetic support variables generated!')\n",
+        "print(f'   support_calls range    : {support_calls.min()} – {support_calls.max()}')\n",
+        "print(f'   avg_call_duration range: {avg_call_duration.min()} – {avg_call_duration.max()} min')\n",
+        "print(f'   Sample complaint types : {set(complaint_type[:10])}')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 12,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "Yak2n7gpulFt",
+        "outputId": "ff972046-295c-44c0-c4bb-e0d6c59cb165"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ VADER sentiment scores computed!\n",
+            "   Score range: -0.710 to 0.872\n",
+            "   Mean score : 0.307\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Compute VADER sentiment score ─────────────────────────────────────────────\n",
+        "analyzer = SentimentIntensityAnalyzer()\n",
+        "\n",
+        "def get_compound_score(text):\n",
+        "    \"\"\"Return VADER compound score: -1 (most negative) to +1 (most positive)\"\"\"\n",
+        "    return analyzer.polarity_scores(text)['compound']\n",
+        "\n",
+        "sentiment_score = [get_compound_score(text) for text in last_contact_sentiment]\n",
+        "\n",
+        "print('✅ VADER sentiment scores computed!')\n",
+        "print(f'   Score range: {min(sentiment_score):.3f} to {max(sentiment_score):.3f}')\n",
+        "print(f'   Mean score : {np.mean(sentiment_score):.3f}')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 13,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "KJ3wGj9mulFt",
+        "outputId": "6b8e8fd1-6eaf-4f77-8e37-6218bea1e414"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ support_churn_risk categories created!\n",
+            "   Distribution: Counter({'Medium': 4166, 'Low': 1471, 'High': 1406})\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Compute composite support_churn_risk ──────────────────────────────────────\n",
+        "# Logic (first matching rule wins):\n",
+        "#   HIGH   = many calls (>= 6) AND negative sentiment (score <= -0.3)\n",
+        "#   MEDIUM = not HIGH, and either calls >= 3 OR non-positive sentiment (score <= 0)\n",
+        "#   LOW    = fewer than 3 calls AND positive sentiment (score > 0)\n",
+        "\n",
+        "def compute_risk(calls, score):\n",
+        "    if calls >= 6 and score <= -0.3:\n",
+        "        return 'High'\n",
+        "    elif calls >= 3 or score <= 0.0:\n",
+        "        return 'Medium'\n",
+        "    else:\n",
+        "        return 'Low'\n",
+        "\n",
+        "support_churn_risk = [\n",
+        "    compute_risk(c, s) for c, s in zip(support_calls, sentiment_score)\n",
+        "]\n",
+        "\n",
+        "from collections import Counter\n",
+        "\n",
+        "print('✅ support_churn_risk categories created!')\n",
+        "print('   Distribution:', Counter(support_churn_risk))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Synthetic Data Design & Assumptions\n",
+        "\n",
+        "### Why Synthetic Data Was Created\n",
+        "\n",
+        "The original Telco dataset does not include detailed information about customer support interactions or behavioral signals such as sentiment. However, in real-world business settings, these factors play a critical role in customer churn.\n",
+        "\n",
+        "To better approximate real-world conditions, we generated synthetic variables representing:\n",
+        "- number of support calls\n",
+        "- average call duration\n",
+        "- complaint type\n",
+        "- time since last interaction\n",
+        "- sentiment score\n",
+        "- support-based churn risk\n",
+        "\n",
+        "These variables allow us to enrich the dataset and create a more realistic narrative around customer experience and dissatisfaction.\n",
+        "\n",
+        "---\n",
+        "\n",
+        "### Key Assumptions\n",
+        "\n",
+        "The synthetic data generation is based on the following assumptions:\n",
+        "\n",
+        "- Customers who churn tend to have **more frequent support interactions**\n",
+        "- Negative experiences lead to **lower sentiment scores**\n",
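+        "- Certain complaint types (e.g., billing issues and contract disputes) are more common among customers who churn\n",
+        "- Customers who churn tend to have contacted support **more recently** (1–29 days ago, versus 15–89 days for retained customers)\n",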
+        "- Customers with unresolved issues are more likely to churn\n",
+        "\n",
+        "These assumptions are grounded in typical telecom business logic but are not directly observed in the original dataset.\n",
+        "\n",
+        "---\n",
+        "\n",
+        "### Limitations of Synthetic Data\n",
+        "\n",
+        "Because these variables are artificially generated:\n",
+        "- They may introduce **bias toward expected relationships**\n",
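+        "- They are **generated conditional on the churn label**, so they leak the target instead of being independent of it\n",
+        "- They may **overstate model performance** when used as predictors\n",
+        "\n",
+        "Therefore, results should be interpreted as illustrative rather than generalizable to real customers."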
+      ],
+      "metadata": {
+        "id": "zvVoEV0Fl77_"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "7TZ0lxJQulFt"
+      },
+      "source": [
+        "---\n",
+        "## 🔗 SECTION 5: Merge Real + Synthetic Data\n",
+        "\n",
+        "We now combine the cleaned real-world dataset with our synthetic support variables into one rich dataset."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 14,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 330
+        },
+        "id": "7tMGLUqvulFt",
+        "outputId": "d5013de1-0912-46b4-886e-e077adc64ac3"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ Final merged dataset: 7043 rows × 21 columns\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "   customerID  gender  SeniorCitizen Partner Dependents  tenure  \\\n",
+              "0  7590-VHVEG  Female              0     Yes         No       1   \n",
+              "1  5575-GNVDE    Male              0      No         No      34   \n",
+              "2  3668-QPYBK    Male              0      No         No       2   \n",
+              "3  7795-CFOCW    Male              0      No         No      45   \n",
+              "4  9237-HQITU  Female              0      No         No       2   \n",
+              "\n",
+              "         Contract              PaymentMethod  MonthlyCharges  TotalCharges  \\\n",
+              "0  Month-to-month           Electronic check           29.85         29.85   \n",
+              "1        One year               Mailed check           56.95       1889.50   \n",
+              "2  Month-to-month               Mailed check           53.85        108.15   \n",
+              "3        One year  Bank transfer (automatic)           42.30       1840.75   \n",
+              "4  Month-to-month           Electronic check           70.70        151.65   \n",
+              "\n",
+              "  InternetService TechSupport Churn  Churn_binary  support_calls  \\\n",
+              "0             DSL          No    No             0              2   \n",
+              "1             DSL          No    No             0              4   \n",
+              "2             DSL          No   Yes             1             15   \n",
+              "3             DSL         Yes    No             0              3   \n",
+              "4     Fiber optic          No   Yes             1              9   \n",
+              "\n",
+              "   avg_call_duration    complaint_type  days_since_last_contact  \\\n",
+              "0                9.7  Contract Dispute                       16   \n",
+              "1                3.8     Billing Issue                       57   \n",
+              "2               21.2     Billing Issue                       12   \n",
+              "3                9.8    Service Outage                       40   \n",
+              "4               12.3  Contract Dispute                       24   \n",
+              "\n",
+              "                              last_contact_sentiment  sentiment_score  \\\n",
+              "0  I love this company, always responsive and car...           0.8720   \n",
+              "1               Great service, no complaints at all!           0.7684   \n",
+              "2  My bill is wrong again. This is the third time...          -0.5255   \n",
+              "3  Everything was resolved in one call. Excellent...           0.6597   \n",
+              "4  My bill is wrong again. This is the third time...          -0.5255   \n",
+              "\n",
+              "  support_churn_risk  \n",
+              "0                Low  \n",
+              "1             Medium  \n",
+              "2               High  \n",
+              "3             Medium  \n",
+              "4               High  "
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-72891642-6b1c-4c46-9a3a-adb94778721a\" class=\"colab-df-container\">\n",
+              "    <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>customerID</th>\n",
+              "      <th>gender</th>\n",
+              "      <th>SeniorCitizen</th>\n",
+              "      <th>Partner</th>\n",
+              "      <th>Dependents</th>\n",
+              "      <th>tenure</th>\n",
+              "      <th>Contract</th>\n",
+              "      <th>PaymentMethod</th>\n",
+              "      <th>MonthlyCharges</th>\n",
+              "      <th>TotalCharges</th>\n",
+              "      <th>InternetService</th>\n",
+              "      <th>TechSupport</th>\n",
+              "      <th>Churn</th>\n",
+              "      <th>Churn_binary</th>\n",
+              "      <th>support_calls</th>\n",
+              "      <th>avg_call_duration</th>\n",
+              "      <th>complaint_type</th>\n",
+              "      <th>days_since_last_contact</th>\n",
+              "      <th>last_contact_sentiment</th>\n",
+              "      <th>sentiment_score</th>\n",
+              "      <th>support_churn_risk</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>7590-VHVEG</td>\n",
+              "      <td>Female</td>\n",
+              "      <td>0</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>1</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Electronic check</td>\n",
+              "      <td>29.85</td>\n",
+              "      <td>29.85</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>0</td>\n",
+              "      <td>2</td>\n",
+              "      <td>9.7</td>\n",
+              "      <td>Contract Dispute</td>\n",
+              "      <td>16</td>\n",
+              "      <td>I love this company, always responsive and car...</td>\n",
+              "      <td>0.8720</td>\n",
+              "      <td>Low</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>5575-GNVDE</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>34</td>\n",
+              "      <td>One year</td>\n",
+              "      <td>Mailed check</td>\n",
+              "      <td>56.95</td>\n",
+              "      <td>1889.50</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>0</td>\n",
+              "      <td>4</td>\n",
+              "      <td>3.8</td>\n",
+              "      <td>Billing Issue</td>\n",
+              "      <td>57</td>\n",
+              "      <td>Great service, no complaints at all!</td>\n",
+              "      <td>0.7684</td>\n",
+              "      <td>Medium</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>3668-QPYBK</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Mailed check</td>\n",
+              "      <td>53.85</td>\n",
+              "      <td>108.15</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>1</td>\n",
+              "      <td>15</td>\n",
+              "      <td>21.2</td>\n",
+              "      <td>Billing Issue</td>\n",
+              "      <td>12</td>\n",
+              "      <td>My bill is wrong again. This is the third time...</td>\n",
+              "      <td>-0.5255</td>\n",
+              "      <td>High</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>7795-CFOCW</td>\n",
+              "      <td>Male</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>45</td>\n",
+              "      <td>One year</td>\n",
+              "      <td>Bank transfer (automatic)</td>\n",
+              "      <td>42.30</td>\n",
+              "      <td>1840.75</td>\n",
+              "      <td>DSL</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>No</td>\n",
+              "      <td>0</td>\n",
+              "      <td>3</td>\n",
+              "      <td>9.8</td>\n",
+              "      <td>Service Outage</td>\n",
+              "      <td>40</td>\n",
+              "      <td>Everything was resolved in one call. Excellent...</td>\n",
+              "      <td>0.6597</td>\n",
+              "      <td>Medium</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>9237-HQITU</td>\n",
+              "      <td>Female</td>\n",
+              "      <td>0</td>\n",
+              "      <td>No</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Month-to-month</td>\n",
+              "      <td>Electronic check</td>\n",
+              "      <td>70.70</td>\n",
+              "      <td>151.65</td>\n",
+              "      <td>Fiber optic</td>\n",
+              "      <td>No</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>1</td>\n",
+              "      <td>9</td>\n",
+              "      <td>12.3</td>\n",
+              "      <td>Contract Dispute</td>\n",
+              "      <td>24</td>\n",
+              "      <td>My bill is wrong again. This is the third time...</td>\n",
+              "      <td>-0.5255</td>\n",
+              "      <td>High</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "  </div>\n"
+            ],
+            "application/vnd.google.colaboratory.intrinsic+json": {
+              "type": "dataframe"
+            }
+          },
+          "metadata": {}
+        }
+      ],
+      "source": [
+        "# ── Build the synthetic support DataFrame ────────────────────────────────────\n",
+        "df_support = pd.DataFrame({\n",
+        "    'support_calls'          : support_calls,\n",
+        "    'avg_call_duration'      : avg_call_duration,\n",
+        "    'complaint_type'         : complaint_type,\n",
+        "    'days_since_last_contact': days_since_last_contact,\n",
+        "    'last_contact_sentiment' : last_contact_sentiment,\n",
+        "    'sentiment_score'        : sentiment_score,\n",
+        "    'support_churn_risk'     : support_churn_risk\n",
+        "})\n",
+        "\n",
+        "# ── Merge with real-world data (index-aligned) ────────────────────────────────\n",
+        "df_final = pd.concat([df_clean.reset_index(drop=True), df_support], axis=1)\n",
+        "\n",
+        "print(f'✅ Final merged dataset: {df_final.shape[0]} rows × {df_final.shape[1]} columns')\n",
+        "display(df_final.head())"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 15,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "UAW-XT66ulFu",
+        "outputId": "32243187-b541-40b2-8069-894cbe430f8b"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "📊 Mean support_calls by Churn:\n",
+            "Churn\n",
+            "No      3.47\n",
+            "Yes    10.02\n",
+            "Name: support_calls, dtype: float64\n",
+            "\n",
+            "📊 Mean sentiment_score by Churn:\n",
+            "Churn\n",
+            "No     0.519\n",
+            "Yes   -0.279\n",
+            "Name: sentiment_score, dtype: float64\n",
+            "\n",
+            "📊 support_churn_risk vs Churn crosstab:\n",
+            "Churn                  No    Yes\n",
+            "support_churn_risk              \n",
+            "High                0.080  0.920\n",
+            "Low                 1.000  0.000\n",
+            "Medium              0.862  0.138\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Verification: check correlations make sense ──────────────────────────────\n",
+        "print('📊 Mean support_calls by Churn:')\n",
+        "print(df_final.groupby('Churn')['support_calls'].mean().round(2))\n",
+        "\n",
+        "print('\\n📊 Mean sentiment_score by Churn:')\n",
+        "print(df_final.groupby('Churn')['sentiment_score'].mean().round(3))\n",
+        "\n",
+        "print('\\n📊 support_churn_risk vs Churn crosstab:')\n",
+        "print(pd.crosstab(df_final['support_churn_risk'], df_final['Churn'], normalize='index').round(3))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "27am_r6QulFu"
+      },
+      "source": [
+        "---\n",
+        "## 💾 SECTION 6: Export Final Dataset"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 16,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "EfM7zP4XulFu",
+        "outputId": "5f76e254-0dc9-49ea-c6ff-3c202b00c78f"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ Dataset exported: customer_churn_support_dataset.csv\n",
+            "   Rows    : 7043\n",
+            "   Columns : 21\n",
+            "\n",
+            "📋 Final column list:\n",
+            "   • customerID\n",
+            "   • gender\n",
+            "   • SeniorCitizen\n",
+            "   • Partner\n",
+            "   • Dependents\n",
+            "   • tenure\n",
+            "   • Contract\n",
+            "   • PaymentMethod\n",
+            "   • MonthlyCharges\n",
+            "   • TotalCharges\n",
+            "   • InternetService\n",
+            "   • TechSupport\n",
+            "   • Churn\n",
+            "   • Churn_binary\n",
+            "   • support_calls\n",
+            "   • avg_call_duration\n",
+            "   • complaint_type\n",
+            "   • days_since_last_contact\n",
+            "   • last_contact_sentiment\n",
+            "   • sentiment_score\n",
+            "   • support_churn_risk\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Export to CSV ─────────────────────────────────────────────────────────────\n",
+        "OUTPUT_FILENAME = 'customer_churn_support_dataset.csv'\n",
+        "df_final.to_csv(OUTPUT_FILENAME, index=False)\n",
+        "\n",
+        "print(f'✅ Dataset exported: {OUTPUT_FILENAME}')\n",
+        "print(f'   Rows    : {df_final.shape[0]}')\n",
+        "print(f'   Columns : {df_final.shape[1]}')\n",
+        "print(f'\\n📋 Final column list:')\n",
+        "for col in df_final.columns:\n",
+        "    print(f'   • {col}')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 17,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 34
+        },
+        "id": "HFWnx9zlulFu",
+        "outputId": "a9084d14-05d9-4cb2-98c6-a25491f9b6b1"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "✅ Download triggered! Check your Downloads folder.\n"
+          ]
+        }
+      ],
+      "source": [
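+        "# ── Download the file to your computer (Colab only) ──────────────────────────\n",
+        "try:\n",
+        "    from google.colab import files\n",
+        "    files.download(OUTPUT_FILENAME)\n",
+        "    print('✅ Download triggered! Check your Downloads folder.')\n",
+        "except ImportError:\n",
+        "    print(f'Not running in Colab; the file is saved locally as {OUTPUT_FILENAME}.')"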
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wr8dQ0YQulFu"
+      },
+      "source": [
+        "---\n",
+        "## ✅ SECTION 7: Final Verification Checks"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 18,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "FK7d-FE2ulFu",
+        "outputId": "4e8cb3b5-08f8-4e0a-efd6-97b738be9181"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "============================================================\n",
+            "         FINAL DATASET VERIFICATION REPORT\n",
+            "============================================================\n",
+            "\n",
+            "✅ Shape            : (7043, 21)\n",
+            "✅ Missing values   : 0 total\n",
+            "✅ Churn rate       : 26.5%\n",
+            "✅ Risk categories  : {'Medium': 4166, 'Low': 1471, 'High': 1406}\n",
+            "✅ Complaint types  : 6 unique\n",
+            "✅ Sentiment range  : -0.710 to 0.872\n",
+            "\n",
+            "============================================================\n",
+            "   ✅ Notebook 1 COMPLETE! Proceed to Notebook 2.\n",
+            "============================================================\n"
+          ]
+        }
+      ],
+      "source": [
+        "# ── Final verification ────────────────────────────────────────────────────────\n",
+        "print('=' * 60)\n",
+        "print('         FINAL DATASET VERIFICATION REPORT')\n",
+        "print('=' * 60)\n",
+        "\n",
+        "df_verify = pd.read_csv(OUTPUT_FILENAME)\n",
+        "\n",
+        "print(f'\\n✅ Shape            : {df_verify.shape}')\n",
+        "print(f'✅ Missing values   : {df_verify.isnull().sum().sum()} total')\n",
+        "print(f'✅ Churn rate       : {df_verify[\"Churn_binary\"].mean()*100:.1f}%')\n",
+        "print(f'✅ Risk categories  : {df_verify[\"support_churn_risk\"].value_counts().to_dict()}')\n",
+        "print(f'✅ Complaint types  : {df_verify[\"complaint_type\"].nunique()} unique')\n",
+        "print(f'✅ Sentiment range  : {df_verify[\"sentiment_score\"].min():.3f} to {df_verify[\"sentiment_score\"].max():.3f}')\n",
+        "\n",
+        "print('\\n' + '=' * 60)\n",
+        "print('   ✅ Notebook 1 COMPLETE! Proceed to Notebook 2.')\n",
+        "print('=' * 60)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "---\n",
+        "## 📋 SECTION 8: Final Dataset Summary"
+      ],
+      "metadata": {
+        "id": "7KUQ8dwMa9pK"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "print('=' * 60)\n",
+        "print('        NOTEBOOK 1 — FINAL DATASET SUMMARY')\n",
+        "print('=' * 60)\n",
+        "\n",
+        "print(f'\\n📐 Shape: {df_final.shape[0]} rows × {df_final.shape[1]} columns')\n",
+        "\n",
+        "print('\\n📋 Column overview:')\n",
+        "for col in df_final.columns:\n",
+        "    dtype = str(df_final[col].dtype)\n",
+        "    sample = str(df_final[col].iloc[0])[:40]\n",
+        "    print(f'   {col:<30} [{dtype:<10}]  e.g. {sample}')\n",
+        "\n",
+        "print('\\n📊 Real vs Synthetic columns:')\n",
+        "real_cols = ['customerID','gender','SeniorCitizen','Partner','Dependents',\n",
+        "             'tenure','Contract','PaymentMethod','MonthlyCharges',\n",
+        "             'TotalCharges','InternetService','TechSupport','Churn','Churn_binary']\n",
+        "synthetic_cols = ['support_calls','avg_call_duration','complaint_type',\n",
+        "                  'days_since_last_contact','last_contact_sentiment',\n",
+        "                  'sentiment_score','support_churn_risk']\n",
+        "\n",
+        "print(f'   ✅ Real-world columns  ({len(real_cols)}): {real_cols}')\n",
+        "print(f'   🤖 Synthetic columns   ({len(synthetic_cols)}): {synthetic_cols}')\n",
+        "\n",
+        "print('\\n📊 Sample rows:')\n",
+        "display(df_final[['customerID','tenure','Churn','support_calls',\n",
+        "                   'sentiment_score','support_churn_risk']].head(5))\n",
+        "\n",
+        "\n",
+        "print('=' * 60)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 833
+        },
+        "id": "XpmZTW0RahxT",
+        "outputId": "10a7c450-6a35-447c-ca9f-7db0aab8d5ce"
+      },
+      "execution_count": 21,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "============================================================\n",
+            "        NOTEBOOK 1 — FINAL DATASET SUMMARY\n",
+            "============================================================\n",
+            "\n",
+            "📐 Shape: 7043 rows × 21 columns\n",
+            "\n",
+            "📋 Column overview:\n",
+            "   customerID                     [object    ]  e.g. 7590-VHVEG\n",
+            "   gender                         [object    ]  e.g. Female\n",
+            "   SeniorCitizen                  [int64     ]  e.g. 0\n",
+            "   Partner                        [object    ]  e.g. Yes\n",
+            "   Dependents                     [object    ]  e.g. No\n",
+            "   tenure                         [int64     ]  e.g. 1\n",
+            "   Contract                       [object    ]  e.g. Month-to-month\n",
+            "   PaymentMethod                  [object    ]  e.g. Electronic check\n",
+            "   MonthlyCharges                 [float64   ]  e.g. 29.85\n",
+            "   TotalCharges                   [float64   ]  e.g. 29.85\n",
+            "   InternetService                [object    ]  e.g. DSL\n",
+            "   TechSupport                    [object    ]  e.g. No\n",
+            "   Churn                          [object    ]  e.g. No\n",
+            "   Churn_binary                   [int64     ]  e.g. 0\n",
+            "   support_calls                  [int64     ]  e.g. 2\n",
+            "   avg_call_duration              [float64   ]  e.g. 9.7\n",
+            "   complaint_type                 [object    ]  e.g. Contract Dispute\n",
+            "   days_since_last_contact        [int64     ]  e.g. 16\n",
+            "   last_contact_sentiment         [object    ]  e.g. I love this company, always responsive a\n",
+            "   sentiment_score                [float64   ]  e.g. 0.872\n",
+            "   support_churn_risk             [object    ]  e.g. Low\n",
+            "\n",
+            "📊 Real vs Synthetic columns:\n",
+            "   ✅ Real-world columns  (14): ['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'Contract', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'InternetService', 'TechSupport', 'Churn', 'Churn_binary']\n",
+            "   🤖 Synthetic columns   (7): ['support_calls', 'avg_call_duration', 'complaint_type', 'days_since_last_contact', 'last_contact_sentiment', 'sentiment_score', 'support_churn_risk']\n",
+            "\n",
+            "📊 Sample rows:\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "   customerID  tenure Churn  support_calls  sentiment_score support_churn_risk\n",
+              "0  7590-VHVEG       1    No              2           0.8720                Low\n",
+              "1  5575-GNVDE      34    No              4           0.7684             Medium\n",
+              "2  3668-QPYBK       2   Yes             15          -0.5255               High\n",
+              "3  7795-CFOCW      45    No              3           0.6597             Medium\n",
+              "4  9237-HQITU       2   Yes              9          -0.5255               High"
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-2f56388d-2ac4-4fca-8c20-96c9a04bcc25\" class=\"colab-df-container\">\n",
+              "    <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>customerID</th>\n",
+              "      <th>tenure</th>\n",
+              "      <th>Churn</th>\n",
+              "      <th>support_calls</th>\n",
+              "      <th>sentiment_score</th>\n",
+              "      <th>support_churn_risk</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>7590-VHVEG</td>\n",
+              "      <td>1</td>\n",
+              "      <td>No</td>\n",
+              "      <td>2</td>\n",
+              "      <td>0.8720</td>\n",
+              "      <td>Low</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>5575-GNVDE</td>\n",
+              "      <td>34</td>\n",
+              "      <td>No</td>\n",
+              "      <td>4</td>\n",
+              "      <td>0.7684</td>\n",
+              "      <td>Medium</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>3668-QPYBK</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>15</td>\n",
+              "      <td>-0.5255</td>\n",
+              "      <td>High</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>7795-CFOCW</td>\n",
+              "      <td>45</td>\n",
+              "      <td>No</td>\n",
+              "      <td>3</td>\n",
+              "      <td>0.6597</td>\n",
+              "      <td>Medium</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>9237-HQITU</td>\n",
+              "      <td>2</td>\n",
+              "      <td>Yes</td>\n",
+              "      <td>9</td>\n",
+              "      <td>-0.5255</td>\n",
+              "      <td>High</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "  </div>\n"
+            ]
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "============================================================\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "nEAK367ZulFu"
+      },
+      "source": [
+        "---\n",
+        "## 📌 Summary — What Was Done in This Notebook\n",
+        "\n",
+        "| Phase | Action | Result |\n",
+        "|---|---|---|\n",
+        "| **Real-World Data** | Downloaded Telco Churn from Kaggle | 7,043 real customers |\n",
+        "| **Cleaning** | Fixed TotalCharges, encoded Churn | Clean 14-column DataFrame (incl. `Churn_binary`) |\n",
+        "| **Synthetic Generation** | Created 7 support variables, with sentiment scored by VADER | 7 synthetic columns |\n",
+        "| **Merge** | Combined real + synthetic | 21-column final dataset (14 real + 7 synthetic) |\n",
+        "| **Export** | Saved as CSV | `customer_churn_support_dataset.csv` |\n",
+        "\n",
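+        "The column counts in the table can be cross-checked against the column lists printed in Section 8. A minimal sketch, assuming the two lists are copied from that cell; `check_column_partition` is a hypothetical helper, not part of the original notebook:\n",
+        "\n",
+        "```python\n",
+        "# Column names copied from the Section 8 summary cell.\n",
+        "REAL_COLS = ['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',\n",
+        "             'tenure', 'Contract', 'PaymentMethod', 'MonthlyCharges',\n",
+        "             'TotalCharges', 'InternetService', 'TechSupport', 'Churn', 'Churn_binary']\n",
+        "SYNTHETIC_COLS = ['support_calls', 'avg_call_duration', 'complaint_type',\n",
+        "                  'days_since_last_contact', 'last_contact_sentiment',\n",
+        "                  'sentiment_score', 'support_churn_risk']\n",
+        "\n",
+        "\n",
+        "def check_column_partition(columns, real_cols=REAL_COLS, synthetic_cols=SYNTHETIC_COLS):\n",
+        "    # Hypothetical helper: report columns that break the real/synthetic split.\n",
+        "    expected = list(real_cols) + list(synthetic_cols)\n",
+        "    overlap = sorted(set(real_cols) & set(synthetic_cols))    # listed on both sides\n",
+        "    missing = [c for c in expected if c not in columns]       # listed but absent\n",
+        "    unexpected = [c for c in columns if c not in expected]    # present but unlisted\n",
+        "    return {'missing': missing, 'unexpected': unexpected, 'overlap': overlap}\n",
+        "\n",
+        "\n",
+        "# Stand-in for list(df_final.columns), so the sketch runs without the dataset.\n",
+        "report = check_column_partition(REAL_COLS + SYNTHETIC_COLS)\n",
+        "print(len(REAL_COLS), len(SYNTHETIC_COLS), report)\n",
+        "```\n",
+        "\n",
+        "In the notebook itself the call would be `check_column_partition(list(df_final.columns))`; three empty lists mean the 14 + 7 split accounts for all 21 columns.\n",
+        "\n",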
+        "**➡️ Next Step:** Open `2_Churn_Data_Analysis_and_Insights.ipynb` and upload this CSV file."
+      ]
+    }
+  ]
+}
\ No newline at end of file