{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pAyn432WulFm"
      },
      "source": [
        "# 🔍 Notebook 1: Churn Data Creation and Processing\n",
        "## AI for Big Data Management — ESCP Business School\n",
        "### Final Group Project\n",
        "\n",
        "---\n",
        "\n",
        "## 📌 Problem Statement\n",
        "> **\"How can a company predict customer churn based on support interactions and proactively adapt its retention strategy?\"**\n",
        "\n",
        "\n",
        "- We aim to predict and understand customer churn by combining structured telecom data with synthetic behavioral signals derived from customer support interactions.\n",
        "---\n",
        "\n",
        "## 🗺️ Project Pipeline\n",
        "```\n",
        "PROBLEM CREATION → REAL-WORLD DATA PROCESSING → SYNTHETIC DATASET GENERATION → AUTOMATION → WRAP-UP\n",
        "```\n",
        "\n",
        "---\n",
        "\n",
        "## 📋 What This Notebook Does\n",
        "1. **[REAL-WORLD]** Loads the Telco Customer Churn dataset from Kaggle\n",
        "2. **[REAL-WORLD]** Cleans and preprocesses the real data\n",
        "3. **[SYNTHETIC]** Generates realistic support interaction variables\n",
        "4. **[SYNTHETIC]** Creates a merged, enriched final dataset\n",
        "5. Exports `customer_churn_support_dataset.csv` for Notebook 2\n",
        "\n",
        "---\n",
        "\n",
        "### ⚠️ Before Running\n",
        "You need to upload one file:\n",
        "- `WA_Fn-UseC_-Telco-Customer-Churn.csv` (downloaded from Kaggle)\n",
        "\n",
        "Upload it using the 📁 Files panel on the left sidebar in Google Colab.\n",
        "All other data is generated synthetically in this notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "acacAe8GulFp"
      },
      "source": [
        "---\n",
        "## 📦 SECTION 1: Install & Import Libraries\n",
        "Run this cell first. It installs VADER for sentiment analysis and imports all necessary libraries."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "p5lEOq-1ulFq",
        "outputId": "09999301-c22f-49db-896f-5344d4c0322d"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ All libraries imported successfully!\n",
            "   pandas  : 2.2.2\n",
            "   numpy   : 2.0.2\n"
          ]
        }
      ],
      "source": [
        "# ── Install required packages ──────────────────────────────────────────────────\n",
        "!pip install vaderSentiment --quiet\n",
        "\n",
        "# ── Standard imports ──────────────────────────────────────────────────────────\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "import random\n",
        "from datetime import datetime, timedelta\n",
        "from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer\n",
        "\n",
        "# ── Settings ──────────────────────────────────────────────────────────────────\n",
        "np.random.seed(42)          # reproducibility\n",
        "random.seed(42)\n",
        "pd.set_option('display.max_columns', None)\n",
        "\n",
        "print('✅ All libraries imported successfully!')\n",
        "print(f'   pandas  : {pd.__version__}')\n",
        "print(f'   numpy   : {np.__version__}')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hrZYSJs4ulFr"
      },
      "source": [
        "---\n",
        "## 📥 SECTION 2: Load the Real-World Dataset\n",
        "\n",
        "### [REAL-WORLD DATA PROCESSING]\n",
        "\n",
        "**Where to download the dataset:**\n",
        "1. Go to: https://www.kaggle.com/datasets/blastchar/telco-customer-churn\n",
        "2. Click the **Download** button (top right)\n",
        "3. Unzip the archive to get `WA_Fn-UseC_-Telco-Customer-Churn.csv`\n",
        "4. In Google Colab, click the 📁 folder icon on the left sidebar\n",
        "5. Click the ⬆️ Upload button and select the CSV file\n",
        "6. Wait for the upload to finish, then run the cell below\n",
        "\n",
        "**Dataset info:** IBM Telco Customer Churn, a table of 7,043 customer records (21 columns) with billing, contract, and service information."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 382
        },
        "id": "2BkVJSBzulFr",
        "outputId": "a11033cd-94ba-4ba8-9387-edfdb4ca27bb"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Dataset loaded successfully!\n",
            "   Shape: 7043 rows × 21 columns\n",
            "\n",
            "📊 First 5 rows:\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \\\n",
              "0  7590-VHVEG  Female              0     Yes         No       1           No   \n",
              "1  5575-GNVDE    Male              0      No         No      34          Yes   \n",
              "2  3668-QPYBK    Male              0      No         No       2          Yes   \n",
              "3  7795-CFOCW    Male              0      No         No      45           No   \n",
              "4  9237-HQITU  Female              0      No         No       2          Yes   \n",
              "\n",
              "      MultipleLines InternetService OnlineSecurity OnlineBackup  \\\n",
              "0  No phone service             DSL             No          Yes   \n",
              "1                No             DSL            Yes           No   \n",
              "2                No             DSL            Yes          Yes   \n",
              "3  No phone service             DSL            Yes           No   \n",
              "4                No     Fiber optic             No           No   \n",
              "\n",
              "  DeviceProtection TechSupport StreamingTV StreamingMovies        Contract  \\\n",
              "0               No          No          No              No  Month-to-month   \n",
              "1              Yes          No          No              No        One year   \n",
              "2               No          No          No              No  Month-to-month   \n",
              "3              Yes         Yes          No              No        One year   \n",
              "4               No          No          No              No  Month-to-month   \n",
              "\n",
              "  PaperlessBilling              PaymentMethod  MonthlyCharges TotalCharges  \\\n",
              "0              Yes           Electronic check           29.85        29.85   \n",
              "1               No               Mailed check           56.95       1889.5   \n",
              "2              Yes               Mailed check           53.85       108.15   \n",
              "3               No  Bank transfer (automatic)           42.30      1840.75   \n",
              "4              Yes           Electronic check           70.70       151.65   \n",
              "\n",
              "  Churn  \n",
              "0    No  \n",
              "1    No  \n",
              "2   Yes  \n",
              "3    No  \n",
              "4   Yes  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-810f7e37-1f04-4ec4-b86d-f9c91950bd13\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>customerID</th>\n",
              "      <th>gender</th>\n",
              "      <th>SeniorCitizen</th>\n",
              "      <th>Partner</th>\n",
              "      <th>Dependents</th>\n",
              "      <th>tenure</th>\n",
              "      <th>PhoneService</th>\n",
              "      <th>MultipleLines</th>\n",
              "      <th>InternetService</th>\n",
              "      <th>OnlineSecurity</th>\n",
              "      <th>OnlineBackup</th>\n",
              "      <th>DeviceProtection</th>\n",
              "      <th>TechSupport</th>\n",
              "      <th>StreamingTV</th>\n",
              "      <th>StreamingMovies</th>\n",
              "      <th>Contract</th>\n",
              "      <th>PaperlessBilling</th>\n",
              "      <th>PaymentMethod</th>\n",
              "      <th>MonthlyCharges</th>\n",
              "      <th>TotalCharges</th>\n",
              "      <th>Churn</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>7590-VHVEG</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>1</td>\n",
              "      <td>No</td>\n",
              "      <td>No phone service</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Yes</td>\n",
              "      <td>Electronic check</td>\n",
              "      <td>29.85</td>\n",
              "      <td>29.85</td>\n",
              "      <td>No</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>5575-GNVDE</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>34</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>DSL</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>One year</td>\n",
              "      <td>No</td>\n",
              "      <td>Mailed check</td>\n",
              "      <td>56.95</td>\n",
              "      <td>1889.5</td>\n",
              "      <td>No</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3668-QPYBK</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>DSL</td>\n",
              "      <td>Yes</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Yes</td>\n",
              "      <td>Mailed check</td>\n",
              "      <td>53.85</td>\n",
              "      <td>108.15</td>\n",
              "      <td>Yes</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>7795-CFOCW</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>45</td>\n",
              "      <td>No</td>\n",
              "      <td>No phone service</td>\n",
              "      <td>DSL</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>One year</td>\n",
              "      <td>No</td>\n",
              "      <td>Bank transfer (automatic)</td>\n",
              "      <td>42.30</td>\n",
              "      <td>1840.75</td>\n",
              "      <td>No</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>9237-HQITU</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>Fiber optic</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Yes</td>\n",
              "      <td>Electronic check</td>\n",
              "      <td>70.70</td>\n",
              "      <td>151.65</td>\n",
              "      <td>Yes</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-810f7e37-1f04-4ec4-b86d-f9c91950bd13')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-810f7e37-1f04-4ec4-b86d-f9c91950bd13 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-810f7e37-1f04-4ec4-b86d-f9c91950bd13');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "# ── Load the real-world Telco Churn dataset ────────────────────────────────────\n",
        "# If your file has a different name, change the filename below\n",
        "DATASET_FILENAME = 'WA_Fn-UseC_-Telco-Customer-Churn.csv'\n",
        "\n",
        "try:\n",
        "    df_real = pd.read_csv(DATASET_FILENAME)\n",
        "    print('✅ Dataset loaded successfully!')\n",
        "    print(f'   Shape: {df_real.shape[0]} rows × {df_real.shape[1]} columns')\n",
        "    print('\\n📊 First 5 rows:')\n",
        "    display(df_real.head())\n",
        "except FileNotFoundError:\n",
        "    print('❌ ERROR: File not found!')\n",
        "    print('   Please upload the CSV file to Colab (see instructions above).')\n",
        "    print('   File expected:', DATASET_FILENAME)\n",
        "    raise  # stop here so later cells do not fail with a confusing NameError on df_real"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0ZxflowUulFr",
        "outputId": "d7356987-2ce1-4eff-cf0e-8263f2c1942d"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "📋 Column names:\n",
            "['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']\n",
            "\n",
            "📋 Data types:\n",
            "customerID           object\n",
            "gender               object\n",
            "SeniorCitizen         int64\n",
            "Partner              object\n",
            "Dependents           object\n",
            "tenure                int64\n",
            "PhoneService         object\n",
            "MultipleLines        object\n",
            "InternetService      object\n",
            "OnlineSecurity       object\n",
            "OnlineBackup         object\n",
            "DeviceProtection     object\n",
            "TechSupport          object\n",
            "StreamingTV          object\n",
            "StreamingMovies      object\n",
            "Contract             object\n",
            "PaperlessBilling     object\n",
            "PaymentMethod        object\n",
            "MonthlyCharges      float64\n",
            "TotalCharges         object\n",
            "Churn                object\n",
            "dtype: object\n",
            "\n",
            "📋 Missing values per column:\n",
            "customerID          0\n",
            "gender              0\n",
            "SeniorCitizen       0\n",
            "Partner             0\n",
            "Dependents          0\n",
            "tenure              0\n",
            "PhoneService        0\n",
            "MultipleLines       0\n",
            "InternetService     0\n",
            "OnlineSecurity      0\n",
            "OnlineBackup        0\n",
            "DeviceProtection    0\n",
            "TechSupport         0\n",
            "StreamingTV         0\n",
            "StreamingMovies     0\n",
            "Contract            0\n",
            "PaperlessBilling    0\n",
            "PaymentMethod       0\n",
            "MonthlyCharges      0\n",
            "TotalCharges        0\n",
            "Churn               0\n",
            "dtype: int64\n",
            "\n",
            "📋 Churn distribution (real data):\n",
            "Churn\n",
            "No     5174\n",
            "Yes    1869\n",
            "Name: count, dtype: int64\n",
            "\n",
            "📋 Churn rate (real data): 26.54 %\n"
          ]
        }
      ],
      "source": [
        "# ── Basic exploration of real dataset ─────────────────────────────────────────\n",
        "print('📋 Column names:')\n",
        "print(df_real.columns.tolist())\n",
        "print('\\n📋 Data types:')\n",
        "print(df_real.dtypes)\n",
        "print('\\n📋 Missing values per column:')\n",
        "print(df_real.isnull().sum())\n",
        "print('\\n📋 Churn distribution (real data):')\n",
        "print(df_real['Churn'].value_counts())\n",
        "print('\\n📋 Churn rate (real data):', round(df_real['Churn'].value_counts(normalize=True)['Yes'] * 100, 2), '%')"
      ]
    },
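    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The missing-value report above shows zero gaps in `TotalCharges`, yet the column loads as `object` rather than `float64`. One plausible cause is that some entries are blank strings, which `isnull()` does not count as missing. The sketch below illustrates the check on a small hypothetical frame named `toy` (an assumption for illustration, not the real dataset); the same mask could be applied to `df_real`.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Hypothetical stand-in for df_real; the ' ' entry mimics a blank cell in the raw CSV.\n",
        "toy = pd.DataFrame({'TotalCharges': ['29.85', ' ', '1889.5']})\n",
        "\n",
        "# isnull() reports no gaps because ' ' is an ordinary non-null string.\n",
        "n_null = int(toy['TotalCharges'].isnull().sum())\n",
        "\n",
        "# Stripping whitespace and comparing with '' exposes the hidden blanks.\n",
        "blank_mask = toy['TotalCharges'].str.strip().eq('')\n",
        "n_blank = int(blank_mask.sum())\n",
        "\n",
        "print(f'NaN entries: {n_null}, blank-string entries: {n_blank}')\n",
        "```\n",
        "\n",
        "Running the same mask on the loaded data before cleaning would show how many rows the median fill in Section 3 affects."
      ]
    },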
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Qtj_34hvulFr"
      },
      "source": [
        "---\n",
        "## 🧹 SECTION 3: Real-World Data Cleaning\n",
        "\n",
        "### [REAL-WORLD DATA PROCESSING, continued]\n",
        "\n",
        "This section handles:\n",
        "- Converting `TotalCharges` to numeric (it loads as text because some entries are blank strings)\n",
        "- Filling the resulting missing `TotalCharges` values with the column median\n",
        "- Encoding the target variable `Churn` as 0/1 (`Yes` = 1, `No` = 0)\n",
        "- Selecting the columns we need"
      ]
    },
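    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough illustration of the first three steps, the sketch below applies them to a small hypothetical frame named `toy` (an assumption for illustration, not the real dataset). It assigns the filled column back rather than calling `fillna(..., inplace=True)` on a column selection, a pattern that pandas 2.x flags with a `FutureWarning`. The column name `Churn_binary` follows the cleaned output shown in this notebook.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Hypothetical stand-in for df_real with one blank TotalCharges entry.\n",
        "toy = pd.DataFrame({\n",
        "    'TotalCharges': ['29.85', ' ', '1889.5'],\n",
        "    'Churn': ['No', 'Yes', 'No'],\n",
        "})\n",
        "\n",
        "# Step 1: errors='coerce' turns unparseable text (the blank) into NaN.\n",
        "toy['TotalCharges'] = pd.to_numeric(toy['TotalCharges'], errors='coerce')\n",
        "\n",
        "# Step 2: fill NaN with the median, assigning the result back to the column.\n",
        "median_total = toy['TotalCharges'].median()\n",
        "toy['TotalCharges'] = toy['TotalCharges'].fillna(median_total)\n",
        "\n",
        "# Step 3: encode the target as 0/1 in a new column.\n",
        "toy['Churn_binary'] = toy['Churn'].map({'Yes': 1, 'No': 0})\n",
        "\n",
        "print(toy)\n",
        "```\n",
        "\n",
        "Assigning the result back keeps the operation on the original frame and should avoid the chained-assignment warning."
      ]
    },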
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 729
        },
        "id": "jpJKPzueulFs",
        "outputId": "99e08519-c0e8-495c-eb84-09af613c0a4c"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ TotalCharges NaN values filled with median: 1397.47\n",
            "✅ Churn encoded: Yes=1, No=0\n",
            "\n",
            "✅ SeniorCitizen unique values: [0 1]\n",
            "\n",
            "✅ Cleaned dataset shape: (7043, 14)\n",
            "   Missing values after cleaning:\n",
            "customerID         0\n",
            "gender             0\n",
            "SeniorCitizen      0\n",
            "Partner            0\n",
            "Dependents         0\n",
            "tenure             0\n",
            "Contract           0\n",
            "PaymentMethod      0\n",
            "MonthlyCharges     0\n",
            "TotalCharges       0\n",
            "InternetService    0\n",
            "TechSupport        0\n",
            "Churn              0\n",
            "Churn_binary       0\n",
            "dtype: int64\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_2585/1930681133.py:6: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
            "The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
            "\n",
            "For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
            "\n",
            "\n",
            "  df_real['TotalCharges'].fillna(median_total, inplace=True)\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "   customerID  gender  SeniorCitizen Partner Dependents  tenure  \\\n",
              "0  7590-VHVEG  Female              0     Yes         No       1   \n",
              "1  5575-GNVDE    Male              0      No         No      34   \n",
              "2  3668-QPYBK    Male              0      No         No       2   \n",
              "3  7795-CFOCW    Male              0      No         No      45   \n",
              "4  9237-HQITU  Female              0      No         No       2   \n",
              "\n",
              "         Contract              PaymentMethod  MonthlyCharges  TotalCharges  \\\n",
              "0  Month-to-month           Electronic check           29.85         29.85   \n",
              "1        One year               Mailed check           56.95       1889.50   \n",
              "2  Month-to-month               Mailed check           53.85        108.15   \n",
              "3        One year  Bank transfer (automatic)           42.30       1840.75   \n",
              "4  Month-to-month           Electronic check           70.70        151.65   \n",
              "\n",
              "  InternetService TechSupport Churn  Churn_binary  \n",
              "0             DSL          No    No             0  \n",
              "1             DSL          No    No             0  \n",
              "2             DSL          No   Yes             1  \n",
              "3             DSL         Yes    No             0  \n",
              "4     Fiber optic          No   Yes             1  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-02f93de8-8ca9-4487-b357-345a6f768ecd\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>customerID</th>\n",
              "      <th>gender</th>\n",
              "      <th>SeniorCitizen</th>\n",
              "      <th>Partner</th>\n",
              "      <th>Dependents</th>\n",
              "      <th>tenure</th>\n",
              "      <th>Contract</th>\n",
              "      <th>PaymentMethod</th>\n",
              "      <th>MonthlyCharges</th>\n",
              "      <th>TotalCharges</th>\n",
              "      <th>InternetService</th>\n",
              "      <th>TechSupport</th>\n",
              "      <th>Churn</th>\n",
              "      <th>Churn_binary</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>7590-VHVEG</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>1</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Electronic check</td>\n",
              "      <td>29.85</td>\n",
              "      <td>29.85</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>5575-GNVDE</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>34</td>\n",
              "      <td>One year</td>\n",
              "      <td>Mailed check</td>\n",
              "      <td>56.95</td>\n",
              "      <td>1889.50</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3668-QPYBK</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Mailed check</td>\n",
              "      <td>53.85</td>\n",
              "      <td>108.15</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>7795-CFOCW</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>45</td>\n",
              "      <td>One year</td>\n",
              "      <td>Bank transfer (automatic)</td>\n",
              "      <td>42.30</td>\n",
              "      <td>1840.75</td>\n",
              "      <td>DSL</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>9237-HQITU</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Electronic check</td>\n",
              "      <td>70.70</td>\n",
              "      <td>151.65</td>\n",
              "      <td>Fiber optic</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-02f93de8-8ca9-4487-b357-345a6f768ecd')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-02f93de8-8ca9-4487-b357-345a6f768ecd button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-02f93de8-8ca9-4487-b357-345a6f768ecd');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "summary": "{\n  \"name\": \"display(df_clean\",\n  \"rows\": 5,\n  \"fields\": [\n    {\n      \"column\": \"customerID\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"5575-GNVDE\",\n          \"9237-HQITU\",\n          \"3668-QPYBK\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"gender\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Male\",\n          \"Female\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"SeniorCitizen\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 0,\n        \"num_unique_values\": 1,\n        \"samples\": [\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Partner\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"No\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Dependents\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"No\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"tenure\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 21,\n        \"min\": 1,\n        \"max\": 45,\n        \"num_unique_values\": 4,\n        \"samples\": [\n          34\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Contract\",\n      \"properties\": {\n        
\"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"One year\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"PaymentMethod\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          \"Electronic check\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"MonthlyCharges\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 15.445573799635934,\n        \"min\": 29.85,\n        \"max\": 70.7,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          56.95\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"TotalCharges\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 969.8243111512518,\n        \"min\": 29.85,\n        \"max\": 1889.5,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1889.5\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"InternetService\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Fiber optic\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"TechSupport\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Yes\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Churn\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Yes\"\n        ],\n        \"semantic_type\": \"\",\n        
\"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Churn_binary\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 1,\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "# ── Step 1: Fix TotalCharges column (arrives as string) ──────────────────────\n",
        "df_real['TotalCharges'] = pd.to_numeric(df_real['TotalCharges'], errors='coerce')\n",
        "\n",
        "# ── Step 2: Fill missing TotalCharges with median ────────────────────────────\n",
        "median_total = df_real['TotalCharges'].median()\n",
        "df_real['TotalCharges'].fillna(median_total, inplace=True)\n",
        "print(f'✅ TotalCharges NaN values filled with median: {median_total:.2f}')\n",
        "\n",
        "# ── Step 3: Encode Churn as 0 / 1 ────────────────────────────────────────────\n",
        "df_real['Churn_binary'] = df_real['Churn'].map({'Yes': 1, 'No': 0})\n",
        "print(f'✅ Churn encoded: Yes=1, No=0')\n",
        "\n",
        "# ── Step 4: Encode SeniorCitizen (already 0/1, but verify) ───────────────────\n",
        "print(f'\\n✅ SeniorCitizen unique values: {df_real[\"SeniorCitizen\"].unique()}')\n",
        "\n",
        "# ── Step 5: Select core columns for our analysis ─────────────────────────────\n",
        "core_cols = [\n",
        "    'customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',\n",
        "    'tenure', 'Contract', 'PaymentMethod', 'MonthlyCharges',\n",
        "    'TotalCharges', 'InternetService', 'TechSupport',\n",
        "    'Churn', 'Churn_binary'\n",
        "]\n",
        "df_clean = df_real[core_cols].copy()\n",
        "\n",
        "print(f'\\n✅ Cleaned dataset shape: {df_clean.shape}')\n",
        "print(f'   Missing values after cleaning:')\n",
        "print(df_clean.isnull().sum())\n",
        "display(df_clean.head())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 317
        },
        "id": "ySxvvy5hulFs",
        "outputId": "5c552cda-2f7f-4088-e11e-0d07e0886538"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "📊 Descriptive statistics (numeric columns):\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "       SeniorCitizen       tenure  MonthlyCharges  TotalCharges  Churn_binary\n",
              "count    7043.000000  7043.000000     7043.000000   7043.000000   7043.000000\n",
              "mean        0.162147    32.371149       64.761692   2281.916928      0.265370\n",
              "std         0.368612    24.559481       30.090047   2265.270398      0.441561\n",
              "min         0.000000     0.000000       18.250000     18.800000      0.000000\n",
              "25%         0.000000     9.000000       35.500000    402.225000      0.000000\n",
              "50%         0.000000    29.000000       70.350000   1397.475000      0.000000\n",
              "75%         0.000000    55.000000       89.850000   3786.600000      1.000000\n",
              "max         1.000000    72.000000      118.750000   8684.800000      1.000000"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-8890e666-5ae4-4934-bf59-dbf2a8fcabe5\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>SeniorCitizen</th>\n",
              "      <th>tenure</th>\n",
              "      <th>MonthlyCharges</th>\n",
              "      <th>TotalCharges</th>\n",
              "      <th>Churn_binary</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>7043.000000</td>\n",
              "      <td>7043.000000</td>\n",
              "      <td>7043.000000</td>\n",
              "      <td>7043.000000</td>\n",
              "      <td>7043.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>mean</th>\n",
              "      <td>0.162147</td>\n",
              "      <td>32.371149</td>\n",
              "      <td>64.761692</td>\n",
              "      <td>2281.916928</td>\n",
              "      <td>0.265370</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>std</th>\n",
              "      <td>0.368612</td>\n",
              "      <td>24.559481</td>\n",
              "      <td>30.090047</td>\n",
              "      <td>2265.270398</td>\n",
              "      <td>0.441561</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>min</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>18.250000</td>\n",
              "      <td>18.800000</td>\n",
              "      <td>0.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25%</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>9.000000</td>\n",
              "      <td>35.500000</td>\n",
              "      <td>402.225000</td>\n",
              "      <td>0.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>50%</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>29.000000</td>\n",
              "      <td>70.350000</td>\n",
              "      <td>1397.475000</td>\n",
              "      <td>0.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>75%</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>55.000000</td>\n",
              "      <td>89.850000</td>\n",
              "      <td>3786.600000</td>\n",
              "      <td>1.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>max</th>\n",
              "      <td>1.000000</td>\n",
              "      <td>72.000000</td>\n",
              "      <td>118.750000</td>\n",
              "      <td>8684.800000</td>\n",
              "      <td>1.000000</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-8890e666-5ae4-4934-bf59-dbf2a8fcabe5')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-8890e666-5ae4-4934-bf59-dbf2a8fcabe5 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-8890e666-5ae4-4934-bf59-dbf2a8fcabe5');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "summary": "{\n  \"name\": \"display(df_clean\",\n  \"rows\": 8,\n  \"fields\": [\n    {\n      \"column\": \"SeniorCitizen\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2489.9992387084,\n        \"min\": 0.0,\n        \"max\": 7043.0,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          0.1621468124378816,\n          1.0,\n          0.36861160561002687\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"tenure\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2478.9752758409018,\n        \"min\": 0.0,\n        \"max\": 7043.0,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          32.37114865824223,\n          29.0,\n          7043.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"MonthlyCharges\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2468.7047672837775,\n        \"min\": 18.25,\n        \"max\": 7043.0,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          64.76169246059918,\n          70.35,\n          7043.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"TotalCharges\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 3119.0484860242914,\n        \"min\": 18.8,\n        \"max\": 8684.8,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          2281.9169281556156,\n          1397.475,\n          7043.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Churn_binary\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2489.939844235915,\n        \"min\": 0.0,\n        \"max\": 7043.0,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          
0.2653698707936959,\n          1.0,\n          0.44156130512195013\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "# ── Descriptive statistics on real data ──────────────────────────────────────\n",
        "print('📊 Descriptive statistics (numeric columns):')\n",
        "display(df_clean.describe())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RpDjH_T6ulFs"
      },
      "source": [
        "---\n",
        "## 🤖 SECTION 4: Synthetic Support Interaction Data Generation\n",
        "\n",
        "### [SYNTHETIC DATASET GENERATION]\n",
        "\n",
        "**Why synthetic data?**  \n",
        "Real telecom datasets do not include detailed support call logs. We simulate realistic support interaction variables that are **statistically correlated** with churn — just as a real company's CRM data would show.\n",
        "\n",
        "**Variables we create:**\n",
        "| Variable | Description |\n",
        "|---|---|\n",
        "| `support_calls` | Number of support calls made in the last 6 months |\n",
        "| `avg_call_duration` | Average call duration in minutes |\n",
        "| `complaint_type` | Type of most frequent complaint |\n",
        "| `days_since_last_contact` | Days since the customer last contacted support |\n",
        "| `last_contact_sentiment` | Text of the last customer feedback |\n",
        "| `sentiment_score` | VADER compound sentiment score |\n",
        "| `support_churn_risk` | Composite risk score (Low / Medium / High) |"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NK03tdXdulFs",
        "outputId": "6a003b55-cad7-4b8e-9ddb-426a7d5f9135"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Feedback templates and complaint types defined.\n"
          ]
        }
      ],
      "source": [
        "# ── Helper: realistic sentiment phrases per churn status ─────────────────────\n",
        "POSITIVE_FEEDBACK = [\n",
        "    \"The support agent was very helpful and solved my issue quickly.\",\n",
        "    \"Great service, no complaints at all!\",\n",
        "    \"Fast resolution. I am happy with the service.\",\n",
        "    \"The team was professional and friendly. Very satisfied.\",\n",
        "    \"Everything was resolved in one call. Excellent experience.\",\n",
        "    \"I love this company, always responsive and caring.\",\n",
        "    \"No issues, service works perfectly. Very happy customer.\"\n",
        "]\n",
        "\n",
        "NEUTRAL_FEEDBACK = [\n",
        "    \"The wait time was long but the issue was eventually resolved.\",\n",
        "    \"Average experience. Could be better.\",\n",
        "    \"Service is okay. Nothing special.\",\n",
        "    \"The agent was polite but the problem took two calls to fix.\",\n",
        "    \"Acceptable support, but I expected faster resolution.\"\n",
        "]\n",
        "\n",
        "NEGATIVE_FEEDBACK = [\n",
        "    \"I have called five times and the problem is still not fixed!\",\n",
        "    \"Terrible service. I am thinking of switching providers.\",\n",
        "    \"The agents are unhelpful and the wait times are ridiculous.\",\n",
        "    \"I am very frustrated. Nobody seems to care about my problem.\",\n",
        "    \"Worst customer service I have ever experienced. Cancelling soon.\",\n",
        "    \"My bill is wrong again. This is the third time this month!\",\n",
        "    \"I am extremely disappointed. No follow-up, no resolution.\"\n",
        "]\n",
        "\n",
        "COMPLAINT_TYPES = [\n",
        "    'Billing Issue', 'Service Outage', 'Speed/Performance',\n",
        "    'Contract Dispute', 'Technical Failure', 'Overcharge'\n",
        "]\n",
        "\n",
        "print('✅ Feedback templates and complaint types defined.')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EIzwa0U3ulFs",
        "outputId": "0aa0c944-1e0d-4df3-cc48-15acc7a590ae"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Synthetic support variables generated!\n",
            "   support_calls range    : 1 – 15\n",
            "   avg_call_duration range: 3.0 – 35.0 min\n",
            "   Sample complaint types : {'Contract Dispute', 'Billing Issue', 'Technical Failure', 'Service Outage'}\n"
          ]
        }
      ],
      "source": [
        "# ── Generate synthetic support variables ──────────────────────────────────────\n",
        "n = len(df_clean)\n",
        "churn_flag = df_clean['Churn_binary'].values\n",
        "\n",
        "# support_calls: churners call more (5-15 calls), non-churners call less (1-6)\n",
        "support_calls = np.where(\n",
        "    churn_flag == 1,\n",
        "    np.random.randint(5, 16, n),\n",
        "    np.random.randint(1, 7,  n)\n",
        ")\n",
        "\n",
        "# avg_call_duration: churners have longer calls (frustration)\n",
        "avg_call_duration = np.where(\n",
        "    churn_flag == 1,\n",
        "    np.round(np.random.uniform(12, 35, n), 1),\n",
        "    np.round(np.random.uniform(3,  15, n), 1)\n",
        ")\n",
        "\n",
        "# complaint_type: random, but churners have heavier billing/contract issues\n",
        "def pick_complaint(is_churner):\n",
        "    if is_churner:\n",
        "        # weight toward billing and contract disputes\n",
        "        weights = [0.30, 0.15, 0.15, 0.20, 0.10, 0.10]\n",
        "    else:\n",
        "        weights = [0.20, 0.20, 0.20, 0.10, 0.20, 0.10]\n",
        "    return random.choices(COMPLAINT_TYPES, weights=weights, k=1)[0]\n",
        "\n",
        "complaint_type = [pick_complaint(c) for c in churn_flag]\n",
        "\n",
        "# days_since_last_contact: churners contacted recently (about to leave)\n",
        "days_since_last_contact = np.where(\n",
        "    churn_flag == 1,\n",
        "    np.random.randint(1,  30, n),\n",
        "    np.random.randint(15, 90, n)\n",
        ")\n",
        "\n",
        "# last_contact_sentiment: text phrase matching churn likelihood\n",
        "def pick_sentiment_text(is_churner):\n",
        "    if is_churner:\n",
        "        # 70% negative, 20% neutral, 10% positive\n",
        "        pool = (NEGATIVE_FEEDBACK * 7) + (NEUTRAL_FEEDBACK * 2) + (POSITIVE_FEEDBACK * 1)\n",
        "    else:\n",
        "        # 10% negative, 20% neutral, 70% positive\n",
        "        pool = (POSITIVE_FEEDBACK * 7) + (NEUTRAL_FEEDBACK * 2) + (NEGATIVE_FEEDBACK * 1)\n",
        "    return random.choice(pool)\n",
        "\n",
        "last_contact_sentiment = [pick_sentiment_text(c) for c in churn_flag]\n",
        "\n",
        "print('✅ Synthetic support variables generated!')\n",
        "print(f'   support_calls range    : {support_calls.min()} – {support_calls.max()}')\n",
        "print(f'   avg_call_duration range: {avg_call_duration.min()} – {avg_call_duration.max()} min')\n",
        "print(f'   Sample complaint types : {set(complaint_type[:10])}')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Yak2n7gpulFt",
        "outputId": "ff972046-295c-44c0-c4bb-e0d6c59cb165"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ VADER sentiment scores computed!\n",
            "   Score range: -0.710 to 0.872\n",
            "   Mean score : 0.307\n"
          ]
        }
      ],
      "source": [
        "# ── Compute VADER sentiment score ─────────────────────────────────────────────\n",
        "analyzer = SentimentIntensityAnalyzer()\n",
        "\n",
        "def get_compound_score(text):\n",
        "    \"\"\"Return VADER compound score: -1 (most negative) to +1 (most positive)\"\"\"\n",
        "    return analyzer.polarity_scores(text)['compound']\n",
        "\n",
        "sentiment_score = [get_compound_score(text) for text in last_contact_sentiment]\n",
        "\n",
        "print('✅ VADER sentiment scores computed!')\n",
        "print(f'   Score range: {min(sentiment_score):.3f} to {max(sentiment_score):.3f}')\n",
        "print(f'   Mean score : {np.mean(sentiment_score):.3f}')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KJ3wGj9mulFt",
        "outputId": "6b8e8fd1-6eaf-4f77-8e37-6218bea1e414"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ support_churn_risk categories created!\n",
            "   Distribution: Counter({'Medium': 4166, 'Low': 1471, 'High': 1406})\n"
          ]
        }
      ],
      "source": [
        "# ── Compute composite support_churn_risk ──────────────────────────────────────\n",
        "# Logic:\n",
        "#   HIGH   = many calls (>=6)  AND negative sentiment (<= -0.3)\n",
        "#   MEDIUM = moderate calls (3-5) OR somewhat negative (-0.3 to 0)\n",
        "#   LOW    = everything else\n",
        "\n",
        "def compute_risk(calls, score):\n",
        "    if calls >= 6 and score <= -0.3:\n",
        "        return 'High'\n",
        "    elif calls >= 3 or score <= 0.0:\n",
        "        return 'Medium'\n",
        "    else:\n",
        "        return 'Low'\n",
        "\n",
        "support_churn_risk = [\n",
        "    compute_risk(c, s) for c, s in zip(support_calls, sentiment_score)\n",
        "]\n",
        "\n",
        "print('✅ support_churn_risk categories created!')\n",
        "from collections import Counter\n",
        "print('   Distribution:', Counter(support_churn_risk))"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Synthetic Data Design & Assumptions\n",
        "\n",
        "### Why Synthetic Data Was Created\n",
        "\n",
        "The original Telco dataset does not include detailed information about customer support interactions or behavioral signals such as sentiment. However, in real-world business settings, these factors play a critical role in customer churn.\n",
        "\n",
        "To better approximate real-world conditions, we generated synthetic variables representing:\n",
        "- number of support calls\n",
        "- average call duration\n",
        "- complaint type\n",
        "- time since last interaction\n",
        "- sentiment score\n",
        "- support-based churn risk\n",
        "\n",
        "These variables allow us to enrich the dataset and create a more realistic narrative around customer experience and dissatisfaction.\n",
        "\n",
        "---\n",
        "\n",
        "### Key Assumptions\n",
        "\n",
        "The synthetic data generation is based on the following assumptions:\n",
        "\n",
        "- Customers who churn tend to have **more frequent support interactions**\n",
        "- Negative experiences lead to **lower sentiment scores**\n",
        "- Certain complaint types (e.g., technical failures) are more strongly associated with dissatisfaction\n",
        "- Customers with unresolved issues are more likely to churn\n",
        "\n",
        "These assumptions are grounded in typical telecom business logic but are not directly observed in the original dataset.\n",
        "\n",
        "---\n",
        "\n",
        "### Limitations of Synthetic Data\n",
        "\n",
        "Because these variables are artificially generated:\n",
        "- They may introduce **bias toward expected relationships**\n",
        "- They are not independent of the churn outcome\n",
        "- They may **overstate model performance**\n",
        "\n",
        "Therefore, results should be interpreted as illustrative rather than fully generalizable."
      ],
      "metadata": {
        "id": "zvVoEV0Fl77_"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7TZ0lxJQulFt"
      },
      "source": [
        "---\n",
        "## 🔗 SECTION 5: Merge Real + Synthetic Data\n",
        "\n",
        "We now combine the cleaned real-world dataset with our synthetic support variables into one rich dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 330
        },
        "id": "7tMGLUqvulFt",
        "outputId": "d5013de1-0912-46b4-886e-e077adc64ac3"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Final merged dataset: 7043 rows × 21 columns\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "   customerID  gender  SeniorCitizen Partner Dependents  tenure  \\\n",
              "0  7590-VHVEG  Female              0     Yes         No       1   \n",
              "1  5575-GNVDE    Male              0      No         No      34   \n",
              "2  3668-QPYBK    Male              0      No         No       2   \n",
              "3  7795-CFOCW    Male              0      No         No      45   \n",
              "4  9237-HQITU  Female              0      No         No       2   \n",
              "\n",
              "         Contract              PaymentMethod  MonthlyCharges  TotalCharges  \\\n",
              "0  Month-to-month           Electronic check           29.85         29.85   \n",
              "1        One year               Mailed check           56.95       1889.50   \n",
              "2  Month-to-month               Mailed check           53.85        108.15   \n",
              "3        One year  Bank transfer (automatic)           42.30       1840.75   \n",
              "4  Month-to-month           Electronic check           70.70        151.65   \n",
              "\n",
              "  InternetService TechSupport Churn  Churn_binary  support_calls  \\\n",
              "0             DSL          No    No             0              2   \n",
              "1             DSL          No    No             0              4   \n",
              "2             DSL          No   Yes             1             15   \n",
              "3             DSL         Yes    No             0              3   \n",
              "4     Fiber optic          No   Yes             1              9   \n",
              "\n",
              "   avg_call_duration    complaint_type  days_since_last_contact  \\\n",
              "0                9.7  Contract Dispute                       16   \n",
              "1                3.8     Billing Issue                       57   \n",
              "2               21.2     Billing Issue                       12   \n",
              "3                9.8    Service Outage                       40   \n",
              "4               12.3  Contract Dispute                       24   \n",
              "\n",
              "                              last_contact_sentiment  sentiment_score  \\\n",
              "0  I love this company, always responsive and car...           0.8720   \n",
              "1               Great service, no complaints at all!           0.7684   \n",
              "2  My bill is wrong again. This is the third time...          -0.5255   \n",
              "3  Everything was resolved in one call. Excellent...           0.6597   \n",
              "4  My bill is wrong again. This is the third time...          -0.5255   \n",
              "\n",
              "  support_churn_risk  \n",
              "0                Low  \n",
              "1             Medium  \n",
              "2               High  \n",
              "3             Medium  \n",
              "4               High  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-72891642-6b1c-4c46-9a3a-adb94778721a\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>customerID</th>\n",
              "      <th>gender</th>\n",
              "      <th>SeniorCitizen</th>\n",
              "      <th>Partner</th>\n",
              "      <th>Dependents</th>\n",
              "      <th>tenure</th>\n",
              "      <th>Contract</th>\n",
              "      <th>PaymentMethod</th>\n",
              "      <th>MonthlyCharges</th>\n",
              "      <th>TotalCharges</th>\n",
              "      <th>InternetService</th>\n",
              "      <th>TechSupport</th>\n",
              "      <th>Churn</th>\n",
              "      <th>Churn_binary</th>\n",
              "      <th>support_calls</th>\n",
              "      <th>avg_call_duration</th>\n",
              "      <th>complaint_type</th>\n",
              "      <th>days_since_last_contact</th>\n",
              "      <th>last_contact_sentiment</th>\n",
              "      <th>sentiment_score</th>\n",
              "      <th>support_churn_risk</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>7590-VHVEG</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>1</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Electronic check</td>\n",
              "      <td>29.85</td>\n",
              "      <td>29.85</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>9.7</td>\n",
              "      <td>Contract Dispute</td>\n",
              "      <td>16</td>\n",
              "      <td>I love this company, always responsive and car...</td>\n",
              "      <td>0.8720</td>\n",
              "      <td>Low</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>5575-GNVDE</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>34</td>\n",
              "      <td>One year</td>\n",
              "      <td>Mailed check</td>\n",
              "      <td>56.95</td>\n",
              "      <td>1889.50</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>0</td>\n",
              "      <td>4</td>\n",
              "      <td>3.8</td>\n",
              "      <td>Billing Issue</td>\n",
              "      <td>57</td>\n",
              "      <td>Great service, no complaints at all!</td>\n",
              "      <td>0.7684</td>\n",
              "      <td>Medium</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3668-QPYBK</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Mailed check</td>\n",
              "      <td>53.85</td>\n",
              "      <td>108.15</td>\n",
              "      <td>DSL</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>1</td>\n",
              "      <td>15</td>\n",
              "      <td>21.2</td>\n",
              "      <td>Billing Issue</td>\n",
              "      <td>12</td>\n",
              "      <td>My bill is wrong again. This is the third time...</td>\n",
              "      <td>-0.5255</td>\n",
              "      <td>High</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>7795-CFOCW</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>45</td>\n",
              "      <td>One year</td>\n",
              "      <td>Bank transfer (automatic)</td>\n",
              "      <td>42.30</td>\n",
              "      <td>1840.75</td>\n",
              "      <td>DSL</td>\n",
              "      <td>Yes</td>\n",
              "      <td>No</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>9.8</td>\n",
              "      <td>Service Outage</td>\n",
              "      <td>40</td>\n",
              "      <td>Everything was resolved in one call. Excellent...</td>\n",
              "      <td>0.6597</td>\n",
              "      <td>Medium</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>9237-HQITU</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>No</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>Month-to-month</td>\n",
              "      <td>Electronic check</td>\n",
              "      <td>70.70</td>\n",
              "      <td>151.65</td>\n",
              "      <td>Fiber optic</td>\n",
              "      <td>No</td>\n",
              "      <td>Yes</td>\n",
              "      <td>1</td>\n",
              "      <td>9</td>\n",
              "      <td>12.3</td>\n",
              "      <td>Contract Dispute</td>\n",
              "      <td>24</td>\n",
              "      <td>My bill is wrong again. This is the third time...</td>\n",
              "      <td>-0.5255</td>\n",
              "      <td>High</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-72891642-6b1c-4c46-9a3a-adb94778721a')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-72891642-6b1c-4c46-9a3a-adb94778721a button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-72891642-6b1c-4c46-9a3a-adb94778721a');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "# ── Build the synthetic support DataFrame ────────────────────────────────────\n",
        "df_support = pd.DataFrame({\n",
        "    'support_calls'          : support_calls,\n",
        "    'avg_call_duration'      : avg_call_duration,\n",
        "    'complaint_type'         : complaint_type,\n",
        "    'days_since_last_contact': days_since_last_contact,\n",
        "    'last_contact_sentiment' : last_contact_sentiment,\n",
        "    'sentiment_score'        : sentiment_score,\n",
        "    'support_churn_risk'     : support_churn_risk\n",
        "})\n",
        "\n",
        "# ── Merge with real-world data (index-aligned) ────────────────────────────────\n",
        "df_final = pd.concat([df_clean.reset_index(drop=True), df_support], axis=1)\n",
        "\n",
        "print(f'✅ Final merged dataset: {df_final.shape[0]} rows × {df_final.shape[1]} columns')\n",
        "display(df_final.head())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UAW-XT66ulFu",
        "outputId": "32243187-b541-40b2-8069-894cbe430f8b"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "📊 Mean support_calls by Churn:\n",
            "Churn\n",
            "No      3.47\n",
            "Yes    10.02\n",
            "Name: support_calls, dtype: float64\n",
            "\n",
            "📊 Mean sentiment_score by Churn:\n",
            "Churn\n",
            "No     0.519\n",
            "Yes   -0.279\n",
            "Name: sentiment_score, dtype: float64\n",
            "\n",
            "📊 support_churn_risk vs Churn crosstab:\n",
            "Churn                  No    Yes\n",
            "support_churn_risk              \n",
            "High                0.080  0.920\n",
            "Low                 1.000  0.000\n",
            "Medium              0.862  0.138\n"
          ]
        }
      ],
      "source": [
        "# ── Verification: check correlations make sense ──────────────────────────────\n",
        "print('📊 Mean support_calls by Churn:')\n",
        "print(df_final.groupby('Churn')['support_calls'].mean().round(2))\n",
        "\n",
        "print('\\n📊 Mean sentiment_score by Churn:')\n",
        "print(df_final.groupby('Churn')['sentiment_score'].mean().round(3))\n",
        "\n",
        "print('\\n📊 support_churn_risk vs Churn crosstab:')\n",
        "print(pd.crosstab(df_final['support_churn_risk'], df_final['Churn'], normalize='index').round(3))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "27am_r6QulFu"
      },
      "source": [
        "---\n",
        "## 💾 SECTION 6: Export Final Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EfM7zP4XulFu",
        "outputId": "5f76e254-0dc9-49ea-c6ff-3c202b00c78f"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Dataset exported: customer_churn_support_dataset.csv\n",
            "   Rows    : 7043\n",
            "   Columns : 21\n",
            "\n",
            "📋 Final column list:\n",
            "   • customerID\n",
            "   • gender\n",
            "   • SeniorCitizen\n",
            "   • Partner\n",
            "   • Dependents\n",
            "   • tenure\n",
            "   • Contract\n",
            "   • PaymentMethod\n",
            "   • MonthlyCharges\n",
            "   • TotalCharges\n",
            "   • InternetService\n",
            "   • TechSupport\n",
            "   • Churn\n",
            "   • Churn_binary\n",
            "   • support_calls\n",
            "   • avg_call_duration\n",
            "   • complaint_type\n",
            "   • days_since_last_contact\n",
            "   • last_contact_sentiment\n",
            "   • sentiment_score\n",
            "   • support_churn_risk\n"
          ]
        }
      ],
      "source": [
        "# ── Export to CSV ─────────────────────────────────────────────────────────────\n",
        "OUTPUT_FILENAME = 'customer_churn_support_dataset.csv'\n",
        "df_final.to_csv(OUTPUT_FILENAME, index=False)\n",
        "\n",
        "print(f'✅ Dataset exported: {OUTPUT_FILENAME}')\n",
        "print(f'   Rows    : {df_final.shape[0]}')\n",
        "print(f'   Columns : {df_final.shape[1]}')\n",
        "print(f'\\n📋 Final column list:')\n",
        "for col in df_final.columns:\n",
        "    print(f'   • {col}')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "id": "HFWnx9zlulFu",
        "outputId": "a9084d14-05d9-4cb2-98c6-a25491f9b6b1"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ],
            "application/javascript": [
              "\n",
              "    async function download(id, filename, size) {\n",
              "      if (!google.colab.kernel.accessAllowed) {\n",
              "        return;\n",
              "      }\n",
              "      const div = document.createElement('div');\n",
              "      const label = document.createElement('label');\n",
              "      label.textContent = `Downloading \"${filename}\": `;\n",
              "      div.appendChild(label);\n",
              "      const progress = document.createElement('progress');\n",
              "      progress.max = size;\n",
              "      div.appendChild(progress);\n",
              "      document.body.appendChild(div);\n",
              "\n",
              "      const buffers = [];\n",
              "      let downloaded = 0;\n",
              "\n",
              "      const channel = await google.colab.kernel.comms.open(id);\n",
              "      // Send a message to notify the kernel that we're ready.\n",
              "      channel.send({})\n",
              "\n",
              "      for await (const message of channel.messages) {\n",
              "        // Send a message to notify the kernel that we're ready.\n",
              "        channel.send({})\n",
              "        if (message.buffers) {\n",
              "          for (const buffer of message.buffers) {\n",
              "            buffers.push(buffer);\n",
              "            downloaded += buffer.byteLength;\n",
              "            progress.value = downloaded;\n",
              "          }\n",
              "        }\n",
              "      }\n",
              "      const blob = new Blob(buffers, {type: 'application/binary'});\n",
              "      const a = document.createElement('a');\n",
              "      a.href = window.URL.createObjectURL(blob);\n",
              "      a.download = filename;\n",
              "      div.appendChild(a);\n",
              "      a.click();\n",
              "      div.remove();\n",
              "    }\n",
              "  "
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ],
            "application/javascript": [
              "download(\"download_160eb193-7480-4c6e-90ba-3271caaea521\", \"customer_churn_support_dataset.csv\", 1309271)"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Download triggered! Check your Downloads folder.\n"
          ]
        }
      ],
      "source": [
        "# ── Download the file to your computer ───────────────────────────────────────\n",
        "from google.colab import files\n",
        "files.download(OUTPUT_FILENAME)\n",
        "print('✅ Download triggered! Check your Downloads folder.')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wr8dQ0YQulFu"
      },
      "source": [
        "---\n",
        "## ✅ SECTION 7: Final Verification Checks"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FK7d-FE2ulFu",
        "outputId": "4e8cb3b5-08f8-4e0a-efd6-97b738be9181"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "============================================================\n",
            "         FINAL DATASET VERIFICATION REPORT\n",
            "============================================================\n",
            "\n",
            "✅ Shape            : (7043, 21)\n",
            "✅ Missing values   : 0 total\n",
            "✅ Churn rate       : 26.5%\n",
            "✅ Risk categories  : {'Medium': 4166, 'Low': 1471, 'High': 1406}\n",
            "✅ Complaint types  : 6 unique\n",
            "✅ Sentiment range  : -0.710 to 0.872\n",
            "\n",
            "============================================================\n",
            "   ✅ Notebook 1 COMPLETE! Proceed to Notebook 2.\n",
            "============================================================\n"
          ]
        }
      ],
      "source": [
        "# ── Final verification ────────────────────────────────────────────────────────\n",
        "print('=' * 60)\n",
        "print('         FINAL DATASET VERIFICATION REPORT')\n",
        "print('=' * 60)\n",
        "\n",
        "df_verify = pd.read_csv(OUTPUT_FILENAME)\n",
        "\n",
        "print(f'\\n✅ Shape            : {df_verify.shape}')\n",
        "print(f'✅ Missing values   : {df_verify.isnull().sum().sum()} total')\n",
        "print(f'✅ Churn rate       : {df_verify[\"Churn_binary\"].mean()*100:.1f}%')\n",
        "print(f'✅ Risk categories  : {df_verify[\"support_churn_risk\"].value_counts().to_dict()}')\n",
        "print(f'✅ Complaint types  : {df_verify[\"complaint_type\"].nunique()} unique')\n",
        "print(f'✅ Sentiment range  : {df_verify[\"sentiment_score\"].min():.3f} to {df_verify[\"sentiment_score\"].max():.3f}')\n",
        "\n",
        "print('\\n' + '=' * 60)\n",
        "print('   ✅ Notebook 1 COMPLETE! Proceed to Notebook 2.')\n",
        "print('=' * 60)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## SECTION 8: Final Dataset Summary"
      ],
      "metadata": {
        "id": "7KUQ8dwMa9pK"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print('=' * 60)\n",
        "print('        NOTEBOOK 1 — FINAL DATASET SUMMARY')\n",
        "print('=' * 60)\n",
        "\n",
        "print(f'\\n📐 Shape: {df_final.shape[0]} rows × {df_final.shape[1]} columns')\n",
        "\n",
        "print('\\n📋 Column overview:')\n",
        "for col in df_final.columns:\n",
        "    dtype = str(df_final[col].dtype)\n",
        "    sample = str(df_final[col].iloc[0])[:40]\n",
        "    print(f'   {col:<30} [{dtype:<10}]  e.g. {sample}')\n",
        "\n",
        "print('\\n📊 Real vs Synthetic columns:')\n",
        "real_cols = ['customerID','gender','SeniorCitizen','Partner','Dependents',\n",
        "             'tenure','Contract','PaymentMethod','MonthlyCharges',\n",
        "             'TotalCharges','InternetService','TechSupport','Churn','Churn_binary']\n",
        "synthetic_cols = ['support_calls','avg_call_duration','complaint_type',\n",
        "                  'days_since_last_contact','last_contact_sentiment',\n",
        "                  'sentiment_score','support_churn_risk']\n",
        "\n",
        "print(f'   ✅ Real-world columns  ({len(real_cols)}): {real_cols}')\n",
        "print(f'   🤖 Synthetic columns   ({len(synthetic_cols)}): {synthetic_cols}')\n",
        "\n",
        "print('\\n📊 Sample rows:')\n",
        "display(df_final[['customerID','tenure','Churn','support_calls',\n",
        "                   'sentiment_score','support_churn_risk']].head(5))\n",
        "\n",
        "\n",
        "print('=' * 60)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 833
        },
        "id": "XpmZTW0RahxT",
        "outputId": "10a7c450-6a35-447c-ca9f-7db0aab8d5ce"
      },
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "============================================================\n",
            "        NOTEBOOK 1 — FINAL DATASET SUMMARY\n",
            "============================================================\n",
            "\n",
            "📐 Shape: 7043 rows × 21 columns\n",
            "\n",
            "📋 Column overview:\n",
            "   customerID                     [object    ]  e.g. 7590-VHVEG\n",
            "   gender                         [object    ]  e.g. Female\n",
            "   SeniorCitizen                  [int64     ]  e.g. 0\n",
            "   Partner                        [object    ]  e.g. Yes\n",
            "   Dependents                     [object    ]  e.g. No\n",
            "   tenure                         [int64     ]  e.g. 1\n",
            "   Contract                       [object    ]  e.g. Month-to-month\n",
            "   PaymentMethod                  [object    ]  e.g. Electronic check\n",
            "   MonthlyCharges                 [float64   ]  e.g. 29.85\n",
            "   TotalCharges                   [float64   ]  e.g. 29.85\n",
            "   InternetService                [object    ]  e.g. DSL\n",
            "   TechSupport                    [object    ]  e.g. No\n",
            "   Churn                          [object    ]  e.g. No\n",
            "   Churn_binary                   [int64     ]  e.g. 0\n",
            "   support_calls                  [int64     ]  e.g. 2\n",
            "   avg_call_duration              [float64   ]  e.g. 9.7\n",
            "   complaint_type                 [object    ]  e.g. Contract Dispute\n",
            "   days_since_last_contact        [int64     ]  e.g. 16\n",
            "   last_contact_sentiment         [object    ]  e.g. I love this company, always responsive a\n",
            "   sentiment_score                [float64   ]  e.g. 0.872\n",
            "   support_churn_risk             [object    ]  e.g. Low\n",
            "\n",
            "📊 Real vs Synthetic columns:\n",
            "   ✅ Real-world columns  (14): ['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'Contract', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'InternetService', 'TechSupport', 'Churn', 'Churn_binary']\n",
            "   🤖 Synthetic columns   (7): ['support_calls', 'avg_call_duration', 'complaint_type', 'days_since_last_contact', 'last_contact_sentiment', 'sentiment_score', 'support_churn_risk']\n",
            "\n",
            "📊 Sample rows:\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "   customerID  tenure Churn  support_calls  sentiment_score support_churn_risk\n",
              "0  7590-VHVEG       1    No              2           0.8720                Low\n",
              "1  5575-GNVDE      34    No              4           0.7684             Medium\n",
              "2  3668-QPYBK       2   Yes             15          -0.5255               High\n",
              "3  7795-CFOCW      45    No              3           0.6597             Medium\n",
              "4  9237-HQITU       2   Yes              9          -0.5255               High"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-2f56388d-2ac4-4fca-8c20-96c9a04bcc25\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>customerID</th>\n",
              "      <th>tenure</th>\n",
              "      <th>Churn</th>\n",
              "      <th>support_calls</th>\n",
              "      <th>sentiment_score</th>\n",
              "      <th>support_churn_risk</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>7590-VHVEG</td>\n",
              "      <td>1</td>\n",
              "      <td>No</td>\n",
              "      <td>2</td>\n",
              "      <td>0.8720</td>\n",
              "      <td>Low</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>5575-GNVDE</td>\n",
              "      <td>34</td>\n",
              "      <td>No</td>\n",
              "      <td>4</td>\n",
              "      <td>0.7684</td>\n",
              "      <td>Medium</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3668-QPYBK</td>\n",
              "      <td>2</td>\n",
              "      <td>Yes</td>\n",
              "      <td>15</td>\n",
              "      <td>-0.5255</td>\n",
              "      <td>High</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>7795-CFOCW</td>\n",
              "      <td>45</td>\n",
              "      <td>No</td>\n",
              "      <td>3</td>\n",
              "      <td>0.6597</td>\n",
              "      <td>Medium</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>9237-HQITU</td>\n",
              "      <td>2</td>\n",
              "      <td>Yes</td>\n",
              "      <td>9</td>\n",
              "      <td>-0.5255</td>\n",
              "      <td>High</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "============================================================\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nEAK367ZulFu"
      },
      "source": [
        "---\n",
        "## 📌 Summary — What Was Done in This Notebook\n",
        "\n",
        "| Phase | Action | Result |\n",
        "|---|---|---|\n",
        "| **Real-World Data** | Loaded the Telco Customer Churn dataset from Kaggle | 7,043 real customers |\n",
        "| **Cleaning** | Fixed `TotalCharges`, encoded `Churn` as `Churn_binary` | Clean 14-column DataFrame |\n",
        "| **Synthetic Generation** | Generated 7 support variables, including a VADER sentiment score | 7 synthetic columns, designed to be statistically realistic |\n",
        "| **Merge** | Combined real + synthetic | 21-column final dataset (14 real + 7 synthetic) |\n",
        "| **Export** | Saved as CSV | `customer_churn_support_dataset.csv` |\n",
        "\n",
        "**➡️ Next Step:** Open `2_Churn_Data_Analysis_and_Insights.ipynb` and upload this CSV file."
      ]
    }
  ]
}