{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# ML Practice Series: Module 03 - Logistic Regression\n",
                "\n",
                "Welcome to Module 03! Today we dive into **Logistic Regression**, a linear model that estimates the probability of a binary outcome and a standard baseline for binary classification.\n",
                "\n",
                "### Resources:\n",
                "Refer to the **[Logistic Regression Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub to understand the Sigmoid function and how probability thresholds work.\n",
                "\n",
                "### Objectives:\n",
                "1. **Scaling**: Understand why feature scaling matters for Logistic Regression (solver convergence and the default L2 penalty).\n",
                "2. **Classification**: Distinguish between regression and classification, and see why Logistic Regression, despite its name, is a classifier.\n",
                "3. **Performance Metrics**: Learn how to interpret a Confusion Matrix and ROC Curve.\n",
                "\n",
                "---"
            ]
        },
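        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The Sigmoid function mentioned above turns a linear score `z = w^T x + b` into a probability `p = 1 / (1 + exp(-z))` between 0 and 1. A threshold (0.5 by default) then turns that probability into a class label. Below is a minimal NumPy sketch. It uses a handful of made-up scores instead of a trained model, and the function name `sigmoid` is chosen here for illustration:\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "def sigmoid(z):\n",
                "    # Maps any real-valued score to the open interval (0, 1)\n",
                "    return 1.0 / (1.0 + np.exp(-z))\n",
                "\n",
                "# Made-up linear scores z = w^T x + b for five hypothetical samples\n",
                "z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])\n",
                "p = sigmoid(z)\n",
                "\n",
                "# Predict class 1 when p > 0.5, which is equivalent to z > 0\n",
                "threshold = 0.5\n",
                "labels = (p > threshold).astype(int)\n",
                "\n",
                "print(np.round(p, 3))\n",
                "print(labels)  # [0 0 0 1 1]\n",
                "```\n",
                "\n",
                "Raising or lowering `threshold` changes which samples are labeled positive. The ROC curve in Task 4 summarizes that trade-off across every possible threshold."
            ]
        },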
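        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Setup\n",
                "We will use the **Breast Cancer Wisconsin (Diagnostic)** dataset bundled with Scikit-Learn. It has 569 tumor samples, 30 numeric features, and a binary `target` where `0` = malignant and `1` = benign."
            ]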
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "import numpy as np\n",
                "import matplotlib.pyplot as plt\n",
                "import seaborn as sns\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "from sklearn.linear_model import LogisticRegression\n",
                "from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_curve, auc\n",
                "\n",
                "# Load dataset\n",
                "data = load_breast_cancer()\n",
                "df = pd.DataFrame(data.data, columns=data.feature_names)\n",
                "df['target'] = data.target\n",
                "\n",
                "print(\"Dataset Shape:\", df.shape)\n",
                "df.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Preprocessing\n",
                "\n",
                "### Task 1: Train-Test Split\n",
                "Separate the features `X` from the label `y` (the `target` column), then split them into training and test sets with a test size of 0.25."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "X = df.drop('target', axis=1)\n",
                "y = df['target']\n",
                "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n",
                "```\n",
                "</details>"
            ]
        },
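        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "This is an optional variation on the solution above and is not required by the task. The classes are imbalanced, with more benign than malignant samples. Passing `stratify` to `train_test_split` keeps the class ratio the same in the training and test sets. The minimal sketch below uses the hypothetical names `X_all`, `y_all`, `X_tr`, `X_te`, `y_tr` and `y_te` so that it does not overwrite the notebook's own variables:\n",
                "\n",
                "```python\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.model_selection import train_test_split\n",
                "\n",
                "X_all, y_all = load_breast_cancer(return_X_y=True)\n",
                "\n",
                "# Same split as the solution, plus stratify=y_all to preserve class proportions\n",
                "X_tr, X_te, y_tr, y_te = train_test_split(\n",
                "    X_all, y_all, test_size=0.25, random_state=42, stratify=y_all\n",
                ")\n",
                "\n",
                "# The mean of a 0/1 label vector is the share of class 1 (benign)\n",
                "for name, labels in [('full', y_all), ('train', y_tr), ('test', y_te)]:\n",
                "    print(f'{name:>5}: n={len(labels)}, benign share={labels.mean():.3f}')\n",
                "```\n",
                "\n",
                "Stratification changes which rows land in each split, so downstream scores can differ slightly from the unstratified solution."
            ]
        },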
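        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Task 2: Standard Scaling\n",
                "Scale the features using `StandardScaler`. Fit the scaler on the training set only, then apply the same transformation to the test set. This prevents information from the test data from leaking into training.\n",
                "\n",
                "*Web Reference: Check the [Scaling Demo](https://aashishgarg13.github.io/DataScience/feature-engineering/) to see the visual differences between Standard and MinMax scalers.*"
            ]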
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "scaler = StandardScaler()\n",
                "X_train_scaled = scaler.fit_transform(X_train)\n",
                "X_test_scaled = scaler.transform(X_test)\n",
                "```\n",
                "</details>"
            ]
        },
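        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "`StandardScaler` standardizes each column as `(x - mean) / std`, using statistics learned from the training set only. The sketch below reproduces the Task 1 split under the hypothetical names `X_tr`, `X_te` and `sc`. It checks that arithmetic by hand and shows that the test set is transformed with the training statistics, not its own:\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "\n",
                "X_all, y_all = load_breast_cancer(return_X_y=True)\n",
                "X_tr, X_te, _, _ = train_test_split(X_all, y_all, test_size=0.25, random_state=42)\n",
                "\n",
                "# Raw features live on very different scales\n",
                "raw_std = X_tr.std(axis=0)\n",
                "print(f'raw column std: min={raw_std.min():.3g}, max={raw_std.max():.3g}')\n",
                "\n",
                "# fit() learns per-column mean_ and scale_ from the training data only\n",
                "sc = StandardScaler().fit(X_tr)\n",
                "X_tr_s = sc.transform(X_tr)\n",
                "X_te_s = sc.transform(X_te)  # test data reuses the training statistics\n",
                "\n",
                "# Manual standardization (x - mean) / std reproduces the scaler's output\n",
                "manual = (X_tr - X_tr.mean(axis=0)) / raw_std\n",
                "print('matches manual formula:', np.allclose(manual, X_tr_s))\n",
                "\n",
                "# Training columns now have mean 0 and std 1; test columns only approximately\n",
                "print('largest |train column mean|:', np.abs(X_tr_s.mean(axis=0)).max())\n",
                "print('largest |test column mean|: ', np.abs(X_te_s.mean(axis=0)).max())\n",
                "```\n",
                "\n",
                "Scaling matters for Logistic Regression in two ways. First, the default L2 penalty shrinks all coefficients by the same rule, which is only fair when features share a scale. Second, gradient-based solvers such as the default `lbfgs` tend to converge in fewer iterations on standardized inputs. On raw features they may hit `max_iter` and emit a `ConvergenceWarning`."
            ]
        },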
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. Modeling\n",
                "\n",
                "### Task 3: Training\n",
                "Initialize a `LogisticRegression` model and fit it on the scaled training data."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "model = LogisticRegression()\n",
                "model.fit(X_train_scaled, y_train)\n",
                "```\n",
                "</details>"
            ]
        },
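        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "After fitting, the model stores one weight per feature in `coef_` and a bias in `intercept_`. The sketch below rebuilds the Task 1 to 3 pipeline under the hypothetical names `bc`, `sc` and `clf`, so it runs on its own without touching `model`. It then applies the Sigmoid by hand to show how `decision_function`, `predict_proba` and `predict` relate for a binary problem with default settings:\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "import pandas as pd\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "from sklearn.linear_model import LogisticRegression\n",
                "\n",
                "bc = load_breast_cancer()\n",
                "X_tr, X_te, y_tr, y_te = train_test_split(bc.data, bc.target, test_size=0.25, random_state=42)\n",
                "sc = StandardScaler().fit(X_tr)\n",
                "X_tr_s, X_te_s = sc.transform(X_tr), sc.transform(X_te)\n",
                "clf = LogisticRegression().fit(X_tr_s, y_tr)\n",
                "\n",
                "# Linear score z = X w + b, one value per test sample\n",
                "z = X_te_s @ clf.coef_.ravel() + clf.intercept_[0]\n",
                "# Sigmoid turns the score into P(class 1 = benign)\n",
                "p = 1.0 / (1.0 + np.exp(-z))\n",
                "\n",
                "print('z matches decision_function:', np.allclose(z, clf.decision_function(X_te_s)))\n",
                "print('p matches predict_proba[:, 1]:', np.allclose(p, clf.predict_proba(X_te_s)[:, 1]))\n",
                "print('predict matches score > 0:', np.array_equal(clf.predict(X_te_s), (clf.decision_function(X_te_s) > 0).astype(int)))\n",
                "\n",
                "# Largest-magnitude weights on standardized features move the score the most\n",
                "weights = pd.Series(clf.coef_.ravel(), index=bc.feature_names)\n",
                "print(weights.sort_values(key=abs, ascending=False).head(5))\n",
                "```\n",
                "\n",
                "The features are standardized, so each weight is the change in the log-odds of benign per one-standard-deviation increase in that feature, holding the others fixed. Comparing magnitudes is a rough guide, not a rigorous importance measure, especially when features are strongly correlated."
            ]
        },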
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 4. Evaluation\n",
                "\n",
                "### Task 4: Confusion Matrix & ROC Curve\n",
                "Plot the confusion matrix, print the classification report, and plot the ROC curve together with its AUC score.\n",
                "\n",
                "*Web Reference: [Model Evaluation Interactive](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)*"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
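        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "y_pred = model.predict(X_test_scaled)\n",
                "y_proba = model.predict_proba(X_test_scaled)[:, 1]  # P(class 1 = benign)\n",
                "cm = confusion_matrix(y_test, y_pred)\n",
                "sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')\n",
                "plt.title('Confusion Matrix (rows: actual, columns: predicted)')\n",
                "plt.show()\n",
                "print(classification_report(y_test, y_pred, target_names=data.target_names))\n",
                "\n",
                "# ROC curve: TPR vs FPR over all probability thresholds\n",
                "fpr, tpr, _ = roc_curve(y_test, y_proba)\n",
                "roc_auc = auc(fpr, tpr)\n",
                "plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.3f})')\n",
                "plt.plot([0, 1], [0, 1], linestyle='--', label='Chance')\n",
                "plt.xlabel('False Positive Rate'); plt.ylabel('True Positive Rate'); plt.legend(); plt.show()\n",
                "```\n",
                "</details>"
            ]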
        },
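        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Each probability threshold produces one confusion matrix, and therefore one (false positive rate, true positive rate) point. The ROC curve traces those points as the threshold sweeps from 1 down to 0, which is how the two plots connect. The self-contained sketch below rebuilds the pipeline under the hypothetical names `sc`, `clf` and `proba`. It reads the four cells of the matrix at a few example thresholds, with class 1 (benign) as the positive class:\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.datasets import load_breast_cancer\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "from sklearn.linear_model import LogisticRegression\n",
                "from sklearn.metrics import confusion_matrix, roc_curve, auc, roc_auc_score\n",
                "\n",
                "X_all, y_all = load_breast_cancer(return_X_y=True)\n",
                "X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.25, random_state=42)\n",
                "sc = StandardScaler().fit(X_tr)\n",
                "clf = LogisticRegression().fit(sc.transform(X_tr), y_tr)\n",
                "proba = clf.predict_proba(sc.transform(X_te))[:, 1]  # P(class 1 = benign)\n",
                "\n",
                "for t in [0.3, 0.5, 0.7]:  # example thresholds chosen for illustration\n",
                "    pred = (proba > t).astype(int)\n",
                "    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()\n",
                "    tpr = tp / (tp + fn)  # recall of class 1 (benign)\n",
                "    fpr = fp / (fp + tn)  # share of malignant cases predicted benign\n",
                "    print(f't={t}: TN={tn} FP={fp} FN={fn} TP={tp}  TPR={tpr:.3f}  FPR={fpr:.3f}')\n",
                "\n",
                "# The ROC curve traces (FPR, TPR) over all thresholds; AUC is the area under it\n",
                "fpr_all, tpr_all, _ = roc_curve(y_te, proba)\n",
                "print('AUC via roc_curve + auc:', auc(fpr_all, tpr_all))\n",
                "print('AUC via roc_auc_score:  ', roc_auc_score(y_te, proba))\n",
                "```\n",
                "\n",
                "Here a false positive (FP) is a malignant tumor predicted as benign. A higher threshold for calling a tumor benign reduces those errors at the cost of more false negatives (benign tumors flagged as malignant)."
            ]
        },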
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "---\n",
                "### Excellent Work!\n",
                "You've covered the Logistic Regression basics: splitting and scaling the data, training the model, and evaluating it with a confusion matrix and ROC curve.\n",
                "In the next module, we move to non-linear models: **Decision Trees and Random Forests**."
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.8.0"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}