{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 03 - Logistic Regression\n", "\n", "Welcome to Module 03! Today we dive into **Logistic Regression**, a standard first choice for binary classification.\n", "\n", "### Resources:\n", "Refer to the **[Logistic Regression Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub to understand the sigmoid function and how probability thresholds work.\n", "\n", "### Objectives:\n", "1. **Scaling**: Understand why feature scaling matters when fitting logistic regression (solver convergence and regularization both depend on feature scale).\n", "2. **Classification**: Distinguish between regression and classification.\n", "3. **Performance Metrics**: Learn how to interpret a Confusion Matrix and ROC Curve.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup\n", "We will use the **Breast Cancer Wisconsin** dataset from Scikit-Learn (binary target: 0 = malignant, 1 = benign)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.datasets import load_breast_cancer\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_curve, auc\n", "\n", "# Load the dataset into a DataFrame and append the target column\n", "data = load_breast_cancer()\n", "df = pd.DataFrame(data.data, columns=data.feature_names)\n", "df['target'] = data.target\n", "\n", "print(\"Dataset Shape:\", df.shape)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Preprocessing\n", "\n", "### Task 1: Train-Test Split\n", "Split the features `X` and labels `y` into training and test sets with a test size of 0.25." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "X = df.drop('target', axis=1)\n", "y = df['target']\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n", "```\n", "
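Conceptually, the split shuffles the row indices with a fixed seed and holds out a quarter of them. The sketch below shows that idea in plain Python; `split_indices` is a hypothetical helper, rounding the test size up is an assumption of this sketch, and 569 is the number of rows in scikit-learn's copy of this dataset.

```python
import math
import random

def split_indices(n_samples, test_size=0.25, seed=42):
    # Shuffle row indices with a seeded generator so the split is reproducible.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Hold out the first n_test shuffled indices for testing (rounded up here).
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(569)
print(len(train_idx), len(test_idx))

# The same seed always reproduces the same split.
assert split_indices(569) == (train_idx, test_idx)
```

A different seed gives a different but equally valid split, which is why `random_state=42` is fixed in the solution. For imbalanced labels, `train_test_split` also accepts `stratify=y` to keep class proportions similar in both splits.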
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2: Standard Scaling\n", "Scale the features using `StandardScaler`.\n", "\n", "*Web Reference: Check the [Scaling Demo](https://aashishgarg13.github.io/DataScience/feature-engineering/) to see visual differences between Standard and MinMax scalers.*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "scaler = StandardScaler()\n", "X_train_scaled = scaler.fit_transform(X_train)\n", "X_test_scaled = scaler.transform(X_test)\n", "```\n", "
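The solution fits the scaler on the training split only and reuses those statistics on the test split, which keeps test-set information out of preprocessing. A minimal pure-Python sketch of the same idea for a single feature column follows; the helper names `fit_standardizer` and `standardize` and the toy values are hypothetical and are not part of scikit-learn.

```python
import statistics

def fit_standardizer(train_column):
    # Learn the mean and population standard deviation from training data only.
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    return mean, std

def standardize(column, mean, std):
    # Apply z = (x - mean) / std using the statistics learned from training data.
    return [(x - mean) / std for x in column]

train_column = [10.0, 12.0, 14.0, 16.0, 18.0]  # toy values, not from the dataset
test_column = [11.0, 20.0]

mean, std = fit_standardizer(train_column)
train_scaled = standardize(train_column, mean, std)
test_scaled = standardize(test_column, mean, std)

print(statistics.fmean(train_scaled), statistics.pstdev(train_scaled))
print(test_scaled)
```

The scaled test values do not have zero mean, and that is expected, since they are expressed relative to the training distribution.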
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Modeling\n", "\n", "### Task 3: Training\n", "Initialize and fit the `LogisticRegression` model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "model = LogisticRegression()\n", "model.fit(X_train_scaled, y_train)\n", "```\n", "
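For intuition about what the fitted model computes, the sketch below applies the sigmoid function to a linear score and then thresholds the resulting probability. It is a hand-written illustration, not scikit-learn's implementation; the weights, bias, and the helper names `sigmoid`, `predict_proba_one`, and `classify` are made up for the example.

```python
import math

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(features, weights, bias):
    # Linear score z = w . x + b, squashed through the sigmoid.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def classify(probability, threshold=0.5):
    # Predict class 1 when the probability reaches the threshold.
    return 1 if probability >= threshold else 0

weights = [0.8, -1.2]   # hypothetical coefficients
bias = 0.1              # hypothetical intercept
sample = [1.5, 0.5]     # one already-scaled sample

p = predict_proba_one(sample, weights, bias)
print(round(p, 3), classify(p), classify(p, threshold=0.7))
```

Raising the threshold makes the model more conservative about predicting class 1; the ROC curve in the next section shows that trade-off across all thresholds.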
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Evaluation\n", "\n", "### Task 4: Confusion Matrix & ROC Curve\n", "Plot the confusion matrix and calculate the ROC-AUC score.\n", "\n", "*Web Reference: [Model Evaluation Interactive](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
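\n", "Click to see Solution\n", "\n", "```python\n", "y_pred = model.predict(X_test_scaled)\n", "cm = confusion_matrix(y_test, y_pred)\n", "sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')\n", "plt.title('Confusion Matrix')\n", "plt.show()\n", "\n", "print(classification_report(y_test, y_pred))\n", "# ROC curve and AUC from the predicted probability of class 1\n", "y_prob = model.predict_proba(X_test_scaled)[:, 1]\n", "fpr, tpr, _ = roc_curve(y_test, y_prob)\n", "roc_auc = auc(fpr, tpr)\n", "plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')\n", "plt.legend()\n", "plt.show()\n", "```\n", "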
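To make the heatmap easier to read, the sketch below counts the four cells of a binary confusion matrix by hand and derives precision and recall from them. The labels are toy values rather than model output, and `confusion_counts` is a hypothetical helper; the row and column order mirrors what `confusion_matrix` returns for labels 0 and 1 (rows are true classes, columns are predicted classes).

```python
def confusion_counts(y_true, y_pred):
    # Count true negatives, false positives, false negatives, true positives.
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

y_true_toy = [1, 0, 1, 1, 0, 0, 1, 0]  # toy labels, not from the dataset
y_pred_toy = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true_toy, y_pred_toy)
matrix = [[tn, fp], [fn, tp]]  # same layout as sklearn's 2x2 result
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(matrix, precision, recall)
```

Because class 1 is benign in this dataset, a false positive (top-right cell) is a malignant case predicted as benign, which is usually the costlier mistake to make.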
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Excellent Work! \n", "You've mastered Logistic Regression basics and integrated it with your website resources.\n", "In the next module, we move to non-linear models: **Decision Trees and Random Forests**." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }