{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 03 - Logistic Regression\n",
"\n",
"Welcome to Module 03! Today we dive into **Logistic Regression**, a standard baseline algorithm for binary classification.\n",
"\n",
"### Resources:\n",
"Refer to the **[Logistic Regression Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub to understand the Sigmoid function and how probability thresholds work.\n",
"\n",
"### Objectives:\n",
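"1. **Scaling**: Understand why feature scaling matters for Logistic Regression (faster solver convergence, and a regularization penalty that treats all features evenly).\n",
"2. **Classification**: Distinguish classification (predicting a discrete class label) from regression (predicting a continuous value).\n",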
"3. **Performance Metrics**: Learn how to interpret a Confusion Matrix and ROC Curve.\n",
"\n",
"---"
]
},
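{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Optional sketch:* here is a minimal NumPy illustration of the two ideas from the resource above. The Sigmoid function squashes a linear score `z` into a probability between 0 and 1, and a threshold turns that probability into a class label. The scores in `z` are made-up values, and the `sigmoid` helper is written by hand here (it is not a scikit-learn function).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    # Maps any real-valued score to a probability in (0, 1)\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"# Hypothetical linear scores for four samples\n",
"z = np.array([-3.0, -0.5, 0.5, 2.0])\n",
"probs = sigmoid(z)\n",
"\n",
"# Threshold 0.5: predict class 1 when the probability exceeds it\n",
"preds = (probs > 0.5).astype(int)\n",
"# A stricter threshold of 0.7 labels fewer samples as class 1\n",
"strict_preds = (probs > 0.7).astype(int)\n",
"\n",
"print(probs.round(3))\n",
"print(preds, strict_preds)\n",
"```\n",
"\n",
"Raising the threshold makes the model more conservative about predicting class 1, which usually raises precision at the cost of recall. scikit-learn's `predict()` uses the 0.5 cut-off for binary Logistic Regression."
]
},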
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"We will use the **Breast Cancer Wisconsin** dataset from Scikit-Learn: 569 samples described by 30 numeric features, with `target` = 0 for malignant and 1 for benign."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_curve, auc\n",
"\n",
"# Load dataset\n",
"data = load_breast_cancer()\n",
"df = pd.DataFrame(data.data, columns=data.feature_names)\n",
"df['target'] = data.target\n",
"\n",
"print(\"Dataset Shape:\", df.shape)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Preprocessing\n",
"\n",
"### Task 1: Train-Test Split\n",
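"Split the data into features `X` (every column except `target`) and labels `y` (the `target` column), holding out 25% of the rows as a test set (`test_size=0.25`)."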
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Solution:**\n",
"\n",
"```python\n",
"X = df.drop('target', axis=1)\n",
"y = df['target']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n",
"```\n",
" "
]
},
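{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what `test_size=0.25` does, this small check rebuilds the split with the same arguments as the solution and prints the resulting shapes and the share of benign samples (class 1) in each part. The `stratify=y` argument mentioned in the comment is an optional `train_test_split` parameter that the solution above does not use.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n",
"\n",
"# 25% of 569 rows is 142.25; scikit-learn rounds the test size up to 143 rows\n",
"print('Train:', X_train.shape, 'Test:', X_test.shape)\n",
"\n",
"# Fraction of benign (class 1) samples per split; passing stratify=y would keep these nearly equal\n",
"print('Benign share, train:', round(y_train.mean(), 3), 'test:', round(y_test.mean(), 3))\n",
"```"
]
},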
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 2: Standard Scaling\n",
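"Scale the features using `StandardScaler`. Fit the scaler on the training set only, then apply the same transformation to the test set, so that no statistics from the test data leak into training.\n",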
"\n",
"*Web Reference: Check the [Scaling Demo](https://aashishgarg13.github.io/DataScience/feature-engineering/) to see visual differences between Standard and MinMax scalers.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Solution:**\n",
"\n",
"```python\n",
"scaler = StandardScaler()\n",
"X_train_scaled = scaler.fit_transform(X_train)\n",
"X_test_scaled = scaler.transform(X_test)\n",
"```\n",
" "
]
},
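{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two scalers from the Scaling Demo can be checked against their formulas on a tiny made-up matrix (`X_train_toy` and `X_test_toy` are hypothetical). `StandardScaler` computes `(x - mean) / std` and `MinMaxScaler` computes `(x - min) / (max - min)`, in both cases using statistics learned from the training rows only.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
"\n",
"# Hypothetical data: two features on very different scales\n",
"X_train_toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])\n",
"X_test_toy = np.array([[5.0, 500.0]])\n",
"\n",
"std_scaler = StandardScaler().fit(X_train_toy)\n",
"mm_scaler = MinMaxScaler().fit(X_train_toy)\n",
"\n",
"# Same formulas by hand, using training statistics only (np.std uses ddof=0, like StandardScaler)\n",
"mean, std = X_train_toy.mean(axis=0), X_train_toy.std(axis=0)\n",
"lo, hi = X_train_toy.min(axis=0), X_train_toy.max(axis=0)\n",
"manual_std = (X_test_toy - mean) / std\n",
"manual_mm = (X_test_toy - lo) / (hi - lo)\n",
"\n",
"print(std_scaler.transform(X_test_toy), manual_std)\n",
"# Test values outside the training range fall outside [0, 1] after min-max scaling\n",
"print(mm_scaler.transform(X_test_toy), manual_mm)\n",
"```\n",
"\n",
"After either transformation both features sit on the same scale, so neither one dominates the regularization penalty in the `LogisticRegression` fit."
]
},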
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Modeling\n",
"\n",
"### Task 3: Training\n",
"Initialize a `LogisticRegression` model and fit it on the scaled training data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Solution:**\n",
"\n",
"```python\n",
"model = LogisticRegression()\n",
"model.fit(X_train_scaled, y_train)\n",
"```\n",
" "
]
},
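{
"cell_type": "markdown",
"metadata": {},
"source": [
"To connect the fitted model back to the Sigmoid, the sketch below (re-running the earlier steps so it stands alone) checks that `decision_function` is the linear score built from `coef_` and `intercept_`, and that the class-1 column of `predict_proba` is the Sigmoid of that score. This holds for binary Logistic Regression in scikit-learn; the names `z`, `manual_z`, and `p` are just for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Repeat the split, scaling, and training steps from above\n",
"X, y = load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n",
"scaler = StandardScaler().fit(X_train)\n",
"model = LogisticRegression().fit(scaler.transform(X_train), y_train)\n",
"X_test_scaled = scaler.transform(X_test)\n",
"\n",
"# Linear score z = X @ w + b for every test sample\n",
"z = model.decision_function(X_test_scaled)\n",
"manual_z = X_test_scaled @ model.coef_.ravel() + model.intercept_[0]\n",
"\n",
"# Column 1 of predict_proba is P(class 1 = benign); it equals sigmoid(z)\n",
"p = model.predict_proba(X_test_scaled)[:, 1]\n",
"\n",
"print(np.allclose(z, manual_z))\n",
"print(np.allclose(p, 1.0 / (1.0 + np.exp(-z))))\n",
"```"
]
},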
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Evaluation\n",
"\n",
"### Task 4: Confusion Matrix & ROC Curve\n",
"Plot the confusion matrix and calculate the ROC-AUC score.\n",
"\n",
"*Web Reference: [Model Evaluation Interactive](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Solution:**\n",
"\n",
"```python\n",
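"y_pred = model.predict(X_test_scaled)\n",
"cm = confusion_matrix(y_test, y_pred)\n",
"sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')\n",
"plt.title('Confusion Matrix')\n",
"plt.show()\n",
"print(classification_report(y_test, y_pred))\n",
"# ROC uses the predicted probability of class 1 (benign), not hard labels\n",
"y_proba = model.predict_proba(X_test_scaled)[:, 1]\n",
"fpr, tpr, _ = roc_curve(y_test, y_proba)\n",
"roc_auc = auc(fpr, tpr)\n",
"plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.3f})')\n",
"plt.plot([0, 1], [0, 1], 'k--', label='Chance level')\n",
"plt.legend()\n",
"plt.show()\n",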
"```\n",
" "
]
},
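{
"cell_type": "markdown",
"metadata": {},
"source": [
"To read the confusion matrix and the ROC-AUC by hand, here is a sketch on eight made-up samples (`y_true` and `y_score` are hypothetical). It unpacks the four cells of scikit-learn's binary confusion matrix, derives precision and recall from them, and checks that ROC-AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.metrics import confusion_matrix, roc_auc_score\n",
"\n",
"# Hypothetical true labels and predicted P(class 1) for 8 samples\n",
"y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])\n",
"y_score = np.array([0.1, 0.6, 0.3, 0.9, 0.8, 0.4, 0.7, 0.95])\n",
"y_pred = (y_score > 0.5).astype(int)\n",
"\n",
"# Rows = true class, columns = predicted class, ordered [0, 1],\n",
"# so ravel() returns TN, FP, FN, TP for a binary problem\n",
"tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()\n",
"precision = tp / (tp + fp)  # of the samples predicted 1, the share that are truly 1\n",
"recall = tp / (tp + fn)  # of the truly-1 samples, the share the model caught\n",
"\n",
"# ROC-AUC = probability that a random positive is scored above a random negative (ties count half)\n",
"pos, neg = y_score[y_true == 1], y_score[y_true == 0]\n",
"pairwise_auc = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])\n",
"\n",
"print(tn, fp, fn, tp, precision, recall)\n",
"print(pairwise_auc, roc_auc_score(y_true, y_score))\n",
"```\n",
"\n",
"Keep in mind that in this dataset class 1 is *benign*, so the recall for malignant tumors is the one reported for class 0 in `classification_report`."
]
},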
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Excellent Work! \n",
"You have trained a Logistic Regression classifier, evaluated it with a confusion matrix, a classification report, and an ROC curve, and connected each step to the matching web resources.\n",
"In the next module, we move to non-linear models: **Decision Trees and Random Forests**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}