{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 12 - K-Nearest Neighbors (KNN)\n", "\n", "Welcome to Module 12! We're exploring **KNN**, a simple yet powerful instance-based learning algorithm used for both classification and regression.\n", "\n", "### Resources:\n", "Visit the **[KNN Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub to see how the decision boundary changes as you increase $K$ and how different distance metrics (Euclidean vs Manhattan) affect the results.\n", "\n", "### Objectives:\n", "1. **Instance-based Learning**: Understanding that KNN doesn't \"learn\" an explicit model; it stores the training data and defers the work to prediction time.\n", "2. **Feature Scaling**: Why it's critical for distance-based models: a feature with a larger numeric range otherwise dominates the distance.\n", "3. **The Elbow Method for K**: Choosing the number of neighbors from an error-rate plot.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup\n", "We will use the **Iris** dataset for this classification task." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.datasets import load_iris\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.metrics import classification_report, accuracy_score\n", "\n", "# Load dataset\n", "iris = load_iris()\n", "X = iris.data\n", "y = iris.target\n", "\n", "print(\"Features:\", iris.feature_names)\n", "print(\"Classes:\", iris.target_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Preprocessing\n", "\n", "### Task 1: Scaling is Mandatory\n", "Split the data (20% test), then scale it using `StandardScaler`: fit the scaler on the training split only and reuse it to transform the test split."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "scaler = StandardScaler()\n", "X_train = scaler.fit_transform(X_train)\n", "X_test = scaler.transform(X_test)\n", "```\n", "
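
The effect of the scaling step can be checked directly. The sketch below is a minimal, self-contained illustration, assuming the same split settings as the solution above; the names `raw_ranges`, `train_means`, and `train_stds` are introduced here for illustration only. It prints the raw feature ranges, then confirms that the scaled training features have roughly zero mean and unit standard deviation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same split settings as the solution above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Raw features span different ranges, so the widest one weighs most in a distance
raw_ranges = X_train.max(axis=0) - X_train.min(axis=0)
print('Raw feature ranges (train):', np.round(raw_ranges, 2))

# Fit on the training split only, then reuse the same statistics for the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
print('Scaled train means:', np.round(train_means, 3))
print('Scaled train stds:', np.round(train_stds, 3))
```

The test split is transformed with the training statistics, so its means and standard deviations will generally be close to, but not exactly, 0 and 1.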
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Training & Tuning\n", "\n", "### Task 2: Choosing K\n", "Loop through values of $K$ from 1 to 20 and plot the error rate to find the \"elbow\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "error_rate = []\n", "for i in range(1, 21):\n", "    knn = KNeighborsClassifier(n_neighbors=i)\n", "    knn.fit(X_train, y_train)\n", "    pred_i = knn.predict(X_test)\n", "    error_rate.append(np.mean(pred_i != y_test))\n", "\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(range(1, 21), error_rate, color='blue', linestyle='dashed', marker='o')\n", "plt.title('Error Rate vs. K Value')\n", "plt.xlabel('K')\n", "plt.ylabel('Error Rate')\n", "plt.show()\n", "```\n", "
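
The loop above scores each $K$ on the test set, which is fine for practice but means the test set also influences the choice of $K$. One possible variant, sketched here as an assumption rather than as part of the module, picks $K$ with 5-fold cross-validation on the training split only. The pipeline, the `cv=5` setting, and the names `k_values`, `cv_error`, and `best_k` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

k_values = list(range(1, 21))
cv_error = []
for k in k_values:
    # The pipeline refits the scaler inside each fold, so no fold sees held-out statistics
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train, y_train, cv=5)  # accuracy per fold
    cv_error.append(1.0 - scores.mean())

# Smallest K among those with the lowest mean cross-validated error
best_k = k_values[int(np.argmin(cv_error))]
print('Best K by 5-fold CV:', best_k)
```

The `cv_error` list can be plotted against `k_values` in the same way as `error_rate` above, and the test set is then used only once, for the final evaluation.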
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Final Evaluation\n", "\n", "### Task 3: Train Final Model\n", "Based on your plot, choose the best $K$ and print the classification report." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "# K=3 is an example; use the value your own plot suggests\n", "knn = KNeighborsClassifier(n_neighbors=3)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)\n", "print(classification_report(y_test, y_pred))\n", "```\n", "
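
The Resources section mentions comparing Euclidean and Manhattan distance. A minimal sketch of that comparison is shown below, assuming the same split and scaling as the earlier solutions and the example value $K=3$; the `results` dictionary is a name introduced here for illustration. It uses the `metric` parameter of `KNeighborsClassifier` and reports test accuracy for each metric.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

results = {}
for metric in ('euclidean', 'manhattan'):
    # Same K for both runs so that only the distance metric changes
    knn = KNeighborsClassifier(n_neighbors=3, metric=metric)
    knn.fit(X_train_scaled, y_train)
    results[metric] = accuracy_score(y_test, knn.predict(X_test_scaled))
    print(metric, 'accuracy:', round(results[metric], 3))
```

On a small, well-separated dataset such as Iris the two metrics may give very similar scores; the difference tends to matter more on higher-dimensional or noisier data.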
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Great Job! \n", "You've mastered one of the most intuitive algorithms in ML.\n", "Next: **Naive Bayes**." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }