{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# ML Practice Series: Module 12 - K-Nearest Neighbors (KNN)\n",
                "\n",
                "Welcome to Module 12! We're exploring **KNN**, a simple yet powerful instance-based learning algorithm used for both classification and regression.\n",
                "\n",
                "### Resources:\n",
                "Visit the **[KNN Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub to see how the decision boundary changes as you increase $K$ and how different distance metrics (Euclidean vs Manhattan) affect the results.\n",
                "\n",
                "### Objectives:\n",
                "1. **Instance-based Learning**: Understanding that KNN doesn't \"learn\" a model; it stores the training data and computes distances at prediction time.\n",
                "2. **Feature Scaling**: Why distance-based models need it: features with large numeric ranges otherwise dominate the distance.\n",
                "3. **The Elbow Method for K**: Choosing the number of neighbors by plotting the error rate against $K$.\n",
                "\n",
                "---"
            ]
        },
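        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To make the instance-based idea from the objectives concrete, here is a minimal sketch of a KNN classifier written directly in NumPy. It is an illustration, not how scikit-learn implements it: the function name `knn_predict`, its parameters, and the toy data are all made up for this example. There is no training step; prediction computes the distance from the query to every stored point and takes a majority vote among the $K$ closest.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from collections import Counter\n",
                "\n",
                "def knn_predict(X_train, y_train, x_query, k=3, metric='euclidean'):\n",
                "    # Hypothetical helper: classify one query point by majority vote\n",
                "    diff = X_train - x_query\n",
                "    if metric == 'euclidean':\n",
                "        dists = np.sqrt((diff ** 2).sum(axis=1))\n",
                "    elif metric == 'manhattan':\n",
                "        dists = np.abs(diff).sum(axis=1)\n",
                "    else:\n",
                "        raise ValueError(f'unknown metric: {metric}')\n",
                "    # Indices of the k stored points closest to the query\n",
                "    nearest = np.argsort(dists)[:k]\n",
                "    # Most common label among those neighbors\n",
                "    votes = Counter(y_train[nearest].tolist())\n",
                "    return votes.most_common(1)[0][0]\n",
                "\n",
                "# Toy data (invented for illustration): two well-separated clusters\n",
                "X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],\n",
                "                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])\n",
                "y_toy = np.array([0, 0, 0, 1, 1, 1])\n",
                "\n",
                "print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))\n",
                "print(knn_predict(X_toy, y_toy, np.array([4.9, 5.0]), k=3, metric='manhattan'))\n",
                "```\n",
                "\n",
                "The first query sits inside the class 0 cluster and the second inside the class 1 cluster, so the two calls return 0 and 1. For the rest of this module we use `KNeighborsClassifier`, which does the same job with faster neighbor search."
            ]
        },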
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Setup\n",
                "We will use the **Iris** dataset for this classification task."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "import numpy as np\n",
                "import matplotlib.pyplot as plt\n",
                "import seaborn as sns\n",
                "from sklearn.datasets import load_iris\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "from sklearn.neighbors import KNeighborsClassifier\n",
                "from sklearn.metrics import classification_report, accuracy_score\n",
                "\n",
                "# Load dataset\n",
                "iris = load_iris()\n",
                "X = iris.data\n",
                "y = iris.target\n",
                "\n",
                "print(\"Features:\", iris.feature_names)\n",
                "print(\"Classes:\", iris.target_names)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Preprocessing\n",
                "\n",
                "### Task 1: Scaling is Mandatory\n",
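                "Split the data (20% test), then fit `StandardScaler` on the training set only and apply the same transform to the test set, so that no test-set statistics leak into training."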
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
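                "# Hold out 20% of the data for testing; random_state makes the split reproducible\n",
                "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
                "# Fit the scaler on the training data only, then reuse it on the test data\n",
                "scaler = StandardScaler()\n",
                "X_train = scaler.fit_transform(X_train)\n",
                "X_test = scaler.transform(X_test)\n",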
                "```\n",
                "</details>"
            ]
        },
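        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Iris features are all in centimeters, so the effect of scaling is easy to miss there. The sketch below uses a small hypothetical dataset (income in dollars, age in years; the numbers are invented) to show what happens when features live on very different scales. The helper `euclid` is also just for this example.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "\n",
                "# Hypothetical data: column 0 is income in dollars, column 1 is age in years\n",
                "X_demo = np.array([[30000.0, 25.0],\n",
                "                   [30500.0, 60.0],\n",
                "                   [90000.0, 26.0],\n",
                "                   [60000.0, 40.0]])\n",
                "\n",
                "def euclid(a, b):\n",
                "    # Euclidean distance between two 1-D points\n",
                "    return float(np.sqrt(((a - b) ** 2).sum()))\n",
                "\n",
                "# Rows 0 and 1: similar income, very different age\n",
                "# Rows 0 and 2: similar age, very different income\n",
                "d01_raw = euclid(X_demo[0], X_demo[1])\n",
                "d02_raw = euclid(X_demo[0], X_demo[2])\n",
                "\n",
                "# Standardize each column to zero mean and unit variance\n",
                "X_scaled = StandardScaler().fit_transform(X_demo)\n",
                "d01_scaled = euclid(X_scaled[0], X_scaled[1])\n",
                "d02_scaled = euclid(X_scaled[0], X_scaled[2])\n",
                "\n",
                "print('raw:   ', d01_raw, d02_raw)\n",
                "print('scaled:', d01_scaled, d02_scaled)\n",
                "```\n",
                "\n",
                "In raw units the income gap makes the second pair look more than 100 times farther apart, and the 35-year age difference in the first pair barely registers. After standardizing, the two distances are of similar size, so both features get a say in who counts as a neighbor."
            ]
        },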
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. Training & Tuning\n",
                "\n",
                "### Task 2: Choosing K\n",
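                "Loop through values of $K$ from 1 to 20, record the test-set error rate for each, and plot the results to find the \"elbow\": the point past which a larger $K$ no longer reduces the error noticeably."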
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "# Test-set error rate (fraction of misclassified samples) for each K from 1 to 20\n",
                "error_rate = []\n",
                "for i in range(1, 21):\n",
                "    knn = KNeighborsClassifier(n_neighbors=i)\n",
                "    knn.fit(X_train, y_train)\n",
                "    pred_i = knn.predict(X_test)\n",
                "    error_rate.append(np.mean(pred_i != y_test))\n",
                "\n",
                "plt.figure(figsize=(10,6))\n",
                "plt.plot(range(1,21), error_rate, color='blue', linestyle='dashed', marker='o')\n",
                "plt.title('Error Rate vs. K Value')\n",
                "plt.xlabel('K')\n",
                "plt.ylabel('Error Rate')\n",
                "plt.show()\n",
                "```\n",
                "</details>"
            ]
        },
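        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The loop above selects $K$ by looking at test-set error, which tends to make the final test score look a little better than it should. A common alternative, sketched here as one possible approach, is to choose $K$ with cross-validation on the training set and keep the test set for the final check only. The block is self-contained: it reloads Iris and uses its own variable names (`X_tr`, `cv_error`, `best_k`), which are assumptions for this example. Putting the scaler in a pipeline means it is refit inside each fold.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.datasets import load_iris\n",
                "from sklearn.model_selection import train_test_split, cross_val_score\n",
                "from sklearn.neighbors import KNeighborsClassifier\n",
                "from sklearn.pipeline import make_pipeline\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "\n",
                "X, y = load_iris(return_X_y=True)\n",
                "X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)\n",
                "\n",
                "k_values = list(range(1, 21))\n",
                "cv_error = []\n",
                "for k in k_values:\n",
                "    # Scaling happens inside the pipeline, so each fold fits its own scaler\n",
                "    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))\n",
                "    scores = cross_val_score(model, X_tr, y_tr, cv=5)  # accuracy per fold\n",
                "    cv_error.append(1 - scores.mean())\n",
                "\n",
                "# K with the lowest mean cross-validated error (first one on ties)\n",
                "best_k = k_values[int(np.argmin(cv_error))]\n",
                "print('Best K by 5-fold CV:', best_k)\n",
                "```\n",
                "\n",
                "You can plot `cv_error` against `k_values` exactly as in the solution above and look for the elbow in that curve instead."
            ]
        },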
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 4. Final Evaluation\n",
                "\n",
                "### Task 3: Train Final Model\n",
                "Based on your plot, choose the best $K$ and print the classification report."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "# K=3 is only an example; use the value your error-rate plot suggests\n",
                "knn = KNeighborsClassifier(n_neighbors=3)\n",
                "knn.fit(X_train, y_train)\n",
                "y_pred = knn.predict(X_test)\n",
                "print(classification_report(y_test, y_pred))\n",
                "```\n",
                "</details>"
            ]
        },
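        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The classification report gives per-class precision and recall, but not which classes get mixed up with which. As an optional extra, a confusion matrix shows that directly. The sketch below is self-contained and repeats the split and scaling from Task 1 under its own names (`X_tr`, `X_te`, `cm`); `n_neighbors=3` is again just an example value.\n",
                "\n",
                "```python\n",
                "from sklearn.datasets import load_iris\n",
                "from sklearn.metrics import confusion_matrix\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.neighbors import KNeighborsClassifier\n",
                "from sklearn.preprocessing import StandardScaler\n",
                "\n",
                "X, y = load_iris(return_X_y=True)\n",
                "X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)\n",
                "\n",
                "# Fit the scaler on the training data only\n",
                "scaler = StandardScaler()\n",
                "X_tr = scaler.fit_transform(X_tr)\n",
                "X_te = scaler.transform(X_te)\n",
                "\n",
                "knn = KNeighborsClassifier(n_neighbors=3)  # example K\n",
                "knn.fit(X_tr, y_tr)\n",
                "\n",
                "# Rows are true classes, columns are predicted classes\n",
                "cm = confusion_matrix(y_te, knn.predict(X_te), labels=[0, 1, 2])\n",
                "print(cm)\n",
                "```\n",
                "\n",
                "Off-diagonal entries count the mistakes: the entry in row $i$, column $j$ is the number of class $i$ samples predicted as class $j$."
            ]
        },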
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "---\n",
                "### Great Job!\n",
                "You've mastered one of the most intuitive algorithms in ML.\n",
                "Next: **Naive Bayes**."
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.12.7"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}