{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 12 - K-Nearest Neighbors (KNN)\n",
"\n",
"Welcome to Module 12! We're exploring **KNN**, a simple yet powerful instance-based learning algorithm used for both classification and regression.\n",
"\n",
"### Resources:\n",
"Visit the **[KNN Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub to see how the decision boundary changes as you increase $K$ and how different distance metrics (Euclidean vs Manhattan) affect the results.\n",
"\n",
"### Objectives:\n",
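"1. **Instance-based Learning**: Understanding that KNN doesn't \"learn\" a model; it stores the training data and defers computation to prediction time.\n",
"2. **Feature Scaling**: Why distance-based models need it: without scaling, the feature with the largest numeric range dominates the distance.\n",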
"3. **The Elbow Method for K**: Choosing the optimal number of neighbors.\n",
"\n",
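"Objective 1 can be made concrete with a minimal sketch, assuming Euclidean distance and a simple majority vote. The helper name `knn_predict` and the toy points are made up for illustration and are not part of scikit-learn. Note that nothing is fitted: prediction just searches the stored points.\n",
"\n",
"```python\n",
"import math\n",
"from collections import Counter\n",
"\n",
"def knn_predict(train_X, train_y, query, k=3):\n",
"    # Illustrative helper (hypothetical name): there is no training step,\n",
"    # the stored points are searched at prediction time.\n",
"    distances = sorted(\n",
"        (math.dist(x, query), label) for x, label in zip(train_X, train_y)\n",
"    )\n",
"    # Majority vote among the k closest stored points.\n",
"    votes = Counter(label for _, label in distances[:k])\n",
"    return votes.most_common(1)[0][0]\n",
"\n",
"# Toy data, made up for illustration.\n",
"train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]\n",
"train_y = ['a', 'a', 'a', 'b', 'b', 'b']\n",
"\n",
"near_a = knn_predict(train_X, train_y, (1.1, 1.0))\n",
"near_b = knn_predict(train_X, train_y, (5.0, 5.2))\n",
"print(near_a, near_b)\n",
"```\n",
"\n",
"Every prediction scans all stored points, which is why KNN is cheap to set up but can be slow to query on large datasets.\n",
"\n",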
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"We will use the **Iris** dataset for this classification task."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.metrics import classification_report, accuracy_score\n",
"\n",
"# Load dataset\n",
"iris = load_iris()\n",
"X = iris.data\n",
"y = iris.target\n",
"\n",
"print(\"Features:\", iris.feature_names)\n",
"print(\"Classes:\", iris.target_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Preprocessing\n",
"\n",
"### Task 1: Scaling is Mandatory\n",
"Split the data (20% test) and scale it using `StandardScaler`. Fit the scaler on the training split only, then apply the same transform to the test split."
]
},
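{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why scaling matters before attempting the task, here is a small sketch using only the standard library. The three points (height in metres, weight in grams) are made-up values chosen so the two features have very different ranges, and `standardize` is a hypothetical helper that z-scores each column.\n",
"\n",
"```python\n",
"import math\n",
"import statistics\n",
"\n",
"# Made-up points: (height in metres, weight in grams).\n",
"points = [(1.60, 60000.0), (1.90, 61000.0), (1.61, 80000.0)]\n",
"query = points[0]\n",
"\n",
"def standardize(rows):\n",
"    # Z-score each column: subtract the mean, divide by the standard deviation.\n",
"    cols = list(zip(*rows))\n",
"    means = [statistics.mean(c) for c in cols]\n",
"    stds = [statistics.pstdev(c) for c in cols]\n",
"    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]\n",
"\n",
"# Distances from the first point to the other two, before and after scaling.\n",
"raw = [math.dist(query, p) for p in points[1:]]\n",
"scaled_points = standardize(points)\n",
"scaled = [math.dist(scaled_points[0], p) for p in scaled_points[1:]]\n",
"\n",
"print('raw distances:', raw)\n",
"print('scaled distances:', scaled)\n",
"```\n",
"\n",
"In raw units the weight column decides everything, so the second point looks far closer than the third even though its height is very different. After standardization both features contribute on a comparable scale and the two distances come out nearly equal."
]
},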
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
"scaler = StandardScaler()\n",
"# Fit on the training split only, then reuse the same statistics for the test split\n",
"X_train = scaler.fit_transform(X_train)\n",
"X_test = scaler.transform(X_test)\n",
"```\n",
"</details>"
]
},
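{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to scaling by hand, the same steps can be bundled with scikit-learn's `make_pipeline`, which fits the scaler on whatever data is passed to `fit` and reuses those statistics at prediction time. This is a sketch rather than the module's required approach; `n_neighbors=5` is simply the library default, not a tuned value.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.2, random_state=42\n",
")\n",
"\n",
"# The pipeline scales and classifies in one object, so the scaler\n",
"# never sees the test split during fitting.\n",
"model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))\n",
"model.fit(X_train, y_train)\n",
"test_accuracy = model.score(X_test, y_test)\n",
"print('Test accuracy:', test_accuracy)\n",
"```"
]
},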
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training & Tuning\n",
"\n",
"### Task 2: Choosing K\n",
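"Loop through values of $K$ from 1 to 20, record the test error rate for each, and plot the results to find the \"elbow\": the smallest $K$ beyond which the error stops improving noticeably."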
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"error_rate = []\n",
"for i in range(1, 21):\n",
"    knn = KNeighborsClassifier(n_neighbors=i)\n",
"    knn.fit(X_train, y_train)\n",
"    pred_i = knn.predict(X_test)\n",
"    # Fraction of misclassified test samples for this K\n",
"    error_rate.append(np.mean(pred_i != y_test))\n",
"\n",
"plt.figure(figsize=(10,6))\n",
"plt.plot(range(1,21), error_rate, color='blue', linestyle='dashed', marker='o')\n",
"plt.title('Error Rate vs. K Value')\n",
"plt.xlabel('K')\n",
"plt.ylabel('Error Rate')\n",
"plt.show()\n",
"```\n",
"</details>"
]
},
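{
"cell_type": "markdown",
"metadata": {},
"source": [
"Picking $K$ from the test error, as in the loop above, lets the test set influence the model choice. One common alternative is to pick $K$ by cross-validation on the training split and keep the test split for the final check. The sketch below assumes 5-fold cross-validation and the same 1 to 20 range; the variable names are illustrative.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import cross_val_score, train_test_split\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.2, random_state=42\n",
")\n",
"\n",
"k_values = list(range(1, 21))\n",
"cv_error = []\n",
"for k in k_values:\n",
"    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))\n",
"    # 5-fold cross-validation on the training split only.\n",
"    scores = cross_val_score(model, X_train, y_train, cv=5)\n",
"    cv_error.append(1 - scores.mean())\n",
"\n",
"# First K that reaches the lowest mean cross-validation error.\n",
"best_k = k_values[int(np.argmin(cv_error))]\n",
"print('K with lowest CV error:', best_k)\n",
"```\n",
"\n",
"On a dataset as small as Iris the curve is noisy, so several values of $K$ may score almost identically."
]
},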
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Final Evaluation\n",
"\n",
"### Task 3: Train Final Model\n",
"Based on your plot, choose the best $K$ and print the classification report."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"# K=3 is an example; use the value your plot suggests\n",
"knn = KNeighborsClassifier(n_neighbors=3)\n",
"knn.fit(X_train, y_train)\n",
"y_pred = knn.predict(X_test)\n",
"print(classification_report(y_test, y_pred))\n",
"```\n",
"</details>"
]
},
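{
"cell_type": "markdown",
"metadata": {},
"source": [
"The classification report summarizes per-class precision and recall; a confusion matrix shows which classes are being mixed up. The following self-contained sketch repeats the split and scaling from Task 1 and reuses $K=3$ from the solution above as an example value.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.metrics import confusion_matrix\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.2, random_state=42\n",
")\n",
"scaler = StandardScaler()\n",
"X_train = scaler.fit_transform(X_train)\n",
"X_test = scaler.transform(X_test)\n",
"\n",
"knn = KNeighborsClassifier(n_neighbors=3)\n",
"knn.fit(X_train, y_train)\n",
"y_pred = knn.predict(X_test)\n",
"\n",
"# Rows are true classes, columns are predicted classes.\n",
"cm = confusion_matrix(y_test, y_pred)\n",
"print(cm)\n",
"```\n",
"\n",
"Off-diagonal entries count misclassified test samples."
]
},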
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Great Job! \n",
"You've mastered one of the most intuitive algorithms in ML.\n",
"Next: **Naive Bayes**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}