ML_ALGORITHMS / pages /KNN.py
sree4411's picture
Update pages/KNN.py
8e2c169 verified
import streamlit as st
st.set_page_config(page_title="KNN Algorithm", page_icon="๐Ÿ“", layout="wide")
# Page Title
st.markdown("<h1>K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)
# Introduction
st.markdown("### ๐Ÿง  What is KNN?")
st.markdown("""
K-Nearest Neighbors (KNN) is a simple, powerful **supervised learning algorithm** used for both **classification** and **regression** tasks.
- It doesn't build a model during training.
- Instead, it stores the data and makes predictions based on the **K nearest data points** during inference.
""")
# How It Works
st.markdown("### โš™๏ธ How KNN Works")
with st.expander("Training Phase"):
st.markdown("""
- KNN does not perform any actual training.
- It memorizes the entire dataset and waits for a test point to arrive.
""")
with st.expander("Prediction - Classification"):
st.markdown("""
1. Select a value for **K**.
2. Calculate distances from the new point to all training points.
3. Pick the **K closest** data points.
4. Use **majority voting** to determine the class.
""")
with st.expander("Prediction - Regression"):
st.markdown("""
1. Choose **K**.
2. Find the **K nearest** neighbors.
3. Predict the output by:
- **Averaging** the target values (Standard)
- **Weighted average**, giving more weight to closer points
""")
# Distance Metrics
st.markdown("### ๐Ÿ“ Distance Metrics")
st.markdown("""
- **Euclidean**: Most common (straight-line distance)
- **Manhattan**: Grid-like path distance
- **Minkowski**: Generalized form
These metrics help determine how โ€œcloseโ€ two points are.
""")
# Overfitting vs Underfitting
st.markdown("### โš–๏ธ Underfitting, Overfitting & Best Fit")
st.markdown("""
- **Low K (e.g., 1)**: Model becomes too sensitive โ†’ Overfitting
- **High K (e.g., 20+)**: Model becomes too generalized โ†’ Underfitting
- **Best Fit**: Use cross-validation to find a balanced K.
""")
# Cross-Validation
st.markdown("### ๐Ÿ” Cross-Validation for K")
st.markdown("""
To choose the best `K`, we split the dataset and check performance on unseen data:
- **Training Error**: Model accuracy on training data.
- **CV Error**: Model accuracy on validation data.
- Goal: Find K that minimizes both errors.
""")
# Hyperparameters
st.markdown("### ๐Ÿ› ๏ธ KNN Hyperparameters")
st.markdown("""
- `k`: Number of neighbors (main parameter)
- `weights`: `'uniform'` vs `'distance'`
- `metric`: Distance metric (`euclidean`, `manhattan`, etc.)
- `n_jobs`: Parallel processing for faster computation
""")
# Feature Scaling
st.markdown("### ๐Ÿ“Š Feature Scaling")
st.markdown("""
Distance-based algorithms require scaling:
- **Normalization**: Rescales features to [0, 1]
- **Standardization**: Transforms data to mean 0, std dev 1
๐Ÿ‘‰ Apply scaling **before** using KNN!
""")
# Weighted KNN
st.markdown("### ๐Ÿงฎ Weighted KNN")
st.markdown("""
In **Weighted KNN**, closer neighbors contribute more to the prediction.
Why? Because closer points are usually more relevant.
""")
# Decision Regions
st.markdown("### ๐Ÿ—บ๏ธ Decision Boundaries in KNN")
st.markdown("""
KNN draws boundaries between different classes:
- `K=1`: Very jagged (overfits)
- `K` increases: Boundaries smooth out (less overfitting)
Visualization helps you understand this behavior better.
""")
# Hyperparameter Search
st.markdown("### ๐Ÿ” Hyperparameter Tuning Methods")
st.markdown("""
- **Grid Search**: Try every possible combination (exhaustive)
- **Random Search**: Randomly tries values
- **Bayesian Optimization**: Learns from results to choose next best combination
""")
# Final Note
st.markdown("### โœ… Summary")
st.markdown("""
KNN is intuitive, requires no training, but is sensitive to:
- Feature scaling
- Choice of K
- Noise in data
Use **cross-validation**, **scaling**, and **hyperparameter tuning** to get the best results.
""")