Spaces:
Sleeping
Sleeping
| import streamlit as st | |
| st.set_page_config(page_title="KNN Algorithm", page_icon="๐", layout="wide") | |
| # Page Title | |
| st.markdown("<h1>K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True) | |
| # Introduction | |
| st.markdown("### ๐ง What is KNN?") | |
| st.markdown(""" | |
| K-Nearest Neighbors (KNN) is a simple, powerful **supervised learning algorithm** used for both **classification** and **regression** tasks. | |
| - It doesn't build a model during training. | |
| - Instead, it stores the data and makes predictions based on the **K nearest data points** during inference. | |
| """) | |
| # How It Works | |
| st.markdown("### โ๏ธ How KNN Works") | |
| with st.expander("Training Phase"): | |
| st.markdown(""" | |
| - KNN does not perform any actual training. | |
| - It memorizes the entire dataset and waits for a test point to arrive. | |
| """) | |
| with st.expander("Prediction - Classification"): | |
| st.markdown(""" | |
| 1. Select a value for **K**. | |
| 2. Calculate distances from the new point to all training points. | |
| 3. Pick the **K closest** data points. | |
| 4. Use **majority voting** to determine the class. | |
| """) | |
| with st.expander("Prediction - Regression"): | |
| st.markdown(""" | |
| 1. Choose **K**. | |
| 2. Find the **K nearest** neighbors. | |
| 3. Predict the output by: | |
| - **Averaging** the target values (Standard) | |
| - **Weighted average**, giving more weight to closer points | |
| """) | |
| # Distance Metrics | |
| st.markdown("### ๐ Distance Metrics") | |
| st.markdown(""" | |
| - **Euclidean**: Most common (straight-line distance) | |
| - **Manhattan**: Grid-like path distance | |
| - **Minkowski**: Generalized form | |
| These metrics help determine how โcloseโ two points are. | |
| """) | |
| # Overfitting vs Underfitting | |
| st.markdown("### โ๏ธ Underfitting, Overfitting & Best Fit") | |
| st.markdown(""" | |
| - **Low K (e.g., 1)**: Model becomes too sensitive โ Overfitting | |
| - **High K (e.g., 20+)**: Model becomes too generalized โ Underfitting | |
| - **Best Fit**: Use cross-validation to find a balanced K. | |
| """) | |
| # Cross-Validation | |
| st.markdown("### ๐ Cross-Validation for K") | |
| st.markdown(""" | |
| To choose the best `K`, we split the dataset and check performance on unseen data: | |
| - **Training Error**: Model accuracy on training data. | |
| - **CV Error**: Model accuracy on validation data. | |
| - Goal: Find K that minimizes both errors. | |
| """) | |
| # Hyperparameters | |
| st.markdown("### ๐ ๏ธ KNN Hyperparameters") | |
| st.markdown(""" | |
| - `k`: Number of neighbors (main parameter) | |
| - `weights`: `'uniform'` vs `'distance'` | |
| - `metric`: Distance metric (`euclidean`, `manhattan`, etc.) | |
| - `n_jobs`: Parallel processing for faster computation | |
| """) | |
| # Feature Scaling | |
| st.markdown("### ๐ Feature Scaling") | |
| st.markdown(""" | |
| Distance-based algorithms require scaling: | |
| - **Normalization**: Rescales features to [0, 1] | |
| - **Standardization**: Transforms data to mean 0, std dev 1 | |
| ๐ Apply scaling **before** using KNN! | |
| """) | |
| # Weighted KNN | |
| st.markdown("### ๐งฎ Weighted KNN") | |
| st.markdown(""" | |
| In **Weighted KNN**, closer neighbors contribute more to the prediction. | |
| Why? Because closer points are usually more relevant. | |
| """) | |
| # Decision Regions | |
| st.markdown("### ๐บ๏ธ Decision Boundaries in KNN") | |
| st.markdown(""" | |
| KNN draws boundaries between different classes: | |
| - `K=1`: Very jagged (overfits) | |
| - `K` increases: Boundaries smooth out (less overfitting) | |
| Visualization helps you understand this behavior better. | |
| """) | |
| # Hyperparameter Search | |
| st.markdown("### ๐ Hyperparameter Tuning Methods") | |
| st.markdown(""" | |
| - **Grid Search**: Try every possible combination (exhaustive) | |
| - **Random Search**: Randomly tries values | |
| - **Bayesian Optimization**: Learns from results to choose next best combination | |
| """) | |
| # Final Note | |
| st.markdown("### โ Summary") | |
| st.markdown(""" | |
| KNN is intuitive, requires no training, but is sensitive to: | |
| - Feature scaling | |
| - Choice of K | |
| - Noise in data | |
| Use **cross-validation**, **scaling**, and **hyperparameter tuning** to get the best results. | |
| """) | |