import streamlit as st

st.set_page_config(page_title="KNN", page_icon="🤖", layout="wide")

# Styling - removed background color, kept font styles
st.markdown("""
""", unsafe_allow_html=True)

# Title
st.markdown("<h1>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)

st.image("https://cdn-uploads.huggingface.co/production/uploads/66be28cc7e8987822d129400/7vQBEhJAO_Bju6_SsK0Ms.png")

# Introduction
st.write("""
K-Nearest Neighbors (KNN) is a basic yet powerful machine learning algorithm used for both
**classification** and **regression** tasks. It makes predictions by looking at the 'K' closest
data points in the training set.

### Key Characteristics:
- KNN is a **non-parametric** and **instance-based** algorithm.
- It **stores** the training data instead of learning a function from it.
- Predictions are made based on **similarity** (distance metrics such as Euclidean distance).
""")

# Working of KNN
st.markdown("<h2>How KNN Works</h2>", unsafe_allow_html=True)

st.subheader("Training Phase")
st.write("""
- There is **no actual training** involved.
- The algorithm simply memorizes the training data.
""")

st.subheader("Prediction Phase (Classification)")
st.write("""
1. Select a value for **K** (the number of neighbors).
2. Measure the distances between the new point and the training samples.
3. Identify the **K nearest points**.
4. Assign the class based on a **majority vote**.
""")

st.subheader("Prediction Phase (Regression)")
st.write("""
1. Choose a value for **K**.
2. Compute distances to the training data.
3. Select the **K closest neighbors**.
4. Predict the output as the **average** (or weighted average) of these neighbors' values.
""")

# Overfitting vs Underfitting
st.markdown("<h2>Model Fit: Overfitting vs Underfitting</h2>", unsafe_allow_html=True)
st.write("""
- **Overfitting**: Happens when K is too low (e.g., K=1); the model becomes too sensitive to noise.
- **Underfitting**: Happens when K is too high; the model may miss important patterns.
- **Optimal Fit**: Requires selecting a K value that balances bias and variance.
""")

# Training vs CV Error
st.markdown("<h2>Training vs Cross-Validation Error</h2>", unsafe_allow_html=True)
st.write("""
To choose the best `K`, monitor both:
- **Training Error**: Error on the training set.
- **Cross-Validation (CV) Error**: Error on a held-out validation set; it assesses generalization.

High training accuracy but poor CV accuracy = overfitting.
Low training and CV accuracy = underfitting.
""")

# Hyperparameter tuning
st.markdown("<h2>KNN Hyperparameters</h2>", unsafe_allow_html=True)
st.write("""
Main parameters to tune:
- `n_neighbors` (the K value)
- `weights`: 'uniform' or 'distance'
- `metric`: Distance metric, e.g., 'euclidean', 'manhattan'
- `n_jobs`: Use multiple processors for speed

These can be optimized using Grid Search, Random Search, or Bayesian methods.
""")

# Feature Scaling
st.markdown("<h2>Why Feature Scaling is Crucial</h2>", unsafe_allow_html=True)
st.write("""
KNN relies on distances between points, so features must be on the same scale. Options:
- **Normalization** (min-max scaling): Range [0, 1]
- **Standardization** (Z-score): Mean 0, Std 1

**Important**: Apply scaling *after* splitting your data, fitting the scaler on the training
split only, to avoid data leakage.
""")

# Weighted KNN
st.markdown("<h2>Weighted KNN</h2>", unsafe_allow_html=True)
st.write("""
In Weighted KNN, closer neighbors contribute more to the prediction, typically with weights
proportional to the inverse of the distance. This is especially useful when nearby data points
are more reliable than distant ones.
""")

# Decision regions
st.markdown("<h2>Decision Boundaries</h2>", unsafe_allow_html=True)
st.write("""
- K=1 produces sharp, complex boundaries → risk of overfitting.
- Larger K smooths the boundary → reduces variance but increases bias.
""")

# Cross-validation
st.markdown("<h2>Understanding Cross-Validation</h2>", unsafe_allow_html=True)
st.write("""
Cross-validation helps evaluate how well the model generalizes.

**K-Fold Cross-Validation**:
- Split the data into K parts (folds).
- Train on K-1 folds and test on the remaining one.
- Repeat K times and average the performance.
""")

# Hyperparameter search methods
st.markdown("<h2>Hyperparameter Tuning Methods</h2>", unsafe_allow_html=True)
st.write("""
- **Grid Search**: Tests all combinations — reliable but slow.
- **Random Search**: Randomly samples combinations — faster, but may miss the optimum.
- **Bayesian Optimization**: Uses past results to choose the next candidates — efficient and smart.
""")

# Link to implementation
st.markdown("<h2>KNN Code Implementation</h2>", unsafe_allow_html=True)
st.markdown(
    "Click here to view the notebook",
    unsafe_allow_html=True,
)

# Summary
st.write("""
KNN is a straightforward but effective algorithm. To get the best results:
- Scale your data properly.
- Use cross-validation.
- Choose hyperparameters carefully using tuning methods.
""")
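# The prediction steps above can be sketched in plain Python. This is an
# illustrative helper only (not the linked notebook's implementation): majority
# vote for classification, mean of neighbor targets for regression.
import math
from collections import Counter

def knn_classify(X_train, y_train, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def knn_regress(X_train, y_train, query, k=3):
    """Average of the k nearest neighbors' target values."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))[:k]
    return sum(y_train[i] for i in nearest) / k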
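# A tiny illustration on hypothetical data of why a very low K overfits:
# one mislabeled "noise" point flips the K=1 prediction, while K=5 lets the
# surrounding cluster outvote it.
import math
from collections import Counter

def _majority_vote(X, y, query, k):
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

X = [(0, 0), (1, 0), (0, 1), (1, 1), (1.5, 1.5), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]  # (1.5, 1.5) is mislabeled noise

pred_k1 = _majority_vote(X, y, (1.35, 1.35), k=1)  # "b": follows the noise point (overfit)
pred_k5 = _majority_vote(X, y, (1.35, 1.35), k=5)  # "a": the real cluster outvotes the noise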
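# How these hyperparameters look in scikit-learn's KNeighborsClassifier.
# A minimal sketch, assuming scikit-learn is installed; the toy data here is
# made up for illustration.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(
    n_neighbors=5,       # K
    weights="distance",  # closer neighbors get larger weights
    metric="euclidean",  # distance metric
    n_jobs=-1,           # use all available processors
)
X_demo = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_demo = [0, 0, 0, 1, 1, 1]
knn.fit(X_demo, y_demo)  # "fitting" just stores the data, as described above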
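# Min-max scaling without leakage (a plain-Python sketch with made-up numbers):
# fit the scaling parameters on the training values only, then apply the same
# transform to both splits.
def fit_minmax(train_values):
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant feature
    return lambda v: (v - lo) / span

train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]
scale = fit_minmax(train)                 # min/max come from the training split only
train_scaled = [scale(v) for v in train]  # lands exactly in [0, 1]
test_scaled = [scale(v) for v in test]    # may fall outside [0, 1]; that is expected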
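# Weighted KNN in plain Python (illustrative only): each of the k neighbors
# votes with weight 1/distance, so closer points count more. In the example
# below an unweighted majority vote would pick "b", but the single much-closer
# "a" neighbor wins under inverse-distance weighting.
import math
from collections import defaultdict

def weighted_knn_classify(X, y, query, k=3, eps=1e-9):
    nearest = sorted((math.dist(p, query), label) for p, label in zip(X, y))[:k]
    scores = defaultdict(float)
    for d, label in nearest:
        scores[label] += 1.0 / (d + eps)  # eps avoids division by zero on exact matches
    return max(scores, key=scores.get)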
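# The K-fold procedure above can be sketched with a plain-Python index splitter
# (illustrative; libraries like scikit-learn provide this ready-made): each fold
# serves once as the validation set while the remaining folds form the training set.
def kfold_indices(n_samples, n_folds):
    # distribute samples as evenly as possible across the folds
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # (train_indices, validation_indices) pairs, one per fold
    return [([i for f in folds[:j] + folds[j + 1:] for i in f], folds[j])
            for j in range(n_folds)]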
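# Grid search in miniature (an illustrative sketch, not a real tuner): score
# every combination of a small parameter grid with a user-supplied scoring
# function and keep the best. The scorer below is hypothetical, rigged so that
# K=5 with distance weighting wins.
from itertools import product

def grid_search(param_grid, score_fn):
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g., mean cross-validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
best, _ = grid_search(
    grid,
    lambda p: p["n_neighbors"] - (0 if p["weights"] == "distance" else 0.5),
)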