Spaces:

sree4411
/

ML_ALGORITHMS

Sleeping

App Files Files Community

ML_ALGORITHMS / pages /KNN.py

sree4411

Update pages/KNN.py

8e2c169 verified about 1 year ago

raw

history blame contribute delete

3.9 kB

	import streamlit as st

	st.set_page_config(page_title="KNN Algorithm", page_icon="📍", layout="wide")


	# Page Title
	st.markdown("<h1>K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)

	# Introduction
	st.markdown("### 🧠 What is KNN?")
	st.markdown("""
	K-Nearest Neighbors (KNN) is a simple, powerful supervised learning algorithm used for both classification and regression tasks.

	- It doesn't build a model during training.
	- Instead, it stores the data and makes predictions based on the K nearest data points during inference.
	""")

	# How It Works
	st.markdown("### ⚙️ How KNN Works")
	with st.expander("Training Phase"):
	st.markdown("""
	- KNN does not perform any actual training.
	- It memorizes the entire dataset and waits for a test point to arrive.
	""")

	with st.expander("Prediction - Classification"):
	st.markdown("""
	1. Select a value for K.
	2. Calculate distances from the new point to all training points.
	3. Pick the K closest data points.
	4. Use majority voting to determine the class.
	""")

	with st.expander("Prediction - Regression"):
	st.markdown("""
	1. Choose K.
	2. Find the K nearest neighbors.
	3. Predict the output by:
	- Averaging the target values (Standard)
	- Weighted average, giving more weight to closer points
	""")

	# Distance Metrics
	st.markdown("### 📏 Distance Metrics")
	st.markdown("""
	- Euclidean: Most common (straight-line distance)
	- Manhattan: Grid-like path distance
	- Minkowski: Generalized form

	These metrics help determine how “close” two points are.
	""")

	# Overfitting vs Underfitting
	st.markdown("### ⚖️ Underfitting, Overfitting & Best Fit")
	st.markdown("""
	- Low K (e.g., 1): Model becomes too sensitive → Overfitting
	- High K (e.g., 20+): Model becomes too generalized → Underfitting
	- Best Fit: Use cross-validation to find a balanced K.
	""")

	# Cross-Validation
	st.markdown("### 🔁 Cross-Validation for K")
	st.markdown("""
	To choose the best `K`, we split the dataset and check performance on unseen data:

	- Training Error: Model accuracy on training data.
	- CV Error: Model accuracy on validation data.
	- Goal: Find K that minimizes both errors.
	""")

	# Hyperparameters
	st.markdown("### 🛠️ KNN Hyperparameters")
	st.markdown("""
	- `k`: Number of neighbors (main parameter)
	- `weights`: `'uniform'` vs `'distance'`
	- `metric`: Distance metric (`euclidean`, `manhattan`, etc.)
	- `n_jobs`: Parallel processing for faster computation
	""")

	# Feature Scaling
	st.markdown("### 📊 Feature Scaling")
	st.markdown("""
	Distance-based algorithms require scaling:
	- Normalization: Rescales features to [0, 1]
	- Standardization: Transforms data to mean 0, std dev 1

	👉 Apply scaling before using KNN!
	""")

	# Weighted KNN
	st.markdown("### 🧮 Weighted KNN")
	st.markdown("""
	In Weighted KNN, closer neighbors contribute more to the prediction.

	Why? Because closer points are usually more relevant.
	""")

	# Decision Regions
	st.markdown("### 🗺️ Decision Boundaries in KNN")
	st.markdown("""
	KNN draws boundaries between different classes:
	- `K=1`: Very jagged (overfits)
	- `K` increases: Boundaries smooth out (less overfitting)

	Visualization helps you understand this behavior better.
	""")

	# Hyperparameter Search
	st.markdown("### 🔍 Hyperparameter Tuning Methods")
	st.markdown("""
	- Grid Search: Try every possible combination (exhaustive)
	- Random Search: Randomly tries values
	- Bayesian Optimization: Learns from results to choose next best combination
	""")

	# Final Note
	st.markdown("### ✅ Summary")
	st.markdown("""
	KNN is intuitive, requires no training, but is sensitive to:
	- Feature scaling
	- Choice of K
	- Noise in data

	Use cross-validation, scaling, and hyperparameter tuning to get the best results.
	""")