# Source: pages/9_KNN.py by Harika22 (commit fcf7f83, verified).
# The three lines above this file's code in the original paste were
# file-viewer page chrome, not Python; they are preserved here as a comment
# so the module parses.
import streamlit as st

# --- Page chrome: tab config, centered banner, and sidebar intro. ---
st.set_page_config(page_title="KNN", page_icon="๐Ÿค–", layout="wide")
st.markdown(
    "<h1 style='text-align: center; color: #FF4C60;'>๐Ÿ” K-Nearest Neighbors (KNN) Algorithm</h1>",
    unsafe_allow_html=True,
)
st.sidebar.title("๐Ÿค– KNN App")
st.sidebar.markdown("Explore KNN concepts step-by-step using the sections below.")

# Learning sections offered to the user; the selected label drives which
# explanation is rendered further down the page.
_CONCEPT_LABELS = (
    "๐Ÿ“˜ What is KNN?",
    "โš™๏ธ How Does KNN Work?",
    "๐ŸŽฏ Underfitting vs Overfitting",
    "๐Ÿ“‰ Training vs Cross-Validation Error",
    "๐Ÿ› ๏ธ Hyperparameter Tuning",
    "โš–๏ธ Feature Scaling",
    "๐Ÿงฎ Weighted KNN",
    "๐Ÿ—บ๏ธ Decision Regions",
    "๐Ÿ” Cross-Validation Explained",
)
option = st.radio("Select a concept to learn:", _CONCEPT_LABELS)
# Explanatory markdown for each concept, keyed by the exact radio label.
# A dispatch dict replaces the original if/elif chain, whose bodies had lost
# their indentation (a SyntaxError as pasted) — the keys must stay
# byte-identical to the labels passed to st.radio above.
_CONCEPT_CONTENT = {
    "๐Ÿ“˜ What is KNN?": """
K-Nearest Neighbors (KNN) is a **non-parametric**, **lazy learning** algorithm used for both classification and regression.
โœ… It stores all training data instead of learning a function.
โœ… It uses distance metrics (e.g., Euclidean, Manhattan) to make predictions.
โœ… Suitable for small to moderately sized datasets.
""",
    "โš™๏ธ How Does KNN Work?": """
**Training Phase:**
- No actual training occurs. KNN memorizes the training dataset.
**Prediction Phase (Classification):**
1. Choose a value of **K**
2. Calculate distances from the new point to all others
3. Pick **K closest** points
4. Use majority vote to classify
**Prediction Phase (Regression):**
- Average the values of the K nearest neighbors.
""",
    "๐ŸŽฏ Underfitting vs Overfitting": """
- **Overfitting**: The model is too specific to the training data. Poor on unseen data.
- **Underfitting**: The model is too simple. Poor even on training data.
- **Ideal Model**: A balance that performs well on both seen and unseen data.
""",
    "๐Ÿ“‰ Training vs Cross-Validation Error": """
- **Training Error** is the error on the known training data.
- **Cross-Validation Error** is from unseen validation data.
โœ… Use cross-validation to pick the best value of `K`.
๐Ÿ” Big gap = Overfitting; Both high = Underfitting.
""",
    "๐Ÿ› ๏ธ Hyperparameter Tuning": """
- **K**: Number of neighbors โ€” test multiple values.
- **Weights**: Equal (`uniform`) or based on distance (`distance`).
- **Metric**: How distance is measured (Euclidean, Manhattan).
- Use **Grid Search**, **Random Search**, or **Optuna** for best tuning.
""",
    "โš–๏ธ Feature Scaling": """
KNN uses distances โ€” so features must be on the same scale.
- **Normalization** scales data between 0 and 1.
- **Standardization** centers data around mean 0.
โš ๏ธ Always scale data after splitting to avoid leakage.
""",
    "๐Ÿงฎ Weighted KNN": """
Weighted KNN assigns higher importance to closer neighbors.
- Use `weights='distance'` to apply this logic in libraries like scikit-learn.
- Helps in noisy datasets or when closer points are more meaningful.
""",
    "๐Ÿ—บ๏ธ Decision Regions": """
- Small `k` values create complex, wiggly decision boundaries (overfitting).
- Larger `k` smooths the boundary (better generalization).
- Visualizing decision regions helps understand the algorithmโ€™s behavior.
""",
    "๐Ÿ” Cross-Validation Explained": """
- **K-Fold Cross-Validation** splits data into `K` parts.
- The model trains on K-1 parts and tests on the remaining part.
- Helps evaluate model stability and avoid overfitting.
""",
}

# Render the selected section; like the original chain, render nothing if
# the label is somehow unknown.
if option in _CONCEPT_CONTENT:
    st.write(_CONCEPT_CONTENT[option])
# --- Footer: link to a runnable Colab notebook, then a closing summary. ---
st.markdown(
    "<h2 style='color: #58a6ff;'>๐Ÿ““ Try KNN in Colab:</h2>",
    unsafe_allow_html=True,
)
_colab_anchor = """
<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank'>
๐Ÿ”— Open Jupyter Notebook on Colab
</a>
"""
st.markdown(_colab_anchor, unsafe_allow_html=True)
st.success("KNN is easy to understand and surprisingly powerful! Tune it well, scale your data, and validate your model to get the best results.")