Update pages/9_KNN.py
pages/9_KNN.py
CHANGED
@@ -0,0 +1,123 @@
import streamlit as st

st.set_page_config(page_title="KNN Explained", page_icon="🤖", layout="wide")

st.markdown("""
<style>
.stApp {
    background-image: linear-gradient(120deg, #232526, #414345);
    color: #f8f8f2;
}
h1, h2, h3, h4 {
    color: #ff79c6;
}
.sidebar .sidebar-content {
    background-color: #2c3e50;
}
.block-container {
    padding-top: 2rem;
    padding-bottom: 2rem;
}
a {
    color: #8be9fd;
    text-decoration: none;
}
a:hover {
    color: #50fa7b;
}
</style>
""", unsafe_allow_html=True)

st.sidebar.title("🤖 KNN Explorer")
st.sidebar.markdown("Discover the K-Nearest Neighbors algorithm step by step.")

st.markdown("""
<h1 style='text-align: center;'>🔍 K-Nearest Neighbors (KNN) Simplified</h1>
""", unsafe_allow_html=True)

with st.expander("📘 What is KNN?"):
    st.write("""
    K-Nearest Neighbors (KNN) is a **simple**, **intuitive**, and **non-parametric** algorithm used for classification and regression.
    It makes predictions from the majority class (or the average value) of the `K` closest training samples.

    ✅ No training phase required: it just stores the data.
    ✅ Uses distance-based similarity (e.g., Euclidean distance).
    ✅ Effective for small, well-separated datasets.
    """)

with st.expander("⚙️ How Does KNN Work?"):
    st.write("""
    **Training Phase:**
    - KNN does not train in the traditional sense; it memorizes the training set.

    **Prediction Phase:**
    1. Choose a value of `K`.
    2. Calculate the distance between the test point and all training points.
    3. Pick the `K` nearest neighbors.
    4. Predict the class (majority vote) or the value (average).
    """)
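
    # Illustrative sketch, not in the original commit: the prediction steps
    # above written from scratch with NumPy (X_train, y_train, and the query
    # point x are assumed inputs).
    st.code("""
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Step 2: Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Step 3: indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
""", language="python")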

with st.expander("🎯 Underfitting vs Overfitting"):
    st.write("""
    - **Overfitting**: very low training error but poor generalization. Happens with a low `K` (e.g., `k=1`).
    - **Underfitting**: the model is too simple to learn any patterns (e.g., a very high `K`).
    - **Sweet spot**: use **cross-validation** to pick the `K` that balances both.
    """)
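
    # Illustrative sketch, not in the original commit: comparing a very low
    # and a very high k on a held-out split (iris is just an example dataset).
    st.code("""
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

for k in (1, 50):  # k=1 tends to overfit; a very large k underfits
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, model.score(X_tr, y_tr), model.score(X_te, y_te))
""", language="python")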

with st.expander("📉 Training vs Cross-Validation Error"):
    st.write("""
    - **Training error**: how well the model does on the data it has seen.
    - **CV error**: performance on unseen data, measured with validation.

    ✔️ Aim for a **low CV error** for the best generalization.
    """)
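
    # Illustrative sketch, not in the original commit: validation_curve
    # reports both scores for a range of k values in one call.
    st.code("""
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
train_scores, cv_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors", param_range=range(1, 31), cv=5)
print(train_scores.mean(axis=1))  # mean training accuracy per k
print(cv_scores.mean(axis=1))     # mean cross-validated accuracy per k
""", language="python")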

with st.expander("🛠️ Hyperparameter Tuning"):
    st.write("""
    - `k`: number of neighbors (e.g., 3, 5, 7)
    - `weights`: `'uniform'` or `'distance'` (weight closer neighbors more)
    - `metric`: distance function (Euclidean, Manhattan, etc.)

    🔍 Use **Grid Search**, **Randomized Search**, or **Bayesian Optimization** to tune these.
    """)
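
    # Illustrative sketch, not in the original commit: a grid search over the
    # three hyperparameters listed above (grid values are just examples).
    st.code("""
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={
        "n_neighbors": [3, 5, 7],
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan"],
    },
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
""", language="python")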

with st.expander("⚖️ Why Scaling Matters"):
    st.write("""
    KNN is based on distance, so features on different scales can skew the results.

    ✅ Use **StandardScaler** (z-score) or **MinMaxScaler** for preprocessing.
    ⚠️ Always fit the scaler **after splitting** the data, on the training set only.
    """)
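
    # Illustrative sketch, not in the original commit: a Pipeline re-fits the
    # scaler inside each CV fold, so it never sees validation data.
    st.code("""
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(pipe, X, y, cv=5).mean())
""", language="python")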

with st.expander("📏 Weighted KNN"):
    st.write("""
    - Weighted KNN assigns more importance to closer neighbors.
    - It's useful when closer points are more likely to belong to the same class.
    - Just use `weights='distance'` in most libraries, like scikit-learn.
    """)
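
    # Illustrative sketch, not in the original commit: the only change from
    # plain KNN is the weights argument.
    st.code("""
from sklearn.neighbors import KNeighborsClassifier

# Each neighbor's vote is weighted by the inverse of its distance
model = KNeighborsClassifier(n_neighbors=5, weights="distance")
""", language="python")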

with st.expander("🗺️ Decision Boundaries"):
    st.write("""
    - `k=1`: sharp, complex boundaries that can lead to overfitting.
    - Larger `k`: smoother boundaries and better generalization.
    - Visualize with 2D plots to understand how `K` affects predictions.
    """)
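
    # Illustrative sketch, not in the original commit: scikit-learn (>= 1.1)
    # can draw the boundary of a fitted model over two features.
    st.code("""
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]  # keep two features so the boundary is plottable
model = KNeighborsClassifier(n_neighbors=15).fit(X2, y)
DecisionBoundaryDisplay.from_estimator(model, X2, alpha=0.4)
plt.scatter(X2[:, 0], X2[:, 1], c=y, edgecolor="k")
plt.show()
""", language="python")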

with st.expander("🔁 What is Cross-Validation?"):
    st.write("""
    - **K-fold cross-validation** splits the data into `K` parts (folds).
    - Train on `K-1` folds, test on the remaining one.
    - Repeat `K` times and average the results.

    ✅ Helps prevent overfitting and guides hyperparameter selection.
    """)
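
    # Illustrative sketch, not in the original commit: 5-fold CV in one call.
    st.code("""
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores, scores.mean())  # one accuracy per fold, then the average
""", language="python")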

st.markdown("""
<h2 style='color: #8be9fd;'>🚀 Try KNN in Action:</h2>
<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank'>
📓 Open Colab Notebook
</a>
""", unsafe_allow_html=True)

st.success("KNN is simple yet powerful. Scale your features, choose the right K, and always validate your results!")