Harika22 committed on
Commit 8b1a429 · verified · 1 Parent(s): 14ddc47

Update pages/9_KNN.py

Files changed (1)
  1. pages/9_KNN.py +123 -0
pages/9_KNN.py CHANGED
@@ -0,0 +1,123 @@
+ import streamlit as st
+
+ st.set_page_config(page_title="KNN Explained", page_icon="🤖", layout="wide")
+
+ st.markdown("""
+ <style>
+ .stApp {
+     background-image: linear-gradient(120deg, #232526, #414345);
+     color: #f8f8f2;
+ }
+ h1, h2, h3, h4 {
+     color: #ff79c6;
+ }
+ .sidebar .sidebar-content {
+     background-color: #2c3e50;
+ }
+ .block-container {
+     padding-top: 2rem;
+     padding-bottom: 2rem;
+ }
+ a {
+     color: #8be9fd;
+     text-decoration: none;
+ }
+ a:hover {
+     color: #50fa7b;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+ st.sidebar.title("🤖 KNN Explorer")
+ st.sidebar.markdown("Discover the K-Nearest Neighbors algorithm step by step.")
+
+ st.markdown("""
+ <h1 style='text-align: center;'>🔍 K-Nearest Neighbors (KNN) Simplified</h1>
+ """, unsafe_allow_html=True)
+
+ with st.expander("📘 What is KNN?"):
+     st.write("""
+ K-Nearest Neighbors (KNN) is a **simple**, **intuitive**, and **non-parametric** algorithm used for classification and regression.
+ It predicts the majority class (classification) or the average target value (regression) of the `K` closest training samples.
+
+ - ✅ No explicit training phase: the model just stores the data.
+ - ✅ Uses distance-based similarity (e.g., Euclidean).
+ - ✅ Works best on small datasets with well-separated classes.
+ """)
+
+ with st.expander("⚙️ How Does KNN Work?"):
+     st.write("""
+ **Training Phase:**
+ - KNN does not train in the traditional sense. It memorizes the training set.
+
+ **Prediction Phase:**
+ 1. Choose a value of `K`.
+ 2. Calculate the distance between the test point and every training point.
+ 3. Pick the `K` nearest neighbors.
+ 4. Predict the class (majority vote) or the value (average).
+ """)
+
+ with st.expander("🎯 Underfitting vs Overfitting"):
+     st.write("""
+ - **Overfitting**: Very low training error but poor generalization. Happens with low `K` (e.g., `K=1`).
+ - **Underfitting**: The model is too simple to capture the patterns in the data. Happens with very high `K`.
+ - **Sweet Spot**: Use **cross-validation** to pick the `K` that balances the two.
+ """)
+
+ with st.expander("📉 Training vs Cross-Validation Error"):
+     st.write("""
+ - **Training Error**: Error on the data the model has already seen.
+ - **CV Error**: Error on held-out validation folds, which estimates performance on unseen data.
+
+ ⚖️ Aim for **low CV error** for the best generalization.
+ """)
+
+ with st.expander("🛠️ Hyperparameter Tuning"):
+     st.write("""
+ - `k`: Number of neighbors (e.g., 3, 5, 7); called `n_neighbors` in scikit-learn
+ - `weights`: 'uniform' or 'distance' (weight closer neighbors more)
+ - `metric`: Distance function (Euclidean, Manhattan, etc.)
+
+ 🔍 Use **Grid Search**, **Randomized Search**, or **Bayesian Optimization** to tune these.
+ """)
+
+ with st.expander("⚖️ Why Scaling Matters"):
+     st.write("""
+ KNN is based on distance, so features on larger scales can dominate the result.
+
+ - ✅ Use **StandardScaler** (Z-score) or **MinMaxScaler** for preprocessing.
+ - ⚠️ Always scale **after splitting** the data: fit the scaler on the training split only, then apply it to the test split.
+ """)
+
+ with st.expander("📏 Weighted KNN"):
+     st.write("""
+ - Weighted KNN assigns more importance to closer neighbors.
+ - It's useful when closer points are more likely to belong to the same class.
+ - In scikit-learn, pass `weights='distance'` to enable it.
+ """)
+
+
+ with st.expander("🗺️ Decision Boundaries"):
+     st.write("""
+ - `k=1`: Sharp, complex boundaries that can lead to overfitting.
+ - Larger `k`: Smoother boundaries that usually generalize better, until `k` grows large enough to underfit.
+ - Visualize using 2D plots to understand how `K` affects predictions.
+ """)
+
+ with st.expander("🔁 What is Cross-Validation?"):
+     st.write("""
+ - **K-Fold Cross-Validation** splits the data into `K` parts (folds). This `K` is unrelated to the number of neighbors in KNN.
+ - Train on `K-1` folds and test on the remaining one.
+ - Repeat `K` times, so each fold is the test fold once, and average the results.
+
+ ✅ Helps detect overfitting and guides hyperparameter selection.
+ """)
+
+ st.markdown("""
+ <h2 style='color: #8be9fd;'>📓 Try KNN in Action:</h2>
+ <a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank'>
+ 🚀 Open Colab Notebook
+ </a>
+ """, unsafe_allow_html=True)
+
+ st.success("KNN is simple yet powerful. Use scaling, choose the right K, and always validate your results!")
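
The prediction steps and the `weights` option described in the page text can be sketched in plain Python. This is a minimal illustration and not part of the commit: the helper names `euclidean` and `knn_predict` and the toy data are assumptions made up for the example, and it uses only the standard library rather than scikit-learn.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, weights="uniform"):
    """Classify `query` by a vote among its k nearest training points.

    Hypothetical helper for illustration; `weights` mirrors the
    'uniform' / 'distance' options named in the page text.
    """
    # Step 2: distance from the query to every stored training point.
    scored = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the k closest neighbors.
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Step 4: tally votes; 'distance' weights each vote by 1 / distance.
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + 1e-12) if weights == "distance" else 1.0
    return max(votes, key=votes.get)


# Toy 1-D data (made up): one "a" very close to the query, two "b" farther away.
train_X = [(0.0,), (1.1,), (1.2,)]
train_y = ["a", "b", "b"]
query = (0.1,)

uniform_label = knn_predict(train_X, train_y, query, k=3, weights="uniform")
distance_label = knn_predict(train_X, train_y, query, k=3, weights="distance")
print(uniform_label, distance_label)
```

With uniform votes the two farther `b` points outvote the single close `a`; with distance weighting the close neighbor dominates the tally, which is the behavior the "Weighted KNN" expander describes.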