Ramyamaheswari committed on
Commit fb162fb · verified · 1 Parent(s): dc44e5f

Create app.py

Files changed (1): app.py (+145, -0)
app.py ADDED
import streamlit as st

st.set_page_config(page_title="KNN", page_icon="🤖", layout="wide")

# Styling - Removed background color, kept font styles
st.markdown("""
<style>
h1, h2, h3 {
    color: #003366;
}
.custom-font, p {
    font-family: 'Arial', sans-serif;
    font-size: 18px;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1 style='color: #003366;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)

# Introduction
st.write("""
K-Nearest Neighbors (KNN) is a simple yet powerful machine learning algorithm used for both **classification** and **regression** tasks. It makes predictions by looking at the 'K' closest data points in the training set.

### Key Characteristics:
- KNN is a **non-parametric**, **instance-based** algorithm.
- It **stores** the training data instead of learning a function from it.
- Predictions are made based on **similarity**, using distance metrics such as Euclidean distance.
""")
# Working of KNN
st.markdown("<h2 style='color: #003366;'>How KNN Works</h2>", unsafe_allow_html=True)

st.subheader("Training Phase")
st.write("""
- There is **no actual training** involved.
- The algorithm simply memorizes the training data.
""")

st.subheader("Prediction Phase (Classification)")
st.write("""
1. Select a value for **K** (number of neighbors).
2. Measure distances between the new point and training samples.
3. Identify the **K nearest points**.
4. Assign the class based on **majority vote**.
""")
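The four classification steps above can be sketched in plain Python. This is a minimal illustration; the function name `knn_classify` and the toy data are invented here, not taken from the app or the linked notebook:

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 1-2: measure Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep only the k nearest points.
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Step 4: majority vote over the neighbors' labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data: two well-separated clusters in 2-D.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(X, y, (2, 2), k=3))   # → A
print(knn_classify(X, y, (9, 9), k=3))   # → B
```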
st.subheader("Prediction Phase (Regression)")
st.write("""
1. Choose a value for **K**.
2. Compute distances to training data.
3. Select the **K closest neighbors**.
4. Predict the output as the **average** (or weighted average) of these neighbors' values.
""")
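A matching sketch for the regression case, again with an invented helper name and toy data:

```python
import math

def knn_regress(train_X, train_y, query, k=3):
    """Predict the average target value of the k nearest training points."""
    # Distances to all training samples, paired with their target values.
    dists = sorted((math.dist(x, query), t) for x, t in zip(train_X, train_y))
    # Average the targets of the k closest neighbors.
    return sum(t for _, t in dists[:k]) / k

X = [(1,), (2,), (3,), (10,), (11,)]
y = [1.0, 2.0, 3.0, 10.0, 11.0]
print(knn_regress(X, y, (2,), k=3))   # average of 1.0, 2.0, 3.0 → 2.0
```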
# Overfitting vs Underfitting
st.markdown("<h2 style='color: #003366;'>Model Fit: Overfitting vs Underfitting</h2>", unsafe_allow_html=True)
st.write("""
- **Overfitting**: Happens when K is too low (e.g., K=1); the model becomes too sensitive to noise.
- **Underfitting**: Happens when K is too high; the model averages over so many neighbors that it misses important patterns.
- **Optimal Fit**: Requires selecting a K value that balances bias and variance.
""")

# Training vs CV Error
st.markdown("<h2 style='color: #003366;'>Training vs Cross-Validation Error</h2>", unsafe_allow_html=True)
st.write("""
To choose the best `K`, monitor both:
- **Training Error**: Error on the training set.
- **Cross-Validation (CV) Error**: Error on held-out validation data; it indicates how well the model generalizes.

High training accuracy but poor CV accuracy = overfitting.
Low training and CV accuracy = underfitting.
""")
# Hyperparameter tuning
st.markdown("<h2 style='color: #003366;'>KNN Hyperparameters</h2>", unsafe_allow_html=True)
st.write("""
Main parameters to tune:
- `n_neighbors` (the K value)
- `weights`: 'uniform' or 'distance'
- `metric`: Distance metric, e.g., 'euclidean', 'manhattan'
- `n_jobs`: Number of parallel jobs for the neighbor search (a speed setting, not a model parameter)

These can be optimized using Grid Search, Random Search, or Bayesian methods.
""")
# Feature Scaling
st.markdown("<h2 style='color: #003366;'>Why Feature Scaling is Crucial</h2>", unsafe_allow_html=True)
st.write("""
KNN relies on distances between points, so features must be on the same scale; otherwise a feature with a large range dominates the distance.
Options:
- **Normalization** (MinMax Scaling): Range [0, 1]
- **Standardization** (Z-score): Mean 0, Std 1

**Important**: Split your data first, fit the scaler on the training set only, then apply it to both splits. Fitting on the full dataset leaks test-set information.
""")
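A minimal sketch of leakage-free min-max scaling (the helper name `fit_minmax` is illustrative):

```python
def fit_minmax(train_col):
    """Learn min/max from the training column only, to avoid data leakage."""
    lo, hi = min(train_col), max(train_col)
    return lambda v: (v - lo) / (hi - lo)

train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]

scale = fit_minmax(train)            # fit on the training split only...
print([scale(v) for v in train])     # → [0.0, 0.333..., 0.666..., 1.0]
print([scale(v) for v in test])      # ...then reuse the same scaler on test data
```

Note that test values can fall outside [0, 1] (here 50.0 maps above 1.0); that is expected, since the scaler only knows the training range.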
# Weighted KNN
st.markdown("<h2 style='color: #003366;'>Weighted KNN</h2>", unsafe_allow_html=True)
st.write("""
In Weighted KNN, closer neighbors contribute more to the prediction (typically weighted by inverse distance).
This is especially useful when nearby data points are more reliable than distant ones.
""")
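An inverse-distance voting sketch (invented names and toy data). In this example a uniform vote with K=3 would pick "A" (two votes to one), but the single very close "B" point wins once votes are weighted:

```python
import math
from collections import defaultdict

def weighted_knn_classify(train_X, train_y, query, k=3, eps=1e-9):
    """Vote with weight 1/distance, so closer neighbors count more."""
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)   # eps guards against division by zero
    return max(votes, key=votes.get)

X = [(0, 0), (1, 0), (5, 5)]
y = ["A", "A", "B"]
print(weighted_knn_classify(X, y, (4.5, 4.5), k=3))   # → B
```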
# Decision regions
st.markdown("<h2 style='color: #003366;'>Decision Boundaries</h2>", unsafe_allow_html=True)
st.write("""
- K=1 produces sharp, complex boundaries → risk of overfitting.
- Larger K smooths the boundary → reduces variance but increases bias.
""")
# Cross-validation
st.markdown("<h2 style='color: #003366;'>Understanding Cross-Validation</h2>", unsafe_allow_html=True)
st.write("""
Cross-validation helps evaluate how well the model generalizes.
**K-Fold Cross Validation** (this K is the number of folds, unrelated to the K in KNN):
- Split the data into K parts.
- Train on K-1 parts, test on the remaining part.
- Repeat K times and average the performance.
""")
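The fold-splitting step can be sketched as below (the helper `kfold_indices` is illustrative and assumes, for simplicity, that the number of samples divides evenly into the folds):

```python
def kfold_indices(n, folds=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    fold_size = n // folds   # simplification: assumes n divides evenly by folds
    for f in range(folds):
        test_idx = idx[f * fold_size : (f + 1) * fold_size]
        train_idx = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        yield train_idx, test_idx

# Each of the 10 samples appears in exactly one test fold:
for train_idx, test_idx in kfold_indices(10, folds=5):
    print(test_idx)   # [0, 1], [2, 3], [4, 5], [6, 7], [8, 9]
```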
# Hyperparameter search methods
st.markdown("<h2 style='color: #003366;'>Hyperparameter Tuning Methods</h2>", unsafe_allow_html=True)
st.write("""
- **Grid Search**: Tests all combinations — reliable but slow.
- **Random Search**: Randomly samples combinations — faster, but may miss the optimum.
- **Bayesian Optimization**: Uses past results to choose the next candidates — efficient for expensive searches.
""")
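A grid-search skeleton over a hypothetical KNN parameter space. `toy_cv_error` is a stand-in for a real cross-validation score, invented purely so the example runs:

```python
import itertools

# Hypothetical search space for a KNN model:
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

def toy_cv_error(params):
    """Stand-in for a real cross-validation error (illustrative only)."""
    return abs(params["n_neighbors"] - 5) + (0.1 if params["weights"] == "uniform" else 0.0)

# Grid search: evaluate every combination and keep the best.
keys = list(param_grid)
candidates = [dict(zip(keys, combo)) for combo in itertools.product(*param_grid.values())]
best = min(candidates, key=toy_cv_error)
print(best)   # → {'n_neighbors': 5, 'weights': 'distance'}
```

Random search would sample from `candidates` instead of scoring all of them; Bayesian optimization would pick the next candidate based on scores seen so far.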
# Link to implementation
st.markdown("<h2 style='color: #003366;'>KNN Code Implementation</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank' style='font-size: 16px; color: #003366;'>Click here to view the notebook</a>",
    unsafe_allow_html=True
)
# Summary
st.write("""
KNN is a straightforward but effective algorithm.
To get the best results:
- Scale your data properly.
- Use cross-validation.
- Carefully choose hyperparameters using tuning methods.
""")