import streamlit as st

st.set_page_config(page_title="KNN", page_icon="🤖", layout="wide")

# Styling - Removed background color, kept font styles
st.markdown("""
    <style>
        h1, h2, h3 {
            color: #003366;
        }
        .custom-font, p {
            font-family: 'Arial', sans-serif;
            font-size: 18px;
            line-height: 1.6;
        }
    </style>
    """, unsafe_allow_html=True)

# Title
st.markdown("<h1 style='color: #003366;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)
st.image("https://cdn-uploads.huggingface.co/production/uploads/66be28cc7e8987822d129400/7vQBEhJAO_Bju6_SsK0Ms.png")

# Introduction
st.write("""
K-Nearest Neighbors (KNN) is a basic yet powerful machine learning algorithm used for both **classification** and **regression** tasks. It makes predictions by looking at the 'K' closest data points in the training set.

### Key Characteristics:
- KNN is a **non-parametric** and **instance-based** algorithm.
- It **stores** the training data instead of learning a function from it.
- Predictions are made based on **similarity** (distance metrics like Euclidean).
""")
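
# Illustrative sketch (not part of the original app): Euclidean distance in plain Python.
st.write("For illustration, the Euclidean distance mentioned above can be computed like this (a minimal sketch in plain Python, with hypothetical points):")
_euclid_demo = """\
import math

a = [1.0, 2.0]
b = [4.0, 6.0]
dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(dist)  # 5.0
"""
st.code(_euclid_demo, language="python")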

# Working of KNN
st.markdown("<h2 style='color: #003366;'>How KNN Works</h2>", unsafe_allow_html=True)

st.subheader("Training Phase")
st.write("""
- There is **no actual training** involved.
- The algorithm simply memorizes the training data.
""")

st.subheader("Prediction Phase (Classification)")
st.write("""
1. Select a value for **K** (number of neighbors).
2. Measure distances between the new point and training samples.
3. Identify the **K nearest points**.
4. Assign the class based on **majority vote**.
""")
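
# Illustrative sketch (not part of the original app): the four classification steps in plain Python.
st.write("As a minimal sketch (plain Python, hypothetical toy data), the steps above might look like:")
_knn_clf_demo = """\
import math
from collections import Counter

def knn_classify(query, X, y, k=3):
    # Step 2: measure distances to all training samples
    dists = sorted((math.dist(query, x), label) for x, label in zip(X, y))
    # Step 3: keep the K nearest points
    nearest = [label for _, label in dists[:k]]
    # Step 4: assign the class by majority vote
    return Counter(nearest).most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((4.5, 5.0), X, y, k=3))  # B
"""
st.code(_knn_clf_demo, language="python")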

st.subheader("Prediction Phase (Regression)")
st.write("""
1. Choose a value for **K**.
2. Compute distances to training data.
3. Select the **K closest neighbors**.
4. Predict the output as the **average** (or weighted average) of these neighbors' values.
""")
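
# Illustrative sketch (not part of the original app): KNN regression by averaging neighbors.
st.write("And for regression, a minimal sketch (plain Python, hypothetical toy data):")
_knn_reg_demo = """\
import math

def knn_regress(query, X, y, k=3):
    # Steps 2-3: sort training points by distance and keep the K closest
    dists = sorted((math.dist(query, x), value) for x, value in zip(X, y))
    neighbors = [value for _, value in dists[:k]]
    # Step 4: predict the average of the neighbors' values
    return sum(neighbors) / k

X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [1.0, 2.0, 3.0, 10.0]
print(knn_regress((2.5,), X, y, k=3))  # 2.0
"""
st.code(_knn_reg_demo, language="python")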

# Overfitting vs Underfitting
st.markdown("<h2 style='color: #003366;'>Model Fit: Overfitting vs Underfitting</h2>", unsafe_allow_html=True)
st.write("""
- **Overfitting**: Occurs when K is too small (e.g., K=1); the model becomes overly sensitive to noise in the training data.
- **Underfitting**: Occurs when K is too large; the model averages over so many neighbors that it misses important patterns.
- **Optimal Fit**: Requires selecting a K value that balances bias and variance.
""")

# Training vs CV Error
st.markdown("<h2 style='color: #003366;'>Training vs Cross-Validation Error</h2>", unsafe_allow_html=True)
st.write("""
To choose the best `K`, monitor both:
- **Training Error**: Error on the training set.
- **Cross-Validation (CV) Error**: Error on held-out validation data, which indicates how well the model generalizes.

High training accuracy but poor CV accuracy = overfitting.  
Low training and CV accuracy = underfitting.
""")

# Hyperparameter tuning
st.markdown("<h2 style='color: #003366;'>KNN Hyperparameters</h2>", unsafe_allow_html=True)
st.write("""
Main parameters to tune:
- `n_neighbors` (the K value)
- `weights`: 'uniform' or 'distance'
- `metric`: Distance metric, e.g., 'euclidean', 'manhattan'
- `n_jobs`: Number of parallel jobs for the neighbor search (-1 uses all processors)

These can be optimized using Grid Search, Random Search, or Bayesian methods.
""")
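
# Illustrative sketch (not part of the original app): tuning the parameters above with Grid Search.
st.write("A minimal Grid Search sketch (assumes scikit-learn is installed; the built-in iris dataset is used purely for illustration):")
_grid_demo = """\
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
X, y = load_iris(return_X_y=True)
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
"""
st.code(_grid_demo, language="python")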

# Feature Scaling
st.markdown("<h2 style='color: #003366;'>Why Feature Scaling is Crucial</h2>", unsafe_allow_html=True)
st.write("""
KNN relies on distances between points, so features must be on the same scale.  
Options:
- **Normalization** (MinMax Scaling): Rescales each feature to the range [0, 1]
- **Standardization** (Z-score): Rescales each feature to mean 0, standard deviation 1

**Important**: To avoid data leakage, split your data first, fit the scaler on the training set only, and then use it to transform both the training and test sets.
""")
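
# Illustrative sketch (not part of the original app): leakage-free scaling via a pipeline.
st.write("A minimal sketch of leakage-free scaling using a pipeline (assumes scikit-learn is installed; the built-in wine dataset is used purely for illustration):")
_scale_demo = """\
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# The pipeline fits the scaler on the training data only, preventing leakage
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
"""
st.code(_scale_demo, language="python")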

# Weighted KNN
st.markdown("<h2 style='color: #003366;'>Weighted KNN</h2>", unsafe_allow_html=True)
st.write("""
In Weighted KNN, closer neighbors contribute more to the prediction, typically by weighting each neighbor's vote (or value) by the inverse of its distance.  
This is especially useful when nearby data points are more reliable than distant ones.
""")
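
# Illustrative sketch (not part of the original app): weighted vs. uniform voting in scikit-learn.
st.write("In scikit-learn this is a single parameter; a minimal sketch (assumes scikit-learn is installed; iris used purely for illustration):")
_weighted_demo = """\
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# weights="distance" makes each neighbor's vote proportional to 1/distance
uniform = KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_tr)
weighted = KNeighborsClassifier(n_neighbors=15, weights="distance").fit(X_tr, y_tr)
print(uniform.score(X_te, y_te), weighted.score(X_te, y_te))
"""
st.code(_weighted_demo, language="python")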

# Decision regions
st.markdown("<h2 style='color: #003366;'>Decision Boundaries</h2>", unsafe_allow_html=True)
st.write("""
- K=1 produces sharp, complex boundaries → risk of overfitting.
- Larger K smooths the boundary → reduces variance but increases bias.
""")

# Cross-validation
st.markdown("<h2 style='color: #003366;'>Understanding Cross-Validation</h2>", unsafe_allow_html=True)
st.write("""
Cross-validation helps evaluate how well the model generalizes.  
**K-Fold Cross Validation** (here K is the number of folds, not the K in KNN):
- Split the data into K folds.
- Train on K-1 folds, test on the remaining fold.
- Repeat K times and average the performance.
""")
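
# Illustrative sketch (not part of the original app): K-Fold CV for comparing values of K.
st.write("A minimal K-Fold sketch for comparing a few values of K (assumes scikit-learn is installed; iris used purely for illustration):")
_cv_demo = """\
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15):
    # 5-fold CV: each value of K gets five held-out accuracy scores
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())
"""
st.code(_cv_demo, language="python")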

# Hyperparameter search methods
st.markdown("<h2 style='color: #003366;'>Hyperparameter Tuning Methods</h2>", unsafe_allow_html=True)
st.write("""
- **Grid Search**: Tests every combination; reliable but slow.
- **Random Search**: Randomly samples combinations; faster, but may miss the optimum.
- **Bayesian Optimization**: Uses past results to choose the next candidates; efficient and smart.
""")

# Link to implementation
st.markdown("<h2 style='color: #003366;'>KNN Code Implementation</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/12lD7ceLj5BPiB6tgxaWXciB1IOMYGyZg#scrollTo=96210031-7967-41c4-9de2-56135c423404' target='_blank' style='font-size: 16px; color: #003366;'>Click here to view the notebook</a>",
    unsafe_allow_html=True
)

# Summary
st.write("""
KNN is a straightforward but effective algorithm.  
To get the best results:
- Scale your data properly.
- Use cross-validation.
- Carefully choose hyperparameters using tuning methods.
""")