import streamlit as st
st.set_page_config(page_title="KNN", page_icon="🤖", layout="wide")
# Styling: heading colour and body font only (no background colour is set)
st.markdown("""
<style>
h1, h2, h3 {
color: #003366;
}
.custom-font, p {
font-family: 'Arial', sans-serif;
font-size: 18px;
line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1 style='color: #003366;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)
st.image("https://cdn-uploads.huggingface.co/production/uploads/66be28cc7e8987822d129400/7vQBEhJAO_Bju6_SsK0Ms.png")
# Introduction
st.write("""
K-Nearest Neighbors (KNN) is a simple but effective machine learning algorithm used for both **classification** and **regression** tasks. It predicts the output for a new point by looking at the 'K' closest data points in the training set.
### Key Characteristics:
- KNN is a **non-parametric** and **instance-based** algorithm.
- It **stores** the training data instead of learning an explicit function from it (it is a "lazy" learner).
- Predictions are based on **similarity**, measured with a distance metric such as Euclidean distance.
""")
# Working of KNN
st.markdown("<h2 style='color: #003366;'>How KNN Works</h2>", unsafe_allow_html=True)
st.subheader("Training Phase")
st.write("""
- There is **no actual training** involved.
- The algorithm simply memorizes the training data.
""")
st.subheader("Prediction Phase (Classification)")
st.write("""
1. Select a value for **K** (number of neighbors).
2. Compute the distance from the new point to every training sample.
3. Identify the **K nearest points**.
4. Assign the class based on **majority vote**.
""")
st.subheader("Prediction Phase (Regression)")
st.write("""
1. Choose a value for **K**.
2. Compute distances to training data.
3. Select the **K closest neighbors**.
4. Predict the output as the **average** (or weighted average) of these neighbors' values.
""")
# Overfitting vs Underfitting
st.markdown("<h2 style='color: #003366;'>Model Fit: Overfitting vs Underfitting</h2>", unsafe_allow_html=True)
st.write("""
- **Overfitting**: Happens when K is too low (e.g., K=1); the model becomes too sensitive to noise.
- **Underfitting**: Happens when K is too high; the model averages over too many points and misses important patterns.
- **Optimal Fit**: Requires selecting a K value that provides a good balance between bias and variance.
""")
# Training vs CV Error
st.markdown("<h2 style='color: #003366;'>Training vs Cross-Validation Error</h2>", unsafe_allow_html=True)
st.write("""
To choose the best `K`, monitor both:
- **Training Error**: Error on the data the model was fit on.
- **Cross-Validation (CV) Error**: Error on held-out validation data; it estimates how well the model generalizes.

High training accuracy but poor CV accuracy = overfitting.

Low training and CV accuracy = underfitting.
""")
# Hyperparameter tuning
st.markdown("<h2 style='color: #003366;'>KNN Hyperparameters</h2>", unsafe_allow_html=True)
st.write("""
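Main parameters (scikit-learn names):
- `n_neighbors`: the K value
- `weights`: 'uniform' or 'distance'
- `metric`: distance metric, e.g., 'euclidean', 'manhattan' (the default, 'minkowski' with `p=2`, is Euclidean distance)
- `n_jobs`: number of parallel jobs; it speeds up the neighbor search but does not change predictions, so it is a performance setting rather than a hyperparameter to tune

The model hyperparameters can be optimized using Grid Search, Random Search, or Bayesian methods.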
""")
# Feature Scaling
st.markdown("<h2 style='color: #003366;'>Why Feature Scaling is Crucial</h2>", unsafe_allow_html=True)
st.write("""
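KNN relies on distances between points, so features must be on comparable scales; otherwise a feature with a large range (e.g., income in dollars) dominates one with a small range (e.g., age in years).

Options:
- **Normalization** (Min-Max Scaling): rescales each feature to the range [0, 1]
- **Standardization** (Z-score): rescales each feature to mean 0 and standard deviation 1

**Important**: Split your data first, fit the scaler on the training set only, then apply it to the validation and test sets to avoid data leakage.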
""")
# Weighted KNN
st.markdown("<h2 style='color: #003366;'>Weighted KNN</h2>", unsafe_allow_html=True)
st.write("""
In Weighted KNN, closer neighbors contribute more to the prediction, typically with a weight inversely proportional to their distance (1/d).
This is especially useful when nearby data points are more reliable than distant ones.
""")
# Decision regions
st.markdown("<h2 style='color: #003366;'>Decision Boundaries</h2>", unsafe_allow_html=True)
st.write("""
- K=1 produces sharp, complex boundaries → risk of overfitting.
- Larger K smooths the boundary → reduces variance but increases bias.
""")
# Cross-validation
st.markdown("<h2 style='color: #003366;'>Understanding Cross-Validation</h2>", unsafe_allow_html=True)
st.write("""
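Cross-validation helps evaluate how well the model generalizes.

**K-Fold Cross Validation** (here k is the number of folds, unrelated to KNN's K):
- Split the data into k parts (folds).
- Train on k-1 folds and validate on the remaining one.
- Repeat k times, so each fold is used once for validation, and average the performance.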
""")
# Hyperparameter search methods
st.markdown("<h2 style='color: #003366;'>Hyperparameter Tuning Methods</h2>", unsafe_allow_html=True)
st.write("""
- **Grid Search**: Tests all combinations → reliable but slow.
- **Random Search**: Randomly samples combinations → faster, but may miss the optimum.
- **Bayesian Optimization**: Uses the scores of past trials to choose the next candidates → often needs fewer evaluations.
""")
# Link to implementation
st.markdown("<h2 style='color: #003366;'>KNN Code Implementation</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/12lD7ceLj5BPiB6tgxaWXciB1IOMYGyZg#scrollTo=96210031-7967-41c4-9de2-56135c423404' target='_blank' style='font-size: 16px; color: #003366;'>Click here to view the notebook</a>",
unsafe_allow_html=True
)
# Summary
st.markdown("<h2 style='color: #003366;'>Summary</h2>", unsafe_allow_html=True)
st.write("""
KNN is a straightforward but effective algorithm.
To get the best results:
- Scale your data properly.
- Use cross-validation.
- Carefully choose hyperparameters using tuning methods.
""")