import streamlit as st
st.set_page_config(page_title="KNN", page_icon="🤖", layout="wide")
st.markdown("
๐ K-Nearest Neighbors (KNN) Algorithm
", unsafe_allow_html=True)
st.sidebar.title("🤖 KNN App")
st.sidebar.markdown("Explore KNN concepts step-by-step using the sections below.")
option = st.sidebar.radio(
    "Select a concept to learn:",
    (
        "📘 What is KNN?",
        "⚙️ How Does KNN Work?",
        "🎯 Underfitting vs Overfitting",
        "📈 Training vs Cross-Validation Error",
        "🛠️ Hyperparameter Tuning",
        "⚖️ Feature Scaling",
        "🧮 Weighted KNN",
        "🗺️ Decision Regions",
        "🔁 Cross-Validation Explained",
    ),
)
if option == "📘 What is KNN?":
    st.write("""
K-Nearest Neighbors (KNN) is a **non-parametric**, **lazy learning** algorithm used for both classification and regression.

✅ It stores all training data instead of learning an explicit function.

✅ It uses distance metrics (e.g., Euclidean, Manhattan) to make predictions.

✅ It is suitable for small to moderately sized datasets.
    """)
elif option == "⚙️ How Does KNN Work?":
    st.write("""
**Training Phase:**
- No actual training occurs; KNN simply memorizes the training dataset.

**Prediction Phase (Classification):**
1. Choose a value of **K**.
2. Calculate the distance from the new point to every training point.
3. Pick the **K closest** points.
4. Classify by majority vote among those neighbors (see the sketch below).

**Prediction Phase (Regression):**
- Average the target values of the K nearest neighbors.
    """)
elif option == "🎯 Underfitting vs Overfitting":
    st.write("""
- **Overfitting**: The model is too specific to the training data and performs poorly on unseen data (in KNN, very small K).
- **Underfitting**: The model is too simple and performs poorly even on the training data (in KNN, very large K).
- **Ideal Model**: A balance that performs well on both seen and unseen data (compare the scores in the sketch below).
    """)
elif option == "📈 Training vs Cross-Validation Error":
    st.write("""
- **Training Error** is the error on the known training data.
- **Cross-Validation Error** is measured on held-out validation data.

✅ Use cross-validation to pick the best value of `K`, as sketched below.

📊 A big gap between the two means overfitting; both being high means underfitting.
    """)
elif option == "🛠️ Hyperparameter Tuning":
    st.write("""
- **K**: Number of neighbors; test multiple values.
- **Weights**: Equal (`uniform`) or based on distance (`distance`).
- **Metric**: How distance is measured (Euclidean, Manhattan).
- Use **Grid Search**, **Random Search**, or **Optuna** to find the best combination (a Grid Search sketch follows).
    """)
elif option == "⚖️ Feature Scaling":
    st.write("""
KNN relies on distances, so features must be on the same scale.

- **Normalization** rescales each feature to the range [0, 1].
- **Standardization** centers each feature at mean 0 with unit variance.

⚠️ Always scale after splitting: fit the scaler on the training split only, then apply it to the test split, to avoid data leakage (a pipeline handles this for you, as sketched below).
    """)
elif option == "🧮 Weighted KNN":
    st.write("""
Weighted KNN assigns higher importance to closer neighbors.

- Use `weights='distance'` to apply this logic in libraries like scikit-learn (see the sketch below).
- It helps on noisy datasets or when closer points are more informative.
    """)
elif option == "🗺️ Decision Regions":
    st.write("""
- Small `K` values create complex, wiggly decision boundaries (overfitting).
- Larger `K` values smooth the boundary, which generalizes better, though too large a `K` underfits.
- Visualizing decision regions helps you understand the algorithm's behavior (a plotting sketch follows).
    """)
elif option == "🔁 Cross-Validation Explained":
    st.write("""
- **K-Fold Cross-Validation** splits the data into `K` equal parts (folds); this `K` is unrelated to the number of neighbors.
- The model trains on `K-1` folds and is tested on the remaining fold, rotating until every fold has served as the test set.
- It helps evaluate model stability and guards against overfitting (see the sketch below).
    """)
st.markdown("๐ Try KNN in Colab:
", unsafe_allow_html=True)
st.markdown("""
๐ Open Jupyter Notebook on Colab
""", unsafe_allow_html=True)
st.success("KNN is easy to understand and surprisingly powerful! Tune it well, scale your data, and validate your model to get the best results.")