🔍 K-Nearest Neighbors (KNN) Algorithm

import streamlit as st

st.set_page_config(page_title="KNN", page_icon="🤖", layout="wide")

st.markdown("<h1 style='text-align: center; color: #FF4C60;'>🔍 K-Nearest Neighbors (KNN) Algorithm</h1>", unsafe_allow_html=True)

st.sidebar.title("🤖 KNN App")
st.sidebar.markdown("Explore KNN concepts step-by-step using the sections below.")

option = st.radio(
    "Select a concept to learn:",
    (
        "📘 What is KNN?",
        "⚙️ How Does KNN Work?",
        "🎯 Underfitting vs Overfitting",
        "📉 Training vs Cross-Validation Error",
        "🛠️ Hyperparameter Tuning",
        "⚖️ Feature Scaling",
        "🧮 Weighted KNN",
        "🗺️ Decision Regions",
        "🔁 Cross-Validation Explained"
    )
)

if option == "📘 What is KNN?":
    st.write("""
    K-Nearest Neighbors (KNN) is a **non-parametric**, **lazy learning** algorithm used for both classification and regression.
    
    ✅ It stores all training data instead of learning a function.  
    ✅ It uses distance metrics (e.g., Euclidean, Manhattan) to make predictions.  
    ✅ Suitable for small to moderately sized datasets.
    """)

elif option == "⚙️ How Does KNN Work?":
    st.write("""
    **Training Phase:**
    - No actual training occurs. KNN memorizes the training dataset.

    **Prediction Phase (Classification):**
    1. Choose a value of **K**
    2. Calculate distances from the new point to all others
    3. Pick **K closest** points
    4. Use majority vote to classify

    **Prediction Phase (Regression):**
    - Average the values of the K nearest neighbors.
    """)

elif option == "🎯 Underfitting vs Overfitting":
    st.write("""
    - **Overfitting**: The model is too specific to the training data. Poor on unseen data.
    - **Underfitting**: The model is too simple. Poor even on training data.
    - **Ideal Model**: A balance that performs well on both seen and unseen data.
    """)

elif option == "📉 Training vs Cross-Validation Error":
    st.write("""
    - **Training Error** is the error on the known training data.
    - **Cross-Validation Error** is from unseen validation data.
    
    ✅ Use cross-validation to pick the best value of `K`.  
    🔍 Big gap = Overfitting; Both high = Underfitting.
    """)

elif option == "🛠️ Hyperparameter Tuning":
    st.write("""
    - **K**: Number of neighbors — test multiple values.
    - **Weights**: Equal (`uniform`) or based on distance (`distance`).
    - **Metric**: How distance is measured (Euclidean, Manhattan).
    - Use **Grid Search**, **Random Search**, or **Optuna** for best tuning.
    """)

elif option == "⚖️ Feature Scaling":
    st.write("""
    KNN uses distances — so features must be on the same scale.
    - **Normalization** scales data between 0 and 1.
    - **Standardization** centers data around mean 0.
    ⚠️ Always scale data after splitting to avoid leakage.
    """)

elif option == "🧮 Weighted KNN":
    st.write("""
    Weighted KNN assigns higher importance to closer neighbors.
    - Use `weights='distance'` to apply this logic in libraries like scikit-learn.
    - Helps in noisy datasets or when closer points are more meaningful.
    """)

elif option == "🗺️ Decision Regions":
    st.write("""
    - Small `k` values create complex, wiggly decision boundaries (overfitting).
    - Larger `k` smooths the boundary (better generalization).
    - Visualizing decision regions helps understand the algorithm’s behavior.
    """)

elif option == "🔁 Cross-Validation Explained":
    st.write("""
    - **K-Fold Cross-Validation** splits data into `K` parts.
    - The model trains on K-1 parts and tests on the remaining part.
    - Helps evaluate model stability and avoid overfitting.
    """)

st.markdown("<h2 style='color: #58a6ff;'>📓 Try KNN in Colab:</h2>", unsafe_allow_html=True)
st.markdown("""
<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank'>
🔗 Open Jupyter Notebook on Colab
</a>
""", unsafe_allow_html=True)

st.success("KNN is easy to understand and surprisingly powerful! Tune it well, scale your data, and validate your model to get the best results.")