Create 11_Dession_Tree.py
pages/11_Dession_Tree.py  +194 -0
pages/11_Dession_Tree.py
ADDED
@@ -0,0 +1,194 @@
import streamlit as st

# Page configuration
st.set_page_config(page_title="Decision Tree", page_icon="🌳", layout="wide")

# Custom dark theme and styling
st.markdown("""
<style>
.stApp {
    background-color: #1e1e1e;
    color: white;
}
h1, h2, h3 {
    color: #FF4C60;
}
.sidebar .sidebar-content {
    background-color: #1e1e1e;
}
a {
    color: #58a6ff;
    text-decoration: none;
}
a:hover {
    color: #1f78d1;
}
</style>
""", unsafe_allow_html=True)

# Sidebar
st.sidebar.title("🌳 Decision Tree")
st.sidebar.markdown("Learn all about Decision Trees with intuitive sections.")
st.sidebar.markdown("---")

# Main Title
st.markdown("<h1 style='text-align: center;'>🌳 Decision Tree Algorithm (Theory)</h1>", unsafe_allow_html=True)

# What is a Decision Tree?
with st.expander("📘 What is a Decision Tree?"):
    st.write("""
A **Decision Tree** is a supervised machine learning algorithm used for **classification** and **regression**.
It models decisions using a tree structure:

- 🟢 **Root Node**: Represents the entire dataset
- 🔵 **Internal Nodes**: Feature-based decision points
- 🟣 **Leaf Nodes**: Output/Prediction

The tree splits based on **if-else** logic using the best feature at each level.
""")

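# --- Illustrative addition (not part of the original page): a hypothetical
# --- "tree as nested if-else" sketch, rendered for the reader with st.code.
# --- The features, thresholds, and classes below are made up for illustration.
with st.expander("📘 A tree is just nested if-else (illustrative sketch)"):
    st.code("""
# Hypothetical two-level tree, for illustration only
def predict(petal_width, petal_length):
    if petal_width <= 0.8:          # root node test
        return "Setosa"             # leaf
    else:
        if petal_length <= 4.95:    # internal node test
            return "Versicolor"     # leaf
        return "Virginica"          # leaf
""", language="python")
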
# Entropy
# Raw string so LaTeX backslash commands pass through unescaped
with st.expander("🧮 Entropy - Measuring Uncertainty"):
    st.write(r"""
**Entropy** measures the impurity or disorder in the data.
It's used in Decision Trees to decide the best split.

**Formula:**

$$
H(Y) = - \sum_{i=1}^{n} p_i \log_2(p_i)
$$

Where:
- $p_i$ = Probability of class $i$

**Example**:
For a dataset with two classes (Yes = 0.5, No = 0.5):

$$
H(Y) = - (0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1
$$

✅ Maximum entropy = 1 → complete randomness.
""")

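# --- Illustrative addition (not part of the original page): a minimal runnable
# --- sketch of the entropy formula above. numpy is assumed to be installed in
# --- the Space, and the Yes/No labels are hypothetical.
import numpy as np

def entropy(labels):
    # H(Y) = -sum(p_i * log2(p_i)) over the classes observed in `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

with st.expander("🧮 Entropy in code (illustrative sketch)"):
    st.write(f"A 50/50 Yes/No split gives entropy `{entropy(['Yes', 'No']):.1f}`, matching the worked example above.")
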
# Gini Impurity
# Raw string so LaTeX backslash commands pass through unescaped
with st.expander("⚖️ Gini Impurity - Measuring Purity"):
    st.write(r"""
**Gini Impurity** is another metric to evaluate split quality.

**Formula:**

$$
Gini(Y) = 1 - \sum_{i=1}^{n} p_i^2
$$

Where:
- $p_i$ = Probability of class $i$

**Example**:
For two classes (Yes = 0.5, No = 0.5):

$$
Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5
$$

✅ Gini of 0.5 means equal class distribution (impure).
""")

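# --- Illustrative addition (not part of the original page): the Gini formula
# --- above as a small function; the hypothetical 50/50 labels reproduce the
# --- 0.5 worked example. Reuses numpy imported above.
def gini(labels):
    # Gini(Y) = 1 - sum(p_i^2) over the classes observed in `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

with st.expander("⚖️ Gini in code (illustrative sketch)"):
    st.write(f"A 50/50 Yes/No split gives Gini `{gini(['Yes', 'No']):.1f}`, matching the worked example above.")
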
# Construction
with st.expander("🔧 Tree Construction Process"):
    st.write("""
The tree is built **top-down**, selecting features that reduce impurity the most.
Splitting stops when:
- Impurity = 0
- Max depth reached
- No further splits possible

Each decision creates **branches**, until final predictions are in the **leaf nodes**.
""")

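# --- Illustrative addition (not part of the original page): a hedged sketch of
# --- how one greedy split could be chosen, reusing the gini() helper defined
# --- above; the toy feature values and labels are hypothetical.
def best_split(x, y):
    # Try each midpoint threshold on a 1-D feature and keep the one with the
    # lowest weighted Gini impurity of the two child nodes.
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, float("inf"))
    for i in range(1, len(x)):
        thr = (x[i - 1] + x[i]) / 2
        left, right = y[:i], y[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (thr, score)
    return best  # (threshold, weighted impurity)

with st.expander("🔧 Choosing a split in code (illustrative sketch)"):
    thr, score = best_split(np.array([1.0, 1.2, 3.1, 3.3]), np.array(["No", "No", "Yes", "Yes"]))
    st.write(f"Best threshold `{thr}` with weighted Gini `{score:.2f}` on a toy 1-D feature.")
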
# Iris Example
with st.expander("🌸 Example: Iris Dataset Tree"):
    st.write("""
A Decision Tree for the Iris dataset classifies flowers into:
- Setosa
- Versicolor
- Virginica

based on petal/sepal length and width.

🧠 Each node checks a feature and threshold, sending the sample left or right.
""")

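# --- Illustrative addition (not part of the original page): fitting a small
# --- tree on the real Iris dataset and printing its learned rules;
# --- scikit-learn is assumed to be installed in the Space.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

with st.expander("🌸 Iris tree in code (illustrative sketch)"):
    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(iris.data, iris.target)
    # export_text prints the learned if-else structure node by node
    st.code(export_text(clf, feature_names=list(iris.feature_names)))
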
# Classification
with st.expander("🧪 Classification: Training & Testing"):
    st.write("""
**Training Phase:**
- Learn rules from training data using Entropy/Gini

**Testing Phase:**
- Follow the decision path based on feature values
- Reach a leaf node with the predicted class

Example: Predicting the Iris species based on petal width.
""")

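# --- Illustrative addition (not part of the original page): a hedged sketch of
# --- the train/test workflow described above, reusing the Iris data loaded
# --- earlier; scikit-learn's train_test_split is assumed to be available.
from sklearn.model_selection import train_test_split

with st.expander("🧪 Train & test in code (illustrative sketch)"):
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=42
    )
    model = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)
    st.write(f"Test accuracy on held-out Iris samples: `{model.score(X_test, y_test):.2f}`")
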
# Regression
with st.expander("📈 Regression: Training & Testing"):
    st.write("""
**Training Phase:**
- Build the tree using splits that minimize **Mean Squared Error (MSE)**

**Testing Phase:**
- The average of the training targets in the reached leaf node is the prediction

Example: Predicting house prices from square footage, etc.
""")

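# --- Illustrative addition (not part of the original page): a regression-tree
# --- sketch on synthetic data; the square-footage and price numbers are made
# --- up for the demo. Reuses numpy imported above.
from sklearn.tree import DecisionTreeRegressor

with st.expander("📈 Regression tree in code (illustrative sketch)"):
    sqft = np.array([[600], [800], [1000], [1500], [2000], [2500]])
    price = np.array([150, 180, 220, 300, 390, 470])  # in thousands, hypothetical
    reg = DecisionTreeRegressor(max_depth=2, random_state=42).fit(sqft, price)
    # The prediction is the mean target value of the leaf the sample lands in
    st.write(f"Predicted price for 1,200 sqft: `{reg.predict([[1200]])[0]:.0f}k`")
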
# Pre-Pruning
with st.expander("✂️ Pre-Pruning Techniques"):
    st.write("""
Limit the tree's growth to prevent overfitting.

- `max_depth`: Limits depth of tree
- `min_samples_split`: Min samples to split a node
- `min_samples_leaf`: Min samples in a leaf node
- `max_features`: Limits features considered per split
""")

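# --- Illustrative addition (not part of the original page): the pre-pruning
# --- knobs listed above passed to scikit-learn's DecisionTreeClassifier; the
# --- specific values are arbitrary choices for the demo.
with st.expander("✂️ Pre-pruning in code (illustrative sketch)"):
    pruned = DecisionTreeClassifier(
        max_depth=3,            # cap the depth of the tree
        min_samples_split=10,   # need at least 10 samples to split a node
        min_samples_leaf=5,     # every leaf keeps at least 5 samples
        max_features=2,         # consider at most 2 features per split
        random_state=42,
    ).fit(iris.data, iris.target)
    st.write(f"Pruned tree depth: `{pruned.get_depth()}` with `{pruned.get_n_leaves()}` leaves")
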
# Post-Pruning
with st.expander("🔁 Post-Pruning Techniques"):
    st.write("""
Prune a fully grown tree to remove weak branches.

Techniques:
- **Cost Complexity Pruning** using α (alpha)
- **Validation-based pruning**: Use a validation set to remove non-helpful branches
""")

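# --- Illustrative addition (not part of the original page): scikit-learn's
# --- cost-complexity pruning path, which suggests candidate α values; picking
# --- the middle candidate here is an arbitrary choice for the demo. Reuses the
# --- train split created above.
with st.expander("🔁 Cost complexity pruning in code (illustrative sketch)"):
    path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
        X_train, y_train
    )
    alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary middle candidate
    pruned_cc = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)
    st.write(f"α = `{alpha:.4f}` gives a pruned tree with `{pruned_cc.get_n_leaves()}` leaves")
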
# Feature Selection
# Raw string so LaTeX commands like \frac and \text are not read as escapes
with st.expander("🔍 Feature Selection using Decision Tree"):
    st.write(r"""
Decision Trees rank features by their **information gain** or impurity reduction.

**Feature Importance Formula:**

$$
Importance(f) = \frac{\text{Total reduction in impurity from } f}{\text{Total reduction in impurity from all features}}
$$

Higher score = more impact on the model's decisions.
""")

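# --- Illustrative addition (not part of the original page): reading the
# --- normalized importances described above from the Iris classifier fitted
# --- earlier on the train split.
with st.expander("🔍 Feature importances in code (illustrative sketch)"):
    for name, importance in zip(iris.feature_names, model.feature_importances_):
        st.write(f"- `{name}`: {importance:.3f}")
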
# Colab Link
st.markdown("---")
st.markdown("### 🚀 Try It Yourself: Open the Colab Notebook")
st.markdown("""
<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank'>
🔗 Open Decision Tree Notebook in Colab
</a>
""", unsafe_allow_html=True)

# Final note
st.success("Decision Trees are interpretable, powerful, and great for both classification and regression. Keep exploring!")