import streamlit as st
# Set page configuration
st.set_page_config(page_title="Decision Tree Theory", layout="wide")
# Custom CSS styling
st.markdown("""
<style>
.stApp {
background-color: #4A90E2;
}
h1, h2, h3 {
color: #003366;
}
.custom-font, p {
font-family: 'Arial', sans-serif;
font-size: 18px;
color: white;
line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1 style='color: #003366;'>Understanding Decision Trees</h1>", unsafe_allow_html=True)
# Introduction
st.markdown("""
A **Decision Tree** is a versatile supervised learning algorithm used for both **classification** and **regression** tasks. It mimics human decision-making by using a tree-like model of decisions and their possible consequences.
The basic structure includes:
- **Root Node**: Represents the complete dataset.
- **Internal Nodes**: Represent conditions on features.
- **Leaf Nodes**: Represent outcomes or predictions.
Think of it as a flowchart where each internal node asks a question, and each branch represents the outcome, eventually leading to a final decision.
""", unsafe_allow_html=True)
# Entropy
st.markdown("<h2 style='color: #003366;'>Entropy: Quantifying Uncertainty</h2>", unsafe_allow_html=True)
st.markdown("""
**Entropy** measures the amount of randomness or disorder in the data. It's commonly used in classification problems to decide how informative a feature is.
Entropy formula:
""")
st.image("entropy-formula-2.jpg", width=300)
st.markdown("""
Where:
- $p(i)$ is the probability of class $i$.
**Example**:
- If $P(Yes) = 0.5$ and $P(No) = 0.5$,
Then:
$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$
This indicates maximum uncertainty (perfectly balanced classes).
""", unsafe_allow_html=True)
# Gini Impurity
st.markdown("<h2 style='color: #003366;'>Gini Impurity: Measuring Impurity</h2>", unsafe_allow_html=True)
st.markdown("""
**Gini Impurity** is another popular impurity measure. It calculates how often a randomly chosen element would be incorrectly labeled.
Formula:
""")
st.image("gini.png", width=300)
st.markdown("""
**Example**:
- $P(Yes) = 0.5$, $P(No) = 0.5$
Then:
$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$
A lower Gini value means purer splits.
""", unsafe_allow_html=True)
# Tree Construction
st.markdown("<h2 style='color: #003366;'>Building the Decision Tree</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees are built **top-down**, starting from the root. At each node, the algorithm selects the feature that best splits the data using metrics like **Entropy** or **Gini**.
Splitting stops when:
- The data is pure (contains one class), or
- A stopping condition is met (like maximum depth).
""", unsafe_allow_html=True)
# Iris Tree Visualization
st.markdown("<h2 style='color: #003366;'>Visualizing: Iris Dataset Tree</h2>", unsafe_allow_html=True)
st.markdown("""
Here's an example decision tree trained on the famous **Iris dataset**, which classifies flower species based on petal and sepal measurements.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)
# Training & Testing - Classification
st.markdown("<h2 style='color: #003366;'>Training & Testing (Classification)</h2>", unsafe_allow_html=True)
st.markdown("""
**Training**:
- Select features and split based on impurity reduction.
- Recursively grow the tree until stopping criteria are met.
**Testing**:
- Traverse the tree with new data.
- Follow the decision rules until you reach a leaf node (prediction).
💡 *Example: For Iris, classify the flower as Setosa, Versicolor, or Virginica based on petal dimensions.*
""", unsafe_allow_html=True)
# Training & Testing - Regression
st.markdown("<h2 style='color: #003366;'>Training & Testing (Regression)</h2>", unsafe_allow_html=True)
st.markdown("""
**Training**:
- Split data to minimize **Mean Squared Error (MSE)**.
**Testing**:
- Output the mean value in the corresponding leaf.
💡 *Example: Predict house price using features like size, location, and number of rooms.*
""", unsafe_allow_html=True)
# Pre-Pruning
st.markdown("<h2 style='color: #003366;'>Pre-Pruning: Control Overfitting Early</h2>", unsafe_allow_html=True)
st.markdown("""
Pre-pruning stops the tree from growing too deep and complex. Common techniques include:
- **Max Depth**
- **Min Samples Split**
- **Min Samples Leaf**
- **Max Features**
These constraints help the tree generalize better and reduce its sensitivity to noise.
""", unsafe_allow_html=True)
# Post-Pruning
st.markdown("<h2 style='color: #003366;'>Post-Pruning: Simplify After Growth</h2>", unsafe_allow_html=True)
st.markdown("""
In **post-pruning**, we allow the tree to grow fully, then trim unnecessary branches:
- **Cost Complexity Pruning**
- **Validation-based Pruning**
This helps reduce overfitting and improves model simplicity.
""", unsafe_allow_html=True)
# Feature Importance
st.markdown("<h2 style='color: #003366;'>Feature Selection with Decision Trees</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees provide insight into which features are most important based on how often and how effectively they split data.
""")
st.image("feature.png", width=500)
st.markdown("""
💡 *Higher importance → more influence on the model's decisions.*
""", unsafe_allow_html=True)
# Notebook Link
st.markdown("<h2 style='color: #003366;'>Explore Hands-On Implementation</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #003366;'>🔗 Open Jupyter Notebook on Google Colab</a>",
unsafe_allow_html=True
)