Spaces:
Sleeping
Sleeping
File size: 5,278 Bytes
15bf019 c70fab9 15bf019 c70fab9 15bf019 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 |
import streamlit as st
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.tree import export_graphviz
import graphviz
import pandas as pd
st.set_page_config(page_title="Decision Tree Explorer", page_icon="π³", layout="wide")
st.title("π³ Decision Tree Algorithm Explorer")
st.write("Understand how Decision Trees work with simple explanations, visuals, and real-world examples.")
section = st.radio("Choose a topic to explore:", [
"What is a Decision Tree?",
"How It Works",
"Entropy vs Gini",
"Tree Construction",
"Classification vs Regression",
"Pruning",
"Feature Importance",
"Visualize Example Tree",
"Try with Iris Data"
])
if section == "What is a Decision Tree?":
st.header("π What is a Decision Tree?")
st.markdown("""
A **Decision Tree** is a flowchart-like model that makes decisions based on a series of questions.
- π― Used in both **classification** (e.g., spam vs. not spam) and **regression** (e.g., predicting price).
- π± It starts at a **root**, asks a question, and branches out based on answers.
- π Ends at a **leaf node** β which is the prediction.
**Real-life example:**
You're deciding what to wear. You ask:
1. Is it raining?
2. Is it cold?
β Based on your answers, you decide: jacket, umbrella, or just a T-shirt.
""")
elif section == "How It Works":
st.header("βοΈ How Does It Work?")
st.markdown("""
**Step-by-step:**
1. Start with the whole dataset.
2. Choose the feature that best splits the data.
3. Split the dataset.
4. Repeat until you reach a stopping condition.
**Used concepts:**
- Entropy (information gain)
- Gini impurity
""")
elif section == "Entropy vs Gini":
st.header("π Entropy vs Gini")
st.markdown("""
### Entropy
Measures randomness or disorder in data.
$$
H(Y) = - \sum p_i \log_2 p_i
$$
### Gini Impurity
Measures the probability of wrong classification.
$$
Gini(Y) = 1 - \sum p_i^2
$$
**Which to use?**
- Gini is faster β default in scikit-learn.
- Entropy gives more information-theoretic understanding.
""")
elif section == "Tree Construction":
st.header("π§ How is the Tree Built?")
st.markdown("""
The tree is built **top-down** using a greedy algorithm:
- Best feature is chosen using Gini or Entropy.
- Splits continue until stopping criteria (e.g., max depth, pure leaf).
**Tip**: Too many splits = overfitting!
""")
elif section == "Classification vs Regression":
st.header("π Classification vs Regression")
st.markdown("""
- **Classification Tree**: Predicts categories (Yes/No, Spam/Ham).
- **Regression Tree**: Predicts continuous values (e.g., house price).
**Example:**
- Classification: Will a customer churn?
- Regression: What will be the next monthβs sales?
""")
elif section == "Pruning":
st.header("βοΈ Pruning Techniques")
st.markdown("""
**Why prune?**
To avoid overfitting by cutting unnecessary branches.
### Pre-Pruning
- `max_depth`: limit depth
- `min_samples_split`: split only if enough samples
- `min_samples_leaf`: limit how small leaves can be
### Post-Pruning
- Cost Complexity Pruning (using Ξ±)
""")
elif section == "Feature Importance":
st.header("π Feature Importance")
st.markdown("""
Decision Trees calculate how important each feature is by how much it reduces impurity.
**Formula:**
$$
Importance = \frac{Total\ Gain\ from\ Feature}{Total\ Gain\ from\ All\ Features}
$$
Useful for feature selection and explaining model decisions.
""")
elif section == "Visualize Example Tree":
st.header("πΏ Visualize a Small Tree Example")
iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=3)
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(clf, out_file=None,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True, rounded=True,
special_characters=True)
st.graphviz_chart(dot_data)
elif section == "Try with Iris Data":
st.header("πΈ Try with Iris Dataset")
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
st.write("Here's a preview of the dataset:")
st.dataframe(df.head())
st.markdown("### Build and visualize a Decision Tree")
max_depth = st.slider("Select max depth of the tree:", 1, 5, 3)
clf = tree.DecisionTreeClassifier(max_depth=max_depth)
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(clf, out_file=None,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True, rounded=True)
st.graphviz_chart(dot_data)
st.markdown("---")
st.success("β
Decision Trees are simple yet powerful! Tune them well, visualize their structure, and understand every split.")
|