Update pages/10_Decision_Tree.py

pages/10_Decision_Tree.py CHANGED (+158 -0)
@@ -0,0 +1,158 @@
import streamlit as st
import pandas as pd
from sklearn import tree
from sklearn.datasets import load_iris

st.set_page_config(page_title="Decision Tree Explorer", page_icon="🌳", layout="wide")

st.title("🌳 Decision Tree Algorithm Explorer")
st.write("Understand how Decision Trees work with simple explanations, visuals, and real-world examples.")

section = st.radio("Choose a topic to explore:", [
    "What is a Decision Tree?",
    "How It Works",
    "Entropy vs Gini",
    "Tree Construction",
    "Classification vs Regression",
    "Pruning",
    "Feature Importance",
    "Visualize Example Tree",
    "Try with Iris Data"
])


if section == "What is a Decision Tree?":
    st.header("📘 What is a Decision Tree?")
    st.markdown("""
A **Decision Tree** is a flowchart-like model that makes decisions based on a series of questions.

- 🎯 Used in both **classification** (e.g., spam vs. not spam) and **regression** (e.g., predicting price).
- 🌱 It starts at a **root**, asks a question, and branches out based on the answers.
- 🍃 Ends at a **leaf node**, which holds the prediction.

**Real-life example:**
You're deciding what to wear. You ask:
1. Is it raining?
2. Is it cold?

→ Based on your answers, you decide: jacket, umbrella, or just a T-shirt.
""")

elif section == "How It Works":
    st.header("⚙️ How Does It Work?")
    st.markdown("""
**Step-by-step:**
1. Start with the whole dataset.
2. Choose the feature that best splits the data.
3. Split the dataset.
4. Repeat until you reach a stopping condition.

**Key concepts:**
- Entropy (information gain)
- Gini impurity
""")

elif section == "Entropy vs Gini":
    st.header("📊 Entropy vs Gini")
    st.markdown(r"""
### Entropy
Measures the randomness or disorder in the data.
$$
H(Y) = -\sum_i p_i \log_2 p_i
$$

### Gini Impurity
Measures the probability of misclassifying a randomly chosen sample.
$$
Gini(Y) = 1 - \sum_i p_i^2
$$

**Which to use?**
- Gini is slightly faster to compute and is the default in scikit-learn.
- Entropy has a cleaner information-theoretic interpretation.
""")
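As an aside on the section above: both impurity measures are easy to sanity-check numerically. A minimal sketch, assuming NumPy is available; `entropy` and `gini` are illustrative helpers, not part of the app:

```python
import numpy as np

def entropy(probs):
    """H(Y) = -sum(p_i * log2(p_i)); zero-probability classes contribute 0."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # skip p_i = 0, where log2 is undefined
    return float(-np.sum(p * np.log2(p)))

def gini(probs):
    """Gini(Y) = 1 - sum(p_i^2)."""
    p = np.asarray(probs, dtype=float)
    return float(1.0 - np.sum(p ** 2))

# A 50/50 two-class node is maximally impure; a pure node has zero impurity.
print(entropy([0.5, 0.5]))  # 1.0
print(gini([0.5, 0.5]))     # 0.5
print(gini([1.0, 0.0]))     # 0.0
```

Note that both measures agree on the extremes (pure vs. maximally mixed); they differ only in how they weight intermediate mixtures.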

elif section == "Tree Construction":
    st.header("🧠 How is the Tree Built?")
    st.markdown("""
The tree is built **top-down** using a greedy algorithm:

- The best feature is chosen using Gini or Entropy.
- Splitting continues until a stopping criterion is met (e.g., max depth, pure leaf).

**Tip**: Too many splits = overfitting!
""")

elif section == "Classification vs Regression":
    st.header("🔀 Classification vs Regression")
    st.markdown("""
- **Classification Tree**: Predicts categories (Yes/No, Spam/Ham).
- **Regression Tree**: Predicts continuous values (e.g., house price).

**Example:**
- Classification: Will a customer churn?
- Regression: What will next month's sales be?
""")

elif section == "Pruning":
    st.header("✂️ Pruning Techniques")
    st.markdown("""
**Why prune?**
To avoid overfitting by cutting unnecessary branches.

### Pre-Pruning
- `max_depth`: limit the depth of the tree
- `min_samples_split`: split a node only if it has enough samples
- `min_samples_leaf`: limit how small leaves can be

### Post-Pruning
- Cost-Complexity Pruning (controlled by α, `ccp_alpha` in scikit-learn)
""")
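The post-pruning technique named above is exposed in scikit-learn through the `ccp_alpha` parameter: larger α prunes more aggressively. A minimal sketch on the Iris data; the value `0.02` is an arbitrary illustrative choice, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree vs. a cost-complexity-pruned tree on the same data.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

# Pruning removes branches, so the pruned tree has no more leaves than the full one.
print(full.get_n_leaves(), pruned.get_n_leaves())
```

In practice the candidate α values can be obtained from `full.cost_complexity_pruning_path(X, y)` and the best one selected by cross-validation.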

elif section == "Feature Importance":
    st.header("📈 Feature Importance")
    st.markdown(r"""
Decision Trees score each feature by how much it reduces impurity across the splits that use it.

**Formula:**
$$
Importance = \frac{Total\ Gain\ from\ Feature}{Total\ Gain\ from\ All\ Features}
$$

Useful for feature selection and for explaining model decisions.
""")
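The normalized-gain definition above is what scikit-learn exposes as the `feature_importances_` attribute of a fitted tree; by construction the values sum to 1. A minimal sketch on Iris, separate from the app code:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# One importance score per feature, normalized so the scores sum to 1.
for name, imp in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```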

elif section == "Visualize Example Tree":
    st.header("🌿 Visualize a Small Tree Example")
    iris = load_iris()
    clf = tree.DecisionTreeClassifier(max_depth=3)
    clf.fit(iris.data, iris.target)
    dot_data = tree.export_graphviz(clf, out_file=None,
                                    feature_names=iris.feature_names,
                                    class_names=iris.target_names,
                                    filled=True, rounded=True,
                                    special_characters=True)
    st.graphviz_chart(dot_data)

elif section == "Try with Iris Data":
    st.header("🌸 Try with Iris Dataset")
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['target'] = iris.target
    st.write("Here's a preview of the dataset:")
    st.dataframe(df.head())

    st.markdown("### Build and visualize a Decision Tree")
    max_depth = st.slider("Select max depth of the tree:", 1, 5, 3)
    clf = tree.DecisionTreeClassifier(max_depth=max_depth)
    clf.fit(iris.data, iris.target)
    dot_data = tree.export_graphviz(clf, out_file=None,
                                    feature_names=iris.feature_names,
                                    class_names=iris.target_names,
                                    filled=True, rounded=True)
    st.graphviz_chart(dot_data)

st.markdown("---")
st.success("✅ Decision Trees are simple yet powerful! Tune them well, visualize their structure, and understand every split.")