Harika22 committed
Commit 15bf019 · verified · 1 Parent(s): 2138c0e

Update pages/10_Decision_Tree.py

Files changed (1)
  1. pages/10_Decision_Tree.py +158 -0
pages/10_Decision_Tree.py CHANGED
@@ -0,0 +1,158 @@
+ import streamlit as st
+ from sklearn import tree
+ from sklearn.datasets import load_iris
+ from sklearn.tree import export_graphviz
+ import graphviz
+ import pandas as pd
+
+ st.set_page_config(page_title="Decision Tree Explorer", page_icon="🌳", layout="wide")
+
+ st.title("🌳 Decision Tree Algorithm Explorer")
+ st.write("Understand how Decision Trees work with simple explanations, visuals, and real-world examples.")
+
+ section = st.radio("Choose a topic to explore:", [
+     "What is a Decision Tree?",
+     "How It Works",
+     "Entropy vs Gini",
+     "Tree Construction",
+     "Classification vs Regression",
+     "Pruning",
+     "Feature Importance",
+     "Visualize Example Tree",
+     "Try with Iris Data"
+ ])
+
+
+ if section == "What is a Decision Tree?":
+     st.header("📘 What is a Decision Tree?")
+     st.markdown("""
+ A **Decision Tree** is a flowchart-like model that makes decisions based on a series of questions.
+
+ - 🎯 Used for both **classification** (e.g., spam vs. not spam) and **regression** (e.g., predicting a price).
+ - 🌱 It starts at a **root**, asks a question, and branches out based on the answers.
+ - 🔚 Ends at a **leaf node**, which holds the prediction.
+
+ **Real-life example:**
+ You're deciding what to wear. You ask:
+ 1. Is it raining?
+ 2. Is it cold?
+ → Based on your answers, you decide: jacket, umbrella, or just a T-shirt.
+ """)
+
+ elif section == "How It Works":
+     st.header("⚙️ How Does It Work?")
+     st.markdown("""
+ **Step-by-step:**
+ 1. Start with the whole dataset.
+ 2. Choose the feature that best splits the data.
+ 3. Split the dataset into subsets and repeat on each subset.
+ 4. Stop when a stopping condition is reached (e.g., a pure node or maximum depth).
+
+ **Key concepts:**
+ - Entropy (information gain)
+ - Gini impurity
+ """)
+
+ elif section == "Entropy vs Gini":
+     st.header("📊 Entropy vs Gini")
+     st.markdown(r"""
+ ### Entropy
+ Measures the randomness or disorder in the data.
+ $$
+ H(Y) = -\sum_i p_i \log_2 p_i
+ $$
+
+ ### Gini Impurity
+ Measures the probability of misclassifying a randomly chosen sample.
+ $$
+ Gini(Y) = 1 - \sum_i p_i^2
+ $$
+
+ **Which to use?**
+ - Gini is slightly faster to compute and is the default in scikit-learn.
+ - Entropy has a cleaner information-theoretic interpretation.
+ """)
+
+ elif section == "Tree Construction":
+     st.header("🔧 How is the Tree Built?")
+     st.markdown("""
+ The tree is built **top-down** with a greedy algorithm:
+
+ - At each node, the best feature and threshold are chosen using Gini or entropy.
+ - Splitting continues until a stopping criterion is met (e.g., max depth, pure leaf).
+
+ **Tip**: Too many splits = overfitting!
+ """)
+
+ elif section == "Classification vs Regression":
+     st.header("📈 Classification vs Regression")
+     st.markdown("""
+ - **Classification Tree**: predicts categories (Yes/No, Spam/Ham).
+ - **Regression Tree**: predicts continuous values (e.g., house price).
+
+ **Examples:**
+ - Classification: Will a customer churn?
+ - Regression: What will next month's sales be?
+ """)
+
+ elif section == "Pruning":
+     st.header("✂️ Pruning Techniques")
+     st.markdown("""
+ **Why prune?**
+ To avoid overfitting by cutting away branches that do not generalize.
+
+ ### Pre-Pruning
+ - `max_depth`: limit the depth of the tree
+ - `min_samples_split`: split a node only if it has enough samples
+ - `min_samples_leaf`: require a minimum number of samples per leaf
+
+ ### Post-Pruning
+ - Cost-complexity pruning (controlled by α, `ccp_alpha` in scikit-learn)
+ """)
+
+ elif section == "Feature Importance":
+     st.header("📌 Feature Importance")
+     st.markdown(r"""
+ Decision Trees score each feature by how much it reduces impurity across all the splits that use it.
+
+ **Formula:**
+ $$
+ Importance = \frac{Total\ Gain\ from\ Feature}{Total\ Gain\ from\ All\ Features}
+ $$
+
+ Useful for feature selection and for explaining model decisions.
+ """)
+
+ elif section == "Visualize Example Tree":
+     st.header("🌿 Visualize a Small Tree Example")
+     iris = load_iris()
+     clf = tree.DecisionTreeClassifier(max_depth=3)
+     clf = clf.fit(iris.data, iris.target)
+     dot_data = tree.export_graphviz(clf, out_file=None,
+                                     feature_names=iris.feature_names,
+                                     class_names=iris.target_names,
+                                     filled=True, rounded=True,
+                                     special_characters=True)
+     graph = graphviz.Source(dot_data)
+     st.graphviz_chart(dot_data)
+
+ elif section == "Try with Iris Data":
+     st.header("🌸 Try with Iris Dataset")
+     iris = load_iris()
+     df = pd.DataFrame(iris.data, columns=iris.feature_names)
+     df['target'] = iris.target
+     st.write("Here's a preview of the dataset:")
+     st.dataframe(df.head())
+
+     st.markdown("### Build and visualize a Decision Tree")
+     max_depth = st.slider("Select max depth of the tree:", 1, 5, 3)
+     clf = tree.DecisionTreeClassifier(max_depth=max_depth)
+     clf = clf.fit(iris.data, iris.target)
+     dot_data = tree.export_graphviz(clf, out_file=None,
+                                     feature_names=iris.feature_names,
+                                     class_names=iris.target_names,
+                                     filled=True, rounded=True)
+     st.graphviz_chart(dot_data)
+
+ st.markdown("---")
+ st.success("✅ Decision Trees are simple yet powerful! Tune them well, visualize their structure, and understand every split.")
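The two impurity formulas in the "Entropy vs Gini" section are easy to verify numerically. A minimal standalone sketch (not part of the app) using NumPy:

```python
import numpy as np

def entropy(labels):
    # H(Y) = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    # Gini(Y) = 1 - sum(p_i^2) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

mixed = [0, 0, 1, 1]       # evenly mixed: maximum disorder for two classes
print(entropy(mixed))      # → 1.0
print(gini(mixed))         # → 0.5
print(gini([1, 1, 1, 1]))  # → 0.0 (a pure node has zero impurity)
```

Both measures peak on the evenly mixed node and vanish on a pure one, which is why either works as a split criterion.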
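The greedy, top-down procedure described in "How It Works" and "Tree Construction" amounts to scanning every feature/threshold pair and keeping the split with the lowest weighted child impurity. A toy sketch of one such step (`best_split` and `gini` are illustrative helpers, not scikit-learn API):

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedy search: try every feature/threshold pair and keep the
    split with the lowest weighted Gini impurity of the two children."""
    best = (None, None, np.inf)  # (feature index, threshold, score)
    n = len(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            right = ~left
            if left.all() or right.all():
                continue  # degenerate split, skip
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Tiny toy dataset: feature 1 separates the classes perfectly.
X = np.array([[1.0, 0.0], [2.0, 0.0], [1.5, 1.0], [2.5, 1.0]])
y = np.array([0, 0, 1, 1])
f, t, s = best_split(X, y)
print(f, t, s)  # → 1 0.0 0.0 (split on feature 1 yields two pure children)
```

A real tree builder would now recurse on the left and right subsets until a stopping criterion is met, exactly as the step list describes.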
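The cost-complexity pruning listed under "Post-Pruning" is exposed in scikit-learn as the `ccp_alpha` parameter of `DecisionTreeClassifier`. A quick sketch of its effect on the same Iris data the app uses (the value 0.02 is an arbitrary choice for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned: the tree grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Post-pruned: branches whose impurity gain falls below alpha are cut away.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print(full.get_n_leaves(), pruned.get_n_leaves())  # the pruned tree has fewer leaves
```

In practice, `full.cost_complexity_pruning_path(X, y)` returns the candidate alpha values worth searching over with cross-validation.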
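The normalized impurity-reduction scores described in "Feature Importance" are available on any fitted tree as `feature_importances_`. A short sketch on the Iris setup:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# One score per feature; the scores are normalized to sum to 1.
for name, score in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```

On Iris the petal measurements dominate while the sepal features contribute little, which is exactly the kind of insight used for feature selection and model explanation.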