DOMMETI committed (verified)
Commit d4c475e · 1 Parent(s): 9a3b5ab

Create 11_Dession_Tree.py

Files changed (1):
  1. pages/11_Dession_Tree.py +194 -0

pages/11_Dession_Tree.py ADDED
@@ -0,0 +1,194 @@
import streamlit as st

# Page configuration
st.set_page_config(page_title="Decision Tree", page_icon="🌳", layout="wide")

# Custom dark theme and styling
st.markdown("""
<style>
.stApp {
    background-color: #1e1e1e;
    color: white;
}
h1, h2, h3 {
    color: #FF4C60;
}
.sidebar .sidebar-content {
    background-color: #1e1e1e;
}
a {
    color: #58a6ff;
    text-decoration: none;
}
a:hover {
    color: #1f78d1;
}
</style>
""", unsafe_allow_html=True)

# Sidebar
st.sidebar.title("🌳 Decision Tree")
st.sidebar.markdown("Learn all about Decision Trees with intuitive sections.")
st.sidebar.markdown("---")

# Main Title
st.markdown("<h1 style='text-align: center;'>🌳 Decision Tree Algorithm (Theory)</h1>", unsafe_allow_html=True)

# What is a Decision Tree?
with st.expander("📘 What is a Decision Tree?"):
    st.write("""
A **Decision Tree** is a supervised machine learning algorithm used for **classification** and **regression**.
It models decisions as a tree structure:

- 🟢 **Root Node**: represents the entire dataset
- 🔵 **Internal Nodes**: feature-based decision points
- 🟣 **Leaf Nodes**: final outputs/predictions

The tree splits the data with **if-else** tests, using the best feature at each level.
""")

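The if-else splitting logic described above can be sketched as a hand-written tree. The features and thresholds below are illustrative, not learned from data:

```python
# A hand-coded decision tree: each internal node is an if/else test on one
# feature; each return is a leaf prediction. Thresholds are made up.
def predict_flower(petal_length_cm, petal_width_cm):
    if petal_length_cm < 2.5:          # root node test
        return "setosa"                # leaf
    elif petal_width_cm < 1.8:         # internal node test
        return "versicolor"            # leaf
    else:
        return "virginica"             # leaf

print(predict_flower(1.4, 0.2))  # setosa
```

A learned tree has exactly this shape; training just chooses which feature and threshold to test at each node.
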
# Entropy
with st.expander("🧮 Entropy - Measuring Uncertainty"):
    st.write(r"""
**Entropy** measures the impurity or disorder in the data.
Decision Trees use it to decide the best split.

**Formula:**

$$
H(Y) = - \sum_{i=1}^{n} p_i \log_2(p_i)
$$

Where:
- $p_i$ = probability of class $i$

**Example**:
For a dataset with two classes (Yes = 0.5, No = 0.5):

$$
H(Y) = - (0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1
$$

✅ Maximum entropy for two classes = 1 → complete randomness.
""")

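The entropy formula above translates directly into a few lines of Python; this minimal sketch reproduces the worked two-class example:

```python
import math

def entropy(probabilities):
    # H(Y) = -sum(p_i * log2(p_i)); terms with p_i == 0 contribute nothing
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0  (maximum entropy for two classes)
print(entropy([1.0]))       # 0.0  (a pure node)
```
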
# Gini Impurity
with st.expander("⚖️ Gini Impurity - Measuring Purity"):
    st.write(r"""
**Gini Impurity** is another metric for evaluating split quality.

**Formula:**

$$
Gini(Y) = 1 - \sum_{i=1}^{n} p_i^2
$$

Where:
- $p_i$ = probability of class $i$

**Example**:
For two classes (Yes = 0.5, No = 0.5):

$$
Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5
$$

✅ A Gini of 0.5 means an equal two-class distribution (maximally impure).
""")

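The Gini formula is even simpler to compute; this sketch checks the worked example:

```python
def gini(probabilities):
    # Gini(Y) = 1 - sum(p_i^2)
    return 1 - sum(p * p for p in probabilities)

print(gini([0.5, 0.5]))  # 0.5  (maximally impure two-class node)
print(gini([1.0, 0.0]))  # 0.0  (pure node)
```
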
# Construction
with st.expander("🔧 Tree Construction Process"):
    st.write("""
The tree is built **top-down**, at each step selecting the feature (and threshold) that reduces impurity the most.
Splitting stops when:
- Impurity = 0 (the node is pure)
- The maximum depth is reached
- No further splits are possible

Each decision creates **branches**, until the final predictions sit in the **leaf nodes**.
""")

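The selection step of the top-down process can be sketched as follows: try every candidate threshold on one feature and keep the split with the lowest size-weighted child impurity. The 1-D dataset is a toy example with made-up values:

```python
def gini_labels(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    # Try a threshold between each pair of sorted values; keep the one that
    # minimizes the size-weighted impurity of the two children.
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        weighted = (len(left) * gini_labels(left)
                    + len(right) * gini_labels(right)) / len(ys)
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best

xs = [1.0, 1.5, 3.0, 3.5]     # toy feature values
ys = ["A", "A", "B", "B"]     # toy labels
print(best_split(xs, ys))     # (3.0, 0.0): a perfect split
```

A real implementation repeats this over every feature and recurses into each child until one of the stopping conditions above is met.
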
# Iris Example
with st.expander("🌸 Example: Iris Dataset Tree"):
    st.write("""
A Decision Tree for the Iris dataset classifies flowers into:
- Setosa
- Versicolor
- Virginica

based on petal/sepal length and width.

🧠 Each node checks one feature against a threshold and sends the sample left or right.
""")

# Classification
with st.expander("🧪 Classification: Training & Testing"):
    st.write("""
**Training Phase:**
- Learn split rules from the training data using Entropy or Gini

**Testing Phase:**
- Follow the decision path determined by the sample's feature values
- Reach a leaf node holding the predicted class

Example: predicting the Iris species from petal width.
""")

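The training/testing workflow above, sketched with scikit-learn on the Iris dataset; the split ratio, criterion, and random seed are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Training phase: learn entropy-based split rules from the training data
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

# Testing phase: each sample follows its decision path to a leaf
print(clf.score(X_test, y_test))  # accuracy on the held-out set
```
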
# Regression
with st.expander("📈 Regression: Training & Testing"):
    st.write("""
**Training Phase:**
- Build the tree using splits that minimize **Mean Squared Error (MSE)**

**Testing Phase:**
- The prediction is the average of the training outputs in the reached leaf node

Example: predicting house prices from square footage, etc.
""")

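The regression leaf rule (predict the mean of the training targets that reach the leaf) in miniature; the prices are made-up numbers:

```python
def leaf_prediction(targets):
    # A regression leaf predicts the mean of the training targets it holds
    return sum(targets) / len(targets)

def leaf_mse(targets):
    # Mean squared error of the targets around the leaf's own prediction;
    # splits are chosen to reduce this quantity.
    mean = leaf_prediction(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)

prices = [200_000, 210_000, 190_000]   # house prices in one leaf (made up)
print(leaf_prediction(prices))         # 200000.0
print(leaf_mse(prices))
```
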
# Pre-Pruning
with st.expander("✂️ Pre-Pruning Techniques"):
    st.write("""
Limit the tree's growth while it is being built, to prevent overfitting:

- `max_depth`: limits the depth of the tree
- `min_samples_split`: minimum samples required to split a node
- `min_samples_leaf`: minimum samples required in a leaf node
- `max_features`: limits the features considered per split
""")

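The parameter names above are scikit-learn's; a minimal sketch with arbitrary values shows them in use:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: cap growth while the tree is being built
clf = DecisionTreeClassifier(
    max_depth=3,            # limit depth of the tree
    min_samples_split=10,   # need at least 10 samples to split a node
    min_samples_leaf=5,     # every leaf keeps at least 5 samples
    random_state=0,
).fit(X, y)

print(clf.get_depth())  # <= 3 by construction
```
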
# Post-Pruning
with st.expander("🔙 Post-Pruning Techniques"):
    st.write("""
Grow the tree fully, then prune away weak branches.

Techniques:
- **Cost Complexity Pruning**, controlled by α (alpha)
- **Validation-based pruning**: use a validation set to remove branches that don't help
""")

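Cost complexity pruning, sketched with scikit-learn's `ccp_alpha` parameter; the particular alpha picked here (the second largest on the pruning path) is an arbitrary choice for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow the full tree, then inspect its cost-complexity pruning path
full = DecisionTreeClassifier(random_state=0).fit(X, y)
path = full.cost_complexity_pruning_path(X, y)

# Refit with a nonzero alpha: larger alpha means more aggressive pruning
pruned = DecisionTreeClassifier(
    ccp_alpha=path.ccp_alphas[-2], random_state=0
).fit(X, y)

print(pruned.get_n_leaves(), "<=", full.get_n_leaves())
```

In practice one would pick alpha by cross-validation or a validation set, matching the validation-based pruning idea above.
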
# Feature Selection
with st.expander("📊 Feature Selection using Decision Tree"):
    st.write(r"""
Decision Trees rank features by their **information gain** (impurity reduction).

**Feature Importance Formula:**

$$
Importance(f) = \frac{\text{Total reduction in impurity from } f}{\text{Total reduction in impurity from all features}}
$$

A higher score means more impact on the model's decisions.
""")

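Because the formula is normalized, the importances sum to 1. A sketch using scikit-learn's `feature_importances_` attribute on Iris:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Normalized impurity reductions; they sum to 1 across all features
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```
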
# Colab Link
st.markdown("---")
st.markdown("### 📓 Try It Yourself: Open the Colab Notebook")
st.markdown("""
<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank'>
🔗 Open Decision Tree Notebook in Colab
</a>
""", unsafe_allow_html=True)

# Final note
st.success("Decision Trees are interpretable, powerful, and great for both classification and regression. Keep exploring!")