Ramyamaheswari committed
Commit 2ab91a3 · verified · 1 Parent(s): 1a25497

Update app.py

Files changed (1):
  1. app.py +153 -80

app.py CHANGED
@@ -1,87 +1,160 @@
  import streamlit as st
- import pandas as pd
- from sklearn.datasets import load_iris
- from sklearn.model_selection import train_test_split
- from sklearn.tree import DecisionTreeClassifier, plot_tree
- from sklearn.preprocessing import StandardScaler
- from sklearn.metrics import classification_report, accuracy_score
- import matplotlib.pyplot as plt
-
- st.set_page_config(page_title="Explore Decision Tree Algorithm", layout="wide")
- st.title("🌳 Decision Tree Classifier Demystified")

  st.markdown("""
- ## 🧠 What is a Decision Tree?
-
- A Decision Tree is a flowchart-like tree structure where each internal node represents a test on a feature,
- each branch represents an outcome of that test, and each leaf node represents a class label.
-
- Think of it like *20 Questions*, but for data.
-
- ---
- ## ⚙️ How Decision Trees Work
-
- 1. Split the dataset based on feature values.
- 2. Choose the best feature using criteria like **Gini Index**, **Entropy**, or **Information Gain**.
- 3. Repeat recursively until leaf nodes are pure or max depth is reached.
-
- Decision Trees are:
- - Easy to understand and interpret
- - Able to handle both numerical and categorical data
- - Prone to overfitting if not pruned
-
- ---
  """)
-
- st.subheader("🌼 Try Decision Tree on the Iris Dataset")
-
- iris = load_iris()
- df = pd.DataFrame(iris.data, columns=iris.feature_names)
- df['target'] = iris.target
-
- st.dataframe(df.head(), use_container_width=True)
-
- criterion = st.radio("Select the Splitting Criterion", ["gini", "entropy"])
- max_depth = st.slider("Select Max Depth of Tree", 1, 10, value=3)
-
- X = df.drop('target', axis=1)
- y = df['target']
-
- # Standardize features
- scaler = StandardScaler()
- X_scaled = scaler.fit_transform(X)
-
- X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
-
- model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
- model.fit(X_train, y_train)
- y_pred = model.predict(X_test)
-
- acc = accuracy_score(y_test, y_pred)
- st.success(f"✅ Model Accuracy: {acc*100:.2f}%")
-
- st.markdown("### 📊 Classification Report")
- st.text(classification_report(y_test, y_pred, target_names=iris.target_names))
-
- st.markdown("### 🌳 Visualizing the Decision Tree")
- fig, ax = plt.subplots(figsize=(10, 6))
- plot_tree(model, filled=True, feature_names=iris.feature_names, class_names=iris.target_names, fontsize=10)
- st.pyplot(fig)
-
  st.markdown("""
- ---
- ## 💡 Highlights of Decision Trees:
- - Visual and easy to explain.
- - No need for feature scaling.
- - Can model non-linear relationships.
- - Can easily overfit — use pruning or set max depth.
-
- ## 🔧 When to Use Decision Trees?
- Use them when:
- - You need a quick, explainable model.
- - Feature relationships are non-linear.
- - Interpretability is more important than performance.

- ---
- 🎯 *Tip:* Watch out for overfitting. Decision Trees love to memorize the training data if left unchecked.
- """)
  import streamlit as st

+ # Set page configuration
+ st.set_page_config(page_title="Decision Tree Theory", layout="wide")

+ # Custom CSS styling
  st.markdown("""
+ <style>
+ .stApp {
+     background-color: #4A90E2;
+ }
+ h1, h2, h3 {
+     color: #003366;
+ }
+ .custom-font, p {
+     font-family: 'Arial', sans-serif;
+     font-size: 18px;
+     color: white;
+     line-height: 1.6;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # Title
+ st.markdown("<h1 style='color: #003366;'>Understanding Decision Trees</h1>", unsafe_allow_html=True)
+
+ # Introduction
+ st.markdown("""
+ A **Decision Tree** is a versatile supervised learning algorithm used for both **classification** and **regression** tasks. It mimics human decision-making by using a tree-like model of decisions and their possible consequences.
+
+ The basic structure includes:
+ - **Root Node**: Represents the complete dataset.
+ - **Internal Nodes**: Represent conditions on features.
+ - **Leaf Nodes**: Represent outcomes or predictions.
+
+ Think of it as a flowchart where each internal node asks a question, and each branch represents the outcome, eventually leading to a final decision.
+ """, unsafe_allow_html=True)
+
+ # Entropy
+ st.markdown("<h2 style='color: #003366;'>Entropy: Quantifying Uncertainty</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Entropy** measures the amount of randomness or disorder in the data. It’s commonly used in classification problems to decide how informative a feature is.
+
+ Entropy formula:
  """)
+ st.image("entropy-formula-2.jpg", width=300)
  st.markdown("""
+ Where:
+ - $p(i)$ is the probability of class $i$.
+
+ **Example**:
+ - If $P(Yes) = 0.5$ and $P(No) = 0.5$,
+
+ then:
+ $$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$
+
+ This indicates maximum uncertainty (perfectly balanced classes).
+ """, unsafe_allow_html=True)
+
+ # Gini Impurity
+ st.markdown("<h2 style='color: #003366;'>Gini Impurity: Measuring Impurity</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Gini Impurity** is another popular impurity measure. It calculates how often a randomly chosen element would be incorrectly labeled if labels were assigned at random according to the class distribution.
+
+ Formula:
+ """)
+ st.image("gini.png", width=300)
+ st.markdown("""
+ **Example**:
+ - $P(Yes) = 0.5$, $P(No) = 0.5$
+
+ Then:
+ $$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$
+
+ A lower Gini value means purer splits.
+ """, unsafe_allow_html=True)
+
+ # Tree Construction
+ st.markdown("<h2 style='color: #003366;'>Building the Decision Tree</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ Decision Trees are built **top-down**, starting from the root. At each node, the algorithm selects the feature that best splits the data using metrics like **Entropy** or **Gini**.
+
+ Splitting stops when:
+ - The data is pure (contains one class), or
+ - A stopping condition is met (like maximum depth).
+ """, unsafe_allow_html=True)
+
+ # Iris Tree Visualization
+ st.markdown("<h2 style='color: #003366;'>Visualizing: Iris Dataset Tree</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ Here's an example decision tree trained on the famous **Iris dataset**, which classifies flower species based on petal and sepal measurements.
+ """, unsafe_allow_html=True)
+ st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)

+ # Training & Testing - Classification
+ st.markdown("<h2 style='color: #003366;'>Training & Testing (Classification)</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Training**:
+ - Select features and split based on impurity reduction.
+ - Recursively grow the tree until stopping criteria are met.
+
+ **Testing**:
+ - Traverse the tree with new data.
+ - Follow the decision rules until you reach a leaf node (the prediction).
+
+ 💡 *Example: For Iris, classify the flower as Setosa, Versicolor, or Virginica based on petal dimensions.*
+ """, unsafe_allow_html=True)
+
+ # Training & Testing - Regression
+ st.markdown("<h2 style='color: #003366;'>Training & Testing (Regression)</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Training**:
+ - Split the data to minimize **Mean Squared Error (MSE)**.
+
+ **Testing**:
+ - Output the mean target value of the corresponding leaf.
+
+ 💡 *Example: Predict house price using features like size, location, and number of rooms.*
+ """, unsafe_allow_html=True)
+
+ # Pre-Pruning
+ st.markdown("<h2 style='color: #003366;'>Pre-Pruning: Control Overfitting Early</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ Pre-pruning stops the tree from growing too deep and complex. Common techniques include:
+
+ - **Max Depth**
+ - **Min Samples Split**
+ - **Min Samples Leaf**
+ - **Max Features**
+
+ These help the model generalize better and reduce noise.
+ """, unsafe_allow_html=True)
+
+ # Post-Pruning
+ st.markdown("<h2 style='color: #003366;'>Post-Pruning: Simplify After Growth</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ In **post-pruning**, we allow the tree to grow fully, then trim unnecessary branches:
+
+ - **Cost Complexity Pruning**
+ - **Validation-based Pruning**
+
+ This helps reduce overfitting and improves model simplicity.
+ """, unsafe_allow_html=True)
+
+ # Feature Importance
+ st.markdown("<h2 style='color: #003366;'>Feature Selection with Decision Trees</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ Decision Trees provide insight into which features are most important based on how often and how effectively they split data.
+ """)
+ st.image("feature.png", width=500)
+ st.markdown("""
+ 💡 *Higher importance → More influential in decision making.*
+ """, unsafe_allow_html=True)
+
+ # Notebook Link
+ st.markdown("<h2 style='color: #003366;'>Explore Hands-On Implementation</h2>", unsafe_allow_html=True)
+ st.markdown(
+     "<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #003366;'>🔗 Open Jupyter Notebook on Google Colab</a>",
+     unsafe_allow_html=True
+ )