import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
# Set up page
st.set_page_config(page_title="Explore Decision Tree Algorithm", layout="wide")
st.title("🌳 Decision Tree Classifier: Explained with Iris Dataset")
# Intro Section
st.markdown("""
## 🧠 What is a Decision Tree?
A **Decision Tree** is a machine learning algorithm that uses a tree-like structure to make decisions.
Each **internal node** asks a question about a feature, each **branch** is the outcome of that question, and each **leaf node** gives us a final decision or prediction.

> 🧩 Think of it like playing "20 Questions" to guess what something is: each question narrows down the possibilities.
---
## ⚙️ How Decision Trees Work
1. Start with all data at the root.
2. Pick the **best feature** to split the data (using Gini or Entropy).
3. Repeat this process for every split until:
   - All points are classified
   - Or the **maximum depth** is reached

📌 Criteria used to choose the best feature:
- **Gini Index** (default)
- **Entropy** (Information Gain)
---
### 📊 Pros and Cons
- ✅ Easy to understand & visualize
- ✅ Handles numerical and categorical data
- ✅ No need for feature scaling
- ⚠️ Can overfit if not controlled (use `max_depth`, `min_samples_leaf`, or pruning)
---
""")
# Dataset and DataFrame
st.subheader("🌼 Let's Explore the Iris Dataset")
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target
df["species"] = df["target"].apply(lambda x: iris.target_names[x])
st.markdown("Here's a peek at the dataset π")
st.dataframe(df.head(), use_container_width=True)
# Feature distribution visualization
st.markdown("### π Visualize Features")
selected_features = st.multiselect("Pick features to visualize", iris.feature_names, default=iris.feature_names[:2])
if len(selected_features) == 2:
    plt.figure(figsize=(8, 5))
    sns.scatterplot(data=df, x=selected_features[0], y=selected_features[1], hue="species", palette="Set2", s=80)
    st.pyplot(plt.gcf())
    plt.clf()
# Sidebar controls
st.sidebar.header("🌲 Model Settings")
criterion = st.sidebar.radio("Splitting Criterion", ["gini", "entropy"])
max_depth = st.sidebar.slider("Max Depth", 1, 10, value=3)
# Prepare data
X = df[iris.feature_names]
y = df["target"]
# Scale the features (not required for decision trees, but kept to mirror a
# typical preprocessing pipeline for scale-sensitive models)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Train model
model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Model performance
acc = accuracy_score(y_test, y_pred)
st.success(f"β
Model Accuracy: {acc*100:.2f}%")
# Classification report
st.markdown("### π§Ύ Classification Report")
st.text(classification_report(y_test, y_pred, target_names=iris.target_names))
# Confusion matrix
st.markdown("### π Confusion Matrix")
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots()
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=iris.target_names, yticklabels=iris.target_names, ax=ax)
plt.xlabel("Predicted")
plt.ylabel("Actual")
st.pyplot(fig)
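# Side note (illustrative, independent of the app's variables): overall
# accuracy can be read off a confusion matrix as trace / total, since the
# diagonal holds the correctly classified counts. Example with made-up counts:
import numpy as np
_example_cm = np.array([[10, 0, 0],
                        [0, 9, 1],
                        [0, 0, 10]])
_example_acc = np.trace(_example_cm) / _example_cm.sum()  # 29 correct out of 30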
# Decision tree plot (note: the split thresholds shown are in standardized
# units, because the model was trained on scaled features)
st.markdown("### 🌳 Visualizing the Tree Structure")
fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(model, filled=True, feature_names=iris.feature_names, class_names=iris.target_names, fontsize=10, ax=ax)
st.pyplot(fig)
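# Optional companion (an illustrative sketch, not part of the original app):
# scikit-learn can also render a fitted tree as plain text via export_text,
# which is handy outside of a plotting context. A self-contained demo on a
# fresh depth-2 tree, kept separate from the app's model:
from sklearn.datasets import load_iris as _demo_load_iris
from sklearn.tree import DecisionTreeClassifier as _DemoTree, export_text
_demo_X, _demo_y = _demo_load_iris(return_X_y=True)
_demo_model = _DemoTree(max_depth=2, random_state=0).fit(_demo_X, _demo_y)
_demo_rules = export_text(_demo_model, feature_names=_demo_load_iris().feature_names)
# _demo_rules now holds an indented "|---"-style rendering of the split rules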
# Final tips
st.markdown("""
---
## π‘ Key Takeaways
- Decision Trees are great for **interpretable models**.
- They require **little to no preprocessing**.
- They're **prone to overfitting**, especially on small datasets β use settings like `max_depth` or pruning techniques.
## π When to Use a Decision Tree?
- When interpretability matters
- When data includes both **numerical and categorical** variables
- When you want to **quickly prototype** and understand your data
> π― *Tip:* Combine multiple trees in an ensemble (like **Random Forest** or **Gradient Boosting**) for better performance!
---
""")