UmaKumpatla / DecisionTree / app.py (commit 5168235: "Update app.py")

	import streamlit as st
	import pandas as pd
	import matplotlib.pyplot as plt
	import seaborn as sns
	from sklearn.datasets import load_iris
	from sklearn.model_selection import train_test_split
	from sklearn.tree import DecisionTreeClassifier, plot_tree
	from sklearn.preprocessing import StandardScaler
	from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

	# Set up page
	st.set_page_config(page_title="Explore Decision Tree Algorithm", layout="wide")
	st.title("🌳 Decision Tree Classifier: Explained with Iris Dataset")

	# Intro Section
	st.markdown("""
	## 🧠 What is a Decision Tree?

	A Decision Tree is a machine learning algorithm that uses a tree-like structure to make decisions.
	Each internal node asks a question about a feature, each branch is the outcome of that question, and each leaf node gives us a final decision or prediction.

	> 🧩 Think of it like playing "20 Questions" to guess what something is — each question narrows down the possibilities.
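
To make the node/branch/leaf vocabulary concrete, here is a minimal sketch of a tree as a plain data structure. The `Node` class, its field names, and the thresholds 2.5 and 1.7 are hypothetical and hand-picked for illustration (they are not learned from the data). Predicting means answering one question per level until you reach a leaf:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal nodes hold a question: "is feature <= threshold?"
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # branch taken when the answer is "yes"
    right: Optional["Node"] = None  # branch taken when the answer is "no"
    # Leaf nodes hold the final prediction instead of a question.
    label: Optional[str] = None


def predict(node: Node, sample: dict) -> str:
    # Walk from the root, answering one question per level, until a leaf.
    while node.label is None:
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hand-built, illustrative tree (thresholds are hypothetical, not learned).
tree = Node(
    feature="petal length (cm)",
    threshold=2.5,
    left=Node(label="setosa"),
    right=Node(
        feature="petal width (cm)",
        threshold=1.7,
        left=Node(label="versicolor"),
        right=Node(label="virginica"),
    ),
)

print(predict(tree, {"petal length (cm)": 1.4, "petal width (cm)": 0.2}))  # setosa
print(predict(tree, {"petal length (cm)": 4.5, "petal width (cm)": 1.3}))  # versicolor
```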

	---

	## ⚙️ How Decision Trees Work

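	1. Start with all training samples at the root node.
	2. For each candidate feature and threshold, measure how much the split would reduce impurity (Gini or entropy), and keep the best one.
	3. Repeat step 2 recursively on each resulting subset until a stopping condition is met:
	    - the node is pure (all its samples belong to one class),
	    - the maximum depth is reached, or
	    - too few samples remain to split (see `min_samples_split` and `min_samples_leaf`).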
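
The steps above can be sketched as a short recursive function. This is a simplified pure-Python illustration, not scikit-learn's actual implementation; `build_tree`, `best_split`, and the dict-based node format are hypothetical names. It scores every feature/threshold pair by weighted Gini impurity, keeps the best one, and stops when a node is pure, `max_depth` is reached, or no split lowers the impurity:

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions (0 for a pure node).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    # Try every feature and every midpoint between consecutive distinct values.
    # Return (feature_index, threshold) with the lowest weighted Gini, or None
    # if no split is better than leaving the node as it is.
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions: pure node or depth limit reached -> leaf.
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no split lowers the impurity -> leaf
        return majority
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    # Recurse on each subset (step 3).
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    # Internal nodes are dicts; leaves are plain class labels.
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 2.0], [8.0, 3.0]]
y = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(X, y, max_depth=2)
print(tree)                       # {'feature': 0, 'threshold': 4.5, 'left': 'a', 'right': 'b'}
print(predict(tree, [1.5, 6.0]))  # a
```

Checking only midpoints between consecutive distinct values is enough: any threshold between the same two values produces exactly the same partition.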

	🔍 Criteria used to score candidate splits:
	- Gini impurity (`criterion="gini"`, scikit-learn's default)
	- Entropy, whose reduction is the information gain (`criterion="entropy"`)
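
Both criteria measure how mixed the classes in a node are (0 means the node is pure), and a split is good when it lowers that number. A minimal pure-Python sketch of the two formulas and of information gain; the function names are illustrative, not scikit-learn's:

```python
import math
from collections import Counter


def class_proportions(labels):
    # Fraction of samples in each class, p_k.
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    # Gini impurity: 1 - sum(p_k ** 2); 0 for a pure node.
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    # Shannon entropy in bits: -sum(p_k * log2(p_k)); 0 for a pure node.
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, left, right, impurity=entropy):
    # Parent impurity minus the size-weighted impurity of the two children.
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children


parent = ["setosa"] * 4 + ["versicolor"] * 4
left, right = ["setosa"] * 4, ["versicolor"] * 4  # a perfect split

print(gini(parent))                                 # 0.5
print(entropy(parent))                              # 1.0
print(information_gain(parent, left, right))        # 1.0
print(information_gain(parent, left, right, gini))  # 0.5
```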

	---

	### 📈 Pros and Cons

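	- ✅ Easy to understand & visualize
	- ✅ Works with numerical and categorical data in principle (scikit-learn's implementation expects categorical features to be encoded as numbers first)
	- ✅ No need for feature scaling
	- ⚠️ Can overfit if not controlled (use `max_depth`, `min_samples_leaf`, or cost-complexity pruning via `ccp_alpha`)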
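
Why is scaling unnecessary? A threshold split depends only on the *order* of a feature's values, and standardizing (subtract the mean, divide by a positive standard deviation) keeps that order. A small pure-Python check, assuming a Gini-based best-split search over one feature; the helper names and the six petal-length values are made up for illustration:

```python
import statistics
from collections import Counter


def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_partition(xs, ys):
    # Indices sent to the left child by the lowest-Gini threshold on one feature.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_left, best_score = None, float("inf")
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold can separate equal values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_left, best_score = set(order[:k]), score
    return best_left


# Made-up petal lengths (cm) and labels, for illustration only.
petal_length = [1.4, 1.3, 4.7, 4.5, 5.9, 6.1]
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

# Standardize: (x - mean) / std preserves the ordering of the values.
mu = statistics.mean(petal_length)
sigma = statistics.pstdev(petal_length)
scaled = [(x - mu) / sigma for x in petal_length]

raw_split = best_partition(petal_length, species)
scaled_split = best_partition(scaled, species)
print(raw_split, scaled_split)    # {0, 1} {0, 1}
print(raw_split == scaled_split)  # True
```

The same samples land on each side either way; only the threshold value itself moves into standardized units.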

	---
	""")

	# Dataset and DataFrame
	st.subheader("🌼 Let's Explore the Iris Dataset")
	iris = load_iris()
	df = pd.DataFrame(iris.data, columns=iris.feature_names)
	df["target"] = iris.target
	df["species"] = df["target"].apply(lambda x: iris.target_names[x])

	st.markdown("Here's a peek at the dataset 👇")
	st.dataframe(df.head(), use_container_width=True)

	# Feature distribution visualization
	st.markdown("### 📊 Visualize Features")
	selected_features = st.multiselect("Pick two features to visualize", iris.feature_names, default=iris.feature_names[:2])
	if len(selected_features) == 2:
	    # Scatter plot of the two chosen features, colored by species
	    fig, ax = plt.subplots(figsize=(8, 5))
	    sns.scatterplot(data=df, x=selected_features[0], y=selected_features[1], hue="species", palette="Set2", s=80, ax=ax)
	    st.pyplot(fig)
	    plt.close(fig)
	else:
	    st.info("Select exactly two features to draw a scatter plot.")

	# Sidebar controls
	st.sidebar.header("🌲 Model Settings")
	criterion = st.sidebar.radio("Splitting Criterion", ["gini", "entropy"])
	max_depth = st.sidebar.slider("Max Depth", 1, 10, value=3)

	# Prepare data
	X = df[iris.feature_names]
	y = df["target"]

	# Trees split on per-feature thresholds, so scaling is unnecessary; keeping the raw
	# values (cm) also keeps the thresholds in the tree plot below readable.
	X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

	# Train model
	model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
	model.fit(X_train, y_train)
	y_pred = model.predict(X_test)

	# Model performance
	acc = accuracy_score(y_test, y_pred)
	st.success(f"✅ Model Accuracy: {acc*100:.2f}%")

	# Classification report
	st.markdown("### 🧾 Classification Report")
	st.text(classification_report(y_test, y_pred, target_names=iris.target_names))

	# Confusion matrix
	st.markdown("### 🔍 Confusion Matrix")
	cm = confusion_matrix(y_test, y_pred)
	fig, ax = plt.subplots()
	sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=iris.target_names, yticklabels=iris.target_names, ax=ax)
	ax.set_xlabel("Predicted")
	ax.set_ylabel("Actual")
	st.pyplot(fig)

	# Decision tree plot
	st.markdown("### 🌳 Visualizing the Tree Structure")
	fig, ax = plt.subplots(figsize=(12, 6))
	plot_tree(model, filled=True, feature_names=iris.feature_names, class_names=list(iris.target_names), fontsize=10, ax=ax)
	st.pyplot(fig)

	# Final tips
	st.markdown("""
	---
	## 💡 Key Takeaways

	- Decision Trees are great for interpretable models.
	- They require little to no preprocessing.
	- They're prone to overfitting, especially on small datasets; limit growth with settings like `max_depth` or `min_samples_leaf`, or prune the grown tree with `ccp_alpha`.
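
The depth and pruning advice maps onto scikit-learn's `DecisionTreeClassifier` parameters: `max_depth` stops growth early, and `ccp_alpha` applies minimal cost-complexity pruning after growth. A minimal sketch comparing tree sizes on Iris; picking the middle value of the pruning path as `ccp_alpha` is an arbitrary assumption for illustration (in practice you would tune it, e.g. with cross-validation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1) Unrestricted tree: grows until leaves are pure (or cannot be split further).
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# 2) Pre-pruning: cap the depth while growing.
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# 3) Post-pruning: pick an alpha from the cost-complexity pruning path.
path = full.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary middle value
pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)

for name, tree in [("full", full), ("max_depth=3", shallow), ("ccp_alpha", pruned)]:
    print(f"{name:12} depth={tree.get_depth()} leaves={tree.get_n_leaves()} "
          f"test acc={tree.score(X_test, y_test):.2f}")
```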

	## 📌 When to Use a Decision Tree?
	- When interpretability matters
	- When data includes both numerical and categorical variables
	- When you want to quickly prototype and understand your data

	> 🎯 Tip: Combine multiple trees in an ensemble (like Random Forest or Gradient Boosting) for better performance!
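
A minimal sketch of that tip using scikit-learn's `RandomForestClassifier` (an ensemble of decision trees trained on bootstrap samples), scored with 5-fold cross-validation next to a single tree. No scores are quoted because they depend on the folds and the library version; `n_estimators=100`, `max_depth=3`, and `cv=5` are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    # Each of the 100 trees sees a bootstrap sample and a random feature subset per split.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    # Averaging accuracy over 5 train/test folds is more stable than one split.
    scores[name] = cross_val_score(model, X, y, cv=5)
    print(f"{name:14} mean accuracy = {scores[name].mean():.3f}")
```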

	---
	""")