Spaces:

Harika22
/

Machine_learning

Sleeping

App Files Files Community

Machine_learning / pages /10_Decision_Tree.py

Harika22

Update pages/10_Decision_Tree.py

c70fab9 verified 7 months ago

raw

history blame contribute delete

5.28 kB

	import streamlit as st
	from sklearn import tree
	from sklearn.datasets import load_iris
	from sklearn.tree import export_graphviz
	import graphviz
	import pandas as pd

	st.set_page_config(page_title="Decision Tree Explorer", page_icon="🌳", layout="wide")

	st.title("🌳 Decision Tree Algorithm Explorer")
	st.write("Understand how Decision Trees work with simple explanations, visuals, and real-world examples.")

	section = st.radio("Choose a topic to explore:", [
	"What is a Decision Tree?",
	"How It Works",
	"Entropy vs Gini",
	"Tree Construction",
	"Classification vs Regression",
	"Pruning",
	"Feature Importance",
	"Visualize Example Tree",
	"Try with Iris Data"
	])


	if section == "What is a Decision Tree?":
	st.header("📘 What is a Decision Tree?")
	st.markdown("""
	A Decision Tree is a flowchart-like model that makes decisions based on a series of questions.

	- 🎯 Used in both classification (e.g., spam vs. not spam) and regression (e.g., predicting price).
	- 🌱 It starts at a root, asks a question, and branches out based on answers.
	- 🔚 Ends at a leaf node — which is the prediction.

	Real-life example:
	You're deciding what to wear. You ask:
	1. Is it raining?
	2. Is it cold?
	→ Based on your answers, you decide: jacket, umbrella, or just a T-shirt.
	""")

	elif section == "How It Works":
	st.header("⚙️ How Does It Work?")
	st.markdown("""
	Step-by-step:
	1. Start with the whole dataset.
	2. Choose the feature that best splits the data.
	3. Split the dataset.
	4. Repeat until you reach a stopping condition.

	Used concepts:
	- Entropy (information gain)
	- Gini impurity
	""")

	elif section == "Entropy vs Gini":
	st.header("📊 Entropy vs Gini")
	st.markdown("""
	### Entropy
	Measures randomness or disorder in data.
	$$
	H(Y) = - \sum p_i \log_2 p_i
	$$

	### Gini Impurity
	Measures the probability of wrong classification.
	$$
	Gini(Y) = 1 - \sum p_i^2
	$$

	Which to use?
	- Gini is faster → default in scikit-learn.
	- Entropy gives more information-theoretic understanding.
	""")

	elif section == "Tree Construction":
	st.header("🔧 How is the Tree Built?")
	st.markdown("""
	The tree is built top-down using a greedy algorithm:

	- Best feature is chosen using Gini or Entropy.
	- Splits continue until stopping criteria (e.g., max depth, pure leaf).

	Tip: Too many splits = overfitting!
	""")

	elif section == "Classification vs Regression":
	st.header("📈 Classification vs Regression")
	st.markdown("""
	- Classification Tree: Predicts categories (Yes/No, Spam/Ham).
	- Regression Tree: Predicts continuous values (e.g., house price).

	Example:
	- Classification: Will a customer churn?
	- Regression: What will be the next month’s sales?
	""")

	elif section == "Pruning":
	st.header("✂️ Pruning Techniques")
	st.markdown("""
	Why prune?
	To avoid overfitting by cutting unnecessary branches.

	### Pre-Pruning
	- `max_depth`: limit depth
	- `min_samples_split`: split only if enough samples
	- `min_samples_leaf`: limit how small leaves can be

	### Post-Pruning
	- Cost Complexity Pruning (using α)
	""")

	elif section == "Feature Importance":
	st.header("📌 Feature Importance")
	st.markdown("""
	Decision Trees calculate how important each feature is by how much it reduces impurity.

	Formula:
	$$
	Importance = \frac{Total\ Gain\ from\ Feature}{Total\ Gain\ from\ All\ Features}
	$$

	Useful for feature selection and explaining model decisions.
	""")

	elif section == "Visualize Example Tree":
	st.header("🌿 Visualize a Small Tree Example")
	iris = load_iris()
	clf = tree.DecisionTreeClassifier(max_depth=3)
	clf = clf.fit(iris.data, iris.target)
	dot_data = tree.export_graphviz(clf, out_file=None,
	feature_names=iris.feature_names,
	class_names=iris.target_names,
	filled=True, rounded=True,
	special_characters=True)
	st.graphviz_chart(dot_data)


	elif section == "Try with Iris Data":
	st.header("🌸 Try with Iris Dataset")
	iris = load_iris()
	df = pd.DataFrame(iris.data, columns=iris.feature_names)
	df['target'] = iris.target
	st.write("Here's a preview of the dataset:")
	st.dataframe(df.head())

	st.markdown("### Build and visualize a Decision Tree")
	max_depth = st.slider("Select max depth of the tree:", 1, 5, 3)
	clf = tree.DecisionTreeClassifier(max_depth=max_depth)
	clf = clf.fit(iris.data, iris.target)
	dot_data = tree.export_graphviz(clf, out_file=None,
	feature_names=iris.feature_names,
	class_names=iris.target_names,
	filled=True, rounded=True)
	st.graphviz_chart(dot_data)

	st.markdown("---")
	st.success("✅ Decision Trees are simple yet powerful! Tune them well, visualize their structure, and understand every split.")