import streamlit as st


st.set_page_config(page_title="Decision Tree Theory", layout="wide")

| st.markdown(""" |
| <style> |
| .stApp { |
| background-color: #4A90E2; |
| } |
| h1, h2, h3 { |
| color: #003366; |
| } |
| .custom-font, p { |
| font-family: 'Arial', sans-serif; |
| font-size: 18px; |
| color: white; |
| line-height: 1.6; |
| } |
| </style> |
| """, unsafe_allow_html=True) |
|
|
st.markdown("<h1 style='color: #003366;'>Understanding Decision Trees</h1>", unsafe_allow_html=True)
|
|
| |
| st.markdown(""" |
| A **Decision Tree** is a versatile supervised learning algorithm used for both **classification** and **regression** tasks. It mimics human decision-making by using a tree-like model of decisions and their possible consequences. |
| |
| The basic structure includes: |
| - **Root Node**: Represents the complete dataset. |
| - **Internal Nodes**: Represent conditions on features. |
| - **Leaf Nodes**: Represent outcomes or predictions. |
| |
| Think of it as a flowchart where each internal node asks a question, and each branch represents the outcome, eventually leading to a final decision. |
| """, unsafe_allow_html=True) |
|
|
st.markdown("<h2 style='color: #003366;'>Entropy: Quantifying Uncertainty</h2>", unsafe_allow_html=True)
st.markdown("""
**Entropy** measures the amount of randomness or disorder in the data. It's commonly used in classification problems to decide how informative a feature is.

Entropy formula:
""")
st.image("entropy-formula-2.jpg", width=300)
st.markdown(r"""
Where:
- $p(i)$ is the probability of class $i$.

**Example**:
- If $P(Yes) = 0.5$ and $P(No) = 0.5$,

Then:
$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$

This indicates maximum uncertainty (perfectly balanced classes).
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Gini Impurity: Measuring Impurity</h2>", unsafe_allow_html=True)
st.markdown("""
**Gini Impurity** is another popular impurity measure. It calculates how often a randomly chosen element would be labeled incorrectly if it were labeled at random according to the class distribution.

Formula:
""")
st.image("gini.png", width=300)
st.markdown(r"""
**Example**:
- $P(Yes) = 0.5$, $P(No) = 0.5$

Then:
$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$

A lower Gini value means purer splits.
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Building the Decision Tree</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees are built **top-down**, starting from the root. At each node, the algorithm selects the feature and threshold that best split the data, using metrics like **Entropy** or **Gini Impurity**.

Splitting stops when:
- The data is pure (contains a single class), or
- A stopping condition is met (such as maximum depth).
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Visualizing: Iris Dataset Tree</h2>", unsafe_allow_html=True)
st.markdown("""
Here's an example decision tree trained on the famous **Iris dataset**, which classifies flower species based on petal and sepal measurements.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)
|
|
st.markdown("<h2 style='color: #003366;'>Training & Testing (Classification)</h2>", unsafe_allow_html=True)
st.markdown("""
**Training**:
- Select features and split based on impurity reduction.
- Recursively grow the tree until stopping criteria are met.

**Testing**:
- Traverse the tree with new data.
- Follow the decision rules until you reach a leaf node (prediction).

💡 *Example: For Iris, classify the flower as Setosa, Versicolor, or Virginica based on petal dimensions.*
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Training & Testing (Regression)</h2>", unsafe_allow_html=True)
st.markdown("""
**Training**:
- Split the data to minimize **Mean Squared Error (MSE)**.

**Testing**:
- Output the mean target value of the corresponding leaf.

💡 *Example: Predict house price using features like size, location, and number of rooms.*
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Pre-Pruning: Control Overfitting Early</h2>", unsafe_allow_html=True)
st.markdown("""
Pre-pruning stops the tree from growing too deep and complex. Common techniques include:

- **Max Depth**
- **Min Samples Split**
- **Min Samples Leaf**
- **Max Features**

These constraints help the tree generalize better and reduce the influence of noise.
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Post-Pruning: Simplify After Growth</h2>", unsafe_allow_html=True)
st.markdown("""
In **post-pruning**, the tree is first allowed to grow fully, then unnecessary branches are trimmed:

- **Cost Complexity Pruning**
- **Validation-based Pruning**

This reduces overfitting and keeps the model simpler.
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Feature Selection with Decision Trees</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees provide insight into which features matter most, based on how often and how effectively each feature splits the data.
""")
st.image("feature.png", width=500)
st.markdown("""
💡 *Higher importance → more influence on the final decision.*
""", unsafe_allow_html=True)
|
|
st.markdown("<h2 style='color: #003366;'>Explore Hands-On Implementation</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #003366;'>🔗 Open Jupyter Notebook on Google Colab</a>",
    unsafe_allow_html=True
)
|
|