| import streamlit as st | |
| import pandas as pd | |
| import matplotlib.pyplot as plt | |
| import seaborn as sns | |
| from sklearn.datasets import load_iris | |
| from sklearn.model_selection import train_test_split | |
| from sklearn.tree import DecisionTreeClassifier, plot_tree | |
| from sklearn.preprocessing import StandardScaler | |
| from sklearn.metrics import classification_report, accuracy_score, confusion_matrix | |
| # Set up page | |
| st.set_page_config(page_title="Explore Decision Tree Algorithm", layout="wide") | |
| st.title("🌳 Decision Tree Classifier: Explained with Iris Dataset") | |
| # Intro Section | |
| st.markdown(""" | |
| ## 🧠 What is a Decision Tree? | |
| A **Decision Tree** is a machine learning algorithm that uses a tree-like structure to make decisions. | |
| Each **internal node** asks a question about a feature, each **branch** is the outcome of that question, and each **leaf node** gives us a final decision or prediction. | |
| > 🧩 Think of it like playing "20 Questions" to guess what something is: each question narrows down the possibilities. | |
| --- | |
| ## How Decision Trees Work | |
| 1. Start with all training data at the root node. | |
| 2. Pick the **best split**: the feature and threshold that most reduce impurity (measured with Gini or entropy). | |
| 3. Repeat the process on each resulting child node until: | |
|    - every node is pure (all of its samples share one class), or | |
|    - the **maximum depth** is reached. | |
| Criteria used to score candidate splits: | |
| - **Gini impurity** (the scikit-learn default) | |
| - **Entropy** (information gain) | |
| --- | |
| ### Pros and Cons | |
| ✅ Easy to understand & visualize | |
| ✅ Handles numerical and categorical data (scikit-learn needs categorical features encoded as numbers first) | |
| ✅ No need for feature scaling | |
| ⚠️ Can overfit if not controlled (use `max_depth`, `min_samples_leaf`, or pruning) | |
| --- | |
| """) | |
| # Dataset and DataFrame | |
| st.subheader("🌼 Let's Explore the Iris Dataset") | |
| iris = load_iris() | |
| df = pd.DataFrame(iris.data, columns=iris.feature_names) | |
| df["target"] = iris.target | |
| df["species"] = df["target"].apply(lambda x: iris.target_names[x]) | |
| st.markdown("Here's a peek at the dataset:") | |
| st.dataframe(df.head(), use_container_width=True) | |
| # Feature distribution visualization | |
| st.markdown("### Visualize Features") | |
| selected_features = st.multiselect("Pick features to visualize", iris.feature_names, default=iris.feature_names[:2]) | |
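| if len(selected_features) == 2: | |
|     # Draw on an explicit figure so Streamlit renders exactly this plot | |
|     fig, ax = plt.subplots(figsize=(8, 5)) | |
|     sns.scatterplot(data=df, x=selected_features[0], y=selected_features[1], hue="species", palette="Set2", s=80, ax=ax) | |
|     st.pyplot(fig) | |
| else: | |
|     st.info("Select exactly two features to draw the scatter plot.") | |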
| # Sidebar controls | |
| st.sidebar.header("🌲 Model Settings") | |
| criterion = st.sidebar.radio("Splitting Criterion", ["gini", "entropy"]) | |
| max_depth = st.sidebar.slider("Max Depth", 1, 10, value=3) | |
| # Prepare data | |
| X = df[iris.feature_names] | |
| y = df["target"] | |
| # Trees split on per-feature thresholds, so scaling is unnecessary; raw units also keep the plotted thresholds readable | |
| X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) | |
| # Train model | |
| model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42) | |
| model.fit(X_train, y_train) | |
| y_pred = model.predict(X_test) | |
| # Model performance | |
| acc = accuracy_score(y_test, y_pred) | |
| st.success(f"✅ Model Accuracy: {acc*100:.2f}%") | |
| # Classification report | |
| st.markdown("### 🧾 Classification Report") | |
| st.text(classification_report(y_test, y_pred, target_names=iris.target_names)) | |
| # Confusion matrix | |
| st.markdown("### Confusion Matrix") | |
| cm = confusion_matrix(y_test, y_pred) | |
| fig, ax = plt.subplots() | |
| sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=iris.target_names, yticklabels=iris.target_names, ax=ax) | |
| ax.set_xlabel("Predicted") | |
| ax.set_ylabel("Actual") | |
| st.pyplot(fig) | |
| # Decision tree plot | |
| st.markdown("### 🌳 Visualizing the Tree Structure") | |
| fig, ax = plt.subplots(figsize=(12, 6)) | |
| plot_tree(model, filled=True, feature_names=iris.feature_names, class_names=list(iris.target_names), fontsize=10, ax=ax) | |
| st.pyplot(fig) | |
| # Final tips | |
| st.markdown(""" | |
| --- | |
| ## 💡 Key Takeaways | |
| - Decision Trees are great for **interpretable models**. | |
| - They require **little to no preprocessing**. | |
| - They're **prone to overfitting**, especially on small datasets, so limit growth with settings like `max_depth` or with pruning techniques. | |
| ## When to Use a Decision Tree? | |
| - When interpretability matters | |
| - When data includes both **numerical and categorical** variables (encode the categorical ones first in scikit-learn) | |
| - When you want to **quickly prototype** and understand your data | |
| > 🎯 *Tip:* Combine multiple trees in an ensemble (like **Random Forest** or **Gradient Boosting**) for better performance! | |
| --- | |
| """) | |
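The app's intro text says a tree picks the split that scores best under Gini or entropy, but it never shows the arithmetic. The sketch below is one way to illustrate that step in plain Python. It is not how scikit-learn implements it (the library uses an optimized CART variant), and the helper names `gini`, `entropy`, `impurity_decrease`, and `best_threshold`, as well as the six-row toy dataset, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def impurity_decrease(values, labels, threshold, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two
    children produced by the test `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


def best_threshold(values, labels, impurity=gini):
    """Try the midpoint between each pair of adjacent distinct values and
    return (threshold, decrease) for the largest impurity decrease."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None  # a constant feature offers no possible split
    return max(
        ((t, impurity_decrease(values, labels, t, impurity)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical toy data: one feature (petal length in cm) and two classes.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

threshold, decrease = best_threshold(petal_length, species)
print(f"best threshold: {threshold}, Gini decrease: {decrease:.3f}")
```

Passing `impurity=entropy` to `best_threshold` scores the same candidate thresholds by information gain instead. A full tree builder would run this search over every feature at each node and then recurse into the two children until a stopping rule such as `max_depth` applies.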