---
title: AIO2025M03 AdaBoost Demo
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
---
# AIO2025 Module 03 - AdaBoost Demo
This interactive demo showcases AdaBoost (Adaptive Boosting) for both classification and regression tasks. Through dynamic visualizations and real-time parameter adjustment, the application provides a comprehensive interface for exploring sequential ensemble learning, where each weak learner is trained adaptively on the errors of the learners before it.
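For orientation, here is a minimal scikit-learn sketch of the kind of pipeline the demo wraps (illustrative only; the app's actual logic lives in `src/adaboost_core.py`):

```python
# Fit an AdaBoost classifier on a built-in dataset and score it on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```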
## Features
### Core Functionality
- Dual Problem Types: Support for both classification and regression tasks
- Multiple Datasets: Built-in sample datasets including Titanic dataset
- Custom Data: Upload your own CSV/Excel files
- Interactive Parameters: Adjust boosting parameters in real-time
- Dynamic Input Generation: Automatic feature input creation based on dataset
### AdaBoost Parameters
- Number of Estimators: Controls the number of sequential learning steps (capped at 1000 for performance)
- Learning Rate: Step size shrinkage for adaptive learning (0.0001-2.0)
- Max Depth: Individual weak learner depth (default: 1, decision stumps work best)
- Base Estimator: Decision tree with limited depth (weak learner)
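A sketch of how these parameters map onto scikit-learn's AdaBoost API; the `estimator` argument name assumes scikit-learn >= 1.2 (older releases call it `base_estimator`):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision-stump weak learner
    n_estimators=200,     # sequential learning steps (the demo caps this at 1000)
    learning_rate=1.0,    # shrinks each weak learner's contribution
    random_state=42,
)
```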
### Visualizations
- Boosting Progress Chart: Shows how predictions evolve sequentially as each estimator adapts to errors
- Individual Estimator Visualization: Detailed view of selected weak learner structure
- Feature Importance: Displays which features matter most across all estimators
- AdaBoost Process: Sequential aggregation display showing how predictions build up adaptively
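The boosting-progress and feature-importance views can be reproduced outside the app with scikit-learn's staged-prediction API; a rough sketch (the demo renders the equivalent charts with plotly):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)
model = AdaBoostClassifier(n_estimators=50).fit(X, y)

# Boosting progress: training error after each sequential estimator
progress = [np.mean(pred != y) for pred in model.staged_predict(X)]

# Feature importance aggregated across all weak learners
print(dict(zip(load_iris().feature_names, model.feature_importances_)))
```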
## Quick Start
1. Select Data: Choose from sample datasets or upload your own
2. Configure Target: Select the target column for prediction
3. Set Parameters: Adjust AdaBoost hyperparameters
4. Enter Features: Provide values for the new data point
5. Run Prediction: Run AdaBoost and view the results
## Sample Datasets
### Classification Datasets
- Titanic: Passenger survival prediction (default dataset)
- Iris: Classic 3-class flower classification
- Wine: Wine type classification based on chemical properties
- Breast Cancer: Binary classification for cancer detection
### Regression Dataset
- Diabetes: Predicting disease progression from clinical measurements
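Apart from Titanic, which ships as a CSV, these correspond to standard scikit-learn datasets; a sketch of loading them (assuming the sklearn loaders are used):

```python
from sklearn.datasets import (load_breast_cancer, load_diabetes, load_iris,
                              load_wine)

datasets = {
    "Iris": load_iris(return_X_y=True, as_frame=True),                    # 3-class
    "Wine": load_wine(return_X_y=True, as_frame=True),                    # 3-class
    "Breast Cancer": load_breast_cancer(return_X_y=True, as_frame=True),  # binary
    "Diabetes": load_diabetes(return_X_y=True, as_frame=True),            # regression
}
```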
## Technical Details
### Dependencies
- scikit-learn: AdaBoost implementation
- pandas: Data manipulation
- numpy: Numerical operations
- plotly: Interactive visualizations
- gradio: Web interface
### Architecture
- Modular Design: Core logic separated into `src/adaboost_core.py`
- Dynamic UI: Automatic parameter and input generation
- Error Handling: Comprehensive validation and error messages
- Responsive Design: Adapts to different screen sizes
## Key Concepts
### AdaBoost Benefits
- Sequential Learning: Each weak learner adapts to errors of previous learners
- Strong Performance: Often achieves high accuracy with simple weak learners
- Feature Importance: Robust importance scores through adaptive learning
- Adaptive Weighting: Focuses on misclassified examples by increasing their weights
- Error Reduction: Each iteration reduces prediction errors adaptively
### Sequential Ensemble Learning
- Adaptive Reweighting: Increases weights of misclassified examples
- Weighted Voting: Final prediction is weighted sum of weak learner predictions
- Weak Learners: Uses simple estimators like decision stumps
- Error-Focused Training: Each new learner focuses on previously misclassified examples
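To make the reweighting mechanics concrete, here is a didactic sketch of binary AdaBoost (labels in {-1, +1}) built from decision stumps; scikit-learn's production implementation (multi-class SAMME) differs in detail:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_estimators=50, learning_rate=1.0):
    """Binary AdaBoost with decision stumps; y must contain -1/+1 labels."""
    w = np.full(len(y), 1.0 / len(y))   # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()        # weighted training error (w sums to 1)
        if err >= 0.5:                  # no better than chance: stop early
            break
        alpha = learning_rate * 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified examples
        w /= w.sum()                    # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted vote: sign of the alpha-weighted sum of stump predictions
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```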
## Customization
### Adding New Datasets
1. Place CSV files in the `data/` directory
2. Update `SAMPLE_DATA_CONFIG` in `app.py`
3. Ensure the target column is properly configured
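A hypothetical shape for such an entry (the actual schema in `app.py` may differ; mirror an existing entry before copying):

```python
# Hypothetical SAMPLE_DATA_CONFIG entry -- check app.py for the real schema.
SAMPLE_DATA_CONFIG = {
    "My Dataset": {
        "path": "data/my_dataset.csv",     # CSV placed in data/
        "target": "label",                 # target column to predict
        "problem_type": "classification",  # or "regression"
    },
}
```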
### Modifying Parameters
- Edit parameter ranges in the UI components
- Adjust default values for different use cases
- Add new parameter types as needed
## Performance Tips
- Number of Estimators: Limited to 1000 for optimal performance in this demo
- Learning Rate: The default of 1.0 works well; lower rates (0.0001-0.1) produce more conservative models, while higher rates (1.0-2.0) learn faster
- Max Depth: Decision stumps (depth 1) typically optimal for AdaBoost
- Weak Learners: Simple estimators work best to avoid overfitting
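One way to see the learning-rate trade-off is scikit-learn's `staged_score`, which reports accuracy after every boosting round (illustrative, not part of the demo):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for lr in (0.05, 1.0, 2.0):
    model = AdaBoostClassifier(n_estimators=200, learning_rate=lr,
                               random_state=0).fit(X_tr, y_tr)
    scores = list(model.staged_score(X_te, y_te))  # test accuracy per round
    best = max(scores)
    print(f"lr={lr}: best accuracy {best:.3f} at round {scores.index(best) + 1}")
```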
## Use Cases
### Classification
- Survival prediction (the Titanic dataset)
- Medical diagnosis (e.g., the Breast Cancer dataset)
- Customer segmentation
- Fraud detection
- Text classification
### Regression
- Price prediction
- Demand forecasting
- Quality assessment
- Risk modeling
## Notes
- Auto-detection: Problem type automatically detected from target column
- Data Validation: Comprehensive input validation and error handling
- Memory Efficient: Optimized for sequential learning
- Real-time Updates: Instant parameter adjustment and visualization
- Estimator Selection: Interactive dropdown to explore individual weak learners (up to 100)
- Adaptive Nature: Each estimator adapts to previous learners' mistakes through reweighting
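The estimator-selection dropdown is backed by attributes scikit-learn exposes on any fitted AdaBoost model; a sketch of inspecting them directly:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=100).fit(*load_wine(return_X_y=True))

# Each boosting round stores its fitted weak learner, its vote weight,
# and its weighted training error.
rounds = zip(model.estimators_, model.estimator_weights_, model.estimator_errors_)
for i, (est, weight, error) in enumerate(rounds):
    print(f"round {i}: weight={weight:.3f}, weighted error={error:.3f}")
    if i == 2:  # show only the first few rounds
        break
```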
## Related Resources
This demo is part of AIO2025 Module 03 - Machine Learning Fundamentals.