---
title: AIO2025M03 AdaBoost Demo
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
---
# AIO2025 Module 03 - AdaBoost Demo
This interactive demo showcases AdaBoost (Adaptive Boosting) for both classification and regression tasks. Through dynamic visualizations and real-time parameter adjustment, the application provides a comprehensive interface for exploring sequential ensemble learning, where each weak learner is trained adaptively on the errors of the learners before it.
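For orientation, here is a minimal scikit-learn sketch of the kind of pipeline the demo wraps (illustrative only; the app's actual logic lives in `src/adaboost_core.py`):

```python
# Fit an AdaBoost classifier on a built-in dataset and score it on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```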
## Features
### Core Functionality
- Dual Problem Types: Support for both classification and regression tasks
- Multiple Datasets: Built-in sample datasets including Titanic dataset
- Custom Data: Upload your own CSV/Excel files
- Interactive Parameters: Adjust boosting parameters in real-time
- Dynamic Input Generation: Automatic feature input creation based on dataset
### AdaBoost Parameters
- Number of Estimators: Controls the number of sequential learning steps (capped at 1000 for performance)
- Learning Rate: Step size shrinkage for adaptive learning (0.0001-2.0)
- Max Depth: Individual weak learner depth (default: 1, decision stumps work best)
- Base Estimator: Decision tree with limited depth (weak learner)
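A sketch of how these parameters map onto scikit-learn's AdaBoost API; the `estimator` argument name assumes scikit-learn >= 1.2 (older releases call it `base_estimator`):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision-stump weak learner
    n_estimators=200,     # sequential learning steps (the demo caps this at 1000)
    learning_rate=1.0,    # shrinks each weak learner's contribution
    random_state=42,
)
```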
### Visualizations
- Boosting Progress Chart: Shows how predictions evolve sequentially as each estimator adapts to errors
- Individual Estimator Visualization: Detailed view of selected weak learner structure
- Feature Importance: Displays which features matter most across all estimators
- AdaBoost Process: Sequential aggregation display showing how predictions build up adaptively
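The boosting-progress and feature-importance views can be reproduced outside the app with scikit-learn's staged-prediction API; a rough sketch (the demo renders the equivalent charts with plotly):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)
model = AdaBoostClassifier(n_estimators=50).fit(X, y)

# Boosting progress: training error after each sequential estimator
progress = [np.mean(pred != y) for pred in model.staged_predict(X)]

# Feature importance aggregated across all weak learners
print(dict(zip(load_iris().feature_names, model.feature_importances_)))
```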
## Quick Start
1. Select Data: Choose from sample datasets or upload your own
2. Configure Target: Select the target column for prediction
3. Set Parameters: Adjust AdaBoost hyperparameters
4. Enter Features: Provide values for the new data point
5. Run Prediction: Run AdaBoost and view the results
## Sample Datasets
### Classification Datasets
- Titanic: Passenger survival prediction (default dataset)
- Iris: Classic 3-class flower classification
- Wine: Wine type classification based on chemical properties
- Breast Cancer: Binary classification for cancer detection
### Regression Dataset
- Diabetes: Predicting disease progression from clinical measurements
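Apart from Titanic, which ships as a CSV, these correspond to standard scikit-learn datasets; a sketch of loading them (assuming the sklearn loaders are used):

```python
from sklearn.datasets import (load_breast_cancer, load_diabetes, load_iris,
                              load_wine)

datasets = {
    "Iris": load_iris(return_X_y=True, as_frame=True),                    # 3-class
    "Wine": load_wine(return_X_y=True, as_frame=True),                    # 3-class
    "Breast Cancer": load_breast_cancer(return_X_y=True, as_frame=True),  # binary
    "Diabetes": load_diabetes(return_X_y=True, as_frame=True),            # regression
}
```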
## Technical Details
### Dependencies
- scikit-learn: AdaBoost implementation
- pandas: Data manipulation
- numpy: Numerical operations
- plotly: Interactive visualizations
- gradio: Web interface
### Architecture
- Modular Design: Core logic separated into `src/adaboost_core.py`
- Dynamic UI: Automatic parameter and input generation
- Error Handling: Comprehensive validation and error messages
- Responsive Design: Adapts to different screen sizes
## Key Concepts
### AdaBoost Benefits
- Sequential Learning: Each weak learner adapts to errors of previous learners
- Strong Performance: Often achieves high accuracy with simple weak learners
- Feature Importance: Robust importance scores through adaptive learning
- Adaptive Weighting: Focuses on misclassified examples by increasing their weights
- Error Reduction: Each iteration reduces prediction errors adaptively
### Sequential Ensemble Learning
- Adaptive Reweighting: Increases weights of misclassified examples
- Weighted Voting: Final prediction is weighted sum of weak learner predictions
- Weak Learners: Uses simple estimators like decision stumps
- Error-Focused Training: Each new learner focuses on previously misclassified examples
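To make the reweighting mechanics concrete, here is a didactic sketch of binary AdaBoost (labels in {-1, +1}) built from decision stumps; scikit-learn's production implementation (multi-class SAMME) differs in detail:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_estimators=50, learning_rate=1.0):
    """Binary AdaBoost with decision stumps; y must contain -1/+1 labels."""
    w = np.full(len(y), 1.0 / len(y))   # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()        # weighted training error (w sums to 1)
        if err >= 0.5:                  # no better than chance: stop early
            break
        alpha = learning_rate * 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified examples
        w /= w.sum()                    # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted vote: sign of the alpha-weighted sum of stump predictions
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```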
## Customization
### Adding New Datasets
1. Place CSV files in the `data/` directory
2. Update `SAMPLE_DATA_CONFIG` in `app.py`
3. Ensure the target column is properly configured
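A hypothetical shape for such an entry (the actual schema in `app.py` may differ; mirror an existing entry before copying):

```python
# Hypothetical SAMPLE_DATA_CONFIG entry -- check app.py for the real schema.
SAMPLE_DATA_CONFIG = {
    "My Dataset": {
        "path": "data/my_dataset.csv",     # CSV placed in data/
        "target": "label",                 # target column to predict
        "problem_type": "classification",  # or "regression"
    },
}
```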
### Modifying Parameters
- Edit parameter ranges in the UI components
- Adjust default values for different use cases
- Add new parameter types as needed
## Performance Tips
- Number of Estimators: Limited to 1000 for optimal performance in this demo
- Learning Rate: The default of 1.0 works well; lower rates (0.0001-0.1) produce more conservative models, while higher rates (1.0-2.0) learn faster
- Max Depth: Decision stumps (depth 1) typically optimal for AdaBoost
- Weak Learners: Simple estimators work best to avoid overfitting
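One way to see the learning-rate trade-off is scikit-learn's `staged_score`, which reports accuracy after every boosting round (illustrative, not part of the demo):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for lr in (0.05, 1.0, 2.0):
    model = AdaBoostClassifier(n_estimators=200, learning_rate=lr,
                               random_state=0).fit(X_tr, y_tr)
    scores = list(model.staged_score(X_te, y_te))  # test accuracy per round
    best = max(scores)
    print(f"lr={lr}: best accuracy {best:.3f} at round {scores.index(best) + 1}")
```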
## Use Cases
### Classification
- Survival prediction (the Titanic dataset)
- Medical diagnosis (e.g., the Breast Cancer dataset)
- Customer segmentation
- Fraud detection
- Text classification
### Regression
- Price prediction
- Demand forecasting
- Quality assessment
- Risk modeling
## Notes
- Auto-detection: Problem type automatically detected from target column
- Data Validation: Comprehensive input validation and error handling
- Memory Efficient: Optimized for sequential learning
- Real-time Updates: Instant parameter adjustment and visualization
- Estimator Selection: Interactive dropdown to explore individual weak learners (up to 100)
- Adaptive Nature: Each estimator adapts to previous learners' mistakes through reweighting
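The estimator-selection dropdown is backed by attributes scikit-learn exposes on any fitted AdaBoost model; a sketch of inspecting them directly:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=100).fit(*load_wine(return_X_y=True))

# Each boosting round stores its fitted weak learner, its vote weight,
# and its weighted training error.
rounds = zip(model.estimators_, model.estimator_weights_, model.estimator_errors_)
for i, (est, weight, error) in enumerate(rounds):
    print(f"round {i}: weight={weight:.3f}, weighted error={error:.3f}")
    if i == 2:  # show only the first few rounds
        break
```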
## Related Resources
This demo is part of AIO2025 Module 03 - Machine Learning Fundamentals.