---
title: AIO2025M03 AdaBoost Demo
emoji: 🚀
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
---

AIO2025 Module 03 - AdaBoost Demo

This interactive demo showcases the AdaBoost (Adaptive Boosting) algorithm for both classification and regression tasks. Through dynamic visualizations and real-time parameter adjustment, the application lets you explore sequential ensemble learning, in which each weak learner is trained adaptively on the errors of the previous ones.

🚀 Features

Core Functionality

  • Dual Problem Types: Support for both classification and regression tasks
  • Multiple Datasets: Built-in sample datasets including Titanic dataset
  • Custom Data: Upload your own CSV/Excel files
  • Interactive Parameters: Adjust boosting parameters in real-time
  • Dynamic Input Generation: Automatic feature input creation based on dataset

AdaBoost Parameters

  • Number of Estimators: Controls the number of sequential boosting rounds (capped at 1000 in this demo for performance)
  • Learning Rate: Step-size shrinkage applied to each estimator's contribution (0.0001-2.0)
  • Max Depth: Depth of each individual weak learner (default: 1; decision stumps typically work best)
  • Base Estimator: A depth-limited decision tree serving as the weak learner (see the sketch below)
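
As a rough sketch of how these UI parameters map onto scikit-learn (assuming scikit-learn ≥ 1.2, where the weak learner is passed as `estimator`; the demo's actual wiring lives in `src/adaboost_core.py`):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Map the demo's UI parameters onto scikit-learn's AdaBoost.
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # base estimator: a decision stump
    n_estimators=50,      # number of sequential boosting rounds (demo caps at 1000)
    learning_rate=1.0,    # step-size shrinkage (0.0001-2.0 in the UI)
    random_state=42,
)
```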

Visualizations

  • Boosting Progress Chart: Shows how predictions evolve sequentially as each estimator adapts to earlier errors (see the sketch after this list)
  • Individual Estimator Visualization: Detailed view of selected weak learner structure
  • Feature Importance: Displays which features matter most across all estimators
  • AdaBoost Process: Sequential aggregation display showing how predictions build up adaptively
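
The data behind the Boosting Progress Chart can be reproduced with scikit-learn's `staged_predict`, which yields the ensemble's prediction after each boosting round (a minimal sketch; the demo's plotting code may differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
model = AdaBoostClassifier(n_estimators=50).fit(X, y)

# staged_predict yields the ensemble prediction after 1, 2, ..., 50 estimators,
# showing how the weighted vote evolves as each learner adapts to errors.
progress = [accuracy_score(y, y_pred) for y_pred in model.staged_predict(X)]
print(progress[0], progress[-1])  # accuracy after the first vs. the last round
```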

🚀 Quick Start

  1. Select Data: Choose from sample datasets or upload your own
  2. Configure Target: Select the target column for prediction
  3. Set Parameters: Adjust AdaBoost hyperparameters
  4. Enter Features: Provide values for the new data point
  5. Run Prediction: Execute the AdaBoost model and view the results (an end-to-end sketch follows)
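
For readers who want the same flow in code, here is a minimal end-to-end sketch using the Iris sample dataset in place of an uploaded file (the feature values below are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                   # 1. select data
model = AdaBoostClassifier(                         # 3. set parameters
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=1.0,
).fit(X, y)                                         # 2. target: species
new_point = [[5.1, 3.5, 1.4, 0.2]]                  # 4. enter feature values
print(model.predict(new_point))                     # 5. run prediction
```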

📊 Sample Datasets

Classification Datasets

  • Titanic: Passenger survival prediction (default dataset)
  • Iris: Classic 3-class flower classification
  • Wine: Wine type classification based on chemical properties
  • Breast Cancer: Binary classification for cancer detection

Regression Dataset

  • Diabetes: Predicts disease progression from baseline medical measurements (regression)
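
Four of the five samples ship with scikit-learn; the Titanic data is assumed to be a CSV under `data/` (the exact filename below is illustrative):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_diabetes, load_iris, load_wine

iris = load_iris(as_frame=True).frame              # 3-class classification
wine = load_wine(as_frame=True).frame              # wine type classification
cancer = load_breast_cancer(as_frame=True).frame   # binary classification
diabetes = load_diabetes(as_frame=True).frame      # regression
titanic = pd.read_csv("data/titanic.csv")          # survival prediction (assumed path)
```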

πŸ› οΈ Technical Details

Dependencies

  • scikit-learn: AdaBoost implementation
  • pandas: Data manipulation
  • numpy: Numerical operations
  • plotly: Interactive visualizations
  • gradio: Web interface

Architecture

  • Modular Design: Separated core logic in src/adaboost_core.py
  • Dynamic UI: Automatic parameter and input generation
  • Error Handling: Comprehensive validation and error messages
  • Responsive Design: Adapts to different screen sizes

💡 Key Concepts

AdaBoost Benefits

  • Sequential Learning: Each weak learner adapts to errors of previous learners
  • Strong Performance: Often achieves high accuracy with simple weak learners
  • Feature Importance: Robust importance scores through adaptive learning
  • Adaptive Weighting: Focuses on misclassified examples by increasing their weights
  • Error Reduction: Each iteration reduces prediction errors adaptively

Sequential Ensemble Learning

  • Adaptive Reweighting: Increases weights of misclassified examples
  • Weighted Voting: Final prediction is weighted sum of weak learner predictions
  • Weak Learners: Uses simple estimators like decision stumps
  • Error-Focused Training: Each new learner focuses on previously misclassified examples
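
For binary labels $y_i \in \{-1, +1\}$, these bullets correspond to the classic discrete-AdaBoost update:

```latex
% Weighted error of weak learner h_t, and its voting weight alpha_t:
\varepsilon_t = \frac{\sum_i w_i \,\mathbf{1}[h_t(x_i) \neq y_i]}{\sum_i w_i},
\qquad
\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}

% Adaptive reweighting: misclassified points gain weight (then normalize):
w_i \leftarrow w_i \, e^{-\alpha_t \, y_i \, h_t(x_i)}

% Weighted vote: the final prediction is the sign of the weighted sum:
H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t \, h_t(x)\right)
```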

🔧 Customization

Adding New Datasets

  1. Place CSV files in the data/ directory
  2. Update SAMPLE_DATA_CONFIG in app.py
  3. Ensure target column is properly configured
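
A new entry might look like the sketch below; the field names are assumptions, so check the actual schema of `SAMPLE_DATA_CONFIG` in `app.py`:

```python
# Hypothetical SAMPLE_DATA_CONFIG entry -- field names are assumed, not
# taken from the actual app.py source.
SAMPLE_DATA_CONFIG = {
    "My Dataset": {
        "path": "data/my_dataset.csv",    # step 1: CSV placed in data/
        "target": "label",                # step 3: target column
        "problem_type": "classification", # or "regression"
    },
}
```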

Modifying Parameters

  • Edit parameter ranges in the UI components
  • Adjust default values for different use cases
  • Add new parameter types as needed

📈 Performance Tips

  • Number of Estimators: Capped at 1000 in this demo to keep runtimes reasonable
  • Learning Rate: The default of 1.0 works well; lower rates (0.0001-0.1) produce more conservative models, while higher rates (1.0-2.0) learn faster
  • Max Depth: Decision stumps (depth 1) are typically optimal for AdaBoost
  • Weak Learners: Simple estimators work best and help avoid overfitting (see the sketch below)
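
The learning-rate trade-off is easy to observe with scikit-learn's `staged_score`, which reports ensemble accuracy after every boosting round (a minimal sketch on held-out data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Compare conservative vs. aggressive learning rates over the same 200 rounds.
for lr in (0.01, 0.1, 1.0, 2.0):
    model = AdaBoostClassifier(n_estimators=200, learning_rate=lr).fit(X_tr, y_tr)
    final = list(model.staged_score(X_te, y_te))[-1]
    print(f"learning_rate={lr:<4}  test accuracy={final:.3f}")
```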

🎯 Use Cases

Classification

  • Medical diagnosis
  • Survival prediction (e.g., the Titanic dataset)
  • Customer segmentation
  • Fraud detection
  • Text classification

Regression

  • Price prediction
  • Demand forecasting
  • Quality assessment
  • Risk modeling

πŸ“ Notes

  • Auto-detection: Problem type is automatically detected from the target column (see the sketch after this list)
  • Data Validation: Comprehensive input validation and error handling
  • Memory Efficient: Optimized for sequential learning
  • Real-time Updates: Instant parameter adjustment and visualization
  • Estimator Selection: Interactive dropdown to explore individual weak learners (up to 100)
  • Adaptive Nature: Each estimator adapts to previous learners' mistakes through reweighting
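
A common heuristic for the auto-detection behavior (the demo's actual rule may differ) treats non-numeric or low-cardinality targets as classification:

```python
import pandas as pd

def detect_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    # Non-numeric targets, or numeric ones with few distinct values,
    # are treated as classification; everything else as regression.
    if not pd.api.types.is_numeric_dtype(target) or target.nunique() <= max_classes:
        return "classification"
    return "regression"
```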

This demo is part of AIO2025 Module 03 - Machine Learning Fundamentals