AutoML Regression Model for Shoe Dataset
Model Summary
This model was trained using AutoGluon Tabular (v1.4.0) on the dataset maryzhang/hw1-24679-tabular-dataset.
The task is regression, predicting the actual measured shoe length (mm) from shoe attributes.
- Best Model: CatBoost_r177_BAG_L1 (bagged ensemble of CatBoost models)
- Test R² Score: 0.8904 (≈ 89% variance explained)
- Validation R² Score: 0.8049
- Pearson correlation: 0.9473
- RMSE: 1.80 mm
- MAE: 1.10 mm
- Median AE: 0.68 mm
These values indicate that the model's typical prediction error is about 1–2 mm (MAE 1.10 mm, RMSE 1.80 mm).
Leaderboard (Top 5 Models)
| Rank | Model | Test R² | Val R² | Pred Time (s) | Fit Time (s) |
|---|---|---|---|---|---|
| 1 | CatBoost_r177_BAG_L1 | 0.8994 | 0.8049 | 0.0293 | 27.14 |
| 2 | LightGBMLarge_BAG_L2 | 0.8971 | 0.7995 | 0.7011 | 238.93 |
| 3 | CatBoost_BAG_L2 | 0.8939 | 0.8405 | 0.6155 | 276.40 |
| 4 | CatBoost_r9_BAG_L1 | 0.8917 | 0.7889 | 0.0606 | 53.87 |
| 5 | WeightedEnsemble_L3 | 0.8904 | 0.8500 | 0.9871 | 333.68 |
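The table is sorted by test R², but the validation column orders the models differently. As a minimal sketch, the snippet below re-enters the five rows by hand and reports which model leads on each score, along with the test-minus-validation gap. The tuple layout and the `best_by` helper are illustrative and not part of AutoGluon.

```python
# Leaderboard rows copied from the table above:
# (model, test R², validation R², prediction time in s, fit time in s)
LEADERBOARD = [
    ("CatBoost_r177_BAG_L1", 0.8994, 0.8049, 0.0293, 27.14),
    ("LightGBMLarge_BAG_L2", 0.8971, 0.7995, 0.7011, 238.93),
    ("CatBoost_BAG_L2", 0.8939, 0.8405, 0.6155, 276.40),
    ("CatBoost_r9_BAG_L1", 0.8917, 0.7889, 0.0606, 53.87),
    ("WeightedEnsemble_L3", 0.8904, 0.8500, 0.9871, 333.68),
]


def best_by(rows, column):
    """Return the model name with the highest value at the given tuple index."""
    return max(rows, key=lambda row: row[column])[0]


best_on_test = best_by(LEADERBOARD, 1)  # index 1 = test R²
best_on_val = best_by(LEADERBOARD, 2)   # index 2 = validation R²

print(f"Best on test set:       {best_on_test}")
print(f"Best on validation set: {best_on_val}")

# A large positive gap means the model scored better on test than on validation.
for name, test_r2, val_r2, _pred_s, fit_s in LEADERBOARD:
    gap = test_r2 - val_r2
    print(f"{name:<24} test-val gap = {gap:+.4f}, fit = {fit_s:7.2f} s")
```

Among these five rows, the model with the highest validation score is not the one with the highest test score. This matters because the test set is normally unavailable when a model is selected.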
Dataset
- Source: maryzhang/hw1-24679-tabular-dataset
- Size: 338 samples (30 original, 308 augmented)
- Features:
  - US size (numeric)
  - Shoe size (mm) (numeric)
  - Type of shoe (categorical)
  - Shoe color (categorical)
  - Shoe brand (categorical)
- Target: Actual measured shoe length (mm)
- Splits: 80% training, 20% testing (random_state=42)
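To make the split sizes concrete, here is a small standard-library sketch of a seeded 80/20 split over 338 row indices. It is an illustration only: the `split_indices` helper is hypothetical, rounding the test share up is an assumption, and Python's `random` module is a different generator from the one behind `random_state=42` in scikit-learn or pandas, so this will not reproduce the exact partition used for the model.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and return (train, test) index lists."""
    # Assumption: the test share is rounded up to a whole number of rows.
    n_test = math.ceil(test_fraction * n_samples)
    indices = list(range(n_samples))
    # A local Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(338)
print(len(train_idx), len(test_idx))
```

Under these assumptions, the 338 samples divide into 270 training rows and 68 test rows.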
Preprocessing
- Converted Hugging Face dataset to Pandas DataFrame
- Random 80/20 train/test split with a fixed seed (random_state=42)
- AutoGluon handled categorical encoding, normalization, and feature selection automatically
Training Setup
- Framework: AutoGluon Tabular v1.4.0
- Search Strategy: Bagged/stacked ensembles with model selection (presets="best")
- Time Budget: 1200 seconds (20 minutes)
- Evaluation Metric: R²
- Hyperparameter Search: Automated by AutoGluon (CatBoost, LightGBM, ensemble stacking)
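The settings above could map onto AutoGluon's `TabularPredictor` roughly as follows. This is a sketch under stated assumptions, not the author's training script: the label column name is guessed from the Dataset section, and the `train` wrapper and the `LABEL` and `FIT_KWARGS` names are illustrative.

```python
# Settings taken from the list above.
FIT_KWARGS = {"presets": "best", "time_limit": 1200}

# Assumption: the target column is named as in the Dataset section.
LABEL = "Actual measured shoe length (mm)"


def train(train_df, label=LABEL):
    """Fit an AutoGluon TabularPredictor on a pandas DataFrame."""
    # Imported lazily so the settings can be inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(
        label=label,
        problem_type="regression",
        eval_metric="r2",
    )
    # fit() returns the fitted predictor.
    return predictor.fit(train_df, **FIT_KWARGS)
```

After fitting, `predictor.leaderboard(test_df)` should return a per-model table comparable to the leaderboard in this card, and `predictor.predict(test_df)` should return the predicted lengths.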
Metrics
- R²: 0.8904 (test)
- RMSE: 1.80 mm
- MAE: 1.10 mm
- Median AE: 0.68 mm
- Uncertainty: Variability assessed across multiple base models in ensemble. Bagging reduces variance; expected error ±2 mm for most predictions.
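For reference, the metrics listed above can be computed from paired true and predicted lengths using only the standard library. The sketch below uses four made-up values to show the arithmetic. They are not drawn from the dataset, and the `regression_metrics` helper is illustrative.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, median absolute error, and Pearson correlation."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs_errors) / n,
        "median_ae": statistics.median(abs_errors),
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Made-up lengths in mm, purely illustrative; not taken from the dataset.
y_true = [250.0, 260.0, 270.0, 280.0]
y_pred = [251.0, 259.0, 272.0, 280.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.4f}")
```

RMSE squares each error before averaging, so a few large misses raise it above MAE. That is consistent with the reported ordering of median AE (0.68 mm), MAE (1.10 mm), and RMSE (1.80 mm).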
Intended Use
- Educational: Demonstrates AutoML regression in CMU course 24-679
- Limitations:
  - Small dataset size (338 samples) → not robust for production use
  - Augmented data may not reflect real-world variability
  - Not suitable for medical or industrial applications
Ethical Considerations
- Predictions should not be used to recommend or prescribe footwear sizes in clinical or consumer contexts.
- Dataset augmentation could introduce biases not present in real measurements.
License
- Dataset: MIT License
- Model: MIT License
Hardware / Compute
- Training: Google Colab (CPU runtime)
- Time: ~20 minutes wall-clock time
- RAM: <8 GB used
AI Usage Disclosure
- Model training and hyperparameter search used AutoML (AutoGluon).
- Model card text and documentation partially generated with AI assistance (ChatGPT).
Acknowledgments
- Dataset by Mary Zhang (CMU 24-679)
- Model training and documentation by Yash Sakhale