AutoML Regression Model for Shoe Dataset

Model Summary

This model was trained using AutoGluon Tabular (v1.4.0) on the dataset maryzhang/hw1-24679-tabular-dataset.
The task is regression, predicting the actual measured shoe length (mm) from shoe attributes.

  • Best Model: CatBoost_r177_BAG_L1 (bagged ensemble of CatBoost models)
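  • Test R² Score: 0.8904 (≈ 89% of variance explained). Note that the leaderboard lists 0.8904 for WeightedEnsemble_L3 and 0.8994 for CatBoost_r177_BAG_L1.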
  • Validation R² Score: 0.8049
  • Pearson correlation: 0.9473
  • RMSE: 1.80 mm
  • MAE: 1.10 mm
  • Median AE: 0.68 mm

These values indicate that the model's predictions typically fall within about 1–2 mm of the measured shoe length (MAE 1.10 mm, RMSE 1.80 mm).


Leaderboard (Top 5 Models)

Rank  Model                 Test R²  Val R²  Pred Time (s)  Fit Time (s)
1     CatBoost_r177_BAG_L1  0.8994   0.8049  0.0293         27.14
2     LightGBMLarge_BAG_L2  0.8971   0.7995  0.7011         238.93
3     CatBoost_BAG_L2       0.8939   0.8405  0.6155         276.40
4     CatBoost_r9_BAG_L1    0.8917   0.7889  0.0606         53.87
5     WeightedEnsemble_L3   0.8904   0.8500  0.9871         333.68
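
Which model counts as "best" depends on the column used for ranking. The sketch below uses only the five rows from the table above to show that sorting by test R² and by validation R² puts different models first. The tuple layout and variable names are illustrative and are not taken from the training notebook.

```python
# Leaderboard rows copied from the table above:
# (model, test_r2, val_r2, pred_time_s, fit_time_s)
LEADERBOARD = [
    ("CatBoost_r177_BAG_L1", 0.8994, 0.8049, 0.0293, 27.14),
    ("LightGBMLarge_BAG_L2", 0.8971, 0.7995, 0.7011, 238.93),
    ("CatBoost_BAG_L2", 0.8939, 0.8405, 0.6155, 276.40),
    ("CatBoost_r9_BAG_L1", 0.8917, 0.7889, 0.0606, 53.87),
    ("WeightedEnsemble_L3", 0.8904, 0.8500, 0.9871, 333.68),
]

# Rank by held-out test R² (index 1) and by validation R² (index 2).
best_by_test = max(LEADERBOARD, key=lambda row: row[1])
best_by_val = max(LEADERBOARD, key=lambda row: row[2])

print("Best by test R²:", best_by_test[0])
print("Best by validation R²:", best_by_val[0])
```

AutoGluon typically predicts with the model that scored best on validation, so the validation ranking is usually the one that determines which model serves predictions.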

Dataset

  • Source: maryzhang/hw1-24679-tabular-dataset
  • Size: 338 samples (30 original, 308 augmented)
  • Features:
    • US size (numeric)
    • Shoe size (mm) (numeric)
    • Type of shoe (categorical)
    • Shoe color (categorical)
    • Shoe brand (categorical)
  • Target: Actual measured shoe length (mm)
  • Splits: 80% training, 20% testing (random_state=42)

Preprocessing

  • Converted Hugging Face dataset to Pandas DataFrame
  • Train/test split (80/20) with a fixed random seed (random_state=42)
  • AutoGluon handled categorical encoding, normalization, and feature selection automatically
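
The loading and splitting steps can be sketched as follows. The card does not say which splitting function was used, so scikit-learn's `train_test_split` is an assumption here, as is the idea that the 80/20 split was applied to all 338 rows after merging every split of the Hugging Face dataset. The helper names `expected_split_sizes` and `load_and_split` are hypothetical.

```python
import math


def expected_split_sizes(n_samples: int, test_fraction: float = 0.2) -> tuple[int, int]:
    """Return (n_train, n_test) using scikit-learn's rounding rule for a
    fractional test_size: the test count is rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


def load_and_split(repo_id: str = "maryzhang/hw1-24679-tabular-dataset"):
    """Load every split of the dataset, merge into one DataFrame, split 80/20."""
    # Imported lazily so the size helper above works without these packages.
    import pandas as pd
    from datasets import load_dataset
    from sklearn.model_selection import train_test_split

    dataset_dict = load_dataset(repo_id)
    # Assumption: all splits share one schema and together hold the 338 rows.
    frame = pd.concat(
        [split.to_pandas() for split in dataset_dict.values()],
        ignore_index=True,
    )
    # Same fraction and seed as stated in the card.
    train_df, test_df = train_test_split(frame, test_size=0.2, random_state=42)
    return train_df, test_df


print(expected_split_sizes(338))  # (270, 68)
```

Under these assumptions, the 338 samples would yield 270 training rows and 68 test rows.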

Training Setup

  • Framework: AutoGluon Tabular v1.4.0
  • Search Strategy: Bagged/stacked ensembles with model selection (presets="best")
  • Time Budget: 1200 seconds (20 minutes)
  • Evaluation Metric: R²
  • Hyperparameter Search: Automated by AutoGluon (CatBoost, LightGBM, ensemble stacking)
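
A training call matching the settings above might look like the sketch below. It is not the author's notebook: the label column name is assumed to equal the target name listed in this card, and the function name `train` is hypothetical. `TabularPredictor`, `fit`, and `leaderboard` are standard AutoGluon Tabular calls.

```python
# Assumption: the target column is named exactly as it appears in this card.
LABEL = "Actual measured shoe length (mm)"

PREDICTOR_KWARGS = {
    "label": LABEL,
    "problem_type": "regression",
    "eval_metric": "r2",  # R², as stated under Training Setup
}

FIT_KWARGS = {
    # "best" is the value quoted in the card; AutoGluon documents this
    # preset under the full name "best_quality".
    "presets": "best",
    "time_limit": 1200,  # seconds (20 minutes)
}


def train(train_df, test_df):
    """Fit a predictor on train_df and score every trained model on test_df."""
    # Imported lazily so the configuration can be inspected without AutoGluon.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(**PREDICTOR_KWARGS).fit(train_df, **FIT_KWARGS)
    # Passing held-out data adds test scores next to the validation scores.
    leaderboard = predictor.leaderboard(test_df)
    return predictor, leaderboard
```

Keeping the settings in plain dictionaries makes it easy to log them alongside the results.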

Metrics

  • R²: 0.8904 (test)
  • RMSE: 1.80 mm
  • MAE: 1.10 mm
  • Median AE: 0.68 mm
  • Uncertainty: Variability was assessed across the base models in the ensemble. Bagging reduces variance; the expected error is within ±2 mm for most predictions.
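
For reference, the reported metrics (plus the Pearson correlation from the summary) can be computed from true and predicted lengths as in this standard-library sketch. The function name `regression_metrics` and the four sample values are made up for illustration and are not drawn from the dataset.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, median absolute error, and Pearson r."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]

    mean_true = statistics.fmean(y_true)
    mean_pred = statistics.fmean(y_pred)

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": statistics.fmean(abs_errors),
        "median_ae": statistics.median(abs_errors),
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Made-up lengths in mm, purely to exercise the function.
y_true = [250.0, 260.0, 270.0, 280.0]
y_pred = [251.0, 259.0, 272.0, 280.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE weights large misses more heavily than MAE, and the median absolute error is the least sensitive to outliers, which is consistent with the ordering RMSE > MAE > Median AE reported above.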

Intended Use

  • Educational: Demonstrates AutoML regression in CMU course 24-679
  • Limitations:
    • Small dataset size (338 samples) → not robust for production use
    • Augmented data may not reflect real-world variability
    • Not suitable for medical or industrial applications

Ethical Considerations

  • Predictions should not be used to recommend or prescribe footwear sizes in clinical or consumer contexts.
  • Dataset augmentation could introduce biases not present in real measurements.

License

  • Dataset: MIT License
  • Model: MIT License

Hardware / Compute

  • Training: Google Colab (CPU runtime)
  • Time: ~20 minutes wall-clock time
  • RAM: <8 GB used

AI Usage Disclosure

  • Model training and hyperparameter search used AutoML (AutoGluon).
  • Model card text and documentation partially generated with AI assistance (ChatGPT).

Acknowledgments

  • Dataset by Mary Zhang (CMU 24-679)
  • Model training and documentation by Yash Sakhale