ysakhale/Homework2-task1

ysakhale committed on Sep 21, 2025

Commit 2854858 (verified) · 1 Parent(s): abe3196

Update README.md

README.md +57 -9


@@ -4,14 +4,16 @@
 This model was trained using **AutoGluon Tabular (v1.4.0)** on the dataset [maryzhang/hw1-24679-tabular-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset).
 The task is **regression**, predicting the **actual measured shoe length (mm)** from shoe attributes.
-- **Best Model**: `CatBoost_r177_BAG_L1` (stacked ensemble of CatBoost models)
-- **Test R² Score**: **0.8904**
 - **Validation R² Score**: 0.8049
 - **Pearson correlation**: 0.9473
 - **RMSE**: 1.80 mm
 - **MAE**: 1.10 mm
 - **Median AE**: 0.68 mm
 ---
 ## Leaderboard (Top 5 Models)
@@ -34,24 +36,70 @@ The task is **regression**, predicting the **actual measured shoe length (mm)**
   - Type of shoe (categorical)
   - Shoe color (categorical)
   - Shoe brand (categorical)
 ---
 ## Intended Use
-- Educational use only
-- Demonstration of **AutoML for regression** in CMU course 24-679
-Not suitable for real-world footwear sizing or medical/orthopedic applications.
 ---
 ## License
-- Dataset: MIT
-- Model: MIT
 ---
 ## Acknowledgments
 - Dataset by **Mary Zhang (CMU 24-679)**
-- AutoML training performed by **Yash Sakhale** using AutoGluon v1.4.0

 This model was trained using **AutoGluon Tabular (v1.4.0)** on the dataset [maryzhang/hw1-24679-tabular-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset).
 The task is **regression**, predicting the **actual measured shoe length (mm)** from shoe attributes.
+- **Best Model**: `CatBoost_r177_BAG_L1` (bagged ensemble of CatBoost models)
+- **Test R² Score**: **0.8904** (≈ 89% variance explained)
 - **Validation R² Score**: 0.8049
 - **Pearson correlation**: 0.9473
 - **RMSE**: 1.80 mm
 - **MAE**: 1.10 mm
 - **Median AE**: 0.68 mm
+On the test set, predictions are typically within about 1–2 mm of the measured shoe length (MAE 1.10 mm, RMSE 1.80 mm).
 ---
 ## Leaderboard (Top 5 Models)
   - Type of shoe (categorical)
   - Shoe color (categorical)
   - Shoe brand (categorical)
+- **Target**: *Actual measured shoe length (mm)*
+- **Splits**: 80% training, 20% testing (random_state=42)
+---
+## Preprocessing
+- Converted Hugging Face dataset to Pandas DataFrame
+- Random train/test split with a fixed random seed (random_state=42)
+- AutoGluon handled categorical encoding, normalization, and feature selection automatically
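
The seeded 80/20 split listed above can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the helper name `split_rows` and the integer stand-in rows are hypothetical, and the resulting partition will not match the one produced by whatever splitting utility the original notebook used, even with the same seed.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the split reproducible and
    # independent of any other use of the global random state.
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


# Stand-in for the 338 samples mentioned in the model card (assumption:
# real rows would be feature/target records, not integers).
rows = list(range(338))
train, test = split_rows(rows)
```

Because the seed is fixed, calling `split_rows(rows)` again returns the same partition, which is what makes the reported test metrics repeatable.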
+---
+## Training Setup
+- **Framework**: AutoGluon Tabular v1.4.0
+- **Search Strategy**: Bagged/stacked ensembles with model selection (`presets="best"`)
+- **Time Budget**: 1200 seconds (20 minutes)
+- **Evaluation Metric**: R²
+- **Hyperparameter Search**: Automated by AutoGluon (CatBoost, LightGBM, ensemble stacking)
+---
+## Metrics
+- **R²**: 0.8904 (test)
+- **RMSE**: 1.80 mm
+- **MAE**: 1.10 mm
+- **Median AE**: 0.68 mm
+- **Uncertainty**: Variability was assessed across the base models in the ensemble. Bagging reduces variance; most predictions are expected to fall within about ±2 mm of the measured length, consistent with the test RMSE of 1.80 mm.
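
For readers who want to see how the metrics listed above are defined, here is a dependency-free sketch. It is an illustration under stated assumptions: the function name `regression_metrics` and the four toy values are hypothetical and are not taken from the dataset or from the original evaluation code.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, median AE, and Pearson r for paired values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = statistics.fmean(y_true)
    mean_pred = statistics.fmean(y_pred)

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": statistics.fmean(abs_errors),
        "median_ae": statistics.median(abs_errors),
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Toy shoe lengths in mm (made-up values for illustration only).
y_true = [250.0, 260.0, 270.0, 280.0]
y_pred = [251.0, 259.0, 272.0, 280.0]
metrics = regression_metrics(y_true, y_pred)
```

Note that R² computed directly on predictions can never exceed the squared Pearson correlation on the same data, which is consistent with the reported pair of 0.8904 and 0.9473 (0.9473² ≈ 0.897).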
 ---
 ## Intended Use
+- **Educational**: Demonstrates AutoML regression in CMU course 24-679
+- **Limitations**:
+  - Small dataset size (338 samples) → not robust for production use
+  - Augmented data may not reflect real-world variability
+  - Not suitable for medical or industrial applications
+---
+## Ethical Considerations
+- Predictions should **not** be used to recommend or prescribe footwear sizes in clinical or consumer contexts.
+- Dataset augmentation could introduce biases not present in real measurements.
 ---
 ## License
+- **Dataset**: MIT License
+- **Model**: MIT License
+---
+## Hardware / Compute
+- **Training**: Google Colab (CPU runtime)
+- **Time**: ~20 minutes wall-clock time
+- **RAM**: <8 GB used
+---
+## AI Usage Disclosure
+- Model training and hyperparameter search used **AutoML (AutoGluon)**.
+- Model card text and documentation partially generated with **AI assistance (ChatGPT)**.
 ---
 ## Acknowledgments
 - Dataset by **Mary Zhang (CMU 24-679)**
+- Model training and documentation by **Yash Sakhale**