ysakhale committed · Commit 2854858 · verified · 1 Parent(s): abe3196

Update README.md

Files changed (1)
  1. README.md +57 -9
README.md CHANGED
@@ -4,14 +4,16 @@
4
  This model was trained using **AutoGluon Tabular (v1.4.0)** on the dataset [maryzhang/hw1-24679-tabular-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset).
5
  The task is **regression**, predicting the **actual measured shoe length (mm)** from shoe attributes.
6
 
7
- - **Best Model**: `CatBoost_r177_BAG_L1` (stacked ensemble of CatBoost models)
8
- - **Test R² Score**: **0.8904**
9
  - **Validation R² Score**: 0.8049
10
  - **Pearson correlation**: 0.9473
11
  - **RMSE**: 1.80 mm
12
  - **MAE**: 1.10 mm
13
  - **Median AE**: 0.68 mm
14
 
15
  ---
16
 
17
  ## Leaderboard (Top 5 Models)
@@ -34,24 +36,70 @@ The task is **regression**, predicting the **actual measured shoe length (mm)**
34
  - Type of shoe (categorical)
35
  - Shoe color (categorical)
36
  - Shoe brand (categorical)
37
 
38
  ---
39
 
40
  ## Intended Use
41
- - Educational use only
42
- - Demonstration of **AutoML for regression** in CMU course 24-679
43
 
44
- Not suitable for real-world footwear sizing or medical/orthopedic applications.
45
 
46
  ---
47
 
48
  ## License
49
- - Dataset: MIT
50
- - Model: MIT
51
 
52
  ---
53
 
54
  ## Acknowledgments
55
  - Dataset by **Mary Zhang (CMU 24-679)**
56
- - AutoML training performed by **Yash Sakhale** using AutoGluon v1.4.0
57
-
 
4
  This model was trained using **AutoGluon Tabular (v1.4.0)** on the dataset [maryzhang/hw1-24679-tabular-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset).
5
  The task is **regression**, predicting the **actual measured shoe length (mm)** from shoe attributes.
6
 
7
+ - **Best Model**: `CatBoost_r177_BAG_L1` (bagged ensemble of CatBoost models)
8
+ - **Test R² Score**: **0.8904** (≈ 89% variance explained)
9
  - **Validation R² Score**: 0.8049
10
  - **Pearson correlation**: 0.9473
11
  - **RMSE**: 1.80 mm
12
  - **MAE**: 1.10 mm
13
  - **Median AE**: 0.68 mm
14
 
15
+ With an MAE of 1.10 mm and an RMSE of 1.80 mm, the model's typical prediction error is about 1–2 mm.
16
+
17
  ---
18
 
19
  ## Leaderboard (Top 5 Models)
 
36
  - Type of shoe (categorical)
37
  - Shoe color (categorical)
38
  - Shoe brand (categorical)
39
+ - **Target**: *Actual measured shoe length (mm)*
40
+ - **Splits**: 80% training, 20% testing (random_state=42)
41
+
42
+ ---
43
+
44
+ ## Preprocessing
45
+ - Converted Hugging Face dataset to Pandas DataFrame
46
+ - Random 80/20 train/test split with a fixed seed (`random_state=42`)
47
+ - AutoGluon handled categorical encoding, normalization, and feature selection automatically
48
+
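The 80/20 split with a fixed seed can be sketched as below. This is a minimal, standard-library-only illustration and not the author's code: the helper name `split_indices` is hypothetical, rounding the test size up is a choice made here, and a seeded shuffle from `random` will not reproduce the exact partition another library would produce with `random_state=42`. The row count of 338 is the dataset size reported under Limitations.

```python
import math
import random


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) for a seeded random split."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_fraction)  # round the test size up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(338)
print(len(train_idx), len(test_idx))  # 270 68
```

The returned index lists can be used to select rows from a DataFrame, for example with `df.iloc[train_idx]`.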
49
+ ---
50
+
51
+ ## Training Setup
52
+ - **Framework**: AutoGluon Tabular v1.4.0
53
+ - **Search Strategy**: Bagged/stacked ensembles with model selection (`presets="best"`)
54
+ - **Time Budget**: 1200 seconds (20 minutes)
55
+ - **Evaluation Metric**: R²
56
+ - **Hyperparameter Search**: Automated by AutoGluon (CatBoost, LightGBM, ensemble stacking)
57
+
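The settings listed above map onto AutoGluon's `TabularPredictor` roughly as follows. This is a sketch under stated assumptions rather than the author's training script: the function name `train_shoe_length_model` and its arguments are hypothetical, the README's `presets="best"` is taken to mean the `best_quality` preset, and the real target column name must be read from the dataset. The AutoGluon import sits inside the function so the settings can be inspected without the library installed.

```python
# Settings taken from the Training Setup list.
PREDICTOR_KWARGS = {
    "problem_type": "regression",
    "eval_metric": "r2",  # Evaluation Metric: R²
}
FIT_KWARGS = {
    "presets": "best_quality",  # assumption: what the README calls "best"
    "time_limit": 1200,  # Time Budget: 1200 seconds (20 minutes)
}


def train_shoe_length_model(train_df, test_df, label):
    """Fit a TabularPredictor and return it with its test leaderboard.

    `label` is the name of the target column (hypothetical argument; look up
    the actual column name in the dataset before calling this).
    """
    from autogluon.tabular import TabularPredictor  # requires autogluon

    predictor = TabularPredictor(label=label, **PREDICTOR_KWARGS)
    predictor.fit(train_data=train_df, **FIT_KWARGS)
    # leaderboard() scores every trained model on the held-out data.
    leaderboard = predictor.leaderboard(test_df)
    return predictor, leaderboard
```

Keeping the settings in plain dictionaries makes it easy to log them alongside the model card so the reported time budget and metric stay in sync with the run.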
58
+ ---
59
+
60
+ ## Metrics
61
+ - **R²**: 0.8904 (test)
62
+ - **RMSE**: 1.80 mm
63
+ - **MAE**: 1.10 mm
64
+ - **Median AE**: 0.68 mm
65
+ - **Uncertainty**: Variability was assessed across the base models in the bagged ensemble. Bagging reduces variance; most predictions are expected to fall within about ±2 mm of the measured length.
66
 
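For readers who want to see how the reported quantities are defined, the sketch below computes R², Pearson correlation, RMSE, MAE, and median absolute error from paired true and predicted values. It uses only the standard library; the function name `regression_metrics` is hypothetical, and the four sample lengths are made-up toy values, not rows from the dataset or outputs of the model.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return R², Pearson r, RMSE, MAE and median absolute error."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally sized sequences of length >= 2")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true, mean_pred = mean(y_true), mean(y_pred)
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("R² and Pearson r are undefined for constant inputs")
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": mean(abs_errors),
        "median_ae": median(abs_errors),
    }


# Toy shoe lengths in mm (illustrative only).
toy_true = [250.0, 260.0, 270.0, 280.0]
toy_pred = [251.0, 259.0, 272.0, 280.0]
toy_metrics = regression_metrics(toy_true, toy_pred)
print({name: round(value, 4) for name, value in toy_metrics.items()})
```

RMSE weights large errors more heavily than MAE, while the median absolute error ignores outliers. This is consistent with the reported ordering of 1.80 mm, 1.10 mm, and 0.68 mm, which suggests a few predictions with comparatively large errors.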
67
  ---
68
 
69
  ## Intended Use
70
+ - **Educational**: Demonstrates AutoML regression in CMU course 24-679
71
+ - **Limitations**:
72
+ - Small dataset size (338 samples) → not robust for production use
73
+ - Augmented data may not reflect real-world variability
74
+ - Not suitable for medical or industrial applications
75
+
76
+ ---
77
 
78
+ ## Ethical Considerations
79
+ - Predictions should **not** be used to recommend or prescribe footwear sizes in clinical or consumer contexts.
80
+ - Dataset augmentation could introduce biases not present in real measurements.
81
 
82
  ---
83
 
84
  ## License
85
+ - **Dataset**: MIT License
86
+ - **Model**: MIT License
87
+
88
+ ---
89
+
90
+ ## Hardware / Compute
91
+ - **Training**: Google Colab (CPU runtime)
92
+ - **Time**: ~20 minutes wall-clock time
93
+ - **RAM**: <8 GB used
94
+
95
+ ---
96
+
97
+ ## AI Usage Disclosure
98
+ - Model training and hyperparameter search used **AutoML (AutoGluon)**.
99
+ - Model card text and documentation partially generated with **AI assistance (ChatGPT)**.
100
 
101
  ---
102
 
103
  ## Acknowledgments
104
  - Dataset by **Mary Zhang (CMU 24-679)**
105
+ - Model training and documentation by **Yash Sakhale**