	# AutoML Regression Model for Shoe Dataset

	## Model Summary
	This model was trained using AutoGluon Tabular (v1.4.0) on the dataset [maryzhang/hw1-24679-tabular-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset).
	The task is regression, predicting the actual measured shoe length (mm) from shoe attributes.

	- Best Model: `CatBoost_r177_BAG_L1` (bagged ensemble of CatBoost models)
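	- Test R² Score: 0.8904 (≈ 89% of variance explained); the leaderboard lists this value for `WeightedEnsemble_L3` and 0.8994 for `CatBoost_r177_BAG_L1`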
	- Validation R² Score: 0.8049
	- Pearson correlation: 0.9473
	- RMSE: 1.80 mm
	- MAE: 1.10 mm
	- Median AE: 0.68 mm

	On the test set, predictions deviate from the measured shoe length by 1.10 mm on average (MAE), with an RMSE of 1.80 mm; half of the predictions are within 0.68 mm (median AE).

	---

	## Leaderboard (Top 5 Models)
	| Rank | Model                | Test R² | Val R² | Pred Time (s) | Fit Time (s) |
	|------|----------------------|---------|--------|---------------|--------------|
	| 1    | CatBoost_r177_BAG_L1 | 0.8994  | 0.8049 | 0.0293        | 27.14        |
	| 2    | LightGBMLarge_BAG_L2 | 0.8971  | 0.7995 | 0.7011        | 238.93       |
	| 3    | CatBoost_BAG_L2      | 0.8939  | 0.8405 | 0.6155        | 276.40       |
	| 4    | CatBoost_r9_BAG_L1   | 0.8917  | 0.7889 | 0.0606        | 53.87        |
	| 5    | WeightedEnsemble_L3  | 0.8904  | 0.8500 | 0.9871        | 333.68       |
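
The table is ordered by test R², but the validation column orders the same five models differently. The short sketch below, using only the rows copied from the table, shows which model comes out on top under each criterion. The helper name `best_by` is made up for this illustration.

```python
# Rows copied from the leaderboard table: (model, test R², validation R²).
LEADERBOARD = [
    ("CatBoost_r177_BAG_L1", 0.8994, 0.8049),
    ("LightGBMLarge_BAG_L2", 0.8971, 0.7995),
    ("CatBoost_BAG_L2", 0.8939, 0.8405),
    ("CatBoost_r9_BAG_L1", 0.8917, 0.7889),
    ("WeightedEnsemble_L3", 0.8904, 0.8500),
]


def best_by(rows, column):
    """Return the name of the model with the highest score in `column`.

    `column` is a tuple index: 1 for test R², 2 for validation R².
    """
    return max(rows, key=lambda row: row[column])[0]


best_on_test = best_by(LEADERBOARD, 1)
best_on_validation = best_by(LEADERBOARD, 2)

print("Highest test R²:      ", best_on_test)
print("Highest validation R²:", best_on_validation)
```

Among these five rows, the top model by test score and the top model by validation score are not the same, which is worth keeping in mind when reading the "Best Model" line in the summary.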

	---

	## Dataset
	- Source: [maryzhang/hw1-24679-tabular-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset)
	- Size: 338 samples (30 original, 308 augmented)
	- Features:
	  - US size (numeric)
	  - Shoe size (mm) (numeric)
	  - Type of shoe (categorical)
	  - Shoe color (categorical)
	  - Shoe brand (categorical)
	- Target: Actual measured shoe length (mm)
	- Splits: 80% training, 20% testing (random_state=42)
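
As a rough illustration of the 80/20 split with a fixed seed, the sketch below shuffles row indices reproducibly and cuts off the test portion. It is not the code used for this model card: the function name `train_test_split_indices` and the choice to round the test portion up are assumptions, and the shuffle order will not match what a library helper would produce for the same seed.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split off the test portion.

    Returns (train_indices, test_indices). The test size is rounded up,
    which is an assumption made for this sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same order every run
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 338 is the dataset size stated above.
train_idx, test_idx = train_test_split_indices(338)
print(len(train_idx), "training rows,", len(test_idx), "test rows")
```

With only 30 original samples behind the 308 augmented ones, a purely random split like this can place augmented copies of the same original shoe on both sides of the split, which is one reason to read the test scores with some caution.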

	---

	## Preprocessing
	- Converted Hugging Face dataset to Pandas DataFrame
	- Random 80/20 train/test split with a fixed seed (`random_state=42`)
	- AutoGluon handled feature preprocessing automatically, including categorical encoding and any model-specific normalization or feature pruning

	---

	## Training Setup
	- Framework: AutoGluon Tabular v1.4.0
	- Search Strategy: Bagged/stacked ensembles with model selection (`presets="best"`)
	- Time Budget: 1200 seconds (20 minutes)
	- Evaluation Metric: R²
	- Hyperparameter Search: Automated by AutoGluon (CatBoost, LightGBM, ensemble stacking)
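
The settings listed above map onto AutoGluon's `TabularPredictor` roughly as follows. This is a minimal sketch rather than the original training script: the wrapper function `train`, the two settings dictionaries, and the `label` argument are illustrative, and the exact label column name in the dataset is not confirmed here, so it is left as a parameter.

```python
# Settings taken from the list above; the dictionary names are illustrative.
PREDICTOR_SETTINGS = {"problem_type": "regression", "eval_metric": "r2"}
FIT_SETTINGS = {"presets": "best", "time_limit": 20 * 60}  # 1200 seconds


def train(train_df, label):
    """Fit an AutoGluon TabularPredictor on a pandas DataFrame.

    `label` is the name of the target column (assumed to hold the
    measured shoe length in mm).
    """
    # Imported lazily so the settings can be inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label, **PREDICTOR_SETTINGS)
    return predictor.fit(train_df, **FIT_SETTINGS)
```

After fitting, `predictor.leaderboard(test_df)` would produce a table of per-model scores similar to the leaderboard shown earlier.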

	---

	## Metrics
	- R²: 0.8904 (test)
	- RMSE: 1.80 mm
	- MAE: 1.10 mm
	- Median AE: 0.68 mm
	- Uncertainty: Variability was assessed across the base models in the ensemble. Bagging reduces variance; given the RMSE of 1.80 mm, most predictions are expected to fall within about ±2 mm of the measured length.
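
For reference, the metrics reported here can be computed from paired true and predicted lengths as sketched below. This uses only the standard library and is not the evaluation code behind the numbers above. The function name `regression_metrics` and the toy values in the usage lines are made up for illustration.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, median absolute error and Pearson correlation."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = statistics.fmean(y_true)
    mean_pred = statistics.fmean(y_pred)

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": statistics.fmean(abs_errors),
        "median_ae": statistics.median(abs_errors),
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Toy values in mm, invented purely to exercise the function.
toy_true = [250.0, 260.0, 270.0, 280.0]
toy_pred = [250.0, 260.0, 270.0, 290.0]
toy_metrics = regression_metrics(toy_true, toy_pred)
print(toy_metrics)
```

RMSE weights large errors more heavily than MAE, so a gap between the two (1.80 mm versus 1.10 mm here) suggests a minority of predictions with noticeably larger errors, which is consistent with the lower median AE of 0.68 mm.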

	---

	## Intended Use
	- Educational: Demonstrates AutoML regression in CMU course 24-679
	- Limitations:
	  - Small dataset size (338 samples) → not robust for production use
	  - Augmented data may not reflect real-world variability
	  - Not suitable for medical or industrial applications

	---

	## Ethical Considerations
	- Predictions should not be used to recommend or prescribe footwear sizes in clinical or consumer contexts.
	- Dataset augmentation could introduce biases not present in real measurements.

	---

	## License
	- Dataset: MIT License
	- Model: MIT License

	---

	## Hardware / Compute
	- Training: Google Colab (CPU runtime)
	- Time: ~20 minutes wall-clock
	- RAM: <8 GB used

	---

	## AI Usage Disclosure
	- Model training and hyperparameter search used AutoML (AutoGluon).
	- Model card text and documentation partially generated with AI assistance (ChatGPT).

	---

	## Acknowledgments
	- Dataset by Mary Zhang (CMU 24-679)
	- Model training and documentation by Yash Sakhale
