Model Card - AutoGluon ResNet18 (Aesthetic Classifier)
Overview
The AutoGluon ResNet18 model was trained on Closet Multimodal v1 to classify each outfit’s aesthetic (Street, Minimalist, Casual, Elegant …).
It serves as a baseline supervised benchmark against the zero-shot CLIP model.
Model Details
| Field | Description |
|---|---|
| Developed by | Bareethul Kader & Nada Khan |
| Framework | AutoGluon Multimodal |
| Repository | bareethul/outfit-vibe-autogluon |
| License | MIT |
Intended Use
Direct Use
- Educational AutoML demo on small image data.
- Benchmark vs. pre-trained vision–language models.
Out-of-Scope Use
- Production fashion recommendation or fit prediction.
Dataset
Source: bareethul/closet_multimodal_v1
Task: Multiclass aesthetic classification
Size: 500 images
Training Setup
- Framework: AutoGluon MultiModalPredictor
- Backbone: ResNet18
- Metric: Accuracy
- Split: 80/20 train/test
- Epochs: ≈ 50
- Hardware: Google Colab (T4 GPU)
- Preset: medium_quality
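The settings listed above could be wired together roughly as follows. This is a minimal sketch, not the authors' training script: the column names `image` and `label`, the helper names `split_rows` and `train`, the random seed, and the use of a plain shuffled split are all assumptions. The hyperparameter keys follow the AutoGluon Multimodal naming convention as commonly documented and may differ between AutoGluon versions.

```python
import random

# Values taken from the Training Setup list above.
PRESET = "medium_quality"
EVAL_METRIC = "accuracy"
TEST_FRACTION = 0.2

# Assumed names: the label column and hyperparameter keys are illustrative
# and are not confirmed by the repository.
LABEL_COLUMN = "label"
HYPERPARAMETERS = {
    "model.timm_image.checkpoint_name": "resnet18",  # ResNet18 backbone
    "optimization.max_epochs": 50,                   # about 50 epochs
}


def split_rows(rows, test_fraction=TEST_FRACTION, seed=0):
    """Shuffle rows with a fixed seed and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train(train_rows):
    """Fit a MultiModalPredictor on rows shaped like {"image": path, "label": name}."""
    # Imported lazily so split_rows can be used without AutoGluon installed.
    import pandas as pd
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(
        label=LABEL_COLUMN,
        problem_type="multiclass",
        eval_metric=EVAL_METRIC,
    )
    predictor.fit(
        pd.DataFrame(train_rows),
        presets=PRESET,
        hyperparameters=HYPERPARAMETERS,
    )
    return predictor
```

With 500 rows, `split_rows` yields 400 training rows and 100 test rows, which matches the 80/20 split stated above.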
Results
| Metric | Score |
|---|---|
| Test Accuracy | 0.48 |
| Weighted F1 | 0.47 |
Interpretation: With only 500 images, the model reaches 48% test accuracy, which illustrates the limits of small-data training and sets the supervised baseline for comparison with CLIP.
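For readers unfamiliar with the two metrics in the table, the sketch below shows how accuracy and support-weighted F1 are typically computed from true and predicted labels. It is an illustration in plain Python, not the evaluation code used for this model. The function names and the toy labels are hypothetical, and the standard-error helper assumes a test set of 100 images (20% of 500), which the card implies but does not state.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


def accuracy_std_error(acc, n_test):
    """Binomial standard error of an accuracy estimate from n_test samples."""
    return math.sqrt(acc * (1.0 - acc) / n_test)


# Toy labels for illustration only; these are not model outputs.
y_true = ["Street", "Street", "Casual", "Casual", "Elegant", "Minimalist"]
y_pred = ["Street", "Casual", "Casual", "Casual", "Elegant", "Street"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))

# Assumes 100 test images (20% of 500).
print(accuracy_std_error(0.48, 100))
```

Under the 100-image assumption, the standard error on the reported 0.48 accuracy is roughly 0.05, so small differences between this baseline and another model on the same test set should be read with caution.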
Limitations / Ethical Notes
- Prone to overfitting on tiny datasets.
- Subjective aesthetic labels.
- Educational use only.