Model Card - AutoGluon ResNet18 (Aesthetic Classifier)

Overview

The AutoGluon ResNet18 model was trained on Closet Multimodal v1 to classify each outfit’s aesthetic (Street, Minimalist, Casual, Elegant …).
It serves as a supervised baseline for comparison with the zero-shot CLIP model.


Model Details

Developed by: Bareethul Kader & Nada Khan
Framework: AutoGluon Multimodal
Repository: bareethul/outfit-vibe-autogluon
License: MIT

Intended Use

Direct Use

  • Educational AutoML demo on small image data.
  • Benchmark vs. pre-trained vision–language models.

Out-of-Scope Use

  • Production fashion recommendation or fit prediction.

Dataset

Source: bareethul/closet_multimodal_v1
Task: Multiclass aesthetic classification
Size: 500 images


Training Setup

  • Framework: AutoGluon MultiModalPredictor
  • Backbone: ResNet18
  • Metric: Accuracy
  • Split: 80/20 train/test
  • Epochs: ≈ 50
  • Hardware: Google Colab (T4 GPU)
  • Preset: medium_quality
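
The setup above can be sketched roughly as follows. This is a minimal illustration, not the authors' training script: the helper names, the label column name "aesthetic", the random seed, and the exact hyperparameter keys are assumptions. In particular, the epoch key has been spelled `optim.max_epochs` in recent AutoGluon releases and `optimization.max_epochs` in older ones, so check the installed version. The split helper uses only the standard library; AutoGluon is imported only when training is actually requested.

```python
import random


def split_80_20(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed: an assumption, for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_baseline(train_df, label_column="aesthetic"):
    """Fit a ResNet18 image classifier (needs `pip install autogluon.multimodal`).

    `label_column` is a hypothetical name; use the dataset's actual label column.
    """
    from autogluon.multimodal import MultiModalPredictor  # lazy import

    predictor = MultiModalPredictor(
        label=label_column,
        problem_type="multiclass",
        eval_metric="accuracy",
    )
    predictor.fit(
        train_data=train_df,
        presets="medium_quality",
        hyperparameters={
            # Assumed keys; they may differ between AutoGluon versions.
            "model.timm_image.checkpoint_name": "resnet18",
            "optim.max_epochs": 50,
        },
    )
    return predictor


# With 500 images, an 80/20 split gives 400 training and 100 test items.
train_ids, test_ids = split_80_20(range(500))
print(len(train_ids), len(test_ids))
```

Here `train_df` would be a pandas DataFrame with one column of image paths and one label column. The split is a plain random shuffle; whether the original split was stratified by class is not stated.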

Results

Test Accuracy: 0.48
Weighted F1: 0.47

Interpretation: The model classifies fewer than half of the test images correctly, which illustrates the limits of training on 500 images and sets the baseline for comparison with CLIP.
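
For readers unfamiliar with the two metrics, the sketch below shows how accuracy and support-weighted F1 are computed. It is a plain-Python illustration on made-up toy labels, not the evaluation code or the data behind the scores above. Weighted F1 averages the per-class F1 scores, weighting each class by its number of true examples.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy labels, invented purely for illustration.
y_true = ["Street", "Street", "Minimalist", "Casual", "Elegant", "Casual"]
y_pred = ["Street", "Casual", "Minimalist", "Casual", "Casual", "Elegant"]

print(accuracy(y_true, y_pred))               # 0.5
print(round(weighted_f1(y_true, y_pred), 3))  # 0.522
```

If the 80/20 split of 500 images is exact, the test set has 100 images, so a test accuracy of 0.48 corresponds to 48 correct predictions.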


Limitations / Ethical Notes

  • Prone to overfitting, since the dataset is small (500 images).
  • Aesthetic labels are subjective.
  • Intended for educational use only.
