Model Card - AutoGluon ResNet18 (Aesthetic Classifier)

Overview

The AutoGluon ResNet18 model was trained on Closet Multimodal v1 to classify each outfit’s aesthetic (Street, Minimalist, Casual, Elegant …).
It serves as a supervised baseline for comparison with the zero-shot CLIP model.

Model Details

Field	Description
Developed by	Bareethul Kader & Nada Khan
Framework	AutoGluon Multimodal
Repository	bareethul/outfit-vibe-autogluon
License	MIT

Intended Use

Direct Use

Educational AutoML demo on small image data.
Benchmark vs. pre-trained vision–language models.

Out-of-Scope Use

Production fashion recommendation or fit prediction.

Dataset

Source: bareethul/closet_multimodal_v1
Task: Multiclass aesthetic classification
Size: 500 images

Training Setup

Framework: AutoGluon MultiModalPredictor
Backbone: ResNet18
Metric: Accuracy
Split: 80/20 train/test
Epochs: ≈ 50
Hardware: Google Colab (T4 GPU)
Preset: medium_quality
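
The setup above could be reproduced roughly as follows. This is a minimal sketch, not the authors' actual training script: the helper names (`split_80_20`, `predictor_settings`, `train`), the `label` column name, and the random seed are all hypothetical, and the card does not say how the 80/20 split was drawn. The hyperparameter keys follow AutoGluon's dotted naming, but the epoch key differs between releases, so check it against the installed version.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle rows reproducibly and return (train, test) in an 80/20 ratio.

    Assumption: a plain random split; the card does not state the method.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]


def predictor_settings():
    """Collect the settings listed in the Training Setup section."""
    return {
        "init": {
            "label": "label",            # hypothetical label column name
            "problem_type": "multiclass",
            "eval_metric": "accuracy",
            "presets": "medium_quality",
        },
        "fit": {
            "hyperparameters": {
                # ResNet18 backbone loaded through timm
                "model.timm_image.checkpoint_name": "resnet18",
                # Assumption: newer AutoGluon releases use "optim.max_epochs";
                # older ones use "optimization.max_epochs".
                "optim.max_epochs": 50,
            },
        },
    }


def train(train_df, settings):
    """Fit a MultiModalPredictor on a DataFrame of image paths and labels."""
    # Imported lazily so the helpers above work without AutoGluon installed.
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(**settings["init"])
    predictor.fit(train_data=train_df, **settings["fit"])
    return predictor
```

With 500 images, `split_80_20` yields 400 training rows and 100 test rows. `train` expects a pandas DataFrame with one column of image paths plus the label column.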

Results

Metric	Score
Test Accuracy	0.48
Weighted F1	0.47

Interpretation: With only about 400 training images (80% of 500), the model reaches 0.48 test accuracy, which illustrates small-data limitations and sets the supervised baseline for comparison with CLIP.
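
For reference, the two reported metrics can be computed from predictions as in the sketch below. It is a dependency-free illustration of the standard definitions, not the evaluation code used for this card; the function names and the toy labels are made up for the example. Weighted F1 averages per-class F1 scores, weighting each class by its number of true samples.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy labels for illustration only (not taken from the dataset)
y_true = ["Street", "Street", "Casual", "Elegant"]
y_pred = ["Street", "Casual", "Casual", "Elegant"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, weighted F1 tends to track accuracy closely unless the model's errors are concentrated in a few classes, which is consistent with the near-identical 0.48 and 0.47 reported here.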

Limitations / Ethical Notes

Prone to overfitting on tiny datasets.
Subjective aesthetic labels.
Educational use only.
