---
license: apache-2.0
language:
- en
base_model:
- google/siglip2-so400m-patch14-384
pipeline_tag: image-classification
library_name: transformers
tags:
- fashion
- product
- usage
- Casual
- Ethnic
- Formal
---

# **Fashion-Product-Usage**

> **Fashion-Product-Usage** is an image classification model fine-tuned from the vision-language encoder **google/siglip2-base-patch16-224** using the **SiglipForImageClassification** architecture. It assigns each fashion product image to one of eight intended-usage categories.

```py
Classification Report:
              precision    recall  f1-score   support

      Casual     0.8529    0.9716    0.9084     34392
      Ethnic     0.8365    0.7528    0.7925      3208
      Formal     0.7246    0.3006    0.4250      2345
        Home     0.0000    0.0000    0.0000         1
       Party     0.0000    0.0000    0.0000        29
Smart Casual     0.0000    0.0000    0.0000        67
      Sports     0.7157    0.1848    0.2938      4004
      Travel     0.0000    0.0000    0.0000        26

    accuracy                         0.8458     44072
   macro avg     0.3912    0.2762    0.3024     44072
weighted avg     0.8300    0.8458    0.8159     44072
```
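
The gap between the macro and weighted averages in the report follows from the class imbalance: four classes score zero but together hold only 123 of the 44072 samples. The sketch below recomputes both averages from the per-class rows. The helper names `macro_avg` and `weighted_avg` are illustrative and not part of any library, and because the inputs are the rounded values printed in the report, the results should match it only to about three decimal places.

```python
# Per-class rows copied from the classification report:
# label -> (precision, recall, f1, support)
report = {
    "Casual":       (0.8529, 0.9716, 0.9084, 34392),
    "Ethnic":       (0.8365, 0.7528, 0.7925, 3208),
    "Formal":       (0.7246, 0.3006, 0.4250, 2345),
    "Home":         (0.0000, 0.0000, 0.0000, 1),
    "Party":        (0.0000, 0.0000, 0.0000, 29),
    "Smart Casual": (0.0000, 0.0000, 0.0000, 67),
    "Sports":       (0.7157, 0.1848, 0.2938, 4004),
    "Travel":       (0.0000, 0.0000, 0.0000, 26),
}

SUPPORT = 3  # index of the support column in each row


def macro_avg(rows, col):
    """Unweighted mean of one metric column: every class counts equally."""
    return sum(r[col] for r in rows.values()) / len(rows)


def weighted_avg(rows, col):
    """Support-weighted mean of one metric column: large classes dominate."""
    total = sum(r[SUPPORT] for r in rows.values())
    return sum(r[col] * r[SUPPORT] for r in rows.values()) / total


macro_f1 = macro_avg(report, 2)
weighted_f1 = weighted_avg(report, 2)
weighted_recall = weighted_avg(report, 1)  # equals overall accuracy

print(f"macro F1        = {macro_f1:.3f}")
print(f"weighted F1     = {weighted_f1:.3f}")
print(f"weighted recall = {weighted_recall:.3f}")
```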

The model predicts one of the following usage categories:

- **0:** Casual
- **1:** Ethnic
- **2:** Formal
- **3:** Home
- **4:** Party
- **5:** Smart Casual
- **6:** Sports
- **7:** Travel

---

# **Run with Transformers 🤗**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Fashion-Product-Usage"  # Replace with your actual model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    0: "Casual",
    1: "Ethnic",
    2: "Formal",
    3: "Home",
    4: "Party",
    5: "Smart Casual",
    6: "Sports",
    7: "Travel"
}

def classify_usage(image):
    """Predicts the usage type of a fashion product."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    predictions = {id2label[i]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Gradio interface
iface = gr.Interface(
    fn=classify_usage,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Usage Prediction Scores"),
    title="Fashion-Product-Usage",
    description="Upload a fashion product image to predict its intended usage (Casual, Formal, Party, etc.)."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
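
The post-processing inside `classify_usage` (logits, then softmax, then a label-to-score dictionary) can be tried without downloading the model. The following is a minimal pure-Python sketch of that step, not the library's implementation: `softmax` and `scores_from_logits` are hypothetical helper names, and the example logits are invented for illustration rather than taken from real model output.

```python
import math

# Same index-to-label mapping as in the model card.
id2label = {
    0: "Casual", 1: "Ethnic", 2: "Formal", 3: "Home",
    4: "Party", 5: "Smart Casual", 6: "Sports", 7: "Travel",
}


def softmax(logits):
    """Softmax over a flat list; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores_from_logits(logits):
    """Map each class index to its label and rounded probability."""
    probs = softmax(logits)
    return {id2label[i]: round(p, 3) for i, p in enumerate(probs)}


# Invented logits for illustration only; not real model output.
example_logits = [4.0, 1.0, 0.5, -2.0, -1.0, -1.5, 0.0, -2.5]

scores = scores_from_logits(example_logits)
best = max(scores, key=scores.get)  # label with the highest probability
print(best, scores[best])
```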

---

# **Intended Use**

This model can be used for:

- **Product tagging in e-commerce catalogs**
- **Context-aware product recommendations**
- **Fashion search optimization**
- **Data annotation for training recommendation engines**
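
For catalog tagging, one possible approach is to accept a prediction only when its score clears a confidence threshold and to route everything else to manual review. This may matter here because the classification report shows low recall for several classes. The sketch below assumes score dictionaries shaped like the output of `classify_usage`. The function name `tag_product`, the `needs_review` marker, the `0.6` threshold, and the SKU data are all hypothetical choices for illustration.

```python
def tag_product(scores, min_confidence=0.6):
    """Return (label, confidence), or ("needs_review", confidence)
    when the top score falls below min_confidence.

    `scores` is a dict of label -> probability. The 0.6 default is an
    arbitrary illustrative value, not a tuned threshold.
    """
    label = max(scores, key=scores.get)
    confidence = scores[label]
    if confidence < min_confidence:
        return "needs_review", confidence
    return label, confidence


# Hypothetical per-product scores; not real model output.
catalog_scores = {
    "sku-001": {"Casual": 0.91, "Formal": 0.05, "Sports": 0.04},
    "sku-002": {"Casual": 0.48, "Formal": 0.41, "Sports": 0.11},
}

tags = {sku: tag_product(s) for sku, s in catalog_scores.items()}
for sku, (label, confidence) in tags.items():
    print(sku, label, confidence)
```

A threshold like this would normally be chosen per class on a held-out set, since a single global value tends to favor the dominant Casual class.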