|
|
---
license: apache-2.0
datasets:
- flwrlabs/pacs
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- PACS-DG
- Image-Classification
- domain generalization
- SigLIP2
---
|
|
|
|
|
 |
|
|
|
|
|
# **PACS-DG-SigLIP2** |
|
|
|
|
|
> **PACS-DG-SigLIP2** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-class domain classification** in a domain-generalization setting. It is trained to distinguish the four PACS visual domains, **art painting**, **cartoon**, **photo**, and **sketch**, using the **SiglipForImageClassification** architecture.
|
|
|
|
|
> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features:* https://arxiv.org/pdf/2502.14786
|
|
|
|
|
```py
Classification Report:
               precision    recall  f1-score   support

 art_painting     0.8538    0.9380    0.8939      2048
      cartoon     0.9891    0.9330    0.9603      2344
        photo     0.9029    0.8635    0.8828      1670
       sketch     0.9990    1.0000    0.9995      3929

     accuracy                         0.9488      9991
    macro avg     0.9362    0.9336    0.9341      9991
 weighted avg     0.9509    0.9488    0.9491      9991
```
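The macro and weighted averages in the report can be recomputed directly from the per-class scores and supports. A minimal sketch (pure Python, precision values copied from the report above):

```python
# Per-class precision and class supports, copied from the report above
precision = {"art_painting": 0.8538, "cartoon": 0.9891, "photo": 0.9029, "sketch": 0.9990}
support   = {"art_painting": 2048,   "cartoon": 2344,   "photo": 1670,   "sketch": 3929}

# Macro average: unweighted mean over the four classes
macro = sum(precision.values()) / len(precision)

# Weighted average: mean weighted by each class's support
total = sum(support.values())
weighted = sum(precision[c] * support[c] for c in precision) / total

print(round(macro, 4))     # 0.9362
print(round(weighted, 4))  # 0.9509
```

The gap between the two (0.9362 vs. 0.9509) reflects the class imbalance: `sketch`, the largest and best-scoring class, pulls the weighted average up.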
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
# **ID2Label Mapping** |
|
|
|
|
|
```py
from datasets import load_dataset

# Load the PACS dataset
dataset = load_dataset("flwrlabs/pacs")

# Extract the unique domain values (a string field)
labels = sorted(set(example["domain"] for example in dataset["train"]))

# Create the id2label mapping
id2label = {str(i): label for i, label in enumerate(labels)}

# Print the mapping
print(id2label)
```
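Because the mapping is built from `sorted(set(...))`, label ids follow alphabetical order. The same construction can be checked without downloading the dataset; a sketch using a toy stand-in for the `domain` column:

```python
# Toy stand-in for the dataset's "domain" column (hypothetical values)
domains = ["photo", "sketch", "cartoon", "art_painting", "photo", "cartoon"]

# Same construction as above: sorted unique values, enumerated
labels = sorted(set(domains))
id2label = {str(i): label for i, label in enumerate(labels)}

print(id2label)
# {'0': 'art_painting', '1': 'cartoon', '2': 'photo', '3': 'sketch'}
```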
|
|
|
|
|
--- |
|
|
|
|
|
## **Label Space: 4 Domain Categories** |
|
|
|
|
|
The model predicts the most probable visual domain from the following: |
|
|
|
|
|
```
Class 0: "art_painting"
Class 1: "cartoon"
Class 2: "photo"
Class 3: "sketch"
```
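Evaluation or fine-tuning scripts often need the inverse mapping as well; it can be derived directly from the table above:

```python
# id2label as defined by the label space above
id2label = {"0": "art_painting", "1": "cartoon", "2": "photo", "3": "sketch"}

# Inverse mapping, e.g. for encoding ground-truth labels during evaluation
label2id = {label: int(i) for i, label in id2label.items()}

print(label2id)
# {'art_painting': 0, 'cartoon': 1, 'photo': 2, 'sketch': 3}
```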
|
|
|
|
|
--- |
|
|
|
|
|
## **Install dependencies** |
|
|
|
|
|
```bash
pip install -q transformers torch pillow gradio
```
|
|
|
|
|
--- |
|
|
|
|
|
## **Inference Code** |
|
|
|
|
|
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/PACS-DG-SigLIP2"  # Update to your actual model path on Hugging Face
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map
id2label = {
    "0": "art_painting",
    "1": "cartoon",
    "2": "photo",
    "3": "sketch"
}

def classify_pacs_image(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_pacs_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=4, label="Predicted Domain Probabilities"),
    title="PACS-DG-SigLIP2",
    description="Upload an image to classify its visual domain: Art Painting, Cartoon, Photo, or Sketch."
)

if __name__ == "__main__":
    iface.launch()
```
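The function above returns a label-to-probability dict; outside a Gradio UI, reducing it to a single predicted domain is just a max over the entries. A sketch using a hypothetical output dict:

```python
# Hypothetical output of classify_pacs_image for one image
prediction = {"art_painting": 0.071, "cartoon": 0.012, "photo": 0.903, "sketch": 0.014}

# Top-1 domain: the label with the highest probability
top_label, top_prob = max(prediction.items(), key=lambda kv: kv[1])

print(top_label, top_prob)  # photo 0.903
```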
|
|
|
|
|
--- |
|
|
|
|
|
## **Intended Use** |
|
|
|
|
|
The **PACS-DG-SigLIP2** model is designed to support tasks in **domain generalization**, particularly: |
|
|
|
|
|
- **Cross-domain Visual Recognition** – Identify the domain style of an image. |
|
|
- **Robust Representation Learning** – Aid in training or evaluating models on domain-shifted inputs. |
|
|
- **Dataset Characterization** – Use as a tool to explore domain imbalance or drift. |
|
|
- **Educational Tools** – Help understand how models distinguish between stylistic image variations. |
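For the dataset-characterization use case, top-1 predictions over a collection can be tallied to surface domain imbalance. A minimal sketch over hypothetical predictions:

```python
from collections import Counter

# Hypothetical top-1 predictions over a small image collection
predicted_domains = ["photo", "photo", "sketch", "cartoon", "photo", "sketch"]

# Tally predictions per domain to inspect the collection's domain balance
counts = Counter(predicted_domains)

print(counts.most_common())
# [('photo', 3), ('sketch', 2), ('cartoon', 1)]
```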