---
license: apache-2.0
datasets:
- flwrlabs/pacs
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- PACS-DG
- Image-Classification
- domain generalization
- SigLIP2
---

# **PACS-DG-SigLIP2**
> **PACS-DG-SigLIP2** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-class domain classification** in a domain-generalization setting. It is trained to distinguish the four PACS visual domains, **art paintings**, **cartoons**, **photos**, and **sketches**, using the **SiglipForImageClassification** architecture.
> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features*: https://arxiv.org/pdf/2502.14786
```
Classification Report:
               precision    recall  f1-score   support

 art_painting     0.8538    0.9380    0.8939      2048
      cartoon     0.9891    0.9330    0.9603      2344
        photo     0.9029    0.8635    0.8828      1670
       sketch     0.9990    1.0000    0.9995      3929

     accuracy                         0.9488      9991
    macro avg     0.9362    0.9336    0.9341      9991
 weighted avg     0.9509    0.9488    0.9491      9991
```
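The report above follows scikit-learn's `classification_report` format. A minimal, hypothetical sketch of how such a report is generated from ground-truth and predicted labels (the tiny `y_true`/`y_pred` lists are placeholders, not the actual PACS evaluation):

```python
# Hypothetical sketch: producing a per-class report with scikit-learn.
# y_true / y_pred are toy placeholders; in practice they come from running
# the fine-tuned model over the PACS test split.
from sklearn.metrics import classification_report

labels = ["art_painting", "cartoon", "photo", "sketch"]
y_true = ["art_painting", "cartoon", "photo", "sketch", "photo"]
y_pred = ["art_painting", "cartoon", "photo", "sketch", "cartoon"]

print(classification_report(y_true, y_pred, labels=labels, digits=4))
```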

---
# **ID2Label Mapping**
```py
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("flwrlabs/pacs")
# Extract the unique domain values (a string field)
labels = sorted(set(example["domain"] for example in dataset["train"]))
# Create id2label mapping
id2label = {str(i): label for i, label in enumerate(labels)}
# Print the mapping
print(id2label)
```
---
## **Label Space: 4 Domain Categories**
The model predicts the most probable visual domain from the following:
```
Class 0: "art_painting"
Class 1: "cartoon"
Class 2: "photo"
Class 3: "sketch"
```
---
## **Install dependencies**
```bash
pip install -q transformers torch pillow gradio
```
---
## **Inference Code**
```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/PACS-DG-SigLIP2"  # Update to your actual model path on Hugging Face
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map
id2label = {
    "0": "art_painting",
    "1": "cartoon",
    "2": "photo",
    "3": "sketch"
}

def classify_pacs_image(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_pacs_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=4, label="Predicted Domain Probabilities"),
    title="PACS-DG-SigLIP2",
    description="Upload an image to classify its visual domain: Art Painting, Cartoon, Photo, or Sketch."
)

if __name__ == "__main__":
    iface.launch()
```
---
## **Intended Use**
The **PACS-DG-SigLIP2** model is designed to support tasks in **domain generalization**, particularly:
- **Cross-domain Visual Recognition** – Identify the domain style of an image.
- **Robust Representation Learning** – Aid in training or evaluating models on domain-shifted inputs.
- **Dataset Characterization** – Use as a tool to explore domain imbalance or drift.
- **Educational Tools** – Help understand how models distinguish between stylistic image variations. |