|
|
---
license: apache-2.0
datasets:
- flwrlabs/pacs
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- PACS-DG
- Image-Classification
- domain generalization
- SigLIP2
---
|
|
|
|
|
 |
|
|
|
|
|
# **PACS-DG-SigLIP2** |
|
|
|
|
|
> **PACS-DG-SigLIP2** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-class domain classification** in a domain-generalization setting. It is trained to distinguish the four PACS visual domains, **art painting**, **cartoon**, **photo**, and **sketch**, using the **SiglipForImageClassification** architecture.
|
|
|
|
|
> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features:* https://arxiv.org/pdf/2502.14786
|
|
|
|
|
```py
Classification Report:
               precision    recall  f1-score   support

 art_painting     0.8538    0.9380    0.8939      2048
      cartoon     0.9891    0.9330    0.9603      2344
        photo     0.9029    0.8635    0.8828      1670
       sketch     0.9990    1.0000    0.9995      3929

     accuracy                         0.9488      9991
    macro avg     0.9362    0.9336    0.9341      9991
 weighted avg     0.9509    0.9488    0.9491      9991
```
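The macro and weighted averages in the report can be recomputed directly from the per-class scores and supports. A minimal sketch (pure Python, precision values copied from the report above):

```python
# Per-class precision and class supports, copied from the report above
precision = {"art_painting": 0.8538, "cartoon": 0.9891, "photo": 0.9029, "sketch": 0.9990}
support   = {"art_painting": 2048,   "cartoon": 2344,   "photo": 1670,   "sketch": 3929}

# Macro average: unweighted mean over the four classes
macro = sum(precision.values()) / len(precision)

# Weighted average: mean weighted by each class's support
total = sum(support.values())
weighted = sum(precision[c] * support[c] for c in precision) / total

print(round(macro, 4))     # 0.9362
print(round(weighted, 4))  # 0.9509
```

The gap between the two (0.9362 vs. 0.9509) reflects the class imbalance: `sketch`, the largest and best-scoring class, pulls the weighted average up.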
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
# **ID2Label Mapping** |
|
|
|
|
|
```py
from datasets import load_dataset

# Load the PACS dataset
dataset = load_dataset("flwrlabs/pacs")

# Extract the unique domain values (a string field)
labels = sorted(set(example["domain"] for example in dataset["train"]))

# Create the id2label mapping
id2label = {str(i): label for i, label in enumerate(labels)}

# Print the mapping
print(id2label)
```
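Because the mapping is built from `sorted(set(...))`, label ids follow alphabetical order. The same construction can be checked without downloading the dataset; a sketch using a toy stand-in for the `domain` column:

```python
# Toy stand-in for the dataset's "domain" column (hypothetical values)
domains = ["photo", "sketch", "cartoon", "art_painting", "photo", "cartoon"]

# Same construction as above: sorted unique values, enumerated
labels = sorted(set(domains))
id2label = {str(i): label for i, label in enumerate(labels)}

print(id2label)
# {'0': 'art_painting', '1': 'cartoon', '2': 'photo', '3': 'sketch'}
```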
|
|
|
|
|
--- |
|
|
|
|
|
## **Label Space: 4 Domain Categories** |
|
|
|
|
|
The model predicts the most probable visual domain from the following: |
|
|
|
|
|
```
Class 0: "art_painting"
Class 1: "cartoon"
Class 2: "photo"
Class 3: "sketch"
```
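Evaluation or fine-tuning scripts often need the inverse mapping as well; it can be derived directly from the table above:

```python
# id2label as defined by the label space above
id2label = {"0": "art_painting", "1": "cartoon", "2": "photo", "3": "sketch"}

# Inverse mapping, e.g. for encoding ground-truth labels during evaluation
label2id = {label: int(i) for i, label in id2label.items()}

print(label2id)
# {'art_painting': 0, 'cartoon': 1, 'photo': 2, 'sketch': 3}
```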
|
|
|
|
|
--- |
|
|
|
|
|
## **Install dependencies** |
|
|
|
|
|
```bash
pip install -q transformers torch pillow gradio
```
|
|
|
|
|
--- |
|
|
|
|
|
## **Inference Code** |
|
|
|
|
|
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/PACS-DG-SigLIP2"  # Update to your actual model path on Hugging Face
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map
id2label = {
    "0": "art_painting",
    "1": "cartoon",
    "2": "photo",
    "3": "sketch"
}

def classify_pacs_image(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_pacs_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=4, label="Predicted Domain Probabilities"),
    title="PACS-DG-SigLIP2",
    description="Upload an image to classify its visual domain: Art Painting, Cartoon, Photo, or Sketch."
)

if __name__ == "__main__":
    iface.launch()
```
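The function above returns a label-to-probability dict; outside a Gradio UI, reducing it to a single predicted domain is just a max over the entries. A sketch using a hypothetical output dict:

```python
# Hypothetical output of classify_pacs_image for one image
prediction = {"art_painting": 0.071, "cartoon": 0.012, "photo": 0.903, "sketch": 0.014}

# Top-1 domain: the label with the highest probability
top_label, top_prob = max(prediction.items(), key=lambda kv: kv[1])

print(top_label, top_prob)  # photo 0.903
```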
|
|
|
|
|
--- |
|
|
|
|
|
## **Intended Use** |
|
|
|
|
|
The **PACS-DG-SigLIP2** model is designed to support tasks in **domain generalization**, particularly: |
|
|
|
|
|
- **Cross-domain Visual Recognition** – Identify the domain style of an image. |
|
|
- **Robust Representation Learning** – Aid in training or evaluating models on domain-shifted inputs. |
|
|
- **Dataset Characterization** – Use as a tool to explore domain imbalance or drift. |
|
|
- **Educational Tools** – Help understand how models distinguish between stylistic image variations. |
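For the dataset-characterization use case, top-1 predictions over a collection can be tallied to surface domain imbalance. A minimal sketch over hypothetical predictions:

```python
from collections import Counter

# Hypothetical top-1 predictions over a small image collection
predicted_domains = ["photo", "photo", "sketch", "cartoon", "photo", "sketch"]

# Tally predictions per domain to inspect the collection's domain balance
counts = Counter(predicted_domains)

print(counts.most_common())
# [('photo', 3), ('sketch', 2), ('cartoon', 1)]
```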