Face-Confidence-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to distinguish between images of confident faces and unconfident faces using the SiglipForImageClassification architecture.
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786
Classification report:
              precision    recall  f1-score   support

   confident     0.8468    0.8179    0.8321      4872
 unconfident     0.8691    0.8909    0.8799      6611

    accuracy                         0.8600     11483
   macro avg     0.8580    0.8544    0.8560     11483
weighted avg     0.8596    0.8600    0.8596     11483
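As a quick sanity check, the summary rows of the report can be re-derived from the per-class rows. The sketch below assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes, weighted average as the support-weighted mean). Because it starts from the rounded figures above, the results can differ from the report in the last digit.

```python
# Per-class precision, recall and support, copied from the report above.
report = {
    "confident":   {"precision": 0.8468, "recall": 0.8179, "support": 4872},
    "unconfident": {"precision": 0.8691, "recall": 0.8909, "support": 6611},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for row in report.values():
    row["f1"] = f1_score(row["precision"], row["recall"])

total_support = sum(row["support"] for row in report.values())
metrics = ("precision", "recall", "f1")

# Macro average: every class counts equally.
macro = {m: sum(row[m] for row in report.values()) / len(report) for m in metrics}

# Weighted average: every class counts in proportion to its support.
weighted = {
    m: sum(row[m] * row["support"] for row in report.values()) / total_support
    for m in metrics
}

for name, row in report.items():
    print(f"{name:>12}  f1={row['f1']:.4f}")
print(f"{'macro avg':>12}  " + "  ".join(f"{m}={macro[m]:.4f}" for m in metrics))
print(f"{'weighted avg':>12}  " + "  ".join(f"{m}={weighted[m]:.4f}" for m in metrics))
```

The support-weighted recall equals the overall accuracy by definition, which is why both appear as 0.8600 in the report.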
The model classifies each image into one of the following categories:
Class 0: "confident"
Class 1: "unconfident"
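A minimal sketch of how two raw output scores (logits) become one of these labels, assuming a softmax followed by picking the most probable class. The helper name `logits_to_prediction` and the example logits are illustrative only and are not part of the model's API.

```python
import math

# Index-to-label mapping from the list above.
ID2LABEL = {0: "confident", 1: "unconfident"}


def logits_to_prediction(logits):
    """Return (label, probabilities) for a list of per-class logits."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs


label, probs = logits_to_prediction([2.0, 0.0])  # made-up logits
print(label, [round(p, 4) for p in probs])
# prints: confident [0.8808, 0.1192]
```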
pip install -q transformers torch pillow gradio
Image Scale (Optimal): 256 × 256
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Face-Confidence-SigLIP2"  # Replace with your model path if different
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "confident",
    "1": "unconfident"
}

def classify_face_confidence(image):
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its label and rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_face_confidence,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=2, label="Face Confidence Classification"),
    title="Face-Confidence-SigLIP2",
    description="Upload an image to detect if a face looks confident or unconfident."
)

if __name__ == "__main__":
    iface.launch()
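For scripted use without the Gradio UI, the same steps can be applied to a folder of images. This is a sketch under assumptions: the helper names `list_images` and `classify_folder` and the folder layout are hypothetical, and the label mapping is copied from this card rather than read from the model config. The heavy libraries are imported inside `classify_folder` so that the file-listing helper can be used on its own.

```python
from pathlib import Path

MODEL_NAME = "prithivMLmods/Face-Confidence-SigLIP2"
ID2LABEL = {0: "confident", 1: "unconfident"}  # mapping as listed in this card


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in extensions
    )


def classify_folder(folder, model_name=MODEL_NAME):
    """Return {file name: (label, probability)} for every image in `folder`."""
    # Imported here so that list_images() works without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SiglipForImageClassification

    model = SiglipForImageClassification.from_pretrained(model_name)
    processor = AutoImageProcessor.from_pretrained(model_name)
    model.eval()

    results = {}
    for path in list_images(folder):
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = torch.softmax(logits, dim=1).squeeze(0)
        best = int(probs.argmax())
        results[path.name] = (ID2LABEL[best], round(float(probs[best]), 3))
    return results
```

Calling `classify_folder("faces")` on a hypothetical `faces` directory downloads the model on first use and returns one `(label, probability)` pair per image file.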
Face-Confidence-SigLIP2 can be used for:
Base model: google/siglip2-base-patch16-224