metadata
license: cc-by-nc-4.0
datasets:
- uoft-cs/cifar10
language:
- en
base_model:
- facebook/metaclip-2-worldwide-s16
pipeline_tag: image-classification
library_name: transformers
tags:
- text-generation-inference
- cifar10
MetaCLIP-2-Cifar10
MetaCLIP-2-Cifar10 is an image classification model fine-tuned from the vision–language encoder facebook/metaclip-2-worldwide-s16 for single-label classification. It assigns each input image to one of the ten CIFAR-10 object classes using the MetaClip2ForImageClassification architecture.
Paper: MetaCLIP 2: A Worldwide Scaling Recipe, https://huggingface.co/papers/2507.22062
Classification report:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| airplane | 0.9813 | 0.9685 | 0.9748 | 2000 |
| automobile | 0.9777 | 0.9850 | 0.9813 | 2000 |
| bird | 0.9560 | 0.9560 | 0.9560 | 2000 |
| cat | 0.9104 | 0.9395 | 0.9247 | 2000 |
| deer | 0.9566 | 0.9580 | 0.9573 | 2000 |
| dog | 0.9476 | 0.9215 | 0.9343 | 2000 |
| frog | 0.9774 | 0.9735 | 0.9755 | 2000 |
| horse | 0.9704 | 0.9670 | 0.9687 | 2000 |
| ship | 0.9782 | 0.9890 | 0.9836 | 2000 |
| truck | 0.9774 | 0.9735 | 0.9755 | 2000 |
| accuracy | | | 0.9631 | 20000 |
| macro avg | 0.9633 | 0.9632 | 0.9632 | 20000 |
| weighted avg | 0.9633 | 0.9631 | 0.9632 | 20000 |
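As a sanity check on the report, the macro averages can be recomputed from the per-class rows. The sketch below copies the per-class values from the report; because those inputs are already rounded to four decimals, the results should agree with the reported averages only up to rounding. The `report` dictionary and the `macro_average` helper are illustrative names, not part of the model's code.

```python
# Per-class (precision, recall, f1) values copied from the classification report.
report = {
    "airplane":   (0.9813, 0.9685, 0.9748),
    "automobile": (0.9777, 0.9850, 0.9813),
    "bird":       (0.9560, 0.9560, 0.9560),
    "cat":        (0.9104, 0.9395, 0.9247),
    "deer":       (0.9566, 0.9580, 0.9573),
    "dog":        (0.9476, 0.9215, 0.9343),
    "frog":       (0.9774, 0.9735, 0.9755),
    "horse":      (0.9704, 0.9670, 0.9687),
    "ship":       (0.9782, 0.9890, 0.9836),
    "truck":      (0.9774, 0.9735, 0.9755),
}


def macro_average(rows, column):
    """Unweighted mean of one metric column across all classes."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)


macro_precision = macro_average(report, 0)
macro_recall = macro_average(report, 1)
macro_f1 = macro_average(report, 2)

print(f"macro precision: {macro_precision:.4f}")
print(f"macro recall:    {macro_recall:.4f}")
print(f"macro f1:        {macro_f1:.4f}")
```

Because every class has the same support (2000 images), the weighted averages coincide with the macro averages up to rounding, and macro recall coincides with overall accuracy.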
The model classifies images into the following categories:
- Class 0: airplane
- Class 1: automobile
- Class 2: bird
- Class 3: cat
- Class 4: deer
- Class 5: dog
- Class 6: frog
- Class 7: horse
- Class 8: ship
- Class 9: truck
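The class indices above correspond to positions in the model's output vector. As a minimal sketch in plain Python (no model download), the following shows how a vector of ten logits maps to a class name via softmax and argmax. The logit values and the `predict_label` helper are made up for illustration.

```python
import math

# Class names in index order, as listed in this model card.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CIFAR10_CLASSES[best], probs[best]


# Hypothetical logits in which index 3 ("cat") is the largest.
example_logits = [0.1, -1.2, 0.4, 3.5, 0.0, 2.1, -0.3, 0.2, -2.0, -0.8]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```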
Run with Transformers
!pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor
from transformers import AutoModelForImageClassification
from transformers.image_utils import load_image
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
def cifar10_classification(image):
    """Predict CIFAR-10 class probabilities for an image."""
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Softmax over the class dimension, flattened to a list of 10 floats
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map class indices to CIFAR-10 class names
    labels = {
        "0": "airplane",
        "1": "automobile",
        "2": "bird",
        "3": "cat",
        "4": "deer",
        "5": "dog",
        "6": "frog",
        "7": "horse",
        "8": "ship",
        "9": "truck"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=cifar10_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="CIFAR-10 Classification",
    description="Upload an image to classify it into one of the CIFAR-10 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
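For script or batch use without a web UI, the same model calls can be wrapped in a plain function. This is a sketch under assumptions: `classify_image`, `load_model_and_processor`, and `top_k_predictions` are hypothetical helper names, the label order is taken from the class list in this card, and the heavy imports are placed inside the functions so that the ranking helper can be used without downloading weights.

```python
from functools import lru_cache

MODEL_NAME = "prithivMLmods/MetaCLIP-2-Cifar10"

# Class names in index order, as listed in this model card.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_k_predictions(probs, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(zip(CIFAR10_CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


@lru_cache(maxsize=1)
def load_model_and_processor():
    """Load the model and processor once and reuse them across calls."""
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    model = AutoModelForImageClassification.from_pretrained(MODEL_NAME)
    processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
    model.eval()
    return model, processor


def classify_image(source, k=3):
    """Classify an image given as a file path, URL, or PIL image."""
    import torch
    from transformers.image_utils import load_image

    model, processor = load_model_and_processor()
    image = load_image(source)  # returns an RGB PIL image
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0].tolist()
    return top_k_predictions(probs, k)


# Example call (downloads the model on first use; the path is a placeholder):
# for label, prob in classify_image("path/to/image.png"):
#     print(f"{label}: {prob:.3f}")
```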
Intended Use:
The MetaCLIP-2-Cifar10 model is designed for object classification across the ten CIFAR-10 categories. Potential use cases include:
- Educational & Research Applications: Benchmarking experiments, model comparison, and deep learning studies.
- Lightweight Vision Systems: Simple object recognition where the ten CIFAR-10 classes cover the objects of interest.
- Dataset Exploration: Assisting in data inspection, annotation, and visualization.
- Prototype Systems: Rapid prototyping of image classification pipelines.
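For the benchmarking use case, overall accuracy and per-class recall can be computed from paired lists of true and predicted class indices. The sketch below uses only the standard library; the `evaluate` helper and the toy label lists are illustrative and are not results from this model.

```python
from collections import Counter

# Class names in index order, as listed in this model card.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def evaluate(y_true, y_pred):
    """Return (accuracy, per-class recall dict) for integer class indices."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true examples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    accuracy = sum(correct.values()) / len(y_true)
    recall = {
        CIFAR10_CLASSES[cls]: correct[cls] / count
        for cls, count in sorted(support.items())
    }
    return accuracy, recall


# Toy data: six images, with one cat (3) misclassified as dog (5).
y_true = [0, 3, 3, 5, 8, 9]
y_pred = [0, 3, 5, 5, 8, 9]
accuracy, recall = evaluate(y_true, y_pred)

print(f"accuracy: {accuracy:.3f}")
for name, value in recall.items():
    print(f"{name}: {value:.2f}")
```

Only classes that appear in `y_true` are reported, which avoids dividing by zero for classes with no examples in a small evaluation subset.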








