---
license: apache-2.0
pipeline_tag: zero-shot-image-classification
tags:
- datology
- clip
- vision
- OpenCLIP
- datacomp
- zero-shot-classification
---

# DatologyAI CLIP Classification Optimized ViT-B/32

**DatologyAI CLIP** is a state-of-the-art contrastive vision-language model that achieves superior performance through advanced data curation alone, without any architectural or training modifications. This classification-optimized ViT-B/32 model outperforms SigLIP2, MetaCLIP, and DFN on zero-shot classification benchmarks.

## Model Description

DatologyAI's CLIP model demonstrates that careful data curation can drive state-of-the-art performance without modifications to model architecture or training paradigms. Key achievements include:

- **76.91% ImageNet1k accuracy** (vs 74.0% for SigLIP2)
- **8x training efficiency** compared to standard approaches
- Trained on 13B curated image-text pairs from DataComp
- Standard CLIP architecture and training procedure

## Intended Uses

You can use this model for zero-shot image classification or as a vision encoder for VLMs and other vision tasks.

### Zero-shot Image Classification

```python
import torch
from PIL import Image
import open_clip

# Load model and preprocessing
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:DatologyAI/cls-opt-vit-b-32')
tokenizer = open_clip.get_tokenizer('hf-hub:DatologyAI/cls-opt-vit-b-32')
model.eval()

# Load image
image = preprocess(Image.open("path/to/image.jpg")).unsqueeze(0)

# Define candidate labels
labels = ["a dog", "a cat", "a bird"]
text = tokenizer(labels)

# Run inference
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Normalize features
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)

# Calculate similarity
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Get predictions
values, indices = similarity[0].topk(3)
for value, index in zip(values, indices):
    print(f"{labels[index]}: {value.item():.2%}")
```

### Image Encoding

```python
import torch
from PIL import Image
import open_clip

# Load model
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:DatologyAI/cls-opt-vit-b-32')
model.eval()

# Process image
image = preprocess(Image.open("path/to/image.jpg")).unsqueeze(0)

# Extract features
with torch.no_grad():
    image_features = model.encode_image(image)

print(f"Feature shape: {image_features.shape}")  # [1, 512]
```
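Beyond raw feature extraction, the frozen image tower can back a lightweight downstream classifier. The snippet below is a minimal linear-probe sketch; the number of classes, learning rate, and training step are illustrative placeholders rather than anything specified by this model card.

```python
import torch
import open_clip

# Load the model and freeze it; only a small linear head is trained
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:DatologyAI/cls-opt-vit-b-32')
model.eval()
for p in model.parameters():
    p.requires_grad = False

num_classes = 10  # placeholder: depends on your downstream task
probe = torch.nn.Linear(512, num_classes)  # 512 = CLIP embedding dimension
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)

def training_step(images, targets):
    """One linear-probe step on a batch of images already preprocessed with `preprocess`."""
    with torch.no_grad():
        features = model.encode_image(images)  # [B, 512]
        features = features / features.norm(dim=-1, keepdim=True)
    logits = probe(features)
    loss = torch.nn.functional.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```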
## Training Procedure

DatologyAI's training pipeline focuses on sophisticated data curation techniques, including:

1. **Improved target distribution matching** - Task-specific alignment of image features for classification
2. **Enhanced synthetic data generation** - Optimized caption generation for classification tasks
3. **Predictive metrics for curation quality** - Rapid iteration without full model training

The model uses standard CLIP training objectives with no architectural modifications.

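
For reference, the standard CLIP objective is the symmetric contrastive (InfoNCE) loss. The sketch below is an illustrative PyTorch rendition of that objective, not DatologyAI's training code; `logit_scale` denotes the learned temperature already exponentiated, as in common CLIP implementations.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, logit_scale):
    """Symmetric InfoNCE loss used in standard CLIP training (illustrative sketch)."""
    # Normalize embeddings to unit length
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Cosine-similarity logits, scaled by the (exponentiated) learned temperature
    logits_per_image = logit_scale * image_features @ text_features.T
    logits_per_text = logits_per_image.T

    # Matching image-text pairs lie on the diagonal of the similarity matrix
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_text, labels)
    return (loss_i + loss_t) / 2
```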
## Training Data

The model was trained on 13B image-text pairs (multi-epoch) curated from the **DataComp-XL** dataset using DatologyAI's proprietary curation pipeline. The curation process selected high-quality, classification-relevant subsets from the 10B available pairs in DataComp-XL.

## Evaluation Results

### Zero-shot Classification Performance

| Benchmark | DatologyAI | SigLIP2 | MetaCLIP |
|-----------|------------|---------|----------|
| ImageNet1k | **76.91%** | 74.0% | 67.7% |
| ImageNetv2 | **70.2%** | 67.1% | 60.4% |

### Training Efficiency

- Matches SigLIP2 performance with only **5B samples** (87.5% compute reduction)
- Matches MetaCLIP performance with only **1B samples** (92% compute reduction)

For full details, see the [blog post](https://datologyai.com/blog/clip-data-upgrade).

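
For context, zero-shot classification numbers like those above are typically computed by building a text classifier from one prompt per class and comparing image embeddings against it. The sketch below illustrates that protocol with placeholder class names and a single prompt template; the exact class list and any prompt ensembling behind the reported results are not specified in this card.

```python
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:DatologyAI/cls-opt-vit-b-32')
tokenizer = open_clip.get_tokenizer('hf-hub:DatologyAI/cls-opt-vit-b-32')
model.eval()

# Placeholder class names; ImageNet evaluation would use all 1,000 classes
class_names = ["golden retriever", "tabby cat", "goldfish"]
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    # Build the zero-shot classifier: one normalized text embedding per class
    text_features = model.encode_text(tokenizer(prompts))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

def classify(images):
    """Return predicted class indices for a batch of preprocessed images."""
    with torch.no_grad():
        image_features = model.encode_image(images)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        logits = image_features @ text_features.T
    return logits.argmax(dim=-1)
```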
## Model Details

- **Developed by:** DatologyAI
- **Model type:** CLIP (Contrastive Language-Image Pre-training)
- **Architecture:** Vision Transformer B/32
- **License:** Apache 2.0
- **Training framework:** OpenCLIP 2.24.0

## Technical Specifications

### Model Architecture

- **Vision Encoder:** ViT-B/32 (86M parameters)
  - Patch size: 32×32
  - Image size: 224×224
  - Embedding dimension: 512
- **Text Encoder:** 12-layer Transformer
  - Context length: 77 tokens
  - Vocabulary size: 49,408 (BPE tokenizer)

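
As a quick sanity check of these figures, you can load the checkpoint with OpenCLIP and count parameters per tower. This assumes the standard OpenCLIP `CLIP` module layout, where the image tower is exposed as `model.visual`:

```python
import open_clip

# Weights are downloaded from the Hugging Face Hub on first use
model, _, _ = open_clip.create_model_and_transforms('hf-hub:DatologyAI/cls-opt-vit-b-32')

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# model.visual is the ViT-B/32 image tower; the remaining parameters belong to
# the text transformer, token/positional embeddings, and projection layers
print(f"Vision tower parameters: {count_params(model.visual):,}")
print(f"Total parameters:        {count_params(model):,}")
```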
### Training Configuration

- **Optimizer:** AdamW (β1=0.9, β2=0.98, ε=1e-6)
- **Learning rate:** 5.0e-04 with cosine schedule
- **Weight decay:** 0.1
- **Batch size:** 32,768
- **Training samples:** 13B image-text pairs
- **Hardware:** Distributed training on H100 GPUs

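
As a rough translation of the hyperparameters above into PyTorch (a sketch only; any warmup behaviour and the exact number of optimizer steps are not stated in this card, so `total_steps` below is simply the listed 13B samples divided by the 32,768 batch size):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hypothetical stand-in module; in practice this would be the CLIP model
model = torch.nn.Linear(512, 512)

# Optimizer settings as listed above
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5.0e-4,
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=0.1,
)

# Cosine learning-rate schedule over the approximate number of optimizer steps
# (13B samples / 32,768 batch size ≈ 400k steps)
total_steps = 13_000_000_000 // 32_768
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
```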
## Citation

If you use this model, please cite:

```bibtex
@article{datologyai2025clip,
  title={CLIP Gets a Data Upgrade: Outperforming SoTA with Improved Data Curation Only},
  author={DatologyAI Team},
  journal={DatologyAI Blog},
  year={2025},
  url={https://datologyai.com/blog/clip-data-upgrade}
}
```

## Additional Information

For more details on our data curation methodology and comprehensive benchmark results, please visit our [blog post](https://datologyai.com/blog/clip-data-upgrade).

**Contact:** [team@datologyai.com](mailto:team@datologyai.com)

## Model Card Contact

DatologyAI Team - [team@datologyai.com](mailto:team@datologyai.com)