# ruclip-vit-large-patch14-224

RuCLIP (Russian Contrastive Language–Image Pretraining) is a multimodal model for computing similarity between images and texts and for ranking captions and images. RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing, and multimodal learning.

The model was trained by the Sber AI and SberDevices teams.
- Task: text ranking; image ranking; zero-shot image classification
- Type: encoder
- Num Parameters: 430M
- Training Data Volume: 240 million text-image pairs
- Language: Russian
- Context Length: 77
- Transformer Layers: 12
- Transformer Width: 768
- Transformer Heads: 12
- Image Size: 224
- Vision Layers: 24
- Vision Width: 1024
- Vision Patch Size: 14
## Usage

Install the library (see the Github repository for details):

```shell
pip install ruclip
```

Then load the model and its processor:

```python
import ruclip

clip, processor = ruclip.load("ruclip-vit-large-patch14-224", device="cuda")
```
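The loaded model embeds images and texts into a shared space, so zero-shot classification reduces to ranking candidate captions by cosine similarity against the image embedding. A minimal stdlib-only sketch of that ranking step (the toy 3-d vectors and labels are illustrative stand-ins for real 768-d RuCLIP embeddings, not outputs of the model):

```python
import math


def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_labels(image_emb, text_embs, labels):
    # Score each caption embedding against the image embedding and
    # return the labels sorted best-first, mirroring how CLIP-style
    # models perform zero-shot classification.
    scored = [(cosine(image_emb, emb), lab) for emb, lab in zip(text_embs, labels)]
    return [lab for _, lab in sorted(scored, reverse=True)]


# Toy example: the image embedding points mostly along the first axis,
# so the first caption should rank highest.
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0],   # "кошка"
             [0.0, 1.0, 0.0],   # "собака"
             [0.0, 0.0, 1.0]]   # "самолёт"
labels = ["кошка", "собака", "самолёт"]
print(rank_labels(image_emb, text_embs, labels))  # → ['кошка', 'собака', 'самолёт']
```

In practice the embeddings come from the model's image and text encoders applied to PIL images and tokenized Russian captions; only the ranking logic is shown here.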
## Performance
We have evaluated the performance on the following datasets:
| Dataset | Metric Name | Metric Result |
|---|---|---|
| Food101 | acc | 0.597 |
| CIFAR10 | acc | 0.878 |
| CIFAR100 | acc | 0.511 |
| Birdsnap | acc | 0.172 |
| SUN397 | acc | 0.484 |
| Stanford Cars | acc | 0.559 |
| DTD | acc | 0.370 |
| MNIST | acc | 0.337 |
| STL10 | acc | 0.934 |
| PCam | acc | 0.520 |
| CLEVR | acc | 0.152 |
| Rendered SST2 | acc | 0.529 |
| ImageNet | acc | 0.426 |
| FGVC Aircraft | mean-per-class | 0.046 |
| Oxford Pets | mean-per-class | 0.604 |
| Caltech101 | mean-per-class | 0.777 |
| Flowers102 | mean-per-class | 0.455 |
| HatefulMemes | roc-auc | 0.530 |
## Authors