---
tags:
- transformers
- xlm-roberta
- eva02
- clip
library_name: transformers
license: cc-by-nc-4.0
---
# Jina CLIP

Core implementation of Jina CLIP. The model uses:
* the [EVA 02](https://github.com/baaivision/EVA/tree/master/EVA-CLIP/rei/eva_clip) architecture for the vision tower
* the [Jina XLM RoBERTa with Flash Attention](https://huggingface.co/jinaai/xlm-roberta-flash-implementation) model as the text tower

## Models that use this implementation

- [jinaai/jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2)
- [jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1)
## Requirements

To use the Jina CLIP source code, the following packages are required:
* `torch`
* `timm`
* `transformers`
* `einops`
* `xformers` to use xFormers memory-efficient attention
* `flash-attn` to use Flash Attention
* `apex` to use fused layer normalization
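In a CLIP-style model like this one, the vision and text towers produce embeddings that are compared by cosine similarity. The sketch below shows that scoring step in plain `torch`, using random stand-in tensors in place of the actual tower outputs (in a real run these would come from a Jina CLIP checkpoint loaded with `trust_remote_code=True`; the embedding width 768 here is chosen for illustration, not taken from the model config):

```python
import torch
import torch.nn.functional as F

# Stand-in tower outputs; a real run would obtain these from a
# Jina CLIP checkpoint's text and vision towers instead.
torch.manual_seed(0)
text_embeddings = torch.randn(4, 768)   # 4 captions
image_embeddings = torch.randn(2, 768)  # 2 images

# L2-normalize so the dot product equals cosine similarity
text_embeddings = F.normalize(text_embeddings, dim=-1)
image_embeddings = F.normalize(image_embeddings, dim=-1)

# Similarity matrix: one row per caption, one column per image
similarity = text_embeddings @ image_embeddings.T  # shape (4, 2)

# Best-matching image index for each caption
best_image = similarity.argmax(dim=-1)
print(similarity.shape, best_image.tolist())
```

Because both sets of embeddings are unit-normalized, every entry of `similarity` lies in [-1, 1], and ranking images for a caption reduces to a matrix product followed by `argmax`.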