---
language: sv
---

<br />
<p align="center">
  <h1 align="center">Swe-CLIP 500k</h1>

  <p align="center">
    <a href="https://github.com/FreddeFrallan/Multilingual-CLIP/tree/main/Model%20Cards/Swe-CLIP%20500k">Github Model Card</a>
  </p>
</p>

## Usage
To use this model along with the original CLIP vision encoder, you need to download the code and the additional linear weights from the [Multilingual-CLIP Github](https://github.com/FreddeFrallan/Multilingual-CLIP).
Once that is done, you can load and use the model with the following code:
```python
from src import multilingual_clip

model = multilingual_clip.load_model('Swe-CLIP-500k')

# 'The moose is the king of the forest!', 'All polar bears are left-handed'
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```

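Since these text embeddings share the space of the corresponding CLIP image embeddings, they can be scored against images directly. Below is a minimal sketch assuming OpenAI's `clip` package with the RN50x4 vision encoder, and assuming the model returns torch tensors as in the snippet above; the image path is hypothetical:

```python
import clip
import torch
from PIL import Image

from src import multilingual_clip

# Load the Swedish text encoder and the matching RN50x4 vision encoder (on CPU).
text_model = multilingual_clip.load_model('Swe-CLIP-500k')
clip_model, preprocess = clip.load('RN50x4', device='cpu')

# Embed one image (hypothetical path) and two Swedish captions:
# 'A moose in the forest', 'A polar bear on the ice'.
image = preprocess(Image.open('moose.jpg')).unsqueeze(0)
with torch.no_grad():
    image_emb = clip_model.encode_image(image).float()
    text_embs = text_model(['En älg i skogen', 'En isbjörn på isen'])

# Cosine similarities: a higher score means a better image-caption match.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
print(image_emb @ text_embs.T)
```
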
<!-- ABOUT THE PROJECT -->
## About
A [KB/Bert-Swedish-Cased](https://huggingface.co/KB/bert-base-swedish-cased) model tuned to match the embedding space of the CLIP text encoder that accompanies the RN50x4 vision encoder. <br>

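Concretely, this is a teacher-student setup: the frozen CLIP text encoder embeds the original English caption, and the Swedish encoder is trained to reproduce that embedding for the Swedish translation. A minimal sketch of one such training step, assuming precomputed teacher embeddings and an MSE objective (the names here are illustrative, not the repository's actual training code):

```python
import torch.nn as nn

def distillation_step(student, batch, optimizer):
    """One training step: push the Swedish embedding of a translated
    caption towards the frozen CLIP embedding of the English original."""
    swedish_captions, clip_targets = batch  # clip_targets: (B, 640) teacher embeddings
    optimizer.zero_grad()
    student_embs = student(swedish_captions)  # (B, 640) student embeddings
    loss = nn.functional.mse_loss(student_embs, clip_targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```
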
The training data pairs were generated by sampling 500k sentences from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/) and translating them into Swedish.
All translation was done using the [Huggingface Opus Model](https://huggingface.co/Helsinki-NLP/opus-mt-en-sv), which seemingly produces higher-quality translations than the [AWS Translate service](https://aws.amazon.com/translate/).
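
For reference, that Opus-MT model is available through `transformers`; a minimal sketch of how a caption can be translated this way (the example caption and output are illustrative):

```python
from transformers import MarianMTModel, MarianTokenizer

# English-to-Swedish Opus-MT model used to generate the Swedish captions.
name = 'Helsinki-NLP/opus-mt-en-sv'
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

captions = ['A moose standing in a sunlit forest.']
batch = tokenizer(captions, return_tensors='pt', padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
# e.g. ['En älg som står i en solbelyst skog.']
```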