---
tags:
- mrl
- multilingual
---

# vectorizer.banana

This model is a vectorizer developed by Sinequa. It produces an embedding vector for a given passage or query. Passage vectors are stored in our vector index, and the query vector is used at query time to look up relevant passages in the index.

Model name: `vectorizer.banana`

## Supported Languages

Since this model is a distilled version of the [BGE-M3](https://huggingface.co/BAAI/bge-m3) model, it can theoretically handle 100+ languages.

## Scores

We computed the difference in performance with respect to the original [BGE-M3](https://huggingface.co/BAAI/bge-m3) on MS MARCO EN. Scores on well-known benchmarks (BEIR, MIRACL, MTEB, etc.) can be found directly in the BGE-M3 model card, in the row labeled "Dense". We expect performance on other datasets to drop at a scale similar to that observed on MS MARCO EN.

| Model                                    | Performance Relative to BGE-M3 |
|:-----------------------------------------|:------------------------------:|
| vectorizer.banana (1024 dimensions)      | 99.3%                          |
| vectorizer.banana (768 dimensions)       | 98.8%                          |
| vectorizer.banana (512 dimensions)       | 98.0%                          |
| **vectorizer.banana (256 dimensions\*)** | 95.7%                          |

\* *The default dimension within Sinequa*

## Inference Times

| GPU        | Quantization type | Batch size 1 | Batch size 32 |
|:-----------|:------------------|-------------:|--------------:|
| NVIDIA A10 | FP16              |       4.5 ms |         43 ms |
| NVIDIA T4  | FP16              |       2.5 ms |         35 ms |

## GPU Memory Usage

| Quantization type | Memory   |
|:------------------|---------:|
| FP16              | 1450 MiB |

Note that GPU memory usage only covers how much memory the model itself consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
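The passage/query lookup described in the introduction boils down to a nearest-neighbor search over embedding vectors. The sketch below is a minimal, hypothetical illustration (the toy 2-d vectors and the brute-force dot-product scoring are assumptions for demonstration; Sinequa's actual vector index performs this lookup internally):

```python
import numpy as np

# Toy index: 3 passage vectors, as if produced by the vectorizer,
# L2-normalized so cosine similarity reduces to a dot product.
passage_vecs = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
    [0.7, 0.7],
], dtype=np.float32)
passage_vecs /= np.linalg.norm(passage_vecs, axis=1, keepdims=True)

# Query vector produced at query time, normalized the same way.
query_vec = np.array([1.0, 0.0], dtype=np.float32)
query_vec /= np.linalg.norm(query_vec)

# Score every passage against the query and rank by similarity.
scores = passage_vecs @ query_vec
ranking = np.argsort(-scores)  # best-matching passage first
```

In production the brute-force scan would be replaced by an approximate nearest-neighbor index, but the scoring principle is the same.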
## Requirements

- Minimal Sinequa version: 11.11.0.2306
- [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)

## Model Details

### Configuration

Note that this model is packaged with a default MRL cutoff of 256 dimensions. To use the full 1024 dimensions, or any other value, the `mrl-cutoff` parameter needs to be set accordingly.

### Training

This model uses [BGE-M3](https://huggingface.co/BAAI/bge-m3), a strong and compact multilingual embedding model, as the backbone for distillation. The original model has 24 layers, which were reduced to 5. To obtain a low-dimensional output space (256 compared to the original 1024), [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) was used at training time.
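With Matryoshka Representation Learning, a cutoff such as the 256-dimension default simply keeps the first coordinates of the full embedding and re-normalizes. The helper below is a hypothetical sketch of that operation (`mrl_truncate` is not part of any Sinequa API; the actual `mrl-cutoff` handling happens inside the engine):

```python
import numpy as np

def mrl_truncate(embedding, dim):
    """Keep the first `dim` coordinates of a Matryoshka embedding
    and re-normalize, so cosine similarity remains well-defined."""
    v = np.asarray(embedding, dtype=np.float32)[..., :dim]
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

full = np.ones(1024, dtype=np.float32)  # stand-in for a 1024-d model output
small = mrl_truncate(full, 256)         # 256-d vector with unit L2 norm
```

Because MRL trains the leading dimensions to carry most of the signal, this truncation trades a small amount of accuracy (see the Scores table) for a 4x smaller index footprint at the 256-dimension default.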