---
tags:
- mrl
- multilingual
---
# vectorizer.banana
This model is a vectorizer developed by Sinequa.
It produces an embedding vector given a passage or a query.
The passage vectors are stored in our vector index and the query vector is used at query time to look up relevant passages in the index.
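The retrieval flow above can be sketched as follows. This is a minimal illustration of the index/query mechanics only: the `embed` function below is a hypothetical stand-in that returns random unit vectors, not the actual vectorizer, so only the shapes and the dot-product lookup are meaningful here.

```python
import numpy as np

def embed(texts, dim=256):
    """Hypothetical stand-in for the vectorizer: returns one
    L2-normalized vector per text. These are random vectors for
    illustration; the real model produces semantic embeddings."""
    vectors = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.standard_normal(dim)
        vectors.append(v / np.linalg.norm(v))
    return np.stack(vectors)

# Passages are embedded once and stored in the vector index...
passages = [
    "Paris is the capital of France.",
    "The Nile flows through Egypt.",
    "Mount Everest is the highest mountain.",
]
index = embed(passages)                      # shape (3, 256)

# ...and at query time the query vector is matched against them.
query_vec = embed(["capital of France"])[0]  # shape (256,)
scores = index @ query_vec                   # cosine similarity (unit vectors)
best = passages[int(np.argmax(scores))]
```

With real embeddings the highest-scoring passage is the most semantically relevant one; with the random stand-in above, only the mechanics are demonstrated.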
Model name: `vectorizer.banana`
## Supported Languages
Since this model is a distilled version of the [BGE-M3](https://huggingface.co/BAAI/bge-m3) model, it can theoretically handle 100+ languages.
## Scores
We computed the performance difference relative to the original [BGE-M3](https://huggingface.co/BAAI/bge-m3) on MS MARCO EN. Scores on well-known benchmarks (BEIR, MIRACL, MTEB, etc.) can be found directly in the BGE-M3 model card, under the "Dense" line. We expect performance on other datasets to drop on a scale similar to the one observed on MS MARCO EN.
| Model | Performance Relative to BGE-M3 |
|:-----------------------------------------------|:------------------------------:|
| vectorizer.banana (1024 dimensions) | 99.3% |
| vectorizer.banana (768 dimensions) | 98.8% |
| vectorizer.banana (512 dimensions) | 98% |
| **vectorizer.banana (256 dimensions*)** | 95.7% |
\* *The default dimension within Sinequa*
## Inference Times
| GPU | Quantization type | Batch size 1 | Batch size 32 |
|:------------------------------------------|:------------------|-----------------:|----------------:|
| NVIDIA A10 | FP16 | 4.5 ms | 43 ms |
| NVIDIA T4 | FP16 | 2.5 ms | 35 ms |
## GPU Memory Usage
| Quantization type | Memory |
|:-------------------------------------------------|------------:|
| FP16 | 1450 MiB |
Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can range from around 0.5 to 1 GiB depending on the GPU used.
## Requirements
- Minimal Sinequa version: 11.11.0.2306
- [Cuda compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
## Model Details
### Configuration
Note that this model is packaged with a default MRL cutoff of 256 dimensions. To use the full 1024 dimensions, or any other value, set the `mrl-cutoff` parameter accordingly.
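Conceptually, applying an MRL cutoff means keeping only the first N dimensions of the embedding and re-normalizing. The helper below is a hypothetical sketch of that operation; within Sinequa the truncation is applied internally based on the `mrl-cutoff` parameter, not by user code.

```python
import numpy as np

def apply_mrl_cutoff(vec, cutoff=256):
    """Hypothetical helper: keep the first `cutoff` dimensions of an
    MRL-trained embedding and re-normalize to unit length. MRL
    training makes these prefixes usable as embeddings on their own."""
    truncated = np.asarray(vec, dtype=np.float32)[:cutoff]
    return truncated / np.linalg.norm(truncated)

full = np.random.default_rng(0).standard_normal(1024)  # toy 1024-dim vector
small = apply_mrl_cutoff(full, cutoff=256)             # 256-dim unit vector
```

Smaller cutoffs shrink the index and speed up similarity search, at the modest accuracy cost shown in the scores table above.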
### Training
This model was distilled from [BGE-M3](https://huggingface.co/BAAI/bge-m3), a strong and compact multilingual embedding model, used as the teacher backbone.
The original 24-layer model was reduced to 5 layers.
To obtain a low-dimensional output space (256 compared to the original 1024), [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) was used at training time.
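The core idea of Matryoshka Representation Learning is to evaluate the training objective on every nested prefix of the embedding, so that each truncation remains useful on its own. The sketch below illustrates that nested-prefix structure with a plain MSE against a teacher vector; it is a toy formulation, not the actual training loss, and the cutoff set is an assumption.

```python
import numpy as np

def mrl_style_loss(student, teacher, cutoffs=(256, 512, 768, 1024)):
    """Toy Matryoshka-style objective: evaluate a distillation loss
    (here a simple MSE between normalized vectors) on every nested
    prefix of the embedding, then average. Real MRL training combines
    task-specific losses per prefix; this only shows the nesting."""
    total = 0.0
    for d in cutoffs:
        s = student[:d] / np.linalg.norm(student[:d])
        t = teacher[:d] / np.linalg.norm(teacher[:d])
        total += float(np.mean((s - t) ** 2))
    return total / len(cutoffs)
```

Because each prefix is penalized during training, the first 256 dimensions carry most of the information, which is why the 256-dimension cutoff retains 95.7% of the full model's performance in the table above.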