---
tags:
- mrl
- multilingual
---

# vectorizer.banana

This model is a vectorizer developed by Sinequa. It produces an embedding vector for a given passage or query. Passage vectors are stored in our vector index, and the query vector is used at query time to look up relevant passages in the index.

Model name: `vectorizer.banana`

## Supported Languages

Since this model is a distilled version of the [BGE-M3](https://huggingface.co/BAAI/bge-m3) model, it can theoretically handle 100+ languages.

## Scores

We computed the difference in performance with respect to the original [BGE-M3](https://huggingface.co/BAAI/bge-m3) on MS MARCO EN. Scores on well-known benchmarks (BEIR, MIRACL, MTEB, etc.) can be found directly in the BGE-M3 model card, in the row labeled "Dense". We expect performance on other datasets to drop at a scale similar to that observed on MS MARCO EN.

| Model                                    | Performance Relative to BGE-M3 |
|:-----------------------------------------|:------------------------------:|
| vectorizer.banana (1024 dimensions)      | 99.3%                          |
| vectorizer.banana (768 dimensions)       | 98.8%                          |
| vectorizer.banana (512 dimensions)       | 98.0%                          |
| **vectorizer.banana (256 dimensions\*)** | 95.7%                          |

\* *The default dimension within Sinequa*

## Inference Times

| GPU        | Quantization type | Batch size 1 | Batch size 32 |
|:-----------|:------------------|-------------:|--------------:|
| NVIDIA A10 | FP16              |       4.5 ms |         43 ms |
| NVIDIA T4  | FP16              |       2.5 ms |         35 ms |

## GPU Memory Usage

| Quantization type | Memory   |
|:------------------|---------:|
| FP16              | 1450 MiB |

Note that GPU memory usage only covers how much memory the model itself consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
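The passage/query lookup described in the introduction boils down to a nearest-neighbor search over embedding vectors. The sketch below is a minimal, hypothetical illustration (the toy 2-d vectors and the brute-force dot-product scoring are assumptions for demonstration; Sinequa's actual vector index performs this lookup internally):

```python
import numpy as np

# Toy index: 3 passage vectors, as if produced by the vectorizer,
# L2-normalized so cosine similarity reduces to a dot product.
passage_vecs = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
    [0.7, 0.7],
], dtype=np.float32)
passage_vecs /= np.linalg.norm(passage_vecs, axis=1, keepdims=True)

# Query vector produced at query time, normalized the same way.
query_vec = np.array([1.0, 0.0], dtype=np.float32)
query_vec /= np.linalg.norm(query_vec)

# Score every passage against the query and rank by similarity.
scores = passage_vecs @ query_vec
ranking = np.argsort(-scores)  # best-matching passage first
```

In production the brute-force scan would be replaced by an approximate nearest-neighbor index, but the scoring principle is the same.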
## Requirements

- Minimal Sinequa version: 11.11.0.2306
- [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)

## Model Details

### Configuration

Note that this model is packaged with a default MRL cutoff of 256 dimensions. To use the full 1024 dimensions, or any other value, the `mrl-cutoff` parameter needs to be set accordingly.

### Training

This model uses [BGE-M3](https://huggingface.co/BAAI/bge-m3), a strong and compact multilingual embedding model, as the backbone for distillation. The original model has 24 layers, which were reduced to 5. To obtain a low-dimensional output space (256 compared to the original 1024), [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) was used at training time.
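With Matryoshka Representation Learning, a cutoff such as the 256-dimension default simply keeps the first coordinates of the full embedding and re-normalizes. The helper below is a hypothetical sketch of that operation (`mrl_truncate` is not part of any Sinequa API; the actual `mrl-cutoff` handling happens inside the engine):

```python
import numpy as np

def mrl_truncate(embedding, dim):
    """Keep the first `dim` coordinates of a Matryoshka embedding
    and re-normalize, so cosine similarity remains well-defined."""
    v = np.asarray(embedding, dtype=np.float32)[..., :dim]
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

full = np.ones(1024, dtype=np.float32)  # stand-in for a 1024-d model output
small = mrl_truncate(full, 256)         # 256-d vector with unit L2 norm
```

Because MRL trains the leading dimensions to carry most of the signal, this truncation trades a small amount of accuracy (see the Scores table) for a 4x smaller index footprint at the 256-dimension default.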