# vectorizer.banana
This model is a vectorizer developed by Sinequa. It produces an embedding vector given a passage or a query. The passage vectors are stored in our vector index, and the query vector is used at query time to look up relevant passages in the index.
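The retrieval step described above can be sketched as follows. This is a toy illustration with random vectors, not Sinequa's actual index implementation: embeddings are unit-normalized, so relevance reduces to a dot product between the query vector and each indexed passage vector.

```python
import numpy as np

# Hypothetical toy vectors; the real model produces 1024-dimensional
# embeddings (256 after the default MRL cutoff).
rng = np.random.default_rng(0)
passage_vectors = rng.normal(size=(5, 256))  # vectors stored in the index
passage_vectors /= np.linalg.norm(passage_vectors, axis=1, keepdims=True)

# A query vector close to passage 2 (signal plus a little noise).
query_vector = passage_vectors[2] + 0.05 * rng.normal(size=256)
query_vector /= np.linalg.norm(query_vector)

# On unit-normalized vectors, cosine similarity is just a dot product.
scores = passage_vectors @ query_vector
ranking = np.argsort(scores)[::-1]
print(ranking[0])  # index of the most relevant passage
```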
Model name: vectorizer.banana
## Supported Languages
Since this model is a distilled version of the BGE-M3 model, it can theoretically handle 100+ languages.
## Scores
We measured the performance difference relative to the original BGE-M3 on MS MARCO EN. Scores on well-known benchmarks (BEIR, MIRACL, MTEB, etc.) can be found directly in the model card of BGE-M3 under the "Dense" row. For other datasets, we expect performance to drop at roughly the same scale as observed on MS MARCO EN.
| Model | Performance Relative to BGE-M3 |
|---|---|
| vectorizer.banana (1024 dimensions) | 99.3% |
| vectorizer.banana (768 dimensions) | 98.8% |
| vectorizer.banana (512 dimensions) | 98.0% |
| vectorizer.banana (256 dimensions)* | 95.7% |
\* The default dimension within Sinequa.
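The dimensions in the table above all come from the same 1024-dimensional embedding: with Matryoshka Representation Learning, a lower-dimensional vector is obtained by keeping only the first k components and re-normalizing. A minimal sketch of that truncation (the function name is illustrative, not a Sinequa API):

```python
import numpy as np

def mrl_truncate(embedding: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k components of an MRL-trained embedding, then re-normalize."""
    truncated = embedding[:k]
    return truncated / np.linalg.norm(truncated)

rng = np.random.default_rng(1)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)  # the full 1024-dimensional embedding

for k in (1024, 768, 512, 256):
    vec = mrl_truncate(full, k)
    print(k, vec.shape[0])  # every cutoff yields a unit-length vector
```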
## Inference Times
| GPU | Quantization type | Batch size 1 | Batch size 32 |
|---|---|---|---|
| NVIDIA A10 | FP16 | 4.5 ms | 43 ms |
| NVIDIA T4 | FP16 | 2.5 ms | 35 ms |
## GPU Memory Usage
| Quantization type | Memory |
|---|---|
| FP16 | 1450 MiB |
Note that GPU memory usage only includes the GPU memory consumed by the model itself on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
## Requirements
- Minimum Sinequa version: 11.11.0.2306
- CUDA compute capability: above 5.0 (above 6.0 for FP16 use)
## Model Details

### Configuration
Note that this model is packaged with a default MRL cutoff of 256 dimensions. To use the full 1024 dimensions, or any other value, the `mrl-cutoff` parameter needs to be set.
### Training
This model uses BGE-M3, a compact, high-quality multilingual embedding model, as the backbone for distillation. The original model has 24 layers, which were reduced to 5. To obtain a low-dimensional output space (256 dimensions compared to the original 1024), Matryoshka Representation Learning was used at training time.
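The exact training objective is not disclosed here, but the MRL principle applied to distillation can be sketched as follows: the loss is summed over nested prefixes of the embedding, so that every cutoff remains usable on its own. All names below are illustrative, and cosine distance stands in for whatever distillation loss was actually used:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)

def matryoshka_distillation_loss(student: np.ndarray, teacher: np.ndarray,
                                 cutoffs=(256, 512, 768, 1024)) -> float:
    """Sum a distillation loss over nested prefixes, so each cutoff is trained directly."""
    return sum(cosine_distance(student[:k], teacher[:k]) for k in cutoffs)

rng = np.random.default_rng(2)
teacher = rng.normal(size=1024)
good_student = teacher + 0.01 * rng.normal(size=1024)  # close at every prefix
bad_student = rng.normal(size=1024)                    # unrelated embedding

print(matryoshka_distillation_loss(good_student, teacher)
      < matryoshka_distillation_loss(bad_student, teacher))
```

Because every prefix contributes to the loss, truncating the final embedding to 256 dimensions degrades quality only mildly, which is consistent with the scores table above.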