# vectorizer.banana
This model is a vectorizer developed by Sinequa. It produces an embedding vector given a passage or a query. The passage vectors are stored in our vector index, and the query vector is used at query time to look up relevant passages in the index.
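The retrieval step described above can be sketched as follows. This is a toy illustration with random vectors, not Sinequa's actual index implementation: embeddings are unit-normalized, so relevance reduces to a dot product between the query vector and each indexed passage vector.

```python
import numpy as np

# Hypothetical toy vectors; the real model produces 1024-dimensional
# embeddings (256 after the default MRL cutoff).
rng = np.random.default_rng(0)
passage_vectors = rng.normal(size=(5, 256))  # vectors stored in the index
passage_vectors /= np.linalg.norm(passage_vectors, axis=1, keepdims=True)

# A query vector close to passage 2 (signal plus a little noise).
query_vector = passage_vectors[2] + 0.05 * rng.normal(size=256)
query_vector /= np.linalg.norm(query_vector)

# On unit-normalized vectors, cosine similarity is just a dot product.
scores = passage_vectors @ query_vector
ranking = np.argsort(scores)[::-1]
print(ranking[0])  # index of the most relevant passage
```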
Model name: vectorizer.banana
## Supported Languages
Since this model is a distilled version of the BGE-M3 model, it can theoretically handle 100+ languages.
## Scores
We measured the performance difference relative to the original BGE-M3 on MS MARCO EN. Scores on well-known benchmarks (BEIR, MIRACL, MTEB, etc.) can be found directly in the model card of BGE-M3 under the "Dense" row. For other datasets, we expect performance to drop at roughly the same scale as observed on MS MARCO EN.
| Model | Performance Relative to BGE-M3 |
|---|---|
| vectorizer.banana (1024 dimensions) | 99.3% |
| vectorizer.banana (768 dimensions) | 98.8% |
| vectorizer.banana (512 dimensions) | 98.0% |
| vectorizer.banana (256 dimensions)* | 95.7% |
\* The default dimension within Sinequa.
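The dimensions in the table above all come from the same 1024-dimensional embedding: with Matryoshka Representation Learning, a lower-dimensional vector is obtained by keeping only the first k components and re-normalizing. A minimal sketch of that truncation (the function name is illustrative, not a Sinequa API):

```python
import numpy as np

def mrl_truncate(embedding: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k components of an MRL-trained embedding, then re-normalize."""
    truncated = embedding[:k]
    return truncated / np.linalg.norm(truncated)

rng = np.random.default_rng(1)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)  # the full 1024-dimensional embedding

for k in (1024, 768, 512, 256):
    vec = mrl_truncate(full, k)
    print(k, vec.shape[0])  # every cutoff yields a unit-length vector
```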
## Inference Times
| GPU | Quantization type | Batch size 1 | Batch size 32 |
|---|---|---|---|
| NVIDIA A10 | FP16 | 4.5 ms | 43 ms |
| NVIDIA T4 | FP16 | 2.5 ms | 35 ms |
## GPU Memory Usage
| Quantization type | Memory |
|---|---|
| FP16 | 1450 MiB |
Note that GPU memory usage only includes the GPU memory consumed by the model itself on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
## Requirements
- Minimum Sinequa version: 11.11.0.2306
- CUDA compute capability: above 5.0 (above 6.0 for FP16 use)
## Model Details

### Configuration
Note that this model is packaged with a default MRL cutoff of 256 dimensions. To use the full 1024 dimensions, or any other value, the `mrl-cutoff` parameter needs to be set.
### Training
This model uses BGE-M3, a compact, high-quality multilingual embedding model, as the backbone for distillation. The original model has 24 layers, which were reduced to 5. To obtain a low-dimensional output space (256 dimensions compared to the original 1024), Matryoshka Representation Learning was used at training time.
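The exact training objective is not disclosed here, but the MRL principle applied to distillation can be sketched as follows: the loss is summed over nested prefixes of the embedding, so that every cutoff remains usable on its own. All names below are illustrative, and cosine distance stands in for whatever distillation loss was actually used:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)

def matryoshka_distillation_loss(student: np.ndarray, teacher: np.ndarray,
                                 cutoffs=(256, 512, 768, 1024)) -> float:
    """Sum a distillation loss over nested prefixes, so each cutoff is trained directly."""
    return sum(cosine_distance(student[:k], teacher[:k]) for k in cutoffs)

rng = np.random.default_rng(2)
teacher = rng.normal(size=1024)
good_student = teacher + 0.01 * rng.normal(size=1024)  # close at every prefix
bad_student = rng.normal(size=1024)                    # unrelated embedding

print(matryoshka_distillation_loss(good_student, teacher)
      < matryoshka_distillation_loss(bad_student, teacher))
```

Because every prefix contributes to the loss, truncating the final embedding to 256 dimensions degrades quality only mildly, which is consistent with the scores table above.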