---
tags:
- mrl
- multilingual
---

# vectorizer.banana

This model is a vectorizer developed by Sinequa. It produces an embedding vector for a given passage or query. The passage vectors are stored in our vector index, and the query vector is used at query time to look up relevant passages in the index.

Model name: `vectorizer.banana`

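The retrieval flow described above can be sketched in a few lines of pure Python. This is a minimal illustration, not Sinequa's actual index implementation: the passage identifiers, toy 4-dimensional vectors, and `search` helper are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Passage vectors produced at indexing time (toy 4-dimensional examples).
index = {
    "passage_1": [0.1, 0.9, 0.0, 0.2],
    "passage_2": [0.8, 0.1, 0.3, 0.0],
}

def search(query_vector, index, top_k=1):
    # Rank passages by cosine similarity to the query vector.
    ranked = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [passage_id for passage_id, _ in ranked[:top_k]]

print(search([0.7, 0.2, 0.3, 0.1], index))  # closest to passage_2
```

In production the same principle applies, only with the model's high-dimensional embeddings and an optimized vector index instead of a brute-force scan.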
## Supported Languages

Since this model is a distilled version of the [BGE-M3](https://huggingface.co/BAAI/bge-m3) model, it can theoretically handle 100+ languages.

## Scores

We measured the performance difference relative to the original [BGE-M3](https://huggingface.co/BAAI/bge-m3) on MS MARCO EN. Scores on well-known benchmarks (BEIR, MIRACL, MTEB, etc.) can be found directly in the BGE-M3 model card, in the "Dense" row. For other datasets, we expect performance to drop at roughly the same scale as observed on MS MARCO EN.

| Model                                    | Performance Relative to BGE-M3 |
|:-----------------------------------------|:------------------------------:|
| vectorizer.banana (1024 dimensions)      | 99.3%                          |
| vectorizer.banana (768 dimensions)       | 98.8%                          |
| vectorizer.banana (512 dimensions)       | 98.0%                          |
| **vectorizer.banana (256 dimensions\*)** | 95.7%                          |

\* *The default dimension within Sinequa*

## Inference Times

| GPU        | Quantization type | Batch size 1 | Batch size 32 |
|:-----------|:------------------|-------------:|--------------:|
| NVIDIA A10 | FP16              | 4.5 ms       | 43 ms         |
| NVIDIA T4  | FP16              | 2.5 ms       | 35 ms         |

## GPU Memory Usage

| Quantization type | Memory   |
|:------------------|---------:|
| FP16              | 1450 MiB |

Note that GPU memory usage only includes how much GPU memory the model itself consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can range from roughly 0.5 to 1 GiB depending on the GPU used.

## Requirements

- Minimal Sinequa version: 11.11.0.2306
- [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)

## Model Details

### Configuration

Note that this model is packaged with a default MRL cutoff of 256 dimensions. To use the full 1024 dimensions, or any other value, the `mrl-cutoff` parameter needs to be set.

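Conceptually, an MRL cutoff keeps only the first N dimensions of the full embedding and re-normalizes the result, so cosine similarities remain comparable across cutoff values. A minimal sketch of that idea (the `apply_mrl_cutoff` helper is illustrative, not the product's configuration mechanism):

```python
import math

def apply_mrl_cutoff(embedding, cutoff=256):
    # Keep only the first `cutoff` dimensions, then L2-normalize so that
    # cosine similarity stays meaningful on the truncated vector.
    truncated = embedding[:cutoff]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.01 * (i + 1) for i in range(1024)]   # stand-in for a 1024-d embedding
reduced = apply_mrl_cutoff(full, cutoff=256)
print(len(reduced))  # 256
```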
### Training

This model used [BGE-M3](https://huggingface.co/BAAI/bge-m3), a compact and capable multilingual embedding model, as the backbone for distillation.

The original model has 24 layers, which were reduced to 5 layers. To obtain a low-dimensional output space (256 dimensions compared to the original 1024), [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) was used at training time.
To obtain a low dimensional output space (256 compared to the original 1024), [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) was used at training time. |