BGE-M3 ONNX (Full Multi-Vector)

ONNX conversion of BAAI/bge-m3 with full multi-vector support.

Converted with the yuniko-software/bge-m3-onnx conversion method.

Why This Conversion?

Most BGE-M3 ONNX conversions only output dense embeddings. This conversion preserves all three retrieval methods:

| Output | Use Case |
|---|---|
| Dense vectors (1024-dim) | Semantic similarity search |
| Sparse vectors | Lexical/keyword matching (hybrid search) |
| ColBERT vectors | Late interaction retrieval |
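
As a rough illustration, here is a minimal scoring sketch with dummy numpy arrays (not real model outputs; shapes and values are assumptions): dense vectors are compared with cosine similarity, while ColBERT vectors use MaxSim-style late interaction. The sparse weights are covered in the usage section further down.

import numpy as np

# Dummy stand-ins for model outputs (illustrative shapes only)
q_dense, d_dense = np.random.rand(1024), np.random.rand(1024)              # one pooled vector per text
q_colbert, d_colbert = np.random.rand(5, 1024), np.random.rand(12, 1024)   # one vector per token

# Dense retrieval: cosine similarity between the pooled vectors
dense_score = q_dense @ d_dense / (np.linalg.norm(q_dense) * np.linalg.norm(d_dense))

# Late interaction (ColBERT-style MaxSim): each query token is matched to its
# best-scoring document token, and those maxima are summed
sim = q_colbert @ d_colbert.T          # (query_tokens, doc_tokens)
colbert_score = sim.max(axis=1).sum()

print(dense_score, colbert_score)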

Multilingual Support

The model covers 100+ languages, including English, Chinese, Japanese, Korean, German, French, Spanish, Arabic, and Hindi.

Model Info

| Property | Value |
|---|---|
| Embedding Dimension | 1024 |
| Max Sequence Length | 8192 tokens |
| Languages | 100+ |
| Model Size | ~2.1 GB |

Files

| File | Description |
|---|---|
| bge_m3_model.onnx | Main model graph |
| bge_m3_model.onnx_data | External weights |
| bge_m3_tokenizer.onnx | ONNX tokenizer |
| tokenizer.json | HuggingFace tokenizer |
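
Because the weights are stored as ONNX external data, bge_m3_model.onnx and bge_m3_model.onnx_data must sit in the same local directory when the session is created. A minimal sketch for fetching all files at once with huggingface_hub (the repo ID is taken from the usage example below):

from huggingface_hub import snapshot_download

# Downloads the model graph, external weights, and tokenizer files into one folder
local_dir = snapshot_download("maxiboch/bge-m3-onnx")
print(local_dir)  # point onnxruntime at f"{local_dir}/bge_m3_model.onnx"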

Usage

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maxiboch/bge-m3-onnx")
session = ort.InferenceSession("bge_m3_model.onnx", 
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

inputs = tokenizer("Your text here", return_tensors="np", padding=True, truncation=True)
# The tokenizer's numpy arrays (input_ids, attention_mask) map onto the graph's input names
outputs = session.run(None, dict(inputs))

dense_embeddings = outputs[0]   # Dense semantic embeddings (1024-dim)
sparse_weights = outputs[1]     # Per-token lexical weights for hybrid search
colbert_vecs = outputs[2]       # Per-token ColBERT vectors for late interaction
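
For lexical/hybrid search, the sparse output can be mapped back onto tokens. A hedged sketch, assuming sparse_weights holds one weight per input token (shape roughly (batch, seq_len); verify the actual output order and shapes with session.get_outputs()):

# Build a token -> lexical weight mapping from the sparse output
# (illustrative assumption: weights align one-to-one with input_ids)
token_ids = inputs["input_ids"][0]
weights = sparse_weights[0].reshape(-1)

lexical_weights = {}
for tok_id, w in zip(token_ids, weights):
    tok = tokenizer.convert_ids_to_tokens(int(tok_id))
    if tok not in tokenizer.all_special_tokens and w > 0:
        lexical_weights[tok] = max(float(w), lexical_weights.get(tok, 0.0))

print(lexical_weights)  # token -> weight, usable for keyword-style matching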

Credits

Original model: BAAI/bge-m3. Conversion performed with the yuniko-software/bge-m3-onnx method.