# BGE-M3 ONNX (Full Multi-Vector)
ONNX conversion of BAAI/bge-m3 with full multi-vector support, produced with the yuniko-software/bge-m3-onnx conversion method.
## Why This Conversion?
Most BGE-M3 ONNX conversions only output dense embeddings. This conversion preserves all three retrieval methods:
| Output | Use Case |
|---|---|
| Dense vectors (1024-dim) | Semantic similarity search |
| Sparse vectors | Lexical/keyword matching (hybrid search) |
| ColBERT vectors | Late interaction retrieval |
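To make the three retrieval signals concrete, here is a minimal NumPy sketch of how they are typically scored and fused: cosine similarity for dense vectors, a dot product over shared token weights for sparse vectors, and MaxSim late interaction for ColBERT vectors. The function names and fusion weights are illustrative choices, not part of this model's API.

```python
import numpy as np

def dense_score(q, d):
    # Cosine similarity between dense embedding vectors
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def sparse_score(q_weights, d_weights):
    # Lexical match: dot product over token ids present in both texts
    shared = set(q_weights) & set(d_weights)
    return sum(q_weights[t] * d_weights[t] for t in shared)

def colbert_score(q_vecs, d_vecs):
    # Late interaction (MaxSim): for each query token vector, take the
    # max similarity over all document token vectors, then sum
    sim = q_vecs @ d_vecs.T  # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def hybrid_score(dense, sparse, colbert, w=(0.4, 0.2, 0.4)):
    # Illustrative weighted fusion; the weights are a tuning choice
    return w[0] * dense + w[1] * sparse + w[2] * colbert
```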
## Multilingual Support

BGE-M3 supports 100+ languages, including English, Chinese, Japanese, Korean, German, French, Spanish, Arabic, and Hindi.
## Model Info
| Property | Value |
|---|---|
| Embedding Dimension | 1024 |
| Max Sequence Length | 8192 |
| Languages | 100+ |
| Model Size | ~2.1 GB |
## Files

| File | Description |
|---|---|
| `bge_m3_model.onnx` | Main model graph |
| `bge_m3_model.onnx_data` | External weights |
| `bge_m3_tokenizer.onnx` | ONNX tokenizer |
| `tokenizer.json` | HuggingFace tokenizer |
## Usage

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer and ONNX session (GPU if available, CPU fallback)
tokenizer = AutoTokenizer.from_pretrained("maxiboch/bge-m3-onnx")
session = ort.InferenceSession(
    "bge_m3_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Tokenize and run inference
inputs = tokenizer("Your text here", return_tensors="np", padding=True, truncation=True)
outputs = session.run(None, dict(inputs))

dense_embeddings = outputs[0]  # Semantic embeddings (1024-dim)
sparse_weights = outputs[1]    # Per-token weights for hybrid search
colbert_vecs = outputs[2]      # Token-level vectors for late interaction
```
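The sparse output is a per-token weight array aligned with the input token ids. A common postprocessing step is to collapse it into a `{token_id: weight}` mapping for lexical search; the sketch below assumes that alignment and uses illustrative special-token ids, neither of which is verified against this export.

```python
def extract_sparse(input_ids, token_weights, special_ids=frozenset({0, 1, 2})):
    # Collapse per-token weights into {token_id: weight}, keeping the
    # max weight for repeated tokens and dropping special tokens.
    # NOTE: special_ids here are illustrative placeholders.
    out = {}
    for tid, w in zip(input_ids, token_weights):
        tid, w = int(tid), float(w)
        if tid in special_ids or w <= 0:
            continue
        out[tid] = max(out.get(tid, 0.0), w)
    return out
```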
## Credits
- Original model: BAAI/bge-m3
- Conversion method: yuniko-software/bge-m3-onnx
- Converted by: @maxiboch
- License: MIT