bge-m3 (ONNX, dynamic axes)
Self-converted ONNX export of BAAI/bge-m3, hosted by Newtech Studio for use with text-embeddings-inference (TEI).
Why this exists
BAAI's upstream bge-m3 repo includes ONNX files, but their export bakes a
static batch dimension into the graph. TEI's ORT backend then logs:
WARN: Backend does not support a batch size > 8
WARN: forcing `max_batch_requests=8`
…effectively capping every server we ran at batch=8, regardless of what we
configured via --max-client-batch-size. Under heavy indexing load this
caused TEI to chop a single client batch of 64+ chunks into 8+ internal
sub-batches, throttling throughput well below what the hardware could deliver.
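The arithmetic behind the cap can be made concrete. This is a minimal sketch, assuming the backend processes a client batch as ceil(n / cap) sequential sub-batches; the function names are illustrative and not part of TEI.

```python
import math


def sub_batches(client_batch: int, backend_cap: int) -> int:
    """Number of internal batches needed when the backend caps batch size."""
    return math.ceil(client_batch / backend_cap)


def utilization(configured_batch: int, backend_cap: int) -> float:
    """Fraction of the configured batch size actually used per internal batch."""
    return min(backend_cap, configured_batch) / configured_batch


print(sub_batches(64, 8))    # 8
print(sub_batches(128, 8))   # 16
print(utilization(128, 8))   # 0.0625, i.e. 1/16
```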
This export uses dynamic batch and sequence axes (the optimum-cli default),
which lets ORT honor whatever batch size TEI's CLI flags allow. On a
CPU-only TEI deployment with --max-client-batch-size=128, the bulk lane
goes from 1/16 effective utilization (8 of the 128 configured slots per
internal batch) to full single-batch throughput.
Same weights as upstream: the export only changes the graph's shape declarations and the file format, not the math.
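One way to confirm that an export really has a symbolic batch axis is to read the graph inputs with the `onnx` package. The sketch below is an illustration rather than part of the export pipeline; the helper names are made up here, and the axis names you see depend on the exporter.

```python
def axis_label(dim_param, dim_value):
    """Return the symbolic name of a dynamic axis, or the integer size of a static one.

    In ONNX's TensorShapeProto, dim_param is "" when the axis is static.
    """
    return dim_param or dim_value


def has_dynamic_batch(axes):
    """True if the first axis of every listed tensor is symbolic (a string)."""
    return bool(axes) and all(
        bool(dims) and isinstance(dims[0], str) for dims in axes.values()
    )


def load_input_axes(model_path):
    """Map each graph input to its axis labels without loading the external weights."""
    import onnx  # imported lazily; requires `pip install onnx`

    model = onnx.load(model_path, load_external_data=False)
    return {
        inp.name: [
            axis_label(d.dim_param, d.dim_value)
            for d in inp.type.tensor_type.shape.dim
        ]
        for inp in model.graph.input
    }
```

Calling `has_dynamic_batch(load_input_axes("out/model.onnx"))` should return `True` for an export with dynamic axes, while a graph with a baked-in batch size would show an integer in the first position.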
Precision
fp32 (~2.3 GB external data). We tried fp16 (1.1 GB) but TEI's ORT backend on CPU explicitly rejects float16:
ERROR: Could not start ORT backend: Dtype float16 is not supported
for `ort`, only float32.
If you're running TEI on a CUDA / TensorRT backend, an fp16 build should work and roughly halve the disk and memory footprint (1.1 GB instead of ~2.3 GB); on CPU, stick with fp32.
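The two file sizes follow from bytes per parameter. As a rough check, assuming a parameter count of about 568 million for bge-m3 (an approximate figure not stated in this card) and ignoring graph overhead:

```python
PARAMS = 568_000_000  # assumed approximate parameter count, not taken from this card


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal gigabytes."""
    return params * bytes_per_param / 1e9


print(f"fp32: {weights_gb(PARAMS, 4):.2f} GB")  # fp32: 2.27 GB
print(f"fp16: {weights_gb(PARAMS, 2):.2f} GB")  # fp16: 1.14 GB
```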
Reproduction
pip install -U "optimum[exporters,onnxruntime]" transformers onnx
optimum-cli export onnx \
    --model BAAI/bge-m3 \
    --task feature-extraction \
    --opset 17 \
    ./out
Output: out/model.onnx (graph) + out/model.onnx_data (weights, external)
plus config.json, tokenizer.json, tokenizer_config.json,
special_tokens_map.json. Layout matches what TEI expects.
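Before uploading or pointing TEI at the directory, it can help to check that every file listed above is present. A minimal sketch using only the standard library; `missing_files` is a hypothetical helper, and the file list simply mirrors the output described above.

```python
from pathlib import Path

# File names as listed in the export output above.
EXPECTED_FILES = (
    "model.onnx",
    "model.onnx_data",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_files(export_dir):
    """Return the expected files that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_files("./out")` means the layout is complete; anything else names the files to regenerate.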
TEI usage
command:
- --model-id=newtechstudio/bge-m3-onnx
- --max-client-batch-size=128 # honored, no longer capped
- --max-batch-tokens=16384
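With the server running, a client can send one large batch per request through TEI's `/embed` route. The sketch below uses only the standard library; the base URL `http://localhost:8080` is a placeholder for wherever your deployment listens, and the helper names are illustrative.

```python
import json
import urllib.request


def build_embed_request(texts, base_url="http://localhost:8080"):
    """Build a POST request for the /embed route; base_url is a placeholder."""
    body = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url="http://localhost:8080", timeout=60):
    """Send one client batch and return the parsed list of embedding vectors."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

With the flags above, a call such as `embed(chunks[:128])` stays within `--max-client-batch-size=128`; larger inputs would need to be split on the client side.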
License
Inherits the upstream license (MIT).