bge-m3 (ONNX, dynamic axes)

Self-converted ONNX export of BAAI/bge-m3, hosted by Newtech Studio for use with text-embeddings-inference (TEI).

Why this exists

BAAI's upstream bge-m3 repo includes ONNX files, but their export bakes a static batch dimension into the graph. TEI's ORT backend then logs:

WARN: Backend does not support a batch size > 8
WARN: forcing `max_batch_requests=8`

…capping every server we ran at a batch size of 8, regardless of what we configured via --max-client-batch-size. Under heavy indexing load, TEI split a single client batch of 64+ chunks into 8+ internal sub-batches, throttling throughput well below what the hardware could deliver.

This export uses optimum-cli's default dynamic batch and sequence axes, which lets ORT honor whatever batch size TEI's CLI flags allow. On a CPU-only TEI deployment with --max-client-batch-size=128, a 128-item client batch can now run as one batch instead of 16 sub-batches of 8, so the bulk lane goes from 1/16 effective utilization to full single-batch throughput.
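
The 8+ sub-batches and the 1/16 figure above follow from simple ceiling division. The sketch below reproduces that arithmetic only; the function names are illustrative, and it assumes the backend cap is the sole limit (real TEI scheduling also depends on --max-batch-tokens and queueing).

```python
import math


def sub_batches(client_batch: int, backend_cap: int) -> int:
    """Number of internal batches needed when the backend caps the batch size."""
    return math.ceil(client_batch / backend_cap)


def utilization(client_batch: int, backend_cap: int) -> float:
    """Fraction of one client batch that a single forward pass can process."""
    return min(client_batch, backend_cap) / client_batch


# Static export: the backend cap is 8.
for n in (64, 128):
    print(f"client batch {n}: {sub_batches(n, 8)} sub-batches, "
          f"utilization {utilization(n, 8)}")

# Dynamic export: the cap follows the configured client batch size.
print(f"client batch 128, cap 128: {sub_batches(128, 128)} sub-batch")
```

With a cap of 8, a 64-item batch needs 8 passes and a 128-item batch needs 16, which is where the 1/16 utilization comes from.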

Same weights as upstream — the export only changes the graph's shape declarations and the file format, not the math.
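
One way to check the shape declarations of any ONNX file is to read the graph inputs with the onnx Python package. This is a minimal sketch, not part of the export pipeline: the helper names are illustrative, and the symbolic axis names you see depend on the exporter.

```python
def input_shapes(model_path):
    """Return {input_name: [dim, ...]}; symbolic dims are str, fixed dims are int."""
    import onnx  # lazy import so has_dynamic_batch() works without onnx installed

    # load_external_data=False reads only the graph, not the external weights file.
    model = onnx.load(model_path, load_external_data=False)
    shapes = {}
    for inp in model.graph.input:
        dims = []
        for d in inp.type.tensor_type.shape.dim:
            if d.HasField("dim_param"):
                dims.append(d.dim_param)   # symbolic (dynamic) axis
            elif d.HasField("dim_value"):
                dims.append(d.dim_value)   # fixed (static) axis
            else:
                dims.append("?")           # unspecified, treated as dynamic
        shapes[inp.name] = dims
    return shapes


def has_dynamic_batch(shape):
    """True when the first (batch) axis is symbolic rather than a fixed integer."""
    return len(shape) > 0 and isinstance(shape[0], str)
```

Calling `input_shapes("out/model.onnx")` and passing each shape to `has_dynamic_batch` should return True for every input of a dynamic export, and False for an export with a baked-in batch size.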

Precision

fp32 (~2.3 GB of external data). We tried fp16 (~1.1 GB), but TEI's ORT backend on CPU explicitly rejects float16:

ERROR: Could not start ORT backend: Dtype float16 is not supported
for `ort`, only float32.

If you're running TEI on a CUDA / TensorRT backend, an fp16 build should work and would halve the disk and memory footprint. On CPU, stick with fp32.
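
The two sizes quoted in this section are consistent with a simple parameters-times-bytes estimate. The sketch below assumes roughly 568 million parameters for bge-m3; that count is an assumption taken from memory of the upstream model card, not something measured from this export.

```python
# Assumption: approximate parameter count of bge-m3 (not measured from this export).
PARAMS = 568_000_000


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Rough weight size in GB (10^9 bytes), ignoring graph and metadata overhead."""
    return params * bytes_per_param / 1e9


print(f"fp32: ~{weights_gb(PARAMS, 4):.2f} GB")  # 4 bytes per parameter
print(f"fp16: ~{weights_gb(PARAMS, 2):.2f} GB")  # 2 bytes per parameter
```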

Reproduction

pip install -U "optimum[exporters,onnxruntime]" transformers onnx
optimum-cli export onnx \
  --model BAAI/bge-m3 \
  --task feature-extraction \
  --opset 17 \
  ./out

Output: out/model.onnx (graph) + out/model.onnx_data (weights, external) plus config.json, tokenizer.json, tokenizer_config.json, special_tokens_map.json. Layout matches what TEI expects.
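
A quick sanity check of the export directory, assuming the file list above is complete. The helper name `missing_files` is illustrative, and the check only tests for presence, not for valid contents.

```python
from pathlib import Path

# File names taken from the output listing above.
EXPECTED = (
    "model.onnx",
    "model.onnx_data",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_files(export_dir):
    """Return the expected file names that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]
```

`missing_files("./out")` returns an empty list when the export completed with the layout described above.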

TEI usage

command:
  - --model-id=newtechstudio/bge-m3-onnx
  - --max-client-batch-size=128       # honored, no longer capped
  - --max-batch-tokens=16384
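
To exercise the larger batch size from a client, a request to TEI's /embed endpoint can carry the whole batch in one call. This is a minimal standard-library sketch; the base URL `http://localhost:8080` and the helper names are assumptions for illustration, so adjust them to your deployment.

```python
import json
import urllib.request


def build_embed_request(base_url, texts):
    """Build a POST request for TEI's /embed endpoint with all texts in one batch."""
    body = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url, texts, timeout=60.0):
    """Send the batch and return the decoded JSON response (one vector per input)."""
    request = build_embed_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

For example, `embed("http://localhost:8080", chunks)` with up to 128 chunks stays within the --max-client-batch-size configured above.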

License

Inherits the upstream license (MIT).
