BGE-M3 INT8 ONNX

An INT8-quantized ONNX version of BGE-M3, designed for faster inference and lower memory usage while preserving BGE-M3's strong multilingual embedding performance.

Model Details

Base Model: BAAI/bge-m3
Format: ONNX
Quantization: INT8
Embedding Size: 1024
Max Sequence Length: 8192 tokens
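
BGE-M3 produces its dense embedding from the normalized hidden state of the [CLS] token, and retrieval usually compares such vectors by cosine similarity. The sketch below is a minimal plain-Python illustration. It assumes you have already run this ONNX model and extracted one 1024-dimensional dense vector per text. The output tensor names of this export are not documented here, and the helper names `l2_normalize` and `cosine_similarity` are hypothetical.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 1024  # dense embedding size listed in Model Details


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense embeddings (dot product of unit vectors)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Placeholder vectors; real ones would come from the model's dense output.
query = [0.1] * EMBEDDING_DIM
doc = [0.2] * EMBEDDING_DIM
print(round(cosine_similarity(query, doc), 6))  # → 1.0 (parallel vectors)
```

If the exported graph already returns L2-normalized vectors, the normalization step is redundant and a plain dot product gives the same score.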

This model was quantized from the original BGE-M3 model to make deployment more efficient on CPUs and edge devices.
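
The card does not state which quantization scheme was used, for example dynamic versus static quantization or per-tensor versus per-channel scales. As a generic illustration only, and not the procedure used to produce this model, here is a sketch of symmetric per-tensor INT8 quantization. Each weight is stored as an 8-bit integer plus one shared float scale, so it takes 1 byte instead of 4 bytes in FP32. The cost is a rounding error of at most half the scale. The function names are hypothetical.

```python
from typing import Sequence

INT8_QMAX = 127  # symmetric range [-127, 127]; -128 is left unused


def quantize_int8(weights: Sequence[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w ≈ q * scale with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / INT8_QMAX if max_abs > 0 else 1.0  # avoid divide-by-zero
    q = [max(-INT8_QMAX, min(INT8_QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> list[float]:
    """Map INT8 codes back to approximate float values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)  # → [50, -127, 0, 100]

# Storage: 4 bytes per FP32 weight vs 1 byte per INT8 code (+ one FP32 scale).
fp32_bytes = 4 * len(weights)
int8_bytes = len(q) + 4
```

Small weights such as `0.003` collapse to zero, which is why real toolchains often use finer-grained (per-channel) scales or keep sensitive layers in higher precision.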
