BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
This is the BAAI/bge-m3 model converted to MLX format with 8-bit quantization for Apple Silicon.
BGE-M3 is a versatile embedding model capable of:

- **Multi-Linguality**: supports 100+ languages.
- **Multi-Functionality**: dense, sparse (lexical), and multi-vector retrieval.
- **Multi-Granularity**: handles inputs from short sentences up to 8192-token documents.
This 8-bit quantized version offers the best quality among quantized variants.
| Property | Value |
|---|---|
| Architecture | XLM-RoBERTa |
| Precision | 8-bit (affine quantization) |
| Embedding Dimension | 1024 |
| Max Sequence Length | 8192 |
| Model Size | ~592 MB |
| Quantization Group Size | 64 |
| Languages | 100+ languages |
| Version | Size | Size Reduction vs FP16 |
|---|---|---|
| FP16 | 1.1 GB | - |
| 8-bit | 592 MB | 46% |
| 6-bit | 457 MB | 58% |
| 4-bit | 321 MB | 71% |
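The "affine quantization" listed above maps each group of 64 weights to integers via a per-group scale and offset. The following is an illustrative NumPy sketch of that scheme, not MLX's actual kernel; the function names and layout are hypothetical:

```python
import numpy as np

def affine_quantize(w, bits=8, group_size=64):
    """Illustrative group-wise affine quantization (not MLX's exact implementation)."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min):
    """Recover approximate float weights from integers, scales, and offsets."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, zero = affine_quantize(w)
w_hat = affine_dequantize(q, scale, zero).reshape(-1)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The maximum reconstruction error per weight is bounded by half the group's scale, which is why smaller group sizes (here 64) trade a little extra storage for better fidelity.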
```python
from mlx_embeddings.utils import load_model, load_tokenizer
import mlx.core as mx

model_path = "mlx-community/bge-m3-mlx-8bit"

# Load model and tokenizer
model = load_model(model_path)
tokenizer = load_tokenizer(model_path)

# Generate embeddings
text = "Hello, world!"
tokens = tokenizer.encode(text)
input_ids = mx.array([tokens])
output = model(input_ids)
embedding = output.last_hidden_state.mean(axis=1)  # Mean pooling

# L2-normalize if you plan to compare embeddings with cosine similarity
embedding = embedding / mx.linalg.norm(embedding, axis=-1, keepdims=True)

print(f"Embedding shape: {embedding.shape}")  # (1, 1024)
```
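Embeddings produced this way are typically compared with cosine similarity. A minimal, self-contained sketch using placeholder 1024-dimensional vectors in place of real BGE-M3 outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Placeholder vectors standing in for real model embeddings.
rng = np.random.default_rng(42)
query = rng.standard_normal(1024)
doc = query + 0.1 * rng.standard_normal(1024)  # a near-duplicate document

print(cosine_similarity(query, doc))  # close to 1.0 for similar vectors
```

Because the snippet above already L2-normalizes the embeddings, a plain dot product between them is equivalent to cosine similarity.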
If the model is served behind an OpenAI-compatible embeddings endpoint (for example, a local server on port 8000), it can be queried with curl:

```bash
curl http://127.0.0.1:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-m3-mlx-8bit", "input": "Your text here"}'
```
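The same endpoint can be called from Python. A minimal sketch using only the standard library, assuming an OpenAI-style `/v1/embeddings` server is running locally (the helper names here are illustrative):

```python
import json
import urllib.request

def build_payload(text: str, model: str = "bge-m3-mlx-8bit") -> bytes:
    """Build the JSON request body for an OpenAI-style embeddings endpoint."""
    return json.dumps({"model": model, "input": text}).encode()

def get_embedding(text: str, base_url: str = "http://127.0.0.1:8000") -> list[float]:
    """POST to /v1/embeddings and return the embedding vector."""
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response shape: {"data": [{"embedding": [...]}], ...}
    return body["data"][0]["embedding"]
```

With a server running, `get_embedding("Your text here")` would return a 1024-element list matching the model's embedding dimension.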
MIT license (inherited from BAAI/bge-m3)
```bibtex
@article{bge_m3,
  title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author={Chen, Jianlv and Xiao, Shitao and Zhang, Peitian and Luo, Kun and Lian, Defu and Liu, Zheng},
  journal={arXiv preprint arXiv:2402.03216},
  year={2024}
}
```
This is an unofficial MLX conversion of the BAAI/bge-m3 model. For the original model, see BAAI/bge-m3.