# bge-m3 – OpenVINO IR (FP16)
This is a redistribution. For the model's intended use, training data, evaluation results, limitations, and citation, please see the upstream card: BAAI/bge-m3.
OpenVINO IR conversion of BAAI/bge-m3 with weights compressed to FP16. Drop-in for optimum-intel and OpenArc.
This export covers the dense embedding head only (the multi-vector and sparse heads from the original bge-m3 release are not included).
## Files
- `openvino_model.{xml,bin}` – XLM-RoBERTa encoder, FP16 weights (~1.05 GB)
- `openvino_tokenizer.{xml,bin}` / `openvino_detokenizer.{xml,bin}` – OpenVINO Tokenizers IR
- `1_Pooling/config.json` – sentence-transformers pooling metadata (`pooling_mode_cls_token: true`)
- Standard HF tokenizer files: `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, `sentencepiece.bpe.model`
- `LICENSE` – MIT, inherited from the upstream model
## Architecture
| Property | Value |
|---|---|
| Base model | XLM-RoBERTa large |
| Hidden size | 1024 |
| Layers | 24 |
| Max sequence length | 8194 |
| Vocabulary size | 250,002 |
| Pooling | CLS (declared via `1_Pooling/config.json`) |
## Usage with optimum-intel
```python
from optimum.intel import OVModelForFeatureExtraction
from transformers import AutoTokenizer
import torch.nn.functional as F

model = OVModelForFeatureExtraction.from_pretrained("kread/bge-m3-fp16-ov", device="GPU")
tok = AutoTokenizer.from_pretrained("kread/bge-m3-fp16-ov")

inputs = tok("What is the capital of France?", return_tensors="pt",
             padding=True, truncation=True, max_length=512)
embedding = F.normalize(model(**inputs).last_hidden_state[:, 0], p=2, dim=1)  # CLS pool
```
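For retrieval, embed queries and passages the same way and rank by cosine similarity; on unit-normed vectors this reduces to a dot product. A minimal sketch reusing `model` and `tok` from above (the passage strings are placeholders):

```python
def embed(texts):
    # Tokenize a batch and CLS-pool, exactly as in the snippet above.
    batch = tok(texts, return_tensors="pt", padding=True,
                truncation=True, max_length=512)
    return F.normalize(model(**batch).last_hidden_state[:, 0], p=2, dim=1)

query = embed(["What is the capital of France?"])
passages = embed(["Paris is the capital of France.",
                  "Berlin is the capital of Germany."])
scores = query @ passages.T  # cosine similarity via dot product on unit vectors
print(scores)                # higher score = better match
```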
## Usage with OpenArc
Requires OpenArc with metadata-driven pooling dispatch (see *feat(embed): dispatch pooling mode from sentence-transformers config*). The shipped `1_Pooling/config.json` triggers automatic CLS pooling on load.
```bash
openarc add bge-m3 \
  --model-path /path/to/bge-m3-fp16-ov \
  --model-type emb \
  --engine optimum \
  --device GPU

openarc serve
# POST /v1/embeddings {"model": "bge-m3", "input": "..."}
```
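A minimal client sketch for the endpoint above. The host and port (`localhost:8000`) and the OpenAI-style response shape are assumptions; adjust to your OpenArc configuration:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",  # host/port assumed
    json={"model": "bge-m3", "input": "What is the capital of France?"},
)
resp.raise_for_status()
# OpenAI-compatible response shape (assumed): {"data": [{"embedding": [...]}]}
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))  # expect 1024
```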
## Conversion
Weight compression is baked into the initial save, which avoids the bus-error trap of overwriting an mmapped IR file mid-process:
```python
from optimum.intel import OVModelForFeatureExtraction
import openvino as ov

m = OVModelForFeatureExtraction.from_pretrained("BAAI/bge-m3", export=True)
ov.save_model(m.model, "openvino_model.xml", compress_to_fp16=True)  # FP16 at first write
m.save_pretrained("./out")  # tokenizer + sentence-transformers metadata
```
`openvino_tokenizer.{xml,bin}` and `openvino_detokenizer.{xml,bin}` were generated via `openvino_tokenizers.convert_tokenizer(..., with_detokenizer=True)`, as sketched below.
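A minimal sketch of that step, assuming the upstream HF tokenizer as input (output filenames chosen to match this repo):

```python
from transformers import AutoTokenizer
from openvino_tokenizers import convert_tokenizer
import openvino as ov

hf_tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
# with_detokenizer=True returns a (tokenizer, detokenizer) pair of ov.Model objects
ov_tok, ov_detok = convert_tokenizer(hf_tok, with_detokenizer=True)
ov.save_model(ov_tok, "openvino_tokenizer.xml")
ov.save_model(ov_detok, "openvino_detokenizer.xml")
```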
## Numerical equivalence
Single-query smoke test against the source PyTorch FP32 weights:
| Metric | Value |
|---|---|
| Embedding dim | 1024 |
| cos(ov_fp16, pt_fp32) | > 0.999 |
| Output unit-normed | ✓ |
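A sketch of the kind of check this reflects (not necessarily the exact script used), reusing `model`, `tok`, and `F` from the usage example above, with the FP32 reference loaded through `transformers`:

```python
import torch
from transformers import AutoModel

# FP32 PyTorch reference: the XLM-RoBERTa encoder from the upstream repo
pt_model = AutoModel.from_pretrained("BAAI/bge-m3")
pt_model.eval()

text = "What is the capital of France?"
inputs = tok(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    pt_emb = F.normalize(pt_model(**inputs).last_hidden_state[:, 0], p=2, dim=1)
ov_emb = F.normalize(model(**inputs).last_hidden_state[:, 0], p=2, dim=1)

cos = (pt_emb * ov_emb).sum().item()  # cosine similarity of two unit vectors
print(cos)  # expected > 0.999
```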
## License
MIT, inherited from BAAI/bge-m3. See `LICENSE` in this repo.
## Citation
From the upstream model card:
```bibtex
@misc{bge-m3,
  title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
  year={2024},
  eprint={2402.03216},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```