coreprinciple committed cccecfb (verified; 1 parent: 30e1d21)

Upload 4 files

## Add BGE-M3 (BAAI/bge-m3)

- **Model**: [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)
- **Architecture**: XLM-RoBERTa (large), 568M parameters
- **Embedding dimensions**: 1024
- **Max sequence length**: 8192 tokens
- **Languages**: 100+
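
The parameter count can be roughly cross-checked against the file sizes in this commit. The `model.onnx_data` LFS pointer records 2,266,886,160 bytes. If that file holds only float32 weights at 4 bytes each, it contains about 566.7M values. That is close to the ~568M figure. The float32-only layout is an assumption and is not verified here.

```python
# Back-of-the-envelope check of the parameter count from the external weights file.
# Assumption (not verified): model.onnx_data stores only float32 tensors (4 bytes each),
# with no padding or non-weight data.
ONNX_DATA_BYTES = 2_266_886_160  # "size" field of the bge-m3/model.onnx_data LFS pointer
BYTES_PER_FP32 = 4


def fp32_param_count(num_bytes: int, bytes_per_value: int = BYTES_PER_FP32) -> int:
    """Return how many values of bytes_per_value bytes fit exactly in num_bytes."""
    if num_bytes % bytes_per_value:
        raise ValueError("size is not a whole number of values")
    return num_bytes // bytes_per_value


params = fp32_param_count(ONNX_DATA_BYTES)
print(f"{params:,} values, about {params / 1e6:.1f}M")  # 566,721,540 values, about 566.7M
```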

### Conversion

Converted using `optimum-cli export onnx --model BAAI/bge-m3 --task feature-extraction <output_dir>`.
Validated ONNX output against PyTorch: cosine similarity > 0.9999 across English,
French, and Chinese test sentences.
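
The acceptance criterion above can be sketched as a pairwise cosine-similarity check. It compares ONNX and PyTorch embeddings of the same sentences. Producing the embeddings (running both runtimes) is left out and assumed to happen elsewhere. `outputs_match` is a hypothetical helper name, and the vectors in the usage lines are toy stand-ins, not real BGE-M3 outputs.

```python
import math
from typing import Sequence

THRESHOLD = 0.9999  # acceptance bar quoted in the commit message


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def outputs_match(onnx_vecs, torch_vecs, threshold: float = THRESHOLD) -> bool:
    """True if every ONNX embedding exceeds `threshold` similarity with its PyTorch counterpart."""
    if len(onnx_vecs) != len(torch_vecs):
        raise ValueError("need one ONNX embedding per PyTorch embedding")
    return all(cosine_similarity(o, t) > threshold for o, t in zip(onnx_vecs, torch_vecs))


# Toy vectors for illustration only; real embeddings would be 1024-dimensional.
torch_vec = [0.1, 0.2, 0.3]
onnx_vec = [0.1, 0.2, 0.30001]
print(outputs_match([onnx_vec], [torch_vec]))  # True
```

The check works pair by pair, so a single divergent sentence (e.g. only the Chinese one) is enough to fail it. A looser average across all sentences could hide such a case.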

### Local testing

Tested with Typesense 29.0 via Docker:
- Collection creation
- Document indexing with auto-embedding
- Semantic search (English)
- Cross-lingual semantic search (French query → English results)
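
The test flow above can be sketched as request bodies for Typesense's HTTP API. The schema would be POSTed to `/collections`, and the search parameters sent to that collection's `/documents/search` endpoint. Several names here are assumptions:

- The model name `ts/bge-m3` follows the `ts/` prefix Typesense uses for models from its own model repository. The exact name for this model is an assumption.
- `articles` and `text` are hypothetical collection and field names.

```python
import json

MODEL_NAME = "ts/bge-m3"  # assumption: name under which this model would be referenced


def build_collection_schema(collection: str, text_field: str = "text") -> dict:
    """Collection with a plain text field plus an auto-embedded vector field."""
    return {
        "name": collection,
        "fields": [
            {"name": text_field, "type": "string"},
            {
                "name": "embedding",
                "type": "float[]",
                # Typesense computes this field from `text_field` at indexing time.
                "embed": {
                    "from": [text_field],
                    "model_config": {"model_name": MODEL_NAME},
                },
            },
        ],
    }


def build_search_params(query: str) -> dict:
    """Semantic search: querying the embedded field makes Typesense embed the query too."""
    return {
        "q": query,
        "query_by": "embedding",
        "exclude_fields": "embedding",  # don't return the 1024 floats per hit
    }


print(json.dumps(build_collection_schema("articles"), indent=2))
print(build_search_params("chat noir qui dort"))  # French query, English documents
```

Documents and queries pass through the same multilingual model. That shared vector space is what lets the French query in the last test match English documents.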

### Config

- `model_type`: `xlm_roberta`
- `vocab_file_name`: `sentencepiece.bpe.model`
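
A quick sanity check of the model directory might load `config.json`, confirm the model type, and make sure the referenced vocab file is present. This sketch assumes `vocab_file_name` is resolved relative to the directory holding `config.json`, as the layout in this commit suggests. `check_model_dir` is a hypothetical helper, not part of Typesense, and the empty vocab file in the demo is only a stand-in.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_MODEL_TYPE = "xlm_roberta"


def check_model_dir(model_dir: Path) -> dict:
    """Load config.json and verify the model type and the vocab file it names."""
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    if config.get("model_type") != EXPECTED_MODEL_TYPE:
        raise ValueError(f"unexpected model_type: {config.get('model_type')!r}")
    # Assumption: the vocab file lives next to config.json.
    vocab = model_dir / config["vocab_file_name"]
    if not vocab.is_file():
        raise FileNotFoundError(vocab)
    return config


# Demo against a throwaway directory that mirrors this commit's config.json.
COMMIT_CONFIG = {"model_type": "xlm_roberta", "vocab_file_name": "sentencepiece.bpe.model"}
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "config.json").write_text(json.dumps(COMMIT_CONFIG), encoding="utf-8")
    (demo_dir / "sentencepiece.bpe.model").write_bytes(b"")  # empty stand-in for the real vocab
    demo_config = check_model_dir(demo_dir)
print(demo_config)
```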

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  multilingual-e5-large/model.onnx_data filter=lfs diff=lfs merge=lfs -text
+ bge-m3/model.onnx_data filter=lfs diff=lfs merge=lfs -text
bge-m3/config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "model_type": "xlm_roberta",
+ "vocab_file_name": "sentencepiece.bpe.model"
+ }
bge-m3/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15f047829962475807cda5391506557cb43287380bfb474466227a4e3faf3573
+ size 433457
bge-m3/model.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79d982f5ee30bbbe3d48ca0ab4597144b69891f19523e34af2ee525fba0fbd06
+ size 2266886160
bge-m3/sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+ size 5069051
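
The three binary files are committed as Git LFS pointers. Each pointer has three lines: `version`, `oid sha256:<hex>` and `size <bytes>`. After download, a file can be checked against its pointer by comparing the size and the SHA-256 digest. The sketch below shows that check. The helper names are hypothetical, and the demo hashes a tiny throwaway file rather than the real 5 MB vocab.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Extract the sha256 digest and byte size from a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"sha256": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file's size and SHA-256 digest equal the pointer's."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; skips hashing multi-GB files that are obviously wrong
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


SENTENCEPIECE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
size 5069051
"""
pointer = parse_lfs_pointer(SENTENCEPIECE_POINTER)
print(pointer)

# Self-check on a throwaway file to exercise both the size and digest paths.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    sample_pointer = {"sha256": hashlib.sha256(b"hello").hexdigest(), "size": 5}
    demo_ok = matches_pointer(sample, sample_pointer)
    demo_wrong_size = matches_pointer(sample, {**sample_pointer, "size": 6})
print(demo_ok, demo_wrong_size)  # True False
```

Streaming the file in 1 MiB chunks keeps memory flat. This matters for the 2.27 GB `model.onnx_data` file.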