Upload 4 files
## Add BGE-M3 (BAAI/bge-m3)
- **Model**: [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)
- **Architecture**: XLM-RoBERTa (large), 568M parameters
- **Embedding dimensions**: 1024
- **Max sequence length**: 8192 tokens
- **Languages**: 100+
### Conversion
Converted using `optimum-cli export onnx --model BAAI/bge-m3 --task feature-extraction`.
Validated ONNX output against PyTorch: cosine similarity > 0.9999 across English,
French, and Chinese test sentences.
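
The parity check described above can be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: the helper names `cosine_similarity` and `check_parity` are hypothetical, the vectors are toy data standing in for real 1024-dimensional embeddings, and producing the embeddings with PyTorch and ONNX Runtime is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def check_parity(reference, candidate, threshold=0.9999):
    """Compare paired embeddings; return (passed, worst similarity).

    `reference` would hold the PyTorch embeddings and `candidate` the ONNX
    embeddings for the same sentences, in the same order.
    """
    sims = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    worst = min(sims)
    return worst > threshold, worst


# Toy stand-ins for real model outputs (assumption: illustration only).
pytorch_out = [[0.10, 0.20, 0.30], [0.50, -0.40, 0.10]]
onnx_out = [[0.10, 0.20, 0.3001], [0.50, -0.40, 0.1001]]

passed, worst = check_parity(pytorch_out, onnx_out)
print(passed, round(worst, 6))
```

Reporting the worst-case similarity rather than the mean keeps a single badly converted sentence from being averaged away.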
### Local testing
Tested with Typesense 29.0 via Docker:
- Collection creation
- Document indexing with auto-embedding
- Semantic search (English)
- Cross-lingual semantic search (French query → English results)
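
For reviewers who want to repeat these steps, the request bodies might look roughly like the sketch below. It only builds the JSON payloads and sends nothing. The model name `ts/bge-m3` is an assumption about how the model would be referenced once this PR is merged, and the collection name, field names, and query text are hypothetical.

```python
import json

# Assumption: built-in models are referenced with a "ts/" prefix, so this
# model would be addressed as "ts/bge-m3" after the PR lands.
MODEL_NAME = "ts/bge-m3"


def build_collection_schema(name, text_field, model_name=MODEL_NAME):
    """Collection schema with an auto-embedding field derived from text_field."""
    return {
        "name": name,
        "fields": [
            {"name": text_field, "type": "string"},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": [text_field],
                    "model_config": {"model_name": model_name},
                },
            },
        ],
    }


def build_semantic_search(query, embedding_field="embedding"):
    """Search parameters that query the auto-embedding field."""
    return {"q": query, "query_by": embedding_field}


schema = build_collection_schema("articles", "title")  # hypothetical names
search = build_semantic_search("énergie renouvelable")  # French query

print(json.dumps(schema, indent=2))
print(json.dumps(search, ensure_ascii=False))
```

The schema would be sent with `POST /collections` and the search parameters with `GET /collections/articles/documents/search`; documents indexed with only a `title` get their `embedding` generated by the server.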
### Config
model_type: xlm_roberta
vocab_file_name: sentencepiece.bpe.model
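
A quick consistency check of a model directory against this config could look like the sketch below. The helper `check_model_dir` is hypothetical, and treating both keys as required is an assumption drawn from the config in this PR, not from a documented schema. The demo runs against a temporary directory with empty placeholder files.

```python
import json
import os
import tempfile

# Assumption: both keys from this PR's config.json are treated as required.
REQUIRED_KEYS = ("model_type", "vocab_file_name")


def check_model_dir(model_dir):
    """Return a list of problems found in model_dir (empty list if none)."""
    config_path = os.path.join(model_dir, "config.json")
    if not os.path.isfile(config_path):
        return ["missing config.json"]

    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)

    problems = [f"config.json lacks '{key}'" for key in REQUIRED_KEYS if key not in config]

    # The vocab file named in the config must sit next to it.
    vocab = config.get("vocab_file_name")
    if vocab and not os.path.isfile(os.path.join(model_dir, vocab)):
        problems.append(f"vocab file '{vocab}' not found")

    if not os.path.isfile(os.path.join(model_dir, "model.onnx")):
        problems.append("missing model.onnx")
    return problems


# Demo: a temporary directory laid out like bge-m3/ with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "config.json"), "w", encoding="utf-8") as fh:
        json.dump(
            {"model_type": "xlm_roberta", "vocab_file_name": "sentencepiece.bpe.model"},
            fh,
        )
    for filename in ("model.onnx", "sentencepiece.bpe.model"):
        open(os.path.join(tmp, filename), "wb").close()
    demo_problems = check_model_dir(tmp)

print(demo_problems)  # prints []
```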
- .gitattributes +1 -0
- bge-m3/config.json +4 -0
- bge-m3/model.onnx +3 -0
- bge-m3/model.onnx_data +3 -0
- bge-m3/sentencepiece.bpe.model +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 multilingual-e5-large/model.onnx_data filter=lfs diff=lfs merge=lfs -text
+bge-m3/model.onnx_data filter=lfs diff=lfs merge=lfs -text
bge-m3/config.json
ADDED
@@ -0,0 +1,4 @@
+{
+"model_type": "xlm_roberta",
+"vocab_file_name": "sentencepiece.bpe.model"
+}
bge-m3/model.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:15f047829962475807cda5391506557cb43287380bfb474466227a4e3faf3573
+size 433457
bge-m3/model.onnx_data
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:79d982f5ee30bbbe3d48ca0ab4597144b69891f19523e34af2ee525fba0fbd06
+size 2266886160
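
As a sanity check on the pointer above, its `size` field can be read back and compared with the stated parameter count. This is a rough sketch: `parse_lfs_pointer` is a hypothetical helper, and the estimate assumes float32 weights (4 bytes each) and that the external data file consists almost entirely of weights.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:79d982f5ee30bbbe3d48ca0ab4597144b69891f19523e34af2ee525fba0fbd06
size 2266886160
"""

info = parse_lfs_pointer(pointer)

# Assumption: weights are stored as float32, i.e. 4 bytes per value.
approx_values = info["size"] / 4
print(f"{info['size'] / 1e9:.2f} GB, ~{approx_values / 1e6:.0f}M float32 values")
# prints: 2.27 GB, ~567M float32 values
```

Under those assumptions the file size is in line with the 568M parameters listed in the description.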
bge-m3/sentencepiece.bpe.model
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+size 5069051