---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
- transformers.js
- sentence-transformers
- onnx
- feature-extraction
- sentence-similarity
- mteb
- xlm-roberta
- e5
- multilingual
language:
- multilingual
license: mit
---

# multilingual-e5-large-instruct (ONNX)

ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) with fp16 and int8 quantized variants.

Compatible with both [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) (JavaScript) and [`sentence-transformers`](https://www.sbert.net/) (Python).

## Available Models

| File | Format | Size | Description |
|------|--------|------|-------------|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |

## Usage with Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "lmo3/multilingual-e5-large-instruct",
  { dtype: "fp16" } // or "q8" for int8; omit for fp32
);

// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });

// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```
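
Because the embeddings are L2-normalized (`normalize: true`), cosine similarity between a query and a document is effectively a dot product. A minimal, dependency-free sketch — the `.data` access in the comment assumes the typed array exposed by Transformers.js tensors:

```javascript
// Cosine similarity of two equal-length vectors. For L2-normalized
// embeddings the denominator is ~1, so this reduces to a dot product.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// With the pipeline above: cosineSimilarity(queryEmbedding.data, docEmbedding.data)
console.log(cosineSimilarity([1, 0, 0], [0.6, 0.8, 0]));
```

Scores near 1 indicate high semantic similarity; for E5 models, relevant query/document pairs typically score well above unrelated ones.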

## Usage with sentence-transformers (Python)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lmo3/multilingual-e5-large-instruct")

# Queries use the instruct format
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]

query_embeddings = model.encode(queries)
doc_embeddings = model.encode(docs)
```

## Key Differences from Base E5

This is the **instruct** variant of multilingual-e5-large. The key difference:

- **Queries** must be prefixed with `Instruct: <task description>\nQuery: `
- **Documents** are embedded as-is, with no prefix

The instruction tells the model which retrieval task you are performing, improving embedding quality. See the [original model card](https://huggingface.co/intfloat/multilingual-e5-large-instruct) for task-specific instructions and benchmark results.
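
The query prefix is easy to mistype by hand; a small helper (a hypothetical name here, mirroring the `get_detailed_instruct` helper in the upstream model card) keeps the format consistent:

```javascript
// Build an E5-instruct query: "Instruct: <task>\nQuery: <text>".
// Documents are embedded without any prefix, so no helper is needed for them.
function buildInstructQuery(taskDescription, query) {
  return `Instruct: ${taskDescription}\nQuery: ${query}`;
}

const query = buildInstructQuery(
  "Retrieve semantically similar text.",
  "How is the weather today?"
);
// query === "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"
```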

## Export Details

- Exported via [Optimum](https://huggingface.co/docs/optimum) with ONNX opset 18
- Converted to fp16 via `onnxruntime.transformers.optimizer`
- Quantized to int8 via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling

## Original Model

This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):

- **Architecture**: XLM-RoBERTa Large (24 layers, 1024 hidden dim, 16 attention heads)
- **Embedding dimension**: 1024
- **Languages**: 100+
- **License**: MIT