---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
- transformers.js
- sentence-transformers
- onnx
- feature-extraction
- sentence-similarity
- mteb
- xlm-roberta
- e5
- multilingual
language:
- multilingual
license: mit
---
# multilingual-e5-large-instruct (ONNX)
ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
with fp16 and int8 quantized variants.
Compatible with both [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) (JavaScript) and
[`sentence-transformers`](https://www.sbert.net/) (Python).
## Available Models
| File | Format | Size | Description |
|------|--------|------|-------------|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |
## Usage with Transformers.js
```javascript
import { pipeline } from "@huggingface/transformers";
const extractor = await pipeline(
"feature-extraction",
"lmo3/multilingual-e5-large-instruct",
{ dtype: "fp16" } // or "q8" for int8, omit for fp32
);
// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });
// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```
## Usage with sentence-transformers (Python)
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("lmo3/multilingual-e5-large-instruct")
# Queries use the instruct format
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]
query_embeddings = model.encode(queries, normalize_embeddings=True)
doc_embeddings = model.encode(docs, normalize_embeddings=True)
```
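To rank documents against a query, compare embeddings by cosine similarity; since the vectors are L2-normalized, this reduces to a plain matrix product. A minimal sketch with toy stand-in vectors (real embeddings come from `extractor(...)` or `model.encode(...)` above):

```python
import numpy as np

# Toy stand-in embeddings; in practice these come from the model.
query_embeddings = np.array([[0.6, 0.8]])
doc_embeddings = np.array([[1.0, 0.0], [0.6, 0.8]])

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Normalize each row to unit length so cosine similarity = dot product."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# One row of similarity scores per query, one column per document.
scores = l2_normalize(query_embeddings) @ l2_normalize(doc_embeddings).T
print(scores)  # higher = more similar
```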
## Key Differences from Base E5
This is the **instruct** variant of multilingual-e5-large. The key difference is in how inputs are formatted:
- **Queries** must be prefixed with `Instruct: <task description>\nQuery: `
- **Documents** are embedded as-is, with no prefix
The instruction tells the model what retrieval task you're performing, improving embedding quality.
See the [original model card](https://huggingface.co/intfloat/multilingual-e5-large-instruct) for task-specific instructions and benchmark results.
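Building the query prefix by hand is error-prone; the original model card defines a small helper for it, along these lines (the task string below is one example from that card):

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    # Queries get the "Instruct: ...\nQuery: ..." prefix; documents do not.
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
print(get_detailed_instruct(task, "How is the weather today?"))
```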
## Export Details
- Exported via [Optimum](https://huggingface.co/docs/optimum) with ONNX opset 18
- fp16 quantized via `onnxruntime.transformers.optimizer`
- int8 quantized via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling
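For reference, the fp16 and int8 steps can be reproduced roughly as follows. This is a sketch, not the exact export script used for this repo, and it assumes the fp32 export already exists at `onnx/model.onnx` (so it needs the full ~2.1 GB model on disk to run):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic
from onnxruntime.transformers import optimizer

# fp16: optimize the BERT-style graph, then cast weights/ops to float16
opt = optimizer.optimize_model("onnx/model.onnx", model_type="bert")
opt.convert_float_to_float16()
opt.save_model_to_file("onnx/model_fp16.onnx")

# int8: dynamic (weight-only) quantization, no calibration data needed
quantize_dynamic(
    "onnx/model.onnx",
    "onnx/model_quantized.onnx",
    weight_type=QuantType.QInt8,
)
```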
## Original Model
This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):
- **Architecture**: XLM-RoBERTa Large (24 layers, 1024 hidden, 16 heads)
- **Embedding dimension**: 1024
- **Languages**: 100+ languages
- **License**: MIT