---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
  - transformers.js
  - sentence-transformers
  - onnx
  - feature-extraction
  - sentence-similarity
  - mteb
  - xlm-roberta
  - e5
  - multilingual
language:
  - multilingual
license: mit
---

# multilingual-e5-large-instruct (ONNX)

ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
with fp16 and int8 quantized variants.

Compatible with both [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) (JavaScript) and
[`sentence-transformers`](https://www.sbert.net/) (Python).

## Available Models

| File | Format | Size | Description |
|------|--------|------|-------------|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |

## Usage with Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "lmo3/multilingual-e5-large-instruct",
  { dtype: "fp16" } // or "q8" for int8, omit for fp32
);

// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });

// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```

## Usage with sentence-transformers (Python)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lmo3/multilingual-e5-large-instruct")

# Queries use the instruct format
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]

# Normalize, matching the `normalize: true` used in the Transformers.js example
query_embeddings = model.encode(queries, normalize_embeddings=True)
doc_embeddings = model.encode(docs, normalize_embeddings=True)
```
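Since the Transformers.js example above requests normalized embeddings, scoring reduces to a dot product. A minimal sketch of the similarity computation (the toy 4-dim vectors below are illustrative stand-ins for the model's 1024-dim output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity; for L2-normalized embeddings this equals a plain dot product.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dim stand-ins for real 1024-dim query/document embeddings.
q = np.array([0.5, 0.5, 0.5, 0.5])
d = np.array([0.5, 0.5, 0.5, 0.5])
print(cosine_similarity(q, d))  # identical vectors → 1.0
```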

## Key Differences from Base E5

This is the **instruct** variant of multilingual-e5-large. The key difference:

- **Queries** must be prefixed with `Instruct: <task description>\nQuery: `
- **Documents** are embedded as-is, with no prefix

The instruction tells the model what retrieval task you're performing, improving embedding quality.
See the [original model card](https://huggingface.co/intfloat/multilingual-e5-large-instruct) for task-specific instructions and benchmark results.
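The original model card builds query strings with a small helper following this convention, which can be sketched as:

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    # E5-instruct query format: task instruction, newline, raw query.
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Retrieve semantically similar text."
print(get_detailed_instruct(task, "How is the weather today?"))
# Instruct: Retrieve semantically similar text.
# Query: How is the weather today?
```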

## Export Details

- Exported via [Optimum](https://huggingface.co/docs/optimum) with ONNX opset 18
- fp16 quantized via `onnxruntime.transformers.optimizer`
- int8 quantized via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling

## Original Model

This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):

- **Architecture**: XLM-RoBERTa Large (24 layers, 1024 hidden, 16 heads)
- **Embedding dimension**: 1024
- **Languages**: 100+ languages
- **License**: MIT