---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
- transformers.js
- sentence-transformers
- onnx
- feature-extraction
- sentence-similarity
- mteb
- xlm-roberta
- e5
- multilingual
language:
- multilingual
license: mit
---
# multilingual-e5-large-instruct (ONNX)

ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) with fp16 and int8 quantized variants. Compatible with both `@huggingface/transformers` (JavaScript) and `sentence-transformers` (Python).
## Available Models

| File | Format | Size | Description |
|---|---|---|---|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |
## Usage with Transformers.js

```js
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "lmo3/multilingual-e5-large-instruct",
  { dtype: "fp16" } // or "q8" for int8; omit for fp32
);

// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });

// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```
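Since both embeddings above are L2-normalized (`normalize: true`), cosine similarity reduces to a plain dot product. A minimal sketch, assuming the Transformers.js output tensors expose their values as a flat `data` array (plain arrays are used here for illustration):

```js
// Cosine similarity of two L2-normalized vectors is just their dot product.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// With the pipeline outputs above: dot(queryEmbedding.data, docEmbedding.data)
const score = dot([0.6, 0.8], [0.8, 0.6]);
console.log(score); // 0.96
```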
## Usage with sentence-transformers (Python)

```python
from sentence_transformers import SentenceTransformer

# This repository ships ONNX weights only, so select the ONNX backend
model = SentenceTransformer("lmo3/multilingual-e5-large-instruct", backend="onnx")

# Queries use the instruct format
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]

query_embeddings = model.encode(queries)
doc_embeddings = model.encode(docs)
```
## Key Differences from Base E5

This is the instruct variant of multilingual-e5-large. The key differences:

- Queries must be prefixed with `Instruct: <task description>\nQuery: `
- Documents are embedded as-is, with no prefix
The instruction tells the model what retrieval task you're performing, improving embedding quality. See the original model card for task-specific instructions and benchmark results.
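The prefix convention above is easy to get subtly wrong (the newline matters), so it can help to centralize it in a small helper. A sketch; the name `buildInstructQuery` is illustrative, not part of the model or any library:

```js
// Builds a query string in the instruct format described above:
// "Instruct: <task description>\nQuery: <query>"
function buildInstructQuery(task, query) {
  return `Instruct: ${task}\nQuery: ${query}`;
}

const q = buildInstructQuery(
  "Retrieve semantically similar text.",
  "How is the weather today?"
);
console.log(q);
// Instruct: Retrieve semantically similar text.
// Query: How is the weather today?
```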
## Export Details

- Exported via Optimum with ONNX opset 18
- fp16 conversion via `onnxruntime.transformers.optimizer`
- int8 quantization via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling
## Original Model

This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):
- Architecture: XLM-RoBERTa Large (24 layers, 1024 hidden, 16 heads)
- Embedding dimension: 1024
- Languages: 100+ languages
- License: MIT