---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
- transformers.js
- sentence-transformers
- onnx
- feature-extraction
- sentence-similarity
- mteb
- xlm-roberta
- e5
- multilingual
language:
- multilingual
license: mit
---

# multilingual-e5-large-instruct (ONNX)

ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) with fp16 and int8 quantized variants. Compatible with both [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) (JavaScript) and [`sentence-transformers`](https://www.sbert.net/) (Python).

## Available Models

| File | Format | Size | Description |
|------|--------|------|-------------|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |

To load a specific variant from Python, see the selection sketch at the end of this card.

## Usage with Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "lmo3/multilingual-e5-large-instruct",
  { dtype: "fp16" } // or "q8" (int8), or "fp32" (full precision)
);

// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });

// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```

## Usage with sentence-transformers (Python)

```python
from sentence_transformers import SentenceTransformer

# This repository ships ONNX weights, so select the ONNX backend
# (requires sentence-transformers >= 3.2: pip install "sentence-transformers[onnx]")
model = SentenceTransformer("lmo3/multilingual-e5-large-instruct", backend="onnx")

# Queries use the instruct format; documents are embedded as-is
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]

query_embeddings = model.encode(queries, normalize_embeddings=True)
doc_embeddings = model.encode(docs, normalize_embeddings=True)
```

## Key Differences from Base E5

This is the **instruct** variant of multilingual-e5-large. The key difference:

- **Queries** must be prefixed with the template `Instruct: {task_description}\nQuery: {query}`
- **Documents** are embedded as-is, with no prefix

The instruction tells the model which retrieval task you are performing, which improves embedding quality. See the [original model card](https://huggingface.co/intfloat/multilingual-e5-large-instruct) for task-specific instructions and benchmark results, and the end-to-end example at the end of this card for a runnable sketch.

## Export Details

- Exported via [Optimum](https://huggingface.co/docs/optimum) with ONNX opset 18
- fp16 converted via `onnxruntime.transformers.optimizer`
- int8 quantized via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling

A rough sketch of these steps appears at the end of this card.

## Original Model

This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):

- **Architecture**: XLM-RoBERTa Large (24 layers, 1024 hidden size, 16 attention heads)
- **Embedding dimension**: 1024
- **Languages**: 100+ languages
- **License**: MIT
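
## End-to-End Example (Python)

A minimal sketch tying the pieces above together, assuming `sentence-transformers >= 3.2` with the ONNX extras installed. The `get_detailed_instruct` helper mirrors the one on the upstream model card; the task description and texts are illustrative.

```python
from sentence_transformers import SentenceTransformer

def get_detailed_instruct(task_description: str, query: str) -> str:
    # Upstream instruct template: instruction + query; documents get no prefix.
    return f"Instruct: {task_description}\nQuery: {query}"

model = SentenceTransformer("lmo3/multilingual-e5-large-instruct", backend="onnx")

task = "Retrieve semantically similar text."
queries = [get_detailed_instruct(task, "How is the weather today?")]
docs = ["It is sunny outside", "Il pleut des cordes aujourd'hui"]

# With L2-normalized embeddings, the dot product equals cosine similarity.
query_emb = model.encode(queries, normalize_embeddings=True)
doc_emb = model.encode(docs, normalize_embeddings=True)
print(query_emb @ doc_emb.T)  # shape (1, 2): query similarity to each document
```

Both documents should score meaningfully against the English query despite the second being French, which is the point of the shared multilingual embedding space.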
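
## Selecting a Quantized Variant (Python)

By default the ONNX backend loads the fp32 export. To pick one of the smaller files from the table above, a `file_name` can be passed through `model_kwargs`; a sketch, assuming the `file_name` argument of sentence-transformers' ONNX backend:

```python
from sentence_transformers import SentenceTransformer

# Load the fp16 variant (1.0 GB) instead of the default fp32 export (2.1 GB).
model = SentenceTransformer(
    "lmo3/multilingual-e5-large-instruct",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_fp16.onnx"},
)
```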
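
## Reproducing the Export (Sketch)

The steps under "Export Details" roughly correspond to the code below. The exact arguments used for this repo are not recorded here, so treat paths and optimizer settings (`model_type`, head/hidden sizes) as illustrative assumptions.

```python
from optimum.exporters.onnx import main_export
from onnxruntime.transformers import optimizer
from onnxruntime.quantization import QuantType, quantize_dynamic

# 1) fp32 export with Optimum at opset 18 (writes model.onnx + model.onnx_data).
main_export(
    "intfloat/multilingual-e5-large-instruct",
    output="onnx",
    task="feature-extraction",
    opset=18,
)

# 2) fp16: graph optimization, then float32 -> float16 conversion.
opt = optimizer.optimize_model(
    "onnx/model.onnx", model_type="bert", num_heads=16, hidden_size=1024
)
opt.convert_float_to_float16()
opt.save_model_to_file("onnx/model_fp16.onnx")

# 3) int8: dynamic quantization of the fp32 graph.
quantize_dynamic(
    "onnx/model.onnx",
    "onnx/model_quantized.onnx",
    weight_type=QuantType.QInt8,
)
```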