---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
  - transformers.js
  - sentence-transformers
  - onnx
  - feature-extraction
  - sentence-similarity
  - mteb
  - xlm-roberta
  - e5
  - multilingual
language:
  - multilingual
license: mit
---

# multilingual-e5-large-instruct (ONNX)

ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct), with fp16 and int8 quantized variants.

Compatible with both `@huggingface/transformers` (JavaScript) and `sentence-transformers` (Python).

## Available Models

| File | Format | Size | Description |
|------|--------|------|-------------|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |

## Usage with Transformers.js

```js
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "lmo3/multilingual-e5-large-instruct",
  { dtype: "fp16" } // or "q8" for int8, omit for fp32
);

// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });

// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```

## Usage with sentence-transformers (Python)

```python
from sentence_transformers import SentenceTransformer

# backend="onnx" loads the ONNX weights shipped in this repository
model = SentenceTransformer("lmo3/multilingual-e5-large-instruct", backend="onnx")

# Queries use the instruct format
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]

query_embeddings = model.encode(queries, normalize_embeddings=True)
doc_embeddings = model.encode(docs, normalize_embeddings=True)
```
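Once you have query and document embeddings, ranking reduces to cosine similarity. A minimal NumPy sketch — the tiny 3-dimensional vectors below are stand-ins for the model's 1024-dimensional output:

```python
import numpy as np

# Stand-in vectors; real embeddings from this model are 1024-dimensional.
query_embeddings = np.array([[0.1, 0.3, 0.5]])
doc_embeddings = np.array([[0.2, 0.1, 0.4], [0.9, -0.2, 0.0]])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity: one row per query, one column per document."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

scores = cosine_similarity(query_embeddings, doc_embeddings)
print(scores.shape)  # (n_queries, n_docs)
```

If you encode with `normalize_embeddings=True`, the vectors are already unit-length and a plain dot product (`query_embeddings @ doc_embeddings.T`) gives the same scores.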

## Key Differences from Base E5

This is the instruct variant of multilingual-e5-large. The key difference:

- Queries must be prefixed with `Instruct: <task description>\nQuery: `
- Documents are embedded as-is, with no prefix

The instruction tells the model what retrieval task you're performing, improving embedding quality. See the original model card for task-specific instructions and benchmark results.
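The upstream model card expresses this prefixing with a small helper; a sketch of the same pattern:

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    """Build the instruct-format query string expected by E5 instruct models."""
    return f"Instruct: {task_description}\nQuery: {query}"

# Documents get no prefix; only queries are wrapped.
query = get_detailed_instruct(
    "Retrieve semantically similar text.",
    "How is the weather today?",
)
print(query)
# Instruct: Retrieve semantically similar text.
# Query: How is the weather today?
```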

## Export Details

- Exported via Optimum with ONNX opset 18
- fp16 conversion via `onnxruntime.transformers.optimizer`
- int8 quantization via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling

## Original Model

This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):

- Architecture: XLM-RoBERTa Large (24 layers, 1024 hidden size, 16 attention heads)
- Embedding dimension: 1024
- Languages: 100+
- License: MIT