
multilingual-e5-large (ONNX INT8)

Quantized ONNX version of intfloat/multilingual-e5-large for CPU inference.

Model Details

  • Base model: intfloat/multilingual-e5-large (560M params, XLM-RoBERTa based)
  • Format: ONNX with dynamic INT8 quantization (AVX512 VNNI optimized)
  • Embedding dimension: 1024
  • Max sequence length: 512 tokens
  • Languages: 100+ including Danish, English, German, French, etc.

Usage

Following the E5 convention, documents must be prefixed with "passage: " and search queries with "query: ".

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("thomasbeste/multilingual-e5-large-onnx-int8")
model = ORTModelForFeatureExtraction.from_pretrained("thomasbeste/multilingual-e5-large-onnx-int8")

inputs = tokenizer("passage: Your text here", return_tensors="np", padding=True, truncation=True, max_length=512)
outputs = model(**inputs)

# Mask-aware mean pooling: exclude padding tokens from the average
mask = inputs["attention_mask"][..., None]
embedding = (outputs.last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)  # L2 normalize
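For batched inputs, the pooling and normalization steps can be factored into a small helper that works directly on the model's output arrays. A minimal NumPy sketch (the toy arrays below are stand-ins for `last_hidden_state` and `attention_mask`; 1024 matches this model's embedding dimension):

```python
import numpy as np

def masked_mean_pool(last_hidden_state, attention_mask):
    """Mean-pool token vectors (padding excluded), then L2-normalize each row."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    pooled = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Toy stand-ins: batch of 2 sequences, 4 token positions, 1024-dim vectors
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 4, 1024)).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # second sequence is shorter

emb = masked_mean_pool(hidden, mask)
print(emb.shape)  # (2, 1024); every row has unit L2 norm

# With unit-norm embeddings, cosine similarity reduces to a dot product,
# so ranking passages against a query is a single matrix multiply:
scores = emb @ emb.T
```

Because the rows are unit-normalized, `scores[i, j]` is the cosine similarity between embeddings `i` and `j`, which is how query/passage relevance is typically scored with this model.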

License

Same as base model: MIT
