# multilingual-e5-large – ONNX INT8
Quantized ONNX version of intfloat/multilingual-e5-large for CPU inference.
## Model Details
- Base model: intfloat/multilingual-e5-large (560M params, XLM-RoBERTa based)
- Format: ONNX with dynamic INT8 quantization (AVX512 VNNI optimized)
- Embedding dimension: 1024
- Max sequence length: 512 tokens
- Languages: 100+ including Danish, English, German, French, etc.
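Dynamic INT8 quantization stores each weight tensor as 8-bit integers plus a floating-point scale, and dequantizes on the fly during inference. The arithmetic can be sketched as follows; this is an illustrative NumPy round trip, not the actual ONNX Runtime kernels:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor scale: map the largest magnitude to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

# Dummy weight matrix standing in for a real model tensor.
w = np.random.default_rng(0).normal(scale=0.02, size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # rounding error is bounded by scale / 2
```

The INT8 weights cut the model size roughly 4x versus FP32 and let VNNI-capable CPUs use fast integer matrix instructions, at the cost of the small rounding error above.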
## Usage
Prefix documents with `"passage: "` and search queries with `"query: "`, per the E5 convention; embeddings degrade noticeably without the prefixes.
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("thomasbeste/multilingual-e5-large-onnx-int8")
model = ORTModelForFeatureExtraction.from_pretrained("thomasbeste/multilingual-e5-large-onnx-int8")

inputs = tokenizer("passage: Your text here", return_tensors="np", padding=True, truncation=True)
outputs = model(**inputs)

embedding = outputs.last_hidden_state.mean(axis=1)  # mean pooling over tokens
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)  # L2-normalize each row
```
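For batched inputs of different lengths, a plain mean over the token axis averages in padding vectors. A mask-aware pooling plus cosine-similarity sketch, using dummy NumPy arrays in place of real model outputs:

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over real tokens only.
    mask = attention_mask[..., np.newaxis].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)
    return summed / counts

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Dummy batch: 2 sequences, 4 tokens, dim 8 (stands in for last_hidden_state).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])  # first sequence has one pad token

emb = l2_normalize(mean_pool(hidden, mask))
sim = emb @ emb.T  # cosine similarity, since rows are unit-norm
```

With real model outputs, pass `outputs.last_hidden_state` and `inputs["attention_mask"]` to `mean_pool`; ranking passages by cosine similarity against a `"query: "`-prefixed embedding gives the usual E5 retrieval setup.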
## License
Same as base model: MIT