Paper: Multilingual E5 Text Embeddings: A Technical Report (arXiv:2402.05672)
This is a CoreML conversion of intfloat/multilingual-e5-small for iOS/macOS deployment.
Multilingual E5 Small is a multilingual sentence embedding model optimized for semantic search and retrieval tasks. This CoreML version enables on-device inference on Apple platforms.
| Property | Value |
|---|---|
| Base Model | intfloat/multilingual-e5-small |
| Embedding Dimensions | 384 |
| Max Sequence Length | 256 (configurable up to 512) |
| Model Size | ~224 MB |
| Precision | Float16 |
| Minimum iOS | 17.0 |
| Minimum macOS | 14.0 |
E5 models use prefixes to distinguish between queries and documents:
- `"query: your search query here"` — for search queries
- `"passage: your document text here"` — for documents to be indexed

```swift
import CoreML

// Load the model, preferring the Neural Engine
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Prepare inputs (after tokenization): shape [1, 256], Int32
let inputIds = try MLMultiArray(shape: [1, 256], dataType: .int32)
let attentionMask = try MLMultiArray(shape: [1, 256], dataType: .int32)
// ... fill inputIds with token IDs and attentionMask with 1s (tokens) / 0s (padding) ...

// Run inference
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputIds),
    "attention_mask": MLFeatureValue(multiArray: attentionMask)
])
let output = try model.prediction(from: input)
let embeddings = output.featureValue(for: "embeddings")?.multiArrayValue
```
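E5 sentence embeddings are conventionally obtained by mean pooling the token representations under the attention mask and L2-normalizing the result. If your converted model emits per-token states rather than pooled vectors, the pooling can be done on the CPU; a minimal pure-Python sketch of that recipe (function names are illustrative, not part of the model package):

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average token vectors where attention_mask == 1, ignoring padding."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    count = 0
    for vec, mask in zip(token_vectors, attention_mask):
        if mask == 1:
            count += 1
            for i, v in enumerate(vec):
                summed[i] += v
    return [s / count for s in summed]

def l2_normalize(vec):
    """Scale a vector to unit length, so cosine similarity is a plain dot product."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

# Two attended tokens; the third is padding and is ignored
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)       # -> [2.0, 3.0]
embedding = l2_normalize(pooled)       # unit-length vector
```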
Use the included `tokenizer.json` with swift-transformers:

```swift
import Tokenizers

let tokenizer = try await AutoTokenizer.from(modelFolder: tokenizerURL)
let encoded = tokenizer.encode(text: "query: your text")
```
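Because the converted model expects fixed-shape `(1, 256)` inputs, token IDs from the tokenizer must be truncated or padded to exactly 256 before inference. A pure-Python sketch; the pad ID of 1 matches XLM-RoBERTa-style tokenizers, but verify it against the included `tokenizer_config.json`:

```python
def pad_to_length(token_ids, max_len=256, pad_id=1):
    """Truncate/pad token IDs to max_len and build the matching attention mask."""
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)                 # 1 for real tokens
    ids += [pad_id] * (max_len - len(ids))
    mask += [0] * (max_len - len(mask))   # 0 for padding
    return ids, mask

ids, mask = pad_to_length([0, 41, 88, 2], max_len=8)
# ids  -> [0, 41, 88, 2, 1, 1, 1, 1]
# mask -> [1, 1, 1, 1, 0, 0, 0, 0]
```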
Tested on Apple devices with the Neural Engine enabled:
| Device | Inference Time |
|---|---|
| iPhone 15 Pro | ~15ms |
| iPhone 13 | ~25ms |
| M1 Mac | ~10ms |
Tested with 10 mixed Japanese/English technical queries:
| Model | Accuracy | Avg Score |
|---|---|---|
| Apple NLEmbedding | 20% | 0.558 |
| This Model (E5) | 100% | 0.860 |
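The scores above are typically cosine similarities between query and passage embeddings; for L2-normalized embeddings this reduces to a dot product. A minimal sketch of the general form:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: dot product over the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 5.0]))  # 0.0
```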
The package contains:

- `MultilingualE5Small.mlpackage/` - CoreML model package
- `tokenizer.json` - Tokenizer vocabulary and configuration
- `tokenizer_config.json` - Tokenizer settings

Converted using coremltools with FP16 precision:
```python
import coremltools as ct
import numpy as np

mlmodel = ct.convert(
    traced_model,  # a TorchScript trace of the embedding model
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 256), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 256), dtype=np.int32),
    ],
    outputs=[
        ct.TensorType(name="embeddings", dtype=np.float16),
    ],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS17,
    compute_precision=ct.precision.FLOAT16,
)
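The `traced_model` passed to `ct.convert` must be a TorchScript trace. One way to produce it is to wrap the encoder in a module that emits pooled, normalized embeddings and run `torch.jit.trace` on it. The sketch below uses a tiny stand-in encoder so it is self-contained; in the real conversion you would substitute the Hugging Face model (e.g. `AutoModel.from_pretrained("intfloat/multilingual-e5-small")`, taking `.last_hidden_state` from its output):

```python
import torch

class TinyEncoder(torch.nn.Module):
    """Stand-in for the real encoder: returns fake (batch, seq, dim) hidden states."""
    def forward(self, input_ids, attention_mask):
        return input_ids.unsqueeze(-1).float().expand(-1, -1, 4)

class EmbeddingWrapper(torch.nn.Module):
    """Mean-pool token states under the attention mask, then L2-normalize."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return torch.nn.functional.normalize(pooled, p=2, dim=1)

wrapper = EmbeddingWrapper(TinyEncoder()).eval()
example_ids = torch.ones((1, 256), dtype=torch.int32)
example_mask = torch.ones((1, 256), dtype=torch.int32)
traced_model = torch.jit.trace(wrapper, (example_ids, example_mask))
```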
MIT License (same as the base model)
```bibtex
@article{wang2024multilingual,
  title={Multilingual E5 Text Embeddings: A Technical Report},
  author={Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu},
  journal={arXiv preprint arXiv:2402.05672},
  year={2024}
}
```