---
license: mit
language:
- multilingual
- en
- ja
- zh
- ko
- de
- fr
- es
- it
- pt
- ru
library_name: coreml
tags:
- sentence-transformers
- embeddings
- coreml
- ios
- macos
- multilingual
- e5
- semantic-search
base_model: intfloat/multilingual-e5-small
pipeline_tag: sentence-similarity
---
# Multilingual E5 Small - CoreML

This is a CoreML conversion of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) for iOS/macOS deployment.
## Model Description
Multilingual E5 Small is a multilingual sentence embedding model optimized for semantic search and retrieval tasks. This CoreML version enables on-device inference on Apple platforms.
## Key Features
- Multilingual: Supports 100+ languages including English, Japanese, Chinese, Korean, German, French, Spanish, and more
- Search-optimized: Designed specifically for retrieval tasks
- Cross-lingual: Can match queries in one language to documents in another
- On-device: Runs locally on iPhone/iPad/Mac without internet
## Model Details
| Property | Value |
|---|---|
| Base Model | intfloat/multilingual-e5-small |
| Embedding Dimensions | 384 |
| Max Sequence Length | 256 (configurable up to 512) |
| Model Size | ~224 MB |
| Precision | Float16 |
| Minimum iOS | 17.0 |
| Minimum macOS | 14.0 |
## Usage

### Input Format

E5 models use text prefixes to distinguish queries from documents:

- Query: `query: your search query here`
- Document: `passage: your document text here`
### Swift Example

```swift
import CoreML

// Load the model, preferring the Neural Engine
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Prepare inputs (after tokenization); both arrays have shape (1, 256)
let inputIds: MLMultiArray = tokenizedInputIds          // token IDs from your tokenizer
let attentionMask: MLMultiArray = tokenizedAttentionMask // 1 for real tokens, 0 for padding

// Run inference
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputIds),
    "attention_mask": MLFeatureValue(multiArray: attentionMask)
])
let output = try model.prediction(from: input)
let embeddings = output.featureValue(for: "embeddings")?.multiArrayValue
```
### Tokenizer

Use the included `tokenizer.json` with [swift-transformers](https://github.com/huggingface/swift-transformers):

```swift
import Tokenizers

let tokenizer = try await AutoTokenizer.from(modelFolder: tokenizerURL)
let encoded = tokenizer.encode(text: "query: your text")
```
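Because the converted model has a fixed `(1, 256)` input shape, the token IDs returned by the tokenizer must be truncated or padded to exactly 256 positions, with a matching attention mask. A sketch of that step, shown in Python for brevity (the same logic ports directly to Swift); `pad_id=1` assumes the XLM-RoBERTa pad token used by the base model, so verify it against the bundled `tokenizer_config.json`:

```python
def pad_to_length(token_ids, max_len=256, pad_id=1):
    """Truncate/pad token IDs to a fixed length and build the attention mask.

    Returns (ids, mask), each of length max_len; mask is 1 for real
    tokens and 0 for padding positions.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return ids, mask
```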
## Performance

Approximate single-inference latency with the Neural Engine enabled:
| Device | Inference Time |
|---|---|
| iPhone 15 Pro | ~15ms |
| iPhone 13 | ~25ms |
| M1 Mac | ~10ms |
## Accuracy Comparison
Tested with 10 mixed Japanese/English technical queries:
| Model | Accuracy | Avg Score |
|---|---|---|
| Apple NLEmbedding | 20% | 0.558 |
| This Model (E5) | 100% | 0.860 |
## Files

- `MultilingualE5Small.mlpackage/` - CoreML model package
- `tokenizer.json` - Tokenizer vocabulary and configuration
- `tokenizer_config.json` - Tokenizer settings
## Conversion

Converted using coremltools with FP16 precision:

```python
import coremltools as ct
import numpy as np

mlmodel = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 256), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 256), dtype=np.int32),
    ],
    outputs=[
        ct.TensorType(name="embeddings", dtype=np.float16),
    ],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS17,
    compute_precision=ct.precision.FLOAT16,
)
```
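The `traced_model` above is not defined in the snippet. One plausible way to produce it is to wrap the Hugging Face encoder with mean pooling and L2 normalization, then `torch.jit.trace` it with fixed-shape dummy inputs; the wrapper below is a hedged sketch of that pattern, not necessarily the exact code used for this conversion:

```python
import torch
import torch.nn as nn

class PooledE5(nn.Module):
    """Wraps an encoder; mean-pools token states into one normalized embedding."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mask out padding positions, then average over the sequence axis
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return torch.nn.functional.normalize(pooled, p=2, dim=1)

# Hypothetical tracing step (requires downloading the base model):
# from transformers import AutoModel
# encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-small")
# wrapper = PooledE5(encoder).eval()
# dummy_ids = torch.zeros((1, 256), dtype=torch.int32)
# dummy_mask = torch.ones((1, 256), dtype=torch.int32)
# traced_model = torch.jit.trace(wrapper, (dummy_ids, dummy_mask))
```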
## License

MIT License (same as the base model)
## Citation

```bibtex
@article{wang2024multilingual,
  title={Multilingual E5 Text Embeddings: A Technical Report},
  author={Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu},
  journal={arXiv preprint arXiv:2402.05672},
  year={2024}
}
```