---
license: mit
language:
  - multilingual
  - en
  - vi
  - th
library_name: coreml
tags:
  - coreml
  - embeddings
  - sentence-similarity
  - retrieval
  - hark
base_model: intfloat/multilingual-e5-small
---

# multilingual-e5-small — CoreML (int8) for Hark

A **CoreML** (`mlprogram`, **int8-weight-quantized**) conversion of
[`intfloat/multilingual-e5-small`](https://huggingface.co/intfloat/multilingual-e5-small)
(384-dim, multilingual), packaged for on-device vault search in
[**Hark**](https://github.com/tuanda2912/hark) — a local-first, macOS-only meeting
transcription app. Runs on the Apple Neural Engine; **nothing is sent off the
machine** (Hark embeds the whole vault locally).

This repo exists so Hark can download a ready-to-run CoreML artifact instead of
shipping it in the app bundle. It is a faithful conversion — see *Provenance* and
*Validation* — not a new model.

## Files

| File | What |
|---|---|
| `MultilingualE5Small.mlpackage/` | the CoreML model (int8 weights, ~113 MB) |
| `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` | XLM-RoBERTa tokenizer (SentencePiece Unigram) |
| `sentencepiece.bpe.model` | the SentencePiece model (Unigram, despite the `bpe` in the upstream file name) |

Hark's loader downloads a snapshot of this repo at a **pinned revision** into its
app-support models directory, compiles the `.mlpackage` for the Apple Neural
Engine (ANE), and runs fully offline thereafter.
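
If you want to fetch the artifact outside Hark, the pinned-snapshot step can be approximated as below. This is a minimal sketch, not Hark's loader: it assumes the `huggingface_hub` package is installed, and `fetch_snapshot`, `missing_files`, and `EXPECTED` are hypothetical names. The repo id and revision are left as parameters because this card does not state them.

```python
from pathlib import Path

# File names taken from the Files table above.
EXPECTED = (
    "MultilingualE5Small.mlpackage",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "sentencepiece.bpe.model",
)


def fetch_snapshot(repo_id: str, revision: str, models_dir: str) -> Path:
    """Download one pinned revision of the repo into models_dir."""
    # Imported lazily so missing_files() works without the dependency.
    from huggingface_hub import snapshot_download

    local = snapshot_download(
        repo_id=repo_id,      # this repo's id (not stated in this card)
        revision=revision,    # a full commit hash keeps the download reproducible
        local_dir=models_dir,
    )
    return Path(local)


def missing_files(snapshot_dir) -> list[str]:
    """Return the expected files that are absent from a snapshot directory."""
    root = Path(snapshot_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```

Pinning to a commit hash rather than a branch name means a later push to the repo cannot silently change the model a client runs. Checking `missing_files(...)` after the download is a cheap guard against a partial snapshot.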

## I/O contract

- **inputs:** `input_ids` (int32 `[1, L]`), `attention_mask` (int32 `[1, L]`), flexible `L ∈ 1..512`
- **output:** `last_hidden_state` (float32 `[1, L, 384]`)
- The model returns per-token hidden states only. Hark applies **masked
  mean-pooling + L2-normalization** in Swift and prepends the e5 asymmetric
  prefixes (`"query: "` for search queries, `"passage: "` for indexed text).
  Reproduce both steps if you reuse this model directly.
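
For anyone reusing the model outside Hark, the post-processing described above can be sketched as follows. This is a dependency-free Python illustration, not Hark's Swift code. `with_prefix` and `pool_and_normalize` are hypothetical names, and the inputs are assumed to be `last_hidden_state` and `attention_mask` with the batch dimension already removed.

```python
import math


def with_prefix(text: str, is_query: bool) -> str:
    """Prepend the e5 asymmetric prefix before tokenization."""
    return ("query: " if is_query else "passage: ") + text


def pool_and_normalize(hidden, mask):
    """Masked mean-pooling followed by L2-normalization.

    hidden: L rows of D floats (last_hidden_state without the batch dim)
    mask:   L ints from attention_mask (1 = real token, 0 = padding)
    """
    dim = len(hidden[0])
    summed = [0.0] * dim
    count = 0
    for row, keep in zip(hidden, mask):
        if keep:  # padding positions contribute nothing to the mean
            count += 1
            for j, value in enumerate(row):
                summed[j] += value
    mean = [s / max(count, 1) for s in summed]
    # Fall back to 1.0 so an all-zero vector does not divide by zero.
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]
```

Because the output is unit-length, the dot product of two embeddings equals their cosine similarity, which is convenient for a vector index.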

## Provenance

- Converted from `intfloat/multilingual-e5-small` at source revision
  **`614241f622f53c4eeff9890bdc4f31cfecc418b3`** via
  [`engine/scripts/convert-embedder-coreml.py`](https://github.com/tuanda2912/hark/blob/main/engine/scripts/convert-embedder-coreml.py)
  (coremltools 9, `convert_to="mlprogram"`, `minimum_deployment_target=macOS14`).
- int8 weight quantization (per-channel, symmetric) via
  [`engine/scripts/quantize-embedder-int8.py`](https://github.com/tuanda2912/hark/blob/main/engine/scripts/quantize-embedder-int8.py)
  (`coremltools.optimize.coreml.linear_quantize_weights`).
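
To make "per-channel, symmetric" concrete, here is a small pure-Python sketch of the arithmetic. It is an illustration only and not the conversion script: the function names are hypothetical, and the `[-127, 127]` integer range is an assumption about the symmetric scheme, so coremltools' internal details may differ.

```python
def quantize_per_channel(weights):
    """Quantize each row (one output channel) with its own scale.

    Symmetric: zero maps to zero, so only a scale is stored per channel.
    Returns (rows of ints in [-127, 127], per-channel scales).
    """
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Assumed range: the largest magnitude in the channel maps to 127.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights from ints and per-channel scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

A separate scale per channel keeps the rounding error of a small-magnitude channel proportional to its own range instead of to the largest weight in the whole tensor.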

## Validation

- **Fidelity:** the worst-case (minimum) cosine similarity between the fp16 and
  int8 pooled + L2-normalized embeddings was **0.99986** across EN/VI/TH probe
  sentences, so the int8 embeddings are essentially indistinguishable from fp16
  for retrieval.
- **On-device:** Hark's gated cross-lingual and end-to-end retrieval tests pass
  on the Apple Neural Engine with this int8 artifact: EN↔VI/TH pairs score
  closer than unrelated text, and the full chunk → embed → index → retrieve
  pipeline completes.
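
A fidelity check of this kind can be reproduced with a few lines of Python. The sketch below assumes you already have two parallel lists of pooled embeddings (one per probe sentence) from the fp16 and int8 models. The function names are hypothetical, and the probe sentences themselves are not included in this card.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def worst_case_cosine(reference, candidate):
    """Minimum per-sentence cosine between two parallel embedding lists."""
    return min(cosine(r, c) for r, c in zip(reference, candidate))
```

Reporting the minimum rather than the mean makes the figure a lower bound over the probe set, so a single badly affected sentence cannot hide behind a good average.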

## License

MIT, inherited from `intfloat/multilingual-e5-small`. This is a format conversion
plus weight quantization of that model; all credit goes to the original authors.