---
license: mit
language:
- multilingual
- en
- vi
- th
library_name: coreml
tags:
- coreml
- embeddings
- sentence-similarity
- retrieval
- hark
base_model: intfloat/multilingual-e5-small
---

# multilingual-e5-small: CoreML (int8) for Hark

A **CoreML** (`mlprogram`, **int8-weight-quantized**) conversion of
[`intfloat/multilingual-e5-small`](https://huggingface.co/intfloat/multilingual-e5-small)
(384-dim, multilingual), packaged for on-device vault search in
[**Hark**](https://github.com/tuanda2912/hark), a local-first, macOS-only meeting
transcription app. Runs on the Apple Neural Engine; **nothing is sent off the
machine** (Hark embeds the whole vault locally).

This repo exists so Hark can download a ready-to-run CoreML artifact instead of
shipping it in the app bundle. It is a faithful conversion (see *Provenance* and
*Validation*), not a new model.

## Files

| File | What |
|---|---|
| `MultilingualE5Small.mlpackage/` | the CoreML model (int8 weights, ~113 MB) |
| `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` | XLM-RoBERTa tokenizer (SentencePiece Unigram) |
| `sentencepiece.bpe.model` | the SentencePiece model |

Hark's loader snapshots this repo at a **pinned revision** into its app-support
models dir, compiles the `.mlpackage` for the ANE, and runs fully offline
thereafter.

## I/O contract

- **inputs:** `input_ids` (int32 `[1, L]`), `attention_mask` (int32 `[1, L]`), flexible `L ∈ 1..512`
- **output:** `last_hidden_state` (float32 `[1, L, 384]`)
- Hark applies **masked mean-pooling + L2-normalization** in Swift, and the e5
  asymmetric prefixes (`"query: "` / `"passage: "`). Reproduce those if you reuse
  this model directly.

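For anyone reusing the model outside Hark, the post-processing described above can be sketched as follows. This is an illustrative pure-Python version, not Hark's Swift code: the names `with_prefix` and `pool_and_normalize` are hypothetical, and the input is assumed to be `last_hidden_state` with the batch axis already removed.

```python
import math

QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def with_prefix(text, is_query):
    """Prepend the e5 prefix to the raw text before tokenization."""
    return (QUERY_PREFIX if is_query else PASSAGE_PREFIX) + text


def pool_and_normalize(last_hidden_state, attention_mask):
    """Masked mean-pool L token vectors of size D, then L2-normalize.

    last_hidden_state: L rows of D floats (batch axis already removed).
    attention_mask:    L ints, 1 for real tokens and 0 for padding.
    """
    dim = len(last_hidden_state[0])
    summed = [0.0] * dim
    kept = 0
    for token_vec, keep in zip(last_hidden_state, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(token_vec):
                summed[i] += value
    if kept == 0:
        raise ValueError("attention_mask selects no tokens")

    mean = [s / kept for s in summed]
    norm = math.sqrt(sum(v * v for v in mean))
    if norm == 0.0:
        return mean  # avoid dividing by zero on an all-zero vector
    return [v / norm for v in mean]
```

With `last_hidden_state = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]` and `attention_mask = [1, 1, 0]`, the padded third row is ignored, the mean is `[1.5, 2.0]`, and the normalized result is `[0.6, 0.8]`. Because the embeddings are unit length, a plain dot product between two of them is their cosine similarity.
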
## Provenance

- Converted from `intfloat/multilingual-e5-small` at source revision
  **`614241f622f53c4eeff9890bdc4f31cfecc418b3`** via
  [`engine/scripts/convert-embedder-coreml.py`](https://github.com/tuanda2912/hark/blob/main/engine/scripts/convert-embedder-coreml.py)
  (coremltools 9, `convert_to="mlprogram"`, `minimum_deployment_target=macOS14`).
- int8 weight quantization (per-channel, symmetric) via
  [`engine/scripts/quantize-embedder-int8.py`](https://github.com/tuanda2912/hark/blob/main/engine/scripts/quantize-embedder-int8.py)
  (`coremltools.optimize.coreml.linear_quantize_weights`).

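The quantization step can be approximated with the sketch below. It is not a copy of `quantize-embedder-int8.py`: the function name and the path arguments are hypothetical, and every option other than the symmetric, per-channel int8 settings named above is left at the coremltools default. The `granularity` argument is assumed to be available, which requires a recent coremltools release.

```python
def quantize_weights_int8(src_path, dst_path):
    """Load an mlprogram package, quantize its weights to int8, and save it."""
    # Imported lazily so this file can be imported without coremltools installed.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    model = ct.models.MLModel(src_path)

    # Symmetric int8 with one scale per output channel.
    op_config = cto.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int8",
        granularity="per_channel",
    )
    config = cto.OptimizationConfig(global_config=op_config)

    quantized = cto.linear_quantize_weights(model, config=config)
    quantized.save(dst_path)
    return dst_path
```

A call such as `quantize_weights_int8("fp16.mlpackage", "int8.mlpackage")` (both paths are placeholders) would write a weight-quantized package next to the original. Only the weights are quantized in this mode, so the inputs and output keep the types listed in the I/O contract.
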
## Validation

- **Fidelity:** worst-case cosine similarity between the fp16 and int8
  pooled+L2-normalized embeddings was **0.99986** across EN/VI/TH probe
  sentences, so the int8 weights are essentially indistinguishable from fp16 for
  retrieval.
- **On-device:** Hark's gated cross-lingual + end-to-end retrieval tests pass on
  the Apple Neural Engine with this int8 artifact (EN/VI and EN/TH pairs score
  closer than unrelated text; full chunk → embed → index → retrieve pipeline).

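A fidelity check of this kind can be reproduced with a few lines of Python. The sketch below is an assumption about the shape of the comparison, not Hark's test code: it takes two equal-length lists of pooled embeddings (for example fp16 and int8 outputs for the same probe sentences) and reports the lowest per-sentence cosine similarity. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def worst_case_cosine(reference, candidate):
    """Return the lowest per-sentence cosine between two embedding lists."""
    if len(reference) != len(candidate):
        raise ValueError("embedding lists must be the same length")
    return min(cosine(r, c) for r, c in zip(reference, candidate))
```

Taking the minimum rather than the average means a single badly degraded sentence is enough to fail the check, which is the stricter reading of "worst-case".
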
## License

MIT, inherited from `intfloat/multilingual-e5-small`. This is a format conversion
plus weight quantization of that model; all credit to the original authors.