metadata
license: mit
language:
  - multilingual
  - en
  - vi
  - th
library_name: coreml
tags:
  - coreml
  - embeddings
  - sentence-similarity
  - retrieval
  - hark
base_model: intfloat/multilingual-e5-small

multilingual-e5-small — CoreML (int8) for Hark

A CoreML (mlprogram, int8-weight-quantized) conversion of intfloat/multilingual-e5-small (384-dim, multilingual), packaged for on-device vault search in Hark — a local-first, macOS-only meeting transcription app. Runs on the Apple Neural Engine; nothing is sent off the machine (Hark embeds the whole vault locally).

This repo exists so Hark can download a ready-to-run CoreML artifact instead of shipping it in the app bundle. It is a faithful conversion — see Provenance and Validation — not a new model.

Files

| File | What |
| --- | --- |
| MultilingualE5Small.mlpackage/ | the CoreML model (int8 weights, ~113 MB) |
| tokenizer.json, tokenizer_config.json, special_tokens_map.json | XLM-RoBERTa tokenizer (SentencePiece Unigram) |
| sentencepiece.bpe.model | the SentencePiece model |

Hark's loader snapshots this repo at a pinned revision into its app-support models directory, compiles the .mlpackage for the Apple Neural Engine (ANE), and runs fully offline thereafter.

I/O contract

  • inputs: input_ids (int32 [1, L]), attention_mask (int32 [1, L]), flexible L ∈ 1..512
  • output: last_hidden_state (float32 [1, L, 384])
  • Hark applies masked mean-pooling + L2-normalization in Swift, and the e5 asymmetric prefixes ("query: " / "passage: "). Reproduce those if you reuse this model directly.
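
For anyone reusing the model outside Hark, the post-processing described above can be sketched in plain Python. This is an illustration only, not Hark's Swift code: the function names (`add_prefix`, `masked_mean_pool`, `l2_normalize`, `embed_from_hidden`) are hypothetical, and the toy vectors have 3 dimensions rather than 384.

```python
import math

QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def add_prefix(text, is_query):
    """Prepend the e5 asymmetric prefix; apply this before tokenization."""
    return (QUERY_PREFIX if is_query else PASSAGE_PREFIX) + text


def masked_mean_pool(hidden, mask):
    """Average the token vectors whose attention-mask entry is 1.

    hidden: L token vectors (lists of floats), i.e. last_hidden_state[0]
    mask:   L ints, i.e. attention_mask[0]
    """
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden, mask):
        if m:
            count += 1
            for i, v in enumerate(vec):
                total[i] += v
    if count == 0:
        raise ValueError("attention_mask has no active tokens")
    return [t / count for t in total]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def embed_from_hidden(hidden, mask):
    """Pooled, unit-length sentence embedding from one model output."""
    return l2_normalize(masked_mean_pool(hidden, mask))


# Toy input: 3 tokens, 3 dims (the real model emits 384); last token is masked out.
hidden = [[1.0, 0.0, 0.0], [3.0, 4.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
embedding = embed_from_hidden(hidden, mask)
query_text = add_prefix("where is the meeting summary?", is_query=True)
```

Because the embeddings are unit length, the dot product of a query embedding and a passage embedding equals their cosine similarity, which keeps ranking cheap at search time.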

Provenance

Validation

  • Fidelity: the worst-case cosine similarity between the fp16 and int8 pooled + L2-normalized embeddings was 0.99986 across EN/VI/TH probe sentences, so the int8 weights are essentially indistinguishable from fp16 for retrieval.
  • On-device: Hark's gated cross-lingual + end-to-end retrieval tests pass on the Apple Neural Engine with this int8 artifact (EN↔VI/TH closer than unrelated; full chunk → embed → index → retrieve pipeline).
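
A fidelity check of the kind described above could look roughly like the following sketch. It assumes you already have matching lists of pooled embeddings from the fp16 and int8 models. The helper names are hypothetical and the vectors are placeholders, not real model outputs or the probe set used here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def worst_case_cosine(reference, quantized):
    """Minimum cosine over matching rows of two embedding sets."""
    if len(reference) != len(quantized):
        raise ValueError("embedding sets must have the same number of rows")
    return min(cosine(r, q) for r, q in zip(reference, quantized))


# Placeholder 2-dim vectors standing in for fp16 and int8 embeddings.
fp16_embs = [[1.0, 0.0], [0.6, 0.8]]
int8_embs = [[1.0, 0.0], [0.8, 0.6]]
worst = worst_case_cosine(fp16_embs, int8_embs)
```

Reporting the minimum rather than the mean makes the check sensitive to a single badly degraded sentence, which an average over many probes could hide.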

License

MIT, inherited from intfloat/multilingual-e5-small. This is a format conversion + weight quantization of that model; all credit to the original authors.