| --- |
| language: multilingual |
| license: gemma |
| base_model: google/embeddinggemma-300m |
| tags: |
| - coreml |
| - apple-neural-engine |
| - gemma3 |
| - sentence-embedding |
| - on-device |
| - matryoshka |
| library_name: coreml |
| --- |
| |
| ## Use it from Swift |
|
|
| <!-- swift-usage-begin --> |
| ### Add the package |
|
|
| `Package.swift`: |
|
|
| ```swift |
| .package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"), |
| |
| // In your target: |
| .product(name: "CoreMLLLM", package: "CoreML-LLM"), |
| ``` |
|
|
| Package platforms: iOS 18+ / macOS 15+. The model bundle in this repo targets iOS 26 / macOS 26. |
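For orientation, here is a minimal sketch of a complete manifest that places the two fragments above in context. The package and target name `MyEmbeddingApp` is hypothetical, and the tools version is an assumption (6.0 is chosen only because the `.v18` / `.v15` platform constants require it); adapt both to your project.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyEmbeddingApp",  // hypothetical name, not part of CoreML-LLM
    platforms: [.iOS(.v18), .macOS(.v15)],
    dependencies: [
        // Dependency line taken from the snippet above
        .package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "MyEmbeddingApp",  // hypothetical target
            dependencies: [
                .product(name: "CoreMLLLM", package: "CoreML-LLM"),
            ]
        ),
    ]
)
```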
|
|
| ### Download + encode |
|
|
| ```swift |
| import CoreMLLLM |
| import Foundation |
| |
| let modelsDir = try FileManager.default.url( |
|     for: .applicationSupportDirectory, in: .userDomainMask, |
|     appropriateFor: nil, create: true) |
| |
| let eg = try await EmbeddingGemma.downloadAndLoad(modelsDir: modelsDir) |
| |
| // 768-dim L2-normalised embedding |
| let v = try eg.encode(text: "How do I list files in Swift?") |
| // Matryoshka: cheap-to-truncate dims (768 / 512 / 256 / 128) |
| let v256 = try eg.encode(text: "How do I list files in Swift?", |
|                          dim: 256) |
| |
| // Task-prefixed (RAG document vs. query) |
| let q = try eg.encode(text: "list files", |
|                       task: .retrievalQuery) |
| let d = try eg.encode(text: "Use FileManager.contentsOfDirectory(...)", |
|                       task: .retrievalDocument) |
| ``` |
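Once query and document vectors exist, retrieval reduces to ranking by similarity. The sketch below assumes that `encode` returns a plain `[Float]` (an assumption; check the package source for the actual return type) and relies on the embeddings being L2-normalised, so a dot product equals cosine similarity. The helper names `dot` and `topMatches` are hypothetical and not part of CoreML-LLM.

```swift
/// Dot product of two equal-length vectors.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var sum: Float = 0
    for i in 0..<a.count { sum += a[i] * b[i] }
    return sum
}

/// Returns the `k` best-scoring documents for a query, highest score first.
/// Assumes every vector is already unit length.
func topMatches(query: [Float], documents: [[Float]], k: Int)
    -> [(index: Int, score: Float)] {
    var scored: [(index: Int, score: Float)] = []
    for (i, doc) in documents.enumerated() {
        scored.append((index: i, score: dot(query, doc)))
    }
    scored.sort { $0.score > $1.score }
    return Array(scored.prefix(k))
}

// Toy 2-dim unit vectors standing in for real embeddings.
let toyQuery: [Float] = [1, 0]
let toyDocs: [[Float]] = [[0, 1], [1, 0], [0.6, 0.8]]
let best = topMatches(query: toyQuery, documents: toyDocs, k: 2)
print(best.map { $0.index })
```

For a handful of documents this linear scan is usually enough; a larger corpus would typically call for a vector index instead.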
|
|
| See [`Gemma3EmbeddingGemma.swift`](https://github.com/john-rocky/CoreML-LLM/blob/main/Sources/CoreMLLLM/Gemma3EmbeddingGemma.swift) |
| for the supported task prefixes and the full list of Matryoshka dimensions. |
| <!-- swift-usage-end --> |
|
|
|
|
|
|
| # EmbeddingGemma-300M for Apple CoreML (ANE-optimized) |
|
|
| CoreML conversion of `google/embeddinggemma-300m` produced with the |
| [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) pipeline. Targets |
| iOS 26 / macOS 26. |
|
|
| ## What's in this repo |
|
|
| | File | Notes | |
| |---|---| |
| | `encoder.mlmodelc/` | Compiled stateless bidirectional encoder (fp16, 588 MB) | |
| | `model_config.json` | I/O contract, Matryoshka dims, task prefixes | |
| | `hf_model/` | Tokenizer files | |
|
|
| ## ANE residency |
|
|
| **99.80% of dispatched ops run on the Apple Neural Engine** (1950 of 1954, verified via |
| `MLComputePlan` on macOS 26). This is achieved by: |
| - residual-stream rescaling (semantics-preserving; keeps activations within fp16 range) |
| - fp16-safe L2 normalization (divide by max-abs first to keep `sum(x²)` bounded) |
| - an iOS 26 deployment target |
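To illustrate the fp16-safe normalization idea, here is a minimal sketch in plain Swift. It is not the conversion pipeline's actual graph code: the function name is hypothetical, and `Float` is used instead of a real half-precision type so the snippet runs anywhere. The constant 65504 is the largest finite fp16 value.

```swift
let fp16Max: Float = 65504  // largest finite half-precision value

/// L2-normalizes `x` by first dividing by its max-abs element, so every
/// squared term is at most 1 and the sum of squares is at most `x.count`.
func fp16SafeL2Normalize(_ x: [Float]) -> [Float] {
    let maxAbs = x.reduce(Float(0)) { max($0, abs($1)) }
    guard maxAbs > 0 else { return x }           // all-zero input: nothing to do
    let scaled = x.map { $0 / maxAbs }           // each value now in [-1, 1]
    let sumSq = scaled.reduce(Float(0)) { $0 + $1 * $1 }
    let norm = sumSq.squareRoot()
    return scaled.map { $0 / norm }
}

let raw: [Float] = [300, 400]
// The naive sum of squares would not fit in fp16.
let naiveSumSq = raw.reduce(Float(0)) { $0 + $1 * $1 }
print(naiveSumSq > fp16Max)
let unit = fp16SafeL2Normalize(raw)
print(unit)
```

Because the direction of the vector is unchanged by the initial scaling, the result matches an ordinary L2 normalization while every intermediate value stays small.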
|
|
| ## Use it |
|
|
| Via the [CoreML-LLM Swift package](https://github.com/john-rocky/CoreML-LLM): |
|
|
| ```swift |
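| import CoreMLLLM |
| import Foundation |
| |
| // Directory that will hold the downloaded bundle |
| let appSupportDir = try FileManager.default.url( |
|     for: .applicationSupportDirectory, in: .userDomainMask, |
|     appropriateFor: nil, create: true) |
| let bundleURL = try await Gemma3BundleDownloader.download( |
|     .embeddingGemma300m, into: appSupportDir) |
| let eg = try await EmbeddingGemma.load(bundleURL: bundleURL) |
| let vec = try eg.encode(text: "On-device embeddings", |
|                         task: .retrievalQuery, |
|                         dim: 768) // or 512 / 256 / 128 (Matryoshka) |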
| ``` |
|
|
| I/O contract: |
| - Inputs: `input_ids (1, 128) int32` and `attention_mask (1, 128) fp16` (1.0 = valid token, 0.0 = padding) |
| - Output: `embedding (1, 768) fp16`, L2-normalized to unit length; for Matryoshka 512 / 256 / 128, |
| keep the leading components and re-normalize |
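The Matryoshka step in the contract above can be sketched as follows, assuming the embedding has already been copied into a `[Float]`. The function name `truncateMatryoshka` is hypothetical; the package's `dim:` parameter presumably does the equivalent for you.

```swift
/// Keeps the first `dim` components of an embedding and rescales the
/// result back to unit L2 norm.
func truncateMatryoshka(_ embedding: [Float], to dim: Int) -> [Float] {
    precondition(dim > 0 && dim <= embedding.count, "dim out of range")
    let head = Array(embedding.prefix(dim))
    let norm = head.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else { return head }  // avoid dividing by zero
    return head.map { $0 / norm }
}

// Toy 4-dim unit vector standing in for a 768-dim embedding.
let full: [Float] = [0.6, 0.0, 0.8, 0.0]
let small = truncateMatryoshka(full, to: 2)
print(small)
```

Re-normalizing matters because the truncated prefix is generally shorter than unit length, which would otherwise skew dot-product scores.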
|
|
| The bundle in this repo is built for `max_seq_len=128`. To support longer inputs, rebuild it with a |
| larger value, e.g. `python conversion/build_embeddinggemma_bundle.py --max-seq-len 2048`. |
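If you feed the encoder directly instead of going through `EmbeddingGemma.encode`, the token IDs have to be brought to the fixed length yourself. The sketch below shows one way to do that under stated assumptions: right-padding, a pad ID of 0, and truncation of anything past the limit are all assumptions here, not confirmed behaviour of the package, and the helper name is hypothetical.

```swift
/// Pads (or truncates) token IDs to `maxSeqLen` and builds the matching
/// attention mask: 1.0 for real tokens, 0.0 for padding.
func padToFixedLength(tokenIDs: [Int32],
                      maxSeqLen: Int = 128,
                      padID: Int32 = 0)   // assumed pad ID
    -> (inputIDs: [Int32], attentionMask: [Float]) {
    let kept = Array(tokenIDs.prefix(maxSeqLen))   // drop overflow tokens
    let padCount = maxSeqLen - kept.count
    let inputIDs = kept + Array(repeating: padID, count: padCount)
    let attentionMask = Array(repeating: Float(1), count: kept.count)
        + Array(repeating: Float(0), count: padCount)
    return (inputIDs, attentionMask)
}

// Hypothetical token IDs; a real tokenizer would produce these.
let padded = padToFixedLength(tokenIDs: [5, 6, 7], maxSeqLen: 8)
print(padded.inputIDs)
print(padded.attentionMask)
```

The mask values would still need converting to fp16 when copied into the model's input array.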
|
|
| ## Sanity check |
|
|
| ``` |
| cosine("cat sat on mat", "feline rested on rug") = 0.7345 (high — similar) |
| cosine("cat sat on mat", "quantum mechanics") = 0.4650 (low — different) |
| ``` |
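The scores above are cosine similarities. A minimal sketch of that computation, assuming both embeddings are available as `[Float]` arrays of equal length (the function name is hypothetical):

```swift
/// Cosine similarity of two equal-length vectors; returns 0 if either
/// vector has zero length.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dotProduct: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dotProduct += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? dotProduct / denominator : 0
}

// Toy vectors: parallel, orthogonal, and zero.
let parallel = cosineSimilarity([1, 2], [2, 4])
let orthogonal = cosineSimilarity([1, 0], [0, 1])
print(parallel, orthogonal)
```

Since the model's output is already unit length, the denominator is close to 1 in practice and a plain dot product gives nearly the same score.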
|
|
| ## License |
|
|
| Inherits Google's [Gemma terms of use](https://ai.google.dev/gemma/terms). |
|
|