---
language: multilingual
license: gemma
base_model: google/embeddinggemma-300m
tags:
- coreml
- apple-neural-engine
- gemma3
- sentence-embedding
- on-device
- matryoshka
library_name: coreml
---

## Use it from Swift

### Add the package

`Package.swift`:

```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```

Platforms: iOS 18+ / macOS 15+.

### Download + encode

```swift
import CoreMLLLM

let modelsDir = try FileManager.default.url(
    for: .applicationSupportDirectory, in: .userDomainMask,
    appropriateFor: nil, create: true)
let eg = try await EmbeddingGemma.downloadAndLoad(modelsDir: modelsDir)

// 768-dim L2-normalised embedding
let v = try eg.encode(text: "How do I list files in Swift?")

// Matryoshka: cheap-to-truncate dims (768 / 512 / 256 / 128)
let v256 = try eg.encode(text: "How do I list files in Swift?", dim: 256)

// Task-prefixed (RAG document vs. query)
let q = try eg.encode(text: "list files", task: .retrievalQuery)
let d = try eg.encode(text: "Use FileManager.contentsOfDirectory(...)", task: .retrievalDocument)
```

See [`Gemma3EmbeddingGemma.swift`](https://github.com/john-rocky/CoreML-LLM/blob/main/Sources/CoreMLLLM/Gemma3EmbeddingGemma.swift) for task prefixes and dim list.

# EmbeddingGemma-300M for Apple CoreML (ANE-optimized)

CoreML conversion of `google/embeddinggemma-300m` produced with the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) pipeline. Targets iOS 26 / macOS 26.

## What's in this repo

| File | Notes |
|---|---|
| `encoder.mlmodelc/` | Compiled stateless bidirectional encoder (fp16, 588 MB) |
| `model_config.json` | I/O contract, Matryoshka dims, task prefixes |
| `hf_model/` | Tokenizer files |

## ANE residency

**99.80% on Apple Neural Engine** (1950/1954 dispatched ops, verified via `MLComputePlan` on macOS 26).

Achieved by:

- residual-stream rescaling (semantics-preserving fp16 fit)
- fp16-safe L2 normalize (divide by max-abs first to keep `sum(x²)` bounded)
- iOS 26 deployment target

## Use it

Via the [CoreML-LLM Swift package](https://github.com/john-rocky/CoreML-LLM):

```swift
import CoreMLLLM

// appSupportDir: a writable directory URL you supply (e.g. Application Support)
let bundleURL = try await Gemma3BundleDownloader.download(
    .embeddingGemma300m, into: appSupportDir)
let eg = try await EmbeddingGemma.load(bundleURL: bundleURL)
let vec = try eg.encode(text: "On-device embeddings",
                        task: .retrievalQuery,
                        dim: 768)   // or 512 / 256 / 128 (Matryoshka)
```

I/O contract:

- `input_ids (1, 128) int32`, `attention_mask (1, 128) fp16` (1.0 valid, 0.0 pad)
- `embedding (1, 768) fp16`: L2 unit norm. For Matryoshka 512 / 256 / 128, truncate along the last dimension (keep the leading components) and re-normalize.

The bundle in this repo is built for `max_seq_len=128`. For longer inputs, rebuild the bundle with a larger limit, for example `python conversion/build_embeddinggemma_bundle.py --max-seq-len 2048`.

## Sanity check

```
cosine("cat sat on mat", "feline rested on rug") = 0.7345  (high: similar)
cosine("cat sat on mat", "quantum mechanics")    = 0.4650  (low: different)
```

## License

Inherits Google's [Gemma terms of use](https://ai.google.dev/gemma/terms).
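
## Matryoshka truncation sketch

The I/O contract says a 768-dim output can be truncated and re-normalized to get a 512 / 256 / 128-dim embedding, and the ANE notes mention dividing by the max-abs value before the L2 norm. The following is a minimal sketch of both ideas in plain Swift, assuming the embedding has already been copied into a `[Float]`. The helpers `matryoshka(_:dim:)` and `cosine(_:_:)` are hypothetical names for illustration and are not part of the CoreMLLLM package; passing `dim:` to `encode` presumably covers the same ground.

```swift
import Foundation

/// Keep the first `dim` components of an embedding and re-normalize to unit length.
/// Hypothetical helper, not a CoreMLLLM API.
func matryoshka(_ v: [Float], dim: Int) -> [Float] {
    precondition(dim > 0 && dim <= v.count, "dim must be in 1...v.count")
    let head = Array(v.prefix(dim))

    // Scale by the largest magnitude first so every squared term is <= 1.
    // This mirrors the fp16-safe normalize described above; in Float32 it
    // is not strictly required, but it keeps the sketch faithful.
    let maxAbs = head.map { abs($0) }.max() ?? 0
    guard maxAbs > 0 else { return head }  // all-zero input: nothing to normalize

    let scaled = head.map { $0 / maxAbs }
    let norm = scaled.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    return scaled.map { $0 / norm }
}

/// Cosine similarity of two equal-length vectors (0 if either is all zeros).
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0
    var na: Float = 0
    var nb: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    let denom = (na * nb).squareRoot()
    return denom > 0 ? dot / denom : 0
}
```

With two full-size embeddings `q` and `d` as `[Float]`, `cosine(matryoshka(q, dim: 256), matryoshka(d, dim: 256))` would compare them at the 256-dim level. Truncate both sides to the same `dim` before comparing.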