File size: 1,990 Bytes
3ff69a4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
license: gemma
tags:
  - coreai
  - sentence-similarity
  - feature-extraction
  - apple-silicon
  - on-device
---

# EmbeddingGemma 300m β€” Core AI export

[google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) as a
single static Core AI graph: the full sentence-transformers pipeline (transformer β†’
mean pooling β†’ dense projection β†’ L2 normalize) runs in-graph, so one call returns a
normalized 768-d embedding. On-device semantic search / RAG for macOS 27 / iOS 27 beta.

Runs out of the box with [CoreAIKit](https://github.com/john-rocky/coreai-kit)'s
`TextEmbedder`:

```swift
let embedder = try await TextEmbedder()   // downloads this repo
let doc = try await embedder.embed(document: "Tokyo is the capital of Japan.")
let query = try await embedder.embed(query: "what is the capital of Japan")
let score = TextEmbedder.cosineSimilarity(doc, query)
```

Retrieval prompt prefixes (`task: search result | query: ` / `title: none | text: `)
are applied automatically by `TextEmbedder`.

## Bundle layout

```
model/
β”œβ”€β”€ embeddinggemma-300m_float32_static.aimodel
β”œβ”€β”€ tokenizer/            (HF tokenizer files)
└── reference.json        (torch reference cosines used by the parity test)
```

## Graph contract

| | name | shape | dtype |
|---|---|---|---|
| input | `input_ids` | [1, 256] | int32 (pad id 0, mask 0 over padding) |
| input | `attention_mask` | [1, 256] | int32 |
| output | `embedding` | [1, 768] | fp32, L2-normalized |

Precision: fp32. Cross-runtime parity vs the torch SentenceTransformer pipeline is
exact to 6 decimal places (see reference.json). fp16 variants (full cast AND
mixed-precision autocast) produce NaN embeddings on-device β€” Gemma3 activations
overflow half precision β€” so fp32 is shipped; a smaller int8 variant is future work.

## License

Gemma Terms of Use (see the upstream model card). Conversion script: this repo's
sibling, based on apple/coreai-models' recipe patterns (BSD-3-Clause).