	---
	language: multilingual
	license: gemma
	base_model: google/embeddinggemma-300m
	tags:
	- coreml
	- apple-neural-engine
	- gemma3
	- sentence-embedding
	- on-device
	- matryoshka
	library_name: coreml
	---

	## Use it from Swift

	<!-- swift-usage-begin -->
	### Add the package

	`Package.swift`:

	```swift
	.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),

	// In your target:
	.product(name: "CoreMLLLM", package: "CoreML-LLM"),
	```

	Platforms: iOS 18+ / macOS 15+.

	### Download + encode

	```swift
	import CoreMLLLM

	let modelsDir = try FileManager.default.url(
	    for: .applicationSupportDirectory, in: .userDomainMask,
	    appropriateFor: nil, create: true)

	let eg = try await EmbeddingGemma.downloadAndLoad(modelsDir: modelsDir)

	// 768-dim L2-normalised embedding
	let v = try eg.encode(text: "How do I list files in Swift?")
	// Matryoshka: cheap-to-truncate dims (768 / 512 / 256 / 128)
	let v256 = try eg.encode(text: "How do I list files in Swift?",
	                         dim: 256)

	// Task-prefixed (RAG document vs. query)
	let q = try eg.encode(text: "list files",
	                      task: .retrievalQuery)
	let d = try eg.encode(text: "Use FileManager.contentsOfDirectory(...)",
	                      task: .retrievalDocument)
	```

	See [`Gemma3EmbeddingGemma.swift`](https://github.com/john-rocky/CoreML-LLM/blob/main/Sources/CoreMLLLM/Gemma3EmbeddingGemma.swift)
	for the task prefixes and the list of supported dimensions.
	<!-- swift-usage-end -->



	# EmbeddingGemma-300M for Apple CoreML (ANE-optimized)

	CoreML conversion of `google/embeddinggemma-300m` produced with the
	[CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) pipeline. Targets
	iOS 26 / macOS 26.

	## What's in this repo

	| File | Notes |
	|---|---|
	| `encoder.mlmodelc/` | Compiled stateless bidirectional encoder (fp16, 588 MB) |
	| `model_config.json` | I/O contract, Matryoshka dims, task prefixes |
	| `hf_model/` | Tokenizer files |

	## ANE residency

	99.80% of dispatched ops run on the Apple Neural Engine (1950 of 1954,
	verified via `MLComputePlan` on macOS 26). This is achieved by:
	- residual-stream rescaling (a semantics-preserving rescale that keeps activations within fp16 range)
	- fp16-safe L2 normalization (divide by max-abs first to keep `sum(x²)` bounded)
	- iOS 26 deployment target
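
The max-abs trick in the list above can be sketched in plain Swift. This is only an illustration of the idea, not the conversion pipeline's code: the function name `maxAbsScaledL2Normalize` is hypothetical, and it runs on `Float` on the CPU, whereas the real normalization lives inside the CoreML graph in fp16. Dividing by the largest magnitude first leaves the direction unchanged while bounding the sum of squares by the vector length.

```swift
import Foundation

/// L2-normalizes `x`, dividing by the largest magnitude first so that every
/// scaled component lies in [-1, 1] and the sum of squares is at most `x.count`.
/// Hypothetical helper for illustration only.
func maxAbsScaledL2Normalize(_ x: [Float]) -> [Float] {
    // Largest magnitude in the vector; an all-zero vector is returned unchanged.
    let maxAbs = x.reduce(Float(0)) { max($0, abs($1)) }
    guard maxAbs > 0 else { return x }

    let scaled = x.map { $0 / maxAbs }
    let sumSquares = scaled.reduce(Float(0)) { $0 + $1 * $1 }  // <= x.count
    let norm = sumSquares.squareRoot()
    return scaled.map { $0 / norm }
}

// fp16 tops out at 65504, so a naive sum(x²) would overflow here:
// 300² = 90000 already exceeds the fp16 range.
let hidden: [Float] = [300, -400, 0, 0]
let unit = maxAbsScaledL2Normalize(hidden)
print(unit)
```

Because `x / ‖x‖` equals `(x / m) / ‖x / m‖` for any positive `m`, the result is the same unit vector a direct normalization would give when no overflow occurs.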

	## Use it

	Via the [CoreML-LLM Swift package](https://github.com/john-rocky/CoreML-LLM):

	```swift
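	import CoreMLLLM

	// Directory where the downloaded bundle is stored.
	let appSupportDir = try FileManager.default.url(
	    for: .applicationSupportDirectory, in: .userDomainMask,
	    appropriateFor: nil, create: true)

	let bundleURL = try await Gemma3BundleDownloader.download(
	    .embeddingGemma300m, into: appSupportDir)
	let eg = try await EmbeddingGemma.load(bundleURL: bundleURL)
	let vec = try eg.encode(text: "On-device embeddings",
	                        task: .retrievalQuery,
	                        dim: 768) // or 512 / 256 / 128 (Matryoshka)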
	```

	I/O contract:
	- `input_ids (1, 128) int32`, `attention_mask (1, 128) fp16` (1.0 valid, 0.0 pad)
	- `embedding (1, 768) fp16`: L2 unit norm; for Matryoshka 512 / 256 / 128, keep the
	first 512 / 256 / 128 components (drop the trailing ones) and re-normalize to unit length
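
If you work with the raw 768-dim output yourself, Matryoshka truncation amounts to keeping the leading components and re-normalizing. The sketch below assumes the fp16 output has already been copied into a `[Float]`; `truncateEmbedding` is a hypothetical helper, not part of the CoreML-LLM package (which exposes the `dim:` parameter for this).

```swift
import Foundation

/// Keeps the first `dim` components of `embedding` and rescales them to unit
/// L2 norm. Hypothetical helper for illustration only.
func truncateEmbedding(_ embedding: [Float], to dim: Int) -> [Float] {
    precondition(dim > 0 && dim <= embedding.count,
                 "dim must be in 1...embedding.count")
    let head = Array(embedding.prefix(dim))
    let norm = head.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    // A zero vector cannot be normalized; return it as-is.
    guard norm > 0 else { return head }
    return head.map { $0 / norm }
}

// Toy 4-dim unit vector standing in for a 768-dim embedding.
let full: [Float] = [0.5, 0.5, 0.5, 0.5]
let half = truncateEmbedding(full, to: 2)
print(half)
```

Re-normalizing matters because the leading slice of a unit vector generally has norm below 1, which would otherwise skew dot-product comparisons.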

	The bundle in this repo is built for `max_seq_len=128`. To handle longer inputs,
	rebuild it with a larger limit, for example `python conversion/build_embeddinggemma_bundle.py --max-seq-len 2048`.
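
If you drive `encoder.mlmodelc` directly rather than through the Swift package, token IDs have to be fitted to the fixed `(1, 128)` shape. The following is a minimal sketch under several assumptions: `makeFixedLengthInputs` is a hypothetical helper, right-padding and a pad ID of `0` are assumed (check the tokenizer files in `hf_model/`), and the mask is built as `Float` here and would still need converting to fp16 when written into the model input.

```swift
import Foundation

/// Truncates or right-pads `tokenIDs` to exactly `maxSeqLen` entries and builds
/// the matching attention mask (1.0 for real tokens, 0.0 for padding).
/// Hypothetical helper; `padID = 0` and right-padding are assumptions.
func makeFixedLengthInputs(tokenIDs: [Int32],
                           maxSeqLen: Int = 128,
                           padID: Int32 = 0)
    -> (inputIDs: [Int32], attentionMask: [Float]) {
    // Note: truncation drops trailing tokens, including any end-of-sequence token.
    let kept = Array(tokenIDs.prefix(maxSeqLen))
    let padCount = maxSeqLen - kept.count

    let inputIDs = kept + [Int32](repeating: padID, count: padCount)
    let attentionMask = [Float](repeating: 1.0, count: kept.count)
        + [Float](repeating: 0.0, count: padCount)
    return (inputIDs, attentionMask)
}

// Made-up token IDs, purely for illustration.
let inputs = makeFixedLengthInputs(tokenIDs: [2, 1000, 2000, 1])
print(inputs.inputIDs.count, inputs.attentionMask.prefix(6))
```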

	## Sanity check

	```
	cosine("cat sat on mat", "feline rested on rug") = 0.7345 (high — similar)
	cosine("cat sat on mat", "quantum mechanics") = 0.4650 (low — different)
	```
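
To run a similar check yourself, a cosine similarity over two embedding arrays is enough. The helper below is a plain-Swift sketch (`cosineSimilarity` is a hypothetical name, not a package API) and is shown on toy vectors; it assumes the embeddings are available as `[Float]`. For unit-norm embeddings the result reduces to the dot product.

```swift
import Foundation

/// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
/// Hypothetical helper for illustration only.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = (normA * normB).squareRoot()
    return denom > 0 ? dot / denom : 0
}

// Toy vectors: identical directions score 1, orthogonal directions score 0.
let same = cosineSimilarity([1, 0], [2, 0])
let orthogonal = cosineSimilarity([1, 0], [0, 3])
print(same, orthogonal)
```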

	## License

	Inherits Google's [Gemma terms of use](https://ai.google.dev/gemma/terms).