โš ๏ธ Conference talk demo โ€” not production weights.

This model accompanies a conference keynote on local, on-device AI. It is published as a reference for the fine-tuning patterns shown on stage, not as a deployable artefact: there is no security audit, no SLA, and the weights are pinned to the state shown in the talk.


# EmbeddingGemma 300M FT (q8_0) — RAG Retrieval

| Field | Details |
|---|---|
| Base model | `google/embeddinggemma-300m` (308M params) |
| Format | GGUF, q8_0 (8-bit) quantisation; architecture `gemma-embedding` |
| License | Gemma Terms of Use |
| Training script | `finetune/train_embeddinggemma.py` |
| Method | Contrastive (sentence-transformers `MultipleNegativesRankingLoss`), up to 10 epochs with save-best checkpointing, lr = 5e-6; see the training sketch below. |
| Training data | `data/training-data/embeddinggemma_retrieval_{scenario}.jsonl` (query↔passage triplets with hard negatives) |
| Hardware tested | Works on CPU (slow), MPS (medium), and CUDA (fast); at 308M params the model is small enough that hardware rarely matters. |
| Intended use | Encoding documents and queries for semantic retrieval in ChromaDB; output is 768-dim L2-normalised vectors. See the indexing/query sketch below. |
| Out of scope | Text generation (encoder-only model). Cross-domain retrieval: the fine-tune specialises the model for the scenario's domain. |
| Reference eval (Nextera) | MRR@10: 0.9533 → 0.9800 (base → FT); Recall@5: 98%. Caveat: measured on the held-out 25-query / 26-passage eval set, which is small by design for fast iteration. Retrieval against the live KB (120 indexed chunks across 13 documents) was not measured separately, so the MRR uplift on a real-size corpus may differ. Metric definitions are sketched below. |
| Known failure modes | The fine-tune narrows the model's domain: out-of-domain queries (e.g. medical questions against the Nextera FT) retrieve nonsense with high confidence. Use the base model or a different scenario's FT for cross-domain queries. |
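
The training recipe above maps onto the standard sentence-transformers contrastive loop. A minimal sketch is shown below, assuming the JSONL fields are named `query`, `positive` and `negative` and using an arbitrary batch size; it is not a copy of `finetune/train_embeddinggemma.py`, which also wires up an evaluator so that save-best checkpointing can keep the strongest epoch.

```python
import json
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Load query<->passage triplets with hard negatives ({scenario} = nextera here).
# Field names ("query", "positive", "negative") are assumptions for illustration.
examples = []
with open("data/training-data/embeddinggemma_retrieval_nextera.jsonl") as f:
    for line in f:
        row = json.loads(line)
        examples.append(InputExample(texts=[row["query"], row["positive"], row["negative"]]))

model = SentenceTransformer("google/embeddinggemma-300m")
loader = DataLoader(examples, shuffle=True, batch_size=16)  # batch size is an assumption

# In-batch negatives plus the explicit hard negative from each triplet.
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(loader, loss)],
    epochs=10,                          # "up to 10 epochs"
    optimizer_params={"lr": 5e-6},
    output_path="embeddinggemma-300m-ft-nextera",
    # The actual script also passes an evaluator so that only the
    # best-scoring checkpoint is kept (save-best behaviour).
)
```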
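For the intended use, a hedged indexing and query sketch with ChromaDB follows. Paths, the collection name and the example texts are placeholders, and it assumes the fine-tuned checkpoint is loaded through sentence-transformers; the q8_0 GGUF artefact itself would need a llama.cpp-based loader instead.

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Assumed local path to the fine-tuned checkpoint (the GGUF file is not
# loadable via sentence-transformers).
model = SentenceTransformer("./embeddinggemma-300m-ft-nextera")
# Note: if the project pipeline uses EmbeddingGemma's built-in prompt
# templates, the matching prompt_name should be passed to encode() as well.

client = chromadb.PersistentClient(path="./kb_index")
collection = client.get_or_create_collection(
    "nextera_kb", metadata={"hnsw:space": "cosine"}  # name and space are assumptions
)

# Index the knowledge-base chunks (placeholder documents).
docs = ["First KB chunk ...", "Second KB chunk ..."]
doc_embs = model.encode(docs, normalize_embeddings=True)  # 768-dim, L2-normalised
collection.add(
    ids=[f"chunk-{i}" for i in range(len(docs))],
    documents=docs,
    embeddings=doc_embs.tolist(),
)

# Retrieve the top-5 passages for an (illustrative) in-domain query.
query_emb = model.encode(["example in-domain question"], normalize_embeddings=True)
hits = collection.query(query_embeddings=query_emb.tolist(), n_results=5)
print(hits["documents"][0])
```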
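For reference, the reported numbers reduce to simple per-query rank statistics. The sketch below shows only the metric definitions, assuming a single relevant passage per query as in the 25-query eval set; it is not the project's evaluation script.

```python
def mrr_at_k(ranked_ids, relevant_id, k=10):
    """Reciprocal rank of the relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid == relevant_id:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_id, k=5):
    """1 if the relevant passage appears in the top k, else 0 (single-relevant setup)."""
    return float(relevant_id in ranked_ids[:k])

def evaluate(results):
    """results: list of (ranked_passage_ids, gold_passage_id) pairs -- assumed shape.
    Averages over all eval queries to give MRR@10 and Recall@5 as reported above."""
    mrr = sum(mrr_at_k(r, g) for r, g in results) / len(results)
    rec = sum(recall_at_k(r, g) for r, g in results) / len(results)
    return {"MRR@10": mrr, "Recall@5": rec}
```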