---
language: en
license: gemma
base_model: google/embeddinggemma-300m
tags:
- sentence-transformers
- embeddings
- onnx
- embeddinggemma-tuning-lab
- sift
- chrome-extension
pipeline_tag: sentence-similarity
---

# sift-finetuned

Fine-tuned [EmbeddingGemma-300M](https://huggingface.co/google/embeddinggemma-300m) for personalized content scoring with [Sift](https://github.com/shreyaskarnik/Sift).

> **License:** This model is a derivative of Google's Gemma and is subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). By using this model, you agree to those terms.

## What is this?

This is a sentence embedding model fine-tuned on personal browsing labels collected with the Sift Chrome extension. It scores feed items (Hacker News, Reddit, X) against interest categories using cosine similarity, running entirely in the browser via [Transformers.js](https://huggingface.co/docs/transformers.js).

## Training

- **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
- **Loss:** MultipleNegativesRankingLoss (contrastive)
- **Task prompt:** `task: classification | query: `
- **Epochs:** 4
- **Learning rate:** 2e-5
- **Framework:** [sentence-transformers](https://sbert.net/)

## ONNX Variants

| File | Format | Use case |
|------|--------|----------|
| `onnx/model.onnx` | FP32 | Reference |
| `onnx/model_quantized.onnx` | INT8 | Smaller download |
| `onnx/model_q4.onnx` | 4-bit | WASM inference |
| `onnx/model_no_gather_q4.onnx` | 4-bit | WebGPU inference |

## Usage with Sift

Set this model ID (`shreyask/sift-finetuned`) in Sift's popup settings under **Model Source**. The extension loads it directly from HuggingFace — no authentication needed.
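The cosine-similarity scoring described above can be sketched in plain JavaScript. This is an illustration, not Sift's actual code: the function names and category labels are hypothetical, and in practice the embeddings would come from the fine-tuned model via Transformers.js.

```javascript
// Cosine similarity of two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank interest categories for one feed-item embedding,
// highest similarity first. Labels here are just examples.
function scoreItem(itemEmbedding, categoryEmbeddings) {
  return Object.entries(categoryEmbeddings)
    .map(([label, emb]) => ({ label, score: cosine(itemEmbedding, emb) }))
    .sort((x, y) => y.score - x.score);
}
```

When the embeddings are already L2-normalized (as with `normalize: true` in the Transformers.js example below), the cosine reduces to a plain dot product.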
## Usage with Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "shreyask/sift-finetuned", {
  dtype: "q4",
});

const output = await extractor("Your text here", { pooling: "mean", normalize: true });
```

## Privacy

ONNX files contain only numerical weights and tokenizer data — **no training examples, user labels, or personal information**.
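A note on the task prompt: the Training section lists `task: classification | query: ` as the fine-tuning prompt, and EmbeddingGemma-style prompts are typically prepended to the input text at inference as well. A minimal sketch, under the assumption that the same prompt should be applied when embedding feed items (`withTaskPrompt` is a hypothetical helper, not part of Sift or Transformers.js):

```javascript
// Assumption: prepending the fine-tuning task prompt at inference keeps
// the embedding space consistent with training. Note the trailing space.
const TASK_PROMPT = "task: classification | query: ";

function withTaskPrompt(text) {
  return TASK_PROMPT + text;
}
```

For example, you would pass `withTaskPrompt("Show HN: my project")` to the extractor rather than the raw title.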