embeddinggemma-pt-br-128k
A Portuguese-only text-embedding model, vocabulary-trimmed from
google/embeddinggemma-300m to a 128k-token
vocabulary (~207M params; MTEB(por) mean_16 of 0.7192, i.e. 99.1% of the full model's score at
67% of its size). No training was involved: only the token-embedding matrix was sliced, and the transformer
encoder and pooling/Dense heads are identical to the base model's. Produced with
🛠️ embedding-vocab-trimmer.
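The card does not say how the kept tokens were chosen, and it does not show embedding-vocab-trimmer's code. As a rough illustration of what "slicing the token-embedding matrix" means, here is a minimal NumPy sketch. `select_tokens`, `trim_embeddings`, and the frequency-based selection rule are hypothetical and are not the tool's API. A kept token retains exactly the same vector under its new id, which is why the encoder and heads can stay unchanged. A real trimmer must also rewrite the tokenizer so that it emits the new ids.

```python
from collections import Counter

import numpy as np


def select_tokens(tokenized_corpus, k, always_keep=()):
    """Hypothetical selection rule: keep special tokens plus the most frequent ids, up to k total."""
    counts = Counter(tok for ids in tokenized_corpus for tok in ids)
    keep = set(always_keep)
    for tok, _ in counts.most_common():
        if len(keep) >= k:
            break
        keep.add(tok)
    return sorted(keep)


def trim_embeddings(embedding_matrix, keep_ids):
    """Slice rows of the (vocab_size, hidden) matrix; no other weights are touched."""
    old_to_new = {old: new for new, old in enumerate(keep_ids)}
    trimmed = embedding_matrix[np.asarray(keep_ids, dtype=np.int64)]
    return trimmed, old_to_new


# Toy example: a 10-token vocabulary with 4-dim embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(10, 4)).astype(np.float32)
corpus = [[2, 5, 5, 7], [5, 2, 9], [7, 5]]
keep_ids = select_tokens(corpus, k=4, always_keep=(0, 1))
trimmed, old_to_new = trim_embeddings(full, keep_ids)

# A kept token maps to exactly the same vector after re-indexing.
ids = [5, 2, 0]
assert np.array_equal(full[ids], trimmed[[old_to_new[i] for i in ids]])
print(keep_ids, trimmed.shape)
```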
Part of the embeddinggemma-pt-br family; the 64k variant (embeddinggemma-pt-br, ~157M params) is the recommended sweet spot:
| model | params | MTEB(por) mean_16 | % of full |
|---|---|---|---|
| google/embeddinggemma-300m | ~308M | 0.7257 | 100% |
| embeddinggemma-pt-br-128k ⭐ | ~207M | 0.7192 | 99.1% |
| embeddinggemma-pt-br | ~157M | 0.7172 | 98.8% |
| embeddinggemma-pt-br-48k | ~144M | 0.7098 | 97.8% |
| embeddinggemma-pt-br-32k | ~131M | 0.6881 | 94.8% |
| embeddinggemma-pt-br-24k | ~125M | 0.6895 | 95.0% |
| embeddinggemma-pt-br-16k | ~119M | 0.6520 | 89.8% |
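The size column is consistent with simple arithmetic. Trimming removes (V_full - V) rows of width d from the embedding matrix and leaves every other parameter alone. The "% of full" column is mean_16 / 0.7257. The sketch below checks both against the table. Its constants are assumptions, not figures from this card: d = 768, a 262,144-token base vocabulary, "Nk" meaning N × 1024 tokens, and the unsuffixed embeddinggemma-pt-br being the 64k variant.

```python
# Assumed constants (not stated in this card).
HIDDEN = 768            # embedding width of EmbeddingGemma
FULL_VOCAB = 262_144    # base-model vocabulary size
# Figures taken from the table above.
FULL_PARAMS_M = 308     # "~308M" for google/embeddinggemma-300m
FULL_SCORE = 0.7257     # MTEB(por) mean_16 of the full model

# name: (assumed vocab size, table params in M, table mean_16)
TABLE = {
    "128k": (131_072, 207, 0.7192),
    "64k (embeddinggemma-pt-br)": (65_536, 157, 0.7172),
    "48k": (49_152, 144, 0.7098),
    "32k": (32_768, 131, 0.6881),
    "24k": (24_576, 125, 0.6895),
    "16k": (16_384, 119, 0.6520),
}


def estimated_params_m(vocab_size):
    """Base parameter count minus the sliced-away embedding rows, in millions."""
    removed = (FULL_VOCAB - vocab_size) * HIDDEN
    return FULL_PARAMS_M - removed / 1e6


def pct_of_full(score):
    """Score as a percentage of the full model's mean_16."""
    return 100 * score / FULL_SCORE


for name, (vocab, table_params, score) in TABLE.items():
    print(f"{name}: estimated ~{estimated_params_m(vocab):.1f}M "
          f"(table ~{table_params}M), {pct_of_full(score):.1f}% of full")
```

Under these assumptions, every estimate lands within 1M of the table's rounded figure. The residuals come from the rounded ~308M base count.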
Usage
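```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tardellirs/embeddinggemma-pt-br-128k")
# normalize_embeddings=True returns unit-length vectors, so a dot product equals cosine similarity
emb = model.encode(["O Brasil é um país tropical da América do Sul."], normalize_embeddings=True)
```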
The model uses EmbeddingGemma's task prompts. For retrieval, prepend `task: search result | query: ` to queries and `title: none | text: ` to documents.
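Here is a minimal retrieval sketch using those prompts. It assumes a sentence-transformers-style `model.encode`. `format_query`, `format_document`, and `rank_documents` are hypothetical helpers. Filling a real document title in place of `none` follows the base EmbeddingGemma convention and is an assumption here, since this card only shows `none`. Because the embeddings are normalized, the dot product is the cosine similarity.

```python
import numpy as np

# Retrieval prompt prefix as given in this card.
QUERY_PREFIX = "task: search result | query: "


def format_query(query):
    return QUERY_PREFIX + query


def format_document(text, title=None):
    # The card uses "title: none"; substituting a real title is an assumption.
    return f"title: {title if title else 'none'} | text: {text}"


def rank_documents(model, query, documents):
    """Return (score, document) pairs, best first.

    `model` is anything with a sentence-transformers-style encode(); with
    normalized embeddings the dot product equals cosine similarity.
    """
    q = model.encode([format_query(query)], normalize_embeddings=True)[0]
    d = model.encode([format_document(doc) for doc in documents], normalize_embeddings=True)
    scores = d @ q
    order = np.argsort(-scores)
    return [(float(scores[i]), documents[i]) for i in order]
```

With the real model, create `model = SentenceTransformer("tardellirs/embeddinggemma-pt-br-128k")` and call `rank_documents(model, "Qual é a capital do Brasil?", docs)`.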
Scope
A Portuguese-only compression of Google's EmbeddingGemma, intended as a deployment/efficiency artifact; all data provenance is the base model's. Vocabulary trimming compresses the model; it does not improve it. Derived under the Gemma license.
Base model: google/embeddinggemma-300m