BrazEmbed-PT-BR
Collection
Contamination-clean ~110M Brazilian-Portuguese embeddings (BERTimbau). #1 in the ~100M class on MTEB(por).
How to use tardellirs/brazembed-pt-br-pairclf with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("tardellirs/brazembed-pt-br-pairclf")
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

A component of BrazEmbed-PT-BR, the contamination-clean ~110M Brazilian-Portuguese embedding system (task-routed, MTEB(por) mean_16 = 0.6567, #1 in the ~100M class). This standalone SentenceTransformer (Brazilian BERTimbau + the Pair-classification (NLI) clean weight-soup) serves the Pair-classification (NLI) tasks.
from sentence_transformers import SentenceTransformer
m = SentenceTransformer("tardellirs/brazembed-pt-br-pairclf") # mean-pooling, L2-normalized, no instruction prefix
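
The comment above says the model produces mean-pooled, L2-normalized embeddings. As a rough illustration of what those two steps mean (not the library's internal code), here is a minimal pure-Python sketch. The helper `mean_pool_l2` and the toy vectors are hypothetical, and the sketch assumes at least one non-padding token with a non-zero average.

```python
import math

def mean_pool_l2(token_vectors, attention_mask):
    """Average the vectors of non-padding tokens, then scale to unit length."""
    # Keep only tokens whose mask entry is 1 (real tokens, not padding).
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    # Mean pooling: per-dimension average over the kept tokens.
    pooled = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    # L2 normalization: divide by the Euclidean norm.
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled]

# Toy input: three token vectors; the last one is padding and is ignored.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = mean_pool_l2(tokens, mask)
print(sentence_vec)
```

Because the output has unit length, the dot product of two such vectors equals their cosine similarity.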
Use it directly for Pair-classification (NLI), or via the router (route.py in https://github.com/tardellirs/brazembed-pt-br). For a single general-purpose model, use tardellirs/brazembed-pt-br. License: MIT. Benchmark: MTEB(por) (soon).
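
One plausible way to use the model directly for pair classification is to embed both sides of each pair and threshold their cosine similarity. The sketch below is an assumption about usage, not the author's evaluation code. The helpers `cosine`, `classify_pairs`, and `score_sentence_pairs` are hypothetical names, and the 0.5 threshold is a placeholder that would normally be tuned on labeled pairs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_pairs(emb_a, emb_b, threshold=0.5):
    """Return (score, is_positive) for each aligned pair of embeddings."""
    results = []
    for u, v in zip(emb_a, emb_b):
        score = cosine(u, v)
        results.append((score, score >= threshold))
    return results

def score_sentence_pairs(premises, hypotheses, threshold=0.5):
    """Embed two aligned sentence lists with the model and classify each pair."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("tardellirs/brazembed-pt-br-pairclf")
    return classify_pairs(model.encode(premises), model.encode(hypotheses), threshold)

# Toy embeddings: the first pair is identical, the second is orthogonal.
demo = classify_pairs([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(demo)
```

With the model downloaded, `score_sentence_pairs(["Está chovendo."], ["O tempo está chuvoso."])` would return one `(score, is_positive)` tuple per pair.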
Base model
neuralmind/bert-base-portuguese-cased