title: MTEB Portuguese
emoji: π
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
MTEB Portuguese
A public benchmark for evaluating text embedding models on Brazilian Portuguese, built as a thin extension on top of the mteb library.
What you'll find here
- Leaderboard: interactive ranking, 54 models × 16 tasks, Pareto chart
- mteb-pt-results: all per-task JSONs and per-query Parquet files, ~1100 files
- GitHub repo: task definitions, evaluation scripts, paper sources, issue templates
Submit a model
We accept submissions via either channel; pick whichever fits.
Required for a submission:
- model_id (HF repo path or vendor product name)
- Per-task result JSONs for the 16 headline tasks
- Reproducible evaluation command
We re-run a sample of each submission to verify before merging.
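Before opening a submission, it can help to check that a results folder covers every headline task. The sketch below is a minimal example under stated assumptions: the one-file-per-task layout (`<task_name>.json`), the helper name `missing_task_results`, and any task names passed to it are hypothetical and not taken from this benchmark's tooling.

```python
from pathlib import Path


def missing_task_results(results_dir, required_tasks):
    """Return the required task names that have no result JSON in results_dir.

    Assumption (hypothetical layout): one file per task, named
    "<task_name>.json", possibly nested in subfolders.
    """
    # Collect the base names (without ".json") of every JSON file found.
    found = {p.stem for p in Path(results_dir).rglob("*.json")}
    # Report the required tasks that were not found, in sorted order.
    return sorted(t for t in required_tasks if t not in found)
```

Calling it with the folder of result files and the list of the 16 headline task names returns the tasks still missing; an empty list means the folder is complete under the assumed layout.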
Propose a new task
Open a GitHub Issue with the task template describing the dataset, license, size, and discrimination evidence. A task is accepted if it's native PT-BR (not machine-translated), has clear licensing, and discriminates between models.
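One simple way to present discrimination evidence is the spread of scores a task produces across models. The following is an illustrative sketch only: this README does not define a discrimination metric or threshold, so the function name `score_spread` and the choice of range and standard deviation are assumptions.

```python
from statistics import pstdev


def score_spread(scores_by_model):
    """Summarize how far apart models score on a single task.

    scores_by_model maps a model name to its main score on the task.
    Returns a tuple: (max - min, population standard deviation).
    """
    values = list(scores_by_model.values())
    if len(values) < 2:
        raise ValueError("need scores from at least two models")
    return max(values) - min(values), pstdev(values)
```

A task on which every model lands within a narrow band yields a spread near zero and would offer little ranking signal.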
Maintainer
Tardelli Stekel, IFSP, São Paulo, Brazil. Email: stekel@ifsp.edu.br
Contributions, corrections, and discussion are all welcome.
Citation
@misc{mteb-portuguese-2026,
title = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
author = {Stekel, Tardelli},
year = {2026},
url = {https://huggingface.co/spaces/mteb-pt/leaderboard}
}
Acknowledgments
Built on top of the mteb library by Enevoldsen et al. (2025). Task datasets contributed by their original authors. Compute provided by Modal.