---
title: MTEB Portuguese
emoji: π
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
---

# MTEB Portuguese

A public benchmark for evaluating text embedding models on **Brazilian Portuguese**, built as a thin extension on top of the [`mteb`](https://github.com/embeddings-benchmark/mteb) library.

## What you'll find here

- **[Leaderboard](https://huggingface.co/spaces/mteb-pt/leaderboard)**: interactive ranking, 54 models × 16 tasks, Pareto chart
- **[`mteb-pt-results`](https://huggingface.co/datasets/mteb-pt/mteb-pt-results)**: all per-task JSONs + per-query parquets, ~1100 files
- **[GitHub repo](https://github.com/tardellirs/mteb-pt)**: task definitions, evaluation scripts, paper sources, issue templates

## Submit a model

We accept submissions via either channel; pick whichever fits:

- [HF Discussion on the results dataset](https://huggingface.co/datasets/mteb-pt/mteb-pt-results/discussions/new)
- [GitHub Issue with the model template](https://github.com/tardellirs/mteb-pt/issues/new?template=submit-model.yml)

Required for a submission:

1. `model_id` (HF repo path or vendor product name)
2. Per-task result JSONs for the 16 headline tasks
3. A reproducible evaluation command

Before merging, we re-run a sample of each submission to verify the reported results.

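Before opening a submission, it can help to confirm that the results folder really contains one JSON per headline task. The sketch below is a hypothetical helper, not part of the `mteb-pt` repo. The function name `missing_task_results` and the assumption that each result file is named `<TaskName>.json` are illustrative, so adjust the matching rule if your files are laid out differently.

```python
from pathlib import Path


def missing_task_results(results_dir, expected_tasks):
    """Return the expected task names that have no `<task>.json` under results_dir.

    Assumption (not confirmed by the benchmark docs): one JSON file per task,
    named after the task, possibly nested in model/revision subfolders.
    """
    # Collect the stem (file name without ".json") of every JSON file found,
    # searching subfolders as well.
    found = {path.stem for path in Path(results_dir).rglob("*.json")}
    # Report the missing tasks in a stable, sorted order.
    return sorted(task for task in expected_tasks if task not in found)
```

Called as `missing_task_results("results/my-model", headline_tasks)`, where `headline_tasks` is a placeholder for the list of 16 task names taken from the repo's task definitions, it returns an empty list when the folder is complete.
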

## Propose a new task

Open a [GitHub Issue with the task template](https://github.com/tardellirs/mteb-pt/issues/new?template=propose-task.yml) describing the dataset, its license, its size, and evidence that it discriminates between models. A task is accepted if it is native PT-BR (not machine-translated), has clear licensing, and discriminates between models.

## Maintainer

**Tardelli Stekel**, IFSP, São Paulo, Brazil
<stekel@ifsp.edu.br>

Contributions, corrections, and discussion are all welcome.

## Citation

```bibtex
@misc{mteb-portuguese-2026,
  title  = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
  author = {Stekel, Tardelli},
  year   = {2026},
  url    = {https://huggingface.co/spaces/mteb-pt/leaderboard}
}
```

## Acknowledgments

Built on top of the [`mteb`](https://github.com/embeddings-benchmark/mteb) library by Enevoldsen et al. (2025). Task datasets were contributed by their original authors. Compute was provided by Modal.