---
title: MTEB Portuguese
emoji: 🏆
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
---

# MTEB Portuguese

A public benchmark for evaluating text embedding models on **Brazilian Portuguese**, built as a thin extension on top of the [`mteb`](https://github.com/embeddings-benchmark/mteb) library.

## What you'll find here

- 🏆 **[Leaderboard](https://huggingface.co/spaces/mteb-pt/leaderboard)**: interactive ranking of 54 models × 16 tasks, with a Pareto chart
- 📊 **[`mteb-pt-results`](https://huggingface.co/datasets/mteb-pt/mteb-pt-results)**: all per-task JSON files plus per-query Parquet files (~1,100 files)
- 💻 **[GitHub repo](https://github.com/tardellirs/mteb-pt)**: task definitions, evaluation scripts, paper sources, issue templates

## Submit a model

We accept submissions through either channel; pick whichever fits:

- 💬 [HF Discussion on the results dataset](https://huggingface.co/datasets/mteb-pt/mteb-pt-results/discussions/new)
- 🐛 [GitHub Issue with the model template](https://github.com/tardellirs/mteb-pt/issues/new?template=submit-model.yml)

Required for a submission:
1. `model_id` (HF repo path or vendor product name)
2. Per-task result JSONs for the 16 headline tasks
3. Reproducible evaluation command

We re-run a sample of each submission to verify before merging.
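
Before opening a submission, it can help to sanity-check the results folder locally. The snippet below is a minimal sketch, not part of the official tooling: it assumes a flat folder with one JSON file per headline task, and the function name `check_submission` and the folder layout are hypothetical.

```python
import json
from pathlib import Path

# The README states 16 headline tasks; one JSON file per task is an assumption.
EXPECTED_TASKS = 16


def check_submission(folder: str, expected: int = EXPECTED_TASKS) -> list[str]:
    """Return a list of problems; an empty list means the folder looks complete."""
    problems = []
    files = sorted(Path(folder).glob("*.json"))

    # Every result file should at least parse as JSON.
    for path in files:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")

    # The file count should match the number of headline tasks.
    if len(files) != expected:
        problems.append(f"expected {expected} JSON files, found {len(files)}")

    return problems
```

Calling `check_submission("results/my-model")` (a placeholder path) returns an empty list when the folder holds 16 parseable JSON files. It does not validate the contents of each file against the `mteb` result schema.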

## Propose a new task

Open a [GitHub Issue with the task template](https://github.com/tardellirs/mteb-pt/issues/new?template=propose-task.yml) describing the dataset, license, size, and discrimination evidence. A task is accepted if it's native PT-BR (not machine-translated), has clear licensing, and discriminates between models.
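
This README does not define how discrimination is measured. As one possible illustration, assuming you already have a main score per model on the candidate task, the spread of those scores gives a rough first signal. The function name `score_spread` and the input format are hypothetical.

```python
from statistics import pstdev


def score_spread(scores: dict[str, float]) -> dict[str, float]:
    """Summarize how far apart models score on a single candidate task.

    `scores` maps a model name to its main score on the task. A range and
    standard deviation near zero suggest the task barely separates models.
    """
    if len(scores) < 2:
        raise ValueError("need scores from at least two models")
    values = list(scores.values())
    return {
        "range": max(values) - min(values),
        "stdev": pstdev(values),  # population standard deviation
    }
```

What counts as enough spread is a judgment call for the maintainers; the sketch only computes the numbers to include in the issue.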

## Maintainer

**Tardelli Stekel**, IFSP, São Paulo, Brazil
✉️ <stekel@ifsp.edu.br>

Contributions, corrections, and discussion are all welcome.

## Citation

```bibtex
@misc{mteb-portuguese-2026,
  title  = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
  author = {Stekel, Tardelli},
  year   = {2026},
  url    = {https://huggingface.co/spaces/mteb-pt/leaderboard}
}
```

## Acknowledgments

Built on top of the [`mteb`](https://github.com/embeddings-benchmark/mteb) library by Enevoldsen et al. (2025). Task datasets contributed by their original authors. Compute provided by Modal.