metadata

title: MTEB Portuguese
emoji: 🏆
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0

MTEB Portuguese

A public benchmark for evaluating text embedding models on Brazilian Portuguese, built as a thin extension on top of the mteb library.

What you'll find here

🏆 Leaderboard — interactive ranking, 54 models × 16 tasks, Pareto chart
📊 mteb-pt-results — all per-task JSONs + per-query parquets, ~1100 files
💻 GitHub repo — task definitions, evaluation scripts, paper sources, issue templates

Submit a model

We accept submissions via either channel — pick whichever fits:

Required for a submission:

model_id (HF repo path or vendor product name)
Per-task result JSONs for the 16 headline tasks
Reproducible evaluation command

We re-run a sample of each submission to verify before merging.
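Before submitting, the three requirements above can be checked locally. The following is a minimal sketch under assumed conventions, not the project's official tooling: the `submission.json` file, its `model_id` and `eval_command` fields, the one-file-per-task layout, and the `check_submission` helper are all hypothetical. The 16 task names are passed in by the caller rather than hard-coded here.

```python
import json
from pathlib import Path

EXPECTED_TASK_COUNT = 16  # number of headline tasks stated in this README


def check_submission(folder, task_names):
    """Return a list of problems found in a submission folder (empty if none).

    Hypothetical layout, not an official format:
      folder/submission.json  -> {"model_id": "...", "eval_command": "..."}
      folder/<task_name>.json -> one result file per headline task
    """
    folder = Path(folder)
    problems = []

    if len(task_names) != EXPECTED_TASK_COUNT:
        problems.append(
            f"expected {EXPECTED_TASK_COUNT} task names, got {len(task_names)}"
        )

    # Check the metadata file and its two required string fields.
    meta_path = folder / "submission.json"
    if not meta_path.is_file():
        problems.append("missing submission.json")
    else:
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            meta = None
        if not isinstance(meta, dict):
            problems.append("submission.json is not a JSON object")
        else:
            for key in ("model_id", "eval_command"):
                value = meta.get(key)
                if not isinstance(value, str) or not value.strip():
                    problems.append(f"missing or empty field: {key}")

    # Check that every task has a result file that parses as JSON.
    for name in task_names:
        path = folder / f"{name}.json"
        if not path.is_file():
            problems.append(f"missing result file: {name}.json")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"invalid JSON: {name}.json")

    return problems
```

An empty return value means the folder satisfies this sketch's checks. It does not validate the contents of the result files, and it does not replace the maintainers' re-run.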

Propose a new task

Open a GitHub Issue using the task template, and describe the dataset, its license, its size, and the evidence that it discriminates between models. A task is accepted if it is native PT-BR (not machine-translated), has clear licensing, and discriminates between models.
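This README does not define how discrimination between models is measured. One plausible form of evidence is the spread of main scores across several models on the candidate task. The sketch below assumes that reading; the `score_spread` helper and the model names in the usage example are hypothetical, and no acceptance threshold is implied.

```python
from statistics import pstdev


def score_spread(scores):
    """Summarize how far apart models score on one candidate task.

    `scores` maps a model name to its main score on the task.
    A range and standard deviation near zero suggest the task does
    not separate the models being compared.
    """
    values = list(scores.values())
    if len(values) < 2:
        raise ValueError("need scores from at least two models")
    return {
        "range": max(values) - min(values),
        "std": pstdev(values),  # population standard deviation
    }


# Usage with made-up scores for three hypothetical models.
example = score_spread({"model-a": 0.5, "model-b": 0.7, "model-c": 0.6})
```

Reporting these numbers alongside the per-model scores in the issue lets reviewers judge the evidence directly.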

Maintainer

Tardelli Stekel — IFSP, São Paulo, Brazil ✉️ stekel@ifsp.edu.br

Contributions, corrections, and discussion are all welcome.

Citation

@misc{mteb-portuguese-2026,
  title  = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
  author = {Stekel, Tardelli},
  year   = {2026},
  url    = {https://huggingface.co/spaces/mteb-pt/leaderboard}
}

Acknowledgments

Built on top of the mteb library by Enevoldsen et al. (2025). Task datasets contributed by their original authors. Compute provided by Modal.