mteb-pt

non-profit

MTEB Portuguese

A public benchmark for evaluating text embedding models on Brazilian Portuguese, built on top of the mteb library.

What you'll find here

  • πŸ† Leaderboard β€” interactive ranking, 54 models Γ— 16 tasks, with Pareto chart
  • πŸ’» GitHub repo β€” task definitions, evaluation scripts, paper sources, issue templates
  • πŸ“š Task list & sources β€” every task linked to its original dataset / paper

Submit a model

Two channels, pick whichever fits:

Required for a submission:

  1. model_id (HF repo path or vendor product name)
  2. Per-task result JSONs for the 16 headline tasks
  3. Reproducible evaluation command

We re-run a sample of each submission to verify before merging.
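
Before submitting, it can help to check locally that the three required items are present. The sketch below is one possible pre-check, not the maintainers' verification script. The folder layout (`submission.json`, a `results/` directory) and the manifest keys (`model_id`, `eval_command`) are hypothetical names chosen for illustration; only the count of 16 headline tasks comes from the list above.

```python
import json
from pathlib import Path

EXPECTED_TASK_COUNT = 16  # the 16 headline tasks named in the requirements


def check_submission(folder: Path, expected_tasks: int = EXPECTED_TASK_COUNT) -> list[str]:
    """Return a list of problems; an empty list means the folder looks complete."""
    manifest_path = folder / "submission.json"  # hypothetical manifest file name
    if not manifest_path.is_file():
        return [f"missing {manifest_path.name}"]

    problems = []
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Hypothetical key names for requirement 1 (model_id) and 3 (evaluation command).
    for key in ("model_id", "eval_command"):
        if not str(manifest.get(key, "")).strip():
            problems.append(f"manifest field '{key}' is empty or missing")

    # Requirement 2: one result JSON per headline task (hypothetical directory name).
    results_dir = folder / "results"
    result_files = sorted(results_dir.glob("*.json")) if results_dir.is_dir() else []
    if len(result_files) != expected_tasks:
        problems.append(
            f"expected {expected_tasks} result JSONs, found {len(result_files)}"
        )

    # Each result file should at least parse as JSON.
    for path in result_files:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"{path.name} is not valid JSON")
    return problems
```

Calling `check_submission(Path("my_submission"))` returns a list of human-readable problems, so an empty list is the signal that the folder is ready to send.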

Propose a new task

Open a GitHub Issue with the task template describing the dataset, license, size, and discrimination evidence. A task is accepted if it's native PT-BR (not machine-translated), has clear licensing, and discriminates between models.
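
The card does not say how discrimination is measured, so the following is only a minimal sketch of one way to summarize it: the range and standard deviation of per-model scores on the candidate task. The function name and the choice of statistics are assumptions made for illustration, not the benchmark's acceptance rule.

```python
from statistics import pstdev


def discrimination_summary(scores: dict[str, float]) -> dict[str, float]:
    """Summarize how far apart models score on a single candidate task.

    `scores` maps a model name to its main metric on the task.
    A range and stdev near zero suggest the task barely separates models.
    """
    values = list(scores.values())
    if len(values) < 2:
        raise ValueError("need scores from at least two models")
    return {
        "range": max(values) - min(values),  # best score minus worst score
        "stdev": pstdev(values),             # population standard deviation
    }
```

Attaching a summary like this for a handful of models of different sizes gives reviewers a quick view of whether the task separates them.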

Maintainer

Tardelli Stekel, IFSP, São Paulo, Brazil
✉️ stekel@ifsp.edu.br

Contributions, corrections, and discussion all welcome.

Citation

@misc{mteb-portuguese-2026,
  title  = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
  author = {Stekel, Tardelli},
  year   = {2026},
  url    = {https://huggingface.co/spaces/mteb-pt/leaderboard}
}

Acknowledgments

Built on top of the mteb library (Muennighoff et al., 2023). The multilingual sub-benchmark methodology follows MMTEB (Enevoldsen et al., 2025). Task datasets were contributed by their original authors; see the task suite for sources.
