	---
	title: MTEB Portuguese
	emoji: 🏆
	colorFrom: green
	colorTo: yellow
	sdk: static
	pinned: false
	license: apache-2.0
	---

	# MTEB Portuguese

	A public benchmark for evaluating text embedding models on Brazilian Portuguese, built as a thin extension on top of the [`mteb`](https://github.com/embeddings-benchmark/mteb) library.

	## What you'll find here

	- 🏆 [Leaderboard](https://huggingface.co/spaces/mteb-pt/leaderboard) — interactive ranking, 54 models × 16 tasks, Pareto chart
	- 📊 [`mteb-pt-results`](https://huggingface.co/datasets/mteb-pt/mteb-pt-results) — all per-task JSONs + per-query parquets, ~1100 files
	- 💻 [GitHub repo](https://github.com/tardellirs/mteb-pt) — task definitions, evaluation scripts, paper sources, issue templates

	## Submit a model

	We accept submissions via either channel — pick whichever fits:

	- 💬 [HF Discussion on the results dataset](https://huggingface.co/datasets/mteb-pt/mteb-pt-results/discussions/new)
	- 🐛 [GitHub Issue with the model template](https://github.com/tardellirs/mteb-pt/issues/new?template=submit-model.yml)

	Required for a submission:
	1. `model_id` (HF repo path or vendor product name)
	2. Per-task result JSONs for the 16 headline tasks
	3. Reproducible evaluation command

	Before merging, we re-run a sample of each submission to verify the reported results.
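Before opening a submission, it can help to confirm that the results folder really covers all 16 headline tasks. The following is a minimal sketch, not the maintainers' validation script. It assumes one file per task named `<task_name>.json`, and the task names in `EXPECTED_TASKS` are hypothetical placeholders, since this README does not list the real ones (see the task definitions in the GitHub repo).

```python
import json
from pathlib import Path

# Hypothetical placeholder names: replace with the real 16 headline task
# names from the GitHub repo. They are not listed in this README.
EXPECTED_TASKS = [f"HeadlineTask{i:02d}" for i in range(1, 17)]


def check_submission(results_dir, expected_tasks=EXPECTED_TASKS):
    """Return (missing, unreadable) task names for a folder of result JSONs.

    Assumes one file per task named `<task_name>.json`; this layout is an
    assumption made for illustration.
    """
    results_dir = Path(results_dir)
    missing, unreadable = [], []
    for task in expected_tasks:
        path = results_dir / f"{task}.json"
        if not path.is_file():
            missing.append(task)
            continue
        try:
            # Only checks that the file parses as JSON, not its schema.
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            unreadable.append(task)
    return missing, unreadable
```

Calling `check_submission("path/to/results")` returns two lists. If both are empty, every expected task has a parseable JSON file; the check says nothing about whether the scores inside are correct.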

	## Propose a new task

	Open a [GitHub Issue with the task template](https://github.com/tardellirs/mteb-pt/issues/new?template=propose-task.yml) describing the dataset, its license and size, and evidence that it discriminates between models. A task is accepted if it is native PT-BR (not machine-translated), has clear licensing, and discriminates between models.
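This README does not define how discrimination is measured. One possible form of evidence, assumed here purely for illustration, is the spread of the task's main score across several models: a task on which every model scores nearly the same tells you little. The sketch below summarizes that spread. The model names and scores are made up and are not benchmark results.

```python
from statistics import mean, pstdev


def score_spread(scores):
    """Summarize how much a task's main score varies across models.

    `scores` maps a model name to its score on the candidate task.
    """
    values = list(scores.values())
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation
    }


# Made-up scores for illustration only; not real benchmark results.
example = {"model-a": 0.50, "model-b": 0.75, "model-c": 1.00}
print(score_spread(example))
```

A wide range and a non-trivial standard deviation suggest the task separates models. What counts as wide enough is a judgment for the maintainers and is not specified here.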

	## Maintainer

	Tardelli Stekel — IFSP, São Paulo, Brazil
	✉️ <stekel@ifsp.edu.br>

	Contributions, corrections, and discussion all welcome.

	## Citation

	```bibtex
	@misc{mteb-portuguese-2026,
	title = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
	author = {Stekel, Tardelli},
	year = {2026},
	url = {https://huggingface.co/spaces/mteb-pt/leaderboard}
	}
	```

	## Acknowledgments

	Built on top of the [`mteb`](https://github.com/embeddings-benchmark/mteb) library by Enevoldsen et al. (2025). Task datasets contributed by their original authors. Compute provided by Modal.