<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>MTEB Portuguese</title>
<meta property="og:title" content="MTEB Portuguese: Brazilian Portuguese Embedding Benchmark" />
<meta property="og:description" content="54 models × 16 native PT-BR tasks. Interactive leaderboard with Pareto frontier." />
<meta property="og:image" content="https://huggingface.co/spaces/mteb-pt/README/resolve/main/pareto-banner.png" />
<meta property="og:url" content="https://huggingface.co/mteb-pt" />
<meta name="twitter:card" content="summary_large_image" />
<style>
:root { color-scheme: light dark; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif;
max-width: 720px; margin: 1.5rem auto; padding: 0 1rem;
line-height: 1.55; color: #1f2328; background: #fff;
}
h1 { font-size: 1.6rem; margin: 0.5rem 0 0.25rem; }
h2 { font-size: 1.15rem; margin: 1.4rem 0 0.5rem; }
a { color: #0969da; text-decoration: none; }
a:hover { text-decoration: underline; }
code { background: #f6f8fa; padding: 0.1rem 0.3rem; border-radius: 4px; font-size: 0.85em; }
pre { background: #f6f8fa; padding: 0.75rem; border-radius: 6px; overflow-x: auto; }
/* Dark-mode overrides come after the base rules so they win the cascade. */
@media (prefers-color-scheme: dark) {
body { color: #e6edf3; background: #0e1117; }
a { color: #58a6ff; }
code, pre { background: #161b22; }
h1, h2 { color: #fff; }
blockquote { color: #8b949e; }
}
.cta { display: flex; gap: 0.5rem; flex-wrap: wrap; margin: 0.5rem 0 1rem; }
.cta a {
display: inline-block; padding: 0.45rem 0.9rem; border-radius: 6px;
background: #f6f8fa; border: 1px solid #d0d7de; color: #1f2328;
font-weight: 500; font-size: 0.9rem;
}
.cta a.primary { background: #0969da; color: #fff; border-color: #0969da; }
.cta a:hover { text-decoration: none; filter: brightness(1.05); }
@media (prefers-color-scheme: dark) { .cta a { background: #161b22; color: #e6edf3; border-color: #30363d; } }
ul { padding-left: 1.2rem; }
.lead { color: #656d76; font-size: 1.05rem; }
@media (prefers-color-scheme: dark) { .lead { color: #8b949e; } }
.banner { width: 100%; border-radius: 8px; margin: 0.5rem 0 1rem; display: block; }
</style>
</head>
<body>
<h1>MTEB Portuguese</h1>
<p class="lead">A public benchmark for evaluating text embedding models on Brazilian Portuguese, built on top of the <a href="https://github.com/embeddings-benchmark/mteb">mteb</a> library.</p>
<h2>What you'll find here</h2>
<ul>
<li><a href="https://huggingface.co/spaces/mteb-pt/leaderboard"><b>Leaderboard</b></a>: interactive ranking of 54 models × 16 tasks, with a Pareto chart</li>
<li><a href="https://github.com/tardellirs/mteb-pt">GitHub repo</a>: task definitions, evaluation scripts, paper sources, and issue templates</li>
<li><a href="https://github.com/tardellirs/mteb-pt#task-suite-16-headline-tasks">Task list &amp; sources</a>: every task linked to its original dataset or paper</li>
</ul>
<h2>Submit a model</h2>
<p>Two channels are available (pick whichever fits):</p>
<div class="cta">
<a class="primary" href="https://huggingface.co/spaces/mteb-pt/leaderboard/discussions/new">Open HF Discussion</a>
<a href="https://github.com/tardellirs/mteb-pt/issues/new?template=submit-model.yml">GitHub Issue</a>
</div>
<p>Required for a submission:</p>
<ol>
<li><code>model_id</code> (HF repo path or vendor product name)</li>
<li>Per-task result JSONs for the 16 headline tasks</li>
<li>Reproducible evaluation command</li>
</ol>
<p>Before merging, we re-run a sample of each submission's evaluations to verify the reported results.</p>
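<p>Before opening a submission, it can help to confirm that the result folder covers every headline task. The following is a minimal, unofficial sketch (not part of the mteb-pt repo); it assumes mteb's usual result format, where each task's JSON holds a <code>task_name</code> and a <code>scores</code> mapping from split to entries carrying a <code>main_score</code>. The helper name <code>check_submission</code> and the task names in the demo are hypothetical placeholders, not the real headline tasks.</p>

```python
import json
import tempfile
from pathlib import Path


def check_submission(results_dir, expected_tasks):
    """Collect one main_score per expected task from mteb-style result JSONs.

    Assumed layout (hypothetical for this benchmark): each task result is a
    JSON object with "task_name" and "scores" = {split: [{"main_score": ...}]}.
    Returns (scores, missing): scores maps task name to the first main_score
    found; missing lists expected tasks with no usable result file.
    """
    expected = set(expected_tasks)
    scores = {}
    # rglob also finds files in nested model/revision sub-folders.
    for path in sorted(Path(results_dir).rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            continue
        name = data.get("task_name", path.stem)
        # Skip unrelated files (e.g. model metadata) and duplicate results.
        if name not in expected or name in scores:
            continue
        # Take the first split that has at least one scored entry.
        for split_results in data.get("scores", {}).values():
            if split_results:
                scores[name] = split_results[0]["main_score"]
                break
    missing = sorted(expected - set(scores))
    return scores, missing


if __name__ == "__main__":
    # Demo with hypothetical task names; not the real headline tasks.
    with tempfile.TemporaryDirectory() as tmp:
        sample = {
            "task_name": "TaskA",
            "scores": {"test": [{"main_score": 0.71, "hf_subset": "default"}]},
        }
        Path(tmp, "TaskA.json").write_text(json.dumps(sample), encoding="utf-8")
        scores, missing = check_submission(tmp, ["TaskA", "TaskB"])
        print(scores)   # {'TaskA': 0.71}
        print(missing)  # ['TaskB']
```

<p>For a real submission, pass the 16 task names from the task suite as <code>expected_tasks</code>; anything returned in <code>missing</code> still needs a result JSON.</p>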
<h2>Propose a new task</h2>
<p>Open a <a href="https://github.com/tardellirs/mteb-pt/issues/new?template=propose-task.yml">GitHub Issue with the task template</a> describing the dataset, license, size, and discrimination evidence. A task is accepted if it's native PT-BR (not machine-translated), has clear licensing, and discriminates between models.</p>
<h2>Maintainer</h2>
<p><b>Tardelli Stekel</b>, IFSP, São Paulo, Brazil<br>
<a href="mailto:stekel@ifsp.edu.br">stekel@ifsp.edu.br</a></p>
<p>Contributions, corrections, and discussion are all welcome.</p>
<h2>Citation</h2>
<pre>@misc{mteb-portuguese-2026,
title = {MTEB Portuguese: A Massive Text Embedding Benchmark for Brazilian Portuguese},
author = {Stekel, Tardelli},
year = {2026},
url = {https://huggingface.co/spaces/mteb-pt/leaderboard}
}</pre>
<h2>Acknowledgments</h2>
<p>Built on top of the <a href="https://github.com/embeddings-benchmark/mteb">mteb</a> library (Muennighoff et al., 2023). The multilingual sub-benchmark methodology follows MMTEB (Enevoldsen et al., 2025). Task datasets were contributed by their original authors; see the <a href="https://github.com/tardellirs/mteb-pt#task-suite-16-headline-tasks">task suite</a> for sources.</p>
</body>
</html>