Update index.html
index.html CHANGED (+12 -0)
|
@@ -24,6 +24,18 @@
|
|
| 24 |
</div>
|
| 25 |
</section>
|
| 26 |
|
| 27 |
<section class="section">
|
| 28 |
<div class="container content">
|
| 29 |
<h2 class="title is-3">🧩 Main Pipeline Steps</h2>
|
| 24 |
</div>
|
| 25 |
</section>
|
| 26 |
|
| 27 |
+
<section class="section">
|
| 28 |
+
<div class="container content">
|
| 29 |
+
<p>
|
| 30 |
+
High-quality multilingual data is crucial for training effective large language models (LLMs).<br>
|
| 31 |
+
<strong>JQL (Judging Quality across Languages)</strong> is a scalable and lightweight data filtering approach that distills the judgment capabilities of strong multilingual LLMs into efficient cross-lingual annotators. These annotators enable robust filtering of web-scale data.
|
| 32 |
+
</p>
|
| 33 |
+
<p>
|
| 34 |
+
JQL improves data quality, retains more tokens, and generalizes beyond high-resource European languages—achieving strong performance on Arabic, Thai, and Mandarin. It outperforms heuristic baselines and enables efficient multilingual pretraining data curation at scale.
|
| 35 |
+
</p>
|
| 36 |
+
</div>
|
| 37 |
+
</section>
|
| 38 |
+
|
| 39 |
<section class="section">
|
| 40 |
<div class="container content">
|
| 41 |
<h2 class="title is-3">🧩 Main Pipeline Steps</h2>
|