Update index.html
index.html (CHANGED, +12 -12)
@@ -56,18 +56,18 @@
 
 <section class="section">
 <div class="container content">
-<h2 class="title is-3"
-<
-<
-<
-
-
-
-
-
-</li>
-<li><strong
-</
+<h2 class="title is-3">🧩 Main Pipeline Steps</h2>
+<figure>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/64bfc4d55ce3d382c05c0f9a/1zPQcwqt9Li_gCvd04_2_.png" alt="JQL Pipeline Overview">
+<figcaption><em>Figure 1: Overview of the JQL pipeline</em></figcaption>
+</figure>
+
+<ol>
+<li><strong>📋 Ground Truth Creation:</strong> Human annotators label monolingual documents based on a structured instruction prompt. These documents are translated into all target languages to create a multilingual gold-standard dataset. (See Figure 1)</li>
+<li><strong>🤖 LLM-as-a-Judge Selection & Data Annotation:</strong> Strong multilingual LLMs (e.g., Gemma, Mistral, LLaMA) are evaluated against the ground truth, and top-performing models are used to produce synthetic annotations. (See Figure 1)</li>
+<li><strong>🪶 Lightweight Annotator Training:</strong> Train compact regression heads on frozen multilingual embeddings to create efficient, high-throughput annotators. (See Figure 1)</li>
+<li><strong>🚀 Scalable Data Filtering:</strong> Use trained annotators to filter large-scale pretraining corpora using quantile thresholds. (See Figure 1)</li>
+</ol>
 </div>
 </section>
 
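For the annotator-training step, the sketch below fits the simplest possible regression head, a single linear layer, on embeddings that are assumed to have been precomputed by a frozen multilingual encoder (not shown). The linear architecture, full-batch gradient descent on mean squared error, the hyperparameters `lr` and `epochs`, and the function names are illustrative assumptions; the source only says the heads are compact and trained on frozen embeddings.

```python
from typing import List, Tuple

Vector = List[float]


def predict(weights: Vector, bias: float, emb: Vector) -> float:
    """Linear regression head: score = w . e + b."""
    return sum(w * x for w, x in zip(weights, emb)) + bias


def train_linear_head(
    embeddings: List[Vector],
    targets: List[float],
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[Vector, float]:
    """Fit w and b by full-batch gradient descent on mean squared error.

    Assumption: a linear head stands in for the "compact regression head".
    The embeddings are fixed inputs (the encoder stays frozen); only the
    head's parameters are updated.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for emb, y in zip(embeddings, targets):
            err = predict(weights, bias, emb) - y
            # Gradient of mean((pred - y)^2) with respect to each parameter.
            for d in range(dim):
                grad_w[d] += 2 * err * emb[d] / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

On toy 2-dimensional embeddings `[[0, 0], [1, 0], [0, 1], [1, 1]]` with targets `[1, 3, 0, 2]` (generated by `2*e0 - e1 + 1`), the learned weights approach `[2, -1]` and the bias approaches `1`. Because the encoder is frozen, its embeddings can be computed once and cached, which is one reason annotators of this kind can be cheap to train and run.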
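The filtering step keeps documents whose annotator score clears a quantile threshold. The sketch below assumes the threshold is the `q`-quantile of the scores over the corpus, computed with linear interpolation (the same rule as NumPy's default `numpy.quantile` method), and that documents scoring at or above it are kept. The value of `q` and the function names are hypothetical; the source does not say which quantiles are used.

```python
import math
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile of values (NumPy's default rule)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] + (s[hi] - s[lo]) * frac


def filter_by_quantile(docs: List[str], scores: List[float], q: float) -> Tuple[float, List[str]]:
    """Keep documents whose score is at or above the q-quantile of all scores.

    Assumption: the threshold is computed over the scored corpus itself.
    """
    threshold = quantile(scores, q)
    kept = [d for d, s in zip(docs, scores) if s >= threshold]
    return threshold, kept
```

With scores `[0.1, 0.9, 0.4, 0.7, 0.2]` for documents `d0` to `d4` and `q = 0.6`, the threshold is 0.52 and only `d1` and `d3` are kept, i.e. roughly the top 40% of the corpus.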