Add link to GitHub repository

#1 · opened by nielsr (HF Staff)
Files changed (1)
README.md CHANGED (+14 -13)
```diff
@@ -1,7 +1,7 @@
 ---
-pipeline_tag: feature-extraction
 library_name: transformers
 license: apache-2.0
+pipeline_tag: feature-extraction
 ---
 
 # Overview
@@ -9,6 +9,7 @@ license: apache-2.0
 This repository contains an encoder model, part of the research presented in the paper *Should We Still Pretrain Encoders with Masked Language Modeling?* (Gisserot-Boukhlef et al.).
 
 * **Paper:** [Should We Still Pretrain Encoders with Masked Language Modeling?](https://huggingface.co/papers/2507.00994)
+* **Code:** [https://github.com/Nicolas-BZRD/EuroBERT](https://github.com/Nicolas-BZRD/EuroBERT)
 * **Blog post:** [Link](https://huggingface.co/blog/Nicolas-BZRD/encoders-should-not-be-only-pre-trained-with-mlm)
 * **Project page:** [https://hf.co/MLMvsCLM](https://hf.co/MLMvsCLM)
 
@@ -16,18 +17,18 @@ This repository contains an encoder model, part of the research presented in the
 
 Model identifiers follow a consistent format that encodes key training details:
 
-* **Single-stage models**:
-  `[model size]-[objective]-[number of steps]`.
-  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
-* **Two-stage models**:
-  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
-  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
-* **Continued pretraining from decayed checkpoints**:
-  These use the `dec` prefix on the first training stage.
-  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (decayed checkpoint), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
-* **Intermediate checkpoints**:
-  To refer to a specific training step before the final checkpoint, append the step number at the end.
-  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
+* **Single-stage models**:
+  `[model size]-[objective]-[number of steps]`.
+  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
+* **Two-stage models**:
+  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
+  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
+* **Continued pretraining from decayed checkpoints**:
+  These use the `dec` prefix on the first training stage.
+  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (decayed checkpoint), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
+* **Intermediate checkpoints**:
+  To refer to a specific training step before the final checkpoint, append the step number at the end.
+  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
 
 ## Usage
 
```
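A side note for anyone consuming these repositories programmatically: the identifier grammar described in the README is regular enough to validate with a few lines of Python. Here is a minimal sketch; the regex, group names, and `parse_model_id` helper are my own illustration, not part of the repository:

```python
import re

# Hypothetical parser for the identifier scheme described in the README.
PATTERN = re.compile(
    r"^(?P<size>\d+m)"                        # model size, e.g. 610m
    r"-(?P<obj1>clm|mlm)(?P<ratio1>\d+)?"     # stage-1 objective (+ masking ratio for MLM)
    r"-(?P<dec1>dec)?(?P<steps1>\d+k)"        # stage-1 steps, optional 'dec' prefix
    r"(?:-(?P<obj2>clm|mlm)(?P<ratio2>\d+)?"  # optional stage-2 objective
    r"-(?P<total>\d+k))?"                     # total steps across both stages
    r"(?:-(?P<step>\d+))?$"                   # optional intermediate checkpoint step
)

def parse_model_id(name: str) -> dict:
    """Split an identifier like '610m-clm-10k-mlm40-42k' into its fields."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized identifier: {name}")
    return {k: v for k, v in match.groupdict().items() if v is not None}

# The four examples from the README:
for name in ("610m-clm-42k", "610m-clm-10k-mlm40-42k",
             "610m-clm-dec42k-mlm40-64k", "610m-mlm40-42k-1000"):
    print(parse_model_id(name))
```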
 
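And since the card keeps `pipeline_tag: feature-extraction` with `library_name: transformers`, here is a minimal loading sketch for readers landing on this PR. The repo id below is a placeholder assembled from the naming scheme, and mean pooling is just one common readout; defer to the card's own Usage section for the authors' recommended invocation:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder id built from the naming scheme above; substitute the actual
# repository path this model card lives at.
model_id = "610m-mlm40-42k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # may need trust_remote_code=True for custom architectures

inputs = tokenizer("Should we still pretrain encoders with MLM?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden states over non-padding tokens into one embedding.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, hidden_size])
```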