Add link to code repository #1
by nielsr (HF Staff) - opened

README.md CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
-pipeline_tag: feature-extraction
 library_name: transformers
 license: apache-2.0
+pipeline_tag: feature-extraction
 ---
 
 # Overview
@@ -11,23 +11,24 @@ This repository contains an encoder model, part of the research presented in the
 * **Paper:** [Should We Still Pretrain Encoders with Masked Language Modeling?](https://huggingface.co/papers/2507.00994)
 * **Blog post:** [Link](https://huggingface.co/blog/Nicolas-BZRD/encoders-should-not-be-only-pre-trained-with-mlm)
 * **Project page:** [https://hf.co/MLMvsCLM](https://hf.co/MLMvsCLM)
+* **Code:** [https://github.com/Nicolas-BZRD/EuroBERT](https://github.com/Nicolas-BZRD/EuroBERT)
 
 ## Model Naming
 
 Model identifiers follow a consistent format that encodes key training details:
 
-*
-  `[model size]-[objective]-[number of steps]`.
-  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
-*
-  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
-  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
-*
-  These use the dec prefix on the first training stage.
-  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (with weight decay), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
-*
-  To refer to a specific training step before the final checkpoint, append the step number at the end.
-  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
+* **Single-stage models**:
+  `[model size]-[objective]-[number of steps]`.
+  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
+* **Two-stage models**:
+  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
+  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
+* **Continued pretraining from decayed checkpoints**:
+  These use the dec prefix on the first training stage.
+  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (with weight decay), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
+* **Intermediate checkpoints**:
+  To refer to a specific training step before the final checkpoint, append the step number at the end.
+  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
 
 ## Usage
 
```
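As a companion to the naming scheme above, here is a minimal sketch of loading one of these checkpoints for feature extraction with transformers (the library and pipeline tag declared in the README front matter). The repository id `MLMvsCLM/610m-clm-42k` and the `trust_remote_code=True` flag are illustrative assumptions, not something this diff establishes; the README's own Usage section is cut off in the hunk above.

```python
# Illustrative sketch only: the repo id below follows the
# "[model size]-[objective]-[number of steps]" scheme and is assumed to live
# under the MLMvsCLM organization; adjust it to the checkpoint you need.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "MLMvsCLM/610m-clm-42k"  # e.g. 610M params, CLM objective, 42k steps

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is an assumption, in case the checkpoint ships custom
# modeling code; drop it if the architecture is natively supported.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Should we still pretrain encoders with MLM?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Simple mean pooling over token embeddings gives one feature vector per input,
# consistent with the repo's feature-extraction pipeline tag.
features = outputs.last_hidden_state.mean(dim=1)
print(features.shape)  # (1, hidden_size)
```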
|