Add link to GitHub repository

#1 · opened by nielsr (HF Staff)
Files changed (1)
README.md CHANGED (+14 -13)
```diff
@@ -1,7 +1,7 @@
 ---
-pipeline_tag: feature-extraction
 library_name: transformers
 license: apache-2.0
+pipeline_tag: feature-extraction
 ---
 
 # Overview
@@ -9,6 +9,7 @@ license: apache-2.0
 This repository contains an encoder model, part of the research presented in the paper *Should We Still Pretrain Encoders with Masked Language Modeling?* (Gisserot-Boukhlef et al.).
 
 * **Paper:** [Should We Still Pretrain Encoders with Masked Language Modeling?](https://huggingface.co/papers/2507.00994)
+* **Code:** [https://github.com/Nicolas-BZRD/EuroBERT](https://github.com/Nicolas-BZRD/EuroBERT)
 * **Blog post:** [Link](https://huggingface.co/blog/Nicolas-BZRD/encoders-should-not-be-only-pre-trained-with-mlm)
 * **Project page:** [https://hf.co/MLMvsCLM](https://hf.co/MLMvsCLM)
 
@@ -16,18 +17,18 @@ This repository contains an encoder model, part of the research presented in the
 
 Model identifiers follow a consistent format that encodes key training details:
 
-* **Single-stage models**:
-  `[model size]-[objective]-[number of steps]`.
-  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
-* **Two-stage models**:
-  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
-  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
-* **Continued pretraining from decayed checkpoints**:
-  These use the `dec` prefix on the first training stage.
-  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (decayed checkpoint), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
-* **Intermediate checkpoints**:
-  To refer to a specific training step before the final checkpoint, append the step number at the end.
-  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
+* **Single-stage models**:
+  `[model size]-[objective]-[number of steps]`.
+  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
+* **Two-stage models**:
+  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
+  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
+* **Continued pretraining from decayed checkpoints**:
+  These use the `dec` prefix on the first training stage.
+  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (decayed checkpoint), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
+* **Intermediate checkpoints**:
+  To refer to a specific training step before the final checkpoint, append the step number at the end.
+  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
 
 ## Usage
 
```
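A side note for anyone consuming these repositories programmatically: the identifier grammar described in the README is regular enough to validate with a few lines of Python. Here is a minimal sketch; the regex, group names, and `parse_model_id` helper are my own illustration, not part of the repository:

```python
import re

# Hypothetical parser for the identifier scheme described in the README.
PATTERN = re.compile(
    r"^(?P<size>\d+m)"                        # model size, e.g. 610m
    r"-(?P<obj1>clm|mlm)(?P<ratio1>\d+)?"     # stage-1 objective (+ masking ratio for MLM)
    r"-(?P<dec1>dec)?(?P<steps1>\d+k)"        # stage-1 steps, optional 'dec' prefix
    r"(?:-(?P<obj2>clm|mlm)(?P<ratio2>\d+)?"  # optional stage-2 objective
    r"-(?P<total>\d+k))?"                     # total steps across both stages
    r"(?:-(?P<step>\d+))?$"                   # optional intermediate checkpoint step
)

def parse_model_id(name: str) -> dict:
    """Split an identifier like '610m-clm-10k-mlm40-42k' into its fields."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized identifier: {name}")
    return {k: v for k, v in match.groupdict().items() if v is not None}

# The four examples from the README:
for name in ("610m-clm-42k", "610m-clm-10k-mlm40-42k",
             "610m-clm-dec42k-mlm40-64k", "610m-mlm40-42k-1000"):
    print(parse_model_id(name))
```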
 
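And since the card keeps `pipeline_tag: feature-extraction` with `library_name: transformers`, here is a minimal loading sketch for readers landing on this PR. The repo id below is a placeholder assembled from the naming scheme, and mean pooling is just one common readout; defer to the card's own Usage section for the authors' recommended invocation:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder id built from the naming scheme above; substitute the actual
# repository path this model card lives at.
model_id = "610m-mlm40-42k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # may need trust_remote_code=True for custom architectures

inputs = tokenizer("Should we still pretrain encoders with MLM?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden states over non-padding tokens into one embedding.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, hidden_size])
```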