Add link to GitHub repository and refine usage example

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +4 -4
README.md CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
-pipeline_tag: feature-extraction
 library_name: transformers
 license: apache-2.0
+pipeline_tag: feature-extraction
 ---
 
 # Overview
@@ -9,6 +9,7 @@ license: apache-2.0
 This repository contains an encoder model, part of the research presented in the paper *Should We Still Pretrain Encoders with Masked Language Modeling?* (Gisserot-Boukhlef et al.).
 
 * **Paper:** [Should We Still Pretrain Encoders with Masked Language Modeling?](https://huggingface.co/papers/2507.00994)
+* **Code:** [https://github.com/Nicolas-BZRD/EuroBERT](https://github.com/Nicolas-BZRD/EuroBERT)
 * **Blog post:** [Link](https://huggingface.co/blog/Nicolas-BZRD/encoders-should-not-be-only-pre-trained-with-mlm)
 * **Project page:** [https://hf.co/MLMvsCLM](https://hf.co/MLMvsCLM)
 
@@ -37,9 +38,8 @@ You can use this model for feature extraction with the Hugging Face `transformers`
 from transformers import AutoTokenizer, AutoModel
 import torch
 
-# Replace with the actual model ID if different, e.g., "AhmedAliHassan/MLMvsCLM-Biphasic-210M"
-# This placeholder assumes the current repository is the model you want to load.
-model_name = "<YOUR_MODEL_ID_HERE>"
+# This example uses a representative model ID from the paper's artifacts.
+model_name = "AhmedAliHassan/MLMvsCLM-Biphasic-210M"
 
 # Load the tokenizer and model, ensuring trust_remote_code for custom architectures
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```
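
For context, here is a minimal sketch of how the refined usage example could be completed end to end. It reuses the representative model ID and the `trust_remote_code=True` loading shown in the diff; the input sentence and the mean-pooling step are illustrative assumptions, not something the model card prescribes.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Representative model ID taken from the diff above.
model_name = "AhmedAliHassan/MLMvsCLM-Biphasic-210M"

# Load the tokenizer and model, ensuring trust_remote_code for custom architectures.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

sentences = ["Should we still pretrain encoders with masked language modeling?"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings over non-padding positions (one common pooling choice).
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (batch_size, hidden_size)
```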