Add link to GitHub repository and refine usage example
#1 opened by nielsr (HF Staff)
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
-pipeline_tag: feature-extraction
 library_name: transformers
 license: apache-2.0
+pipeline_tag: feature-extraction
 ---
 
 # Overview
@@ -9,6 +9,7 @@ license: apache-2.0
 This repository contains an encoder model, part of the research presented in the paper *Should We Still Pretrain Encoders with Masked Language Modeling?* (Gisserot-Boukhlef et al.).
 
 * **Paper:** [Should We Still Pretrain Encoders with Masked Language Modeling?](https://huggingface.co/papers/2507.00994)
+* **Code:** [https://github.com/Nicolas-BZRD/EuroBERT](https://github.com/Nicolas-BZRD/EuroBERT)
 * **Blog post:** [Link](https://huggingface.co/blog/Nicolas-BZRD/encoders-should-not-be-only-pre-trained-with-mlm)
 * **Project page:** [https://hf.co/MLMvsCLM](https://hf.co/MLMvsCLM)
 
@@ -37,9 +38,8 @@ You can use this model for feature extraction with the Hugging Face `transformer
 from transformers import AutoTokenizer, AutoModel
 import torch
 
-#
-
-model_name = "<YOUR_MODEL_ID_HERE>"
+# This example uses a representative model ID from the paper's artifacts.
+model_name = "AhmedAliHassan/MLMvsCLM-Biphasic-210M"
 
 # Load the tokenizer and model, ensuring trust_remote_code for custom architectures
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
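A note on the first hunk: it only moves `pipeline_tag` below `license` in the metadata block and changes no key or value. Assuming the block is read as an ordinary YAML mapping, where key order carries no meaning, the old and new metadata should be equivalent. The sketch below checks this with a hypothetical `parse_front_matter` helper written for this note; it is not a real YAML parser and handles only flat `key: value` lines like the three in this diff.

```python
def parse_front_matter(text):
    """Parse a flat `key: value` front-matter block delimited by `---` lines.

    Hypothetical helper for illustration only: it handles simple scalar
    entries like the three keys in this diff, not full YAML.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        if not line.strip():
            continue  # skip blank lines inside the block
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        metadata[key.strip()] = value.strip()
    raise ValueError("front matter is missing its closing '---'")


# Metadata block before the change (left side of the first hunk).
OLD_FRONT_MATTER = """---
pipeline_tag: feature-extraction
library_name: transformers
license: apache-2.0
---"""

# Metadata block after the change (right side of the first hunk).
NEW_FRONT_MATTER = """---
library_name: transformers
license: apache-2.0
pipeline_tag: feature-extraction
---"""

old_meta = parse_front_matter(OLD_FRONT_MATTER)
new_meta = parse_front_matter(NEW_FRONT_MATTER)

# Dict equality ignores insertion order, so the reordering compares as equal.
print(old_meta == new_meta)  # True
```

If the comparison holds, the substantive changes in this PR are the added **Code** link and the concrete `model_name` in the usage snippet.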