How to use kd13/RoPERT-MLM-small with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="kd13/RoPERT-MLM-small", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("kd13/RoPERT-MLM-small", trust_remote_code=True, dtype="auto")
```
Update README.md
README.md
CHANGED
@@ -26,6 +26,8 @@ A compact BERT-style masked language model trained entirely from scratch on Book
 
 **Embedding tying.** The MLM decoder projection matrix shares weights with the token embedding table, which reduces parameter count and typically improves token prediction quality.
 
+**SwiGLU activation.** The feed-forward block uses a gated linear unit with a Swish gate in place of the standard feed-forward activation, providing a more expressive non-linearity that typically improves training stability and model performance.
+
 ---
 
 ## Training Details
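As a rough illustration of the SwiGLU feed-forward block described in the added README lines, here is a minimal sketch in plain Python. The function names, toy dimensions, and weights are hypothetical and chosen only for readability; they are not taken from the model's code, and bias terms are omitted.

```python
import math


def swish(x: float) -> float:
    """Swish (SiLU) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Gated feed-forward: down( swish(W_gate x) * (W_up x) )."""
    gate = [swish(g) for g in matvec(w_gate, x)]  # Swish-activated gate branch
    up = matvec(w_up, x)                          # linear "up" branch
    hidden = [g * u for g, u in zip(gate, up)]    # element-wise gating
    return matvec(w_down, hidden)                 # project back to model width


# Toy sizes (assumed for illustration): model width 2, hidden width 3.
x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]

y = swiglu_ffn(x, w_gate, w_up, w_down)
```

Compared with a standard feed-forward layer, this variant uses two input projections (gate and up) instead of one, so implementations often shrink the hidden width to keep the parameter count comparable.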