Fill-Mask · Transformers · Safetensors · roberta
OSainz committed 9cb8c3a · verified · 1 Parent(s): ba13b99

Update README.md

Files changed (1): README.md (+31 −1)
README.md CHANGED
@@ -9,7 +9,7 @@ Submitted to LREC 2026
 
 ## Model Details
 
-### Model Description
+## Model Description
 
 BERnaT is a family of monolingual Basque encoder-only language models trained to better represent linguistic variation—including standard, dialectal, historical, and informal Basque—rather than focusing solely on standard textual corpora. Models were trained on corpora that combine high-quality standard Basque with varied sources such as social media and historical texts, aiming to enhance robustness and generalization across natural language understanding (NLU) tasks.
 
@@ -19,6 +19,36 @@ BERnaT is a family of monolingual Basque encoder-only language models trained to
 - **Model Type**: Encoder-only Transformer models (RoBERTa-style)
 - **Languages**: Basque (Euskara)
 
+## Getting Started
+
+You can use this model directly, as in the example below, or fine-tune it on your task of interest.
+
+```python
+>>> from transformers import pipeline
+>>> pipe = pipeline("fill-mask", model='HiTZ/BERnaT-base')
+>>> pipe("Kaixo! Ni <mask> naiz!")
+[{'score': 0.022003261372447014,
+  'token': 7497,
+  'token_str': ' euskalduna',
+  'sequence': 'Kaixo! Ni euskalduna naiz!'},
+ {'score': 0.016429167240858078,
+  'token': 14067,
+  'token_str': ' Olentzero',
+  'sequence': 'Kaixo! Ni Olentzero naiz!'},
+ {'score': 0.012804778292775154,
+  'token': 31087,
+  'token_str': ' ahobizi',
+  'sequence': 'Kaixo! Ni ahobizi naiz!'},
+ {'score': 0.01173020526766777,
+  'token': 331,
+  'token_str': ' ez',
+  'sequence': 'Kaixo! Ni ez naiz!'},
+ {'score': 0.010091394186019897,
+  'token': 7618,
+  'token_str': ' irakaslea',
+  'sequence': 'Kaixo! Ni irakaslea naiz!'}]
+```
+
 ## Training Data
 
 The BERnaT family was pre-trained on a combination of:
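The fill-mask pipeline in the added Getting Started section returns a list of candidate fills ranked by `score`. A minimal sketch of post-processing that output (the `best_fill` helper is hypothetical, not part of the model card, and the sample below abbreviates the scores shown above):

```python
# Hypothetical helper, not part of the BERnaT card: pick the
# highest-scoring completion from fill-mask pipeline output.
def best_fill(predictions):
    """Return the 'sequence' of the highest-scoring prediction."""
    return max(predictions, key=lambda p: p["score"])["sequence"]

# Sample data in the shape returned by the fill-mask pipeline above.
sample = [
    {"score": 0.0220, "token": 7497, "token_str": " euskalduna",
     "sequence": "Kaixo! Ni euskalduna naiz!"},
    {"score": 0.0164, "token": 14067, "token_str": " Olentzero",
     "sequence": "Kaixo! Ni Olentzero naiz!"},
]

print(best_fill(sample))  # Kaixo! Ni euskalduna naiz!
```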