Fill-Mask · Transformers · Safetensors · roberta
OSainz committed 9cb8c3a · verified · 1 Parent(s): ba13b99

Update README.md

Files changed (1): README.md (+31 −1)
README.md CHANGED
@@ -9,7 +9,7 @@ Submitted to LREC 2026
 
 ## Model Details
 
-### Model Description
+## Model Description
 
 BERnaT is a family of monolingual Basque encoder-only language models trained to better represent linguistic variation—including standard, dialectal, historical, and informal Basque—rather than focusing solely on standard textual corpora. Models were trained on corpora that combine high-quality standard Basque with varied sources such as social media and historical texts, aiming to enhance robustness and generalization across natural language understanding (NLU) tasks.
 
@@ -19,6 +19,36 @@ BERnaT is a family of monolingual Basque encoder-only language models trained to
 - **Model Type**: Encoder-only Transformer models (RoBERTa-style)
 - **Languages**: Basque (Euskara)
 
+## Getting Started
+
+You can use this model directly, as in the example below, or fine-tune it on your task of interest.
+
+```python
+>>> from transformers import pipeline
+>>> pipe = pipeline("fill-mask", model='HiTZ/BERnaT-base')
+>>> pipe("Kaixo! Ni <mask> naiz!")
+[{'score': 0.022003261372447014,
+  'token': 7497,
+  'token_str': ' euskalduna',
+  'sequence': 'Kaixo! Ni euskalduna naiz!'},
+ {'score': 0.016429167240858078,
+  'token': 14067,
+  'token_str': ' Olentzero',
+  'sequence': 'Kaixo! Ni Olentzero naiz!'},
+ {'score': 0.012804778292775154,
+  'token': 31087,
+  'token_str': ' ahobizi',
+  'sequence': 'Kaixo! Ni ahobizi naiz!'},
+ {'score': 0.01173020526766777,
+  'token': 331,
+  'token_str': ' ez',
+  'sequence': 'Kaixo! Ni ez naiz!'},
+ {'score': 0.010091394186019897,
+  'token': 7618,
+  'token_str': ' irakaslea',
+  'sequence': 'Kaixo! Ni irakaslea naiz!'}]
+```
+
 ## Training Data
 
 The BERnaT family was pre-trained on a combination of:
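The fill-mask pipeline in the added Getting Started section returns a list of candidate fills ranked by `score`. A minimal sketch of post-processing that output (the `best_fill` helper is hypothetical, not part of the model card, and the sample below abbreviates the scores shown above):

```python
# Hypothetical helper, not part of the BERnaT card: pick the
# highest-scoring completion from fill-mask pipeline output.
def best_fill(predictions):
    """Return the 'sequence' of the highest-scoring prediction."""
    return max(predictions, key=lambda p: p["score"])["sequence"]

# Sample data in the shape returned by the fill-mask pipeline above.
sample = [
    {"score": 0.0220, "token": 7497, "token_str": " euskalduna",
     "sequence": "Kaixo! Ni euskalduna naiz!"},
    {"score": 0.0164, "token": 14067, "token_str": " Olentzero",
     "sequence": "Kaixo! Ni Olentzero naiz!"},
]

print(best_fill(sample))  # Kaixo! Ni euskalduna naiz!
```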