pablo-rf committed · verified
Commit 698ac70 · Parent(s): 710cbfb

Update README.md

Files changed (1): README.md (+48 −3)
---
library_name: transformers
license: mit
language:
- gl
base_model:
- microsoft/deberta-v3-xsmall
pipeline_tag: fill-mask
---

# DeBERTa-xsmall-gl

**DeBERTa-xsmall-gl** is a continued-pretraining checkpoint of **microsoft/deberta-v3-xsmall**, adapted to Galician through large-scale masked language modeling. It is intended as a strong general-purpose encoder for downstream NLP tasks in Galician.
## Training

- **Base model:** microsoft/deberta-v3-xsmall
- **Epochs:** 3
- **Learning rate:** 6e-4
- **MLM probability:** 0.15
- **Max sequence length:** 512
- **Total batch size:** 1024
- **Training examples:** 6,139,791
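Of the settings above, the MLM probability of 0.15 means roughly 15% of the tokens in each sequence are selected for prediction. A minimal sketch of BERT-style dynamic masking (80% `[MASK]`, 10% random token, 10% left unchanged); the token ids and vocabulary size here are illustrative, not the model's real ones:

```python
import random

MASK_ID = 4           # illustrative [MASK] token id, not the real one
VOCAB_SIZE = 128000   # illustrative vocabulary size
MLM_PROB = 0.15       # value from the training setup above

def mask_tokens(token_ids, rng):
    """BERT-style dynamic masking: each token is selected with
    probability MLM_PROB; a selected token becomes [MASK] 80% of the
    time, a random token 10%, and stays unchanged 10%."""
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < MLM_PROB:
            labels.append(tok)          # the model must predict the original
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))
            else:
                inputs.append(tok)
        else:
            labels.append(-100)         # position ignored by the loss
            inputs.append(tok)
    return inputs, labels

rng = random.Random(0)
ids = list(range(1000, 1512))           # a fake 512-token sequence
inputs, labels = mask_tokens(ids, rng)
masked = sum(l != -100 for l in labels)
print(f"{masked}/512 tokens selected")  # expect about 0.15 * 512 ≈ 77
```

The `-100` label is the value PyTorch's cross-entropy loss ignores by default, so only the selected positions contribute to the MLM objective.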
## Intended uses

- Masked language modeling (fill-mask)
- Encoder for classification, NER, QA, and general Galician NLP tasks
- Further domain adaptation via fine-tuning
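When the checkpoint is used as an encoder for classification, a common pattern is to mean-pool the final hidden states over non-padding positions. A framework-agnostic sketch with dummy values; in real use, `hidden_states` would come from `model(**inputs).last_hidden_state` and `attention_mask` from the tokenizer:

```python
def mean_pool(hidden_states, attention_mask):
    """Average token embeddings, ignoring padding positions.
    hidden_states: [seq_len][dim] list of lists; attention_mask: [seq_len] of 0/1."""
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            count += 1
            for i, v in enumerate(vec):
                totals[i] += v
    return [t / count for t in totals]

# Dummy 4-token, 3-dim encoder output with one padded position (mask = 0).
h = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(h, mask)
print(pooled)  # averages only the three unmasked tokens
```

The pooled vector can then feed a small classification head; masking out padding keeps batch results independent of sequence padding length.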
## How to use

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "proxectonos/deberta-xsmall-gl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Use the tokenizer's own mask token (DeBERTa-v3 tokenizers use "[MASK]", not "<mask>").
fill_mask(f"O Parlamento de Galicia aprobou a {tokenizer.mask_token} hoxe.")
```
## Citation

Please reference this model as: **DeBERTa-xsmall-gl (Proxecto Nós Team, 2025)**.