proxectonos/deberta-xsmall-gl

pablo-rf committed on Dec 11, 2025

Commit 698ac70 · verified · 1 parent: 710cbfb

Update README.md

Files changed (1)

README.md +48 -3

@@ -1,3 +1,48 @@
----
-license: apache-2.0
----

+---
+library_name: transformers
+license: mit
+language:
+- gl
+base_model:
+- microsoft/deberta-v3-xsmall
+pipeline_tag: fill-mask
+---
+# DeBERTa-xsmall-gl
+**DeBERTa-xsmall-gl** is a continued pretraining checkpoint based on **microsoft/deberta-v3-xsmall**, adapted to Galician through large-scale masked-language modeling. It is intended as a strong general-purpose encoder for downstream NLP tasks in Galician.
+## Training
+- **Base model:** microsoft/deberta-v3-xsmall
+- **Epochs:** 3
+- **Learning rate:** 6e-4
+- **MLM probability:** 0.15
+- **Max sequence length:** 512
+- **Total batch size:** 1024
+- **Training examples:** 6,139,791
+## Intended uses
+- Masked language modeling (fill-mask)
+- Encoder for classification, NER, QA, and general Galician NLP tasks
+- Further domain adaptation via fine-tuning
+## How to use
+```python
+from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
+model_id = "proxectonos/deberta-xsmall-gl"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForMaskedLM.from_pretrained(model_id)
+fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
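+# Build the prompt from the tokenizer's own mask token: DeBERTa-v3 tokenizers
+# normally use [MASK], and the pipeline raises an error if no mask token is found.
+fill_mask(f"O Parlamento de Galicia aprobou a {tokenizer.mask_token} hoxe.")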
+```
+## Citation
+Please cite this model as: **DeBERTa-xsmall-gl (Proxecto Nós Team, 2025)**.
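
As a rough sanity check on the Training section in this diff, the listed figures can be turned into an approximate optimizer step count. This is only a sketch: the README does not say whether the last partial batch of each epoch was dropped or whether one optimizer step was taken per total batch, so both are assumptions here. The token figure is an upper bound that assumes every example fills the full 512-token window.

```python
import math

# Values copied from the Training section of the README.
training_examples = 6_139_791
total_batch_size = 1024
epochs = 3
max_sequence_length = 512

# Assumption: the last partial batch of each epoch is kept (not dropped),
# and one optimizer step is taken per total batch.
steps_per_epoch = math.ceil(training_examples / total_batch_size)
total_steps = steps_per_epoch * epochs

# Upper bound only: assumes every example is exactly max_sequence_length tokens.
max_tokens_per_epoch = training_examples * max_sequence_length

print(steps_per_epoch)       # 5996
print(total_steps)           # 17988
print(max_tokens_per_epoch)  # 3143572992
```

Under these assumptions the run comes to roughly 18k optimizer steps, which may help when choosing warmup length or a checkpoint interval for a comparable continued-pretraining run.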