---
library_name: transformers
license: mit
language:
- gl
base_model:
- microsoft/mdeberta-v3-base
pipeline_tag: fill-mask
---
# mDeBERTa-gl
**mDeBERTa-gl** is a continued pretraining checkpoint based on [**microsoft/mdeberta-v3-base**](https://huggingface.co/microsoft/mdeberta-v3-base), adapted to Galician through large-scale masked-language modeling. It is intended as a strong general-purpose encoder for downstream NLP tasks in Galician.
## Training
- **Base model:** microsoft/mdeberta-v3-base
- **Epochs:** 3
- **Learning rate:** 6e-4
- **MLM probability:** 0.15
- **Max sequence length:** 512
- **Total batch size:** 1024
- **Training examples:** 10,335,227
- **Mask token:** `[MASK]`
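
The exact training script is not part of this card, but a continued-pretraining run with the hyperparameters above could be set up along the following lines with the `transformers` `Trainer`. The toy corpus, output directory, and per-device batch settings below are placeholders, not the actual configuration.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForMaskedLM.from_pretrained(base_id)

# Toy corpus standing in for the 10.3M-example Galician training set
corpus = Dataset.from_dict({"text": ["O Parlamento de Galicia aprobou a lei hoxe."]})
tokenized = corpus.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

# Dynamic masking with the MLM probability listed above
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Epochs and learning rate from the list above; the per-device batch size
# and accumulation steps are placeholders that should multiply out
# (across devices) to the total batch size of 1024.
args = TrainingArguments(
    output_dir="mdeberta-gl",
    num_train_epochs=3,
    learning_rate=6e-4,
    per_device_train_batch_size=32,   # placeholder
    gradient_accumulation_steps=32,   # placeholder
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```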
## Intended uses
- Masked language modeling (fill-mask)
- Encoder for classification, NER, QA, and general Galician NLP tasks (see the sketch after this list)
- Further domain adaptation via fine-tuning
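
As an illustration of the encoder use case, the checkpoint can be loaded with a task-specific head. Here `num_labels=2` and the example sentence are illustrative only; the randomly initialized classification head must be fine-tuned before it produces meaningful predictions.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "proxectonos/mdeberta-gl"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# num_labels=2 is illustrative: a fresh classification head is placed
# on top of the pretrained Galician encoder
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("Unha frase de exemplo en galego.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, 2): one logit per (illustrative) label
```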
## How to use
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "proxectonos/mdeberta-gl"

# Load the tokenizer and the masked-language-modeling head
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Predict the most likely tokens for the [MASK] position
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
fill_mask("O Parlamento de Galicia aprobou a [MASK] hoxe.")
```
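Each call returns the top candidate fills as a list of dictionaries containing a `score`, the predicted `token_str`, and the completed `sequence`.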
## Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, with funding from the European Union through NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.
## Citation
Please reference this model as: **mdeberta-gl (Proxecto Nós Team, 2025)**.