MrBERT-nos-gl-POS: Part-of-Speech Tagging for Galician
MrBERT-nos-gl-POS is a fine-tuned version of MrBERT-nos-gl for morphosyntactic part-of-speech (POS) tagging in Galician. It was developed as part of Proxecto Nós, an initiative to build language technology for Galician.
Model Details
| Property | Value |
|---|---|
| Base model | proxectonos/MrBERT-nos-gl |
| Task | Token classification (POS tagging) |
| Language | Galician (gl) |
| License | Apache 2.0 |
| Tagset | EAGLES (FreeLing/CTAG-compatible morphosyntactic tags) |
Tagset
The model uses the EAGLES morphosyntactic tagset, which is widely used for Iberian Romance languages. Tags are positional strings in which each character encodes a grammatical attribute. The main categories are:
| Code | Category | Example tag | Decoded |
|---|---|---|---|
| A | Adjective | AQ0MS0 | Qualitative, masculine, singular |
| C | Conjunction | CC / CS | Coordinating / Subordinating |
| D | Determiner | DA0MS0 | Article, masculine, singular |
| F | Punctuation | Fp / Fc | Period / Comma |
| I | Interjection | I | Interjection |
| N | Noun | NCMS0 | Common, masculine, singular |
| P | Pronoun | PP3MS | Personal, 3rd person, masc., sing. |
| R | Adverb | RG / RN | General / Negative |
| S | Adposition | SPS00 | Preposition, simple |
| V | Verb | VMIP3S0 | Main, indicative, present, 3rd sing. |
| W | Date/Time | W | Temporal expression |
| Z | Numeral | Z | Number or quantity |
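Reading a tag (example from the output):
VIS3S00 = V (verb) + I (indicative) + S (past/preterite) + 3 (3rd person) + S (singular) + 00 (unspecified gender/unused). This output tag has no verb-type character (such as the M in VMIP3S0), so the mood follows V directly.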
Tags use 0 to mark attributes that are not applicable or unspecified for a given form.
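As a rough illustration of reading a tag, the sketch below maps a tag's first character to its main category, using only the codes listed in the table above. The names `CATEGORIES` and `tag_category` are hypothetical helpers, not part of the model or of `transformers`. Attributes after the first character are deliberately left undecoded, because their positional layout is not fully specified here.

```python
# Category letters copied from the tagset table above.
CATEGORIES = {
    "A": "Adjective",
    "C": "Conjunction",
    "D": "Determiner",
    "F": "Punctuation",
    "I": "Interjection",
    "N": "Noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Adposition",
    "V": "Verb",
    "W": "Date/Time",
    "Z": "Numeral",
}


def tag_category(tag: str) -> str:
    """Return the main category of a tag, or "Unknown" if its first letter is not in the table."""
    if not tag:
        return "Unknown"
    return CATEGORIES.get(tag[0], "Unknown")


# Print the category for a few tags that appear in this document.
for tag in ["NCMS0", "VIS3S00", "Fp", "GMS"]:
    print(f"{tag:<10} {tag_category(tag)}")
```

A tag whose first letter is not listed in the table, such as GMS in the example output, is reported as Unknown rather than guessed.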
Usage
Installation
pip install transformers torch
Quick start
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("proxectonos/MrBERT-nos-gl-POS")
model = AutoModelForTokenClassification.from_pretrained("proxectonos/MrBERT-nos-gl-POS")

# "simple" aggregation groups adjacent tokens and reports each group's tag as `entity_group`.
pos_tagger = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "O gato negro durmiu tranquilamente sobre o sofá vermello."
results = pos_tagger(text)

# Print each word, its predicted tag, and the model's confidence.
for result in results:
    print(
        f"{result['word']:<20} [{result['entity_group']:<10}] {result['score']*100:.1f}%"
    )
Example output
Enter text for POS tagging: O gato negro durmiu tranquilamente sobre o sofá vermello.
O                    [GMS       ] 99.4%
gato                 [NCMS0     ] 99.9%
negro                [A0MS      ] 88.7%
durmiu               [VIS3S00   ] 95.3%
tranquilamente       [R0        ] 99.8%
sobre                [S         ] 100.0%
o                    [GMS       ] 99.7%
sofá                 [NCMS0     ] 92.1%
vermello             [A0MS      ] 85.5%
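Because every prediction carries a score, one simple post-processing step is to flag low-confidence tags for manual review. The following is a minimal sketch, assuming the list-of-dicts shape with `word`, `entity_group`, and `score` keys used in the quick start. The helper `flag_uncertain` and the 0.90 threshold are illustrative choices, not part of the model. The sample list copies values from the example output so the snippet runs without downloading the model.

```python
from typing import Dict, List, Tuple


def flag_uncertain(results: List[Dict], threshold: float = 0.90) -> List[Tuple[str, str, float]]:
    """Return (word, tag, score) for every prediction scoring below `threshold`."""
    return [
        (r["word"], r["entity_group"], float(r["score"]))
        for r in results
        if r["score"] < threshold
    ]


# Sample values copied from the example output above (scores as fractions).
sample = [
    {"word": "gato", "entity_group": "NCMS0", "score": 0.999},
    {"word": "negro", "entity_group": "A0MS", "score": 0.887},
    {"word": "sofá", "entity_group": "NCMS0", "score": 0.921},
    {"word": "vermello", "entity_group": "A0MS", "score": 0.855},
]

# List the predictions that fall below the threshold.
for word, tag, score in flag_uncertain(sample):
    print(f"{word:<20} [{tag:<10}] {score*100:.1f}%")
```

With the real pipeline, pass `pos_tagger(text)` in place of `sample`.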
Interactive CLI (optional)
For interactive exploration from the command line:
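# Reuses the `pos_tagger` pipeline created in the quick start.
while True:
    text = input("Enter text for POS tagging: ").strip()
    # Type quit, exit, or q to leave the loop.
    if text.lower() in ["quit", "exit", "q"]:
        break
    results = pos_tagger(text)
    for r in results:
        # Draw a confidence bar up to 20 blocks wide.
        bar = "█" * int(r['score'] * 20)
        print(f" • {r['word']:<20} [{r['entity_group']:<10}] {r['score']*100:5.1f}% {bar}")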
Acknowledgements
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA.
Citation
@misc{proxectenos2026MrBERT-nos-gl-pos,
author = {{Proxecto Nós}},
title = {{MrBERT-nos-gl-POS}: Part-of-Speech Tagging for Galician},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-POS}},
}