MrBERT-nos-gl-NER: Named Entity Recognition for Galician
Fine-tuned version of MrBERT-nos-gl for named entity recognition (NER) in Galician, covering four entity types. Developed as part of Proxecto Nós, an initiative to build language technology for the Galician language.
Model Details
| Property | Value |
|---|---|
| Base model | proxectonos/MrBERT-nos-gl |
| Task | Token classification (NER) |
| Language | Galician (gl) |
| License | Apache 2.0 |
| Tagging scheme | BIO (enamex standard) |
Entity Types
The model recognises four entity categories following the enamex standard notation, in BIO format:
| Label | Description | Example |
|---|---|---|
| PER | Person names | María Soliña |
| ORG | Organizations | Xunta de Galicia |
| LOC | Locations | Cangas do Morrazo |
| MISC | Other named entities | Copa do Mundo |
In the raw output each entity label carries a B- (beginning) or I- (inside) prefix, and tokens outside any entity are tagged O. Setting aggregation_strategy="simple" in the pipeline merges consecutive tagged tokens into the entity groups listed above.
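To make the BIO convention concrete, the sketch below merges word-level BIO tags into entity spans. It illustrates the idea only and is not the pipeline's internal implementation: the `merge_bio` helper is hypothetical, the tags are hand-written rather than model output, and it assumes whole-word tokens, whereas the real pipeline works on subword tokens.

```python
from typing import List, Optional, Tuple


def merge_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; assumes one tag per whole word.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far, if any.
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity and drop this token (a simplification).
            flush()
            current_tokens, current_type = [], None

    flush()
    return entities


# Hand-written tags for illustration, not actual model predictions.
tokens = ["María", "Soliña", "viviu", "en", "Cangas", "do", "Morrazo"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "I-LOC"]
print(merge_bio(tokens, tags))
# [('María Soliña', 'PER'), ('Cangas do Morrazo', 'LOC')]
```

The prefixes disappear in the merged result, which is why the aggregated pipeline output reports plain PER, ORG, LOC and MISC groups.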
Training Data
Fine-tuned on proxectonos/Galician_NER, a dataset combining four manually annotated Galician NER corpora (corNER, LREC, PUD, TreeGal), all drawn from journalistic Galician text.
Usage
Installation
pip install transformers torch
Quick start
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("proxectonos/MrBERT-nos-gl-NER")
model = AutoModelForTokenClassification.from_pretrained("proxectonos/MrBERT-nos-gl-NER")

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "María Soliña viviu en Cangas do Morrazo no século XVII."
results = ner(text)

for entity in results:
    print(
        f"{entity['word']:<20} [{entity['entity_group']:<10}] {entity['score']*100:.1f}%"
    )
Example output
María Soliña         [PER       ] 95.1%
Cangas do Morrazo    [LOC       ] 88.3%
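With an aggregation strategy set, each pipeline result is a dictionary containing entity_group, score, word, and the character offsets start and end. The sketch below shows one way to post-process such results: group the entities by type and drop low-confidence ones. The `group_by_type` and `make_entity` helpers and the `min_score` threshold are hypothetical, and the sample results are hand-written to mirror the example output above so that the snippet runs without downloading the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_type(text: str, results: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group entity surface forms by entity type, keeping scores >= min_score.

    Surface forms are sliced from the original text via the character
    offsets, which preserves the exact spelling and spacing of the input.
    """
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "María Soliña viviu en Cangas do Morrazo no século XVII."


def make_entity(surface: str, label: str, score: float) -> dict:
    """Build a result dict shaped like aggregated pipeline output (illustrative)."""
    start = text.find(surface)
    return {
        "entity_group": label,
        "score": score,
        "word": surface,
        "start": start,
        "end": start + len(surface),
    }


# Hand-written stand-ins for ner(text), using the scores from the example output.
sample_results = [
    make_entity("María Soliña", "PER", 0.951),
    make_entity("Cangas do Morrazo", "LOC", 0.883),
]

print(group_by_type(text, sample_results))
# {'PER': ['María Soliña'], 'LOC': ['Cangas do Morrazo']}
print(group_by_type(text, sample_results, min_score=0.9))
# {'PER': ['María Soliña']}
```

In real use, `sample_results` would be replaced by the list returned from `ner(text)`; a suitable threshold depends on the application and is not something the model card specifies.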
Interactive CLI (optional)
For interactive exploration from the command line:
while True:
    text = input("Enter text for NER: ").strip()
    if text.lower() in ["quit", "exit", "q"]:
        break
    for e in ner(text):
        bar = "█" * int(e['score'] * 20)
        print(f" • {e['word']:<20} [{e['entity_group']:<10}] {e['score']*100:5.1f}% {bar}")
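The line formatting used in the loop above can be factored into a small function, which makes it easy to check without loading the model. This is a minimal sketch: `format_entity` and its `bar_width` parameter are hypothetical names, and the sample entity is hand-written rather than model output.

```python
def format_entity(entity: dict, bar_width: int = 20) -> str:
    """Format one aggregated entity as a line with a confidence bar."""
    # Clamp the score so the bar length always stays within [0, bar_width].
    score = min(max(entity["score"], 0.0), 1.0)
    bar = "█" * int(score * bar_width)
    return f"{entity['word']:<20} [{entity['entity_group']:<10}] {score * 100:5.1f}% {bar}"


# Hand-written sample shaped like one item of the pipeline output.
sample = {"word": "María Soliña", "entity_group": "PER", "score": 0.951}
line = format_entity(sample)
print(line)
```

Inside the interactive loop, `print(" • " + format_entity(e))` would then produce the same layout as the inline f-string.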
Acknowledgements
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA. (This publication of the Desarrollo de Modelos ALIA project is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union, NextGenerationEU.)
Citation
@misc{proxectonos2026MrBERT-nos-gl-ner,
  author = {{Proxecto Nós}},
  title = {{MrBERT-nos-gl-NER}: Named Entity Recognition for Galician},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-NER}},
}