MrBERT-nos-gl-NER: Named Entity Recognition for Galician

Fine-tuned version of MrBERT-nos-gl for named entity recognition (NER) in Galician, covering four entity types. Developed as part of Proxecto Nós, an initiative to build language technology for the Galician language.

Model Details

Property	Value
Base model	`proxectonos/MrBERT-nos-gl`
Task	Token classification (NER)
Language	Galician (`gl`)
License	Apache 2.0
Tagging scheme	BIO (enamex standard)

Entity Types

The model recognises four entity categories following the enamex standard notation, in BIO format:

Label	Description	Example
`PER`	Person names	María Soliña
`ORG`	Organizations	Xunta de Galicia
`LOC`	Locations	Cangas do Morrazo
`MISC`	Other named entities	Copa do Mundo

Each label appears with a `B-` (beginning) or `I-` (inside) prefix in the raw output; the `aggregation_strategy="simple"` pipeline setting merges these into the entity group labels above.
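
To make the merging step concrete, here is a minimal pure-Python sketch of how BIO tags collapse into entity groups. It is not the `transformers` implementation, which works on subword tokens and also aggregates scores; the function name `merge_bio` and the word-level tokens joined by spaces are assumptions made for illustration only.

```python
def merge_bio(tokens, tags):
    """Collapse parallel lists of word tokens and BIO tags into (text, label) pairs."""
    entities = []
    words, label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            words.append(token)  # continue the entity that is currently open
            continue
        if words:  # any other tag closes the open entity
            entities.append((" ".join(words), label))
        if prefix in ("B", "I"):  # start a new entity (a stray I- is treated as a start)
            words, label = [token], tag_label
        else:  # "O": the token is outside any entity
            words, label = [], None
    if words:  # flush an entity that runs to the end of the sentence
        entities.append((" ".join(words), label))
    return entities


tokens = ["María", "Soliña", "viviu", "en", "Cangas", "do", "Morrazo"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "I-LOC"]
print(merge_bio(tokens, tags))
# [('María Soliña', 'PER'), ('Cangas do Morrazo', 'LOC')]
```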

Training Data

Fine-tuned on proxectonos/Galician_NER, a dataset combining four manually annotated Galician NER corpora (corNER, LREC, PUD, TreeGal), all drawn from journalistic Galician text.

Usage

Installation

pip install transformers torch

Quick start

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("proxectonos/MrBERT-nos-gl-NER")
model = AutoModelForTokenClassification.from_pretrained("proxectonos/MrBERT-nos-gl-NER")

# aggregation_strategy="simple" merges B-/I- tokens into whole entities.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "María Soliña viviu en Cangas do Morrazo no século XVII."
results = ner(text)
# Print each entity with its type and confidence score.
for entity in results:
    print(
        f"{entity['word']:<20} [{entity['entity_group']:<10}] {entity['score']*100:5.1f}%"
    )

Example output

María Soliña         [PER       ]  95.1%
Cangas do Morrazo    [LOC       ]  88.3%
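
The pipeline returns a list of plain dictionaries, so results are easy to post-process. The sketch below groups entities by type and drops low-confidence predictions. The helper name `group_by_type`, the 0.9 threshold, and the hard-coded `sample` list (which mirrors the example output above and omits the `start`/`end` offsets the real pipeline also returns) are illustrative assumptions, not part of the model's API.

```python
from collections import defaultdict


def group_by_type(results, min_score=0.0):
    """Group entity strings by entity type, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hypothetical stand-in for `ner(text)`, using the scores from the example output.
sample = [
    {"entity_group": "PER", "word": "María Soliña", "score": 0.951},
    {"entity_group": "LOC", "word": "Cangas do Morrazo", "score": 0.883},
]

print(group_by_type(sample))
# {'PER': ['María Soliña'], 'LOC': ['Cangas do Morrazo']}
print(group_by_type(sample, min_score=0.9))
# {'PER': ['María Soliña']}
```

A suitable threshold depends on the application; the value here is only a placeholder.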

Interactive CLI (optional)

For interactive exploration from the command line:

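# Reuses the `ner` pipeline from the quick start; type "quit", "exit" or "q" to stop.
while True:
    text = input("Enter text for NER: ").strip()
    if text.lower() in {"quit", "exit", "q"}:
        break
    if not text:
        continue  # skip empty input
    for e in ner(text):
        # Draw a confidence bar of up to 20 blocks.
        bar = "█" * int(e["score"] * 20)
        print(f"  • {e['word']:<20} [{e['entity_group']:<10}] {e['score']*100:5.1f}%  {bar}")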

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública – Funded by EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA. (This publication of the project Desarrollo de Modelos ALIA is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia – Funded by the European Union – NextGenerationEU.)

Citation

@misc{proxectonos2026MrBERT-nos-gl-ner,
  author       = {{Proxecto Nós}},
  title        = {{MrBERT-nos-gl-NER}: Named Entity Recognition for Galician},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-NER}},
}