MrBERT-nos-gl-NER: Named Entity Recognition for Galician

Fine-tuned version of MrBERT-nos-gl for named entity recognition (NER) in Galician, covering four entity types. Developed as part of Proxecto Nós, an initiative to build language technology for the Galician language.

Model Details

| Property | Value |
|---|---|
| Base model | proxectonos/MrBERT-nos-gl |
| Task | Token classification (NER) |
| Language | Galician (gl) |
| License | Apache 2.0 |
| Tagging scheme | BIO (enamex standard) |

Entity Types

The model recognises four entity categories following the enamex standard notation, in BIO format:

| Label | Description | Example |
|---|---|---|
| PER | Person names | María Soliña |
| ORG | Organizations | Xunta de Galicia |
| LOC | Locations | Cangas do Morrazo |
| MISC | Other named entities | Copa do Mundo |

In the raw output, each label carries a B- (beginning) or I- (inside) prefix, and tokens outside any entity are tagged O, as usual in the BIO scheme. Setting aggregation_strategy="simple" in the pipeline merges these token-level tags into the entity groups listed above.
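To make the B-/I- merging concrete, here is a minimal word-level sketch of how BIO tags collapse into entity spans. It is an illustration only, not the pipeline's actual implementation, which works on subword tokens and also combines scores. The function name `merge_bio` and the hand-written tag sequence are assumptions made for this example.

```python
def merge_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    spans = []               # finished (text, label) pairs
    words, label = [], None  # entity currently being built
    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")  # "B-PER" -> ("B", "-", "PER")
        if prefix == "I" and name == label:
            words.append(token)               # continue the open entity
            continue
        if words:                             # close the open entity, if any
            spans.append((" ".join(words), label))
        if prefix in ("B", "I"):              # a stray I- leniently starts a new entity
            words, label = [token], name
        else:                                 # "O": outside any entity
            words, label = [], None
    if words:
        spans.append((" ".join(words), label))
    return spans


# Hand-written tags for illustration; these are not real model output.
tokens = ["María", "Soliña", "viviu", "en", "Cangas", "do", "Morrazo"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "I-LOC"]
print(merge_bio(tokens, tags))
# [('María Soliña', 'PER'), ('Cangas do Morrazo', 'LOC')]
```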

Training Data

Fine-tuned on proxectonos/Galician_NER, a dataset combining four manually annotated Galician NER corpora (corNER, LREC, PUD, TreeGal), all drawn from journalistic Galician text.

Usage

Installation

pip install transformers torch

Quick start

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("proxectonos/MrBERT-nos-gl-NER")
model = AutoModelForTokenClassification.from_pretrained("proxectonos/MrBERT-nos-gl-NER")

# aggregation_strategy="simple" merges B-/I- token predictions into whole entities.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "María Soliña viviu en Cangas do Morrazo no século XVII."
results = ner(text)

# Print each entity with its type and confidence score as a percentage.
for entity in results:
    print(
        f"{entity['word']:<20} [{entity['entity_group']:<10}] {entity['score']*100:5.1f}%"
    )

Example output

María Soliña         [PER       ]  95.1%
Cangas do Morrazo    [LOC       ]  88.3%
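The aggregated pipeline output is a list of dictionaries with `entity_group`, `score`, `word`, `start` and `end` keys, so it can be post-processed with plain Python. The sketch below keeps only confident predictions and groups them by entity type. The helper name `entities_by_type` and the thresholds are hypothetical choices for illustration, and `sample_results` is a hand-written stand-in that mirrors the example output above rather than a fresh model run.

```python
from collections import defaultdict


def entities_by_type(results, min_score=0.5):
    """Group entity strings by type, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for `results`, mirroring the example output above.
sample_results = [
    {"entity_group": "PER", "score": 0.951, "word": "María Soliña"},
    {"entity_group": "LOC", "score": 0.883, "word": "Cangas do Morrazo"},
]

# An illustrative threshold of 0.9 keeps only the person name.
print(entities_by_type(sample_results, min_score=0.9))
# {'PER': ['María Soliña']}
```

With the real pipeline, the same helper would be called as `entities_by_type(results)`. A suitable threshold depends on the application and should be checked on held-out data.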

Interactive CLI (optional)

For interactive exploration from the command line:

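# Reuses the `ner` pipeline created in the quick start above.
while True:
    try:
        text = input("Enter text for NER: ").strip()
    except (EOFError, KeyboardInterrupt):
        break  # exit cleanly on Ctrl-D / Ctrl-C
    if text.lower() in {"quit", "exit", "q"}:
        break
    if not text:
        continue  # ignore empty input
    for e in ner(text):
        bar = "█" * int(e['score'] * 20)  # confidence bar, up to 20 characters
        print(f"  • {e['word']:<20} [{e['entity_group']:<10}] {e['score']*100:5.1f}%  {bar}")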
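As an alternative to listing entities one per line, they can be shown in context by using the character offsets in the pipeline output. The sketch below assumes each result carries integer `start` and `end` offsets, which requires a fast tokenizer, and that entities do not overlap. The names `annotate` and `fake_entity` and the `[text|LABEL]` markup are hypothetical choices for this example, and the sample entities are built by hand rather than produced by the model.

```python
def annotate(text, entities):
    """Return `text` with each entity wrapped as [surface|LABEL]."""
    pieces = []
    cursor = 0  # position in `text` up to which output has been emitted
    for ent in sorted(entities, key=lambda e: e["start"]):
        pieces.append(text[cursor:ent["start"]])   # untouched text before the entity
        surface = text[ent["start"]:ent["end"]]    # slice the original text, not ent["word"]
        pieces.append(f"[{surface}|{ent['entity_group']}]")
        cursor = ent["end"]
    pieces.append(text[cursor:])                   # trailing text after the last entity
    return "".join(pieces)


text = "María Soliña viviu en Cangas do Morrazo no século XVII."


def fake_entity(surface, label):
    """Build a hand-made entity dict with offsets computed from `text`."""
    start = text.index(surface)
    return {"entity_group": label, "word": surface,
            "start": start, "end": start + len(surface)}


sample = [fake_entity("María Soliña", "PER"), fake_entity("Cangas do Morrazo", "LOC")]
print(annotate(text, sample))
# [María Soliña|PER] viviu en [Cangas do Morrazo|LOC] no século XVII.
```

Slicing the original text by offset is used here because the `word` field is decoded from tokens and may not match the input character for character.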

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA. (This publication of the Desarrollo de Modelos ALIA project is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union, NextGenerationEU.)

Citation

@misc{proxectonos2026MrBERT-nos-gl-ner,
  author       = {{Proxecto Nós}},
  title        = {{MrBERT-nos-gl-NER}: Named Entity Recognition for Galician},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-NER}},
}