Danish NER XLM-RoBERTa-base

Named Entity Recognition model for Danish text, fine-tuned from XLM-RoBERTa-base.

Model Description

This model detects three types of named entities in Danish text:

  • PERSON (PER): Person names
  • ORGANIZATION (ORG): Company and organization names
  • LOCATION (LOC): Places, addresses, countries

Performance

Evaluated on the DaNE test set (565 sentences):

| Metric    | Score |
|-----------|-------|
| F1        | 84.6% |
| Precision | 84.8% |
| Recall    | 84.4% |

Per-Entity Performance

| Entity       | F1    | Precision | Recall |
|--------------|-------|-----------|--------|
| PERSON       | 94.0% | 93.5%     | 94.5%  |
| ORGANIZATION | 75.2% | 76.1%     | 74.3%  |
| LOCATION     | 82.2% | 82.8%     | 81.6%  |
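
The F1 scores reported here are the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes F1 from the reported precision and recall values. The helper name `f1_score` is illustrative only and is not part of this model or of any library it uses.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the reported results, in percent.
reported = {
    "Overall": (84.8, 84.4),
    "PERSON": (93.5, 94.5),
    "ORGANIZATION": (76.1, 74.3),
    "LOCATION": (82.8, 81.6),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```

Rounded to one decimal, the recomputed values agree with the reported F1 scores (84.6%, 94.0%, 75.2%, 82.2%).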

Comparison with Other Models

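| Model              | F1    | Notes                       |
|--------------------|-------|-----------------------------|
| This model         | 84.6% | Best accuracy/speed balance |
| XLM-R-large (ours) | 86.5% | Higher accuracy, 2x slower  |
| ScandiNER          | 78.7% | Lower precision (71%)       |
| DaCy-large         | ~83%  | spaCy-based                 |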

Usage

from transformers import pipeline

# Load model
ner = pipeline("ner", model="thomasbeste/danish-ner-xlmr-base", aggregation_strategy="simple")

# Detect entities
text = "Anders Jensen arbejder hos Novo Nordisk i København."
entities = ner(text)

for entity in entities:
    # One decimal place, matching the sample output shown for this example
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.1%})")

Output:

PERSON: Anders Jensen (99.8%)
ORGANIZATION: Novo Nordisk (98.9%)
LOCATION: København (99.9%)

Training

  • Base model: xlm-roberta-base (277M parameters)
  • Training data: DaNE + WikiANN + synthetic examples
  • Epochs: 3
  • Hardware: NVIDIA RTX 3090

Intended Use

This model is designed for:

  • PII (Personally Identifiable Information) detection in Danish documents
  • Named entity extraction for Danish NLP pipelines
  • Legal document analysis
  • Customer data anonymization
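
For the PII detection and anonymization use cases, one possible approach is to replace each detected span with a placeholder. The sketch below assumes the aggregated pipeline output carries character offsets in `start` and `end` alongside `entity_group`, which the Transformers token-classification pipeline provides when the tokenizer supports offsets. The `redact` helper and the placeholder format are hypothetical, and the entity list is a hand-written stand-in for real model output so the example runs without downloading the model.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a placeholder such as [PERSON]."""
    # Process spans right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


text = "Anders Jensen arbejder hos Novo Nordisk i København."

# Stand-in for ner(text): labels and spans are written by hand (assumption),
# with offsets computed from the text itself.
spans = [
    ("PERSON", "Anders Jensen"),
    ("ORGANIZATION", "Novo Nordisk"),
    ("LOCATION", "København"),
]
entities = [
    {"entity_group": label, "start": text.index(s), "end": text.index(s) + len(s)}
    for label, s in spans
]

print(redact(text, entities))
# [PERSON] arbejder hos [ORGANIZATION] i [LOCATION].
```

In real use, `entities` would come from the pipeline call; any entity the model misses stays in the text, so this is not a guarantee of complete anonymization.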

Limitations

  • Optimized for Danish; it may work on Swedish or Norwegian, but this has not been evaluated
  • ORGANIZATION is the weakest entity type (75.2% F1); organizations with ambiguous names are harder to detect
  • Very short texts may have lower accuracy
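
One way to work around these limitations is to discard low-confidence predictions using the `score` field that the pipeline returns for each entity. The sketch below is an assumption rather than a recommendation from the model author: the `keep_confident` helper, the 0.90 threshold, and the sample scores are all hypothetical, and a suitable threshold should be tuned on your own data.

```python
from typing import Dict, List


def keep_confident(entities: List[Dict], min_score: float = 0.90) -> List[Dict]:
    """Keep only entities whose confidence score meets the threshold."""
    return [e for e in entities if e["score"] >= min_score]


# Hand-written stand-in for pipeline output; the scores are made up for illustration.
entities = [
    {"entity_group": "PERSON", "word": "Anders Jensen", "score": 0.998},
    {"entity_group": "ORGANIZATION", "word": "Novo", "score": 0.62},
]

for entity in keep_confident(entities):
    print(f"{entity['entity_group']}: {entity['word']}")
```

A higher threshold trades recall for precision, which may be the wrong trade-off for PII detection, where a missed entity is usually costlier than a false positive.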

License

MIT License - free for commercial and non-commercial use.

Citation

@misc{danish-ner-xlmr-base,
  author = {Beste},
  title = {Danish NER XLM-RoBERTa-base},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/thomasbeste/danish-ner-xlmr-base}
}