Danish NER XLM-RoBERTa-base

Named Entity Recognition model for Danish text, fine-tuned from XLM-RoBERTa-base.

Model Description

This model detects three types of named entities in Danish text:

PERSON (PER): Person names
ORGANIZATION (ORG): Company and organization names
LOCATION (LOC): Places, addresses, countries

Performance

Evaluated on the DaNE test set (565 sentences):

Metric	Score
F1	84.6%
Precision	84.8%
Recall	84.4%

Per-Entity Performance

Entity	F1	Precision	Recall
PERSON	94.0%	93.5%	94.5%
ORGANIZATION	75.2%	76.1%	74.3%
LOCATION	82.2%	82.8%	81.6%
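
The F1 values in both tables can be cross-checked against the reported precision and recall. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the helper name `f1_score` and the dictionary layout are illustrative and not part of the model's evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the tables above
reported = {
    "overall": (84.8, 84.4),
    "PERSON": (93.5, 94.5),
    "ORGANIZATION": (76.1, 74.3),
    "LOCATION": (82.8, 81.6),
}

# Recompute F1 for each row and round to one decimal, as in the tables
recomputed = {name: round(f1_score(p, r), 1) for name, (p, r) in reported.items()}

for name, f1 in recomputed.items():
    print(f"{name}: F1 = {f1}%")
```

Under that assumption, the recomputed values agree with the reported F1 column to one decimal place.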

Comparison with Other Models

Model	F1	Notes
This model	84.6%	Best accuracy/speed balance
XLM-R-large (ours)	86.5%	Higher accuracy, 2x slower
ScandiNER	78.7%	Lower precision (71%)
DaCy-large	~83%	spaCy-based

Usage

from transformers import pipeline

# Load model
ner = pipeline("ner", model="thomasbeste/danish-ner-xlmr-base", aggregation_strategy="simple")

# Detect entities
text = "Anders Jensen arbejder hos Novo Nordisk i København."
entities = ner(text)

for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.1%})")

Output:

PERSON: Anders Jensen (99.8%)
ORGANIZATION: Novo Nordisk (98.9%)
LOCATION: København (99.9%)
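
A common next step is to keep only confident predictions and group them by entity type. The following is a minimal sketch, assuming the aggregated pipeline output is a list of dictionaries with `entity_group`, `word`, and `score` keys. The helper `group_entities`, the `min_score` threshold, and the hand-written `sample` list are hypothetical and only stand in for a real model call.

```python
from collections import defaultdict

def group_entities(entities, min_score=0.9):
    """Group entity surface forms by type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Hand-written stand-in that mirrors the example output above
sample = [
    {"entity_group": "PERSON", "word": "Anders Jensen", "score": 0.998},
    {"entity_group": "ORGANIZATION", "word": "Novo Nordisk", "score": 0.989},
    {"entity_group": "LOCATION", "word": "København", "score": 0.999},
]

grouped = group_entities(sample, min_score=0.99)
print(grouped)
```

In practice, `sample` would be replaced by the list returned by `ner(text)`, and the threshold would need tuning on representative data.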

Training

Base model: xlm-roberta-base (277M parameters)
Training data: DaNE + WikiANN + synthetic examples
Epochs: 3
Hardware: NVIDIA RTX 3090

Intended Use

This model is designed for:

PII (Personally Identifiable Information) detection in Danish documents
Named entity extraction for Danish NLP pipelines
Legal document analysis
Customer data anonymization
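
For the anonymization use cases above, detected spans can be replaced with placeholder labels. This is a rough sketch rather than a complete anonymization tool: it assumes each entity dictionary carries the `start` and `end` character offsets that the Transformers NER pipeline returns alongside `entity_group`. The helpers `redact` and `span` are hypothetical, and the entities are built by hand here so the example runs without downloading the model.

```python
def redact(text, entities):
    """Replace each detected span with a bracketed label, e.g. [PERSON]."""
    # Work right to left so earlier offsets stay valid after each replacement
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

text = "Anders Jensen arbejder hos Novo Nordisk i København."

def span(label, surface):
    """Build a stand-in entity dict by locating `surface` in `text`."""
    start = text.index(surface)
    return {"entity_group": label, "start": start, "end": start + len(surface)}

# Stand-in for ner(text); a real call would also include `word` and `score`
entities = [
    span("PERSON", "Anders Jensen"),
    span("ORGANIZATION", "Novo Nordisk"),
    span("LOCATION", "København"),
]

redacted = redact(text, entities)
print(redacted)  # [PERSON] arbejder hos [ORGANIZATION] i [LOCATION].
```

Because recall is below 100%, some entities will be missed, so output like this should be reviewed before it is treated as anonymized.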

Limitations

Optimized for Danish; it may work on Swedish or Norwegian text, but it has not been evaluated on those languages
Organizations are the weakest category (75.2% F1); ambiguous organization names are especially hard to detect
Very short texts may have lower accuracy

License

MIT License - free for commercial and non-commercial use.

Citation

@misc{danish-ner-xlmr-base,
  author = {Beste},
  title = {Danish NER XLM-RoBERTa-base},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/thomasbeste/danish-ner-xlmr-base}
}
