Danish NER XLM-RoBERTa-base
Named Entity Recognition model for Danish text, fine-tuned from XLM-RoBERTa-base.
Model Description
This model detects three types of named entities in Danish text:
- PERSON (PER): Person names
- ORGANIZATION (ORG): Company and organization names
- LOCATION (LOC): Places, addresses, countries
Performance
Evaluated on the DaNE test set (565 sentences):
| Metric | Score |
|---|---|
| F1 | 84.6% |
| Precision | 84.8% |
| Recall | 84.4% |
Per-Entity Performance
| Entity | F1 | Precision | Recall |
|---|---|---|---|
| PERSON | 94.0% | 93.5% | 94.5% |
| ORGANIZATION | 75.2% | 76.1% | 74.3% |
| LOCATION | 82.2% | 82.8% | 81.6% |
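The F1 scores reported above are consistent with the usual definition of F1 as the harmonic mean of precision and recall. A minimal sketch that recomputes them from the reported precision and recall values; the helper name `f1_score` is illustrative and not part of this model or its evaluation code:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the tables above, in percent.
reported = {
    "Overall": (84.8, 84.4),
    "PERSON": (93.5, 94.5),
    "ORGANIZATION": (76.1, 74.3),
    "LOCATION": (82.8, 81.6),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```

Rounded to one decimal, the recomputed values match the F1 column in both tables.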
Comparison with Other Models
| Model | F1 | Notes |
|---|---|---|
| This model | 84.6% | Best accuracy/speed balance |
| XLM-R-large (ours) | 86.5% | Higher accuracy, 2x slower |
| ScandiNER | 78.7% | Lower precision (71%) |
| DaCy-large | ~83% | spaCy-based |
Usage
from transformers import pipeline
# Load model
ner = pipeline("ner", model="thomasbeste/danish-ner-xlmr-base", aggregation_strategy="simple")
# Detect entities
text = "Anders Jensen arbejder hos Novo Nordisk i København."
entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.2%})")
Output:
PERSON: Anders Jensen (99.8%)
ORGANIZATION: Novo Nordisk (98.9%)
LOCATION: København (99.9%)
Training
- Base model: xlm-roberta-base (277M parameters)
- Training data: DaNE + WikiANN + synthetic examples
- Epochs: 3
- Hardware: NVIDIA RTX 3090
Intended Use
This model is designed for:
- PII (Personally Identifiable Information) detection in Danish documents
- Named entity extraction for Danish NLP pipelines
- Legal document analysis
- Customer data anonymization
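For the PII detection and anonymization use cases, one possible approach is to replace each detected span with a placeholder. The sketch below assumes the entity dictionaries carry the `start`, `end`, and `entity_group` keys that the `transformers` NER pipeline returns when an aggregation strategy is set. The `redact` helper and the `[LABEL]` placeholder format are hypothetical choices for illustration, and the entity list is written by hand so the example runs without downloading the model:

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each entity span with a placeholder such as [PERSON]."""
    # Process spans from right to left so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


text = "Anders Jensen arbejder hos Novo Nordisk i København."

# Hand-written stand-in for `ner(text)`; character offsets are computed
# from the text rather than taken from a real model run.
entities = [
    {"entity_group": "PERSON", "start": 0, "end": 13},
    {"entity_group": "ORGANIZATION", "start": 27, "end": 39},
    {"entity_group": "LOCATION", "start": 42, "end": 51},
]

print(redact(text, entities))
# [PERSON] arbejder hos [ORGANIZATION] i [LOCATION].
```

In practice, `entities` would come from the pipeline call shown in the Usage section. Entities the model misses are left untouched, so this kind of redaction is only as complete as the model's recall.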
Limitations
- Optimized for Danish; it may work on Swedish or Norwegian text, but this has not been evaluated
- Organizations with ambiguous names may be harder to detect
- Very short texts may have lower accuracy
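Given these limitations, one option is to discard low-confidence predictions before using them downstream. This is a sketch under assumptions: the `keep_confident` helper and the 0.90 threshold are hypothetical and have not been tuned for this model, and the sample predictions are invented purely to show the filtering:

```python
from typing import Dict, List


def keep_confident(entities: List[Dict], min_score: float = 0.90) -> List[Dict]:
    """Keep only entities whose pipeline `score` is at least `min_score`."""
    return [ent for ent in entities if ent["score"] >= min_score]


# Invented predictions for illustration only, not real model output.
predictions = [
    {"entity_group": "PERSON", "word": "Anders Jensen", "score": 0.998},
    {"entity_group": "ORGANIZATION", "word": "Netto", "score": 0.62},
]

for ent in keep_confident(predictions):
    print(ent["entity_group"], ent["word"])
```

A higher threshold trades recall for precision, so for anonymization, where missed entities are costly, a lower threshold or no filtering may be the safer choice.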
License
MIT License - free for commercial and non-commercial use.
Citation
@misc{danish-ner-xlmr-base,
author = {Beste},
title = {Danish NER XLM-RoBERTa-base},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/thomasbeste/danish-ner-xlmr-base}
}