# Danish XLM-R NER Large (Two-Stage)
A Danish Named Entity Recognition model based on xlm-roberta-large (560M parameters), fine-tuned in two stages for high-recall PII detection in Danish text.
## Model Description
This model detects three entity types relevant for GDPR-compliant PII processing:
- PER - Person names
- ORG - Organizations (companies, institutions, government bodies)
- LOC - Locations (addresses, cities, countries)
MISC is intentionally excluded to reduce noise and focus on actionable PII entities.
## Training
Two-stage fine-tuning approach:
- Stage 1 (broad NER): DANSK + DaNE + NorNE, 10 epochs, LR 2e-5
- Stage 2 (domain adaptation): DANSK-only, 1 epoch, LR 5e-6
This approach achieves the best balance between multi-domain generalization and Danish-specific performance.
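The two-stage schedule above can be summarized as a small config; this is an illustrative sketch, not the actual training script (only the datasets, epochs, and learning rates come from the list above — everything else about the training setup is unspecified here):

```python
# Sketch of the two fine-tuning stages (values from the list above).
STAGES = [
    {
        "name": "stage1_broad_ner",
        "datasets": ["DANSK", "DaNE", "NorNE"],
        "epochs": 10,
        "learning_rate": 2e-5,
    },
    {
        "name": "stage2_domain_adaptation",
        "datasets": ["DANSK"],
        "epochs": 1,
        "learning_rate": 5e-6,  # gentler LR to avoid forgetting stage 1
    },
]
```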
## Datasets
| Dataset | Role | Size | Domains |
|---|---|---|---|
| DANSK | Primary | 11.7K train | Web, News, Wiki, Legal, Dannet, Conversation, Social Media |
| DaNE | Supplementary | 4.4K train | News |
| NorNE | Stage 1 only | ~20K train | News (Norwegian Bokmål + Nynorsk) |
## Evaluation Results

### DANSK (primary benchmark, multi-domain)
| Split | PER F1 | ORG F1 | LOC F1 | Micro F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| Dev | 88.0 | 85.3 | 90.3 | 87.6 | 86.6 | 88.7 |
| Test | 84.8 | 84.6 | 90.3 | 86.5 | 85.4 | 87.5 |
### DaNE (secondary benchmark, news domain)
| Split | PER F1 | ORG F1 | LOC F1 | Micro F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| Dev | 97.5 | 85.1 | 92.9 | 93.0 | 93.2 | 92.9 |
| Test | 94.2 | 79.7 | 87.8 | 87.7 | 88.1 | 87.3 |
### GPI Legal Documents (independent evaluation, Danish legal domain)
Evaluated on 30 human-corrected documents (contracts, invoices, case briefs, client letters):
| Entity | Precision | Recall | Notes |
|---|---|---|---|
| PER | 0.76 | 1.00 | Perfect recall; FPs are email addresses misclassified as PER |
| ORG | 0.94 | 0.96 | Near-perfect |
| LOC | 0.52 | 0.51 | Strict-span boundary errors (detects the street name but misses the house number); entity detection itself is near-perfect |
LOC score reflects strict span matching. The model consistently detects location entities but predicts shorter spans (e.g., "Gothersgade" instead of "Gothersgade 81"). A post-processing step to extend LOC spans to include adjacent numbers resolves this.
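A minimal sketch of such a post-processing step, assuming pipeline-style entity dicts with `start`/`end` character offsets (the function name is hypothetical):

```python
import re

def extend_loc_spans(text, entities):
    """Extend LOC spans to absorb an immediately following house number,
    e.g. 'Gothersgade' -> 'Gothersgade 81'."""
    for ent in entities:
        if ent["entity_group"] != "LOC":
            continue
        # Match a number (optionally with a letter suffix, e.g. '81B')
        # directly after the predicted span.
        m = re.match(r"\s+\d+[A-Za-z]?", text[ent["end"]:])
        if m:
            ent["end"] += m.end()
            ent["word"] = text[ent["start"]:ent["end"]]
    return entities

text = "Anders bor på Vestergade 42 i København."
ents = [{"entity_group": "LOC", "word": "Vestergade", "start": 14, "end": 24}]
extend_loc_spans(text, ents)  # span now covers "Vestergade 42"
```

More elaborate address grammars (floor/door suffixes like "81, 2. tv.") would need additional patterns, but this covers the common street-plus-number case.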
## Usage

```python
from transformers import pipeline

model_name = "thomasbeste/danish-xlmr-ner-large"
nlp = pipeline("ner", model=model_name, aggregation_strategy="simple")

text = "Anders Jensen fra Danske Bank bor på Vestergade 42 i København."
for ent in nlp(text):
    print(f"{ent['entity_group']}: {ent['word']} (score: {ent['score']:.3f})")
```
## ONNX Deployment
For production use, export to ONNX INT8 for ~3x CPU speedup:
```bash
pip install optimum[onnxruntime]

# Export to ONNX
optimum-cli export onnx --model thomasbeste/danish-xlmr-ner-large ./model-onnx --task token-classification
```

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Quantize to INT8 (dynamic quantization, AVX512-VNNI kernels)
quantizer = ORTQuantizer.from_pretrained("./model-onnx")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="./model-onnx-int8", quantization_config=qconfig)
```
## Label Scheme
IOB2 format with 7 labels:
| ID | Label |
|---|---|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
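The table above corresponds to the following `id2label` mapping (a sketch of what the model's `config.json` would contain):

```python
# IOB2 label mapping from the table above
id2label = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}
label2id = {label: idx for idx, label in id2label.items()}
```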
## Intended Use
Designed for GDPR-compliant PII detection in Danish enterprise document processing pipelines. Optimized for recall over precision — a missed entity (false negative) is a compliance risk, while over-detection (false positive) is safe.
## Limitations
- Optimized for Danish text. May work on other Scandinavian languages (Norwegian, Swedish) but not evaluated.
- LOC boundary detection tends to predict shorter spans than the full address. Post-processing recommended.
- Email addresses are sometimes misclassified as PER. Downstream validation (rejecting names containing `@`) is recommended.
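The downstream validation mentioned above can be a one-line filter over the pipeline output (a sketch; the function name is hypothetical):

```python
def drop_email_false_positives(entities):
    """Discard PER predictions that are actually email addresses."""
    return [e for e in entities
            if not (e["entity_group"] == "PER" and "@" in e["word"])]

ents = [
    {"entity_group": "PER", "word": "Anders Jensen"},
    {"entity_group": "PER", "word": "aj@firma.dk"},  # misclassified email
]
clean = drop_email_false_positives(ents)  # keeps only "Anders Jensen"
```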