Danish XLM-R NER Large (Two-Stage)

A Danish Named Entity Recognition model based on xlm-roberta-large (560M parameters), fine-tuned in two stages for high-recall PII detection in Danish text.

Model Description

This model detects three entity types relevant for GDPR-compliant PII processing:

  • PER - Person names
  • ORG - Organizations (companies, institutions, government bodies)
  • LOC - Locations (addresses, cities, countries)

MISC is intentionally excluded to reduce noise and focus on actionable PII entities.

Training

Two-stage fine-tuning approach:

  1. Stage 1 (broad NER): DANSK + DaNE + NorNE, 10 epochs, LR 2e-5
  2. Stage 2 (domain adaptation): DANSK-only, 1 epoch, LR 5e-6

This approach achieves the best balance between multi-domain generalization and Danish-specific performance.
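The schedule above can be summarized as a small configuration fragment (illustrative shorthand for this card, not the actual training script):

```python
# Illustrative summary of the two-stage schedule; dict keys and values
# mirror the description above, not real training code.
STAGES = [
    {"stage": 1, "datasets": ["DANSK", "DaNE", "NorNE"], "epochs": 10, "learning_rate": 2e-5},
    {"stage": 2, "datasets": ["DANSK"], "epochs": 1, "learning_rate": 5e-6},
]
```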

Datasets

| Dataset | Role | Train size | Domains |
|---------|------|------------|---------|
| DANSK | Primary | 11.7K | Web, News, Wiki, Legal, Dannet, Conversation, Social Media |
| DaNE | Supplementary | 4.4K | News |
| NorNE | Stage 1 only | ~20K | News (Norwegian Bokmål + Nynorsk) |

Evaluation Results

DANSK (primary benchmark, multi-domain)

| Split | PER F1 | ORG F1 | LOC F1 | Micro F1 | Precision | Recall |
|-------|--------|--------|--------|----------|-----------|--------|
| Dev | 88.0 | 85.3 | 90.3 | 87.6 | 86.6 | 88.7 |
| Test | 84.8 | 84.6 | 90.3 | 86.5 | 85.4 | 87.5 |

DaNE (secondary benchmark, news domain)

| Split | PER F1 | ORG F1 | LOC F1 | Micro F1 | Precision | Recall |
|-------|--------|--------|--------|----------|-----------|--------|
| Dev | 97.5 | 85.1 | 92.9 | 93.0 | 93.2 | 92.9 |
| Test | 94.2 | 79.7 | 87.8 | 87.7 | 88.1 | 87.3 |

GPI Legal Documents (independent evaluation, Danish legal domain)

Evaluated on 30 human-corrected documents (contracts, invoices, case briefs, client letters):

| Entity | Precision | Recall | Notes |
|--------|-----------|--------|-------|
| PER | 0.76 | 1.00 | Perfect recall; false positives are email addresses misclassified as PER |
| ORG | 0.94 | 0.96 | Near-perfect |
| LOC | 0.52 | 0.51 | Boundary errors (detects the street, misses the house number); detection rate is near-perfect |

LOC score reflects strict span matching. The model consistently detects location entities but predicts shorter spans (e.g., "Gothersgade" instead of "Gothersgade 81"). A post-processing step to extend LOC spans to include adjacent numbers resolves this.
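A minimal sketch of such a span-extension step, assuming character offsets as produced by the pipeline output (the regex is a heuristic for house numbers, not the exact rule used in production):

```python
import re

# Matches an adjacent house number such as " 81" or " 42B" (heuristic).
HOUSE_NUMBER = re.compile(r"\s+\d+[A-Za-z]?")

def extend_loc_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Extend a LOC span to include a house number directly after it,
    e.g. 'Gothersgade' -> 'Gothersgade 81'."""
    m = HOUSE_NUMBER.match(text[end:])
    if m:
        end += m.end()
    return start, end
```

For example, applied to the span covering "Gothersgade" in "bor på Gothersgade 81", the returned span covers "Gothersgade 81"; spans with no trailing number are returned unchanged.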

Usage

```python
from transformers import pipeline

model_name = "thomasbeste/danish-xlmr-ner-large"
nlp = pipeline("ner", model=model_name, aggregation_strategy="simple")

text = "Anders Jensen fra Danske Bank bor på Vestergade 42 i København."
entities = nlp(text)

for ent in entities:
    print(f"  {ent['entity_group']}: {ent['word']} (score: {ent['score']:.3f})")
```

ONNX Deployment

For production use, export to ONNX INT8 for ~3x CPU speedup:

```bash
pip install "optimum[onnxruntime]"

# Export to ONNX
optimum-cli export onnx --model thomasbeste/danish-xlmr-ner-large ./model-onnx --task token-classification

# Quantize to INT8
python -c "
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
q = ORTQuantizer.from_pretrained('./model-onnx')
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
q.quantize(save_dir='./model-onnx-int8', quantization_config=qconfig)
"
```

Label Scheme

IOB2 format with 7 labels:

| ID | Label |
|----|-------|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
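The same mapping, together with a small IOB2 decoder, can be expressed in code (a sketch; the model's `config.json` carries the identical `id2label` mapping):

```python
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}

def ids_to_spans(label_ids):
    """Decode a sequence of label ids into (entity_type, start, end)
    token spans under IOB2: 'B-' opens an entity, 'I-' continues it
    only when the type matches and the tokens are contiguous."""
    spans, current = [], None
    for i, lid in enumerate(label_ids):
        label = ID2LABEL[lid]
        if label.startswith("B-"):
            current = [label[2:], i, i + 1]
            spans.append(current)
        elif label.startswith("I-") and current and current[0] == label[2:] and current[2] == i:
            current[2] = i + 1
        else:
            current = None  # 'O' or an ill-formed 'I-' tag breaks the entity
    return [tuple(s) for s in spans]
```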

Intended Use

Designed for GDPR-compliant PII detection in Danish enterprise document processing pipelines. Optimized for recall over precision — a missed entity (false negative) is a compliance risk, while over-detection (false positive) is safe.

Limitations

  • Optimized for Danish text. May work on other Scandinavian languages (Norwegian, Swedish) but not evaluated.
  • LOC boundary detection tends to predict shorter spans than the full address. Post-processing recommended.
  • Email addresses are sometimes misclassified as PER. Downstream validation (reject names containing @) is recommended.
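A downstream validation pass of the kind described above could be sketched as follows (the entity dict format matches the pipeline output; the email regex is a loose heuristic, not part of the model):

```python
import re

# Loose email pattern; good enough to reject obvious addresses (heuristic).
EMAIL = re.compile(r"\S+@\S+\.\S+")

def reject_email_persons(entities):
    """Drop PER predictions that look like email addresses, a known
    false-positive mode of this model."""
    return [
        e for e in entities
        if not (e["entity_group"] == "PER" and EMAIL.search(e["word"]))
    ]
```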