---
language: da
license: mit
tags:
- token-classification
- ner
- named-entity-recognition
- danish
- xlm-roberta
- scandinavian
datasets:
- alexandrainst/dane
- wikiann
- tollefj/nordic-ner
metrics:
- f1
- precision
- recall
pipeline_tag: token-classification
model-index:
- name: danish-ner-xlmr-base
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: DaNE
      type: alexandrainst/dane
      split: validation
    metrics:
    - name: F1
      type: f1
      value: 0.9102
---

# Danish NER XLM-RoBERTa (v8)

State-of-the-art Named Entity Recognition (NER) model for Danish, fine-tuned from XLM-RoBERTa.

**Updated 2026-02-03**: v8 reaches 91.02% F1 on the DaNE validation split, up from 84.6% in the previous version.

## Performance

| Benchmark | F1 Score |
|-----------|----------|
| **DaNE (validation)** | **91.02%** |
| Previous version | 84.6% |
| nbailab baseline | 87.09% |

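This card does not say how the F1 scores above were computed. A common convention for NER is entity-level, micro-averaged F1 with exact span matching, and the sketch below assumes that convention. The function names `extract_spans` and `entity_f1` are illustrative and are not part of this model or of any library.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence index, start, end exclusive, label)


def extract_spans(tags: List[str], sent_idx: int = 0) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags.

    Assumption: strict BIO, so an I- tag that does not continue a span
    of the same label is ignored rather than opening a new span.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((sent_idx, start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching entity spans."""
    gold_spans: Set[Span] = set()
    pred_spans: Set[Span] = set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= extract_spans(g, idx)
        pred_spans |= extract_spans(p, idx)
    tp = len(gold_spans & pred_spans)  # spans with identical boundaries and label
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hand-written tags for "Anders Jensen arbejder hos Novo Nordisk i København ."
gold = [["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]]
# Hypothetical prediction that cuts the ORG span short
pred = [["B-PER", "I-PER", "O", "O", "B-ORG", "O", "O", "B-LOC", "O"]]

precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Under exact matching, the truncated `ORG` span counts as both a false positive and a false negative, so two of three entities are correct. For real evaluation, an established package such as `seqeval` is likely a better choice than this sketch.
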
## Quick Start

```python
from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entities
ner = pipeline("ner", model="thomasbeste/danish-ner-xlmr-base", aggregation_strategy="simple")
result = ner("Anders Jensen arbejder hos Novo Nordisk i København.")

# Print each entity's text, label, and confidence score
for entity in result:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.2f})")
```

## Entity Types

| Label | Description | Example |
|-------|-------------|---------|
| `PER` | Person names | Anders Jensen |
| `ORG` | Organizations | Novo Nordisk A/S |
| `LOC` | Locations | København |
| `MISC` | Miscellaneous | Dansk |

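With `aggregation_strategy="simple"`, these labels appear in the `entity_group` field of each pipeline result. The sketch below shows one way to bucket results by label and drop low-confidence hits. The helper `group_entities`, the `0.5` threshold, and the sample scores are illustrative assumptions, not values taken from this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group entity texts by label, keeping only those at or above min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for pipeline output; the scores are made up.
sample = [
    {"entity_group": "PER", "word": "Anders Jensen", "score": 0.99},
    {"entity_group": "ORG", "word": "Novo Nordisk", "score": 0.98},
    {"entity_group": "LOC", "word": "København", "score": 0.97},
    {"entity_group": "MISC", "word": "Dansk", "score": 0.41},
]

grouped = group_entities(sample)
print(grouped)
```

In practice, the list returned by `ner(...)` can be passed to the helper in place of `sample`. The right threshold depends on the use case and should be tuned on held-out data.
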
## Training Data

- DaNE (4.4k samples)
- WikiANN Danish (20k samples)
- NorNE Norwegian (30k samples)
- High-quality synthetic data (60k samples)

## License

MIT