NewsBERT-germ-210m

German news article classifier based on EuroBERT-210m, fine-tuned on hand-annotated news articles about the 2025 German federal election (Bundestagswahl).

Model Description

  • Base Model: EuroBERT/EuroBERT-210m (210M parameters)
  • Task: Multi-class text classification (13 categories)
  • Language: German
  • Training Data: Hand-annotated German news articles (1580 train, 395 validation, 385 test)
  • Dataset: Zorryy/news_articles_2025_elections_germany

Performance (Test Set)

| Metric          | Score  |
|-----------------|--------|
| F1 Macro        | 0.8208 |
| F1 Weighted     | 0.8211 |
| Precision Macro | 0.8395 |
| Recall Macro    | 0.8169 |
| Accuracy        | 0.8182 |

Per-Class Performance

| Category                 | Precision | Recall | F1     | Support |
|--------------------------|-----------|--------|--------|---------|
| Klima / Energie          | 0.7879    | 0.8667 | 0.8254 | 30      |
| Zuwanderung              | 0.8387    | 0.8667 | 0.8525 | 30      |
| Renten                   | 0.8571    | 0.8000 | 0.8276 | 30      |
| Soziales Gefälle         | 0.6552    | 0.6333 | 0.6441 | 30      |
| AfD/Rechte               | 0.7931    | 0.7667 | 0.7797 | 30      |
| Arbeitslosigkeit         | 1.0000    | 0.7000 | 0.8235 | 30      |
| Wirtschaftslage          | 0.5600    | 0.9333 | 0.7000 | 30      |
| Politikverdruss          | 0.9000    | 0.7200 | 0.8000 | 25      |
| Gesundheitswesen, Pflege | 0.9032    | 0.9333 | 0.9180 | 30      |
| Kosten/Löhne/Preise      | 0.9200    | 0.7667 | 0.8364 | 30      |
| Ukraine/Krieg/Russland   | 0.9355    | 0.9667 | 0.9508 | 30      |
| Bundeswehr/Verteidigung  | 0.9630    | 0.8667 | 0.9123 | 30      |
| Andere                   | 0.8000    | 0.8000 | 0.8000 | 30      |
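
The headline scores follow directly from this table: macro F1 is the unweighted mean of the 13 per-class F1 scores, while weighted F1 weights each class by its support. A minimal check in plain Python, with the values copied from the table above:

```python
# Per-class F1 scores and supports, copied from the per-class table above
# (rows in the same order as listed).
f1_scores = [0.8254, 0.8525, 0.8276, 0.6441, 0.7797, 0.8235, 0.7000,
             0.8000, 0.9180, 0.8364, 0.9508, 0.9123, 0.8000]
supports = [30, 30, 30, 30, 30, 30, 30, 25, 30, 30, 30, 30, 30]

# Macro F1: unweighted mean over classes.
f1_macro = sum(f1_scores) / len(f1_scores)

# Weighted F1: mean over classes, weighted by support.
f1_weighted = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(round(f1_macro, 4))     # 0.8208
print(round(f1_weighted, 4))  # 0.8211
```

Both values reproduce the headline metrics, and the supports sum to the 385 test articles.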

Hyperparameters

Hyperparameters were optimized with Optuna's TPE sampler using 3-fold stratified cross-validation (best trial: 1; CV F1 macro: 0.8369).

| Parameter               | Value              |
|-------------------------|--------------------|
| Learning Rate           | 3.13e-05           |
| LR Scheduler            | linear             |
| Epochs                  | 15                 |
| Batch Size (per device) | 8                  |
| Effective Batch Size    | 16                 |
| Warmup Ratio            | 0.1455             |
| Weight Decay            | 0.0021             |
| Label Smoothing         | 0.0832             |
| Max Sequence Length     | 2048               |
| Gradient Clipping       | 0.5                |
| Optimizer               | adamw_torch_fused  |
| Early Stopping          | patience=3         |
| Mixed Precision         | BF16=True, FP16=False |
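
For reference, the table above maps onto Hugging Face `TrainingArguments` roughly as sketched below. This is illustrative, not the exact training script: the effective batch size of 16 is assumed to come from gradient accumulation over 2 steps (it could equally be 2 devices), and `output_dir`, the evaluation cadence, and the best-model metric name are assumptions.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="newsbert-germ-210m",   # assumption: any local path
    learning_rate=3.13e-5,
    lr_scheduler_type="linear",
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,     # assumption: 8 x 2 = effective batch 16
    warmup_ratio=0.1455,
    weight_decay=0.0021,
    label_smoothing_factor=0.0832,
    max_grad_norm=0.5,                 # gradient clipping
    optim="adamw_torch_fused",
    bf16=True,
    eval_strategy="epoch",             # assumption: per-epoch eval for early stopping
    save_strategy="epoch",
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="f1_macro",  # assumption: matches the reported metric
)

# Early stopping with patience 3, passed to the Trainer via callbacks=[...].
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```

Note that the max sequence length of 2048 is applied at tokenization time (e.g. `tokenizer(..., truncation=True, max_length=2048)`), not through `TrainingArguments`.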

Training Data

The model was trained on hand-annotated German news articles related to the 2025 German federal election. Articles were manually labeled into 13 topic categories by human annotators. The dataset is available at Zorryy/news_articles_2025_elections_germany.

Important: All labels are hand-annotated (not machine-generated), ensuring high label quality.

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Zorryy/NewsBERT-germ-210m",
    trust_remote_code=True,
)

text = "Die Bundesregierung plant neue Maßnahmen zur Reduzierung der CO2-Emissionen."
result = classifier(text)
print(result)
# [{"label": "Klima / Energie", "score": 0.95}]
```
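
Passing `top_k=None` to the pipeline call returns a score for every class instead of only the best one, which is useful for thresholding or inspecting near-ties. A small sketch of picking the top label from that output shape (the scores below are made up for illustration, not real model output):

```python
# Illustrative per-class output, shaped like classifier(text, top_k=None);
# these scores are invented for the example.
all_scores = [
    {"label": "Klima / Energie", "score": 0.91},
    {"label": "Wirtschaftslage", "score": 0.05},
    {"label": "Andere", "score": 0.04},
]

# Pick the highest-scoring label.
best = max(all_scores, key=lambda d: d["score"])
print(best["label"])  # Klima / Energie
```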

Categories

The model classifies articles into 13 categories:

  1. Klima / Energie
  2. Zuwanderung
  3. Renten
  4. Soziales Gefälle
  5. AfD/Rechte
  6. Arbeitslosigkeit
  7. Wirtschaftslage
  8. Politikverdruss
  9. Gesundheitswesen, Pflege
  10. Kosten/Löhne/Preise
  11. Ukraine/Krieg/Russland
  12. Bundeswehr/Verteidigung
  13. Andere
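
When loading the model directly with `AutoModelForSequenceClassification`, class indices map to these label strings through the model config's `id2label`. The sketch below shows that mapping under the assumption that the index order matches the numbered list above; the authoritative mapping is the one in `config.id2label`.

```python
# Hypothetical id-to-label mapping, assuming indices follow the numbered
# list in this card. Check config.id2label on the loaded model for the
# actual mapping.
id2label = {
    0: "Klima / Energie",
    1: "Zuwanderung",
    2: "Renten",
    3: "Soziales Gefälle",
    4: "AfD/Rechte",
    5: "Arbeitslosigkeit",
    6: "Wirtschaftslage",
    7: "Politikverdruss",
    8: "Gesundheitswesen, Pflege",
    9: "Kosten/Löhne/Preise",
    10: "Ukraine/Krieg/Russland",
    11: "Bundeswehr/Verteidigung",
    12: "Andere",
}

# Inverse mapping, as stored in config.label2id.
label2id = {v: k for k, v in id2label.items()}
```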

Limitations

  • Trained specifically on German news articles about the 2025 Bundestagswahl
  • May not generalize well to other domains, time periods, or languages
  • Performance varies by category (see per-class metrics above)

Citation

If you use this model, please cite the underlying dataset and base model.
