# NewsBERT-germ-210m
German news article classifier based on EuroBERT-210m, fine-tuned on hand-annotated news articles about the 2025 German federal election (Bundestagswahl).
## Model Description
- Base Model: EuroBERT/EuroBERT-210m (210M parameters)
- Task: Multi-class text classification (13 categories)
- Language: German
- Training Data: Hand-annotated German news articles (1580 train, 395 validation, 385 test)
- Dataset: Zorryy/news_articles_2025_elections_germany
## Performance (Test Set)
| Metric | Score |
|---|---|
| F1 Macro | 0.8208 |
| F1 Weighted | 0.8211 |
| Precision Macro | 0.8395 |
| Recall Macro | 0.8169 |
| Accuracy | 0.8182 |
### Per-Class Performance
| Category | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Klima / Energie | 0.7879 | 0.8667 | 0.8254 | 30 |
| Zuwanderung | 0.8387 | 0.8667 | 0.8525 | 30 |
| Renten | 0.8571 | 0.8000 | 0.8276 | 30 |
| Soziales Gefälle | 0.6552 | 0.6333 | 0.6441 | 30 |
| AfD/Rechte | 0.7931 | 0.7667 | 0.7797 | 30 |
| Arbeitslosigkeit | 1.0000 | 0.7000 | 0.8235 | 30 |
| Wirtschaftslage | 0.5600 | 0.9333 | 0.7000 | 30 |
| Politikverdruss | 0.9000 | 0.7200 | 0.8000 | 25 |
| Gesundheitswesen, Pflege | 0.9032 | 0.9333 | 0.9180 | 30 |
| Kosten/Löhne/Preise | 0.9200 | 0.7667 | 0.8364 | 30 |
| Ukraine/Krieg/Russland | 0.9355 | 0.9667 | 0.9508 | 30 |
| Bundeswehr/Verteidigung | 0.9630 | 0.8667 | 0.9123 | 30 |
| Andere | 0.8000 | 0.8000 | 0.8000 | 30 |
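The aggregate scores can be cross-checked against the per-class table. The dependency-free sketch below does this; the function name `summarize_metrics` is ours and is not part of any released code. It recomputes the macro averages as unweighted means over the 13 classes and weighted F1 as a support-weighted mean. It also derives accuracy from recall × support, which gives the number of correctly classified articles per class. Because the table values are rounded to four decimals, agreement is only expected to about four decimals.

```python
# Per-class test-set metrics copied from the table above:
# category -> (precision, recall, F1, support)
PER_CLASS = {
    "Klima / Energie": (0.7879, 0.8667, 0.8254, 30),
    "Zuwanderung": (0.8387, 0.8667, 0.8525, 30),
    "Renten": (0.8571, 0.8000, 0.8276, 30),
    "Soziales Gefälle": (0.6552, 0.6333, 0.6441, 30),
    "AfD/Rechte": (0.7931, 0.7667, 0.7797, 30),
    "Arbeitslosigkeit": (1.0000, 0.7000, 0.8235, 30),
    "Wirtschaftslage": (0.5600, 0.9333, 0.7000, 30),
    "Politikverdruss": (0.9000, 0.7200, 0.8000, 25),
    "Gesundheitswesen, Pflege": (0.9032, 0.9333, 0.9180, 30),
    "Kosten/Löhne/Preise": (0.9200, 0.7667, 0.8364, 30),
    "Ukraine/Krieg/Russland": (0.9355, 0.9667, 0.9508, 30),
    "Bundeswehr/Verteidigung": (0.9630, 0.8667, 0.9123, 30),
    "Andere": (0.8000, 0.8000, 0.8000, 30),
}


def summarize_metrics(per_class: dict[str, tuple[float, float, float, int]]) -> dict[str, float]:
    """Recompute aggregate scores from per-class precision, recall, F1 and support."""
    rows = list(per_class.values())
    n_classes = len(rows)
    support = sum(s for _, _, _, s in rows)
    # Macro averages: plain (unweighted) mean over classes.
    precision_macro = sum(p for p, _, _, _ in rows) / n_classes
    recall_macro = sum(r for _, r, _, _ in rows) / n_classes
    f1_macro = sum(f for _, _, f, _ in rows) / n_classes
    # Weighted F1: each class weighted by its number of test articles.
    f1_weighted = sum(f * s for _, _, f, s in rows) / support
    # Recall x support = correctly classified articles per class
    # (rounded to an integer because the recall values are rounded).
    correct = round(sum(r * s for _, r, _, s in rows))
    return {
        "support": support,
        "correct": correct,
        "precision_macro": precision_macro,
        "recall_macro": recall_macro,
        "f1_macro": f1_macro,
        "f1_weighted": f1_weighted,
        "accuracy": correct / support,
    }


for name, value in summarize_metrics(PER_CLASS).items():
    print(f"{name}: {round(value, 4)}")
```

Run as-is, the sketch reproduces the test-set table (385 articles, 315 correctly classified, macro F1 0.8208, weighted F1 0.8211, accuracy 0.8182).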
## Hyperparameters
Hyperparameters were selected with Optuna's TPE sampler using 3-fold stratified cross-validation (best trial: 1, CV macro F1: 0.8369).
| Parameter | Value |
|---|---|
| Learning Rate | 3.13e-05 |
| LR Scheduler | linear |
| Epochs | 15 |
| Batch Size (per device) | 8 |
| Effective Batch Size | 16 |
| Warmup Ratio | 0.1455 |
| Weight Decay | 0.0021 |
| Label Smoothing | 0.0832 |
| Max Sequence Length (tokens) | 2048 |
| Gradient Clipping | 0.5 |
| Optimizer | adamw_torch_fused |
| Early Stopping | patience=3 |
| Mixed Precision | BF16=True, FP16=False |
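The sketch below makes the schedule-related parameters concrete. It assumes Hugging Face `Trainer` semantics, which we believe apply here but have not confirmed from the training code. Under those semantics, the `linear` scheduler warms up from 0 to the peak learning rate and then decays linearly to 0. The warmup length is the warmup ratio times the total optimizer steps, rounded up. Label smoothing replaces the one-hot target with eps/K on every class plus 1 - eps on the true class. The total step count is a hypothetical assumption: 15 epochs × 99 steps, i.e. 1580 training articles at an effective batch size of 16. Early stopping may have ended the real run sooner. The helper names are illustrative only.

```python
import math

# Values copied from the hyperparameter table above.
PEAK_LR = 3.13e-05
WARMUP_RATIO = 0.1455
LABEL_SMOOTHING = 0.0832
NUM_LABELS = 13


def warmup_steps(total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> int:
    """Warmup length: the warmup fraction of all optimizer steps, rounded up."""
    return math.ceil(warmup_ratio * total_steps)


def linear_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warm = warmup_steps(total_steps)
    if step < warm:
        return peak_lr * step / max(1, warm)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warm))


def smoothed_targets(true_label: int, num_labels: int = NUM_LABELS,
                     eps: float = LABEL_SMOOTHING) -> list[float]:
    """Soft target distribution: eps/K on every class plus (1 - eps) on the true class."""
    base = eps / num_labels
    return [base + (1.0 - eps if k == true_label else 0.0) for k in range(num_labels)]


# Hypothetical run length (assumption): 15 epochs x ceil(1580 / 16) = 99 steps per epoch.
TOTAL_STEPS = 15 * math.ceil(1580 / 16)

print("warmup steps:", warmup_steps(TOTAL_STEPS), "of", TOTAL_STEPS)
print("LR at end of warmup:", linear_lr(warmup_steps(TOTAL_STEPS), TOTAL_STEPS))
targets = smoothed_targets(0)
print("true class:", round(targets[0], 4), "other classes:", round(targets[1], 4))
```

Under these assumptions, warmup covers 217 of 1485 steps. Each smoothed target is 0.9232 for the true class and 0.0064 for each of the other 12 classes.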
## Training Data
The model was trained on hand-annotated German news articles related to the 2025 German federal election. Human annotators assigned each article to one of the 13 topic categories. The dataset is available at Zorryy/news_articles_2025_elections_germany.
Important: All labels were assigned by hand (none are machine-generated), which is intended to ensure high label quality.
## Usage
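```python
from transformers import pipeline

# trust_remote_code=True is required because EuroBERT ships custom model code on the Hub.
classifier = pipeline(
    "text-classification",
    model="Zorryy/NewsBERT-germ-210m",
    trust_remote_code=True,
)

# "The federal government is planning new measures to reduce CO2 emissions."
text = "Die Bundesregierung plant neue Massnahmen zur Reduzierung der CO2-Emissionen."
result = classifier(text)
print(result)
# Illustrative output (the exact score will differ):
# [{'label': 'Klima / Energie', 'score': 0.95}]
```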
## Categories
The model classifies articles into 13 categories:
- Klima / Energie
- Zuwanderung
- Renten
- Soziales Gefälle
- AfD/Rechte
- Arbeitslosigkeit
- Wirtschaftslage
- Politikverdruss
- Gesundheitswesen, Pflege
- Kosten/Löhne/Preise
- Ukraine/Krieg/Russland
- Bundeswehr/Verteidigung
- Andere
## Limitations
- Trained specifically on German news articles about the 2025 Bundestagswahl
- May not generalize well to other domains, time periods, or languages
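- Performance varies by category (see the per-class metrics above). Soziales Gefälle is the weakest class (F1 0.6441). Wirtschaftslage combines high recall (0.9333) with low precision (0.5600), so articles from other categories are often misclassified as Wirtschaftslage.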
## Citation
If you use this model, please cite the underlying dataset and base model.