NewsBERT-germ-210m

German news article classifier based on EuroBERT-210m, fine-tuned on hand-annotated news articles about the 2025 German federal election (Bundestagswahl).

Model Description

  • Base Model: EuroBERT/EuroBERT-210m (210M parameters)
  • Task: Multi-class text classification (13 categories)
  • Language: German
  • Training Data: Hand-annotated German news articles (1580 train, 395 validation, 385 test)
  • Dataset: Zorryy/news_articles_2025_elections_germany

Performance (Test Set)

| Metric          | Score  |
|-----------------|--------|
| F1 Macro        | 0.8208 |
| F1 Weighted     | 0.8211 |
| Precision Macro | 0.8395 |
| Recall Macro    | 0.8169 |
| Accuracy        | 0.8182 |

Per-Class Performance

| Category                 | Precision | Recall | F1     | Support |
|--------------------------|-----------|--------|--------|---------|
| Klima / Energie          | 0.7879    | 0.8667 | 0.8254 | 30      |
| Zuwanderung              | 0.8387    | 0.8667 | 0.8525 | 30      |
| Renten                   | 0.8571    | 0.8000 | 0.8276 | 30      |
| Soziales Gefälle         | 0.6552    | 0.6333 | 0.6441 | 30      |
| AfD/Rechte               | 0.7931    | 0.7667 | 0.7797 | 30      |
| Arbeitslosigkeit         | 1.0000    | 0.7000 | 0.8235 | 30      |
| Wirtschaftslage          | 0.5600    | 0.9333 | 0.7000 | 30      |
| Politikverdruss          | 0.9000    | 0.7200 | 0.8000 | 25      |
| Gesundheitswesen, Pflege | 0.9032    | 0.9333 | 0.9180 | 30      |
| Kosten/Löhne/Preise      | 0.9200    | 0.7667 | 0.8364 | 30      |
| Ukraine/Krieg/Russland   | 0.9355    | 0.9667 | 0.9508 | 30      |
| Bundeswehr/Verteidigung  | 0.9630    | 0.8667 | 0.9123 | 30      |
| Andere                   | 0.8000    | 0.8000 | 0.8000 | 30      |
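
The headline scores follow directly from this table: macro F1 is the unweighted mean of the 13 per-class F1 scores, while weighted F1 weights each class by its support. A minimal check in plain Python, with the values copied from the table above:

```python
# Per-class F1 scores and supports, copied from the per-class table above
# (rows in the same order as listed).
f1_scores = [0.8254, 0.8525, 0.8276, 0.6441, 0.7797, 0.8235, 0.7000,
             0.8000, 0.9180, 0.8364, 0.9508, 0.9123, 0.8000]
supports = [30, 30, 30, 30, 30, 30, 30, 25, 30, 30, 30, 30, 30]

# Macro F1: unweighted mean over classes.
f1_macro = sum(f1_scores) / len(f1_scores)

# Weighted F1: mean over classes, weighted by support.
f1_weighted = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(round(f1_macro, 4))     # 0.8208
print(round(f1_weighted, 4))  # 0.8211
```

Both values reproduce the headline metrics, and the supports sum to the 385 test articles.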

Hyperparameters

Hyperparameters were optimized with Optuna's TPE sampler using 3-fold stratified cross-validation (best trial: 1; CV F1 macro: 0.8369).

| Parameter               | Value              |
|-------------------------|--------------------|
| Learning Rate           | 3.13e-05           |
| LR Scheduler            | linear             |
| Epochs                  | 15                 |
| Batch Size (per device) | 8                  |
| Effective Batch Size    | 16                 |
| Warmup Ratio            | 0.1455             |
| Weight Decay            | 0.0021             |
| Label Smoothing         | 0.0832             |
| Max Sequence Length     | 2048               |
| Gradient Clipping       | 0.5                |
| Optimizer               | adamw_torch_fused  |
| Early Stopping          | patience=3         |
| Mixed Precision         | BF16=True, FP16=False |
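
For reference, the table above maps onto Hugging Face `TrainingArguments` roughly as sketched below. This is illustrative, not the exact training script: the effective batch size of 16 is assumed to come from gradient accumulation over 2 steps (it could equally be 2 devices), and `output_dir`, the evaluation cadence, and the best-model metric name are assumptions.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="newsbert-germ-210m",   # assumption: any local path
    learning_rate=3.13e-5,
    lr_scheduler_type="linear",
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,     # assumption: 8 x 2 = effective batch 16
    warmup_ratio=0.1455,
    weight_decay=0.0021,
    label_smoothing_factor=0.0832,
    max_grad_norm=0.5,                 # gradient clipping
    optim="adamw_torch_fused",
    bf16=True,
    eval_strategy="epoch",             # assumption: per-epoch eval for early stopping
    save_strategy="epoch",
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="f1_macro",  # assumption: matches the reported metric
)

# Early stopping with patience 3, passed to the Trainer via callbacks=[...].
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```

Note that the max sequence length of 2048 is applied at tokenization time (e.g. `tokenizer(..., truncation=True, max_length=2048)`), not through `TrainingArguments`.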

Training Data

The model was trained on hand-annotated German news articles related to the 2025 German federal election. Articles were manually labeled into 13 topic categories by human annotators. The dataset is available at Zorryy/news_articles_2025_elections_germany.

Important: All labels are hand-annotated (not machine-generated), ensuring high label quality.

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Zorryy/NewsBERT-germ-210m",
    trust_remote_code=True,
)

text = "Die Bundesregierung plant neue Maßnahmen zur Reduzierung der CO2-Emissionen."
result = classifier(text)
print(result)
# [{"label": "Klima / Energie", "score": 0.95}]
```
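
Passing `top_k=None` to the pipeline call returns a score for every class instead of only the best one, which is useful for thresholding or inspecting near-ties. A small sketch of picking the top label from that output shape (the scores below are made up for illustration, not real model output):

```python
# Illustrative per-class output, shaped like classifier(text, top_k=None);
# these scores are invented for the example.
all_scores = [
    {"label": "Klima / Energie", "score": 0.91},
    {"label": "Wirtschaftslage", "score": 0.05},
    {"label": "Andere", "score": 0.04},
]

# Pick the highest-scoring label.
best = max(all_scores, key=lambda d: d["score"])
print(best["label"])  # Klima / Energie
```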

Categories

The model classifies articles into 13 categories:

  1. Klima / Energie
  2. Zuwanderung
  3. Renten
  4. Soziales Gefälle
  5. AfD/Rechte
  6. Arbeitslosigkeit
  7. Wirtschaftslage
  8. Politikverdruss
  9. Gesundheitswesen, Pflege
  10. Kosten/Löhne/Preise
  11. Ukraine/Krieg/Russland
  12. Bundeswehr/Verteidigung
  13. Andere
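
When loading the model directly with `AutoModelForSequenceClassification`, class indices map to these label strings through the model config's `id2label`. The sketch below shows that mapping under the assumption that the index order matches the numbered list above; the authoritative mapping is the one in `config.id2label`.

```python
# Hypothetical id-to-label mapping, assuming indices follow the numbered
# list in this card. Check config.id2label on the loaded model for the
# actual mapping.
id2label = {
    0: "Klima / Energie",
    1: "Zuwanderung",
    2: "Renten",
    3: "Soziales Gefälle",
    4: "AfD/Rechte",
    5: "Arbeitslosigkeit",
    6: "Wirtschaftslage",
    7: "Politikverdruss",
    8: "Gesundheitswesen, Pflege",
    9: "Kosten/Löhne/Preise",
    10: "Ukraine/Krieg/Russland",
    11: "Bundeswehr/Verteidigung",
    12: "Andere",
}

# Inverse mapping, as stored in config.label2id.
label2id = {v: k for k, v in id2label.items()}
```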

Limitations

  • Trained specifically on German news articles about the 2025 Bundestagswahl
  • May not generalize well to other domains, time periods, or languages
  • Performance varies by category (see per-class metrics above)

Citation

If you use this model, please cite the underlying dataset and base model.
