---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
- eurobert
- fine-tuned
- transformers
- pytorch
- sequence-classification
- binary-classification
- geopolitics
- multilingual
language:
- en
- de
- fr
- es
- it
---

# EuroBERT Geopolitical Classifier (Binary)

Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.

- **Task:** Sequence classification (binary)
- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
- **Intended use:** Detects whether a news article reflects geopolitical tension.
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)

---

## Quick start

### Inference with `transformers`
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"

# EuroBERT uses custom modeling code; depending on your transformers version
# you may need to pass trust_remote_code=True to both from_pretrained calls.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
]

# Tokenize as a padded batch, truncated to the model's maximum length.
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=1)

for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]  # "non_geopolitical" or "geopolitical"
    confidence = float(p[label_id])
    print(f"{label:>16} {confidence:6.2%} | {text}")
```
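
If you only need labels and scores, the checkpoint can also be used through the high-level `pipeline` API. A minimal sketch (as above, `trust_remote_code=True` may be required depending on your `transformers` version):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Durrani95/eurobert-geopolitical-binary",
)

# One dict per input with the top label and its softmax score.
print(classifier("Border Talks Collapse as Neighboring States Recall Ambassadors."))
```

In recent `transformers` releases, passing `top_k=None` to the pipeline call returns the scores for both labels instead of only the top one.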

---

## Labels

```json
{
  "0": "non_geopolitical",
  "1": "geopolitical"
}
```

You may apply a decision threshold (e.g., `score >= 0.5`) depending on your precision/recall trade-off.
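
A minimal sketch of such thresholding, reusing `model` and `inputs` from the quick-start snippet and assuming index 1 is the positive (`geopolitical`) class, as in the mapping above:

```python
import torch

THRESHOLD = 0.5  # placeholder; tune on your own validation data

with torch.no_grad():
    logits = model(**inputs).logits
scores = torch.softmax(logits, dim=1)[:, 1]  # P(geopolitical) per input
preds = (scores >= THRESHOLD).long()         # 1 = geopolitical, 0 = non_geopolitical
```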

---

## Training & Evaluation

- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (binary)
- **Data:** European news text labeled for geopolitical relevance
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler
- **Metrics (validation set):**

| Metric | Score |
|:-------|------:|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |

### Training setup

| Parameter | Value |
|:----------|------:|
| Learning rate | 3e-5 |
| Effective batch size (via accumulation) | 64 |
| Per-GPU batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| Betas | (0.9, 0.95) |
| Epsilon | 1e-8 |
| Max epochs | 1 |
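
For orientation, these settings map onto 🤗 `TrainingArguments` roughly as sketched below. This is a reconstruction assuming the standard `Trainer` was used, not the actual training script; `output_dir` is a placeholder and the warmup setting is not documented.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="eurobert-geopolitical-binary",  # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=16,  # per-GPU batch size
    gradient_accumulation_steps=4,   # 16 x 4 = effective batch size of 64
    weight_decay=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    num_train_epochs=1,
    lr_scheduler_type="linear",      # warmup unknown; default of 0 assumed
)
```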

---

## Limitations & Risks

- May be sensitive to domain shift (e.g., non-news or social-media text)
- Class imbalance can affect thresholding; calibrate the decision threshold on your own validation data (see the sketch after this list)
- Multilingual performance can vary across languages and registers
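
One way to calibrate is to sweep candidate thresholds over held-out predictions and pick the operating point that fits your precision/recall needs. A sketch using `scikit-learn` (an extra dependency, not used elsewhere in this card), with placeholder arrays standing in for your validation labels and model scores:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Placeholders: gold labels and P(geopolitical) scores on your validation set.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = int(f1[:-1].argmax())  # last precision/recall point has no threshold
print(f"threshold={thresholds[best]:.3f}  precision={precision[best]:.2f}  recall={recall[best]:.2f}")
```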

---

## How to cite

If you use this model, please cite this repository and the EuroBERT base model.