---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
- eurobert
- fine-tuned
- transformers
- pytorch
- sequence-classification
- binary-classification
- geopolitics
- multilingual
language:
- en
- de
- fr
- es
- it
---
# EuroBERT Geopolitical Classifier (Binary)
Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.
- **Task:** Sequence classification (binary)
- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
- **Intended use:** Detects whether an article reflects geopolitical tension (best performance on full article-level text)
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)
---
## Quick start
### Inference with `transformers`
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_id = "Durrani95/eurobert-geopolitical-binary"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=1)
for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16} {confidence:6.2%} | {text}")
```
---
## Labels
```json
{
  "0": "non_geopolitical",
  "1": "geopolitical"
}
```
You may apply a decision threshold (e.g., `score >= 0.5`) depending on your precision/recall trade-off.
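Instead of taking the argmax, you can threshold the softmax probability of the `geopolitical` class directly. A minimal sketch using illustrative dummy logits (the threshold of `0.7` is hypothetical, not tuned for this model):

```python
import torch

# Illustrative logits for two inputs (not real model output)
logits = torch.tensor([[2.0, -1.0],   # leans non_geopolitical
                       [-0.5, 1.5]])  # leans geopolitical

probs = torch.softmax(logits, dim=1)
geo_scores = probs[:, 1]  # probability of class 1 ("geopolitical")

threshold = 0.7  # raise for higher precision, lower for higher recall
predictions = (geo_scores >= threshold).long()
print(predictions.tolist())  # one 0/1 prediction per input
```

Calibrate the threshold on your own validation split before relying on it.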
---
## Training & Evaluation
- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (binary)
- **Data:** European news text labeled for geopolitical relevance
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler
- **Metrics (validation set):**
| Metric | Score |
|:-------|------:|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |
### Training setup
| Parameter | Value |
|------------|--------|
| Learning rate | 3e-5 |
| Effective batch size | 64 |
| Actual GPU batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| Betas | (0.9, 0.95) |
| Epsilon | 1e-8 |
| Max epochs | 1 |
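The table above corresponds to a standard AdamW setup with gradient accumulation: 16 samples per GPU step × 4 accumulation steps = the effective batch size of 64. A minimal plain-PyTorch sketch of that loop, with a toy linear model standing in for EuroBERT and the linear scheduler omitted for brevity:

```python
import torch
from torch import nn

# Toy stand-in for the classifier; optimizer settings mirror the table above.
model = nn.Linear(8, 2)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-5,
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=1e-5,
)
loss_fn = nn.CrossEntropyLoss()

accumulation_steps = 4  # 16-sample micro-batches -> effective batch size 64
micro_batches = [(torch.randn(16, 8), torch.randint(0, 2, (16,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    # Scale the loss so accumulated gradients average over the effective batch
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

With `transformers.Trainer`, the same effect comes from `per_device_train_batch_size=16` and `gradient_accumulation_steps=4`.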
---
## Limitations & Risks
- May be sensitive to domain shift (non-news, social media text)
- Class imbalance can affect thresholding; calibrate on your validation data
- Multilingual performance can vary across languages and registers
---
## How to cite
If you use this model, please cite this repository and the EuroBERT base model.