---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
  - eurobert
  - fine-tuned
  - transformers
  - pytorch
  - sequence-classification
  - binary-classification
  - geopolitics
  - multilingual
language:
  - en
  - de
  - fr
  - es
  - it
---


# EuroBERT Geopolitical Classifier (Binary)

Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.

- **Task:** Sequence classification (binary)
- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
- **Intended use:** Detects whether an article reflects geopolitical tension (best performance on full article-level text)
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)

---

## Quick start

### Inference with `transformers`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
]

# Tokenize the batch and run a single forward pass without gradient tracking.
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)

# Map each prediction to its label name and print the model's confidence.
for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16}  {confidence:6.2%}  | {text}")
```
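
### Inference with `pipeline`

For quick experiments, the same checkpoint can be loaded through the high-level `pipeline` helper, which returns the top label and its score per input (the sample sentence below is illustrative):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Durrani95/eurobert-geopolitical-binary")

# Output is a list of dicts of the form [{"label": ..., "score": ...}].
print(classifier("Border talks collapse as troops mass along the frontier."))
```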


---

## Labels

```json
{
  "0": "non_geopolitical",
  "1": "geopolitical"
}
```

Depending on your precision/recall trade-off, you can apply a custom decision threshold to the `geopolitical` probability (e.g., `score >= 0.5`), as sketched below.
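
A minimal sketch of threshold-based prediction, reusing the `tokenizer` and `model` from the quick start. The `classify_with_threshold` helper and its `0.5` default are illustrative, not part of the model's API:

```python
import torch

# Illustrative helper (not part of the model repo): score the positive
# "geopolitical" class (label id 1) and apply an adjustable threshold.
def classify_with_threshold(texts, threshold=0.5):
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    scores = probs[:, 1]  # P(geopolitical) per input
    return [
        {"score": float(s),
         "label": "geopolitical" if s >= threshold else "non_geopolitical"}
        for s in scores
    ]

# Raising the threshold trades recall for precision.
print(classify_with_threshold(["Ceasefire negotiations stall for a third week."], threshold=0.7))
```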

---

## Training & Evaluation

- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (binary)
- **Data:** European news text labeled for geopolitical relevance
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler
- **Metrics (validation set):**

| Metric | Score |
|:-------|------:|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |

### Training setup

| Parameter | Value |
|-----------|-------|
| Learning rate | 3e-5 |
| Effective batch size | 64 |
| Per-GPU batch size | 16 |
| Gradient accumulation steps | 4 |
| Weight decay | 1e-5 |
| AdamW betas | (0.9, 0.95) |
| AdamW epsilon | 1e-8 |
| Max epochs | 1 |
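
The training script is not included in this repository; purely as an illustration, the hyperparameters above map onto 🤗 `TrainingArguments` roughly as follows (the `output_dir` value is a placeholder):

```python
from transformers import TrainingArguments

# Illustrative mapping of the table above onto TrainingArguments;
# a sketch, not the authors' actual training configuration.
training_args = TrainingArguments(
    output_dir="eurobert-geopolitical-binary",  # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=16,   # per-GPU batch size
    gradient_accumulation_steps=4,    # 16 * 4 = 64 effective batch size
    weight_decay=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    num_train_epochs=1,
    lr_scheduler_type="linear",       # linear decay schedule
)
```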

---

## Limitations & Risks

- May be sensitive to domain shift (e.g., non-news or social-media text)
- Class imbalance can affect thresholding; calibrate on your validation data
- Multilingual performance can vary across languages and registers

---

## How to cite

If you use this model, please cite this repository and the EuroBERT base model.