---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
- eurobert
- fine-tuned
- transformers
- pytorch
- sequence-classification
- binary-classification
- geopolitics
- multilingual
language:
- en
- de
- fr
- es
- it
---
# EuroBERT Geopolitical Classifier (Binary)
Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.
- **Task:** Sequence classification (binary)
- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
- **Intended use:** Detects whether an article reflects geopolitical tension.
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)
---
## Quick start
### Inference with `transformers`
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"
# EuroBERT checkpoints ship custom modeling code on the Hub, so
# trust_remote_code may be required depending on your transformers version.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16} {confidence:6.2%} | {text}")
```
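### Inference with `pipeline`

Alternatively, the high-level `pipeline` API bundles tokenization and softmax into a single call. A minimal sketch (the sample headline is illustrative, and `trust_remote_code` is assumed to be needed as above):

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Durrani95/eurobert-geopolitical-binary",
    trust_remote_code=True,  # may be required; EuroBERT uses custom modeling code
    top_k=None,              # return scores for both labels, not just the argmax
)

# Illustrative input; the output is a list of {label, score} dicts per text
print(clf("Border troop buildup prompts an emergency summit of alliance members."))
```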
---
## Labels
```json
{
"0": "non_geopolitical",
"1": "geopolitical"
}
```
You may apply a decision threshold (e.g., `score >= 0.5`) depending on your precision/recall trade-off.
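A minimal sketch of thresholding, continuing from the quick-start snippet above (`probs`, `texts`, and `model` are reused; the 0.5 cutoff is only a starting point):

```python
THRESHOLD = 0.5  # tune on your own validation data

geo_scores = probs[:, 1]  # probability of label 1 ("geopolitical")
preds = (geo_scores >= THRESHOLD).long()

for text, score, pred in zip(texts, geo_scores, preds):
    label = model.config.id2label[int(pred)]
    print(f"{label:>16} {float(score):6.2%} | {text}")
```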
---
## Training & Evaluation
- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (binary)
- **Data:** European news text labeled for geopolitical relevance
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler
- **Metrics (validation set):**
| Metric | Score |
|:-------|------:|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |
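
These metrics can be recomputed on any labeled split with standard tooling; a minimal sketch using `scikit-learn`, where `y_true` and `y_pred` are placeholders for your own gold labels and model predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]  # placeholder gold labels
y_pred = [1, 0, 1, 0, 0]  # placeholder model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"accuracy={accuracy_score(y_true, y_pred):.2f}  "
      f"precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```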
### Training setup
| Parameter | Value |
|------------|--------|
| Learning rate | 3e-5 |
| Effective batch size | 64 |
| Per-device batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| AdamW betas | (0.9, 0.95) |
| AdamW epsilon | 1e-8 |
| Max epochs | 1 |
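The configuration above maps roughly onto 🤗 `TrainingArguments` as follows. This is a sketch, not the published training script; the `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# 16 samples per device x 4 accumulation steps = effective batch size 64
training_args = TrainingArguments(
    output_dir="eurobert-geopolitical-binary",  # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    weight_decay=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    num_train_epochs=1,
    lr_scheduler_type="linear",
)
```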
---
## Limitations & Risks
- May be sensitive to domain shift (non-news, social media text)
- Class imbalance can affect thresholding; calibrate on your validation data
- Multilingual performance can vary across languages and registers
---
## How to cite
If you use this model, please cite this repository and the EuroBERT base model.