Durrani95/eurobert-geopolitical-binary

+---
+library_name: transformers
+pipeline_tag: text-classification
+base_model: EuroBERT/EuroBERT-210m
+base_model_relation: finetune
+tags:
+  - eurobert
+  - fine-tuned
+  - transformers
+  - pytorch
+  - sequence-classification
+  - binary-classification
+  - geopolitics
+  - multilingual
+language:
+  - en
+  - de
+  - fr
+  - es
+  - it
+---
+# EuroBERT Geopolitical Classifier (Binary)
+Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.
+- **Task:** Sequence classification (binary)
+- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
+- **Intended use:** Detecting whether a news article reflects geopolitical tension (performs best on full article-level text)
+- **Languages:** English, German, French, Spanish, Italian
+- **Framework:** 🤗 Transformers (PyTorch)
+---
+## Quick start
+### Inference with `transformers`
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+model_id = "Durrani95/eurobert-geopolitical-binary"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+texts = [
+    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
+    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
+]
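+# Tokenize as one padded batch; inputs longer than 512 tokens are truncated
+inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    logits = model(**inputs).logits
+    probs = torch.softmax(logits, dim=-1)  # per-class probabilities for each text
+# Report the most probable label and its probability for each input
+for text, p in zip(texts, probs):
+    label_id = int(p.argmax())
+    label = model.config.id2label[label_id]
+    confidence = float(p[label_id])
+    print(f"{label:>16}  {confidence:6.2%}  | {text}")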
+```
+---
+## Labels
+```json
+{
+  "0": "non_geopolitical",
+  "1": "geopolitical"
+}
+```
+Instead of taking the argmax, you may apply a decision threshold to the `geopolitical` probability (e.g., `score >= 0.5`), chosen to suit your precision/recall trade-off.
+
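A minimal sketch of thresholding, assuming the logit order follows the label map above (index 1 is `geopolitical`). The helper names `geopolitical_probability` and `classify` and the example logits are hypothetical. It is written in plain Python so it runs without downloading the model. For two classes, the softmax probability of class 1 equals the sigmoid of the logit difference.

```python
import math

def geopolitical_probability(logits):
    """Softmax probability of class 1 for a [logit_0, logit_1] pair."""
    # For two classes, softmax(logits)[1] == sigmoid(logit_1 - logit_0).
    return 1.0 / (1.0 + math.exp(logits[0] - logits[1]))

def classify(logits, threshold=0.5):
    """Return the label name after thresholding the geopolitical probability."""
    score = geopolitical_probability(logits)
    return "geopolitical" if score >= threshold else "non_geopolitical"

# Hypothetical logits for illustration, not real model output.
example = [0.2, 1.1]
print(classify(example))                 # default threshold of 0.5
print(classify(example, threshold=0.9))  # stricter threshold favors precision
```

Raising the threshold trades recall for precision; lowering it does the opposite.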
+---
+## Training & Evaluation
+- **Base model:** `EuroBERT/EuroBERT-210m`
+- **Objective:** Cross-entropy (binary)
+- **Data:** European news text labeled for geopolitical relevance
+- **Hardware:** A100 GPU
+- **Epochs:** 1
+- **Optimizer:** AdamW with a linear learning-rate scheduler
+- **Metrics (validation set):**
+
+| Metric | Score |
+|:-------|------:|
+| Accuracy | 0.95 |
+| F1-score | 0.95 |
+| Precision | 0.93 |
+| Recall | 0.97 |
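As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported values can be verified against each other. The sketch below only uses the rounded figures from the table; the function name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded validation figures from the table above.
f1 = f1_from(0.93, 0.97)
print(f"F1 = {f1:.4f}")
```

The result rounds to 0.95, matching the reported F1-score.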
+### Training setup
+| Parameter | Value |
+|------------|--------|
+| Learning rate | 3e-5 |
+| Effective batch size | 64 |
+| Per-device (GPU) batch size | 16 |
+| Gradient accumulation steps | 4 |
+| Weight decay | 1e-5 |
+| Betas | (0.9, 0.95) |
+| Epsilon | 1e-8 |
+| Max epochs | 1 |
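One way to express these settings is as keyword arguments for 🤗 `transformers.TrainingArguments`. This is an assumption for illustration only: the card does not say whether the `Trainer` API was used, and `training_kwargs` is a hypothetical name. The keys are existing `TrainingArguments` parameters.

```python
# Hyperparameters from the table, keyed by transformers.TrainingArguments names.
# Assumption: the original run may not have used the Trainer API at all.
training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,  # what fits on the GPU per step
    "gradient_accumulation_steps": 4,   # micro-batches per optimizer update
    "weight_decay": 1e-5,
    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "adam_epsilon": 1e-8,
    "num_train_epochs": 1,
    "lr_scheduler_type": "linear",
}

# Effective batch size = per-device batch size x accumulation steps
# (assuming a single GPU, as the hardware line suggests).
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 64

# With transformers installed, these could be passed as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)
```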
+---
+## Limitations & Risks
+- May be sensitive to domain shift (e.g., non-news or social media text)
+- Class imbalance can affect thresholding; calibrate on your validation data
+- Multilingual performance can vary across languages and registers
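Calibrating the threshold on your own validation data can be sketched as a simple sweep that keeps the candidate with the best F1. The scores and labels below are made-up placeholders, and `f1_at_threshold` and `best_threshold` are hypothetical helper names. In practice the scores would be the model's `geopolitical` probabilities on a labeled validation set.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for the positive class when predicting 1 iff score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))

# Placeholder validation data, not real model output.
val_scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.80, 0.90, 0.95]
val_labels = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

chosen = best_threshold(val_scores, val_labels, candidates)
print(chosen, f1_at_threshold(val_scores, val_labels, chosen))
```

Running the same sweep separately per language is one way to check whether a single threshold suits all five languages.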
+---
+## How to cite
+If you use this model, please cite this repository and the EuroBERT base model.