metadata
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
  - eurobert
  - fine-tuned
  - transformers
  - pytorch
  - sequence-classification
  - binary-classification
  - geopolitics
  - multilingual
language:
  - en
  - de
  - fr
  - es
  - it

EuroBERT Geopolitical Classifier (Binary)

Fine-tuned EuroBERT/EuroBERT-210m for binary classification of geopolitical tension in European news text.

  • Task: Sequence classification (binary)
  • Labels: non_geopolitical (0), geopolitical (1)
  • Intended use: Detect whether a news article reflects geopolitical tension. Performance is best on full article-level text.
  • Languages: English, German, French, Spanish, Italian
  • Framework: 🤗 Transformers (PyTorch)

Quick start

Inference with transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Short headlines keep the example brief; the model performs best on full article text.
texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalation.",
]

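# Tokenize as a padded batch; inputs longer than 512 tokens are truncated.
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

# Run inference without tracking gradients and convert logits to class probabilities.
with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)

# Report the most likely label and its probability for each text.
for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16}  {confidence:6.2%}  | {text}")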

Labels

{
  "0": "non_geopolitical",
  "1": "geopolitical"
}

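Taking the argmax over the two classes is equivalent to thresholding the probability of the geopolitical class (label 1) at 0.5. You may apply a different decision threshold to that probability, depending on your precision/recall trade-off.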
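A minimal sketch of applying a custom threshold, assuming each row holds the two class probabilities in label order (for example, `probs.tolist()` from the quick start). The helper name `apply_threshold` and the example probabilities are illustrative and not part of this repository.

```python
def apply_threshold(probs, threshold=0.5, positive_id=1):
    """Return 1 (geopolitical) where P(positive class) >= threshold, else 0."""
    return [int(row[positive_id] >= threshold) for row in probs]


# Illustrative probabilities only, not real model outputs.
example_probs = [[0.10, 0.90], [0.45, 0.55], [0.80, 0.20]]

default_labels = apply_threshold(example_probs)                 # threshold 0.5
strict_labels = apply_threshold(example_probs, threshold=0.7)   # favors precision

print(default_labels)  # [1, 1, 0]
print(strict_labels)   # [1, 0, 0]
```

Raising the threshold trades recall for precision: the borderline second example is no longer flagged as geopolitical.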


Training & Evaluation

  • Base model: EuroBERT/EuroBERT-210m
  • Objective: Cross-entropy (binary)
  • Data: European news text labeled for geopolitical relevance
  • Hardware: A100 GPU
  • Epochs: 1
  • Optimizer: AdamW with linear scheduler
  • Metrics (validation set):

| Metric | Score |
|---|---|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |
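As a quick consistency check, the reported F1-score can be recomputed from the precision and recall above, since F1 is their harmonic mean. This uses the rounded values from the table, so the result is only approximate.

```python
# Rounded validation metrics from the table above.
precision = 0.93
recall = 0.97

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # F1 = 0.9496, which rounds to the reported 0.95
```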

Training setup

| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Desired (effective) batch size | 64 |
| Actual GPU batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| Betas | (0.9, 0.95) |
| Epsilon | 1e-8 |
| Max epochs | 1 |
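The batch-size rows fit together as 16 examples per GPU step × 4 accumulation steps = 64 examples per optimizer update. The sketch below simulates that bookkeeping in plain Python. The training-set size (`num_examples`) is a hypothetical value, and updating on a leftover partial group at the end of the epoch is an assumption; the original training loop is not shown in this card.

```python
# Values from the training setup table.
per_device_batch_size = 16
gradient_accumulation_steps = 4
effective_batch_size = per_device_batch_size * gradient_accumulation_steps  # 64


def count_optimizer_steps(num_examples, micro_batch, accum_steps):
    """Simulate one epoch and count optimizer updates under gradient accumulation."""
    num_micro_batches = -(-num_examples // micro_batch)  # ceiling division
    updates = 0
    for i in range(num_micro_batches):
        # In a real loop: (loss / accum_steps).backward() accumulates gradients here.
        is_last = i == num_micro_batches - 1
        if (i + 1) % accum_steps == 0 or is_last:
            # In a real loop: optimizer.step(); scheduler.step(); optimizer.zero_grad()
            updates += 1
    return updates


# Hypothetical dataset size, chosen only to make the arithmetic easy to follow.
num_examples = 6400
print(effective_batch_size)  # 64
print(count_optimizer_steps(num_examples, per_device_batch_size,
                            gradient_accumulation_steps))  # 100
```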

Limitations & Risks

  • May be sensitive to domain shift (non-news, social media text)
  • Class imbalance can affect thresholding; calibrate on your validation data
  • Multilingual performance can vary across languages and registers
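One way to calibrate the threshold on your own validation data is to sweep a few candidate values and keep the one that maximizes the metric you care about. The sketch below selects by F1. The function names, the candidate grid, and the toy imbalanced data are all illustrative assumptions; in practice `y_score` would hold the model's geopolitical-class probabilities for your labeled validation examples.

```python
def precision_recall_f1(y_true, y_score, threshold):
    """Compute precision, recall, and F1 for the positive class at a threshold."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(y_true, y_score, candidates):
    """Return the candidate threshold with the highest F1 (first one wins ties)."""
    return max(candidates, key=lambda t: precision_recall_f1(y_true, y_score, t)[2])


# Toy imbalanced validation set (illustrative only): 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.55, 0.60, 0.75, 0.90]

chosen = best_threshold(y_true, y_score, candidates=[0.3, 0.5, 0.7])
print(chosen)  # 0.7
```

On this toy set the default 0.5 threshold admits two false positives, while 0.7 separates the classes cleanly; real data will rarely be this tidy.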

How to cite

If you use this model, please cite this repository and the EuroBERT base model.