eurobert-geopolitical-binary / README_eurobert_geopol_binary.md

Durrani95

Add fine-tuned EuroBERT for binary geopolitical classification

ea0cb4c verified 3 months ago

metadata

library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
  - eurobert
  - fine-tuned
  - transformers
  - pytorch
  - sequence-classification
  - binary-classification
  - geopolitics
  - multilingual
language:
  - en
  - de
  - fr
  - es
  - it

EuroBERT Geopolitical Classifier (Binary)

Fine-tuned EuroBERT/EuroBERT-210m for binary classification of geopolitical tension in European news text.

Task: Sequence classification (binary)
Labels: non_geopolitical (0), geopolitical (1)
Intended use: Detecting whether a news article reflects geopolitical tension; performs best on full article-level text
Languages: English, German, French, Spanish, Italian
Framework: 🤗 Transformers (PyTorch)

Quick start

Inference with `transformers`

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)

for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16}  {confidence:6.2%}  | {text}")

Labels

{
  "0": "non_geopolitical",
  "1": "geopolitical"
}

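The quick-start code takes the argmax over the two classes, which is equivalent to thresholding the geopolitical probability at 0.5. To shift the precision/recall trade-off, apply a different threshold to the class-1 (geopolitical) probability instead, e.g. `probs[:, 1] >= 0.7` for higher precision.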
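One way to choose that threshold is to sweep candidate values over held-out predictions and keep the one that maximizes F1 for the geopolitical class. The sketch below is a minimal illustration in plain Python. `val_scores` and `val_labels` are made-up placeholder data, and `f1_at_threshold` and `best_threshold` are hypothetical helpers, not part of this repository. In practice the scores would be the class-1 probabilities (`probs[:, 1]` in the quick-start code) computed on your own validation set.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for the positive class (1 = geopolitical) at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates=None):
    """Return (threshold, f1) maximizing F1; ties go to the lowest threshold."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05 ... 0.95
    return max(
        ((t, f1_at_threshold(scores, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Placeholder validation data (hypothetical, for illustration only).
val_scores = [0.05, 0.20, 0.35, 0.45, 0.55, 0.62, 0.80, 0.91]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, f1 = best_threshold(val_scores, val_labels)
print(f"best threshold = {threshold:.2f}, F1 = {f1:.2f}")
```

Maximizing F1 is only one possible criterion. If false positives are costlier than misses in your application, a precision floor or a cost-weighted objective may be a better fit.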

Training & Evaluation

Base model: EuroBERT/EuroBERT-210m
Objective: Cross-entropy (binary)
Data: European news text labeled for geopolitical relevance
Hardware: A100 GPU
Epochs: 1
Optimizer: AdamW with linear scheduler
Metrics (validation set):

| Metric | Score |
|---|---|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |
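As a quick sanity check, the reported F1-score is consistent with the reported precision and recall. This assumes F1 here is the usual harmonic mean of the two for the geopolitical class; the card does not state the averaging mode.

```python
# Reported validation metrics from the table.
precision = 0.93
recall = 0.97

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # 0.9496, which rounds to the reported 0.95
```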

Training setup

| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Desired (effective) batch size | 64 |
| Actual GPU batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| Betas | (0.9, 0.95) |
| Epsilon | 1e-8 |
| Max epochs | 1 |
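The optimizer settings and gradient accumulation in the table can be wired together roughly as follows. This is a minimal PyTorch sketch, not the author's training script. The tiny linear model and random tensors are stand-ins so the loop runs on CPU without downloading anything, the number of steps is arbitrary, and the linear schedule is assumed to decay to zero with no warmup (the card only says "linear scheduler").

```python
import torch
from torch import nn

torch.manual_seed(0)

# Values from the training setup table.
LEARNING_RATE = 3e-5
WEIGHT_DECAY = 1e-5
BETAS = (0.9, 0.95)
EPS = 1e-8
GPU_BATCH_SIZE = 16
ACCUM_STEPS = 4  # 16 * 4 = effective batch size of 64

# Stand-in model and data (assumptions): a 2-class linear head on random features.
model = nn.Linear(32, 2)
features = torch.randn(GPU_BATCH_SIZE * ACCUM_STEPS * 3, 32)
labels = torch.randint(0, 2, (features.size(0),))

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=LEARNING_RATE,
    betas=BETAS,
    eps=EPS,
    weight_decay=WEIGHT_DECAY,
)
num_micro_batches = features.size(0) // GPU_BATCH_SIZE
total_updates = num_micro_batches // ACCUM_STEPS
# Linear decay from the base learning rate to zero (no warmup; an assumption).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / total_updates)
)
loss_fn = nn.CrossEntropyLoss()

updates = 0
optimizer.zero_grad()
for i in range(num_micro_batches):
    batch = slice(i * GPU_BATCH_SIZE, (i + 1) * GPU_BATCH_SIZE)
    loss = loss_fn(model(features[batch]), labels[batch])
    # Scale so the accumulated gradient matches one batch of 64.
    (loss / ACCUM_STEPS).backward()
    if (i + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        updates += 1

print(f"{updates} optimizer updates, effective batch size {GPU_BATCH_SIZE * ACCUM_STEPS}")
```

Dividing each micro-batch loss by the number of accumulation steps keeps the gradient scale equal to that of a single batch of 64, so the learning rate in the table does not need adjusting.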

Limitations & Risks

Performance may degrade under domain shift, e.g. non-news or social media text
Class imbalance can shift the best decision threshold; calibrate it on your own validation data
Multilingual performance can vary across languages and registers

How to cite

If you use this model, please cite this repository and the EuroBERT base model.