---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
- eurobert
- fine-tuned
- transformers
- pytorch
- sequence-classification
- binary-classification
- geopolitics
- multilingual
language:
- en
- de
- fr
- es
- it
---

# EuroBERT Geopolitical Classifier (Binary)

Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.

- **Task:** Sequence classification (binary)
- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
- **Intended use:** Detects whether an article reflects geopolitical tension (best performance on full article-level text)
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)

---

## Quick start

### Inference with `transformers`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"
# EuroBERT ships custom modeling code, so trust_remote_code is required
# (as on the EuroBERT base model card).
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalation.",
]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16} {confidence:6.2%} | {text}")
```

A higher-level `pipeline` alternative is sketched at the end of this card.

---

## Labels

```json
{
  "0": "non_geopolitical",
  "1": "geopolitical"
}
```

You may apply a decision threshold (e.g., `score >= 0.5`) depending on your precision/recall trade-off; a thresholding sketch appears at the end of this card.

---

## Training & Evaluation

- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (binary)
- **Data:** European news text labeled for geopolitical relevance
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler
- **Metrics (validation set):**

| Metric | Score |
|:----------|------:|
| Accuracy  | 0.95 |
| F1-score  | 0.95 |
| Precision | 0.93 |
| Recall    | 0.97 |

### Training setup

| Parameter | Value |
|------------|--------|
| Learning rate | 3e-5 |
| Effective batch size | 64 (= 16 × 4) |
| Actual GPU batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| Betas | (0.9, 0.95) |
| Epsilon | 1e-8 |
| Max epochs | 1 |

A `TrainingArguments` sketch matching this table appears at the end of this card.

---

## Limitations & Risks

- May be sensitive to domain shift (e.g., non-news or social-media text)
- Class imbalance can affect thresholding; calibrate on your validation data
- Multilingual performance can vary across languages and registers

---

## How to cite

If you use this model, please cite this repository and the EuroBERT base model.
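
---

## Pipeline inference example

As an alternative to the explicit tokenize/forward/softmax steps in the Quick start, the high-level `pipeline` API handles tokenization and probability computation in one call. A minimal sketch; `trust_remote_code=True` is assumed to be required here just as on the EuroBERT base model:

```python
from transformers import pipeline

# trust_remote_code mirrors the base model's requirement (custom EuroBERT code)
clf = pipeline(
    "text-classification",
    model="Durrani95/eurobert-geopolitical-binary",
    trust_remote_code=True,
)

print(clf("Military exercises near disputed waters raise fears of regional escalation."))
# Illustrative output shape: [{'label': 'geopolitical', 'score': 0.98}]
```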
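
---

## Thresholding example

As noted in the Labels section, you can trade precision against recall by thresholding the `geopolitical` probability instead of taking the argmax (argmax over two classes is equivalent to a 0.5 threshold). A minimal sketch; the threshold value 0.7 is an illustrative assumption, to be calibrated on your own validation data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Durrani95/eurobert-geopolitical-binary"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Hypothetical threshold: tune on validation data for your precision/recall target.
# 0.5 reproduces the default argmax decoding.
THRESHOLD = 0.7

text = "Border talks stall as both governments trade accusations."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

p_geo = float(probs[0, 1])  # probability of label id 1 = "geopolitical"
label = "geopolitical" if p_geo >= THRESHOLD else "non_geopolitical"
print(f"{label} (p_geopolitical={p_geo:.2%})")
```

Raising the threshold above 0.5 increases precision on the `geopolitical` class at the cost of recall; lowering it does the opposite.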
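
---

## Training configuration sketch

The training script is not published with this card; the following `TrainingArguments` sketch is assembled from the hyperparameter table above. Dataset loading and the `Trainer` data arguments are placeholders, not part of the released setup:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "EuroBERT/EuroBERT-210m",
    num_labels=2,
    id2label={0: "non_geopolitical", 1: "geopolitical"},
    label2id={"non_geopolitical": 0, "geopolitical": 1},
    trust_remote_code=True,  # EuroBERT ships custom modeling code
)

args = TrainingArguments(
    output_dir="eurobert-geopolitical-binary",
    learning_rate=3e-5,
    per_device_train_batch_size=16,  # actual GPU batch size
    gradient_accumulation_steps=4,   # 16 x 4 = effective batch size 64
    num_train_epochs=1,
    weight_decay=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",      # Trainer's default optimizer is AdamW
)

# Placeholder datasets: substitute your own tokenized train/validation splits.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=..., eval_dataset=...)
# trainer.train()
```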