---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
- eurobert
- fine-tuned
- transformers
- pytorch
- sequence-classification
- binary-classification
- geopolitics
- multilingual
language:
- en
- de
- fr
- es
- it
---
# EuroBERT Geopolitical Classifier (Binary)
Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** classification of geopolitical tension in European news text.
- **Task:** Sequence classification (binary)
- **Labels:** `non_geopolitical` (0), `geopolitical` (1)
- **Intended use:** Detects whether an article reflects geopolitical tension.
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)
---
## Quick start
### Inference with `transformers`
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Durrani95/eurobert-geopolitical-binary"
# EuroBERT checkpoints ship custom modeling code on the Hub, so
# trust_remote_code may be required depending on your transformers version.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

texts = [
    "Energy Sanctions Deepen Divide Between Western Bloc and Major Oil Exporters.",
    "Military Exercises Near Disputed Waters Raise Fears of Regional Escalations.",
]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>16} {confidence:6.2%} | {text}")
```
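### Inference with `pipeline`

Alternatively, the high-level `pipeline` API bundles tokenization and softmax into a single call. A minimal sketch (the sample headline is illustrative, and `trust_remote_code` is assumed to be needed as above):

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Durrani95/eurobert-geopolitical-binary",
    trust_remote_code=True,  # may be required; EuroBERT uses custom modeling code
    top_k=None,              # return scores for both labels, not just the argmax
)

# Illustrative input; the output is a list of {label, score} dicts per text
print(clf("Border troop buildup prompts an emergency summit of alliance members."))
```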
---
## Labels
```json
{
"0": "non_geopolitical",
"1": "geopolitical"
}
```
You may apply a decision threshold (e.g., `score >= 0.5`) depending on your precision/recall trade-off.
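A minimal sketch of thresholding, continuing from the quick-start snippet above (`probs`, `texts`, and `model` are reused; the 0.5 cutoff is only a starting point):

```python
THRESHOLD = 0.5  # tune on your own validation data

geo_scores = probs[:, 1]  # probability of label 1 ("geopolitical")
preds = (geo_scores >= THRESHOLD).long()

for text, score, pred in zip(texts, geo_scores, preds):
    label = model.config.id2label[int(pred)]
    print(f"{label:>16} {float(score):6.2%} | {text}")
```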
---
## Training & Evaluation
- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (binary)
- **Data:** European news text labeled for geopolitical relevance
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler
- **Metrics (validation set):**
| Metric | Score |
|:-------|------:|
| Accuracy | 0.95 |
| F1-score | 0.95 |
| Precision | 0.93 |
| Recall | 0.97 |
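
These metrics can be recomputed on any labeled split with standard tooling; a minimal sketch using `scikit-learn`, where `y_true` and `y_pred` are placeholders for your own gold labels and model predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]  # placeholder gold labels
y_pred = [1, 0, 1, 0, 0]  # placeholder model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"accuracy={accuracy_score(y_true, y_pred):.2f}  "
      f"precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```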
### Training setup
| Parameter | Value |
|------------|--------|
| Learning rate | 3e-5 |
| Effective batch size | 64 |
| Per-device batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| AdamW betas | (0.9, 0.95) |
| AdamW epsilon | 1e-8 |
| Max epochs | 1 |
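The configuration above maps roughly onto 🤗 `TrainingArguments` as follows. This is a sketch, not the published training script; the `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# 16 samples per device x 4 accumulation steps = effective batch size 64
training_args = TrainingArguments(
    output_dir="eurobert-geopolitical-binary",  # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    weight_decay=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    num_train_epochs=1,
    lr_scheduler_type="linear",
)
```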
---
## Limitations & Risks
- May be sensitive to domain shift (non-news, social media text)
- Class imbalance can affect thresholding; calibrate on your validation data
- Multilingual performance can vary across languages and registers
---
## How to cite
If you use this model, please cite this repository and the EuroBERT base model.