Durrani95 committed on
Commit
c3b1b05
·
verified ·
1 Parent(s): d9a5d91

Add fine-tuned EuroBERT for binary geopolitical classification

Files changed (3)
  1. .amlignore +6 -0
  2. .amlignore.amltmp +6 -0
  3. README.md +113 -0
.amlignore ADDED
@@ -0,0 +1,6 @@
1
+ ## This file was auto generated by the Azure Machine Learning Studio. Please do not remove.
2
+ ## Read more about the .amlignore file here: https://docs.microsoft.com/azure/machine-learning/how-to-save-write-experiment-files#storage-limits-of-experiment-snapshots
3
+
4
+ .ipynb_aml_checkpoints/
5
+ *.amltmp
6
+ *.amltemp
.amlignore.amltmp ADDED
@@ -0,0 +1,6 @@
1
+ ## This file was auto generated by the Azure Machine Learning Studio. Please do not remove.
2
+ ## Read more about the .amlignore file here: https://docs.microsoft.com/azure/machine-learning/how-to-save-write-experiment-files#storage-limits-of-experiment-snapshots
3
+
4
+ .ipynb_aml_checkpoints/
5
+ *.amltmp
6
+ *.amltemp
README.md ADDED
@@ -0,0 +1,113 @@
1
+ ---
2
+ pipeline_tag: text-classification
3
+ tags:
4
+ - eurobert
5
+ - transformers
6
+ - pytorch
7
+ - sequence-classification
8
+ - binary-classification
9
+ - geopolitics
10
+ - multilingual
11
+ language:
12
+ - en
13
+ - de
14
+ - fr
15
+ - es
16
+ - it
17
+ ---
18
+
19
+ # EuroBERT Geopolitical Classifier (Binary)
20
+
21
+ Fine-tuned `EuroBERT/EuroBERT-210m` for **binary** geopolitical detection in European news text.
22
+
23
+ - **Task:** Sequence classification (binary)
24
+ - **Labels:** `non_geopolitical` (0), `geopolitical` (1)
25
+ - **Intended use:** Rapid screening of texts to flag likely geopolitical content
26
+ - **Languages:** Primarily European languages (EN, DE, FR, ES, IT)
27
+ - **Framework:** 🤗 Transformers (PyTorch)
28
+
29
+ > Dataset and evaluation details have not been filled in yet; see the placeholders in the “Training & Evaluation” section below.
30
+
31
+ ---
32
+
33
+ ## Quick start
34
+
35
+ ### Inference with `transformers`
36
+
37
+ ```python
38
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
39
+ import torch
40
+
41
+ model_id = "<your_username>/eurobert-geopolitical-binary"
42
+
43
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
44
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
45
+
46
+ texts = [
47
+     "The EU imposed sanctions amid growing tensions with Russia.",
48
+     "New trade agreements are boosting European exports."
49
+ ]
50
+
51
+ inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
52
+
53
+ with torch.no_grad():
54
+     logits = model(**inputs).logits
55
+     probs = torch.softmax(logits, dim=1)
56
+
57
+ for text, p in zip(texts, probs):
58
+     label_id = int(p.argmax())
59
+     label = model.config.id2label[label_id]
60
+     confidence = float(p[label_id])
61
+     print(f"{label:>16} {confidence:6.2%} | {text}")
62
+ ```
63
+
64
+ ### Inference API (no local setup)
65
+
66
+ ```python
67
+ from huggingface_hub import InferenceClient
68
+
69
+ client = InferenceClient(model="<your_username>/eurobert-geopolitical-binary") # add token=... if private
70
+ res = client.text_classification("Parliament passed emergency measures amid escalating border tensions.")
71
+ print(res)  # a list of label/score entries, e.g. label 'geopolitical' with its score
72
+ ```
73
+
74
+ ```bash
75
+ curl https://api-inference.huggingface.co/models/<your_username>/eurobert-geopolitical-binary -H "Authorization: Bearer $HF_TOKEN" -X POST -d '{"inputs": "Talks broke down at the UN Security Council."}'
76
+ ```
77
+
78
+ ---
79
+
80
+ ## Labels
81
+
82
+ ```json
83
+ {
84
+   "0": "non_geopolitical",
85
+   "1": "geopolitical"
86
+ }
87
+ ```
88
+
89
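+ To trade precision against recall, apply a decision threshold to the `geopolitical` score (e.g., `score >= 0.5`, which matches the default argmax decision for two classes); a higher threshold favors precision, a lower one favors recall.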
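Reviewer sketch (not part of the commit): the thresholding note above could look roughly like the following. It uses only the standard library so it runs without the model; the helper names `softmax` and `classify` and the example logits are made up for illustration, and only the label mapping comes from the README.

```python
import math

# Label mapping as given in the README's "Labels" section.
ID2LABEL = {0: "non_geopolitical", 1: "geopolitical"}


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Flag a text as geopolitical when P(geopolitical) >= threshold.

    `logits` is a two-element sequence ordered as in ID2LABEL.
    Returns the chosen label and the geopolitical probability.
    """
    p_geo = softmax(logits)[1]
    label_id = 1 if p_geo >= threshold else 0
    return ID2LABEL[label_id], p_geo


# Hypothetical logits for one text (not real model output).
example_logits = [0.2, 1.1]
print(classify(example_logits, threshold=0.5))
print(classify(example_logits, threshold=0.8))
```

With real model output, the same rule would be applied to each row of `probs` from the quick-start snippet, reading column 1 as the `geopolitical` score.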
90
+
91
+ ---
92
+
93
+ ## Training & Evaluation
94
+
95
+ - **Base model:** `EuroBERT/EuroBERT-210m`
96
+ - **Objective:** Cross-entropy (binary)
97
+ - **Data:** European news text labeled for geopolitical relevance (add your details here)
98
+ - **Hardware & hyperparameters:** (fill in as appropriate: batch size, lr, epochs, max length, etc.)
99
+ - **Metrics:** (add accuracy/F1/precision/recall on your validation/test set)
100
+
101
+ ---
102
+
103
+ ## Limitations & Risks
104
+
105
+ - May be sensitive to domain shift (non-news, social media slang)
106
+ - Class imbalance can affect thresholding; calibrate on your validation data
107
+ - Multilingual performance can vary by language and register
108
+
109
+ ---
110
+
111
+ ## How to cite
112
+
113
+ If you use this model, please cite the repository and the EuroBERT base model. (Add your preferred citation here.)
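Reviewer sketch (not part of the commit): the README's advice to calibrate the threshold on validation data could be approached as below. This is one simple option, a grid search that maximizes F1 for the `geopolitical` class; the function names, the candidate grid, and the validation scores and labels are all hypothetical, and another criterion (for example, a minimum precision) may suit a given screening workflow better.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for the positive class when predicting 1 if score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    if candidates is None:
        # Assumed grid: 0.05, 0.10, ..., 0.95.
        candidates = [i / 100 for i in range(5, 100, 5)]
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Hypothetical validation set: geopolitical probabilities and gold labels.
val_scores = [0.10, 0.30, 0.35, 0.60, 0.80, 0.90]
val_labels = [0, 0, 1, 0, 1, 1]

threshold = best_threshold(val_scores, val_labels)
print(threshold, f1_at_threshold(val_scores, val_labels, threshold))
```

The threshold should be chosen on a held-out validation split and then reported together with the metrics on a separate test set, since tuning and evaluating on the same data overstates performance.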