---
language: en
license: apache-2.0
tags:
- token-classification
- ner
- energy
- geopolitics
- distilbert
pipeline_tag: token-classification
---

# Energy Intelligence NER

**Model ID:** `Quantbridge/energy-intelligence-multitask-ner`

A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for Named Entity Recognition in the **energy markets and geopolitics** domain. The model identifies nine entity types relevant to energy intelligence: companies, commodities, infrastructure, markets, events, and more.

---

## Entity Types

| Label | Description | Examples |
|---|---|---|
| `COMPANY` | Energy-sector companies | ExxonMobil, BP, Saudi Aramco |
| `COMMODITY` | Energy commodities and resources | crude oil, natural gas, LNG, coal |
| `COUNTRY` | Nation states | United States, Russia, Saudi Arabia |
| `LOCATION` | Geographic locations and regions | Persian Gulf, North Sea, Permian Basin |
| `INFRASTRUCTURE` | Physical energy infrastructure | pipelines, refineries, LNG terminals |
| `MARKET` | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
| `EVENT` | Market and geopolitical events | sanctions, OPEC+ cut, supply disruption |
| `ORGANIZATION` | Non-company organizations and bodies | OPEC, IEA, G7, US Energy Department |
| `PERSON` | Named individuals | ministers, executives, analysts |

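Under the BIO tagging scheme used for training, each entity type contributes a `B-`/`I-` label pair, plus the single outside tag `O`. A minimal sketch (label names assumed from the table above, not read from the model's config):

```python
# The nine entity types from the table above; under BIO tagging each
# contributes a B-/I- pair, plus the outside tag "O", giving 19 labels.
# Names are assumed from the table, not read from the model config.
ENTITY_TYPES = [
    "COMPANY", "COMMODITY", "COUNTRY", "LOCATION", "INFRASTRUCTURE",
    "MARKET", "EVENT", "ORGANIZATION", "PERSON",
]
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
print(len(LABELS))  # 19
```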
---

## Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Quantbridge/energy-intelligence-multitask-ner",
    aggregation_strategy="simple",
)

text = (
    "Saudi Aramco announced a production cut of 1 million barrels per day "
    "amid falling crude oil prices at the Brent benchmark market."
)

results = ner(text)
for entity in results:
    print(f"{entity['word']:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
```

**Example output:**
```
Saudi Aramco                   COMPANY              score=0.981
crude oil                      COMMODITY            score=0.974
Brent                          MARKET               score=0.968
```

### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/energy-intelligence-multitask-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O":
        print(f"{token:<20} {label}")
```

---

## Model Details

| Property | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | DistilBERT + token classification head |
| Parameters | ~67M |
| Max sequence length | 256 tokens |
| Training precision | FP16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Epochs | 5 |

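These hyperparameters map onto a `transformers` `TrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script; `output_dir` is illustrative:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed in the table above.
# output_dir is illustrative; fp16=True assumes a CUDA-capable GPU.
training_args = TrainingArguments(
    output_dir="energy-ner",  # illustrative path
    learning_rate=2e-5,
    warmup_ratio=0.1,         # 10% warmup
    weight_decay=0.01,
    num_train_epochs=5,
    fp16=True,                # FP16 training precision
)
```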

---

## Training Data

The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.
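
As an illustration of the BIO format, each token carries either an entity tag or `O`. The sentence and tags below are invented for demonstration, not drawn from the corpus:

```python
# Invented example of a BIO-tagged sentence in this domain;
# not taken from the actual training corpus.
tokens = ["Saudi", "Aramco", "cut", "output", "as", "Brent", "fell", "."]
tags   = ["B-COMPANY", "I-COMPANY", "O", "O", "O", "B-MARKET", "O", "O"]

for token, tag in zip(tokens, tags):
    print(f"{token:<8} {tag}")
```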

**Dataset split:**

| Split | Records |
|---|---|
| Train | ~9,200 |
| Validation | ~1,150 |
| Test | ~1,150 |

---

## Evaluation

Evaluated on the held-out test set using [seqeval](https://github.com/chakki-works/seqeval) (entity-level span matching).
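
Entity-level matching means a prediction counts only if both the span boundaries and the entity type match exactly. A minimal pure-Python sketch of that computation (illustrative only; the reported numbers come from seqeval itself):

```python
def bio_spans(tags):
    """Extract (type, start, end) entity spans from a BIO tag sequence.
    Simplified: only B- opens a span; an I- of another type closes it."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continuing = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continuing:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def span_f1(true_seqs, pred_seqs):
    """Micro-averaged entity-level F1: a span counts only on an exact match."""
    tp = fp = fn = 0
    for t, p in zip(true_seqs, pred_seqs):
        true_spans, pred_spans = set(bio_spans(t)), set(bio_spans(p))
        tp += len(true_spans & pred_spans)
        fp += len(pred_spans - true_spans)
        fn += len(true_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY", "I-COMMODITY", "O"]]
y_pred = [["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY", "O", "O"]]
print(span_f1(y_true, y_pred))  # 0.5: the truncated COMMODITY span does not count
```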

| Metric | Score |
|---|---|
| Overall F1 | *reported after training* |
| Overall Precision | *reported after training* |
| Overall Recall | *reported after training* |

Per-entity F1 scores are available in `label_map.json` in the model repository.

---

## Limitations

- Trained exclusively on English text.
- Best suited to formal, news-style writing about energy markets and geopolitics.
- Performance may degrade on highly technical engineering documents or non-standard text formats.
- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{quantbridge-energy-ner-2025,
  title  = {Energy Intelligence NER},
  author = {Quantbridge},
  year   = {2025},
  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
}
```

---

## License

Apache 2.0. See the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).