Trisham97 committed on
Commit
92d52bb
·
verified ·
1 Parent(s): 0e0d92a

Add model card

Files changed (1)
  1. README.md +227 -3
README.md CHANGED
@@ -1,3 +1,227 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - token-classification
+ - ner
+ - finance
+ - energy
+ - geopolitics
+ - distilbert
+ - multitask
+ pipeline_tag: token-classification
+ ---
+
+ # DistilBERT Energy Intelligence Multitask NER — v2
+
+ **Model ID:** `Quantbridge/distilbert-energy-intelligence-multitask-v2`
+
+ A [DistilBERT](https://huggingface.co/distilbert-base-uncased) model fine-tuned for Named Entity Recognition across **energy markets, financial instruments, geopolitics, corporate events, and technology**. It is a broad-coverage multitask NER model designed for intelligence extraction from financial news and market commentary.
+
+ The model recognises **59 entity types** (119 BIO labels: a `B-` and an `I-` label for each type, plus `O`) spanning multiple intelligence domains.
+
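The label count follows from the BIO scheme: each entity type contributes one `B-` and one `I-` label, and a single `O` label marks non-entity tokens, so 59 types give 2 × 59 + 1 = 119 labels. A minimal sketch of that arithmetic, using three type names from the taxonomy; the helper `build_bio_labels` is hypothetical, and the label order in the model's actual `id2label` mapping may differ:

```python
def build_bio_labels(entity_types):
    """Return a BIO label list: "O" first, then a B-/I- pair per entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


# Three types from the taxonomy, for illustration only.
sample_labels = build_bio_labels(["COMMODITY", "CENTRAL_BANK", "SANCTION"])
print(sample_labels)

# The same arithmetic for the full taxonomy: 59 types -> 119 labels.
num_types = 59
print(2 * num_types + 1)
```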
+ ---
+
+ ## Entity Taxonomy
+
+ ### Financial Instruments & Markets
+ | Label | Description |
+ |---|---|
+ | `EQUITY` | Stocks and equity instruments |
+ | `DERIVATIVE` | Futures, options, swaps |
+ | `CURRENCY` | FX pairs and currencies |
+ | `FIXED_INCOME` | Bonds, treasuries, notes |
+ | `ASSET_CLASS` | Broad asset class references |
+ | `INDEX` | Market indices (S&P 500, FTSE, etc.) |
+ | `COMMODITY` | Physical commodities (oil, gas, metals) |
+ | `TRADING_HUB` | Price benchmarks and trading hubs |
+
+ ### Financial Institutions
+ | Label | Description |
+ |---|---|
+ | `FINANCIAL_INSTITUTION` | Banks, brokerages, investment firms |
+ | `CENTRAL_BANK` | Central banks (Fed, ECB, BoE) |
+ | `HEDGE_FUND` | Hedge funds and asset managers |
+ | `RATING_AGENCY` | Credit rating agencies |
+ | `EXCHANGE` | Stock and commodity exchanges |
+
+ ### Macro & Policy
+ | Label | Description |
+ |---|---|
+ | `MACRO_INDICATOR` | GDP, inflation, unemployment figures |
+ | `MONETARY_POLICY` | Interest rate decisions, QE programmes |
+ | `FISCAL_POLICY` | Government spending, tax policy |
+ | `TRADE_POLICY` | Tariffs, trade agreements, WTO actions |
+ | `ECONOMIC_BLOC` | G7, G20, EU, ASEAN, etc. |
+
+ ### Energy Domain
+ | Label | Description |
+ |---|---|
+ | `ENERGY_COMPANY` | Oil majors, utilities, renewable firms |
+ | `ENERGY_SOURCE` | Oil, gas, coal, solar, nuclear, etc. |
+ | `PIPELINE` | Energy pipelines and transmission lines |
+ | `REFINERY` | Oil refineries and processing plants |
+ | `ENERGY_POLICY` | OPEC decisions, energy legislation |
+ | `ENERGY_TRANSITION` | Decarbonisation, net-zero, EV, hydrogen |
+ | `GRID` | Power grids and electricity networks |
+
+ ### Geopolitical
+ | Label | Description |
+ |---|---|
+ | `GEOPOLITICAL_EVENT` | Summits, elections, geopolitical shifts |
+ | `SANCTION` | Economic sanctions and embargoes |
+ | `TREATY` | International agreements and accords |
+ | `CONFLICT_ZONE` | Active or historic conflict regions |
+ | `DIPLOMATIC_ACTION` | Diplomatic moves, expulsions, negotiations |
+ | `COUNTRY` | Nation states |
+ | `REGION` | Geographic regions (Middle East, EU, etc.) |
+ | `CITY` | Cities and urban locations |
+
+ ### Corporate Events
+ | Label | Description |
+ |---|---|
+ | `COMPANY` | General companies |
+ | `M_AND_A` | Mergers and acquisitions |
+ | `IPO` | Initial public offerings |
+ | `EARNINGS_EVENT` | Quarterly earnings, revenue reports |
+ | `EXECUTIVE` | Named C-suite executives |
+ | `CORPORATE_ACTION` | Dividends, buybacks, restructuring |
+
+ ### Infrastructure & Supply Chain
+ | Label | Description |
+ |---|---|
+ | `INFRA` | Physical infrastructure (general) |
+ | `SUPPLY_CHAIN` | Supply chain disruptions and logistics |
+ | `SHIPPING_VESSEL` | Named ships and tankers |
+ | `PORT` | Ports and maritime hubs |
+
+ ### Risk & Events
+ | Label | Description |
+ |---|---|
+ | `EVENT` | General newsworthy events |
+ | `RISK_FACTOR` | Risk factors and vulnerabilities |
+ | `NATURAL_DISASTER` | Hurricanes, earthquakes, floods |
+ | `CYBER_EVENT` | Cyber attacks and digital incidents |
+ | `DISRUPTION` | Supply or market disruptions |
+
+ ### Technology
+ | Label | Description |
+ |---|---|
+ | `TECH_COMPANY` | Technology companies |
+ | `AI_MODEL` | AI systems and models |
+ | `SEMICONDUCTOR` | Chips and semiconductor companies |
+ | `TECH_REGULATION` | Technology regulation and policy |
+
+ ### People & Organizations
+ | Label | Description |
+ |---|---|
+ | `PERSON` | Named individuals |
+ | `THINK_TANK` | Policy research organizations |
+ | `NEWS_SOURCE` | Media and news outlets |
+ | `REGULATORY_BODY` | Government regulators (SEC, FCA, etc.) |
+ | `ORG` | General organizations |
+
+ ---
+
+ ## Usage
+
+ ```python
+ from transformers import pipeline
+
+ ner = pipeline(
+     "token-classification",
+     model="Quantbridge/distilbert-energy-intelligence-multitask-v2",
+     # "simple" merges word pieces and B-/I- tags into whole entity spans
+     aggregation_strategy="simple",
+ )
+
+ text = (
+     "The Federal Reserve held interest rates steady as Brent crude fell below $75 "
+     "following OPEC+ production cuts and renewed sanctions on Russian energy exports."
+ )
+
+ results = ner(text)
+ # Each result is a dict with the span text, its entity type, and a confidence score
+ for entity in results:
+     print(f"{entity['word']:<35} {entity['entity_group']:<25} {entity['score']:.3f}")
+ ```
+
+ **Example output:**
+ ```
+ Federal Reserve                     CENTRAL_BANK              0.961
+ Brent                               TRADING_HUB               0.954
+ OPEC+                               REGULATORY_BODY           0.947
+ Russian energy exports              SANCTION                  0.932
+ ```
+
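Downstream systems usually want the pipeline output filtered and grouped rather than printed. The sketch below assumes the list-of-dicts shape returned with `aggregation_strategy="simple"` (keys `word`, `entity_group`, `score`); the helper `group_entities`, the threshold value, and the hand-written `sample_results` (which mirrors the example output above instead of calling the model) are all illustrative assumptions:

```python
from collections import defaultdict


def group_entities(results, min_score=0.9):
    """Group pipeline results by entity type, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for `ner(text)`, mirroring the example output.
sample_results = [
    {"word": "Federal Reserve", "entity_group": "CENTRAL_BANK", "score": 0.961},
    {"word": "Brent", "entity_group": "TRADING_HUB", "score": 0.954},
    {"word": "OPEC+", "entity_group": "REGULATORY_BODY", "score": 0.947},
    {"word": "Russian energy exports", "entity_group": "SANCTION", "score": 0.932},
]

print(group_entities(sample_results, min_score=0.95))
# {'CENTRAL_BANK': ['Federal Reserve'], 'TRADING_HUB': ['Brent']}
```

A suitable threshold depends on the application; 0.95 here is only a placeholder.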
+ ### Load model directly
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+ import torch
+
+ model_name = "Quantbridge/distilbert-energy-intelligence-multitask-v2"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+ model.eval()  # inference mode: disables dropout
+
+ text = "Goldman Sachs cut its oil price forecast after OPEC+ agreed to extend output cuts."
+ inputs = tokenizer(text, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Highest-scoring label id for each token of the single input sequence
+ predicted_ids = outputs.logits.argmax(dim=-1)[0]
+ tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
+
+ for token, label_id in zip(tokens, predicted_ids):
+     label = model.config.id2label[label_id.item()]
+     # Skip non-entity tokens and special tokens such as [CLS] and [SEP]
+     if label != "O" and token not in tokenizer.all_special_tokens:
+         # Drop the WordPiece continuation marker without stripping a literal "#"
+         piece = token[2:] if token.startswith("##") else token
+         print(f"{piece:<25} {label}")
+ ```
+
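The loop above prints one line per WordPiece token, so a multi-token entity shows up as several rows. One way to recover whole spans without the pipeline is to merge continuation pieces and consecutive `B-`/`I-` tags by hand. This is a simplified sketch, not the aggregation logic used by `transformers`; the helper `merge_bio` and the sample tokens and labels are hypothetical:

```python
def merge_bio(tokens, labels):
    """Merge WordPiece tokens and BIO labels into (text, entity_type) spans."""
    spans = []
    current_words, current_type = [], None

    def flush():
        nonlocal current_words, current_type
        if current_words:
            spans.append((" ".join(current_words), current_type))
        current_words, current_type = [], None

    for token, label in zip(tokens, labels):
        if token.startswith("##"):
            # Continuation piece: glue it onto the previous word of an open span.
            if current_words:
                current_words[-1] += token[2:]
            continue
        if label == "O":
            flush()
            continue
        prefix, entity_type = label.split("-", 1)
        # A B- tag or a change of type starts a new span.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_words.append(token)
    flush()
    return spans


# Hand-written example; real tokenisation and predictions may differ.
tokens = ["goldman", "sachs", "cut", "its", "oil", "price", "op", "##ec", "+"]
labels = ["B-FINANCIAL_INSTITUTION", "I-FINANCIAL_INSTITUTION", "O", "O",
          "B-COMMODITY", "O", "B-ORG", "I-ORG", "I-ORG"]
print(merge_bio(tokens, labels))
# [('goldman sachs', 'FINANCIAL_INSTITUTION'), ('oil', 'COMMODITY'), ('opec +', 'ORG')]
```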
+ ---
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | Base architecture | `distilbert-base-uncased` |
+ | Architecture type | DistilBertForTokenClassification |
+ | Entity types | 59 types (119 BIO labels) |
+ | Hidden dimension | 768 |
+ | Attention heads | 12 |
+ | Layers | 6 |
+ | Vocabulary size | 30,522 |
+ | Max sequence length | 512 tokens |
+
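Because the maximum sequence length is 512 tokens, longer documents such as filings or transcripts need to be split before inference. A minimal sketch of overlapping windows over a list of token ids, assuming two special tokens (`[CLS]`, `[SEP]`) are added to each window; the helper `window_token_ids` and the default `stride` are illustrative choices, not part of the model:

```python
def window_token_ids(token_ids, max_len=512, stride=64, num_special=2):
    """Split a long list of token ids into overlapping windows.

    Each window leaves room for `num_special` special tokens, and consecutive
    windows share `stride` tokens so that an entity shorter than the overlap
    is seen whole in at least one window.
    """
    body_len = max_len - num_special
    if stride >= body_len:
        raise ValueError("stride must be smaller than the window body")
    step = body_len - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + body_len])
        if start + body_len >= len(token_ids):
            break  # the last window already reaches the end of the input
    return windows


# 1,200 dummy ids standing in for a tokenised document.
windows = window_token_ids(list(range(1200)))
print([len(w) for w in windows])
```

Hugging Face tokenizers can produce similar windows directly via `truncation=True`, `max_length`, `stride`, and `return_overflowing_tokens=True`; predictions from overlapping regions then need to be de-duplicated.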
+ ---
+
+ ## Intended Use
+
+ This model is designed for **financial and energy intelligence extraction**: automated NER over news feeds, earnings transcripts, regulatory filings, and geopolitical reports. It is a base model suitable for:
+
+ - Structured data extraction from unstructured financial news
+ - Entity linking and knowledge graph population
+ - Signal detection for trading and risk systems
+ - Geopolitical risk monitoring
+
+ ### Out-of-scope use
+
+ - General-purpose NER on non-financial text
+ - Languages other than English
+ - Documents with heavy technical jargon outside the financial/energy domain
+
+ ---
+
+ ## Limitations
+
+ - English-only
+ - Optimised for formal, news-style writing; may underperform on social media or other informal text
+ - The 59-type taxonomy contains overlapping categories, so ambiguous entities may be labelled inconsistently (e.g. an energy firm tagged `COMPANY` in one mention and `ENERGY_COMPANY` in another)
+ - The BIO scheme assigns one label per token, so nested entities are not supported
+
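One possible way to handle the overlapping categories in post-processing is to prefer a more specific label whenever the same mention has also received a general one. The sketch below is only an illustration: the mapping `SPECIFIC_OVER_GENERAL` and the helper `resolve_labels` are hypothetical and not shipped with the model, and the right precedence rules depend on the application:

```python
# Hypothetical precedence rules: a general label is dropped when any of the
# listed specific labels was also predicted for the same mention.
SPECIFIC_OVER_GENERAL = {
    "COMPANY": {"ENERGY_COMPANY", "TECH_COMPANY", "FINANCIAL_INSTITUTION"},
    "ORG": {"THINK_TANK", "REGULATORY_BODY", "NEWS_SOURCE"},
}


def resolve_labels(labels_by_mention):
    """For each mention, keep specific labels over the general ones they refine."""
    resolved = {}
    for mention, labels in labels_by_mention.items():
        kept = set(labels)
        for general, specific in SPECIFIC_OVER_GENERAL.items():
            if general in kept and kept & specific:
                kept.discard(general)
        resolved[mention] = sorted(kept)
    return resolved


print(resolve_labels({"Shell": ["COMPANY", "ENERGY_COMPANY"], "Reuters": ["ORG"]}))
# {'Shell': ['ENERGY_COMPANY'], 'Reuters': ['ORG']}
```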
+ ---
+
+ ## License
+
+ Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).