---
language: en
license: apache-2.0
tags:
- token-classification
- ner
- finance
- energy
- geopolitics
- distilbert
- multitask
pipeline_tag: token-classification
---

# DistilBERT Energy Intelligence Multitask NER v2

**Model ID:** `Quantbridge/distilbert-energy-intelligence-multitask-v2`

A [DistilBERT](https://huggingface.co/distilbert-base-uncased) model fine-tuned for domain-specific Named Entity Recognition (NER) across **energy markets, financial instruments, geopolitics, corporate events, and technology**. It is a broad-coverage multitask NER model designed for intelligence extraction from financial news and market commentary.

The model recognises **59 entity types**, encoded as 119 BIO labels: a `B-` and an `I-` tag for each type, plus `O`.

---

## Entity Taxonomy
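
The relationship between the 59 entity types and the 119 BIO labels can be checked against the model configuration. The sketch below is a minimal illustration, not part of the model's tooling: `entity_types` is a hypothetical helper that strips the `B-`/`I-` prefixes from an `id2label` mapping such as `model.config.id2label`, and the toy mapping stands in for the real one so the snippet runs without downloading the model.

```python
def entity_types(id2label):
    """Return the sorted entity types in a BIO label map (hypothetical helper)."""
    types = set()
    for label in id2label.values():
        # "B-COUNTRY" and "I-COUNTRY" both reduce to "COUNTRY"; "O" is skipped.
        if label.startswith(("B-", "I-")):
            types.add(label[2:])
    return sorted(types)


# Toy stand-in for model.config.id2label (the real map has 119 entries).
toy_id2label = {0: "O", 1: "B-COUNTRY", 2: "I-COUNTRY", 3: "B-PORT", 4: "I-PORT"}

types = entity_types(toy_id2label)
print(types)  # ['COUNTRY', 'PORT']
# One B- and one I- tag per type, plus the single O tag.
print(len(toy_id2label) == 2 * len(types) + 1)  # True
```

With the real model loaded, `entity_types(model.config.id2label)` should return 59 names if the label map follows this scheme.
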

### Financial Instruments & Markets
| Label | Description |
|---|---|
| `EQUITY` | Stocks and equity instruments |
| `DERIVATIVE` | Futures, options, swaps |
| `CURRENCY` | FX pairs and currencies |
| `FIXED_INCOME` | Bonds, treasuries, notes |
| `ASSET_CLASS` | Broad asset class references |
| `INDEX` | Market indices (S&P 500, FTSE, etc.) |
| `COMMODITY` | Physical commodities (oil, gas, metals) |
| `TRADING_HUB` | Price benchmarks and trading hubs |

### Financial Institutions
| Label | Description |
|---|---|
| `FINANCIAL_INSTITUTION` | Banks, brokerages, investment firms |
| `CENTRAL_BANK` | Central banks (Fed, ECB, BoE) |
| `HEDGE_FUND` | Hedge funds and asset managers |
| `RATING_AGENCY` | Credit rating agencies |
| `EXCHANGE` | Stock and commodity exchanges |

### Macro & Policy
| Label | Description |
|---|---|
| `MACRO_INDICATOR` | GDP, inflation, unemployment figures |
| `MONETARY_POLICY` | Interest rate decisions, QE programmes |
| `FISCAL_POLICY` | Government spending, tax policy |
| `TRADE_POLICY` | Tariffs, trade agreements, WTO actions |
| `ECONOMIC_BLOC` | G7, G20, EU, ASEAN, etc. |

### Energy Domain
| Label | Description |
|---|---|
| `ENERGY_COMPANY` | Oil majors, utilities, renewable firms |
| `ENERGY_SOURCE` | Oil, gas, coal, solar, nuclear, etc. |
| `PIPELINE` | Energy pipelines and transmission lines |
| `REFINERY` | Oil refineries and processing plants |
| `ENERGY_POLICY` | OPEC decisions, energy legislation |
| `ENERGY_TRANSITION` | Decarbonisation, net-zero, EV, hydrogen |
| `GRID` | Power grids and electricity networks |

### Geopolitical
| Label | Description |
|---|---|
| `GEOPOLITICAL_EVENT` | Summits, elections, geopolitical shifts |
| `SANCTION` | Economic sanctions and embargoes |
| `TREATY` | International agreements and accords |
| `CONFLICT_ZONE` | Active or historic conflict regions |
| `DIPLOMATIC_ACTION` | Diplomatic moves, expulsions, negotiations |
| `COUNTRY` | Nation states |
| `REGION` | Geographic regions (Middle East, EU, etc.) |
| `CITY` | Cities and urban locations |

### Corporate Events
| Label | Description |
|---|---|
| `COMPANY` | General companies |
| `M_AND_A` | Mergers and acquisitions |
| `IPO` | Initial public offerings |
| `EARNINGS_EVENT` | Quarterly earnings, revenue reports |
| `EXECUTIVE` | Named C-suite executives |
| `CORPORATE_ACTION` | Dividends, buybacks, restructuring |

### Infrastructure & Supply Chain
| Label | Description |
|---|---|
| `INFRA` | Physical infrastructure (general) |
| `SUPPLY_CHAIN` | Supply chain disruptions and logistics |
| `SHIPPING_VESSEL` | Named ships and tankers |
| `PORT` | Ports and maritime hubs |

### Risk & Events
| Label | Description |
|---|---|
| `EVENT` | General newsworthy events |
| `RISK_FACTOR` | Risk factors and vulnerabilities |
| `NATURAL_DISASTER` | Hurricanes, earthquakes, floods |
| `CYBER_EVENT` | Cyber attacks and digital incidents |
| `DISRUPTION` | Supply or market disruptions |

### Technology
| Label | Description |
|---|---|
| `TECH_COMPANY` | Technology companies |
| `AI_MODEL` | AI systems and models |
| `SEMICONDUCTOR` | Chips and semiconductor companies |
| `TECH_REGULATION` | Technology regulation and policy |

### People & Organizations
| Label | Description |
|---|---|
| `PERSON` | Named individuals |
| `THINK_TANK` | Policy research organizations |
| `NEWS_SOURCE` | Media and news outlets |
| `REGULATORY_BODY` | Government regulators (SEC, FCA, etc.) |
| `ORG` | General organizations |

---
## Usage

```python
from transformers import pipeline

# "simple" aggregation groups adjacent tokens that share an entity type
# into a single span.
ner = pipeline(
    "token-classification",
    model="Quantbridge/distilbert-energy-intelligence-multitask-v2",
    aggregation_strategy="simple",
)

text = (
    "The Federal Reserve held interest rates steady as Brent crude fell below $75 "
    "following OPEC+ production cuts and renewed sanctions on Russian energy exports."
)

# Print one row per entity: surface text, predicted type, confidence score.
results = ner(text)
for entity in results:
    print(f"{entity['word']:<35} {entity['entity_group']:<25} {entity['score']:.3f}")
```

**Example output:**
```
Federal Reserve                     CENTRAL_BANK              0.961
Brent                               TRADING_HUB               0.954
OPEC+                               REGULATORY_BODY           0.947
Russian energy exports              SANCTION                  0.932
```
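
Downstream code usually wants the entities bucketed by type rather than printed row by row. The following is a minimal sketch of that step, assuming records shaped like the pipeline output shown above; `group_by_type` and the `min_score` cut-off are hypothetical and would need tuning on real data.

```python
from collections import defaultdict


def group_by_type(entities, min_score=0.5):
    """Map each entity type to the entity strings scoring at least min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written records shaped like the pipeline output shown above.
sample = [
    {"word": "Federal Reserve", "entity_group": "CENTRAL_BANK", "score": 0.961},
    {"word": "Brent", "entity_group": "TRADING_HUB", "score": 0.954},
    {"word": "OPEC+", "entity_group": "REGULATORY_BODY", "score": 0.947},
    {"word": "Russian energy exports", "entity_group": "SANCTION", "score": 0.932},
]

print(group_by_type(sample, min_score=0.95))
# {'CENTRAL_BANK': ['Federal Reserve'], 'TRADING_HUB': ['Brent']}
```
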

### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/distilbert-energy-intelligence-multitask-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "Goldman Sachs cut its oil price forecast after OPEC+ agreed to extend output cuts."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Highest-scoring label per word piece for the single sequence in the batch.
predicted_ids = outputs.logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    # Skip non-entity tokens and special tokens such as [CLS] and [SEP].
    if label != "O" and token not in tokenizer.all_special_tokens:
        # removeprefix (Python 3.9+) drops only the WordPiece "##" marker.
        print(f"{token.removeprefix('##'):<25} {label}")
```
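
The loop above prints one row per word piece, so a name split into several pieces appears on several rows. The sketch below shows one way to merge pieces and BIO tags into whole entity spans. `merge_bio` is a hypothetical helper, the tokens and labels are hand-written rather than real model predictions, and the rule that a word takes the label of its first piece is an assumed convention. The pipeline's `aggregation_strategy` option covers the same need without custom code.

```python
def merge_bio(tokens, labels):
    """Merge WordPiece tokens and BIO labels into (text, type) spans."""
    spans = []
    current_words, current_type = [], None

    def flush():
        # Close the entity being built, if any, and reset the state.
        nonlocal current_words, current_type
        if current_words:
            spans.append((" ".join(current_words), current_type))
        current_words, current_type = [], None

    for token, label in zip(tokens, labels):
        if token.startswith("##"):
            # A word takes the label of its first piece; glue continuations on.
            if current_words:
                current_words[-1] += token[2:]
            continue
        if label.startswith("I-") and label[2:] == current_type:
            current_words.append(token)
        elif label.startswith(("B-", "I-")):
            # New entity (a stray I- tag is treated as a start).
            flush()
            current_words, current_type = [token], label[2:]
        else:
            flush()
    flush()
    return spans


# Hand-written example input, not actual model output.
tokens = ["[CLS]", "gold", "##man", "sachs", "cut", "oil", "forecast", "[SEP]"]
labels = ["O", "B-FINANCIAL_INSTITUTION", "I-FINANCIAL_INSTITUTION",
          "I-FINANCIAL_INSTITUTION", "O", "B-COMMODITY", "O", "O"]

print(merge_bio(tokens, labels))
# [('goldman sachs', 'FINANCIAL_INSTITUTION'), ('oil', 'COMMODITY')]
```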

---

## Model Details

| Property | Value |
|---|---|
| Base architecture | `distilbert-base-uncased` |
| Architecture type | `DistilBertForTokenClassification` |
| Entity types | 59 types (119 BIO labels) |
| Hidden dimension | 768 |
| Attention heads | 12 |
| Layers | 6 |
| Vocabulary size | 30,522 |
| Max sequence length | 512 tokens |
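
### Inputs longer than 512 tokens

The 512-token limit in the table above means longer inputs, such as earnings transcripts or filings, have to be split before inference. A minimal sketch of one common approach, overlapping windows, follows. `sliding_windows`, the 510-token window (512 minus the `[CLS]` and `[SEP]` tokens) and the 64-token overlap are illustrative assumptions, not settings documented for this model. Hugging Face tokenizers can produce similar windows through the `stride` and `return_overflowing_tokens` arguments.

```python
def sliding_windows(token_ids, max_len=510, overlap=64):
    """Split a list of token ids into overlapping windows of at most max_len."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + max_len])
        # Stop once a window has reached the end of the input.
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


# 1,000 dummy ids stand in for a tokenised document.
chunks = sliding_windows(list(range(1000)))
print([len(c) for c in chunks])  # [510, 510, 108]
```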

---

## Intended Use

This model is designed for **financial and energy intelligence extraction**: automated NER over news feeds, earnings transcripts, regulatory filings, and geopolitical reports. It is suitable as a building block for:

- Structured data extraction from unstructured financial news
- Entity linking and knowledge graph population
- Signal detection for trading and risk systems
- Geopolitical risk monitoring
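
### Example: co-occurrence edges for a knowledge graph

As a rough illustration of the knowledge graph use case listed above, extracted entities can be turned into weighted edges by counting how often two of them appear in the same document. This is only a sketch under stated assumptions: `cooccurrence_edges` is a hypothetical helper, the input records are hand-written in the shape of the pipeline output, and real systems would normally add entity linking before building edges.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs_entities):
    """Count how often two entities are mentioned in the same document."""
    edges = Counter()
    for entities in docs_entities:
        # Deduplicate within a document and sort so each pair has one key.
        nodes = sorted({(e["word"], e["entity_group"]) for e in entities})
        for a, b in combinations(nodes, 2):
            edges[(a, b)] += 1
    return edges


# Hand-written entity lists for two documents (not real model output).
docs = [
    [
        {"word": "Federal Reserve", "entity_group": "CENTRAL_BANK"},
        {"word": "Brent", "entity_group": "TRADING_HUB"},
    ],
    [
        {"word": "Brent", "entity_group": "TRADING_HUB"},
        {"word": "Federal Reserve", "entity_group": "CENTRAL_BANK"},
        {"word": "OPEC+", "entity_group": "REGULATORY_BODY"},
    ],
]

edges = cooccurrence_edges(docs)
pair = (("Brent", "TRADING_HUB"), ("Federal Reserve", "CENTRAL_BANK"))
print(edges[pair])  # 2
```
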

### Out-of-scope use

- General-purpose NER on non-financial text
- Languages other than English
- Documents with heavy technical jargon outside the financial/energy domain

---

## Limitations

- English-only
- Optimised for formal, news-style writing; may underperform on social media or other informal text
- The taxonomy contains overlapping types, so an ambiguous entity may receive either label (e.g. an energy company may be tagged `COMPANY` or `ENERGY_COMPANY`)
- The BIO scheme assigns one label per token, so nested entities cannot be represented
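
### Collapsing overlapping types (sketch)

One way to soften the label ambiguity noted above is to map closely related types to a shared parent before downstream use, so that either prediction counts as the same coarse type. The sketch below assumes records shaped like the pipeline output. `PARENT_TYPE` and `coarsen` are hypothetical, and the groupings are illustrative choices that the model itself does not define.

```python
# Hypothetical parent map: the groupings are illustrative, not part of the model.
PARENT_TYPE = {
    "ENERGY_COMPANY": "COMPANY",
    "TECH_COMPANY": "COMPANY",
    "CENTRAL_BANK": "FINANCIAL_INSTITUTION",
    "HEDGE_FUND": "FINANCIAL_INSTITUTION",
}


def coarsen(entities, parent_type=PARENT_TYPE):
    """Return copies of the entities with specific types mapped to parents."""
    return [
        {**ent, "entity_group": parent_type.get(ent["entity_group"], ent["entity_group"])}
        for ent in entities
    ]


# The same company tagged two different ways (hand-written example).
mentions = [
    {"word": "Shell", "entity_group": "ENERGY_COMPANY", "score": 0.90},
    {"word": "Shell", "entity_group": "COMPANY", "score": 0.80},
]

print({m["entity_group"] for m in coarsen(mentions)})  # {'COMPANY'}
```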

---

## License

Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).