---
language: en
license: apache-2.0
tags:
  - token-classification
  - ner
  - finance
  - energy
  - geopolitics
  - distilbert
  - multitask
pipeline_tag: token-classification
---

DistilBERT Energy Intelligence Multitask NER — v2

Model ID: Quantbridge/distilbert-energy-intelligence-multitask-v2

A DistilBERT model fine-tuned for Named Entity Recognition (NER) across energy markets, financial instruments, geopolitics, corporate events, and technology. It is a broad-coverage multitask NER model designed for intelligence extraction from financial news and market commentary.

The model recognises 59 entity types, encoded as 119 BIO labels: a `B-` and an `I-` tag for each type plus the single `O` (outside) tag. The types span multiple intelligence domains.
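As a quick check of the label arithmetic, the sketch below builds a BIO label list from a list of entity types. The function name `build_bio_labels` and the label ordering are assumptions for illustration; the authoritative mapping is whatever `model.config.id2label` contains.

```python
def build_bio_labels(entity_types):
    """Return a BIO label list: "O" first, then a B-/I- pair per type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


# Three types from the taxonomy, for illustration only.
sample_labels = build_bio_labels(["CENTRAL_BANK", "COMMODITY", "SANCTION"])
print(sample_labels)

# 59 placeholder type names reproduce the label count quoted above.
full_count = len(build_bio_labels([f"TYPE_{i}" for i in range(59)]))
print(full_count)  # 119
```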

Entity Taxonomy

Financial Instruments & Markets

Label	Description
`EQUITY`	Stocks and equity instruments
`DERIVATIVE`	Futures, options, swaps
`CURRENCY`	FX pairs and currencies
`FIXED_INCOME`	Bonds, treasuries, notes
`ASSET_CLASS`	Broad asset class references
`INDEX`	Market indices (S&P 500, FTSE, etc.)
`COMMODITY`	Physical commodities (oil, gas, metals)
`TRADING_HUB`	Price benchmarks and trading hubs

Financial Institutions

Label	Description
`FINANCIAL_INSTITUTION`	Banks, brokerages, investment firms
`CENTRAL_BANK`	Central banks (Fed, ECB, BoE)
`HEDGE_FUND`	Hedge funds and asset managers
`RATING_AGENCY`	Credit rating agencies
`EXCHANGE`	Stock and commodity exchanges

Macro & Policy

Label	Description
`MACRO_INDICATOR`	GDP, inflation, unemployment figures
`MONETARY_POLICY`	Interest rate decisions, QE programmes
`FISCAL_POLICY`	Government spending, tax policy
`TRADE_POLICY`	Tariffs, trade agreements, WTO actions
`ECONOMIC_BLOC`	G7, G20, EU, ASEAN, etc.

Energy Domain

Label	Description
`ENERGY_COMPANY`	Oil majors, utilities, renewable firms
`ENERGY_SOURCE`	Oil, gas, coal, solar, nuclear, etc.
`PIPELINE`	Energy pipelines and transmission lines
`REFINERY`	Oil refineries and processing plants
`ENERGY_POLICY`	OPEC decisions, energy legislation
`ENERGY_TRANSITION`	Decarbonisation, net-zero, EV, hydrogen
`GRID`	Power grids and electricity networks

Geopolitical

Label	Description
`GEOPOLITICAL_EVENT`	Summits, elections, geopolitical shifts
`SANCTION`	Economic sanctions and embargoes
`TREATY`	International agreements and accords
`CONFLICT_ZONE`	Active or historic conflict regions
`DIPLOMATIC_ACTION`	Diplomatic moves, expulsions, negotiations
`COUNTRY`	Nation states
`REGION`	Geographic regions (Middle East, EU, etc.)
`CITY`	Cities and urban locations

Corporate Events

Label	Description
`COMPANY`	General companies
`M_AND_A`	Mergers and acquisitions
`IPO`	Initial public offerings
`EARNINGS_EVENT`	Quarterly earnings, revenue reports
`EXECUTIVE`	Named C-suite executives
`CORPORATE_ACTION`	Dividends, buybacks, restructuring

Infrastructure & Supply Chain

Label	Description
`INFRA`	Physical infrastructure (general)
`SUPPLY_CHAIN`	Supply chain disruptions and logistics
`SHIPPING_VESSEL`	Named ships and tankers
`PORT`	Ports and maritime hubs

Risk & Events

Label	Description
`EVENT`	General newsworthy events
`RISK_FACTOR`	Risk factors and vulnerabilities
`NATURAL_DISASTER`	Hurricanes, earthquakes, floods
`CYBER_EVENT`	Cyber attacks and digital incidents
`DISRUPTION`	Supply or market disruptions

Technology

Label	Description
`TECH_COMPANY`	Technology companies
`AI_MODEL`	AI systems and models
`SEMICONDUCTOR`	Chips and semiconductor companies
`TECH_REGULATION`	Technology regulation and policy

People & Organizations

Label	Description
`PERSON`	Named individuals
`THINK_TANK`	Policy research organizations
`NEWS_SOURCE`	Media and news outlets
`REGULATORY_BODY`	Government regulators (SEC, FCA, etc.)
`ORG`	General organizations

Usage

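from transformers import pipeline

# "simple" aggregation groups adjacent tokens of the same entity type into spans.
ner = pipeline(
    "token-classification",
    model="Quantbridge/distilbert-energy-intelligence-multitask-v2",
    aggregation_strategy="simple",
)

text = (
    "The Federal Reserve held interest rates steady as Brent crude fell below $75 "
    "following OPEC+ production cuts and renewed sanctions on Russian energy exports."
)

results = ner(text)
for entity in results:
    # The uncased tokenizer lowercases entity["word"], so slice the original
    # text by character offsets to keep the source casing.
    span = text[entity["start"]:entity["end"]]
    print(f"{span:<35} {entity['entity_group']:<25} {entity['score']:.3f}")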

Example output:

Federal Reserve                     CENTRAL_BANK              0.961
Brent                               TRADING_HUB               0.954
OPEC+                               REGULATORY_BODY           0.947
Russian energy exports              SANCTION                  0.932

Load model directly

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/distilbert-energy-intelligence-multitask-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

text = "Goldman Sachs cut its oil price forecast after OPEC+ agreed to extend output cuts."
# Truncate to the model's 512-token limit.
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Highest-scoring label id for each token of the single input sequence.
predicted_ids = outputs.logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    # Skip untagged tokens and special tokens such as [CLS] and [SEP].
    if label != "O" and token not in tokenizer.all_special_tokens:
        # removeprefix (Python 3.9+) drops only the "##" continuation marker.
        print(f"{token.removeprefix('##'):<25} {label}")
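The loop above prints one line per word piece. To get whole entities without the pipeline, the pieces and BIO tags have to be merged. The following is a minimal sketch of that decoding step; `merge_bio_tokens` is a hypothetical helper, and the tokens and labels in the example are hand-written for illustration, not actual model output.

```python
def merge_bio_tokens(tokens, labels):
    """Merge word pieces and BIO tags into (text, entity_type) spans."""
    spans = []                 # finished (text, entity_type) pairs
    words, current = [], None  # words and type of the span being built
    for token, label in zip(tokens, labels):
        if label == "O":
            # An untagged token closes any open span.
            if words:
                spans.append((" ".join(words), current))
            words, current = [], None
            continue
        prefix, entity_type = label.split("-", 1)
        if token.startswith("##") and words:
            # Continuation piece: glue it onto the previous word.
            words[-1] += token[2:]
        elif prefix == "I" and entity_type == current:
            words.append(token)
        else:
            # A B- tag, or an I- tag whose type differs from the open span.
            if words:
                spans.append((" ".join(words), current))
            words, current = [token], entity_type
    if words:
        spans.append((" ".join(words), current))
    return spans


# Hand-written example input (assumed tokenisation and labels).
tokens = ["goldman", "sachs", "cut", "oil", "forecast", "after", "op", "##ec"]
labels = [
    "B-FINANCIAL_INSTITUTION", "I-FINANCIAL_INSTITUTION", "O",
    "B-COMMODITY", "O", "O",
    "B-REGULATORY_BODY", "I-REGULATORY_BODY",
]
merged = merge_bio_tokens(tokens, labels)
print(merged)
```

In practice `aggregation_strategy="simple"` in the pipeline performs a similar grouping and also returns character offsets, so a hand-rolled decoder is mainly useful when custom merging rules are needed.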

Model Details

Property	Value
Base model	`distilbert-base-uncased`
Architecture type	`DistilBertForTokenClassification`
Entity types	59 types (119 BIO labels)
Hidden dimension	768
Attention heads	12
Layers	6
Vocabulary size	30,522
Max sequence length	512 tokens
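Documents longer than the 512-token maximum have to be truncated or split before inference. One possible approach, sketched here under assumptions, is to cut the token id sequence into overlapping windows so that entities near a boundary appear whole in at least one window. The helper name `window_token_ids` and the default `stride` of 64 are illustrative choices, not settings taken from this model.

```python
def window_token_ids(token_ids, max_len=512, stride=64, num_special=2):
    """Split token ids into overlapping windows that fit the model.

    num_special reserves room for the [CLS] and [SEP] tokens that are
    added around each window; stride is the overlap between windows.
    """
    body = max_len - num_special
    if not 0 <= stride < body:
        raise ValueError("stride must be non-negative and smaller than the window body")
    step = body - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body])
        if start + body >= len(token_ids):
            break
        start += step
    return windows


# 1,200 dummy ids stand in for a long tokenised document.
windows = window_token_ids(list(range(1200)))
print([len(w) for w in windows])  # [510, 510, 308]
```

Hugging Face tokenizers can produce comparable windows directly via `return_overflowing_tokens=True` together with `stride`; predictions in the overlapping regions then need to be deduplicated.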

Intended Use

This model is designed for financial and energy intelligence extraction: automated NER over news feeds, earnings transcripts, regulatory filings, and geopolitical reports. It is suited as a building block for:

Structured data extraction from unstructured financial news
Entity linking and knowledge graph population
Signal detection for trading and risk systems
Geopolitical risk monitoring
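For structured extraction, the pipeline output (a list of dicts with `word`, `entity_group`, and `score` keys) can be grouped by entity type. The sketch below shows one way to do this with a confidence cut-off; `group_entities` and the `min_score` threshold are hypothetical, and the sample records reuse the example output from the Usage section rather than a fresh model run.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.9):
    """Group pipeline-style entity dicts by type, dropping low scores."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Records shaped like the aggregated pipeline output shown under Usage.
sample = [
    {"word": "Federal Reserve", "entity_group": "CENTRAL_BANK", "score": 0.961},
    {"word": "Brent", "entity_group": "TRADING_HUB", "score": 0.954},
    {"word": "OPEC+", "entity_group": "REGULATORY_BODY", "score": 0.947},
    {"word": "Russian energy exports", "entity_group": "SANCTION", "score": 0.932},
]
record = group_entities(sample, min_score=0.95)
print(record)
```

A suitable threshold depends on the downstream task and should be chosen against labelled data from the target feed.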

Out-of-scope use

General-purpose NER on non-financial text
Languages other than English
Documents with heavy technical jargon outside the financial/energy domain

Limitations

English-only
Optimised for news-style formal writing; may underperform on social media or informal text
The 59-type taxonomy contains closely related labels, so ambiguous entities may be tagged inconsistently (e.g. an oil major labelled `COMPANY` in one sentence and `ENERGY_COMPANY` in another)
BIO scheme does not support nested entities
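Where closely related labels cause inconsistent tagging, one possible mitigation is to collapse fine-grained types into coarser parents before downstream use. The mapping below is purely an assumption for illustration; it is not part of the model or its training data, and `COARSE_LABEL` and `coarsen` are hypothetical names.

```python
# Hypothetical parent mapping; adjust to the needs of the downstream task.
COARSE_LABEL = {
    "ENERGY_COMPANY": "COMPANY",
    "TECH_COMPANY": "COMPANY",
    "HEDGE_FUND": "FINANCIAL_INSTITUTION",
    "CENTRAL_BANK": "FINANCIAL_INSTITUTION",
}


def coarsen(entity_group):
    """Map a fine-grained type to its parent, or return it unchanged."""
    return COARSE_LABEL.get(entity_group, entity_group)


print(coarsen("ENERGY_COMPANY"))  # COMPANY
print(coarsen("SANCTION"))        # SANCTION
```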

License

Apache 2.0 — see LICENSE.