# Energy Intelligence NER

Model ID: `Quantbridge/energy-intelligence-multitask-ner`

A fine-tuned DistilBERT model for Named Entity Recognition in the energy markets and geopolitical domain. The model identifies nine entity types relevant to energy intelligence: companies, commodities, infrastructure, markets, events, and more.
## Entity Types

| Label | Description | Examples |
|---|---|---|
| COMPANY | Energy sector companies | ExxonMobil, BP, Saudi Aramco |
| COMMODITY | Energy commodities and resources | crude oil, natural gas, LNG, coal |
| COUNTRY | Nation states | United States, Russia, Saudi Arabia |
| LOCATION | Geographic locations, regions | Persian Gulf, North Sea, Permian Basin |
| INFRASTRUCTURE | Physical energy infrastructure | pipelines, refineries, LNG terminals |
| MARKET | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
| EVENT | Market events, geopolitical events | sanctions, OPEC+ cut, supply disruption |
| ORGANIZATION | Non-company organizations, bodies | OPEC, IEA, G7, US Energy Department |
| PERSON | Named individuals | ministers, executives, analysts |
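Because the model uses BIO tagging (see Training Data), the nine entity types above expand to a 19-way token label space. The sketch below shows how such a label map could be constructed; the authoritative mapping ships as `label_map.json` in the model repository, so the ordering here is an illustrative assumption:

```python
# Sketch: deriving a BIO label space from the nine entity types above.
# The label order is an assumption; label_map.json in the repo is authoritative.
ENTITY_TYPES = [
    "COMPANY", "COMMODITY", "COUNTRY", "LOCATION", "INFRASTRUCTURE",
    "MARKET", "EVENT", "ORGANIZATION", "PERSON",
]

# "O" for non-entity tokens, plus B-/I- prefixed labels per entity type.
labels = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

print(len(labels))  # 9 types x 2 prefixes + "O" = 19 labels
```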
## Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Quantbridge/energy-intelligence-multitask-ner",
    aggregation_strategy="simple",
)

text = (
    "Saudi Aramco announced a production cut of 1 million barrels per day "
    "amid falling crude oil prices at the Brent benchmark market."
)

results = ner(text)
for entity in results:
    print(f"{entity['word']:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
```
Example output:

```
Saudi Aramco                   COMPANY              score=0.981
crude oil                      COMMODITY            score=0.974
Brent                          MARKET               score=0.968
```
### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/energy-intelligence-multitask-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O":
        print(f"{token:<20} {label}")
```
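The raw loop above prints one label per subword token. Without the pipeline's `aggregation_strategy`, you can stitch WordPiece tokens and BIO tags back into entity spans yourself. A minimal sketch, where `merge_bio_spans` and the sample token/label pairs are illustrative and not part of the model's API:

```python
def merge_bio_spans(tokens, labels):
    """Merge WordPiece tokens plus BIO labels into (text, entity_type) spans."""
    spans = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and spans and label != "O":
            # Continuation subword: glue onto the current span's text.
            spans[-1] = (spans[-1][0] + token[2:], spans[-1][1])
        elif label.startswith("B-"):
            spans.append((token, label[2:]))
        elif label.startswith("I-") and spans and spans[-1][1] == label[2:]:
            spans[-1] = (spans[-1][0] + " " + token, spans[-1][1])
        # "O" tokens and stray "I-" tags without an open span are dropped.
    return spans

# Illustrative token/label pairs (not actual model output).
tokens = ["brent", "crude", "fell", "opec", "##+", "vienna"]
labels = ["B-MARKET", "B-COMMODITY", "O", "B-ORGANIZATION", "I-ORGANIZATION", "B-LOCATION"]
print(merge_bio_spans(tokens, labels))
# [('brent', 'MARKET'), ('crude', 'COMMODITY'), ('opec+', 'ORGANIZATION'), ('vienna', 'LOCATION')]
```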
## Model Details
| Property | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Architecture | DistilBERT + token classification head |
| Parameters | ~67M |
| Max sequence length | 256 tokens |
| Training precision | FP16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Epochs | 5 |
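With a 256-token maximum sequence length, longer articles must be split before inference. One common approach is an overlapping sliding window, so entities near a boundary appear whole in at least one chunk. A rough sketch over whitespace words (a real implementation would count tokenizer subwords; the window and stride values are illustrative assumptions):

```python
def chunk_words(words, max_len=200, stride=50):
    """Split a word list into overlapping windows. max_len is kept below the
    model's 256-token limit to leave headroom for subword splitting; stride
    controls how many words consecutive windows share."""
    if len(words) <= max_len:
        return [words]
    chunks = []
    step = max_len - stride
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_len])
        if start + max_len >= len(words):
            break  # the final window already reaches the end
    return chunks

words = ["word"] * 500
chunks = chunk_words(words)
print([len(c) for c in chunks])  # [200, 200, 200]
```

Deduplicating entities detected in the overlap region (e.g. by character offset) is left to the caller.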
Training Data
The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.
Dataset split:
| Split | Records |
|---|---|
| Train | ~9,200 |
| Validation | ~1,150 |
| Test | ~1,150 |
## Evaluation

Evaluated on the held-out test set using `seqeval` (entity-level span matching).
| Metric | Score |
|---|---|
| Overall F1 | reported after training |
| Overall Precision | reported after training |
| Overall Recall | reported after training |
Per-entity F1 scores are available in `label_map.json` in the model repository.
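Note that `seqeval` scores at the entity-span level rather than per token: a predicted entity counts as correct only if both its type and its exact boundaries match a gold span. A pure-Python sketch of that matching, illustrative rather than seqeval's actual implementation:

```python
def bio_to_spans(labels):
    """Convert a BIO label sequence into a set of (start, end, type) spans."""
    spans, start, etype = set(), None, None
    for i, label in enumerate(labels + ["O"]):  # trailing "O" flushes the last span
        if label.startswith("I-") and etype == label[2:]:
            continue  # current span continues
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if label.startswith("B-"):
            start, etype = i, label[2:]
    return spans

def span_f1(gold_labels, pred_labels):
    """Entity-level precision, recall, and F1 over exact span-and-type matches."""
    gold, pred = bio_to_spans(gold_labels), bio_to_spans(pred_labels)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gold = ["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY"]
pred = ["B-COMPANY", "I-COMPANY", "O", "B-MARKET"]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5): one of two spans matched exactly
```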
## Limitations
- Trained exclusively on English text.
- Best suited for formal news-style writing about energy markets and geopolitics.
- Performance may degrade on highly technical engineering documents or non-standard text formats.
- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{quantbridge-energy-ner-2025,
  title  = {Energy Intelligence NER},
  author = {Quantbridge},
  year   = {2025},
  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
}
```
## License

Apache 2.0 (see `LICENSE`).