---
language: en
license: apache-2.0
tags:
  - token-classification
  - ner
  - energy
  - geopolitics
  - distilbert
pipeline_tag: token-classification
---

# Energy Intelligence NER

**Model ID:** `Quantbridge/energy-intelligence-multitask-ner`

A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for Named Entity Recognition in the **energy markets and geopolitics** domain. The model identifies nine entity types relevant to energy intelligence: companies, commodities, countries, locations, infrastructure, markets, events, organizations, and persons.

---

## Entity Types

| Label | Description | Examples |
|---|---|---|
| `COMPANY` | Energy sector companies | ExxonMobil, BP, Saudi Aramco |
| `COMMODITY` | Energy commodities and resources | crude oil, natural gas, LNG, coal |
| `COUNTRY` | Nation states | United States, Russia, Saudi Arabia |
| `LOCATION` | Geographic locations, regions | Persian Gulf, North Sea, Permian Basin |
| `INFRASTRUCTURE` | Physical energy infrastructure | pipelines, refineries, LNG terminals |
| `MARKET` | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
| `EVENT` | Market events, geopolitical events | sanctions, OPEC+ cut, supply disruption |
| `ORGANIZATION` | Non-company organizations, bodies | OPEC, IEA, G7, US Energy Department |
| `PERSON` | Named individuals | ministers, executives, analysts |

---

## Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Quantbridge/energy-intelligence-multitask-ner",
    aggregation_strategy="simple",  # merge B-/I- word pieces into whole entities
)

text = (
    "Saudi Aramco announced a production cut of 1 million barrels per day "
    "amid falling crude oil prices at the Brent benchmark market."
)

results = ner(text)
for entity in results:
    print(f"{entity['word']:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
```

**Example output:**
```
Saudi Aramco                   COMPANY              score=0.981
crude oil                      COMMODITY            score=0.974
Brent                          MARKET               score=0.968
```

### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/energy-intelligence-multitask-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label for each word piece.
logits = outputs.logits
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print only the word pieces tagged as part of an entity.
for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O":
        print(f"{token:<20} {label}")
```
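
The token-level loop above prints subword pieces one by one. When loading the model directly, you usually want whole entity spans instead; merging `B-`/`I-` runs back into spans is essentially what `aggregation_strategy="simple"` does inside the pipeline. A minimal sketch, reusing `tokens`, `predicted_ids`, and `model` from the snippet above:

```python
def bio_to_spans(tokens, label_ids, id2label):
    """Merge BIO-tagged word pieces into (entity_type, surface_text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, label_id in zip(tokens, label_ids):
        label = id2label[label_id.item()]
        if label.startswith("B-") or (label.startswith("I-") and current_type != label[2:]):
            if current_type:                      # close any open span
                spans.append((current_type, current_tokens))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-"):
            current_tokens.append(token)          # continue the open span
        else:                                     # "O" and special tokens
            if current_type:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, current_tokens))
    # Rejoin WordPiece continuations ("##...") into readable surface text.
    return [(t, "".join(p[2:] if p.startswith("##") else " " + p for p in toks).strip())
            for t, toks in spans]

for entity_type, text in bio_to_spans(tokens, predicted_ids, model.config.id2label):
    print(f"{text:<30} {entity_type}")
```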

---

## Model Details

| Property | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | DistilBERT + token classification head |
| Parameters | ~67M |
| Max sequence length | 256 tokens |
| Training precision | FP16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Epochs | 5 |

---

## Training Data

The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.
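
An invented record in that BIO format, purely to make the tagging scheme concrete (not drawn from the actual dataset):

```python
tokens = ["Saudi",     "Aramco",    "cut", "output", "as", "Brent",    "slid", "."]
tags   = ["B-COMPANY", "I-COMPANY", "O",   "O",      "O",  "B-MARKET", "O",    "O"]
```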

**Dataset split:**

| Split | Records |
|---|---|
| Train | ~9,200 |
| Validation | ~1,150 |
| Test | ~1,150 |

---

## Evaluation

Evaluated on the held-out test set using [seqeval](https://github.com/chakki-works/seqeval) (entity-level span matching).

| Metric | Score |
|---|---|
| Overall F1 | *reported after training* |
| Overall Precision | *reported after training* |
| Overall Recall | *reported after training* |

Per-entity F1 scores are available in `label_map.json` in the model repository.
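
For reference, entity-level scoring with seqeval looks like this; the gold and predicted sequences below are placeholders standing in for real test-set output:

```python
from seqeval.metrics import classification_report, f1_score

# Placeholder label sequences; in practice these come from decoding the
# model's predictions over the held-out test split.
y_true = [["B-COMPANY", "I-COMPANY", "O", "B-MARKET", "O"]]
y_pred = [["B-COMPANY", "I-COMPANY", "O", "O",        "O"]]

print(f1_score(y_true, y_pred))               # entity-level F1 (exact span + type)
print(classification_report(y_true, y_pred))  # per-entity precision/recall/F1
```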

---

## Limitations

- Trained exclusively on English text.
- Best suited for formal news-style writing about energy markets and geopolitics.
- Performance may degrade on highly technical engineering documents or non-standard text formats.
- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{quantbridge-energy-ner-2025,
  title  = {Energy Intelligence NER},
  author = {Quantbridge},
  year   = {2025},
  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
}
```

---

## License

Apache 2.0 — see [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).