---
language: en
license: apache-2.0
tags:
- token-classification
- ner
- energy
- geopolitics
- distilbert
pipeline_tag: token-classification
---

# Energy Intelligence NER

**Model ID:** `Quantbridge/energy-intelligence-multitask-ner`

A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for Named Entity Recognition in the **energy markets and geopolitical** domain. The model identifies nine entity types relevant to energy intelligence, including companies, commodities, infrastructure, markets, and events.

---
|
## Entity Types

| Label | Description | Examples |
|---|---|---|
| `COMPANY` | Energy sector companies | ExxonMobil, BP, Saudi Aramco |
| `COMMODITY` | Energy commodities and resources | crude oil, natural gas, LNG, coal |
| `COUNTRY` | Nation states | United States, Russia, Saudi Arabia |
| `LOCATION` | Geographic locations, regions | Persian Gulf, North Sea, Permian Basin |
| `INFRASTRUCTURE` | Physical energy infrastructure | pipelines, refineries, LNG terminals |
| `MARKET` | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
| `EVENT` | Market events, geopolitical events | sanctions, OPEC+ cut, supply disruption |
| `ORGANIZATION` | Non-company organizations, bodies | OPEC, IEA, G7, US Energy Department |
| `PERSON` | Named individuals | ministers, executives, analysts |

---
|
## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" groups adjacent tokens of the same entity type into one span.
ner = pipeline(
    "token-classification",
    model="Quantbridge/energy-intelligence-multitask-ner",
    aggregation_strategy="simple",
)

text = (
    "Saudi Aramco announced a production cut of 1 million barrels per day "
    "amid falling crude oil prices at the Brent benchmark market."
)

results = ner(text)
for entity in results:
    # The base model is uncased, so entity["word"] comes back lowercased;
    # slice the input by character offsets to keep the original casing.
    span = text[entity["start"]:entity["end"]]
    print(f"{span:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
```

**Example output:**
```
Saudi Aramco                   COMPANY              score=0.981
crude oil                      COMMODITY            score=0.974
Brent                          MARKET               score=0.968
```
|
### Load model directly

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "Quantbridge/energy-intelligence-multitask-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

logits = outputs.logits
predicted_ids = logits.argmax(dim=-1)[0]  # most likely label id per token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print every word-piece token that is tagged as part of an entity.
for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O":
        print(f"{token:<20} {label}")
```

---
|
## Model Details

| Property | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | DistilBERT + token classification head |
| Parameters | ~67M |
| Max sequence length | 256 tokens |
| Training precision | FP16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Epochs | 5 |

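
To make the warmup ratio and epoch count above more concrete, the following sketch estimates how many optimizer steps and warmup steps they would imply. It is a rough illustration only: the batch size is not reported in this card, so `batch_size=16` is an assumption, the helper name `schedule_steps` is hypothetical, and the calculation assumes a single device with no gradient accumulation. The ~9,200 figure is the approximate training-set size given under Training Data.

```python
import math


def schedule_steps(num_examples: int, batch_size: int, epochs: int, warmup_ratio: float):
    """Return (total_steps, warmup_steps) for a run with the given settings."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# ~9,200 training records, 5 epochs, 10% warmup come from this card;
# batch_size=16 is an assumed value that the card does not report.
total, warmup = schedule_steps(num_examples=9200, batch_size=16, epochs=5, warmup_ratio=0.10)
print(f"total steps: {total}, warmup steps: {warmup}")
```

With a different batch size the absolute numbers change, but the warmup phase stays at roughly one tenth of the run.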
|
---

## Training Data

The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.

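
As a small illustration of BIO tagging, the sketch below converts entity spans over a tokenized sentence into one tag per token. The sentence, the span annotations, and the helper `spans_to_bio` are made up for this example and are not taken from the training set; the `B-`/`I-` prefixes combined with the entity labels of this card are assumed to mirror the model's tag names.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans into one BIO tag per token.

    spans: list of (start, end, label) in token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining tokens of the entity
    return tags


tokens = ["Saudi", "Aramco", "cut", "crude", "oil", "output", "."]
spans = [(0, 2, "COMPANY"), (3, 5, "COMMODITY")]  # hypothetical annotation

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token:<10} {tag}")
```

Each token carries exactly one tag, which is why this scheme cannot express overlapping or nested entities.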
|
**Dataset split:**

| Split | Records |
|---|---|
| Train | ~9,200 |
| Validation | ~1,150 |
| Test | ~1,150 |

---
|
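## Evaluation

The model is evaluated on the held-out test set with [seqeval](https://github.com/chakki-works/seqeval), which scores entity-level span matches: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation.
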
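
The following is a simplified sketch of what entity-level span matching means in practice. It is not seqeval's implementation: the helpers `bio_to_spans` and `span_prf` are hypothetical, the tag sequences are invented, and stray `I-` tags without a preceding `B-` are simply ignored here.

```python
def bio_to_spans(tags):
    """Extract a set of (start, end, label) spans from BIO tags (end exclusive)."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes an open span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_prf(gold_tags, pred_tags):
    """Precision, recall, and F1 over exactly matching spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    tp = len(gold & pred)  # same boundaries and same label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = ["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY", "I-COMMODITY", "O"]
pred = ["B-COMPANY", "I-COMPANY", "O", "O", "B-COMMODITY", "O"]

p, r, f = span_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example the company span matches exactly, while the commodity prediction covers only part of the gold span, so it counts as both a false positive and a miss.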
|
| Metric | Score |
|---|---|
| Overall F1 | *reported after training* |
| Overall Precision | *reported after training* |
| Overall Recall | *reported after training* |

Per-entity F1 scores are available in `label_map.json` in the model repository.

---
|
## Limitations

- Trained exclusively on English text.
- Best suited for formal news-style writing about energy markets and geopolitics.
- Performance may degrade on highly technical engineering documents or non-standard text formats.
- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.

---
|
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{quantbridge-energy-ner-2025,
  title  = {Energy Intelligence NER},
  author = {Quantbridge},
  year   = {2025},
  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
}
```

---
|
## License

Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).
|