---
language: en
license: apache-2.0
tags:
- token-classification
- ner
- energy
- geopolitics
- distilbert
pipeline_tag: token-classification
---

# Energy Intelligence NER

**Model ID:** `Quantbridge/energy-intelligence-multitask-ner`

A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for Named Entity Recognition in the **energy markets and geopolitics** domain. The model identifies nine entity types relevant to energy intelligence: companies, commodities, infrastructure, markets, events, and more.

---

## Entity Types

| Label | Description | Examples |
|---|---|---|
| `COMPANY` | Energy sector companies | ExxonMobil, BP, Saudi Aramco |
| `COMMODITY` | Energy commodities and resources | crude oil, natural gas, LNG, coal |
| `COUNTRY` | Nation states | United States, Russia, Saudi Arabia |
| `LOCATION` | Geographic locations and regions | Persian Gulf, North Sea, Permian Basin |
| `INFRASTRUCTURE` | Physical energy infrastructure | pipelines, refineries, LNG terminals |
| `MARKET` | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
| `EVENT` | Market and geopolitical events | sanctions, OPEC+ cut, supply disruption |
| `ORGANIZATION` | Non-company organizations and bodies | OPEC, IEA, G7, US Energy Department |
| `PERSON` | Named individuals | ministers, executives, analysts |

---

## Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Quantbridge/energy-intelligence-multitask-ner",
    aggregation_strategy="simple",
)

text = (
    "Saudi Aramco announced a production cut of 1 million barrels per day "
    "amid falling crude oil prices at the Brent benchmark market."
)

results = ner(text)
for entity in results:
    print(f"{entity['word']:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
```

**Example output:**

```
Saudi Aramco                   COMPANY              score=0.981
crude oil                      COMMODITY            score=0.974
Brent                          MARKET               score=0.968
```

### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/energy-intelligence-multitask-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O":
        print(f"{token:<20} {label}")
```

---

## Model Details

| Property | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | DistilBERT + token classification head |
| Parameters | ~67M |
| Max sequence length | 256 tokens |
| Training precision | FP16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Epochs | 5 |

---

## Training Data

The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.

**Dataset split:**

| Split | Records |
|---|---|
| Train | ~9,200 |
| Validation | ~1,150 |
| Test | ~1,150 |

---

## Evaluation

Evaluated on the held-out test set using [seqeval](https://github.com/chakki-works/seqeval) (entity-level span matching).
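Entity-level span matching means a prediction only counts as correct if both the entity type and the full span boundaries match the gold annotation. As a rough illustration of that scoring scheme (a pure-Python sketch, not the seqeval library itself; the tag sequences below are hypothetical):

```python
def extract_spans(tags):
    """Collect (type, start, end) entity spans from a BIO tag sequence."""
    spans = []
    start = etype = None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel closes a trailing entity
        inside = tag.startswith("I-") and tag[2:] == etype
        if etype is not None and not inside:
            spans.append((etype, start, i))
            etype = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return set(spans)

# Hypothetical gold/predicted tags for:
# "Saudi Aramco cut output amid Brent losses"
gold = ["B-COMPANY", "I-COMPANY", "O", "O", "O", "B-MARKET", "O"]
pred = ["B-COMPANY", "I-COMPANY", "O", "O", "O", "O", "O"]

g, p = extract_spans(gold), extract_spans(pred)
tp = len(g & p)                     # exact (type, start, end) matches only
precision = tp / len(p)
recall = tp / len(g)
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=1.00 recall=0.50 f1=0.67
```

Here the predicted tags find `Saudi Aramco` but miss `Brent`, so recall drops to 0.50 even though every emitted span is correct; the scores reported below were computed with seqeval itself.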
| Metric | Score |
|---|---|
| Overall F1 | *reported after training* |
| Overall Precision | *reported after training* |
| Overall Recall | *reported after training* |

Per-entity F1 scores are available in `label_map.json` in the model repository.

---

## Limitations

- Trained exclusively on English text.
- Best suited for formal, news-style writing about energy markets and geopolitics.
- Performance may degrade on highly technical engineering documents or non-standard text formats.
- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{quantbridge-energy-ner-2025,
  title  = {Energy Intelligence NER},
  author = {Quantbridge},
  year   = {2025},
  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
}
```

---

## License

Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).