	---
	language: en
	license: apache-2.0
	tags:
	- token-classification
	- ner
	- energy
	- geopolitics
	- distilbert
	pipeline_tag: token-classification
	---

	# Energy Intelligence NER

	Model ID: `Quantbridge/energy-intelligence-multitask-ner`

	A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for named entity recognition (NER) in the energy markets and geopolitics domain. The model identifies nine entity types relevant to energy intelligence: companies, commodities, countries, locations, infrastructure, markets, events, organizations, and persons.

	---

	## Entity Types

	| Label | Description | Examples |
	|---|---|---|
	| `COMPANY` | Energy sector companies | ExxonMobil, BP, Saudi Aramco |
	| `COMMODITY` | Energy commodities and resources | crude oil, natural gas, LNG, coal |
	| `COUNTRY` | Nation states | United States, Russia, Saudi Arabia |
	| `LOCATION` | Geographic locations, regions | Persian Gulf, North Sea, Permian Basin |
	| `INFRASTRUCTURE` | Physical energy infrastructure | pipelines, refineries, LNG terminals |
	| `MARKET` | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
	| `EVENT` | Market events, geopolitical events | sanctions, OPEC+ cut, supply disruption |
	| `ORGANIZATION` | Non-company organizations, bodies | OPEC, IEA, G7, US Energy Department |
	| `PERSON` | Named individuals | ministers, executives, analysts |
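
With a BIO tagging scheme, each of the nine entity types in the table above would expand into a `B-` and an `I-` tag, plus a single `O` tag for non-entity tokens, giving 19 labels. The following is a minimal sketch of that label space. The ordering is an assumption made for illustration; the authoritative mapping is whatever `model.config.id2label` contains.

```python
# Entity types as listed in the table above.
ENTITY_TYPES = [
    "COMPANY", "COMMODITY", "COUNTRY", "LOCATION", "INFRASTRUCTURE",
    "MARKET", "EVENT", "ORGANIZATION", "PERSON",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


# Hypothetical ordering; check model.config.id2label for the real one.
labels = build_bio_labels(ENTITY_TYPES)
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(labels))  # 19
```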

	---

	## Usage

	```python
	from transformers import pipeline

	# aggregation_strategy="simple" groups adjacent tokens that share an
	# entity type into a single span.
	ner = pipeline(
	    "token-classification",
	    model="Quantbridge/energy-intelligence-multitask-ner",
	    aggregation_strategy="simple",
	)

	text = (
	    "Saudi Aramco announced a production cut of 1 million barrels per day "
	    "amid falling crude oil prices at the Brent benchmark market."
	)

	results = ner(text)
	for entity in results:
	    print(f"{entity['word']:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
	```

	Example output:
	```
	Saudi Aramco                   COMPANY              score=0.981
	crude oil                      COMMODITY            score=0.974
	Brent                          MARKET               score=0.968
	```
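
The pipeline returns a list of dictionaries, one per aggregated entity, with keys such as `word`, `entity_group`, and `score`. A small helper like the one below can group those results by type and drop low-confidence predictions. The name `entities_by_type` and the default threshold of 0.5 are illustrative choices, not part of the model or of `transformers`.

```python
from collections import defaultdict


def entities_by_type(results, min_score=0.5):
    """Group pipeline results by entity_group, keeping scores >= min_score."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample mirroring the pipeline's output format,
# so the snippet runs without downloading the model.
sample = [
    {"word": "Saudi Aramco", "entity_group": "COMPANY", "score": 0.981},
    {"word": "crude oil", "entity_group": "COMMODITY", "score": 0.974},
    {"word": "Brent", "entity_group": "MARKET", "score": 0.968},
]

print(entities_by_type(sample, min_score=0.97))
# {'COMPANY': ['Saudi Aramco'], 'COMMODITY': ['crude oil']}
```

In practice `sample` would be replaced by the `results` list from the pipeline call.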

	### Load model directly

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch

	model_name = "Quantbridge/energy-intelligence-multitask-ner"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForTokenClassification.from_pretrained(model_name)

	inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
	with torch.no_grad():  # inference only, no gradients needed
	    outputs = model(**inputs)

	logits = outputs.logits
	predicted_ids = logits.argmax(dim=-1)[0]  # highest-scoring label ID per token
	tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

	# Print every token that is not tagged "O" (outside any entity).
	for token, label_id in zip(tokens, predicted_ids):
	    label = model.config.id2label[label_id.item()]
	    if label != "O":
	        print(f"{token:<20} {label}")
	```
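
The loop above prints one line per WordPiece token, so a word that the tokenizer splits into several pieces appears as fragments (continuation pieces are prefixed with `##`). One common way to get word-level labels is to merge continuation pieces back into the preceding word and keep the label of the first piece. The sketch below assumes that convention and uses hand-written tokens and labels for illustration rather than real model output.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def merge_wordpieces(tokens, labels):
    """Merge '##' continuation pieces into words, keeping the first piece's label."""
    words = []
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##") and words:
            word, first_label = words[-1]
            words[-1] = (word + token[2:], first_label)
        else:
            words.append((token, label))
    return words


# Hypothetical tokens and labels, written by hand for illustration.
tokens = ["[CLS]", "brent", "crude", "fell", "as", "op", "##ec", "met", "[SEP]"]
labels = ["O", "B-MARKET", "B-COMMODITY", "O", "O",
          "B-ORGANIZATION", "I-ORGANIZATION", "O", "O"]

for word, label in merge_wordpieces(tokens, labels):
    if label != "O":
        print(f"{word:<20} {label}")
```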

	---

	## Model Details

	| Property | Value |
	|---|---|
	| Base model | `distilbert-base-uncased` |
	| Architecture | DistilBERT + token classification head |
	| Parameters | ~67M |
	| Max sequence length | 256 tokens |
	| Training precision | FP16 |
	| Optimizer | AdamW |
	| Learning rate | 2e-5 |
	| Warmup ratio | 10% |
	| Weight decay | 0.01 |
	| Epochs | 5 |
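
The learning rate and warmup ratio above determine the learning-rate curve once the total number of optimizer steps is known. The sketch below assumes linear warmup followed by linear decay to zero, a common choice for this kind of fine-tuning that this card does not explicitly state. The value `total_steps = 1000` is likewise a placeholder.

```python
def learning_rate(step, total_steps, peak_lr=2e-5, warmup_ratio=0.10):
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule)."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Placeholder: the real value depends on batch size, which is not given here.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, f"{learning_rate(step, total_steps):.2e}")
```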

	---

	## Training Data

	The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.

	Dataset split:

	| Split | Records |
	|---|---|
	| Train | ~9,200 |
	| Validation | ~1,150 |
	| Test | ~1,150 |
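
The approximate counts above correspond to an 80/10/10 split of roughly 11,500 records. How the actual split was produced is not documented here. The sketch below shows one reproducible way to make such a split, with the seed and the function name `split_dataset` chosen purely for illustration.

```python
import random


def split_dataset(records, seed=42):
    """Shuffle records deterministically and split them 80/10/10."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = n * 8 // 10  # integer arithmetic avoids float rounding
    val_end = n * 9 // 10
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


# Integer IDs stand in for annotated examples.
train_set, val_set, test_set = split_dataset(range(11_500))
print(len(train_set), len(val_set), len(test_set))  # 9200 1150 1150
```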

	---

	## Evaluation

	Evaluated on the held-out test set using [seqeval](https://github.com/chakki-works/seqeval) (entity-level span matching).

	| Metric | Score |
	|---|---|
	| Overall F1 | reported after training |
	| Overall Precision | reported after training |
	| Overall Recall | reported after training |

	Per-entity F1 scores are available in `label_map.json` in the model repository.
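
Entity-level span matching means a prediction counts as correct only when both the entity type and the exact token boundaries match the gold annotation. The plain-Python sketch below illustrates that idea on a toy example. It is not seqeval's implementation, and the tag sequences are made up.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (type, start, end) spans."""
    spans, start, current = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and tag[2:] == current
        if current is not None and not inside:
            spans.add((current, start, i))  # end index is exclusive
            current = None
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            # A stray I- tag without a matching B- is treated as a span start.
            current, start = tag[2:], i
    return spans


def span_prf(gold_tags, pred_tags):
    """Precision, recall, and F1 over exactly matching spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_positives else 0.0
    return precision, recall, f1


# Made-up example: the COMMODITY span is predicted one token too short.
gold = ["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY", "I-COMMODITY", "O", "B-MARKET"]
pred = ["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY", "O", "O", "B-MARKET"]

precision, recall, f1 = span_prf(gold, pred)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.667 0.667 0.667
```

The truncated `COMMODITY` span earns no partial credit, which is why entity-level scores are usually lower than token-level accuracy.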

	---

	## Limitations

	- Trained exclusively on English text.
	- Best suited for formal news-style writing about energy markets and geopolitics.
	- Performance may degrade on highly technical engineering documents or non-standard text formats.
	- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.
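
Because BIO assigns exactly one tag per token, two entities that share a token cannot both be represented. For example, the entity table lists "US Energy Department" as an `ORGANIZATION`, so the embedded "US" cannot also be tagged as a `COUNTRY`. If annotations from another source contain such overlaps, one option before converting them to BIO is to keep only the longest span. The helper below is a hypothetical sketch of that policy, not something this model or its training pipeline is known to do.

```python
def keep_longest_spans(spans):
    """Drop any span that overlaps a longer, already accepted span.

    Each span is (label, start, end) with an exclusive end token index.
    """
    accepted = []
    # Longest first; ties are broken by start position for determinism.
    for span in sorted(spans, key=lambda s: (-(s[2] - s[1]), s[1])):
        _, start, end = span
        if all(end <= a_start or start >= a_end for _, a_start, a_end in accepted):
            accepted.append(span)
    return sorted(accepted, key=lambda s: s[1])


# Hypothetical token indices for "US Energy Department imposed sanctions".
spans = [("ORGANIZATION", 0, 3), ("COUNTRY", 0, 1), ("EVENT", 4, 5)]
print(keep_longest_spans(spans))
# [('ORGANIZATION', 0, 3), ('EVENT', 4, 5)]
```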

	---

	## Citation

	If you use this model in your work, please cite:

	```bibtex
	@misc{quantbridge-energy-ner-2025,
	  title  = {Energy Intelligence NER},
	  author = {Quantbridge},
	  year   = {2025},
	  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
	}
	```

	---

	## License

	Apache 2.0 — see [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).