BatteryBERT Electrocatalyst NER v4

Fine-tuned Named Entity Recognition model for extracting durability-related entities from electrocatalyst research literature.

Model Description

This model is fine-tuned from BatteryBERT for domain-specific NER in electrocatalyst and fuel cell research. It identifies key experimental parameters related to catalyst durability, degradation, and electrochemical performance.

Supported Entity Types

Entity	Description	Example
MATERIAL	Catalyst materials and compounds	IrO₂, Pt/C, NiFe-LDH
CONDITION	Experimental conditions (voltage, temperature)	1.6 V vs RHE, 80°C
METRIC	Performance measurements	10 mA/cm², 45 mV/dec
PROCESS	Experimental techniques	electrodeposition, annealing, CV
ELECTROLYTE	Electrolyte solutions	0.5 M H₂SO₄, 1 M KOH
DURATION	Time periods	100 h, 5000 cycles

Performance

Overall Metrics

Metric	Score
F1	83.5%
Precision	78.7%
Recall	88.9%
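
F1 is the harmonic mean of precision and recall, so the three figures above can be cross-checked in a few lines. This is a minimal sketch; the helper name `f1_score` is illustrative and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Overall precision and recall as reported in the table above
overall_f1 = f1_score(0.787, 0.889)
print(f"{overall_f1:.1%}")
```

The result agrees with the reported F1 of 83.5% to one decimal place.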

Per-Entity Performance

Entity	Precision	Recall	F1	Support
CONDITION	0.75	0.91	0.82	175
DURATION	0.79	0.89	0.84	133
ELECTROLYTE	0.79	0.94	0.86	94
MATERIAL	0.83	0.80	0.81	90
METRIC	0.75	0.87	0.81	135
PROCESS	0.86	0.90	0.88	151
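
The card does not say how the overall score was averaged across entity types. As a rough consistency check, the sketch below computes the macro average and the support-weighted average of the per-entity F1 values; the numbers are copied from the table above and the variable names are illustrative.

```python
# (precision, recall, f1, support) copied from the per-entity table
per_entity = {
    "CONDITION":   (0.75, 0.91, 0.82, 175),
    "DURATION":    (0.79, 0.89, 0.84, 133),
    "ELECTROLYTE": (0.79, 0.94, 0.86, 94),
    "MATERIAL":    (0.83, 0.80, 0.81, 90),
    "METRIC":      (0.75, 0.87, 0.81, 135),
    "PROCESS":     (0.86, 0.90, 0.88, 151),
}

total_support = sum(row[3] for row in per_entity.values())

# Macro average: every entity type counts equally
macro_f1 = sum(row[2] for row in per_entity.values()) / len(per_entity)

# Weighted average: each type counts in proportion to its support
weighted_f1 = sum(row[2] * row[3] for row in per_entity.values()) / total_support

print(f"support={total_support} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

Both averages land close to 0.84, in line with the overall F1 reported above.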

Training Details

Training Data

Sentences: 4,985
Total Entities: 8,381
Source: 245 open-access electrocatalyst research papers from MDPI, Nature Communications, Frontiers, and PubMed Central
Focus: Catalyst durability, degradation mechanisms, accelerated stress testing

Entity Distribution in Training Data

Entity	Count	Percentage
CONDITION	1,848	22.0%
METRIC	1,480	17.7%
PROCESS	1,405	16.8%
DURATION	1,193	14.2%
ELECTROLYTE	1,127	13.4%
MATERIAL	895	10.7%
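
The percentages above are shares of all 8,381 annotated entities rather than of the six rows shown, which is why they sum to less than 100%. The sketch below reproduces that arithmetic from the table; the variable names are illustrative.

```python
# Entity counts copied from the distribution table
counts = {
    "CONDITION": 1848,
    "METRIC": 1480,
    "PROCESS": 1405,
    "DURATION": 1193,
    "ELECTROLYTE": 1127,
    "MATERIAL": 895,
}
total_entities = 8381  # "Total Entities" from the Training Data section

listed = sum(counts.values())
unlisted = total_entities - listed  # entities carrying labels not shown in the table

print(f"listed: {listed} ({listed / total_entities:.1%}), unlisted: {unlisted}")
for label, n in counts.items():
    print(f"{label:<12}{n / total_entities:.1%}")
```

The six listed types cover 7,948 entities, leaving 433 annotated with labels that are not broken out in the table.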

Training Hyperparameters

Base Model: batteryonline/batterybert-cased
Learning Rate: 2e-5
Batch Size: 16
Epochs: 3
Max Sequence Length: 128
Optimizer: AdamW
Training Regime: fp16 mixed precision
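
For anyone reproducing the run, the settings above could be collected as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the original training script: `output_dir` is a hypothetical path, and the step estimate assumes all 4,985 sentences were used for training on a single GPU without gradient accumulation, which the card does not state.

```python
import math

# Keyword names follow transformers.TrainingArguments.
# output_dir is a hypothetical path, not taken from the model card.
training_config = {
    "output_dir": "./electrocatalyst-ner",
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 3,
    "fp16": True,  # mixed precision; requires a CUDA GPU such as the T4
}

# Applied when tokenizing (truncation), not a TrainingArguments field
MAX_SEQ_LENGTH = 128

# Rough optimizer step count under the ASSUMPTION stated above
num_sentences = 4985
steps_per_epoch = math.ceil(num_sentences / training_config["per_device_train_batch_size"])
total_steps = steps_per_epoch * training_config["num_train_epochs"]
print(steps_per_epoch, total_steps)
```

With transformers installed, `TrainingArguments(**training_config)` should accept these settings; AdamW is the Trainer's default optimizer, so it needs no explicit entry.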

Training Infrastructure

Hardware: NVIDIA T4 GPU (Google Colab)
Training Time: ~15 minutes

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model
tokenizer = AutoTokenizer.from_pretrained("Dmjdxb/batterybert-electrocatalyst-ner-v4")
model = AutoModelForTokenClassification.from_pretrained("Dmjdxb/batterybert-electrocatalyst-ner-v4")

# Create pipeline
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Extract entities
text = "IrO2 showed 10 mA/cm² at 1.6 V vs RHE in 0.5 M H2SO4 after 100 h."
entities = ner_pipeline(text)

for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.0%})")

Expected Output

MATERIAL: IrO2 (98%)
METRIC: 10 mA/cm² (98%)
CONDITION: 1.6 V vs RHE (99%)
ELECTROLYTE: 0.5 M H2SO4 (99%)
DURATION: 100 h (87%)
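
Because training used a maximum sequence length of 128 tokens, longer passages may be better processed one sentence at a time. The sketch below uses a deliberately naive regex splitter; `split_sentences` is a hypothetical helper, and it will mis-split abbreviations followed by a capitalized word (for example "et al. The"), so a proper sentence tokenizer may be preferable for real papers.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

passage = (
    "IrO2 showed 10 mA/cm² at 1.6 V vs RHE in 0.5 M H2SO4. "
    "The catalyst remained stable after 100 h."
)

for sentence in split_sentences(passage):
    # Each sentence could then be passed to ner_pipeline(sentence)
    print(sentence)
```

Decimal points inside values such as 1.6 V or 0.5 M are left alone because they are not followed by whitespace.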

Version History

Version	F1 Score	Training Data	Notes
v2	41%	~500 sentences	Initial fine-tuning
v3	68%	1,824 sentences	Improved training data
v4	83.5%	4,985 sentences	Expanded corpus, cleaned labels

Intended Use

This model is designed for:

Extracting experimental parameters from electrocatalyst research papers
Building structured databases of catalyst durability data
Automating literature review for materials science research
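
For the database use case, the pipeline's list of entity dictionaries can be grouped into one record per sentence. The following is a minimal sketch: `to_record` and the score thresholds are assumptions for illustration, and the sample list is hand-written to mimic the pipeline's `entity_group`, `word`, and `score` keys rather than being real model output.

```python
from collections import defaultdict

def to_record(entities: list[dict], min_score: float = 0.5) -> dict[str, list[str]]:
    """Group entity strings by label, dropping predictions below min_score."""
    record = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            record[ent["entity_group"]].append(ent["word"])
    return dict(record)

# Hand-written stand-in for pipeline output (illustrative values only)
sample = [
    {"entity_group": "MATERIAL", "word": "IrO2", "score": 0.98},
    {"entity_group": "METRIC", "word": "10 mA/cm²", "score": 0.98},
    {"entity_group": "ELECTROLYTE", "word": "0.5 M H2SO4", "score": 0.99},
    {"entity_group": "DURATION", "word": "100 h", "score": 0.87},
]

# A stricter threshold drops the lower-confidence DURATION entry
print(to_record(sample, min_score=0.9))
```

The threshold trades recall for precision, so it is worth tuning against a hand-checked sample before bulk extraction.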

Limitations

Trained primarily on English-language academic papers
May not generalize well to patents or informal text
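SUPPORT and FAILURE_MODE entities have limited training examples (at most 433 of the 8,381 training entities fall outside the six supported types) and are not covered by the per-entity evaluation above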

Citation

If you use this model, please cite:

@misc{batterybert-electrocatalyst-ner-v4,
  author = {DurabilityGraph-AI},
  title = {BatteryBERT Electrocatalyst NER v4},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Dmjdxb/batterybert-electrocatalyst-ner-v4}
}

Acknowledgments

Base model: BatteryBERT by Battery Online
Training data sourced from open-access publications

Model Files

Format: Safetensors
Model size: 0.1B params
Tensor type: F32