BatteryBERT Electrocatalyst NER v4
Fine-tuned Named Entity Recognition model for extracting durability-related entities from electrocatalyst research literature.
Model Description
This model is fine-tuned from BatteryBERT for domain-specific NER in electrocatalyst and fuel cell research. It identifies key experimental parameters related to catalyst durability, degradation, and electrochemical performance.
Supported Entity Types
| Entity | Description | Example |
|---|---|---|
| MATERIAL | Catalyst materials and compounds | IrO₂, Pt/C, NiFe-LDH |
| CONDITION | Experimental conditions (voltage, temperature) | 1.6 V vs RHE, 80°C |
| METRIC | Performance measurements | 10 mA/cm², 45 mV/dec |
| PROCESS | Experimental techniques | electrodeposition, annealing, CV |
| ELECTROLYTE | Electrolyte solutions | 0.5 M H₂SO₄, 1 M KOH |
| DURATION | Time periods | 100 h, 5000 cycles |
Performance
Overall Metrics
| Metric | Score |
|---|---|
| F1 | 83.5% |
| Precision | 78.7% |
| Recall | 88.9% |
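F1 is the harmonic mean of precision and recall, so the three overall figures can be cross-checked against each other. A minimal sketch of that check; the helper name `f1_score` is illustrative and not part of the model's tooling:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the table above.
overall_f1 = f1_score(0.787, 0.889)
print(f"{overall_f1:.1%}")  # 83.5%
```

The result agrees with the reported overall F1.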
Per-Entity Performance
| Entity | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| CONDITION | 0.75 | 0.91 | 0.82 | 175 |
| DURATION | 0.79 | 0.89 | 0.84 | 133 |
| ELECTROLYTE | 0.79 | 0.94 | 0.86 | 94 |
| MATERIAL | 0.83 | 0.80 | 0.81 | 90 |
| METRIC | 0.75 | 0.87 | 0.81 | 135 |
| PROCESS | 0.86 | 0.90 | 0.88 | 151 |
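The card does not state how the overall score was averaged across entity types. As a rough cross-check, the sketch below computes both the macro average and the support-weighted average of the per-entity F1 values, using only the numbers in the table above. The variable names are illustrative.

```python
# (precision, recall, f1, support) per entity type, copied from the table above.
per_entity = {
    "CONDITION": (0.75, 0.91, 0.82, 175),
    "DURATION": (0.79, 0.89, 0.84, 133),
    "ELECTROLYTE": (0.79, 0.94, 0.86, 94),
    "MATERIAL": (0.83, 0.80, 0.81, 90),
    "METRIC": (0.75, 0.87, 0.81, 135),
    "PROCESS": (0.86, 0.90, 0.88, 151),
}

total_support = sum(row[3] for row in per_entity.values())

# Macro average: every entity type counts equally.
macro_f1 = sum(row[2] for row in per_entity.values()) / len(per_entity)

# Weighted average: each entity type counts in proportion to its support.
weighted_f1 = sum(row[2] * row[3] for row in per_entity.values()) / total_support

print(f"support={total_support} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

Both averages come out near 0.84, close to the reported overall F1 of 83.5%. A small gap is expected because the per-entity values are rounded to two decimals.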
Training Details
Training Data
- Sentences: 4,985
- Total Entities: 8,381
- Source: 245 open-access electrocatalyst research papers from MDPI, Nature Communications, Frontiers, and PubMed Central
- Focus: Catalyst durability, degradation mechanisms, accelerated stress testing
Entity Distribution in Training Data
| Entity | Count | Percentage |
|---|---|---|
| CONDITION | 1,848 | 22.0% |
| METRIC | 1,480 | 17.7% |
| PROCESS | 1,405 | 16.8% |
| DURATION | 1,193 | 14.2% |
| ELECTROLYTE | 1,127 | 13.4% |
| MATERIAL | 895 | 10.7% |
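The percentages above are shares of the 8,381 total entities, not of the six listed counts. The sketch below reproduces them and shows that the listed counts sum to 7,948, which leaves 433 entities outside this table. The card does not say which labels account for that remainder.

```python
# Entity counts copied from the table above.
counts = {
    "CONDITION": 1848,
    "METRIC": 1480,
    "PROCESS": 1405,
    "DURATION": 1193,
    "ELECTROLYTE": 1127,
    "MATERIAL": 895,
}
TOTAL_ENTITIES = 8381  # "Total Entities" from the Training Data list

# Share of each label relative to the full entity total.
shares = {label: n / TOTAL_ENTITIES for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")

listed = sum(counts.values())
unlisted = TOTAL_ENTITIES - listed
print(f"listed={listed} unlisted={unlisted}")
```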
Training Hyperparameters
- Base Model: batteryonline/batterybert-cased
- Learning Rate: 2e-5
- Batch Size: 16
- Epochs: 3
- Max Sequence Length: 128
- Optimizer: AdamW
- Training Regime: fp16 mixed precision
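For readers who want to reproduce a similar run, the hyperparameters above map onto Hugging Face `TrainingArguments` roughly as sketched below. This is not the author's training script. The output directory is a hypothetical path, `optim="adamw_torch"` is an assumption (the card only says AdamW), and treating the batch size as per-device assumes a single GPU.

```python
BASE_MODEL = "batteryonline/batterybert-cased"  # base model ID as given above
MAX_SEQ_LENGTH = 128  # applied when tokenizing, not through TrainingArguments

# Keys follow the parameter names of transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "ner-v4-checkpoints",  # hypothetical path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes one GPU
    "num_train_epochs": 3,
    "optim": "adamw_torch",  # assumption: the card only states "AdamW"
    "fp16": True,  # mixed precision; requires a CUDA GPU
}


def build_training_arguments():
    """Build TrainingArguments lazily so the dict can be inspected without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(**training_kwargs)
```

The maximum sequence length would be enforced at tokenization time, for example by passing `truncation=True, max_length=MAX_SEQ_LENGTH` to the tokenizer.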
Training Infrastructure
- Hardware: NVIDIA T4 GPU (Google Colab)
- Training Time: ~15 minutes
Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Dmjdxb/batterybert-electrocatalyst-ner-v4")
model = AutoModelForTokenClassification.from_pretrained("Dmjdxb/batterybert-electrocatalyst-ner-v4")

# aggregation_strategy="simple" merges word pieces into whole entity spans
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "IrO2 showed 10 mA/cm² at 1.6 V vs RHE in 0.5 M H2SO4 after 100 h."
entities = ner_pipeline(text)

# Print each entity's type, text, and confidence as a whole-number percentage
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.0%})")
Expected Output
MATERIAL: IrO2 (98%)
METRIC: 10 mA/cm² (98%)
METRIC: 1.6 V vs RHE (99%)
ELECTROLYTE: 0.5 M H2SO4 (99%)
DURATION: 100 h (87%)
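Downstream code usually needs the entities grouped by type, often with low-confidence predictions dropped. The sketch below shows one way to do that on a list shaped like the pipeline's aggregated output. The function name `group_entities` and the 0.9 threshold are illustrative choices. The sample list is hand-written from the expected output above, using its rounded scores, and is not fresh model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.9):
    """Group entity texts by type, keeping only predictions at or above min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written sample mirroring the expected output above.
sample = [
    {"entity_group": "MATERIAL", "word": "IrO2", "score": 0.98},
    {"entity_group": "METRIC", "word": "10 mA/cm²", "score": 0.98},
    {"entity_group": "METRIC", "word": "1.6 V vs RHE", "score": 0.99},
    {"entity_group": "ELECTROLYTE", "word": "0.5 M H2SO4", "score": 0.99},
    {"entity_group": "DURATION", "word": "100 h", "score": 0.87},
]

grouped = group_entities(sample, min_score=0.9)
print(grouped)
```

With this threshold the DURATION span (87%) is filtered out. Lowering `min_score` trades precision for recall.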
Version History
| Version | F1 Score | Training Data | Notes |
|---|---|---|---|
| v2 | 41% | ~500 sentences | Initial fine-tuning |
| v3 | 68% | 1,824 sentences | Improved training data |
| v4 | 83.5% | 4,985 sentences | Expanded corpus, cleaned labels |
Intended Use
This model is designed for:
- Extracting experimental parameters from electrocatalyst research papers
- Building structured databases of catalyst durability data
- Automating literature review for materials science research
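As an illustration of the structured-database use case, the sketch below flattens pipeline-style entity dicts into rows and serializes them as CSV using only the standard library. The column layout, the helper names, and the `paper-001` source ID are hypothetical. The sample entities are hand-written in the shape the aggregated pipeline returns (`entity_group`, `word`, `score`, `start`, `end`).

```python
import csv
import io

FIELDS = ["source", "entity_type", "text", "score", "start", "end"]


def entities_to_rows(source_id, entities):
    """Convert pipeline-style entity dicts into flat rows tagged with a source ID."""
    return [
        {
            "source": source_id,
            "entity_type": ent["entity_group"],
            "text": ent["word"],
            "score": round(float(ent["score"]), 4),
            "start": ent.get("start"),
            "end": ent.get("end"),
        }
        for ent in entities
    ]


def rows_to_csv(rows):
    """Serialize rows to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample; offsets refer to the example sentence in the Usage section.
sample = [
    {"entity_group": "MATERIAL", "word": "IrO2", "score": 0.98, "start": 0, "end": 4},
    {"entity_group": "DURATION", "word": "100 h", "score": 0.87, "start": 59, "end": 64},
]

csv_text = rows_to_csv(entities_to_rows("paper-001", sample))
print(csv_text)
```

Keeping the character offsets alongside each entity makes it possible to trace a database row back to the exact span in the source sentence.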
Limitations
- Trained primarily on English-language academic papers
- May not generalize well to patents or informal text
- SUPPORT and FAILURE_MODE entities have limited training examples and do not appear among the supported entity types or per-entity results above
Citation
If you use this model, please cite:
@misc{batterybert-electrocatalyst-ner-v4,
author = {DurabilityGraph-AI},
title = {BatteryBERT Electrocatalyst NER v4},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/Dmjdxb/batterybert-electrocatalyst-ner-v4}
}
Acknowledgments
- Base model: BatteryBERT by Battery Online
- Training data sourced from open-access publications