BatteryBERT Electrocatalyst NER v4

A fine-tuned named entity recognition (NER) model that extracts durability-related entities from electrocatalyst research literature.

Model Description

This model is fine-tuned from BatteryBERT for domain-specific NER in electrocatalyst and fuel cell research. It identifies key experimental parameters related to catalyst durability, degradation, and electrochemical performance.

Supported Entity Types

| Entity | Description | Example |
|---|---|---|
| MATERIAL | Catalyst materials and compounds | IrO₂, Pt/C, NiFe-LDH |
| CONDITION | Experimental conditions (voltage, temperature) | 1.6 V vs RHE, 80°C |
| METRIC | Performance measurements | 10 mA/cm², 45 mV/dec |
| PROCESS | Experimental techniques | electrodeposition, annealing, CV |
| ELECTROLYTE | Electrolyte solutions | 0.5 M H₂SO₄, 1 M KOH |
| DURATION | Time periods | 100 h, 5000 cycles |

Performance

Overall Metrics

| Metric | Score |
|---|---|
| F1 | 83.5% |
| Precision | 78.7% |
| Recall | 88.9% |
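
F1 is the harmonic mean of precision and recall, so the three overall figures can be cross-checked against each other. A quick sketch using only the values reported above:

```python
# Overall scores as reported above, expressed as fractions.
precision = 0.787
recall = 0.889

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.1%}")  # F1 = 83.5%
```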

Per-Entity Performance

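| Entity | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| CONDITION | 0.75 | 0.91 | 0.82 | 175 |
| DURATION | 0.79 | 0.89 | 0.84 | 133 |
| ELECTROLYTE | 0.79 | 0.94 | 0.86 | 94 |
| MATERIAL | 0.83 | 0.80 | 0.81 | 90 |
| METRIC | 0.75 | 0.87 | 0.81 | 135 |
| PROCESS | 0.86 | 0.90 | 0.88 | 151 |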
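
The card does not say how the overall scores were averaged. As a rough check, the sketch below assumes they are micro-averaged and rebuilds them from the per-entity rows: true positives are estimated as recall × support, and the number of predictions as true positives ÷ precision. Because the inputs are rounded to two decimals, the result only approximates the reported figures.

```python
# Per-entity rows as reported above: (precision, recall, support).
rows = {
    "CONDITION": (0.75, 0.91, 175),
    "DURATION": (0.79, 0.89, 133),
    "ELECTROLYTE": (0.79, 0.94, 94),
    "MATERIAL": (0.83, 0.80, 90),
    "METRIC": (0.75, 0.87, 135),
    "PROCESS": (0.86, 0.90, 151),
}

# Estimated true positives per type: recall * number of gold entities.
tp = {name: r * n for name, (p, r, n) in rows.items()}
# Estimated number of predicted entities per type: true positives / precision.
predicted = {name: tp[name] / p for name, (p, r, n) in rows.items()}

total_gold = sum(n for (_, _, n) in rows.values())
micro_recall = sum(tp.values()) / total_gold
micro_precision = sum(tp.values()) / sum(predicted.values())
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"gold entities:   {total_gold}")
print(f"micro precision: {micro_precision:.3f}")
print(f"micro recall:    {micro_recall:.3f}")
print(f"micro F1:        {micro_f1:.3f}")
```

Under this assumption the reconstructed recall lands on the reported 88.9%, and precision and F1 fall within a few tenths of a point of the reported values, which is consistent with micro-averaging but does not prove it.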

Training Details

Training Data

  • Sentences: 4,985
  • Total Entities: 8,381
  • Source: 245 open-access electrocatalyst research papers from MDPI, Nature Communications, Frontiers, and PubMed Central
  • Focus: Catalyst durability, degradation mechanisms, accelerated stress testing

Entity Distribution in Training Data

| Entity | Count | Percentage |
|---|---|---|
| CONDITION | 1,848 | 22.0% |
| METRIC | 1,480 | 17.7% |
| PROCESS | 1,405 | 16.8% |
| DURATION | 1,193 | 14.2% |
| ELECTROLYTE | 1,127 | 13.4% |
| MATERIAL | 895 | 10.7% |
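
The percentages are taken relative to the 8,381 total entities, not to the sum of the six listed counts. The sketch below reproduces them and shows that the listed types account for 7,948 entities, leaving 433 (about 5.2%) without a breakdown. These may belong to the additional label types mentioned under Limitations, but that is an inference and is not stated in the card.

```python
# Counts as reported in the table above.
counts = {
    "CONDITION": 1848,
    "METRIC": 1480,
    "PROCESS": 1405,
    "DURATION": 1193,
    "ELECTROLYTE": 1127,
    "MATERIAL": 895,
}
total_entities = 8381  # "Total Entities" from the Training Data list

# Share of each listed type relative to the full entity count.
for name, count in counts.items():
    print(f"{name:<12}{count:>6,}{count / total_entities:>8.1%}")

listed = sum(counts.values())
remainder = total_entities - listed
print(f"listed: {listed:,}  unlisted: {remainder} ({remainder / total_entities:.1%})")
```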

Training Hyperparameters

  • Base Model: batteryonline/batterybert-cased
  • Learning Rate: 2e-5
  • Batch Size: 16
  • Epochs: 3
  • Max Sequence Length: 128
  • Optimizer: AdamW
  • Training Regime: fp16 mixed precision
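
The training script is not part of this card. As a minimal sketch, the hyperparameters above could map onto `transformers.TrainingArguments` roughly as follows. The helper names, the output directory argument, and the single-GPU reading of "Batch Size: 16" are assumptions made for illustration. `Trainer` uses an AdamW optimizer by default, so no optimizer is set explicitly, and the 128-token limit is applied at tokenization time, not in the training arguments.

```python
import math

# Hyperparameters from the list above, using TrainingArguments keyword names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes one GPU, no gradient accumulation
    "num_train_epochs": 3,
    "fp16": True,  # mixed precision; needs a CUDA device at run time
}
MAX_SEQ_LENGTH = 128  # passed to the tokenizer (max_length / truncation), not to Trainer


def build_training_args(output_dir: str):
    """Hypothetical helper: builds TrainingArguments from the values above.

    transformers is imported lazily so the constants can be inspected
    on a machine without the library or a GPU.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def total_optimizer_steps(num_sentences: int) -> int:
    """Optimizer steps over all epochs, keeping the last partial batch."""
    batch_size = TRAINING_KWARGS["per_device_train_batch_size"]
    steps_per_epoch = math.ceil(num_sentences / batch_size)
    return steps_per_epoch * TRAINING_KWARGS["num_train_epochs"]


# 936 steps if all 4,985 sentences were used for training (an assumption;
# the card does not describe the train/evaluation split).
print(total_optimizer_steps(4985))
```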

Training Infrastructure

  • Hardware: NVIDIA T4 GPU (Google Colab)
  • Training Time: ~15 minutes

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model
tokenizer = AutoTokenizer.from_pretrained("Dmjdxb/batterybert-electrocatalyst-ner-v4")
model = AutoModelForTokenClassification.from_pretrained("Dmjdxb/batterybert-electrocatalyst-ner-v4")

# Create pipeline
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Extract entities
text = "IrO2 showed 10 mA/cm² at 1.6 V vs RHE in 0.5 M H2SO4 after 100 h."
entities = ner_pipeline(text)

for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.0%})")

Expected Output

MATERIAL: IrO2 (98%)
METRIC: 10 mA/cm² (98%)
METRIC: 1.6 V vs RHE (99%)
ELECTROLYTE: 0.5 M H2SO4 (99%)
DURATION: 100 h (87%)
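
Downstream code often needs the entities grouped by type and filtered by confidence. The sketch below works on a hand-written sample in the shape the pipeline returns with `aggregation_strategy="simple"` (real results also carry `start` and `end` character offsets). The `group_entities` helper and the 0.9 threshold are illustrative choices, not part of the model.

```python
from collections import defaultdict

# Hand-written sample shaped like the pipeline output; scores are rounded.
sample_entities = [
    {"entity_group": "MATERIAL", "word": "IrO2", "score": 0.98},
    {"entity_group": "METRIC", "word": "10 mA/cm²", "score": 0.98},
    {"entity_group": "ELECTROLYTE", "word": "0.5 M H2SO4", "score": 0.99},
    {"entity_group": "DURATION", "word": "100 h", "score": 0.87},
]


def group_entities(entities, min_score=0.9):
    """Group entity texts by type, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


record = group_entities(sample_entities)
print(record)
# {'MATERIAL': ['IrO2'], 'METRIC': ['10 mA/cm²'], 'ELECTROLYTE': ['0.5 M H2SO4']}
```

With the 0.9 threshold, the DURATION span (87%) is dropped. Lowering `min_score` trades precision for recall, which may suit this model given that its reported recall is higher than its precision.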

Version History

| Version | F1 Score | Training Data | Notes |
|---|---|---|---|
| v2 | 41% | ~500 sentences | Initial fine-tuning |
| v3 | 68% | 1,824 sentences | Improved training data |
| v4 | 83.5% | 4,985 sentences | Expanded corpus, cleaned labels |

Intended Use

This model is designed for:

  • Extracting experimental parameters from electrocatalyst research papers
  • Building structured databases of catalyst durability data
  • Automating literature review for materials science research
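
Since the model was trained on individual sentences capped at 128 tokens, one plausible way to build such a database is to run it sentence by sentence over a paper and keep one row per entity. The sketch below is only an outline: the function names and row fields are hypothetical, the regex splitter is deliberately crude (it handles decimals such as "0.5 M" but would wrongly split after "vs. RHE" or "et al."), and the NER step is passed in as a callable so that any function returning pipeline-style dictionaries can be used.

```python
import re
from typing import Callable, List

# Split on whitespace that follows ., ! or ? and precedes an uppercase letter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; a dedicated tokenizer would be more robust."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def extract_records(text: str, ner: Callable[[str], List[dict]]) -> List[dict]:
    """Run `ner` on each sentence and flatten the results into table rows."""
    records = []
    for idx, sentence in enumerate(split_sentences(text)):
        for ent in ner(sentence):
            records.append(
                {
                    "sentence_id": idx,
                    "sentence": sentence,
                    "type": ent["entity_group"],
                    "text": ent["word"],
                    "score": float(ent["score"]),
                }
            )
    return records
```

Passing the `ner_pipeline` object from the Usage section as `ner` would yield rows that can be written out with the standard `csv` module or loaded into a dataframe.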

Limitations

  • Trained primarily on English-language academic papers
  • May not generalize well to patents or informal text
  • SUPPORT and FAILURE_MODE entities have limited training examples and are not among the six entity types with reported metrics above

Citation

If you use this model, please cite:

@misc{batterybert-electrocatalyst-ner-v4,
  author = {DurabilityGraph-AI},
  title = {BatteryBERT Electrocatalyst NER v4},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Dmjdxb/batterybert-electrocatalyst-ner-v4}
}

Acknowledgments

  • Base model: BatteryBERT by Battery Online
  • Training data sourced from open-access publications