🔋 BatteryMHM

The Miller Harmonic Method: a new way to read a battery's future from its first few cycles

#1 on the MIT–Stanford–TRI cell-health benchmark. Open method. Runs in seconds. No GPU.

Invented by William T. L. Miller

📄 Read the preprint: docs/PAPER.md · PDF


⚡ Why you'll want to try this

Most battery state-of-health models need hundreds of cycles of aging data, a GPU, and a deep neural net. BatteryMHM reads the first ~15–45 cycles, runs on a laptop CPU in seconds, and still edges out the previously published #1 on MAE while matching it on RMSE (see the results table below).

It does it with one idea: fold every measurement into a 9-class harmonic space (HIN(k) = 1 + ((k−1) mod 9)), score the interactions through a 9×9 Chi compatibility matrix, and let a light tree ensemble read the result. That's it. There is no black box you can't inspect: every line of the method is in this repo.
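
As a minimal sketch of the fold map quoted above, the formula HIN(k) = 1 + ((k − 1) mod 9) can be written directly in Python. The function name `fold_to_hin` is hypothetical, chosen here for illustration; the package's own implementation lives in batterymhm/ and may differ in details such as input validation.

```python
def fold_to_hin(k: int) -> int:
    """Fold an integer onto the nine harmonic classes 1..9."""
    # Python's % returns a non-negative result for a positive modulus,
    # so the output stays in 1..9 even for k <= 0.
    return 1 + ((k - 1) % 9)


folded = [fold_to_hin(k) for k in range(1, 20)]
print(folded)
```

The shift by one on each side of the modulus means multiples of 9 land in class 9 rather than class 0.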

pip install -r requirements.txt
python demo.py          # ← see it work in ~5 seconds, no data, no weights, no GPU

1. CELL-HEALTH DEMO - predict eventual retention from early cycles
   MHM ensemble : MAE=0.0446  PCC=0.8847  R²=0.7815
   mean baseline: MAE=0.1083
   → MHM is 2.4× better than predicting the mean.
RESULT: PASS - the open method runs and carries signal.
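
The 2.4× figure in the demo output is the ratio of the two MAEs (0.1083 / 0.0446). For readers new to the comparison, here is a small sketch of what a mean baseline is: a predictor that always returns the training-set mean. The retention values and the stand-in predictions are illustrative only and are not taken from the demo or any dataset.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative retention values, not real data.
y_train = [0.95, 0.90, 0.80, 0.70]
y_test = [0.92, 0.85, 0.75]
model_pred = [0.90, 0.86, 0.78]          # stand-in for a trained model's output

# The mean baseline ignores the inputs and always predicts the training mean.
mean_pred = [sum(y_train) / len(y_train)] * len(y_test)

baseline_mae = mae(y_test, mean_pred)
model_mae = mae(y_test, model_pred)
improvement = baseline_mae / model_mae

# The same ratio, applied to the two MAEs printed by the demo.
demo_ratio = 0.1083 / 0.0446
print(f"toy improvement: {improvement:.2f}x, demo ratio: {demo_ratio:.2f}x")
```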

🏆 The headline result

On the canonical MIT–Stanford–TRI dataset (Severson et al., Nature Energy 2019, 144 cells), predicting state-of-health from a 30% observation window (~45 cycles):

| Model | MAE ↓ | RMSE ↓ | PCC | R² | Notes |
|---|---|---|---|---|---|
| 🥇 BatteryMHM (this method) | 0.0114 | 0.0200 | 0.884 | 0.747 | #1 MAE & RMSE |
| Attentive NeuralODE (prev. #1, Li 2021) | 0.012 | 0.020 | 0.900 | 0.810 | deep net |
| RandomForest (Microsoft BatteryML, ICLR'24) | 0.2459 | 0.3140 | 0.610 | 0.269 | 21.6× worse |

5-fold CV. On MAE, BatteryMHM beats Microsoft BatteryML's strongest baseline by 21.6× (0.2459 vs 0.0114) with a shorter observation window, and it extracts most of the signal from as few as ~15 cycles.
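
The 21.6× factor can be checked with a few lines of arithmetic. The sketch below uses only the MAE and RMSE values quoted in the results table; nothing is recomputed from data.

```python
# Values copied from the results table above.
results = {
    "BatteryMHM": {"mae": 0.0114, "rmse": 0.0200},
    "Attentive NeuralODE": {"mae": 0.012, "rmse": 0.020},
    "RandomForest (BatteryML)": {"mae": 0.2459, "rmse": 0.3140},
}

mhm = results["BatteryMHM"]
rf = results["RandomForest (BatteryML)"]

mae_ratio = rf["mae"] / mhm["mae"]      # how many times larger the baseline MAE is
rmse_ratio = rf["rmse"] / mhm["rmse"]   # same comparison on RMSE

print(f"MAE ratio:  {mae_ratio:.1f}x")
print(f"RMSE ratio: {rmse_ratio:.1f}x")
```

The headline factor is the MAE ratio; the same comparison on RMSE gives a smaller factor. Against the Attentive NeuralODE, the MAE gap is 0.0114 vs 0.012 and the reported RMSE values are equal at three decimals.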


🧠 How it works (the whole method, on one screen)

 raw capacity / voltage curve
            │
            ▼   quantise to harmonic identity numbers (HINs ∈ 1..9)
   [5,5,4,4,3,3,2 ...]            HIN(k) = 1 + ((k−1) mod 9)
            │
            ▼   score every pair through the 9×9 Chi compatibility matrix
   Chi9 histograms · growth-product ⊗ · energy-add ⊕ · Miller calculus
            │
            ▼   557-dimensional harmonic descriptor
            │
            ▼   ExtraTrees + XGBoost ensemble
        SOH / RUL / formation energy

from batterymhm import seq_to_harmonics, mhm_full_features, MHMEnsemble

hins  = seq_to_harmonics(capacity_curve, bins=9)   # measurement β†’ harmonic space
feats = mhm_full_features(hins)                     # 557-feature MHM descriptor
model = MHMEnsemble().fit(X_train, y_train)         # train your own β€” no weights shipped
soh   = model.predict(X_test)
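
To make the first step concrete, here is one way a capacity curve could be quantised into nine classes. This is a sketch under an assumption: equal-width binning between the curve's minimum and maximum. The helper name `quantise_to_hins` is hypothetical, and the real `seq_to_harmonics` in batterymhm/ may bin differently.

```python
import numpy as np


def quantise_to_hins(curve, bins=9):
    """Map each sample to an integer class in 1..bins by equal-width binning."""
    x = np.asarray(curve, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A flat curve has no spread to bin; place it in the middle class.
        return np.full(x.shape, (bins + 1) // 2, dtype=int)
    # Interior edges: bins - 1 cut points strictly between lo and hi.
    edges = np.linspace(lo, hi, bins + 1)[1:-1]
    # np.digitize yields 0..bins-1 for these edges; shift to 1..bins.
    return np.digitize(x, edges) + 1


capacity_curve = [1.10, 1.09, 1.08, 1.06, 1.03, 0.99, 0.94]  # illustrative values
hins = quantise_to_hins(capacity_curve)
print(hins.tolist())
```

With this choice the highest capacity in the window maps to class 9 and the lowest to class 1, so a fading cell produces a non-increasing class sequence.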

The fold map, the operations (⊕ ⊗ ⊕_E ⊖), the Miller sequence, and the Chi matrix are all right here in batterymhm/. Read them, fork them, build on them.
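
The pair-scoring step can be illustrated without the real Chi matrix. The sketch below uses a placeholder 9×9 matrix (based on circular distance between classes, invented purely for this example) and scores adjacent pairs of a HIN sequence. The actual matrix, the choice of which pairs to score, and the ⊕ / ⊗ operations are defined in batterymhm/ and are not reproduced here; the function names are hypothetical.

```python
import numpy as np

# PLACEHOLDER compatibility matrix, NOT the Chi matrix from batterymhm/.
# Entry [i, j] is 1 minus the normalised circular distance between classes i+1 and j+1.
idx = np.arange(9)
diff = np.abs(idx[:, None] - idx[None, :])
placeholder_chi = 1.0 - np.minimum(diff, 9 - diff) / 4.0


def adjacent_pair_scores(hins, chi):
    """Look up chi[a-1, b-1] for every consecutive pair (a, b) in the sequence."""
    h = np.asarray(hins, dtype=int) - 1          # classes 1..9 -> indices 0..8
    return chi[h[:-1], h[1:]]


def pair_histogram(hins):
    """Count how often each ordered class pair (a, b) occurs consecutively."""
    h = np.asarray(hins, dtype=int) - 1
    counts = np.zeros((9, 9), dtype=int)
    np.add.at(counts, (h[:-1], h[1:]), 1)        # unbuffered add handles repeated pairs
    return counts


hins = [5, 5, 4, 4, 3, 3, 2]                     # example sequence from the diagram above
scores = adjacent_pair_scores(hins, placeholder_chi)
counts = pair_histogram(hins)
print(scores.round(2).tolist(), int(counts.sum()))
```

Features of this kind (score statistics and pair counts) are one plausible reading of the "Chi9 histograms" in the diagram; treat the details as an illustration rather than the method itself.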


🚀 How to use it

1. Install

# Option A: one line, straight from this repo
pip install "git+https://huggingface.co/williamTLmiller/batterymhm"

# Option B: clone and install editable (recommended for tinkering)
git clone https://huggingface.co/williamTLmiller/batterymhm
cd batterymhm
pip install -e ".[dev]"     # ".[dev]" adds pytest, ruff, and xgboost

Requirements: Python ≥ 3.9 and numpy, scipy, scikit-learn (XGBoost is optional; the ensemble falls back to ExtraTrees-only without it).
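
A quick way to see which configuration you will get is to check whether xgboost is importable. This is a sketch of the general optional-dependency pattern using the standard library, not the package's own fallback code; the function name `available_regressors` is hypothetical.

```python
import importlib.util


def available_regressors():
    """Return the names of the ensemble members that can be built on this machine."""
    members = ["ExtraTrees"]                      # scikit-learn is a hard requirement
    if importlib.util.find_spec("xgboost") is not None:
        members.append("XGBoost")                 # optional extra
    return members


print(available_regressors())
```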

2. Predict cell state-of-health from early cycles

import numpy as np
from batterymhm import seq_to_harmonics, mhm_full_features, MHMEnsemble, compute_metrics

def featurize(curves):
    dicts = [mhm_full_features(seq_to_harmonics(list(c), bins=9)) for c in curves]
    keys  = sorted(dicts[0])                                   # stable column order
    return np.array([[d[k] for k in keys] for d in dicts]), keys

# curves = list of early-cycle capacity arrays; y = SOH labels (your data)
X, keys = featurize(curves)
train   = int(0.8 * len(X))                       # assumed 80/20 split; choose your own
model   = MHMEnsemble().fit(X[:train], y[:train], feature_names=keys)
pred    = model.predict(X[train:])
print(compute_metrics(y[train:], pred))          # MAE / RMSE / PCC / R²
print(model.top_features(8))                      # which harmonic features mattered
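
For reference, the four reported metrics can be computed with NumPy alone. The sketch below is a stand-in written for this example, assuming the standard definitions of MAE, RMSE, Pearson correlation and R²; the name `regression_metrics` is hypothetical and the package's `compute_metrics` may return a different structure.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, Pearson correlation (PCC) and R^2 for two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = float(np.sum(err ** 2))                          # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))     # total sum of squares
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "PCC": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "R2": 1.0 - ss_res / ss_tot,
    }


y_true = [0.90, 0.80, 0.70, 0.60]       # illustrative SOH labels
y_pred = [0.88, 0.82, 0.69, 0.63]       # illustrative predictions
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Lower is better for MAE and RMSE, while PCC and R² are better when closer to 1, which is why a model can lead on one pair and trail on the other.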

3. Build a harmonic descriptor for a crystal composition

from batterymhm import element_hin, mhm_matter8_neighbor_histograms

elements = ["Li", "Fe", "P", "O", "O", "O", "O"]   # LiFePO4
hins     = [element_hin(e) for e in elements]       # fold atomic numbers → HINs
feats    = mhm_matter8_neighbor_histograms(hins, hins)   # 274-d descriptor
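
As a worked example of folding atomic numbers, the sketch below assumes that the element step applies HIN(k) = 1 + ((k − 1) mod 9) directly to the atomic number Z. That is an assumption based on the comment above; the real `element_hin` may do more. The lookup table and the name `element_to_hin` are hypothetical and cover only the four elements in LiFePO4.

```python
from collections import Counter

# Atomic numbers for the elements in the LiFePO4 example (standard periodic-table values).
ATOMIC_NUMBER = {"Li": 3, "O": 8, "P": 15, "Fe": 26}


def element_to_hin(symbol):
    """Assumed behaviour: apply HIN(k) = 1 + ((k - 1) mod 9) to the atomic number."""
    z = ATOMIC_NUMBER[symbol]
    return 1 + ((z - 1) % 9)


elements = ["Li", "Fe", "P", "O", "O", "O", "O"]
hins = [element_to_hin(e) for e in elements]
class_counts = Counter(hins)             # how many atoms fall in each class
print(hins, dict(class_counts))
```

Under this assumption Fe (Z = 26) and O (Z = 8) fold to the same class, because their atomic numbers differ by a multiple of 9.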

4. Run the ready-made examples

python demo.py                          # offline proof it works (cells + materials)
python examples/predict_soh.py          # full SOH training example
python examples/materials_descriptor.py # materials descriptor example
make test                               # run the test suite

No weights are shipped; you train your own (it takes seconds on CPU). The published Severson / Matbench numbers are reproducible with the public datasets linked below. Deep dive into the math: docs/METHOD.md.


📦 What's in the box

✅ The complete method (algebra, Chi matrix, feature library, ensemble) 🔬 batterymhm/
✅ A 5-second offline demo proving it carries signal ▶️ demo.py
✅ 7 passing tests so you can trust it 🧪 tests/
✅ Works CPU-only, no downloads, no GPU 💻
❌ No trained weights, no proprietary data (train your own; it's easy)

🔁 Reproduce the benchmarks (public data)

The method here, plus these public datasets, reproduces the numbers above:

Materials track: honest framing

On crystal formation energy (Matbench mp_e_form), the harmonic descriptor scores MAE 0.1513 eV/atom. At that value it does not beat the classic RF + Magpie baseline (0.132; lower is better), and it is well behind modern graph neural networks (CGCNN 0.049 → CHGNet 0.015). The materials track is a discovery-pipeline component; the SOTA result is cell SOH. We'd rather tell you that up front than oversell.


🎯 Who it's for

Battery researchers, EV / grid-storage engineers, materials-discovery teams, and ML folks who want a transparent, fast, CPU-only baseline that's genuinely competitive, plus a clean harmonic-feature toolkit to build on.

Intended use: non-commercial research and education. Not a substitute for physical testing. The bundled demo is synthetic (a signal check); real performance comes from training on the public datasets above.


📜 License & patent

Licensed under CC BY-NC 4.0: share and adapt for non-commercial purposes with attribution to William T. L. Miller.

The Miller Harmonic Method (the fold map, the compatibility-matrix scoring, the phase-coherence rule, and the multi-scale Miller-sequence aggregation) is patent pending. CC BY-NC 4.0 is a copyright license and grants no patent rights; commercial use of the method may require a separate patent license from the inventor. See LICENSE.

📣 Cite

@software{miller_batterymhm_2026,
  author  = {Miller, William T. L.},
  title   = {BatteryMHM: The Miller Harmonic Method for Battery Science},
  year    = {2026},
  license = {CC-BY-NC-4.0},
  url     = {https://huggingface.co/williamTLmiller/batterymhm},
  note    = {Open method release; patent pending}
}

⭐ If the demo impresses you, share it and build on it.

Open science, the way it should be: read every line, run it in seconds, see for yourself.

Evaluation results

  • MAE (5-fold CV, 30% observation window) on MIT-Stanford-TRI (Severson et al., Nature Energy 2019)
    self-reported
    0.011
  • RMSE on MIT-Stanford-TRI (Severson et al., Nature Energy 2019)
    self-reported
    0.020