🔋 BatteryMHM

The Miller Harmonic Method — a new way to read a battery's future from its first few cycles

#1 on the MIT–Stanford–TRI cell-health benchmark. Open method. Runs in seconds. No GPU.

Invented by William T. L. Miller

📄 Read the preprint: docs/PAPER.md · PDF

⚡ Why you'll want to try this

Most battery state-of-health models need hundreds of cycles of aging data, a GPU, and a deep neural net. BatteryMHM reads the first ~15–45 cycles, runs on a laptop CPU in seconds, and still beats the published #1 on MAE.

It does it with one idea: fold every measurement into a 9-class harmonic space (HIN(k) = 1 + ((k−1) mod 9)), score the interactions through a 9×9 Chi compatibility matrix, and let a light tree ensemble read the result. That's it. No black box you can't inspect — every line of the method is in this repo.
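The fold map can be checked by hand. A minimal sketch, assuming only the formula quoted above; the function name `hin` is illustrative and may not match what the repo exports:

```python
def hin(k: int) -> int:
    """Fold a positive integer onto one of nine classes, 1..9."""
    return 1 + ((k - 1) % 9)

# 1..9 map to themselves; 10 wraps back to class 1.
print([hin(k) for k in range(1, 11)])   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
print(hin(18), hin(19), hin(45))        # 9 1 9
```

Because the result is always in 1..9, any pair of folded values can index directly into a 9×9 matrix.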

pip install -r requirements.txt
python demo.py          # ← see it work in ~5 seconds, no data, no weights, no GPU

1. CELL-HEALTH DEMO — predict eventual retention from early cycles
   MHM ensemble : MAE=0.0446  PCC=0.8847  R²=0.7815
   mean baseline: MAE=0.1083
   → MHM is 2.4× better than predicting the mean.
RESULT: PASS — the open method runs and carries signal.

🏆 The headline result

On the canonical MIT–Stanford–TRI dataset (Severson et al., Nature Energy 2019, 144 cells), predicting state-of-health from a 30% observation window (~45 cycles):

Model	MAE ↓	RMSE ↓	PCC	R²	Notes
🥇 BatteryMHM (this method)	0.0114	0.0200	0.884	0.747	#1 MAE; RMSE tied with prev. #1
Attentive NeuralODE (prev. #1, Li 2021)	0.012	0.020	0.900	0.810	deep net
RandomForest (Microsoft BatteryML, ICLR'24)	0.2459	0.3140	0.610	0.269	21.6× higher MAE

All figures are from 5-fold cross-validation. BatteryMHM's MAE is 21.6× lower than that of Microsoft BatteryML's strongest baseline (0.0114 vs 0.2459), with a shorter observation window, and it extracts most of the signal from as few as ~15 cycles.
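For readers new to the protocol, here is a minimal sketch of a 5-fold split plus a check of the headline ratio. The shuffling, the seed, and the helper name `kfold_indices` are assumptions for illustration; the source does not say how the repo builds its folds.

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)               # assumed: shuffled split
    folds = [idx[i::k] for i in range(k)]          # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

n_cells = 144                                      # cell count quoted above
splits = list(kfold_indices(n_cells))
print([len(test) for _, test in splits])           # [29, 29, 29, 29, 28]

# Ratio of the two MAE values in the table above.
print(round(0.2459 / 0.0114, 1))                   # 21.6
```

Each model is fitted on the training indices of a fold and scored on its held-out indices, and the reported metric is aggregated over the five folds.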

🧠 How it works (the whole method, on one screen)

 raw capacity / voltage curve
            │
            ▼   quantise to harmonic identity numbers (HINs ∈ 1..9)
   [5,5,4,4,3,3,2 ...]            HIN(k) = 1 + ((k−1) mod 9)
            │
            ▼   score every pair through the 9×9 Chi compatibility matrix
   Chi9 histograms · growth-product ⊗ · energy-add ⊕ · Miller calculus
            │
            ▼   557-dimensional harmonic descriptor
            │
            ▼   ExtraTrees + XGBoost ensemble
        SOH / RUL / formation energy

from batterymhm import seq_to_harmonics, mhm_full_features, MHMEnsemble

hins  = seq_to_harmonics(capacity_curve, bins=9)   # measurement → harmonic space
feats = mhm_full_features(hins)                     # 557-feature MHM descriptor
model = MHMEnsemble().fit(X_train, y_train)         # train your own — no weights shipped
soh   = model.predict(X_test)

The fold map, the operations (⊕ ⊗ ⊕_E ⊖), the Miller sequence, and the Chi matrix are all right here in batterymhm/ — read them, fork them, build on them.
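To make the first two stages of the diagram concrete, here is a rough sketch in plain Python. Both pieces are assumptions: equal-width binning stands in for whatever `seq_to_harmonics` actually does, and `CHI_PLACEHOLDER` is an invented matrix used only to show the lookup pattern. It is not the Chi matrix from `batterymhm/`.

```python
from collections import Counter

def quantise(curve, bins=9):
    """Equal-width binning of a sequence into levels 1..bins (an assumption)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:                       # flat curve: everything lands in level 1
        return [1] * len(curve)
    width = (hi - lo) / bins
    # min(..., bins) keeps the maximum value inside the top level
    return [min(int((v - lo) / width) + 1, bins) for v in curve]

def circ_dist(a, b):
    """Distance between two classes on a ring of nine."""
    d = abs(a - b) % 9
    return min(d, 9 - d)

# PLACEHOLDER matrix, NOT the repo's Chi matrix: 1.0 for identical classes,
# falling linearly to 0.0 at the largest circular distance (4).
CHI_PLACEHOLDER = [[1.0 - circ_dist(a, b) / 4.0 for b in range(1, 10)]
                   for a in range(1, 10)]

def pair_score_histogram(hins):
    """Count compatibility scores over adjacent pairs of the sequence."""
    scores = [CHI_PLACEHOLDER[a - 1][b - 1] for a, b in zip(hins, hins[1:])]
    return Counter(scores)

capacity = [1.10, 1.09, 1.07, 1.06, 1.03, 1.01, 0.98, 0.95]
levels = quantise(capacity)
print(levels)
print(pair_score_histogram(levels))
```

A histogram like this is one fixed-length summary of a variable-length curve, which is the kind of input a tree ensemble needs. The real descriptor combines many such summaries into 557 features.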

🚀 How to use it

1. Install

# Option A — one line, straight from this repo
pip install "git+https://huggingface.co/williamTLmiller/batterymhm"

# Option B — clone and install editable (recommended for tinkering)
git clone https://huggingface.co/williamTLmiller/batterymhm
cd batterymhm
pip install -e ".[dev]"     # ".[dev]" adds pytest, ruff, and xgboost

Requirements: Python ≥ 3.9 and numpy, scipy, scikit-learn (XGBoost is optional — the ensemble falls back to ExtraTrees-only without it).
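If you want to know in advance which path the ensemble will take on your machine, a small standard-library check is enough. This is a sketch only; the helper name `available_backends` is hypothetical, and the repo may detect XGBoost differently.

```python
import importlib.util

def available_backends():
    """Return the tree backends that can be imported in this environment."""
    backends = ["extratrees"]                       # scikit-learn is required
    if importlib.util.find_spec("xgboost") is not None:
        backends.append("xgboost")                  # optional extra
    return backends

print(available_backends())
```

`find_spec` returns `None` for a missing top-level package instead of raising, so the check has no side effects and does not import XGBoost.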

2. Predict cell state-of-health from early cycles

import numpy as np
from batterymhm import seq_to_harmonics, mhm_full_features, MHMEnsemble, compute_metrics

def featurize(curves):
    dicts = [mhm_full_features(seq_to_harmonics(list(c), bins=9)) for c in curves]
    keys  = sorted(dicts[0])                                   # stable column order
    return np.array([[d[k] for k in keys] for d in dicts]), keys

# curves = list of early-cycle capacity arrays; y = SOH labels (your data)
X, keys = featurize(curves)
train   = int(0.8 * len(X))                       # hold out the last 20% for testing
model   = MHMEnsemble().fit(X[:train], y[:train], feature_names=keys)
pred    = model.predict(X[train:])
print(compute_metrics(y[train:], pred))          # MAE / RMSE / PCC / R²
print(model.top_features(8))                      # which harmonic features mattered
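For reference, the four reported metrics have standard definitions, sketched below in plain Python. This is not the repo's `compute_metrics`; the name `regression_metrics` and the toy numbers are illustrative assumptions.

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, Pearson correlation (PCC) and R² from their textbook definitions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    err = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in err)                       # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)        # total sum of squares
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    denom = math.sqrt(ss_tot * var_p)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(ss_res / n),
        "PCC": cov / denom if denom > 0 else float("nan"),  # undefined for constant input
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

# Toy SOH labels and predictions, made up for this example.
y_true = [0.90, 0.85, 0.80, 0.70]
y_pred = [0.88, 0.86, 0.78, 0.73]
m = regression_metrics(y_true, y_pred)

# Mean baseline: always predict the average label.
mean_label = sum(y_true) / len(y_true)
baseline_mae = sum(abs(t - mean_label) for t in y_true) / len(y_true)
print(m)
print(baseline_mae / m["MAE"])   # how many times lower the model's MAE is
```

The last line is the same kind of ratio the demo prints when it compares the ensemble against predicting the mean.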

3. Build a harmonic descriptor for a crystal composition

from batterymhm import element_hin, mhm_matter8_neighbor_histograms

elements = ["Li", "Fe", "P", "O", "O", "O", "O"]   # LiFePO4
hins     = [element_hin(e) for e in elements]       # fold atomic numbers → HINs
feats    = mhm_matter8_neighbor_histograms(hins, hins)   # 274-d descriptor
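As a worked example of the folding step, assume `element_hin` applies the fold map HIN(k) = 1 + ((k−1) mod 9) to the atomic number, as the comment above suggests. The lookup table and the `fold` helper below are illustrative; the repo's own implementation may differ.

```python
# Standard atomic numbers for the four elements in LiFePO4.
ATOMIC_NUMBER = {"Li": 3, "O": 8, "P": 15, "Fe": 26}

def fold(k: int) -> int:
    """HIN(k) = 1 + ((k - 1) mod 9), as quoted in the method summary."""
    return 1 + ((k - 1) % 9)

elements = ["Li", "Fe", "P", "O", "O", "O", "O"]
hins = [fold(ATOMIC_NUMBER[e]) for e in elements]
print(hins)   # [3, 8, 6, 8, 8, 8, 8]
```

Under this assumption Fe (Z = 26) and O (Z = 8) fold to the same class, 8, so the descriptor sees them as the same harmonic identity.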

4. Run the ready-made examples

python demo.py                          # offline proof it works (cells + materials)
python examples/predict_soh.py          # full SOH training example
python examples/materials_descriptor.py # materials descriptor example
make test                               # run the test suite

No weights are shipped — you train your own (it takes seconds on CPU). The published Severson / Matbench numbers are reproducible with the public datasets linked below. Deep dive into the math: docs/METHOD.md.

📦 What's in the box


✅ The complete method — algebra, Chi matrix, feature library, ensemble	🔬 `batterymhm/`
✅ A 5-second offline demo proving it carries signal	▶️ `demo.py`
✅ 7 passing tests so you can trust it	🧪 `tests/`
✅ Works CPU-only, no downloads, no GPU	💻
❌ No trained weights, no proprietary data	(train your own — it's easy)

🔁 Reproduce the benchmarks (public data)

The method here, plus these public datasets, reproduces the numbers above:

Cell SOH — MIT–Stanford–TRI: https://data.matr.io/1/projects/5c48dd2bc625d700019f3204
Materials — Matbench mp_e_form: https://matbench.materialsproject.org (auto-loads via matminer)

Materials track — honest framing

On crystal formation energy (Matbench mp_e_form), the harmonic descriptor scores MAE 0.1513 eV/atom. Lower is better, so by the figures quoted here it comes close to but does not beat the classic RF + Magpie baseline (0.132), and it is well behind modern graph neural networks (CGCNN 0.049 → CHGNet 0.015). The materials track is a discovery-pipeline component; the SOTA result is cell SOH. We'd rather tell you that up front than oversell.

🎯 Who it's for

Battery researchers, EV / grid-storage engineers, materials-discovery teams, and ML folks who want a transparent, fast, CPU-only baseline that's genuinely competitive — and a clean harmonic-feature toolkit to build on.

Intended use: non-commercial research and education. Not a substitute for physical testing. The bundled demo is synthetic (a signal check); real performance comes from training on the public datasets above.

📜 License & patent

Licensed under CC BY-NC 4.0 — share and adapt for non-commercial purposes with attribution to William T. L. Miller.

The Miller Harmonic Method (the fold map, the compatibility-matrix scoring, the phase-coherence rule, and the multi-scale Miller-sequence aggregation) is patent pending. CC BY-NC 4.0 is a copyright license and grants no patent rights; commercial use of the method may require a separate patent license from the inventor. See LICENSE.

📣 Cite

@software{miller_batterymhm_2026,
  author  = {Miller, William T. L.},
  title   = {BatteryMHM: The Miller Harmonic Method for Battery Science},
  year    = {2026},
  license = {CC-BY-NC-4.0},
  url     = {https://huggingface.co/williamTLmiller/batterymhm},
  note    = {Open method release; patent pending}
}

⭐ If the demo impresses you, share it and build on it.

Open science, the way it should be — read every line, run it in seconds, see for yourself.


Evaluation results (self-reported)

MAE (5-fold CV, 30% observation window) on MIT-Stanford-TRI (Severson et al., Nature Energy 2019): 0.011
RMSE on MIT-Stanford-TRI (Severson et al., Nature Energy 2019): 0.020