---
license: bsd-3-clause
language:
- en
tags:
- pytorch
- materials-science
- crystallography
- x-ray-diffraction
- pxrd
- convnext
- arxiv:2603.23367
datasets:
- materials-project
metrics:
- accuracy
- mae
pipeline_tag: other
---
# Open AlphaDiffract
[arXiv](https://arxiv.org/abs/2603.23367) | [GitHub](https://github.com/AdvancedPhotonSource/OpenAlphaDiffract)

**Automated crystallographic analysis of powder X-ray diffraction data.**

AlphaDiffract is a multi-task 1D ConvNeXt model that takes a powder X-ray diffraction (PXRD) pattern and simultaneously predicts:
| Output | Description |
|---|---|
| **Crystal system** | 7-class classification (Triclinic → Cubic) |
| **Space group** | 230-class classification |
| **Lattice parameters** | 6 values: a, b, c (Å), α, β, γ (°) |
This release contains a **single model** trained exclusively on
[Materials Project](https://next-gen.materialsproject.org/) structures
(publicly available data). It is *not* the 10-model ensemble reported in
the paper; see [Performance](#performance) for details.
## Quick Start
```bash
pip install torch safetensors
huggingface-cli download linked-liszt/OpenAlphaDiffract --local-dir OpenAlphaDiffract
```
```python
import torch
import numpy as np
from OpenAlphaDiffract.model import AlphaDiffract
model = AlphaDiffract.from_pretrained("OpenAlphaDiffract", device="cpu")
# 8192-point intensity pattern, normalized to [0, 100]
pattern = np.load("my_pattern.npy").astype(np.float32)
pattern = (pattern - pattern.min()) / (pattern.max() - pattern.min() + 1e-10) * 100.0
x = torch.from_numpy(pattern).unsqueeze(0)
with torch.no_grad():
    out = model(x)
cs_probs = torch.softmax(out["cs_logits"], dim=-1)
sg_probs = torch.softmax(out["sg_logits"], dim=-1)
lp = out["lp"] # [a, b, c, alpha, beta, gamma]
print("Crystal system:", AlphaDiffract.CRYSTAL_SYSTEMS[cs_probs.argmax().item()])
print("Space group: #", sg_probs.argmax().item() + 1)
print("Lattice params:", lp[0].tolist())
```
See `example_inference.py` for a complete runnable example.
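Because the crystal-system and space-group heads predict independently, their top predictions can disagree. The sketch below is one way to check them against each other, using the standard space-group number ranges from the International Tables for Crystallography. The helper names `crystal_system_of` and `heads_agree` are hypothetical and not part of `model.py`, and the check assumes `AlphaDiffract.CRYSTAL_SYSTEMS` uses the usual English names.

```python
# Standard space-group number ranges (International Tables for Crystallography).
SG_RANGES = [
    (1, 2, "Triclinic"),
    (3, 15, "Monoclinic"),
    (16, 74, "Orthorhombic"),
    (75, 142, "Tetragonal"),
    (143, 167, "Trigonal"),
    (168, 194, "Hexagonal"),
    (195, 230, "Cubic"),
]


def crystal_system_of(sg_number: int) -> str:
    """Return the crystal system that contains space group `sg_number` (1-230)."""
    for lo, hi, name in SG_RANGES:
        if lo <= sg_number <= hi:
            return name
    raise ValueError(f"space group number must be in 1..230, got {sg_number}")


def heads_agree(cs_name: str, sg_number: int) -> bool:
    """True if the predicted crystal system matches the predicted space group.

    Hypothetical helper: compares names case-insensitively, assuming the
    model's class names are the standard English ones.
    """
    return crystal_system_of(sg_number).lower() == cs_name.strip().lower()
```

With the Quick Start variables, a call might look like `heads_agree(AlphaDiffract.CRYSTAL_SYSTEMS[cs_probs.argmax().item()], sg_probs.argmax().item() + 1)`. A mismatch is a hint to inspect the top few space-group candidates rather than trust the argmax.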
## Files
| File | Description |
|---|---|
| `model.safetensors` | Model weights (safetensors format, ~35 MB) |
| `model.py` | Standalone model definition (pure PyTorch, no Lightning) |
| `config.json` | Architecture and training hyperparameters |
| `maxsub.json` | Space-group subgroup graph (230×230, used as a registered buffer) |
| `example_inference.py` | End-to-end inference example |
| `LICENSE` | BSD 3-Clause |
## Input Format
- **Length:** 8192 equally-spaced intensity values
- **2θ range:** 5–20° (monochromatic, 20 keV)
- **Preprocessing:** floor negatives at zero, then rescale to [0, 100]
- **Shape:** `(batch, 8192)` or `(batch, 1, 8192)`
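Patterns measured at another wavelength or on a different 2θ grid have to be brought into this format first. The following is a minimal sketch of one way to do that, not the authors' pipeline: it remaps peak positions to 20 keV with Bragg's law, resamples onto 8192 points, floors negatives, and rescales with the same min-max formula as the Quick Start. The function name `to_model_input`, the zero fill outside the measured range, and the choice to include both endpoints of the 5–20° grid are assumptions. Only peak positions are remapped; wavelength-dependent intensity corrections are ignored.

```python
import numpy as np

HC_KEV_ANGSTROM = 12.398419843               # photon energy (keV) x wavelength (Angstrom)
TARGET_WAVELENGTH = HC_KEV_ANGSTROM / 20.0   # about 0.6199 Angstrom at 20 keV
N_POINTS = 8192


def to_model_input(two_theta_deg, intensity, source_wavelength):
    """Hypothetical helper: convert a measured pattern to a (1, 8192) float32 array.

    two_theta_deg     -- increasing 2-theta values in degrees
    intensity         -- measured intensities, same length
    source_wavelength -- wavelength of the measurement in Angstrom
    """
    two_theta_deg = np.asarray(two_theta_deg, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)

    # Bragg's law: at fixed d-spacing, sin(theta) is proportional to wavelength.
    s = np.sin(np.radians(two_theta_deg / 2.0)) * (TARGET_WAVELENGTH / source_wavelength)
    remapped = 2.0 * np.degrees(np.arcsin(np.clip(s, -1.0, 1.0)))

    # Assumption: the 8192 points span 5 to 20 degrees with both endpoints included.
    grid = np.linspace(5.0, 20.0, N_POINTS)
    resampled = np.interp(grid, remapped, intensity, left=0.0, right=0.0)

    # Floor negatives at zero, then rescale to [0, 100].
    floored = np.clip(resampled, 0.0, None)
    span = floored.max() - floored.min()
    scaled = (floored - floored.min()) / (span + 1e-10) * 100.0
    return scaled.astype(np.float32)[None, :]
```

For a Cu Kα1 lab pattern, `to_model_input(two_theta, counts, 1.5406)` maps roughly 12.5–51° of measured 2θ onto the model's 5–20° window; data outside that range is discarded.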
## Architecture
1D ConvNeXt backbone adapted from [Liu et al. (2022)](https://arxiv.org/abs/2201.03545):
```
Input (8192) → [ConvNeXt Block × 3 with AvgPool] → Flatten (560-d)
├─ CS head: MLP 560→2300→1150→7 (crystal system)
├─ SG head: MLP 560→2300→1150→230 (space group)
└─ LP head: MLP 560→512→256→6 (lattice parameters, sigmoid-bounded)
```
- **Parameters:** 8,734,989
- **Activation:** GELU
- **Stochastic depth:** 0.3
- **Head dropout:** 0.5
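To see where the parameters sit, the head widths in the diagram can be turned into a rough count. This is a back-of-the-envelope sketch, assuming every arrow in a head is a plain fully connected layer with a bias term and that the heads carry no other learnable parameters (for example, no normalization layers). `model.py` is the authoritative definition; `mlp_params` is a hypothetical helper.

```python
def mlp_params(widths):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


# Layer widths copied from the architecture diagram.
HEADS = {
    "cs": [560, 2300, 1150, 7],
    "sg": [560, 2300, 1150, 230],
    "lp": [560, 512, 256, 6],
}
TOTAL_PARAMS = 8_734_989  # parameter count stated in this README

head_params = {name: mlp_params(widths) for name, widths in HEADS.items()}
heads_total = sum(head_params.values())
remainder = TOTAL_PARAMS - heads_total  # backbone and anything the diagram omits

for name, count in head_params.items():
    print(f"{name} head: {count:,}")
print(f"heads total: {heads_total:,}  remainder: {remainder:,}")
```

Under these assumptions the three heads account for about 8.57 million of the parameters, which would leave the convolutional backbone comparatively small.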
## Performance
This is a **single model** trained on Materials Project data only (no ICSD).
Metrics on the best validation checkpoint (epoch 11):
| Metric | Simulated Val | RRUFF (experimental) |
|---|---|---|
| Crystal system accuracy | 74.88% | 60.35% |
| Space group accuracy | 57.31% | 38.28% |
| Lattice parameter MAE | 2.71 | n/a |
The paper reports higher numbers from a 10-model ensemble trained on
Materials Project + ICSD combined data. This open-weights release covers
only publicly available training data.
## Training Details
| | |
|---|---|
| **Data** | ~146k Materials Project structures, 100 GSAS-II simulations each |
| **Augmentation** | Poisson + Gaussian noise, rescaled to [0, 100] |
| **Optimizer** | AdamW (lr=2e-4, weight_decay=0.01) |
| **Scheduler** | CyclicLR (triangular2, 6-epoch half-cycles) |
| **Loss** | CE (crystal system) + CE + GEMD (space group) + MSE (lattice params) |
| **Hardware** | 1× NVIDIA H100, float32 |
| **Batch size** | 64 |
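The `triangular2` policy ramps the learning rate linearly up and down and halves the peak amplitude after every full cycle. The sketch below reproduces that shape in plain Python with 6-epoch half-cycles. It is illustrative only: it assumes the listed `lr=2e-4` is the peak rate, uses a hypothetical `base_lr` of `2e-5` (the table does not give one), and steps once per epoch, whereas PyTorch's `CyclicLR` is normally stepped per batch.

```python
import math


def triangular2_lr(epoch, base_lr, max_lr, half_cycle=6):
    """Learning rate at a (possibly fractional) epoch under the triangular2 policy."""
    cycle = math.floor(1 + epoch / (2 * half_cycle))   # 1-based cycle index
    x = abs(epoch / half_cycle - 2 * cycle + 1)        # 1 at cycle edges, 0 at the peak
    scale = 1.0 / (2 ** (cycle - 1))                   # amplitude halves every cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale


# Hypothetical values: max_lr assumed equal to the listed lr; base_lr is a placeholder.
BASE_LR, MAX_LR = 2e-5, 2e-4
schedule = [triangular2_lr(e, BASE_LR, MAX_LR) for e in range(25)]
```

With these placeholder values the rate peaks at epoch 6, returns to the base rate at epoch 12, and reaches only half the original amplitude at the next peak (epoch 18).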
## Citation
```bibtex
@article{andrejevic2026alphadiffract,
title = {AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data},
author = {Andrejevic, Nina and Du, Ming and Sharma, Hemant and Horwath, James P. and Luo, Aileen and Yin, Xiangyu and Prince, Michael and Toby, Brian H. and Cherukara, Mathew J.},
year = {2026},
eprint = {2603.23367},
archivePrefix = {arXiv},
primaryClass = {cond-mat.mtrl-sci},
doi = {10.48550/arXiv.2603.23367},
url = {https://arxiv.org/abs/2603.23367}
}
```
## License
BSD 3-Clause. Copyright 2026 UChicago Argonne, LLC.
## Links
- [arXiv: 2603.23367](https://arxiv.org/abs/2603.23367)
- [GitHub: OpenAlphaDiffract](https://github.com/AdvancedPhotonSource/OpenAlphaDiffract)
- [GitHub: AlphaDiffract](https://github.com/AdvancedPhotonSource/AlphaDiffract)