---
license: bsd-3-clause
language:
- en
tags:
- pytorch
- materials-science
- crystallography
- x-ray-diffraction
- pxrd
- convnext
- arxiv:2603.23367
datasets:
- materials-project
metrics:
- accuracy
- mae
pipeline_tag: other
---

# Open AlphaDiffract 

[arXiv](https://arxiv.org/abs/2603.23367) | [GitHub](https://github.com/AdvancedPhotonSource/OpenAlphaDiffract)

**Automated crystallographic analysis of powder X-ray diffraction data.**

AlphaDiffract is a multi-task 1D ConvNeXt model that takes a powder X-ray diffraction (PXRD) pattern and simultaneously predicts:

| Output | Description |
|---|---|
| **Crystal system** | 7-class classification (Triclinic → Cubic) |
| **Space group** | 230-class classification |
| **Lattice parameters** | 6 values: a, b, c (Å), α, β, γ (°) |

This release contains a **single model** trained exclusively on
[Materials Project](https://next-gen.materialsproject.org/) structures
(publicly available data). It is *not* the 10-model ensemble reported in
the paper; see [Performance](#performance) for details.

## Quick Start

```bash
pip install torch safetensors
huggingface-cli download linked-liszt/OpenAlphaDiffract --local-dir OpenAlphaDiffract
```

```python
import torch
import numpy as np
from OpenAlphaDiffract.model import AlphaDiffract

model = AlphaDiffract.from_pretrained("OpenAlphaDiffract", device="cpu")

# 8192-point intensity pattern: floor negatives at zero, then rescale to [0, 100]
pattern = np.load("my_pattern.npy").astype(np.float32)
pattern = np.clip(pattern, 0.0, None)
pattern = pattern / (pattern.max() + 1e-10) * 100.0
x = torch.from_numpy(pattern).unsqueeze(0)

with torch.no_grad():
    out = model(x)

cs_probs = torch.softmax(out["cs_logits"], dim=-1)
sg_probs = torch.softmax(out["sg_logits"], dim=-1)
lp = out["lp"]  # [a, b, c, alpha, beta, gamma]

print("Crystal system:", AlphaDiffract.CRYSTAL_SYSTEMS[cs_probs.argmax().item()])
print("Space group:   #", sg_probs.argmax().item() + 1)
print("Lattice params:", lp[0].tolist())
```

See `example_inference.py` for a complete runnable example.

## Files

| File | Description |
|---|---|
| `model.safetensors` | Model weights (safetensors format, ~35 MB) |
| `model.py` | Standalone model definition (pure PyTorch, no Lightning) |
| `config.json` | Architecture and training hyperparameters |
| `maxsub.json` | Space-group subgroup graph (230Γ—230, used as a registered buffer) |
| `example_inference.py` | End-to-end inference example |
| `LICENSE` | BSD 3-Clause |


## Input Format

- **Length:** 8192 equally-spaced intensity values
- **2θ range:** 5–20° (monochromatic, 20 keV)
- **Preprocessing:** floor negatives at zero, then rescale to [0, 100]
- **Shape:** `(batch, 8192)` or `(batch, 1, 8192)`
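
The preprocessing steps above can be sketched in NumPy (a minimal illustration; the helper name is ours, not part of the release):

```python
import numpy as np

def preprocess_pattern(raw: np.ndarray) -> np.ndarray:
    """Illustrative helper: floor negatives at zero, rescale to [0, 100]."""
    assert raw.shape[-1] == 8192, "model expects 8192-point patterns"
    x = np.clip(raw.astype(np.float32), 0.0, None)  # floor negatives at zero
    peak = x.max()
    if peak > 0:
        x = x / peak * 100.0  # strongest peak maps to 100
    return x
```

The result can be fed to the model as `torch.from_numpy(preprocess_pattern(raw)).unsqueeze(0)`, giving the expected `(batch, 8192)` shape.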

## Architecture

1D ConvNeXt backbone adapted from [Liu et al. (2022)](https://arxiv.org/abs/2201.03545):

```
Input (8192) → [ConvNeXt Block × 3 with AvgPool] → Flatten (560-d)
  ├─ CS head:  MLP 560→2300→1150→7    (crystal system)
  ├─ SG head:  MLP 560→2300→1150→230  (space group)
  └─ LP head:  MLP 560→512→256→6      (lattice parameters, sigmoid-bounded)
```

- **Parameters:** 8,734,989
- **Activation:** GELU
- **Stochastic depth:** 0.3
- **Head dropout:** 0.5
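
A 1D ConvNeXt block along these lines (depthwise conv → LayerNorm → inverted pointwise MLP with GELU → residual) might look like the sketch below. The kernel size and expansion ratio are assumptions for illustration, not values taken from this release:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock1d(nn.Module):
    """Minimal 1D ConvNeXt block sketch: depthwise conv, LayerNorm,
    inverted MLP with GELU, residual connection."""
    def __init__(self, dim: int, kernel_size: int = 7, expansion: int = 4):
        super().__init__()
        # Depthwise convolution mixes information along the 2-theta axis
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, expansion * dim)  # pointwise expand
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)  # pointwise project

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, length)
        residual = x
        x = self.dwconv(x)
        x = x.transpose(1, 2)  # (batch, length, dim) so LayerNorm acts on channels
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.transpose(1, 2)
        return residual + x
```

Stochastic depth (0.3 here) would drop the residual branch with that probability during training; it is omitted from this sketch for brevity.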

## Performance

This is a **single model** trained on Materials Project data only (no ICSD).
Metrics on the best validation checkpoint (epoch 11):

| Metric | Simulated Val | RRUFF (experimental) |
|---|---|---|
| Crystal system accuracy | 74.88% | 60.35% |
| Space group accuracy | 57.31% | 38.28% |
| Lattice parameter MAE | 2.71 | – |

The paper reports higher numbers from a 10-model ensemble trained on
Materials Project + ICSD combined data. This open-weights release covers
only publicly available training data.

## Training Details

| | |
|---|---|
| **Data** | ~146k Materials Project structures, 100 GSAS-II simulations each |
| **Augmentation** | Poisson + Gaussian noise, rescaled to [0, 100] |
| **Optimizer** | AdamW (lr=2e-4, weight_decay=0.01) |
| **Scheduler** | CyclicLR (triangular2, 6-epoch half-cycles) |
| **Loss** | CE (crystal system) + CE + GEMD (space group) + MSE (lattice params) |
| **Hardware** | 1× NVIDIA H100, float32 |
| **Batch size** | 64 |
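
The multi-task objective sums per-head losses. A hypothetical sketch of that combination is below; the GEMD (space-group) term and any per-task weights are not specified here, so GEMD is omitted and unit weights are assumed:

```python
import torch
import torch.nn.functional as F

def multitask_loss(out: dict,
                   cs_target: torch.Tensor,
                   sg_target: torch.Tensor,
                   lp_target: torch.Tensor) -> torch.Tensor:
    """Illustrative combined loss: cross-entropy for both classification
    heads, MSE for lattice parameters. The paper's GEMD space-group term
    and any loss weights are omitted from this sketch."""
    loss_cs = F.cross_entropy(out["cs_logits"], cs_target)  # crystal system
    loss_sg = F.cross_entropy(out["sg_logits"], sg_target)  # space group
    loss_lp = F.mse_loss(out["lp"], lp_target)              # lattice params
    return loss_cs + loss_sg + loss_lp
```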

## Citation

```bibtex
@article{andrejevic2026alphadiffract,
  title   = {AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data},
  author  = {Andrejevic, Nina and Du, Ming and Sharma, Hemant and Horwath, James P. and Luo, Aileen and Yin, Xiangyu and Prince, Michael and Toby, Brian H. and Cherukara, Mathew J.},
  year    = {2026},
  eprint  = {2603.23367},
  archivePrefix = {arXiv},
  primaryClass  = {cond-mat.mtrl-sci},
  doi     = {10.48550/arXiv.2603.23367},
  url     = {https://arxiv.org/abs/2603.23367}
}
```

## License

BSD 3-Clause. Copyright 2026 UChicago Argonne, LLC.

## Links

- [arXiv: 2603.23367](https://arxiv.org/abs/2603.23367)
- [GitHub: OpenAlphaDiffract](https://github.com/AdvancedPhotonSource/OpenAlphaDiffract)
- [GitHub: AlphaDiffract](https://github.com/AdvancedPhotonSource/AlphaDiffract)