---
tags:
- crystallography
- x-ray-diffraction
- single-crystal-structure-analysis
- chemistry
- pytorch
---
# CrystalX: High-Accuracy Crystal Structure Analysis Using Deep Learning
**Accepted by the *Journal of the American Chemical Society* (JACS)**
**Invited for a journal cover feature**
## Overview
CrystalX is an AI system for routine single-crystal structure analysis from real experimental X-ray diffraction (XRD) data.
CrystalX uses geometric deep learning to model electron density and to capture the underlying three-dimensional geometric interactions, learning both directly from large-scale experimental XRD datasets. Compared with the rule-based automatic element assignment used in tools such as **SHELXT** and **Olex2**, CrystalX delivers substantially higher accuracy and robustness.
In prospective, deployment-style evaluations, CrystalX was also compared with **AutoChem** under practical experimental conditions. Because AutoChem requires a real instrument-generated metadata file (`.cif_od`) produced by the **CrysAlisPro** data-reduction workflow, the comparison was performed on real-world cases that satisfied this requirement. CrystalX successfully solved **3/3** test cases, whereas AutoChem solved **1/3**.
CrystalX provides the following capabilities:
* Accurate discrimination between non-hydrogen atoms with similar atomic numbers, including challenging pairs such as **C/N/O** and **P/S/Cl**
* Fast and fully correct solution of large organometallic structures containing up to **370 non-hydrogen atoms**
* Detection of **9 verified expert interpretation errors** among **1,559** held-out structures published in **JCR Q1 journals**, including subtle cases that triggered no **CheckCIF A/B** alerts
* Confidence scores for both heavy-atom and hydrogen predictions
* Natural integration into standard crystallographic workflows
---
## Model Architecture
CrystalX adopts a two-stage geometric deep learning pipeline to predict both non-hydrogen and hydrogen atoms.
Both public checkpoints use the Equivariant Transformer backbone from TorchMD-NET.
For hydrogen prediction, CrystalX leverages both intramolecular and intermolecular context by incorporating symmetry-equivalent neighbors within 3.2 Å. This design yields more than a 7% improvement over using intramolecular information alone.
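To illustrate the neighbor search described above, the sketch below collects symmetry-equivalent images of atoms that lie within 3.2 Å of a central atom. It is not taken from the CrystalX codebase: the function names, the row-vector cell convention, and the toy cell are assumptions, and the ±1 lattice-shift search presumes a cell whose perpendicular widths exceed the cutoff.

```python
import itertools
import math

CUTOFF = 3.2  # Å, the radius quoted in the text


def frac_to_cart(frac, cell):
    """Fractional -> Cartesian; `cell` holds the lattice vectors as rows (Å)."""
    return tuple(sum(frac[i] * cell[i][k] for i in range(3)) for k in range(3))


def apply_op(op, frac):
    """Apply a symmetry operation (3x3 rotation, translation) to fractional coords."""
    rot, trans = op
    return tuple(sum(rot[r][c] * frac[c] for c in range(3)) + trans[r] for r in range(3))


def symmetry_neighbors(center, atoms, cell, sym_ops, cutoff=CUTOFF):
    """Return (atom_index, op_index, lattice_shift, distance) for images within cutoff."""
    center = tuple(v % 1.0 for v in center)  # wrap into the unit cell
    c_cart = frac_to_cart(center, cell)
    found = []
    for (ai, frac), (oi, op) in itertools.product(enumerate(atoms), enumerate(sym_ops)):
        image = tuple(v % 1.0 for v in apply_op(op, frac))  # wrap the image too
        for shift in itertools.product((-1, 0, 1), repeat=3):
            moved = tuple(image[k] + shift[k] for k in range(3))
            d = math.dist(c_cart, frac_to_cart(moved, cell))
            if 1e-6 < d <= cutoff:  # skip the central atom itself
                found.append((ai, oi, shift, round(d, 3)))
    return sorted(found, key=lambda item: item[3])


# Toy example: 10 Å cubic cell, identity plus inversion (hypothetical data).
cell = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
atoms = [(0.10, 0.10, 0.10), (0.95, 0.10, 0.10)]
identity = ([(1, 0, 0), (0, 1, 0), (0, 0, 1)], (0.0, 0.0, 0.0))
inversion = ([(-1, 0, 0), (0, -1, 0), (0, 0, -1)], (0.0, 0.0, 0.0))
neighbors = symmetry_neighbors(atoms[0], atoms, cell, [identity, inversion])
```

In this toy cell, neither neighbor of the first atom would be found from the raw coordinates alone: one appears only through a lattice translation and the other only through the inversion image, which is the kind of intermolecular context the text refers to.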
---
## Available Checkpoints
This repository provides the two official inference checkpoints used in the CrystalX pipeline:
- `crystalx-heavy.pth`
- `crystalx-hydro.pth`
### `crystalx-heavy.pth`
Predicts **non-hydrogen element types** from coarse electron-density peaks generated by automatic phasing tools such as **SHELXT**, and outputs a **confidence score** for each prediction.
### `crystalx-hydro.pth`
Predicts the **number of hydrogens attached to each heavy atom** after heavy-atom determination, and also provides a **confidence score**.
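One way to use the per-prediction confidence scores is to accept high-confidence assignments automatically and route the rest to manual review. The sketch below assumes the confidence is the maximum softmax probability over class logits and uses a hypothetical threshold of 0.9; neither detail is confirmed by this repository, and the function names and logits are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(atom_logits, classes, threshold=0.9):
    """Split atoms into accepted predictions and ones flagged for review.

    atom_logits: {atom_label: [logit per class]}  (hypothetical model output)
    classes:     class names aligned with the logit positions
    """
    accepted, review = {}, {}
    for label, logits in atom_logits.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        bucket = accepted if probs[best] >= threshold else review
        bucket[label] = (classes[best], round(probs[best], 3))
    return accepted, review


# Illustrative logits for two peaks over the classes C / N / O.
classes = ["C", "N", "O"]
atom_logits = {"Q1": [6.0, 1.0, 0.5], "Q2": [2.0, 1.8, 0.1]}
accepted, review = triage(atom_logits, classes)
```

Here `Q1` is accepted as carbon, while `Q2`, whose carbon and nitrogen logits are nearly tied, lands in the review bucket.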
---
## Intended Use
In practice, CrystalX can be inserted at different stages of a structure-solution pipeline, for both **heavy-atom** and **hydrogen** prediction. The official codebase provides a lightweight integration with the **SHELX** suite, enabling a simple **`.res`-to-`.res`** workflow.
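The `.res`-to-`.res` idea can be sketched as a small text rewrite: read the atom records, swap in the predicted element, and write the file back. The helper below is a hypothetical illustration, not the official integration. It assumes the short `SFAC` form that lists element symbols only, assumes every predicted element already appears on that line (so `UNIT` stays untouched), and renumbers the relabeled atoms sequentially per element.

```python
from collections import Counter


def relabel_res(res_text, predictions):
    """Rewrite atom labels and SFAC indices in SHELX .res text.

    predictions: {original_atom_label: predicted_element_symbol}
    """
    lines = res_text.splitlines()

    # Element order on the SFAC line defines the 1-based scattering-factor index.
    sfac = []
    for line in lines:
        if line.upper().startswith("SFAC"):
            sfac = [tok.capitalize() for tok in line.split()[1:]]
            break

    counters = Counter()
    out = []
    for line in lines:
        tokens = line.split(maxsplit=2)  # label, sfac index, rest of the record
        if len(tokens) == 3 and tokens[0] in predictions:
            element = predictions[tokens[0]]
            if element not in sfac:
                raise ValueError(f"{element} is missing from SFAC")
            counters[element] += 1
            label = f"{element}{counters[element]}"
            out.append(f"{label:<5}{sfac.index(element) + 1:>2} {tokens[2]}")
        else:
            out.append(line)  # instructions and continuation lines pass through
    return "\n".join(out)


# Abbreviated, hypothetical .res fragment for demonstration.
res_text = "\n".join([
    "TITL demo",
    "CELL 0.71073 10.0 10.0 10.0 90 90 90",
    "SFAC C N O",
    "UNIT 8 4 4",
    "C1    1  0.1000  0.2000  0.3000  11.00000  0.05",
    "C2    1  0.4000  0.5000  0.6000  11.00000  0.05",
    "HKLF 4",
    "END",
])
relabeled = relabel_res(res_text, {"C1": "C", "C2": "O"})
```

Because only the first two fields of each atom record change, coordinates, occupancies, and displacement parameters are carried over verbatim.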
### Current Limitation: Disorder
CrystalX does not currently support the resolution of crystallographic disorder, largely because high-quality annotated training data for these cases are scarce. At the same time, disorder prediction is closely connected to the accurate detection and interpretation of residual electron density, making it a natural future extension of the current framework.
We view disorder modeling as a particularly promising direction for further development. Interpreting disorder is inherently a sequential, multi-step reasoning task: it involves iterative analysis, hypothesis generation, testing, and refinement rather than a single-pass prediction. In this context, agentic AI and reinforcement learning may offer a compelling path forward, as they could enable models to learn from sequential refinement processes and better capture the stepwise reasoning needed for robust disorder resolution.
---
## Minimal End-to-End Workflow
A typical wrapper pipeline is:
`SHELXT -> CrystalX Heavy -> SHELXL refinement -> CrystalX Hydro -> HFIX/AFIX placement -> SHELXL refinement -> weight refinement -> PLATON / CheckCIF`
1. **SHELXT** generates coarse electron-density peaks.
2. **CrystalX Heavy** predicts non-hydrogen atom types from geometric peak interactions.
3. **SHELXL** refines the heavy-atom framework.
4. **CrystalX Hydro** predicts how many hydrogens are attached to each heavy atom.
5. **HFIX/AFIX** placement and subsequent **SHELXL** refinement produce the final all-atom structure.
6. Weight refinement and **PLATON / CheckCIF** validation complete the workflow.
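For the HFIX/AFIX step, predicted hydrogen counts have to be translated into SHELXL `HFIX` instructions. The sketch below shows one possible translation for carbon atoms only, using the standard SHELXL codes 13 (tertiary C-H), 23 (secondary CH2), 137 (methyl), and 43 (aromatic C-H). The lookup keyed on hydrogen count and number of heavy-atom neighbors is a simplifying assumption, as is the input tuple layout; a real wrapper would also need bond geometry and would have to handle N-H, O-H, and other cases.

```python
from collections import defaultdict

# (hydrogen count, heavy-atom neighbor count) -> HFIX code, carbon only.
# Simplified, hypothetical lookup; anything else is left unresolved.
CARBON_HFIX = {
    (1, 3): 13,   # tertiary C-H
    (2, 2): 23,   # secondary CH2
    (3, 1): 137,  # methyl, torsion from difference map
    (1, 2): 43,   # aromatic / sp2 C-H
}


def hfix_lines(predictions):
    """Build HFIX instructions from (label, element, n_h, n_heavy_neighbors) tuples."""
    grouped = defaultdict(list)
    unresolved = []
    for label, element, n_h, n_heavy in predictions:
        if n_h == 0:
            continue  # nothing to place on this atom
        code = CARBON_HFIX.get((n_h, n_heavy)) if element == "C" else None
        if code is None:
            unresolved.append(label)
        else:
            grouped[code].append(label)
    lines = [f"HFIX {code} {' '.join(labels)}" for code, labels in sorted(grouped.items())]
    return lines, unresolved


# Illustrative predictions for a small fragment.
predictions = [
    ("C1", "C", 3, 1),
    ("C2", "C", 2, 2),
    ("C3", "C", 1, 2),
    ("C4", "C", 0, 3),
    ("O1", "O", 1, 1),
]
lines, unresolved = hfix_lines(predictions)
```

Atoms that fall outside the lookup, such as the hydroxyl oxygen here, are returned separately so they can be handled by hand or by a more complete rule set.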
Demo: `https://crystalx.intern-ai.org.cn/`
## Code
- GitHub: `https://github.com/kaipengm2/CrystalX`
## Citation
If you find this repository useful, please cite:
```bibtex
@article{doi:10.1021/jacs.5c21832,
author = {Zheng, Kaipeng and Huang, Weiran and Ouyang, Wanli and Zhong, Han-Sen and Li, Yuqiang},
title = {CrystalX: High-Accuracy Crystal Structure Analysis Using Deep Learning},
journal = {Journal of the American Chemical Society},
volume = {0},
number = {0},
pages = {null},
year = {0},
doi = {10.1021/jacs.5c21832}
}
```