| --- |
| tags: |
| - crystallography |
| - x-ray-diffraction |
| - single-crystal-structure-analysis |
| - chemistry |
| - pytorch |
| --- |
| |
| # CrystalX: High-accuracy Crystal Structure Analysis Using Deep Learning |
|
|
| **Accepted by the *Journal of the American Chemical Society* (JACS)** |
| **Invited for a journal cover feature** |
|
|
| ## Overview |
|
|
| CrystalX is an AI system for routine single-crystal structure analysis from real experimental X-ray diffraction (XRD) data. |
|
|
| Designed specifically for everyday single-crystal structure solution, CrystalX uses geometric deep learning to model electron density and capture underlying three-dimensional geometric interactions directly from large-scale experimental XRD datasets. Compared with traditional rule-based approaches for automatic elemental determination, such as those used in **SHELXT** and **Olex2**, CrystalX delivers substantially improved accuracy and robustness. |
|
|
| In prospective, deployment-style evaluations, CrystalX was also compared with **AutoChem** under practical experimental conditions. Because AutoChem requires a real instrument-generated metadata file (`.cif_od`) produced by the **CrysAlisPro** data-reduction workflow, the comparison was performed on real-world cases that satisfied this requirement. CrystalX successfully solved **3/3** test cases, whereas AutoChem solved **1/3**. |
|
|
| CrystalX provides the following capabilities: |
|
|
| * Accurate discrimination between non-hydrogen atoms with similar atomic numbers, including challenging pairs such as **C/N/O** and **P/S/Cl** |
| * Fast and fully correct solution of large organometallic structures containing up to **370 non-hydrogen atoms** |
| * Detection of **9 verified expert interpretation errors** among **1,559** held-out structures published in **JCR Q1 journals**, including subtle cases that triggered no **CheckCIF A/B** alerts |
| * Confidence scores for both heavy-atom and hydrogen predictions |
| * Natural integration into standard crystallographic workflows |
|
|
| --- |
|
|
| ## Model Architecture |
|
|
| CrystalX adopts a two-stage geometric deep learning pipeline to predict both non-hydrogen and hydrogen atoms. |
|
|
| Both public checkpoints use the Equivariant Transformer backbone from TorchMD-NET. |
|
|
| For hydrogen prediction, CrystalX leverages both intramolecular and intermolecular context by incorporating symmetry-equivalent neighbors within 3.2 Å. This design yields more than a 7% improvement over using intramolecular information alone. |
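
To make the neighbor construction concrete, the following is a minimal sketch of how symmetry-equivalent neighbors inside a 3.2 Å radius could be enumerated. It is not taken from the CrystalX codebase: the function names, the row-vector cell convention, and the search over lattice shifts of -1, 0, +1 are all assumptions made for illustration. The shift range is only sufficient when the center lies in the unit cell and the cell is not strongly skewed or thinner than the cutoff.

```python
import itertools
import math

CUTOFF = 3.2  # angstrom; the neighbor radius quoted above


def frac_to_cart(frac, cell):
    """Fractional -> Cartesian; `cell` rows are the lattice vectors a, b, c."""
    return tuple(sum(frac[i] * cell[i][k] for i in range(3)) for k in range(3))


def apply_symop(frac, rot, trans):
    """Apply x' = R x + t in fractional coordinates, wrapped into [0, 1)."""
    return tuple(
        (sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]) % 1.0
        for i in range(3)
    )


def neighbors_within_cutoff(center, atoms, symops, cell, cutoff=CUTOFF):
    """List (atom_index, symop_index, lattice_shift, distance) within `cutoff`.

    Hypothetical helper: the center itself (distance ~0) is skipped, and
    duplicate images of atoms on special positions are not merged.
    """
    origin = frac_to_cart(center, cell)
    found = []
    for a_idx, frac in enumerate(atoms):
        for s_idx, (rot, trans) in enumerate(symops):
            image = apply_symop(frac, rot, trans)
            for shift in itertools.product((-1, 0, 1), repeat=3):
                shifted = tuple(image[i] + shift[i] for i in range(3))
                d = math.dist(origin, frac_to_cart(shifted, cell))
                if 1e-6 < d <= cutoff:
                    found.append((a_idx, s_idx, shift, d))
    return sorted(found, key=lambda item: item[3])


# Toy example: 5 A cubic cell, one atom, identity plus inversion.
cell = [(5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 5.0)]
identity = ([(1, 0, 0), (0, 1, 0), (0, 0, 1)], (0.0, 0.0, 0.0))
inversion = ([(-1, 0, 0), (0, -1, 0), (0, 0, -1)], (0.0, 0.0, 0.0))
atoms = [(0.1, 0.1, 0.1)]
hits = neighbors_within_cutoff(atoms[0], atoms, [identity, inversion], cell)
```

In this toy case the only neighbor inside the cutoff is the inversion image of the atom in the adjacent cell, which an intramolecular-only search would never see.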
|
|
| --- |
|
|
| ## Available Checkpoints |
|
|
| This repository provides the two official inference checkpoints used in the CrystalX pipeline: |
|
|
| - `crystalx-heavy.pth` |
| - `crystalx-hydro.pth` |
|
|
| ### `crystalx-heavy.pth` |
|
|
| Predicts **non-hydrogen element types** from coarse electron-density peaks generated by automatic phasing tools such as **SHELXT**, and outputs a **confidence score** for each prediction. |
|
|
| ### `crystalx-hydro.pth` |
|
|
| Predicts the **number of hydrogens attached to each heavy atom** after heavy-atom determination, and also provides a **confidence score**. |
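
The model card does not state how the confidence scores are computed. One common convention, assumed here purely for illustration, is to take the largest softmax probability over the candidate classes (element types for the heavy-atom model, hydrogen counts for the hydrogen model). The helper names and the 0.9 review threshold below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, labels):
    """Return (label, confidence), with confidence = max softmax probability.

    Assumption: this mirrors a typical classifier head, not necessarily
    the exact scoring used by the CrystalX checkpoints.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def flag_low_confidence(predictions, threshold=0.9):
    """Indices of predictions whose confidence falls below `threshold`."""
    return [i for i, (_, conf) in enumerate(predictions) if conf < threshold]


# Made-up logits for two sites: an element choice and a hydrogen count.
element = predict_with_confidence([2.0, 1.0, 0.1], ["C", "N", "O"])
h_count = predict_with_confidence([0.0, 0.2, 6.0, 0.1], [0, 1, 2, 3])
to_review = flag_low_confidence([element, h_count])
```

A threshold like this gives a simple way to route uncertain sites to manual inspection while accepting the rest automatically.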
|
|
| --- |
|
|
| ## Intended Use |
|
|
| CrystalX can be inserted into an existing structure-solution pipeline at two points: after phasing, for **heavy-atom** assignment, and after heavy-atom refinement, for **hydrogen** prediction. The official codebase provides a lightweight integration with the **SHELX** suite, enabling a simple **`.res`-to-`.res`** workflow. |
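
As a rough illustration of what a `.res`-to-`.res` step involves, the sketch below rewrites the element assignment of selected atoms in SHELX `.res` text. It is a hypothetical helper, not the official integration: `relabel_res` and its `assignments` argument are invented for this example, it assumes the short `SFAC` form that lists element symbols only, it requires the new element to be present in `SFAC` already, and it does not update `UNIT` or enforce the SHELX four-character label limit.

```python
def relabel_res(res_text, assignments):
    """Return .res text in which atoms named in `assignments` change element.

    `assignments` maps an existing atom label (e.g. "C002") to the predicted
    element symbol (e.g. "N"). Only the label and the SFAC index of each
    matching atom line are rewritten; the rest of the line is kept as-is.
    """
    lines = res_text.splitlines()

    # Read the element order from the first short-form SFAC instruction.
    sfac = []
    for line in lines:
        if line.upper().startswith("SFAC"):
            sfac = [tok.capitalize() for tok in line.split()[1:]]
            break

    out = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] in assignments and parts[1].isdigit():
            element = assignments[parts[0]].capitalize()
            if element not in sfac:
                raise ValueError(f"{element} is missing from SFAC")
            # Keep the numeric part of the old label, e.g. C002 -> N2.
            serial = "".join(ch for ch in parts[0] if ch.isdigit()).lstrip("0") or "0"
            line = f"{element}{serial} {sfac.index(element) + 1} {parts[2]}"
        out.append(line)
    return "\n".join(out) + "\n"


demo_res = """TITL demo
CELL 0.71073 10.0 10.0 10.0 90 90 90
SFAC C N O
UNIT 8 4 4
C001 1 0.1000 0.2000 0.3000 11.00000 0.05
C002 1 0.4000 0.5000 0.6000 11.00000 0.05
HKLF 4
END
"""
updated = relabel_res(demo_res, {"C002": "N"})
```

Because both input and output are ordinary `.res` text, a step of this shape can sit between any two SHELX runs without changing the surrounding tooling.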
|
|
| ### Current Limitation: Disorder |
|
|
| CrystalX does not currently support the resolution of crystallographic disorder, largely because high-quality annotated training data for these cases are scarce. At the same time, disorder prediction is closely connected to the accurate detection and interpretation of residual electron density, making it a natural future extension of the current framework. |
|
|
| We view disorder modeling as a particularly promising direction for further development. Interpreting disorder is inherently a sequential, multi-step reasoning task: it involves iterative analysis, hypothesis generation, testing, and refinement rather than a single-pass prediction. In this context, agentic AI and reinforcement learning may offer a compelling path forward, as they could enable models to learn from sequential refinement processes and better capture the stepwise reasoning needed for robust disorder resolution. |
|
|
| --- |
|
|
| ## Minimal End-to-End Workflow |
|
|
| A typical wrapper pipeline is: |
|
|
| `SHELXT -> CrystalX Heavy -> SHELXL refinement -> CrystalX Hydro -> HFIX/AFIX placement -> SHELXL refinement -> weight refinement -> PLATON / CheckCIF` |
|
|
| 1. **SHELXT** generates coarse electron-density peaks. |
| 2. **CrystalX Heavy** predicts non-hydrogen atom types from geometric peak interactions. |
| 3. **SHELXL** refines the heavy-atom framework. |
| 4. **CrystalX Hydro** predicts how many hydrogens are attached to each heavy atom. |
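| 5. **HFIX/AFIX** placement followed by **SHELXL** refinement produces the all-atom structure. |
| 6. The **SHELXL** weighting scheme is then refined. |
| 7. **PLATON / CheckCIF** validates the final structure. |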
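
The chain above can be sketched as an ordered plan that a wrapper script walks through. This is an assumption-laden outline rather than the official wrapper: the stage names, `build_plan`, `run_plan`, and the hook mechanism are hypothetical, and only the `shelxt <name>` and `shelxl <name>` invocations correspond to real command-line usage (`shelxl` reads `<name>.ins` and `<name>.hkl` and writes `<name>.res`). File handling between stages, such as copying `.res` to `.ins`, is left to the hooks.

```python
import subprocess


def build_plan(solve_stem, refine_stem):
    """Return the stages as (name, command) pairs; None marks a Python hook."""
    shelxt = ["shelxt", solve_stem]
    shelxl = ["shelxl", refine_stem]
    return [
        ("phasing", shelxt),
        ("crystalx_heavy", None),   # hypothetical hook: element assignment
        ("refine_heavy", shelxl),
        ("crystalx_hydro", None),   # hypothetical hook: hydrogen counts
        ("place_hydrogens", None),  # hypothetical hook: write HFIX/AFIX
        ("refine_all_atom", shelxl),
        ("refine_weights", shelxl),
        ("validate", None),         # hypothetical hook: PLATON / CheckCIF
    ]


def run_plan(plan, hooks, dry_run=True):
    """Walk the plan in order; with dry_run=True nothing is executed."""
    log = []
    for name, command in plan:
        if command is not None:
            log.append(f"{name}: {' '.join(command)}")
            if not dry_run:
                subprocess.run(command, check=True)
        else:
            log.append(f"{name}: python hook")
            if not dry_run:
                hooks[name]()
    return log


plan = build_plan("demo", "demo_a")
log = run_plan(plan, hooks={})
```

Keeping the plan as data makes it easy to print, reorder, or resume from a given stage before any external program is launched.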
|
|
| Demo: `https://crystalx.intern-ai.org.cn/` |
|
|
| ## Code |
|
|
| - GitHub: `https://github.com/kaipengm2/CrystalX` |
|
|
| ## Citation |
|
|
| If you find this repository useful, please cite: |
|
|
| ```bibtex |
| @article{doi:10.1021/jacs.5c21832, |
| author = {Zheng, Kaipeng and Huang, Weiran and Ouyang, Wanli and Zhong, Han-Sen and Li, Yuqiang}, |
| title = {CrystalX: High-Accuracy Crystal Structure Analysis Using Deep Learning}, |
| journal = {Journal of the American Chemical Society}, |
| doi = {10.1021/jacs.5c21832} |
| } |
| ``` |