---
tags:
- crystallography
- x-ray-diffraction
- single-crystal-structure-analysis
- chemistry
- pytorch
---

# CrystalX: High-accuracy Crystal Structure Analysis Using Deep Learning

**Accepted by the *Journal of the American Chemical Society* (JACS)**

**Invited for a journal cover feature**

## Overview

CrystalX is an AI system for routine single-crystal structure analysis from real experimental X-ray diffraction (XRD) data. Designed specifically for everyday single-crystal structure solution, CrystalX uses geometric deep learning to model electron density and capture underlying three-dimensional geometric interactions directly from large-scale experimental XRD datasets.

Compared with traditional rule-based approaches for automatic elemental determination, such as those used in **SHELXT** and **Olex2**, CrystalX delivers substantially improved accuracy and robustness.

In prospective, deployment-style evaluations, CrystalX was also compared with **AutoChem** under practical experimental conditions. Because AutoChem requires a real instrument-generated metadata file (`.cif_od`) produced by the **CrysAlisPro** data-reduction workflow, the comparison was performed on real-world cases that satisfied this requirement. CrystalX successfully solved **3/3** test cases, whereas AutoChem solved **1/3**.

CrystalX provides the following capabilities:

* Accurate discrimination between non-hydrogen atoms with similar atomic numbers, including challenging pairs such as **C/N/O** and **P/S/Cl**
* Fast and fully correct solution of large organometallic structures containing up to **370 non-hydrogen atoms**
* Detection of **9 verified expert interpretation errors** among **1,559** held-out structures published in **JCR Q1 journals**, including subtle cases that triggered no **CheckCIF A/B** alerts
* Confidence scores for both heavy-atom and hydrogen predictions
* Natural integration into standard crystallographic workflows

---

## Model Architecture

CrystalX adopts a two-stage geometric deep learning pipeline to predict both non-hydrogen and hydrogen atoms. Both public checkpoints are built on an Equivariant Transformer backbone, specifically TorchMD-NET.

For hydrogen prediction, CrystalX leverages both intramolecular and intermolecular context by incorporating symmetry-equivalent neighbors within 3.2 Å. This design yields more than a 7% improvement over using intramolecular information alone.

---

## Available Checkpoints

This repository provides the two official inference checkpoints used in the CrystalX pipeline:

- `crystalx-heavy.pth`
- `crystalx-hydro.pth`

### `crystalx-heavy.pth`

Predicts **non-hydrogen element types** from coarse electron-density peaks generated by automatic phasing tools such as **SHELXT**, and outputs a **confidence score** for each prediction.

### `crystalx-hydro.pth`

Predicts the **number of hydrogens attached to each heavy atom** after heavy-atom determination, and also provides a **confidence score**.

---

## Intended Use

In practice, CrystalX can be inserted seamlessly at different stages of the pipeline, for both **heavy-atom** and **hydrogen** prediction. The official codebase provides a lightweight integration with the **SHELX** suite, enabling a simple **`.res`-to-`.res`** workflow.
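
To make the `.res`-to-`.res` idea concrete, here is a minimal sketch of reading candidate sites from a SHELX `.res` file. It is not taken from the CrystalX codebase: `Site`, `read_sites`, the `INSTRUCTIONS` set, and the sample text are all hypothetical, and the parser assumes a short-form `SFAC` line and free-format atom lines of the form `label sfac x y z ...`.

```python
from dataclasses import dataclass

# Hypothetical and deliberately incomplete set of SHELX instruction keywords,
# used only to skip non-atom lines. A real parser would need the full list.
INSTRUCTIONS = {
    "TITL", "CELL", "ZERR", "LATT", "SYMM", "UNIT", "FVAR", "WGHT",
    "PLAN", "HKLF", "END", "REM", "TEMP", "BOND", "LIST", "FMAP",
    "L.S.", "ACTA", "AFIX", "HFIX", "PART", "MOLE",
}


@dataclass
class Site:
    label: str   # e.g. "Q1" for an electron-density peak, or "C12" for an atom
    sfac: int    # 1-based index into the SFAC element list
    xyz: tuple   # fractional coordinates (x, y, z)


def read_sites(res_text):
    """Return (elements, sites) parsed from the text of a .res file."""
    elements, sites = [], []
    for line in res_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        key = tokens[0].upper()
        if key == "SFAC":
            # Assumes the short form: SFAC followed by element symbols only.
            elements = [t.capitalize() for t in tokens[1:]]
            continue
        if key[:4] in INSTRUCTIONS or len(tokens) < 5:
            continue
        try:
            sfac = int(tokens[1])
            xyz = tuple(float(t) for t in tokens[2:5])
        except ValueError:
            continue  # not an atom or peak line
        sites.append(Site(tokens[0], sfac, xyz))
    return elements, sites


# Made-up sample input, for illustration only.
SAMPLE_RES = """TITL demo
CELL 0.71073 10.0 12.0 14.0 90 95 90
ZERR 4 0.001 0.001 0.001 0 0.01 0
SFAC C N O
UNIT 40 8 8
Q1 1 0.1000 0.2000 0.3000 11.00000 0.05 9.81
Q2 1 0.4000 0.5000 0.6000 11.00000 0.05 7.42
HKLF 4
END
"""

elements, sites = read_sites(SAMPLE_RES)
print(elements, [s.label for s in sites])
```

In a wrapper along these lines, a predicted element for each site could be written back by replacing the label and `sfac` index before the file is passed on to refinement.
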

### Current Limitation: Disorder

CrystalX does not currently support the resolution of crystallographic disorder, largely because high-quality annotated training data for these cases are scarce. At the same time, disorder prediction is closely connected to the accurate detection and interpretation of residual electron density, making it a natural future extension of the current framework.

We view disorder modeling as a particularly promising direction for further development. Interpreting disorder is inherently a sequential, multi-step reasoning task: it involves iterative analysis, hypothesis generation, testing, and refinement rather than a single-pass prediction. In this context, agentic AI and reinforcement learning may offer a compelling path forward, as they could enable models to learn from sequential refinement processes and better capture the stepwise reasoning needed for robust disorder resolution.

---

## Minimal End-to-End Workflow

A typical wrapper pipeline is:

`SHELXT -> CrystalX Heavy -> SHELXL refinement -> CrystalX Hydro -> HFIX/AFIX placement -> SHELXL refinement -> weight refinement -> PLATON / CheckCIF`

1. **SHELXT** generates coarse electron-density peaks.
2. **CrystalX Heavy** predicts non-hydrogen atom types from geometric peak interactions.
3. **SHELXL** refines the heavy-atom framework.
4. **CrystalX Hydro** predicts how many hydrogens are attached to each heavy atom.
5. **HFIX/AFIX** placement and subsequent refinement produce the final all-atom structure.
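
The stage order above can be sketched as a small driver loop. This is only an illustration of the sequencing: `STAGES` mirrors the pipeline string in this section, while `run_pipeline` and the handlers are hypothetical placeholders that do not call SHELXT, SHELXL, PLATON, or any CrystalX code.

```python
# Stage names copied from the wrapper pipeline described above.
STAGES = [
    "SHELXT",
    "CrystalX Heavy",
    "SHELXL refinement",
    "CrystalX Hydro",
    "HFIX/AFIX placement",
    "SHELXL refinement",
    "weight refinement",
    "PLATON / CheckCIF",
]


def run_pipeline(state, handlers):
    """Apply one handler per stage, in order, threading `state` through.

    `handlers` maps a stage name to a callable taking and returning the
    working state (for example, the current .res contents). Both the mapping
    and the state type are assumptions made for this sketch.
    """
    log = []
    for name in STAGES:
        handler = handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for stage {name!r}")
        state = handler(state)
        log.append(name)
    return state, log


# Placeholder handlers that only record which stage ran.
handlers = {name: (lambda s, n=name: s + [n]) for name in STAGES}
final_state, log = run_pipeline([], handlers)
print(" -> ".join(log))
```

Keeping each stage behind a plain callable means the two CrystalX steps can be swapped in or out without touching the SHELX steps around them.
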

Demo: `https://crystalx.intern-ai.org.cn/`

## Code

- GitHub: `https://github.com/kaipengm2/CrystalX`

## Citation

If you find this repository useful, please cite:

```bibtex
@article{doi:10.1021/jacs.5c21832,
  author = {Zheng, Kaipeng and Huang, Weiran and Ouyang, Wanli and Zhong, Han-Sen and Li, Yuqiang},
  title = {CrystalX: High-Accuracy Crystal Structure Analysis Using Deep Learning},
  journal = {Journal of the American Chemical Society},
  volume = {0},
  number = {0},
  pages = {null},
  year = {0},
  doi = {10.1021/jacs.5c21832}
}
```