CrystalX: High-accuracy Crystal Structure Analysis Using Deep Learning
Accepted by the Journal of the American Chemical Society (JACS). Invited for a journal cover feature.
Overview
CrystalX is an AI system for routine single-crystal structure analysis from real experimental X-ray diffraction (XRD) data.
Designed specifically for everyday single-crystal structure solution, CrystalX uses geometric deep learning to model electron density and capture underlying three-dimensional geometric interactions directly from large-scale experimental XRD datasets. Compared with traditional rule-based approaches for automatic elemental determination, such as those used in SHELXT and Olex2, CrystalX delivers substantially improved accuracy and robustness.
In prospective, deployment-style evaluations, CrystalX was also compared with AutoChem under practical experimental conditions. Because AutoChem requires a real instrument-generated metadata file (.cif_od) produced by the CrysAlisPro data-reduction workflow, the comparison was performed on real-world cases that satisfied this requirement. CrystalX successfully solved 3/3 test cases, whereas AutoChem solved 1/3.
CrystalX provides the following capabilities:
- Accurate discrimination between non-hydrogen atoms with similar atomic numbers, including challenging pairs such as C/N/O and P/S/Cl
- Fast and fully correct solution of large organometallic structures containing up to 370 non-hydrogen atoms
- Detection of 9 verified expert interpretation errors among 1,559 held-out structures published in JCR Q1 journals, including subtle cases that triggered no CheckCIF A/B alerts
- Confidence scores for both heavy-atom and hydrogen predictions
- Natural integration into standard crystallographic workflows
Model Architecture
CrystalX adopts a two-stage geometric deep learning pipeline to predict both non-hydrogen and hydrogen atoms.
Both public checkpoints are built on an Equivariant Transformer backbone, specifically TorchMD-NET.
For hydrogen prediction, CrystalX leverages both intramolecular and intermolecular context by incorporating symmetry-equivalent neighbors within 3.2 Å. This design yields more than a 7% improvement over using intramolecular information alone.
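To make the neighbor search concrete, here is a minimal sketch of collecting symmetry-equivalent neighbors within a distance cutoff. It is not the CrystalX implementation. The function names (`frac_to_cart`, `neighbors_within`), the representation of symmetry operations as (rotation, translation) pairs, and the search over the 27 surrounding cells are assumptions made for illustration. That search range is only sufficient when the center lies in [0, 1) in fractional coordinates and every perpendicular cell width is at least the cutoff.

```python
import itertools
import math

def frac_to_cart(frac, cell):
    """Fractional -> Cartesian (angstrom) for a cell (a, b, c, alpha, beta, gamma)."""
    a, b, c, alpha, beta, gamma = cell
    al, be, ga = (math.radians(v) for v in (alpha, beta, gamma))
    # Conventional orthogonalization: a along x, b in the xy plane.
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    u, v, w = frac
    return (a * u + b * math.cos(ga) * v + cx * w,
            b * math.sin(ga) * v + cy * w,
            cz * w)

def neighbors_within(center, atoms, symops, cell, cutoff=3.2):
    """List (label, op_index, cell_shift, distance) for atom images near `center`.

    center : fractional coordinates of the atom of interest, inside [0, 1)
    atoms  : iterable of (label, fractional_xyz) for the asymmetric unit
    symops : list of (3x3 rotation, 3-vector translation) in fractional space
    """
    c_cart = frac_to_cart(center, cell)
    seen, found = set(), []
    for label, frac in atoms:
        for k, (rot, trans) in enumerate(symops):
            # Apply the symmetry operation, then wrap the image into [0, 1).
            img = [(sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]) % 1.0
                   for i in range(3)]
            for shift in itertools.product((-1, 0, 1), repeat=3):
                p = tuple(img[i] + shift[i] for i in range(3))
                key = (label,) + tuple(round(x, 4) for x in p)
                if key in seen:  # same image reached twice (special positions)
                    continue
                seen.add(key)
                d = math.dist(c_cart, frac_to_cart(p, cell))
                if 1e-6 < d <= cutoff:  # skip the center atom itself
                    found.append((label, k, shift, round(d, 3)))
    return sorted(found, key=lambda t: t[3])
```

Passing only the identity operation and a single zero shift would correspond to the intramolecular-only setting. Adding the remaining space-group operations brings in the intermolecular contacts described above.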
Available Checkpoints
This repository provides the two official inference checkpoints used in the CrystalX pipeline:
- crystalx-heavy.pth
- crystalx-hydro.pth
crystalx-heavy.pth
Predicts non-hydrogen element types from coarse electron-density peaks generated by automatic phasing tools such as SHELXT, and outputs a confidence score for each prediction.
crystalx-hydro.pth
Predicts the number of hydrogens attached to each heavy atom after heavy-atom determination, and also provides a confidence score.
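The text above does not say how the confidence scores are computed. As an illustration only, the sketch below assumes a common convention: the confidence is the largest softmax probability over the candidate classes. The helper names (`softmax`, `predict_with_confidence`, `flag_low_confidence`) and the 0.9 review threshold are hypothetical and are not taken from the CrystalX codebase.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits, classes):
    """Return (predicted class, confidence), taking confidence = max probability.

    This is an assumed definition of confidence, used here for illustration.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]

def flag_low_confidence(predictions, threshold=0.9):
    """Indices of (label, confidence) predictions that fall below the threshold."""
    return [i for i, (_, conf) in enumerate(predictions) if conf < threshold]

# Example: per-atom scores over hypothetical element classes C/N/O.
classes = ["C", "N", "O"]
atom_logits = [[6.0, 0.5, 0.2], [1.1, 1.0, 0.3]]
predictions = [predict_with_confidence(row, classes) for row in atom_logits]
to_review = flag_low_confidence(predictions)  # atoms worth a manual look
```

The same pattern would apply to hydrogen counts, with the classes being the possible numbers of attached hydrogens rather than element symbols.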
Intended Use
In practice, CrystalX can be inserted seamlessly at different stages of a structure-solution pipeline for both heavy-atom and hydrogen prediction. The official codebase provides a lightweight integration with the SHELX suite, enabling a simple .res-to-.res workflow.
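As a rough illustration of what a .res-to-.res step involves, the sketch below rewrites atom names and scattering-factor indices in a SHELX .res file from a set of element predictions. It is not the official integration. The function `relabel_res` and the `predicted` mapping are hypothetical, and the sketch assumes single-line atom records, a single SFAC line in element-list form, and predictions covering every atom that should be renamed. To avoid touching UNIT, it refuses any element that is not already on the SFAC line.

```python
def relabel_res(res_text, predicted):
    """Return new .res text with atoms renamed according to `predicted`.

    predicted : dict mapping an existing atom name (e.g. "Q1") to an
                element symbol (e.g. "N"); a hypothetical input format.
    """
    sfac = []       # element order from the SFAC line (indices are 1-based)
    counters = {}   # per-element running number used to build new names
    out = []
    for line in res_text.splitlines():
        tok = line.split()
        if tok and tok[0].upper() == "SFAC":
            sfac = [t.capitalize() for t in tok[1:]]
        elif tok and tok[0] in predicted and len(tok) >= 5:
            element = predicted[tok[0]]
            if element not in sfac:
                # Adding an element would also require a new UNIT entry.
                raise ValueError(f"{element} is not listed on the SFAC line")
            counters[element] = counters.get(element, 0) + 1
            name = f"{element}{counters[element]}"
            rest = "  ".join(tok[2:])  # x y z sof U ... kept unchanged
            line = f"{name:<4s} {sfac.index(element) + 1:>2d}  {rest}"
        out.append(line)
    return "\n".join(out) + "\n"
```

A real wrapper would also need to handle continuation lines (those ending in `=`), anisotropic atoms, and name collisions with atoms that are left unchanged.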
Current Limitation: Disorder
CrystalX does not currently support the resolution of crystallographic disorder, largely because high-quality annotated training data for these cases are scarce. At the same time, disorder prediction is closely connected to the accurate detection and interpretation of residual electron density, making it a natural future extension of the current framework.
We view disorder modeling as a particularly promising direction for further development. Interpreting disorder is inherently a sequential, multi-step reasoning task: it involves iterative analysis, hypothesis generation, testing, and refinement rather than a single-pass prediction. In this context, agentic AI and reinforcement learning may offer a compelling path forward, as they could enable models to learn from sequential refinement processes and better capture the stepwise reasoning needed for robust disorder resolution.
Minimal End-to-End Workflow
A typical wrapper pipeline is:
SHELXT -> CrystalX Heavy -> SHELXL refinement -> CrystalX Hydro -> HFIX/AFIX placement -> SHELXL refinement -> weight refinement -> PLATON / CheckCIF
- SHELXT generates coarse electron-density peaks.
- CrystalX Heavy predicts non-hydrogen atom types from geometric peak interactions.
- SHELXL refines the heavy-atom framework.
- CrystalX Hydro predicts how many hydrogens are attached to each heavy atom.
- HFIX/AFIX placement and subsequent refinement produce the final all-atom structure.
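The stage order above can be written down as data, which makes it easy to check that a wrapper supplies every step. The sketch below is a minimal illustration, not the official wrapper: `STAGES` copies the stage names from the pipeline line, while `run_pipeline`, the `steps` mapping, and the "state in, state out" contract for each step are hypothetical. In a real wrapper, each callable would invoke the corresponding external program or CrystalX checkpoint.

```python
from typing import Any, Callable, Dict, List, Tuple

# Stage names follow the wrapper pipeline listed above, in order.
STAGES: List[str] = [
    "SHELXT",
    "CrystalX Heavy",
    "SHELXL refinement",
    "CrystalX Hydro",
    "HFIX/AFIX placement",
    "SHELXL refinement",
    "weight refinement",
    "PLATON / CheckCIF",
]

# Hypothetical contract: each step takes the current job state and returns
# the updated state (for example, a working directory or a .res string).
Step = Callable[[Any], Any]

def run_pipeline(state: Any, steps: Dict[str, Step]) -> Tuple[Any, List[str]]:
    """Run every stage in order and return (final state, executed stage names)."""
    missing = sorted(set(STAGES) - set(steps))
    if missing:
        raise KeyError(f"no implementation for stages: {missing}")
    history: List[str] = []
    for name in STAGES:
        state = steps[name](state)  # an exception here stops the pipeline
        history.append(name)
    return state, history
```

Because "SHELXL refinement" appears twice, the same callable is reused for both refinement passes. A wrapper that needs different settings per pass would have to give the two stages distinct names.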
Demo: https://crystalx.intern-ai.org.cn/
Code
- GitHub: https://github.com/kaipengm2/CrystalX
Citation
If you find this repository useful, please cite:
@article{doi:10.1021/jacs.5c21832,
author = {Zheng, Kaipeng and Huang, Weiran and Ouyang, Wanli and Zhong, Han-Sen and Li, Yuqiang},
title = {CrystalX: High-Accuracy Crystal Structure Analysis Using Deep Learning},
journal = {Journal of the American Chemical Society},
volume = {0},
number = {0},
pages = {null},
year = {0},
doi = {10.1021/jacs.5c21832}
}