CrystalX: High-accuracy Crystal Structure Analysis Using Deep Learning

Accepted by the Journal of the American Chemical Society (JACS) and invited for a journal cover feature.

Overview

CrystalX is an AI system for routine single-crystal structure analysis from real experimental X-ray diffraction (XRD) data.

CrystalX uses geometric deep learning to model electron density and to capture the underlying three-dimensional geometric interactions, learning directly from large-scale experimental XRD datasets. Compared with the rule-based approaches to automatic element assignment used in tools such as SHELXT and Olex2, CrystalX is substantially more accurate and more robust.

In prospective, deployment-style evaluations, CrystalX was also compared with AutoChem under practical experimental conditions. Because AutoChem requires a real instrument-generated metadata file (.cif_od) produced by the CrysAlisPro data-reduction workflow, the comparison was performed on real-world cases that satisfied this requirement. CrystalX successfully solved 3/3 test cases, whereas AutoChem solved 1/3.

CrystalX provides the following capabilities:

  • Accurate discrimination between non-hydrogen atoms with similar atomic numbers, including challenging pairs such as C/N/O and P/S/Cl
  • Fast and fully correct solution of large organometallic structures containing up to 370 non-hydrogen atoms
  • Detection of 9 verified expert interpretation errors among 1,559 held-out structures published in JCR Q1 journals, including subtle cases that triggered no CheckCIF A/B alerts
  • Confidence scores for both heavy-atom and hydrogen predictions
  • Natural integration into standard crystallographic workflows

Model Architecture

CrystalX adopts a two-stage geometric deep learning pipeline to predict both non-hydrogen and hydrogen atoms.

Both public checkpoints are built on the Equivariant Transformer backbone as implemented in TorchMD-NET.

For hydrogen prediction, CrystalX leverages both intramolecular and intermolecular context by incorporating symmetry-equivalent neighbors within 3.2 Å. This design yields more than a 7% improvement over using intramolecular information alone.
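The intermolecular context described above can be illustrated with a small neighbor search. This is a minimal sketch, not the CrystalX implementation: the function names, the `(rotation, translation)` representation of symmetry operations, and the single-shell lattice search are assumptions made for illustration. Only the 3.2 Å cutoff comes from the text.

```python
import math
from itertools import product

def frac_to_cart_matrix(a, b, c, alpha, beta, gamma):
    """Matrix whose columns are the cell vectors a, b, c in Cartesian axes (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    vol = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * vol / sg],
    ]

def apply_op(op, frac):
    """Apply a symmetry operation (3x3 rotation, translation) to fractional coordinates."""
    rot, trans = op
    return [sum(rot[i][j] * frac[j] for j in range(3)) + trans[i] for i in range(3)]

def neighbors_within(cell, fracs, symops, target, cutoff=3.2):
    """Return sorted (distance, atom_index) pairs for all images within `cutoff` of atom `target`.

    Assumption: searching one lattice shell around the nearest image is enough,
    which holds when the cutoff is small compared with the cell dimensions.
    """
    m = frac_to_cart_matrix(*cell)
    origin = fracs[target]
    found = {}
    for j, frac in enumerate(fracs):
        for op in symops:
            image = apply_op(op, frac)
            # Move the image to the lattice copy nearest the target in fractional space.
            base = [image[i] - round(image[i] - origin[i]) for i in range(3)]
            for shift in product((-1, 0, 1), repeat=3):
                delta = [base[i] + shift[i] - origin[i] for i in range(3)]
                cart = [sum(m[i][k] * delta[k] for k in range(3)) for i in range(3)]
                d = math.sqrt(sum(v * v for v in cart))
                if 1e-6 < d <= cutoff:  # skip the target atom itself
                    key = (j, tuple(round(v, 4) for v in delta))  # drop duplicate images
                    found.setdefault(key, d)
    return sorted((d, j) for (j, _), d in found.items())
```

With only the identity operation this returns intramolecular contacts; adding the space-group operations brings in the symmetry-equivalent neighbors that the hydrogen model is described as using.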


Available Checkpoints

This repository provides the two official inference checkpoints used in the CrystalX pipeline:

  • crystalx-heavy.pth
  • crystalx-hydro.pth

crystalx-heavy.pth

Predicts non-hydrogen element types from coarse electron-density peaks generated by automatic phasing tools such as SHELXT, and outputs a confidence score for each prediction.
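The model card does not state how the confidence score is computed. One common choice, assumed here purely for illustration, is the softmax probability of the predicted class. The function name, the label list, and the review threshold below are hypothetical and are not part of the CrystalX API.

```python
import math

def predict_with_confidence(logits, labels):
    """Return (label, confidence), where confidence is the softmax probability of the top class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

def needs_review(confidence, threshold=0.9):
    """Flag low-confidence assignments for manual inspection (threshold is an arbitrary example)."""
    return confidence < threshold
```

In a workflow like this, low-confidence peaks would be the first candidates for manual checking against the difference map.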

crystalx-hydro.pth

Predicts the number of hydrogens attached to each heavy atom after heavy-atom determination, and also provides a confidence score.


Intended Use

In practice, CrystalX can be inserted at different stages of an existing structure-solution pipeline, for heavy-atom prediction, hydrogen prediction, or both. The official codebase provides a lightweight integration with the SHELX suite, enabling a simple .res-to-.res workflow.
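To make the .res-to-.res idea concrete, the sketch below rewrites atom lines in a SHELX .res file from a mapping of old atom names to predicted elements. It is not the official integration. The function name `relabel_res` and the `predicted` mapping are hypothetical, and the sketch assumes a short-form SFAC line (element symbols only) that already lists every predicted element, and that every atom in the file is relabeled. The UNIT line is left untouched.

```python
def relabel_res(lines, predicted):
    """Rename and re-type atoms in .res lines.

    lines     : list of .res lines (no trailing newlines)
    predicted : dict mapping existing atom names (e.g. "Q1") to element symbols
    """
    sfac_at = next(i for i, line in enumerate(lines) if line.upper().startswith("SFAC "))
    sfac = [el.capitalize() for el in lines[sfac_at].split()[1:]]
    counters = {}
    out = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] not in predicted:
            out.append(line)  # instructions and untouched lines pass through unchanged
            continue
        element = predicted[tokens[0]].capitalize()
        if element not in sfac:
            raise ValueError(f"{element} is not listed on the SFAC line")
        counters[element] = counters.get(element, 0) + 1
        tokens[0] = f"{element}{counters[element]}"   # new atom name, e.g. N1
        tokens[1] = str(sfac.index(element) + 1)      # 1-based SFAC index
        out.append(" ".join(tokens))
    return out
```

For example, a peak line `Q1 1 0.1 0.2 0.3 11.0 0.05` predicted as nitrogen under `SFAC C N O` becomes `N1 2 0.1 0.2 0.3 11.0 0.05`.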

Current Limitation: Disorder

CrystalX does not currently support the resolution of crystallographic disorder, largely because high-quality annotated training data for these cases are scarce. At the same time, disorder prediction is closely connected to the accurate detection and interpretation of residual electron density, making it a natural future extension of the current framework.

We view disorder modeling as a particularly promising direction for further development. Interpreting disorder is inherently a sequential, multi-step reasoning task: it involves iterative analysis, hypothesis generation, testing, and refinement rather than a single-pass prediction. In this context, agentic AI and reinforcement learning may offer a compelling path forward, as they could enable models to learn from sequential refinement processes and better capture the stepwise reasoning needed for robust disorder resolution.


Minimal End-to-End Workflow

A typical wrapper pipeline is:

SHELXT -> CrystalX Heavy -> SHELXL refinement -> CrystalX Hydro -> HFIX/AFIX placement -> SHELXL refinement -> weight refinement -> PLATON / CheckCIF

  1. SHELXT generates coarse electron-density peaks.
  2. CrystalX Heavy predicts non-hydrogen atom types from geometric peak interactions.
  3. SHELXL refines the heavy-atom framework.
  4. CrystalX Hydro predicts how many hydrogens are attached to each heavy atom.
  5. HFIX/AFIX placement and subsequent refinement produce the final all-atom structure.
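Step 5 needs a rule that turns a predicted hydrogen count into a SHELXL HFIX instruction. The lookup below is a minimal sketch of such a rule, assuming the choice depends on the element, the predicted hydrogen count, and the number of non-hydrogen neighbors. The table is an illustrative subset of standard HFIX codes, and the function name and inputs are hypothetical. The source does not describe how the official wrapper makes this choice.

```python
# (element, attached H count, non-hydrogen neighbor count) -> SHELXL HFIX code.
# Illustrative subset only; a real pipeline needs more cases and geometry checks.
HFIX_CODES = {
    ("C", 3, 1): 137,  # methyl, torsion chosen from the difference density
    ("C", 2, 2): 23,   # secondary CH2
    ("C", 2, 1): 93,   # terminal =CH2
    ("C", 1, 3): 13,   # tertiary C-H
    ("C", 1, 2): 43,   # aromatic or other sp2 C-H
    ("C", 1, 1): 163,  # acetylenic C-H
    ("O", 1, 1): 147,  # hydroxyl, torsion chosen from the difference density
}

def hfix_instruction(atom_name, element, h_count, heavy_neighbors):
    """Return an HFIX line for one atom, or None when no hydrogens are predicted."""
    if h_count == 0:
        return None
    code = HFIX_CODES.get((element, h_count, heavy_neighbors))
    if code is None:
        # Unknown combination: leave the decision to the crystallographer.
        raise ValueError(f"no HFIX rule for {element} with {h_count} H and {heavy_neighbors} neighbors")
    return f"HFIX {code} {atom_name}"
```

Cases outside the table raise an error instead of guessing, so ambiguous environments such as N-H groups are left for manual treatment in this sketch.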

Demo: https://crystalx.intern-ai.org.cn/

Code

  • GitHub: https://github.com/kaipengm2/CrystalX

Citation

If you find this repository useful, please cite:

@article{doi:10.1021/jacs.5c21832,
  author  = {Zheng, Kaipeng and Huang, Weiran and Ouyang, Wanli and Zhong, Han-Sen and Li, Yuqiang},
  title   = {CrystalX: High-Accuracy Crystal Structure Analysis Using Deep Learning},
  journal = {Journal of the American Chemical Society},
  volume  = {0},
  number  = {0},
  pages   = {null},
  year    = {0},
  doi     = {10.1021/jacs.5c21832}
}