CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning
CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling) is a Transformer-based foundation model that learns crystal representations from string encodings of crystal structures. Crystals are serialized with a novel Symmetry-Consistent Ordered Parameter Encoding (SCOPE), a compact, coordinate-free representation that captures space-group symmetry, Wyckoff positions, and composition. The model can be fine-tuned for accurate, generalizable, and scalable property prediction, and can be combined with physical laws (e.g., the Debye model) for thermodynamically consistent predictions.
- Paper: CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning, Nature Communications 17, 4074 (2026); also available as an arXiv preprint
- Code: github.com/BattModels/CLOUD
- Authors: Changwen Xu, Shang Zhu, Venkatasubramanian Viswanathan (University of Michigan)
Model Details
| Field | Value |
| --- | --- |
| Architecture | BERT encoder (BertForMaskedLM) |
| Hidden size | 768 |
| Hidden layers | 12 |
| Attention heads | 12 |
| Intermediate size | 3072 |
| Max sequence length | 64 |
| Vocab size | 30522 (custom SCOPE tokenizer) |
| Parameters | ~110M |
| Precision | float32 |
| Pretraining objective | Masked language modeling on SCOPE strings |
| Pretraining data | ~6M crystal structures from OPTIMADE |
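As a quick sanity check, these hyperparameters can be read back from the released config (a minimal sketch using the standard transformers API; the printed values reflect the table above):

```python
from transformers import AutoConfig

# Load the released config and confirm the architecture hyperparameters.
config = AutoConfig.from_pretrained("ChangwenXu/CLOUD", subfolder="ckpt")
print(config.hidden_size)          # 768
print(config.num_hidden_layers)    # 12
print(config.num_attention_heads)  # 12
print(config.intermediate_size)    # 3072
print(config.vocab_size)           # 30522
```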
Repository Layout
ckpt/
├── config.json             # BertForMaskedLM config
├── generation_config.json
├── model.safetensors       # Pretrained weights (~530 MB)
├── training_args.bin       # HF Trainer arguments used for pretraining
├── tokenizer_config.json   # SCOPE tokenizer config
├── special_tokens_map.json
├── added_tokens.json
└── vocab.txt               # SCOPE vocabulary
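If you want the checkpoint files on disk rather than loading them through from_pretrained, a minimal sketch with huggingface_hub (the allow_patterns filter restricts the download to the ckpt/ subfolder):

```python
from huggingface_hub import snapshot_download

# Fetch only the ckpt/ subfolder of the model repo into the local cache.
local_dir = snapshot_download(repo_id="ChangwenXu/CLOUD", allow_patterns=["ckpt/*"])
print(local_dir)  # path to the cached snapshot containing ckpt/
```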
Usage
Load the pretrained model and tokenizer
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("ChangwenXu/CLOUD", subfolder="ckpt")
model = AutoModelForMaskedLM.from_pretrained("ChangwenXu/CLOUD", subfolder="ckpt")
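Because the released checkpoint is the masked-language-model backbone, a quick way to verify that the weights loaded correctly is to mask one token of a SCOPE string and decode the model's top prediction. A minimal sketch, assuming the tokenizer defines the standard [MASK] token and that scope_string holds a real SCOPE string:

```python
import torch

scope_string = "..."  # SCOPE string produced by structure_to_str.py
inputs = tokenizer(scope_string, return_tensors="pt")
inputs["input_ids"][0, 1] = tokenizer.mask_token_id  # mask the first token after [CLS]

with torch.no_grad():
    logits = model(**inputs).logits
masked_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
print(tokenizer.decode(logits[masked_pos].argmax(-1)))  # model's top guess for the mask
```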
Encode a crystal structure to a SCOPE string
CLOUD operates on SCOPE string representations. Use the conversion utility from the code repository to turn a CIF file into a SCOPE string:
git clone https://github.com/BattModels/CLOUD.git
cd CLOUD
python structure_to_str.py --dir <path_to_cif> --out <output_path> \
--numproc <num_of_processes> --batchsize <batch_size>
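A SCOPE string is assembled from the structure's space group, Wyckoff positions, and composition. As an unofficial illustration of the symmetry information the encoder consumes (not a replacement for structure_to_str.py), pymatgen can extract those ingredients from a CIF; example.cif is a hypothetical file name:

```python
from pymatgen.core import Structure
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

# Illustration only: shows the symmetry data that SCOPE encodes.
structure = Structure.from_file("example.cif")  # hypothetical input file
sga = SpacegroupAnalyzer(structure)
sym = sga.get_symmetrized_structure()

print(sga.get_space_group_symbol(), sga.get_space_group_number())
for sites, wyckoff in zip(sym.equivalent_sites, sym.wyckoff_symbols):
    print(sites[0].specie, wyckoff, len(sites))  # element, Wyckoff label, multiplicity
```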
Get crystal embeddings
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("ChangwenXu/CLOUD", subfolder="ckpt")
encoder = AutoModel.from_pretrained("ChangwenXu/CLOUD", subfolder="ckpt")
encoder.eval()
scope_string = "..." # SCOPE representation produced by structure_to_str.py
inputs = tokenizer(scope_string, return_tensors="pt", padding=True, truncation=True, max_length=64)
with torch.no_grad():
outputs = encoder(**inputs)
# Embedding of the [CLS] token (first position) from the final hidden layer:
crystal_embedding = outputs.last_hidden_state[:, 0]
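The [CLS] vector is one pooling choice; depending on the downstream task, a masked mean over all token embeddings can serve as an alternative crystal embedding. A sketch continuing from the code above (not a recommendation from the paper):

```python
# Alternative: attention-mask-weighted mean pooling over all tokens.
mask = inputs["attention_mask"].unsqueeze(-1)           # (batch, seq, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
mean_embedding = summed / mask.sum(dim=1).clamp(min=1)  # (batch, hidden)
```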
Fine-tuning
Recipes for fine-tuning on MatBench, UnconvBench, MatBench Discovery / WBM, and the physics-informed CLOUD-DEBYE variant are provided in the GitHub repository (train.py, train_mp.py, wbm_predict.py, train_debye.py).
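Those scripts are the reference recipes. As a self-contained illustration of the general pattern (a sketch, not the paper's exact setup), the backbone can be wrapped with a standard regression head from transformers; the head weights are freshly initialized and must be trained:

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical single-target regression setup (e.g., formation energy)
# on SCOPE strings; see train.py in the repo for the actual recipe.
model = AutoModelForSequenceClassification.from_pretrained(
    "ChangwenXu/CLOUD",
    subfolder="ckpt",
    num_labels=1,               # one regression target
    problem_type="regression",  # use MSE loss
)
# Fine-tune with transformers.Trainer on (SCOPE string, property) pairs.
```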
Intended Use
- Pretrained backbone for downstream crystal property prediction (formation energy, bandgap, mechanical, thermodynamic properties, etc.)
- Featurizer for materials screening and discovery workflows (see the similarity-ranking sketch after this list)
- Backbone for physics-informed extensions such as CLOUD-DEBYE
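As a hypothetical example of the featurizer use case, candidate crystals can be ranked by embedding similarity to a known reference material, reusing the tokenizer and encoder loaded in the usage section; the "..." placeholders stand for real SCOPE strings:

```python
import torch
import torch.nn.functional as F

def embed(scope_strings):
    """[CLS] embeddings for a batch of SCOPE strings."""
    batch = tokenizer(scope_strings, return_tensors="pt", padding=True,
                      truncation=True, max_length=64)
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]

reference = embed(["..."])          # SCOPE string of a reference material
candidates = embed(["...", "..."])  # SCOPE strings of candidate materials
scores = F.cosine_similarity(candidates, reference)
print(scores.argsort(descending=True))  # candidates ranked by similarity
```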
Out-of-scope
- Direct generation of crystal structures from scratch
- Predicting properties of non-crystalline systems (molecules, amorphous solids)
- Use as a substitute for high-fidelity DFT/MD without task-specific fine-tuning and validation
Limitations
- Trained on equilibrium / known crystal structures from OPTIMADE; out-of-distribution behavior on highly disordered, defective, or hypothetical structures is not guaranteed.
- Maximum sequence length of 64 tokens; SCOPE strings for very large or low-symmetry unit cells may exceed this limit and be silently truncated (see the length-check sketch after this list).
- Property predictions require task-specific fine-tuning; the released checkpoint is the masked-language-model pretrained backbone only.
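To catch truncation before it drops information, a minimal length check, assuming the tokenizer and a scope_string from the usage section:

```python
# Warn when a SCOPE string exceeds the model's 64-token window.
ids = tokenizer(scope_string, truncation=False)["input_ids"]
if len(ids) > 64:
    print(f"Warning: {len(ids)} tokens; input will be truncated to 64.")
```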
Citation
If you find CLOUD useful in your research, please cite:
@article{xu2026cloud,
title = {{CLOUD}: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning},
author = {Xu, Changwen and Zhu, Shang and Viswanathan, Venkatasubramanian},
journal = {Nature Communications},
volume = {17},
number = {1},
pages = {4074},
year = {2026},
doi = {10.1038/s41467-026-70467-3}
}
@inproceedings{xu2024cloud,
title = {{CLOUD}: A Scalable Scientific Foundation Model for Crystal Representation Learning},
author = {Xu, Changwen and Zhu, Shang and Viswanathan, Venkatasubramanian},
booktitle = {NeurIPS 2024 Workshop on Foundation Models for Science: Progress, Opportunities, and Challenges},
year = {2024}
}
License
Released under the MIT License, © 2025 Changwen Xu.