---
license: apache-2.0
tags:
- materials
- diffusion
- crystal-generation
- mattergen
---
ALM Gen · de-novo crystal generation
ALM Gen turns a natural-language description into a novel crystal structure by steering the
mattergen_base diffusion decoder with classifier-free guidance. The K=8 [atoms_i]
soft tokens feed a consumer-only bridge: a per-token projection whose output is read
by the cross-attention consumer inside the decoder. There is no learnable-query producer.
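As a minimal sketch of what classifier-free guidance does at a single denoising step, the snippet below blends an unconditional score with a text-conditioned one using a guidance factor g. It assumes the linear-interpolation convention, in which g=0 recovers the unconditional base model (consistent with the "MatterGen (Base, g=0)" row in the results) and g=1 the purely conditional one. The exact formula used by the decoder is not stated here, and the function name `guided_score` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def guided_score(uncond: Sequence[float], cond: Sequence[float], g: float) -> List[float]:
    """Blend scores as s = s_uncond + g * (s_cond - s_uncond).

    Assumed convention: g = 0 is unconditional, g = 1 is fully conditional.
    """
    if len(uncond) != len(cond):
        raise ValueError("score vectors must have the same length")
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


# Toy per-coordinate scores (illustrative numbers, not model outputs).
s_uncond = [0.0, 2.0, -1.0]
s_cond = [1.0, 4.0, -3.0]

print(guided_score(s_uncond, s_cond, 0.0))  # same values as s_uncond
print(guided_score(s_uncond, s_cond, 0.5))  # halfway between the two scores
```

Under this convention, a guidance factor of 0.5 moves each score component halfway from the unconditional prediction toward the language-conditioned one.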
Contents: atoms_mapper.pt (consumer-only bridge, optimizer state stripped), lora_adapter/
(r=8 LLM bridge LoRA), and projector_and_state.pt. The base decoder is fetched separately
(mattergen_base, via external/setup_mattergen.sh). Pass this repo subdirectory as
--alm_checkpoint; the r8 adapter applies directly.
De novo generation against the MP-20 hull. A structure counts as stable (S) if E_hull ≤ 0.016 eV/atom; structures are pre-relaxed; N=10×1000 (95% CIs in the paper):
| Method | E_hull (eV/atom)↓ | U (%)↑ | V_struct (%)↑ | V_chem (%)↑ | SUN (%)↑ |
|---|---|---|---|---|---|
| CrystalTextLLM | 0.61 | 47.40 | 90.01 | 91.59 | 0.38 |
| PLaID++ Wyckoff | 0.57 | 40.70 | 89.06 | 91.59 | 0.50 |
| CrysReas-Base (SFT) | 0.58 | 35.25 | 84.03 | 90.36 | 0.57 |
| CrysReas-Thinking | 0.52 | 38.64 | 91.29 | 91.72 | 0.59 |
| CrysReas-RL | 0.53 | 82.49 | 89.85 | 91.10 | 1.23 |
| CrysReas | 0.45 | 87.23 | 94.92 | 91.78 | 1.70 |
| MatterGen (Base, g=0) | 0.079 | 93.50 | 100.00 | 86.50 | 5.53 |
| ALM Gen (g=0.5) | 0.085 | 98.90 | 100.00 | 83.20 | 7.80 |
| ALM Gen + FK-stoich | 0.086 | 73.80 | 100.00 | 84.50 | 5.21 |
Steering the base decoder with language at g=0.5 raises SUN over the g=0 MatterGen base (5.53 → 7.80), the highest SUN in this table and SoTA on this protocol.
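To make the SUN column concrete, here is a minimal sketch of how a stable-unique-novel rate could be computed from per-sample flags. Only the 0.016 eV/atom stability threshold comes from the protocol above. The `Sample` fields, the function name `sun_rate`, and the choice to normalise by all generated samples are assumptions, and the uniqueness and novelty flags are taken as given rather than derived by structure matching.

```python
from dataclasses import dataclass
from typing import Iterable

STABILITY_THRESHOLD = 0.016  # eV/atom, stability cutoff from the protocol above


@dataclass
class Sample:
    e_hull: float    # energy above the hull, in eV/atom
    is_unique: bool  # assumed flag: not a duplicate of another generated sample
    is_novel: bool   # assumed flag: not present in the reference set


def sun_rate(samples: Iterable[Sample]) -> float:
    """Percentage of samples that are stable, unique, and novel at once."""
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(
        1
        for s in samples
        if s.e_hull <= STABILITY_THRESHOLD and s.is_unique and s.is_novel
    )
    return 100.0 * hits / len(samples)


# Toy batch (made-up values): only the first sample passes all three checks.
batch = [
    Sample(e_hull=0.010, is_unique=True, is_novel=True),
    Sample(e_hull=0.010, is_unique=True, is_novel=False),
    Sample(e_hull=0.200, is_unique=True, is_novel=True),
    Sample(e_hull=0.005, is_unique=False, is_novel=True),
]
print(sun_rate(batch))
```

Because the three conditions are combined with a logical AND, a method can score high on uniqueness alone and still have a low SUN, which is the pattern visible in several rows of the table.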
De novo generation on LeMat-GenBench (N=2500). Strict stability is Ē_hull < 0 and metastability is E_hull < 0.1; E_f, E_hull, and RMSD are scored by three MLIPs:
| Model | Valid↑ | Unique↑ | Novel↑ | E_f↓ | Ē_hull↓ | RMSD↓ | Stable↑ | SUN↑ | Meta↑ | MSUN↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| MatterGen | 95.7 | 95.1 | 70.5 | -0.70 | 0.18 | 0.39 | 2.0 | 0.2 | 33.4 | 15.0 |
| PLaID++ | 96.0 | 77.8 | 24.2 | -0.50 | 0.09 | 0.13 | 12.4 | 1.0 | 60.7 | 7.6 |
| WyFormer | 93.4 | 93.0 | 66.4 | -0.43 | 0.50 | 0.81 | 0.5 | 0.1 | 15.7 | 1.9 |
| WyFormer-DFT | 95.2 | 95.0 | 66.4 | -0.67 | 0.27 | 0.42 | 3.7 | 0.4 | 24.8 | 7.8 |
| MCFlow-S | 97.2 | 96.3 | 52.2 | -0.85 | 0.10 | 0.16 | 11.7 | 0.7 | 49.5 | 18.9 |
| MCFlow-B | 97.7 | 95.5 | 25.4 | -0.91 | 0.05 | 0.08 | 17.6 | 0.7 | 64.3 | 11.9 |
| MCFlow-L | 98.6 | 95.2 | 18.6 | -0.93 | 0.04 | 0.06 | 18.8 | 0.5 | 68.3 | 9.3 |
| ALM Gen | 92.2 | 91.3 | 61.5 | -0.44 | 0.09 | 0.20 | 3.6 | 0.8 | 58.7 | 35.2 |
ALM Gen has the highest metastable yield in the table (MSUN 35.2) and ranks second on strict SUN (0.8), behind PLaID++ (1.0).
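The two stability cutoffs used in this benchmark can be sketched as follows. The thresholds (below 0 for strict stability, below 0.1 for metastability) come from the caption above. Treating the barred quantity as the mean hull energy over the three MLIPs is an assumption, as are the function name `stability_flags` and the example values.

```python
from statistics import mean
from typing import Sequence, Tuple

STABLE_MAX = 0.0      # strict stability: hull energy below 0
METASTABLE_MAX = 0.1  # metastability: hull energy below 0.1


def stability_flags(e_hull_per_mlip: Sequence[float]) -> Tuple[bool, bool]:
    """Return (is_stable, is_metastable) for one structure.

    Assumption: the per-MLIP hull energies are averaged before thresholding.
    """
    e_hull = mean(e_hull_per_mlip)
    return e_hull < STABLE_MAX, e_hull < METASTABLE_MAX


# Made-up hull energies from three hypothetical MLIPs, one list per structure.
print(stability_flags([-0.02, -0.01, -0.03]))  # below both cutoffs
print(stability_flags([0.05, 0.07, 0.03]))     # metastable only
print(stability_flags([0.20, 0.30, 0.25]))     # above both cutoffs
```

Any structure that passes the strict cutoff also passes the metastable one, so the metastable flag is the looser of the two criteria.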
Generate structures from a description (inference):
alm-generate generate --alm_checkpoint alm-gen \
--atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
--prompt "A cubic rock-salt oxide of magnesium." --num_samples 8 \
--guidance_factor 0.5 --out_dir gen_out
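For comparing several guidance factors, a small helper can assemble the same command with different values. This sketch only builds and prints the command lines; it reuses the flags shown above and does not call the CLI. The helper name `build_generate_cmd`, the output directory naming, and the guidance values other than 0.5 are illustrative assumptions.

```python
import shlex
from typing import List


def build_generate_cmd(
    prompt: str, guidance: float, out_dir: str, num_samples: int = 8
) -> List[str]:
    """Assemble the alm-generate argument list using the flags shown above."""
    return [
        "alm-generate", "generate",
        "--alm_checkpoint", "alm-gen",
        "--atoms_mapper", "alm-gen/atoms_mapper.pt",
        "--mattergen_pretrained", "mattergen_base",
        "--prompt", prompt,
        "--num_samples", str(num_samples),
        "--guidance_factor", str(guidance),
        "--out_dir", out_dir,
    ]


prompt = "A cubic rock-salt oxide of magnesium."
commands = []
for g in (0.0, 0.5, 1.0):  # illustrative sweep values
    cmd = build_generate_cmd(prompt, g, f"gen_out_g{g}")  # hypothetical dir names
    commands.append(cmd)
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

To actually run a command, each argument list can be passed to `subprocess.run(cmd, check=True)` once the CLI is installed.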
Evaluate de novo generation (S/U/N/SUN/MSUN):
alm-eval-dng --alm_checkpoint alm-gen \
--atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
--guidance_factor 0.5 --num_samples 1000 --out_root out --run_id dng
Links
Paper: arXiv · HuggingFace · Code: GitHub
License
Apache-2.0.
Citation
@article{edamadaka2026atomistic,
title = {Atomistic Language Models Understand and Generate Materials},
author = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
journal = {arXiv preprint arXiv:2606.21395},
year = {2026}
}