---
license: apache-2.0
tags:
- materials
- diffusion
- crystal-generation
- mattergen
---

# ALM Gen · de-novo crystal generation

**ALM Gen** turns a natural-language description into a novel crystal by steering the **`mattergen_base`** diffusion decoder with classifier-free guidance. The K=8 `[atoms_i]` soft tokens feed a **consumer-only** bridge (a per-token projection whose output is read by the cross-attention consumer inside the decoder); there is no learnable-query producer.

Contents: `atoms_mapper.pt` (consumer-only bridge, optimizer stripped) + `lora_adapter/` (r=8 LLM bridge LoRA) + `projector_and_state.pt`. The base decoder is fetched separately (`mattergen_base`, via `external/setup_mattergen.sh`); pass this repo subdir as `--alm_checkpoint` (the r8 adapter applies directly).

*De novo* generation against the MP-20 hull: stability S is E_hull ≤ 0.016 eV/atom, structures pre-relaxed, N=10×1000 (95% CIs in the paper):

| Method | E_hull (eV)↓ | U (%)↑ | V_struct (%)↑ | V_chem (%)↑ | **SUN** (%)↑ |
|---|--:|--:|--:|--:|--:|
| CrystalTextLLM | 0.61 | 47.40 | 90.01 | 91.59 | 0.38 |
| PLAID++ Wyckoff | 0.57 | 40.70 | 89.06 | 91.59 | 0.50 |
| CrysReas-Base (SFT) | 0.58 | 35.25 | 84.03 | 90.36 | 0.57 |
| CrysReas-Thinking | 0.52 | 38.64 | 91.29 | 91.72 | 0.59 |
| CrysReas-RL | 0.53 | 82.49 | 89.85 | 91.10 | 1.23 |
| CrysReas | 0.45 | 87.23 | 94.92 | **91.78** | 1.70 |
| MatterGen (Base, g=0) | **0.079** | 93.50 | **100.00** | 86.50 | 5.53 |
| **ALM Gen** (g=0.5) | 0.085 | **98.90** | **100.00** | 83.20 | **7.80** |
| ALM Gen + FK-stoich | 0.086 | 73.80 | **100.00** | 84.50 | 5.21 |

Steering the base decoder with language at g=0.5 *improves* SUN over the g=0 MatterGen base (5.53 → **7.80**), SoTA on this protocol.

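
For readers unfamiliar with the SUN column, the sketch below shows one way such a rate could be tallied: a sample counts only if it is stable, unique, and novel at the same time, which is why SUN cannot be obtained by multiplying the per-column percentages. It reuses the E_hull ≤ 0.016 eV/atom stability threshold quoted above. The `Sample` class, its `is_unique` / `is_novel` flags, and `sun_rate` are hypothetical names introduced for illustration; how uniqueness and novelty are actually decided (for example by structure matching) is not shown here and is not taken from the ALM code.

```python
from dataclasses import dataclass

# Stability threshold quoted in this README (eV/atom above the MP-20 hull).
E_HULL_STABLE_MAX = 0.016


@dataclass(frozen=True)
class Sample:
    """Hypothetical per-structure record; field names are illustrative only."""
    e_hull: float    # energy above hull, eV/atom
    is_unique: bool  # assumed: no duplicate within the generated batch
    is_novel: bool   # assumed: no match in the reference dataset


def sun_rate(samples):
    """Return the percentage of samples that are stable AND unique AND novel."""
    if not samples:
        return 0.0
    hits = sum(
        1
        for s in samples
        if s.e_hull <= E_HULL_STABLE_MAX and s.is_unique and s.is_novel
    )
    return 100.0 * hits / len(samples)


# Toy batch: two of the four structures satisfy all three criteria.
samples = [
    Sample(e_hull=0.010, is_unique=True, is_novel=True),   # counts
    Sample(e_hull=0.200, is_unique=True, is_novel=True),   # unstable
    Sample(e_hull=0.005, is_unique=False, is_novel=True),  # duplicate
    Sample(e_hull=0.016, is_unique=True, is_novel=True),   # counts (at threshold)
]
print(sun_rate(samples))
```

Because the three criteria are combined per structure, a method with very high uniqueness can still post a low SUN if few of its unique structures are also stable.
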
*De novo* generation on LeMat-GenBench (N=2500): strict stability is Ē_hull < 0, metastability E_hull < 0.1; E_f / E_hull / RMSD scored by 3 MLIPs:

| Model | Valid↑ | Unique↑ | Novel↑ | E_f↓ | Ē_hull↓ | RMSD↓ | Stable↑ | SUN↑ | Meta↑ | **MSUN**↑ |
|---|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| MatterGen | 95.7 | 95.1 | **70.5** | -0.70 | 0.18 | 0.39 | 2.0 | 0.2 | 33.4 | 15.0 |
| PLaID++ | 96.0 | 77.8 | 24.2 | -0.50 | 0.09 | 0.13 | 12.4 | **1.0** | 60.7 | 7.6 |
| WyFormer | 93.4 | 93.0 | 66.4 | -0.43 | 0.50 | 0.81 | 0.5 | 0.1 | 15.7 | 1.9 |
| WyFormer-DFT | 95.2 | 95.0 | 66.4 | -0.67 | 0.27 | 0.42 | 3.7 | 0.4 | 24.8 | 7.8 |
| MCFlow-S | 97.2 | **96.3** | 52.2 | -0.85 | 0.10 | 0.16 | 11.7 | 0.7 | 49.5 | 18.9 |
| MCFlow-B | 97.7 | 95.5 | 25.4 | -0.91 | 0.05 | 0.08 | 17.6 | 0.7 | 64.3 | 11.9 |
| MCFlow-L | **98.6** | 95.2 | 18.6 | **-0.93** | **0.04** | **0.06** | **18.8** | 0.5 | **68.3** | 9.3 |
| **ALM Gen** | 92.2 | 91.3 | 61.5 | -0.44 | 0.09 | 0.20 | 3.6 | 0.8 | 58.7 | **35.2** |

Tops the field on metastable yield (**MSUN 35.2**); second to the *de novo*-specialist flow models at strict SUN.

**Generate structures from a description (inference):**

```bash
alm-generate generate --alm_checkpoint alm-gen \
  --atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
  --prompt "A cubic rock-salt oxide of magnesium." --num_samples 8 \
  --guidance_factor 0.5 --out_dir gen_out
```

**Evaluate (de-novo S/U/N/SUN/MSUN):**

```bash
alm-eval-dng --alm_checkpoint alm-gen \
  --atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
  --guidance_factor 0.5 --num_samples 1000 --out_root out --run_id dng
```

## Links

Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License

Apache-2.0.

## Citation

```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```