metadata
license: apache-2.0
tags:
  - materials
  - diffusion
  - crystal-generation
  - mattergen

ALM Gen · de-novo crystal generation

ALM Gen turns a natural-language description into a novel crystal by steering the mattergen_base diffusion decoder with classifier-free guidance. The K=8 [atoms_i] soft tokens feed a consumer-only bridge (a per-token projection whose output is read by the cross-attention consumer inside the decoder); there is no learnable-query producer.
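For readers new to classifier-free guidance, the sketch below shows one common way a guidance factor g can blend an unconditional and a text-conditioned denoiser prediction. The function name `cfg_combine` and the exact blending formula are assumptions for illustration; the description above does not state which parameterization the mattergen_base decoder uses. In this form, g=0 returns the unconditional prediction and g=1 the conditional one.

```python
from typing import List, Sequence


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], g: float) -> List[float]:
    """Blend two denoiser predictions as uncond + g * (cond - uncond).

    Assumed parameterization (hypothetical): g = 0 gives the unconditional
    prediction, g = 1 the conditional one. The real decoder may differ.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


# Toy vectors standing in for score predictions (not real model output).
blended = cfg_combine([0.0, 2.0], [1.0, 4.0], g=0.5)
print(blended)  # [0.5, 3.0]
```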

Contents: atoms_mapper.pt (consumer-only bridge, optimizer stripped) + lora_adapter/ (r=8 LLM bridge LoRA) + projector_and_state.pt. The base decoder is fetched separately (mattergen_base, via external/setup_mattergen.sh); pass this repo subdir as --alm_checkpoint (the r8 adapter applies directly).
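Before launching a run, it can help to confirm that a downloaded checkpoint directory contains the pieces listed above. The following is a minimal sketch using only the standard library; the helper name `missing_checkpoint_parts` is hypothetical, and it checks only for presence, not for the contents of each file.

```python
from pathlib import Path
from typing import List

# Entry names taken from the contents list above.
EXPECTED_FILES = ("atoms_mapper.pt", "projector_and_state.pt")
EXPECTED_DIRS = ("lora_adapter",)


def missing_checkpoint_parts(checkpoint_dir: str) -> List[str]:
    """Return the expected entries that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # "alm-gen" is the checkpoint path assumed here; adjust to your local copy.
    print(missing_checkpoint_parts("alm-gen"))
```

An empty list means all three entries were found; the base decoder is not checked because it is fetched separately.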

De novo generation against the MP-20 hull: stability S is E_hull ≤ 0.016 eV/atom, structures pre-relaxed, N=10×1000 (95% CIs in the paper):

| Method | E_hull (eV)↓ | U (%)↑ | V_struct (%)↑ | V_chem (%)↑ | SUN (%)↑ |
|---|---|---|---|---|---|
| CrystalTextLLM | 0.61 | 47.40 | 90.01 | 91.59 | 0.38 |
| PLaID++ Wyckoff | 0.57 | 40.70 | 89.06 | 91.59 | 0.50 |
| CrysReas-Base (SFT) | 0.58 | 35.25 | 84.03 | 90.36 | 0.57 |
| CrysReas-Thinking | 0.52 | 38.64 | 91.29 | 91.72 | 0.59 |
| CrysReas-RL | 0.53 | 82.49 | 89.85 | 91.10 | 1.23 |
| CrysReas | 0.45 | 87.23 | 94.92 | 91.78 | 1.70 |
| MatterGen (Base, g=0) | 0.079 | 93.50 | 100.00 | 86.50 | 5.53 |
| ALM Gen (g=0.5) | 0.085 | 98.90 | 100.00 | 83.20 | 7.80 |
| ALM Gen + FK-stoich | 0.086 | 73.80 | 100.00 | 84.50 | 5.21 |

Steering the base decoder with language at g=0.5 raises SUN from 5.53% (MatterGen base, g=0) to 7.80%, the highest SUN among the methods listed and state of the art on this protocol.
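To make the SUN column concrete, here is a minimal sketch of how such a rate could be computed from per-structure flags, assuming a structure counts toward SUN only when it is stable (E_hull at or below the 0.016 eV/atom threshold quoted above), unique, and novel at the same time. The `Sample` record and `sun_rate` function are hypothetical names, and the uniqueness and novelty flags are taken as given rather than derived by structure matching.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    e_hull: float  # energy above the hull, eV/atom
    unique: bool   # not a duplicate of another generated structure
    novel: bool    # not present in the reference set


def sun_rate(samples: Iterable[Sample], threshold: float = 0.016) -> float:
    """Percentage of samples that are stable, unique and novel together."""
    items = list(samples)
    if not items:
        return 0.0
    hits = sum(1 for s in items if s.e_hull <= threshold and s.unique and s.novel)
    return 100.0 * hits / len(items)


# Toy inputs for illustration only (not results from the model).
toy = [
    Sample(0.010, True, True),   # counts
    Sample(0.200, True, True),   # unstable
    Sample(0.000, True, False),  # stable but not novel
    Sample(0.016, True, True),   # exactly at the threshold, counts
]
print(sun_rate(toy))  # 50.0
```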

De novo generation on LeMat-GenBench (N=2500): strict stability is Ē_hull < 0, metastability E_hull < 0.1; E_f / E_hull / RMSD scored by 3 MLIPs:

| Model | Valid↑ | Unique↑ | Novel↑ | E_f↓ | Ē_hull↓ | RMSD↓ | Stable↑ | SUN↑ | Meta↑ | MSUN↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| MatterGen | 95.7 | 95.1 | 70.5 | -0.70 | 0.18 | 0.39 | 2.0 | 0.2 | 33.4 | 15.0 |
| PLaID++ | 96.0 | 77.8 | 24.2 | -0.50 | 0.09 | 0.13 | 12.4 | 1.0 | 60.7 | 7.6 |
| WyFormer | 93.4 | 93.0 | 66.4 | -0.43 | 0.50 | 0.81 | 0.5 | 0.1 | 15.7 | 1.9 |
| WyFormer-DFT | 95.2 | 95.0 | 66.4 | -0.67 | 0.27 | 0.42 | 3.7 | 0.4 | 24.8 | 7.8 |
| MCFlow-S | 97.2 | 96.3 | 52.2 | -0.85 | 0.10 | 0.16 | 11.7 | 0.7 | 49.5 | 18.9 |
| MCFlow-B | 97.7 | 95.5 | 25.4 | -0.91 | 0.05 | 0.08 | 17.6 | 0.7 | 64.3 | 11.9 |
| MCFlow-L | 98.6 | 95.2 | 18.6 | -0.93 | 0.04 | 0.06 | 18.8 | 0.5 | 68.3 | 9.3 |
| ALM Gen | 92.2 | 91.3 | 61.5 | -0.44 | 0.09 | 0.20 | 3.6 | 0.8 | 58.7 | 35.2 |

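ALM Gen has the highest metastable yield in the table (MSUN 35.2, against 18.9 for the next-best model, MCFlow-S). On strict SUN it scores 0.8, second in this table behind PLaID++ (1.0), and its strict stability rate (3.6) trails the de novo-specialist flow models.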
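The two stability cut-offs used on this benchmark can be illustrated with a short sketch. It assumes that Ē_hull denotes the mean of the energies above hull predicted by the three MLIPs, which the text does not spell out, and that the same mean is compared against both thresholds. The function name `stability_flags` is hypothetical, and the inputs are toy numbers.

```python
from statistics import mean
from typing import Sequence, Tuple


def stability_flags(e_hull_per_mlip: Sequence[float]) -> Tuple[bool, bool]:
    """Return (strictly_stable, metastable) for one structure.

    Assumption: the per-MLIP energies above hull are averaged first,
    then compared with the thresholds quoted above (< 0 and < 0.1).
    """
    e_bar = mean(e_hull_per_mlip)
    return e_bar < 0.0, e_bar < 0.1


# Toy predictions from three hypothetical MLIPs.
print(stability_flags([-0.02, 0.01, -0.03]))  # (True, True)
print(stability_flags([0.05, 0.06, 0.07]))    # (False, True)
print(stability_flags([0.2, 0.3, 0.1]))       # (False, False)
```

Under these thresholds every strictly stable structure is also metastable, which matches the Meta column being larger than the Stable column for every model in the table.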

Generate structures from a description (inference):

alm-generate generate --alm_checkpoint alm-gen \
    --atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
    --prompt "A cubic rock-salt oxide of magnesium." --num_samples 8 \
    --guidance_factor 0.5 --out_dir gen_out
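When generating for many prompts, it may be convenient to assemble the same command from Python. The sketch below only builds the argument list, using the flags shown in the command above and no others; the helper name `build_generate_cmd` and its defaults are assumptions for illustration.

```python
import shlex
from typing import List


def build_generate_cmd(
    prompt: str,
    num_samples: int = 8,
    guidance_factor: float = 0.5,
    out_dir: str = "gen_out",
    checkpoint: str = "alm-gen",
    base: str = "mattergen_base",
) -> List[str]:
    """Build the alm-generate argument list; does not run anything."""
    return [
        "alm-generate", "generate",
        "--alm_checkpoint", checkpoint,
        "--atoms_mapper", f"{checkpoint}/atoms_mapper.pt",
        "--mattergen_pretrained", base,
        "--prompt", prompt,
        "--num_samples", str(num_samples),
        "--guidance_factor", str(guidance_factor),
        "--out_dir", out_dir,
    ]


cmd = build_generate_cmd("A cubic rock-salt oxide of magnesium.")
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

With the CLI installed, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting problems with prompts that contain spaces or quotes.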

Evaluate (de-novo S/U/N/SUN/MSUN):

alm-eval-dng --alm_checkpoint alm-gen \
    --atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
    --guidance_factor 0.5 --num_samples 1000 --out_root out --run_id dng

Links

Paper: arXiv · HuggingFace · Code: GitHub

License

Apache-2.0.

Citation

@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}