---
license: apache-2.0
tags:
  - materials
  - diffusion
  - crystal-generation
  - mattergen
---
# ALM Gen · de-novo crystal generation

**ALM Gen** turns a natural-language description into a novel crystal by steering the
**`mattergen_base`** diffusion decoder with classifier-free guidance. The K=8 `[atoms_i]`
soft tokens feed a **consumer-only** bridge (a per-token projection whose output is read
by the cross-attention consumer inside the decoder); there is no learnable-query producer.
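
The guidance step can be illustrated with a minimal sketch. It assumes the common interpolation convention in which `g = 0` recovers the unconditional (base) prediction and `g = 1` the fully text-conditioned one; the function name `guided_score` and the toy vectors are hypothetical, and the exact rule used inside the decoder may differ.

```python
# Illustrative classifier-free guidance blend (not the ALM Gen source code).
def guided_score(uncond, cond, g):
    """Blend unconditional and text-conditioned score estimates.

    Assumed convention: g = 0 returns the unconditional estimate,
    g = 1 returns the conditional one.
    """
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, -2.0]  # hypothetical decoder output without the prompt
cond = [1.0, 1.0, 0.0]     # hypothetical decoder output with the prompt

print(guided_score(uncond, cond, 0.0))  # same values as uncond
print(guided_score(uncond, cond, 0.5))  # halfway between uncond and cond
```

Under this convention, the `g=0` and `g=0.5` settings reported below correspond to the unguided base decoder and a half-strength pull toward the prompt.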

Contents: `atoms_mapper.pt` (consumer-only bridge, optimizer stripped) + `lora_adapter/`
(r=8 LLM bridge LoRA) + `projector_and_state.pt`. The base decoder is fetched separately
(`mattergen_base`, via `external/setup_mattergen.sh`); pass this repo subdir as
`--alm_checkpoint` (the r8 adapter applies directly).

*De novo* generation against the MP-20 hull. A structure counts as stable (S) when
E_hull ≤ 0.016 eV/atom; structures are pre-relaxed and N=10×1000 (95% CIs in the paper).
SUN is the share of samples that are stable, unique, and novel:

| Method | E_hull (eV/atom)↓ | U (%)↑ | V_struct (%)↑ | V_chem (%)↑ | **SUN** (%)↑ |
|---|--:|--:|--:|--:|--:|
| CrystalTextLLM | 0.61 | 47.40 | 90.01 | 91.59 | 0.38 |
| PLaID++ Wyckoff | 0.57 | 40.70 | 89.06 | 91.59 | 0.50 |
| CrysReas-Base (SFT) | 0.58 | 35.25 | 84.03 | 90.36 | 0.57 |
| CrysReas-Thinking | 0.52 | 38.64 | 91.29 | 91.72 | 0.59 |
| CrysReas-RL | 0.53 | 82.49 | 89.85 | 91.10 | 1.23 |
| CrysReas | 0.45 | 87.23 | 94.92 | **91.78** | 1.70 |
| MatterGen (Base, g=0) | **0.079** | 93.50 | **100.00** | 86.50 | 5.53 |
| **ALM Gen** (g=0.5) | 0.085 | **98.90** | **100.00** | 83.20 | **7.80** |
| ALM Gen + FK-stoich | 0.086 | 73.80 | **100.00** | 84.50 | 5.21 |

Steering the base decoder with language at g=0.5 *raises* SUN over the g=0 MatterGen base
(5.53 → **7.80**), the highest SUN in the table and state of the art on this protocol.
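
To make the SUN column concrete, here is a minimal sketch of how such a rate could be computed from per-sample results, using the 0.016 eV/atom stability cutoff stated above. The `Sample` record, its field names, and the toy data are hypothetical. Counting only the first copy of a duplicated structure as unique is an assumption, and real pipelines compare structures with a structure matcher rather than a string key.

```python
from dataclasses import dataclass

STABILITY_THRESHOLD = 0.016  # eV/atom, the stability cutoff used in this protocol


@dataclass(frozen=True)
class Sample:
    fingerprint: str    # hypothetical structure key standing in for a structure matcher
    e_hull: float       # energy above the hull, eV/atom
    in_reference: bool  # True if the structure already exists in the reference set


def sun_rate(samples, threshold=STABILITY_THRESHOLD):
    """Percentage of samples that are stable, unique, and novel at once."""
    seen = set()
    hits = 0
    for s in samples:
        unique = s.fingerprint not in seen  # assumption: first copy counts, repeats do not
        seen.add(s.fingerprint)
        stable = s.e_hull <= threshold
        novel = not s.in_reference
        if stable and unique and novel:
            hits += 1
    return 100.0 * hits / len(samples)


# Toy data, for illustration only.
samples = [
    Sample("A", 0.000, False),  # stable, unique, novel
    Sample("A", 0.000, False),  # duplicate of the first sample
    Sample("B", 0.120, False),  # above the stability cutoff
    Sample("C", 0.010, True),   # stable but already known
]
print(sun_rate(samples))  # 25.0
```

Passing a different `threshold` gives the analogous rate for a looser or stricter stability criterion.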

*De novo* generation on LeMat-GenBench (N=2500): strict stability is Ē_hull < 0,
metastability E_hull < 0.1; E_f / E_hull / RMSD scored by 3 MLIPs:

| Model | Valid↑ | Unique↑ | Novel↑ | E_f↓ | Ē_hull↓ | RMSD↓ | Stable↑ | SUN↑ | Meta↑ | **MSUN**↑ |
|---|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| MatterGen | 95.7 | 95.1 | **70.5** | -0.70 | 0.18 | 0.39 | 2.0 | 0.2 | 33.4 | 15.0 |
| PLaID++ | 96.0 | 77.8 | 24.2 | -0.50 | 0.09 | 0.13 | 12.4 | **1.0** | 60.7 | 7.6 |
| WyFormer | 93.4 | 93.0 | 66.4 | -0.43 | 0.50 | 0.81 | 0.5 | 0.1 | 15.7 | 1.9 |
| WyFormer-DFT | 95.2 | 95.0 | 66.4 | -0.67 | 0.27 | 0.42 | 3.7 | 0.4 | 24.8 | 7.8 |
| MCFlow-S | 97.2 | **96.3** | 52.2 | -0.85 | 0.10 | 0.16 | 11.7 | 0.7 | 49.5 | 18.9 |
| MCFlow-B | 97.7 | 95.5 | 25.4 | -0.91 | 0.05 | 0.08 | 17.6 | 0.7 | 64.3 | 11.9 |
| MCFlow-L | **98.6** | 95.2 | 18.6 | **-0.93** | **0.04** | **0.06** | **18.8** | 0.5 | **68.3** | 9.3 |
| **ALM Gen** | 92.2 | 91.3 | 61.5 | -0.44 | 0.09 | 0.20 | 3.6 | 0.8 | 58.7 | **35.2** |

ALM Gen tops the field on metastable yield (**MSUN 35.2**) and is second only to PLaID++
on strict SUN (0.8 vs. 1.0).

**Generate structures from a description (inference):**
```bash
alm-generate generate --alm_checkpoint alm-gen \
    --atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
    --prompt "A cubic rock-salt oxide of magnesium." --num_samples 8 \
    --guidance_factor 0.5 --out_dir gen_out
```
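
To run several descriptions in one go, the command above can be assembled from Python. This is a sketch that reuses only the flags shown in the example; the helper name `build_generate_cmd`, the second prompt, and the per-prompt output directories are hypothetical choices, not part of the released tooling.

```python
import shlex


def build_generate_cmd(prompt, out_dir, num_samples=8, guidance_factor=0.5):
    """Return the argv list for one alm-generate call, using only the flags shown above."""
    return [
        "alm-generate", "generate",
        "--alm_checkpoint", "alm-gen",
        "--atoms_mapper", "alm-gen/atoms_mapper.pt",
        "--mattergen_pretrained", "mattergen_base",
        "--prompt", prompt,
        "--num_samples", str(num_samples),
        "--guidance_factor", str(guidance_factor),
        "--out_dir", out_dir,
    ]


prompts = [
    "A cubic rock-salt oxide of magnesium.",
    "A hexagonal wurtzite nitride of gallium.",  # hypothetical extra prompt
]

for i, prompt in enumerate(prompts):
    cmd = build_generate_cmd(prompt, f"gen_out/prompt_{i}")
    # Print the shell-quoted command; to execute it, pass the list to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems with prompts that contain spaces or punctuation.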

**Evaluate (de-novo S/U/N/SUN/MSUN):**
```bash
alm-eval-dng --alm_checkpoint alm-gen \
    --atoms_mapper alm-gen/atoms_mapper.pt --mattergen_pretrained mattergen_base \
    --guidance_factor 0.5 --num_samples 1000 --out_root out --run_id dng
```

## Links
Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License
Apache-2.0.

## Citation
```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```