Upload folder using huggingface_hub
- alm-edit/README.md +95 -0
- alm-edit/atoms_mapper.pt +3 -0
- alm-edit/projector_and_state.pt +3 -0
alm-edit/README.md
ADDED
@@ -0,0 +1,95 @@
---
license: apache-2.0
tags:
- materials
- diffusion
- crystal-structure-prediction
- editing
- mattergen
---
# ALM Edit · crystal-structure prediction + text-conditioned editing

**ALM Edit** conditions a from-scratch CSP-mode MatterGen decoder (`csp_backbone`) on an
*input structure + text instruction* via a **producer-consumer** bridge: K=8 `[atoms_i]`
soft tokens plus prompt context feed a learnable-query producer (M=16), whose tokens the
cross-attention consumer reads inside the decoder. It performs crystal-structure
prediction and the **ALM Bench** editing tasks.
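
The data flow through the producer-consumer bridge can be sketched as a toy, dependency-free example. This is an illustration only, not the released implementation: it assumes single-head dot-product attention with no learned projections, and the width `DIM`, prompt length `N_PROMPT`, site count `N_SITES`, and all function names are hypothetical. Only K=8 and M=16 come from the description above.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Single-head dot-product attention: every query reads all context tokens."""
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        w = softmax(scores)
        outputs.append([sum(wi * c[j] for wi, c in zip(w, context)) for j in range(dim)])
        weights.append(w)
    return outputs, weights


def random_tokens(rng, count, dim):
    """Random vectors standing in for trained embeddings or parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 4          # toy embedding width (hypothetical)
K_ATOMS = 8      # K=8 [atoms_i] soft tokens
N_PROMPT = 5     # prompt context length (hypothetical)
M_QUERIES = 16   # M=16 learnable producer queries
N_SITES = 3      # decoder atom sites (hypothetical)

atoms_tokens = random_tokens(rng, K_ATOMS, DIM)
prompt_tokens = random_tokens(rng, N_PROMPT, DIM)
learned_queries = random_tokens(rng, M_QUERIES, DIM)

# Producer: the M queries attend over the atom tokens plus the prompt context.
bridge_tokens, producer_w = cross_attend(learned_queries, atoms_tokens + prompt_tokens)

# Consumer: decoder states cross-attend to the M bridge tokens.
decoder_states = random_tokens(rng, N_SITES, DIM)
conditioned, consumer_w = cross_attend(decoder_states, bridge_tokens)

print(len(bridge_tokens), len(producer_w[0]), len(conditioned), len(consumer_w[0]))
# prints: 16 13 3 16
```

Whatever the input length, the decoder always sees a fixed set of M conditioning tokens, which is the usual motivation for a learnable-query design.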

This is a **full-finetuning** run: it bundles a fully fine-tuned Qwen3-8B
(`llm_full_ft/qwen3_state_dict.pt`, ~16 GB), which the loader auto-detects and uses in
place of any LoRA. Contents: `atoms_mapper.pt` (producer-consumer bridge) + `llm_full_ft/`
+ `projector_and_state.pt` + the `csp_backbone/` decoder dir (config + last.ckpt).

Crystal-structure prediction: match rate MR (%, ↑) and RMSE (Å, ↓) against the MP-20 and
MPTS-52 test sets at K=1 and best-of-K=20 (95% CIs in the paper):

| Model | MP-20 MR@1 | MP-20 RMSE@1 | MP-20 MR@20 | MP-20 RMSE@20 | MPTS-52 MR@1 | MPTS-52 RMSE@1 | MPTS-52 MR@20 | MPTS-52 RMSE@20 |
|---|--:|--:|--:|--:|--:|--:|--:|--:|
| CDVAE | 33.90 | 0.1045 | 66.95 | 0.1026 | 5.34 | 0.2106 | 20.79 | 0.2085 |
| DiffCSP | 51.49 | 0.0631 | 77.93 | 0.0492 | 12.19 | 0.1786 | 34.02 | 0.1749 |
| FlowMM | 61.39 | 0.0566 | n/a | n/a | 17.54 | 0.1726 | n/a | n/a |
| CrystaLLM-large | 58.70 | 0.0408 | 73.97 | 0.0349 | 19.21 | 0.1110 | 33.75 | 0.1059 |
| CrystalFlow | 62.02 | 0.0710 | 78.34 | 0.0577 | 21.00 | 0.1613 | 37.81 | 0.1584 |
| OMatG | 63.75 | 0.0720 | n/a | n/a | 25.15 | 0.1931 | n/a | n/a |
| MCFlow-L | **64.08** | 0.0561 | 76.08 | 0.0383 | **27.16** | 0.1401 | 41.45 | 0.1296 |
| **ALM Edit** | 45.6 | **0.021** | **83.2** | **0.034** | 22.7 | **0.022** | **45.7** | **0.038** |
| **ALM Gen** + T2C-FK | 22.3 | 0.025 | 41.0 | 0.012 | 6.0 | 0.040 | 10.0 | 0.011 |

**ALM Edit** sets state-of-the-art RMSE and best-of-K=20 match rate on both benchmarks
relative to the prior baselines above (it learns a valid polymorph distribution). It sees
composition + space group at train time, but only composition at inference.
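
To make the K=1 versus best-of-K columns concrete, the sketch below aggregates per-sample match results into a match rate and a mean RMSE. It assumes a common CSP convention (a reference counts as matched if any of its first K samples matches, its RMSE is the smallest among those matches, and the mean RMSE is taken over matched references only); the paper's exact protocol may differ. The function name `csp_metrics` and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def csp_metrics(results: Sequence[Sequence[Optional[float]]], k: int) -> Tuple[float, float]:
    """Return (match rate in %, mean RMSE) using the first k samples per reference.

    results[i][j] is the RMSE of sample j against reference i, or None when the
    structure matcher found no match.
    """
    best_per_reference = []
    for samples in results:
        matched = [r for r in samples[:k] if r is not None]
        best_per_reference.append(min(matched) if matched else None)

    hits = [b for b in best_per_reference if b is not None]
    match_rate = 100.0 * len(hits) / len(results)
    # Unmatched references contribute to the match rate but not to the RMSE.
    mean_rmse = sum(hits) / len(hits) if hits else float("nan")
    return match_rate, mean_rmse


# Toy data: 4 references, 3 samples each (made up for illustration).
toy = [
    [0.02, None, 0.01],
    [None, None, 0.05],
    [None, None, None],
    [0.04, 0.03, None],
]

mr1, rmse1 = csp_metrics(toy, k=1)
mr3, rmse3 = csp_metrics(toy, k=3)
print(f"MR@1={mr1:.1f}%  RMSE@1={rmse1:.3f}")
print(f"MR@3={mr3:.1f}%  RMSE@3={rmse3:.3f}")
```

Under this convention the RMSE is averaged over matched references only, so it is best read together with the match rate in the same row.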

**ALM Bench** directional editing per property (E_f, ρ, V) and direction (↑/↓), plus
polymorph/doping/strain (N=7×1000; every metric scores invalid generations, trivial
lattice rescalings, and unphysical relabelings as failures). Frontier LLMs were prompted
to read/write CIFs:

| Model | E_f↑ | E_f↓ | ρ↑ | ρ↓ | V↑ | V↓ | Polymorph | Doping | Strain |
|---|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| **ALM Edit** | **0.613** | **0.624** | **0.353** | **0.367** | **0.451** | **0.355** | **0.224** | **0.879** | **0.151** |
| GPT-4o | 0.505 | 0.469 | 0.024 | 0.127 | 0.081 | 0.018 | 0.040 | 0.007 | 0.000 |
| GPT-4.1 | 0.465 | 0.496 | 0.007 | 0.239 | 0.276 | 0.040 | 0.083 | 0.003 | 0.000 |
| GPT-5.2 | 0.437 | 0.414 | 0.058 | 0.244 | 0.006 | 0.032 | 0.118 | 0.002 | 0.000 |
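
The rule that invalid generations, trivial rescalings, and unphysical relabelings count as failures can be illustrated with a small scoring sketch. Everything here is a hypothetical stand-in: the `EditResult` fields, the function name, and the simple "property moved in the requested direction" success criterion are assumptions, not the benchmark's actual definitions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EditResult:
    """One edit attempt (hypothetical record layout)."""
    valid: bool                  # structure parsed and passed validity checks
    trivial_rescaling: bool      # edit only rescaled the lattice
    unphysical_relabeling: bool  # edit only swapped element labels unphysically
    before: float                # property value of the input structure
    after: float                 # property value of the edited structure


def directional_success_rate(results: Sequence[EditResult], direction: str) -> float:
    """Fraction of ALL attempts that pass the filters and move the property as asked."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    wins = 0
    for r in results:
        if not r.valid or r.trivial_rescaling or r.unphysical_relabeling:
            continue  # a failure: it stays in the denominator but earns no credit
        moved = r.after > r.before if direction == "up" else r.after < r.before
        wins += int(moved)
    return wins / len(results)


toy = [
    EditResult(True, False, False, before=1.0, after=1.4),   # success for "up"
    EditResult(False, False, False, before=1.0, after=2.0),  # invalid, so a failure
    EditResult(True, True, False, before=1.0, after=1.5),    # trivial rescaling, so a failure
    EditResult(True, False, False, before=1.0, after=0.7),   # moved the wrong way
]
print(directional_success_rate(toy, "up"))    # prints: 0.25
print(directional_success_rate(toy, "down"))  # prints: 0.25
```

Keeping failed generations in the denominator means a model cannot raise its score by producing fewer parseable outputs.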

Text-conditioned generation (Application = LLM-judged fit; Describe/OOD = composition &
structure consistency, N=7×1000):

| Model | Application | Describe (Comp.) | Describe (Struct.) | OOD (Comp.) | OOD (Struct.) |
|---|--:|--:|--:|--:|--:|
| **ALM Edit** | **0.423** | **0.730** | **0.412** | **0.474** | **0.231** |
| GPT-4o | 0.131 | 0.279 | 0.121 | 0.130 | 0.025 |
| GPT-4.1 | 0.224 | 0.254 | 0.090 | 0.168 | 0.035 |
| GPT-5.2 | 0.252 | 0.356 | 0.162 | 0.263 | 0.075 |

**Edit / generate a structure (inference):**
```bash
# composition/description -> structure, using the CSP-mode backbone
alm-generate generate --alm_checkpoint alm-edit \
  --atoms_mapper alm-edit/atoms_mapper.pt --mattergen_model_path alm-edit/csp_backbone \
  --prompt "An orthorhombic perovskite of calcium and titanium." --num_samples 8 --out_dir gen_out
```

**Evaluate (CSP MR@20 / RMSE, and the ALM Bench editing tasks):**
```bash
# crystal-structure prediction metrics on the bundled backbone
alm-eval-csp --ckpt_dir alm-edit/csp_backbone \
  --guidance_factor 0.5 --out_dir out/csp
# ALM Bench editing tasks
alm-eval-almbench --alm_checkpoint alm-edit \
  --atoms_mapper alm-edit/atoms_mapper.pt --bridge_lora_dir none \
  --mattergen_model_path alm-edit/csp_backbone --guidance_factor 0.5
```

## Links
Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License
Apache-2.0.

## Citation
```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```
alm-edit/atoms_mapper.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6181002baae51c8e6ebfc122603aeb15a8875a1bb321889ecb7d76617125cacb
size 111220905
alm-edit/projector_and_state.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d54c727d8dd4a68bf3cd0ccd6156a6874f0b4d1e4c6910256ffc078b57d2102a
size 71338453
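
The two `.pt` entries above are Git LFS pointer files, not the weights themselves: each records the SHA-256 digest and byte size of the real object. As a minimal sketch, a downloaded checkpoint could be checked against its pointer as follows. The helper names are hypothetical, and the example hashes an in-memory payload for brevity; a multi-megabyte checkpoint would normally be hashed in chunks.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True when the payload has the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer text copied from projector_and_state.pt above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d54c727d8dd4a68bf3cd0ccd6156a6874f0b4d1e4c6910256ffc078b57d2102a\n"
    "size 71338453\n"
)
print(pointer["algo"], pointer["size"])  # prints: sha256 71338453

# Self-check with a tiny made-up payload and a pointer built to match it.
payload = b"toy checkpoint bytes"
toy_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
print(matches_pointer(payload, toy_pointer))         # prints: True
print(matches_pointer(payload + b"!", toy_pointer))  # prints: False
```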