sathyae committed · Commit 5619883 · verified · 1 Parent(s): b04060a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ license: apache-2.0
+ tags:
+ - materials
+ - qwen3
+ - diffusion
+ - crystal-structure-prediction
+ - crystal-generation
+ ---
+ # Atomistic Language Models
+
+ A single Qwen3-8B backbone that understands, generates, and edits crystals by reading
+ atoms as **soft tokens** from a machine-learning interatomic potential and steering a
+ MatterGen diffusion decoder with classifier-free guidance. One repo, one subdir per model:
+
+ | subdir | model | what |
+ |---|---|---|
+ | `stage1-projector/` | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
+ | `alm-core/` | **ALM Core** | understanding: Qwen3-8B + LoRA (r128) + projector |
+ | `alm-gen/` | **ALM Gen** | de-novo generation: consumer-only bridge (r8) over `mattergen_base` |
+ | `alm-edit/` | **ALM Edit** | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (`llm_full_ft/`) + `csp_backbone/` decoder |
+
+ Headlines (paper, https://arxiv.org/abs/2606.21395). **ALM Edit**: CSP MR@20 **83.2%** / RMSE@1 **0.021 Å**
+ (MP-20, SoTA), and SoTA across the **ALM Bench** editing tasks. **ALM Gen**: de-novo SUN
+ **7.80%** on the MP-20 hull (above the g=0 MatterGen base) and metastable **MSUN 35.2%** on
+ LeMat-GenBench. See each subdir's card for full tables.
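The "g=0 MatterGen base" mentioned above refers to the guidance scale used in classifier-free guidance. The sketch below assumes one common parameterization, in which the guided prediction is the unconditional output plus `g` times the conditional offset, so that `g = 0` recovers the unconditioned base model. The paper may use a different form; the function name `cfg_combine` and the toy vectors are illustrative and not taken from the ALM code.

```python
def cfg_combine(uncond, cond, g):
    """Blend unconditional and conditional denoiser outputs.

    Assumed parameterization (not confirmed by the README):
        guided = uncond + g * (cond - uncond)
    """
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


# Toy denoiser outputs; real outputs would be per-atom tensors.
uncond = [0.0, 1.0, -2.0]
cond = [1.0, 1.0, 0.0]

base = cfg_combine(uncond, cond, 0.0)       # g = 0: unconditional base model
full = cfg_combine(uncond, cond, 1.0)       # g = 1: purely conditional prediction
amplified = cfg_combine(uncond, cond, 2.0)  # g > 1: pushes past the conditional prediction
```

Under this assumption, any improvement over the `g = 0` row isolates the effect of the conditioning signal rather than of the decoder itself.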
+
+ Download into `./checkpoints/` with `hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints`.
+ The **ALM Bench** dataset lives in `LearningMatter/ALM-Bench`. `mattergen_base` (ALM Gen's backbone) is
+ fetched from `microsoft/mattergen`; `alm-edit/csp_backbone/` (the CSP decoder) ships here.
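The `hf download` command above pulls the whole repo. As a hedged alternative for fetching a single model, the sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns` to restrict the download to one subdir. The repo id and subdir names come from the README; the helper names `patterns_for` and `download_subdir` are hypothetical, and the assumption that each subdir is usable on its own is not confirmed by the README.

```python
REPO_ID = "LearningMatter/AtomisticLanguageModels"  # repo id from the README
SUBDIRS = ("stage1-projector", "alm-core", "alm-gen", "alm-edit")  # from the table


def patterns_for(subdir):
    """Return glob patterns selecting every file under one model subdir."""
    if subdir not in SUBDIRS:
        raise ValueError(f"unknown subdir {subdir!r}; expected one of {SUBDIRS}")
    return [f"{subdir}/*"]


def download_subdir(subdir, local_dir="./checkpoints"):
    """Download one subdir of the repo into local_dir and return the local path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(subdir),
        local_dir=local_dir,
    )
```

For example, `download_subdir("alm-core")` would fetch only the ALM Core files. Note that ALM Gen additionally needs `mattergen_base` from `microsoft/mattergen`, which this sketch does not fetch.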
+
+ ## Links
+ Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)
+
+ ## License
+ Apache-2.0.
+
+ ## Citation
+ ```bibtex
+ @article{edamadaka2026atomistic,
+ title = {Atomistic Language Models Understand and Generate Materials},
+ author = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
+ journal = {arXiv preprint arXiv:2606.21395},
+ year = {2026}
+ }
+ ```