Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,46 @@
---
license: apache-2.0
tags:
- materials
- qwen3
- diffusion
- crystal-structure-prediction
- crystal-generation
---
# Atomistic Language Models

A single Qwen3-8B backbone that understands, generates, and edits crystals by reading
atoms as **soft tokens** from a machine-learning interatomic potential and steering a
MatterGen diffusion decoder with classifier-free guidance. One repo, one subdir per model:

| subdir | model | what |
|---|---|---|
| `stage1-projector/` | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
| `alm-core/` | **ALM Core** | understanding: Qwen3-8B + LoRA (r128) + projector |
| `alm-gen/` | **ALM Gen** | de-novo generation: consumer-only bridge (r8) over `mattergen_base` |
| `alm-edit/` | **ALM Edit** | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (`llm_full_ft/`) + `csp_backbone/` decoder |

Headlines (paper, https://arxiv.org/abs/2606.21395). **ALM Edit**: CSP MR@20 **83.2%** / RMSE@1 **0.021 Å**
(MP-20, SoTA), and SoTA across the **ALM Bench** editing tasks. **ALM Gen**: de-novo SUN
**7.80%** on the MP-20 hull (above the g=0 MatterGen base) and metastable **MSUN 35.2%** on
LeMat-GenBench. See each subdir's card for full tables.
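
For readers unfamiliar with the `g=0` reference: in classifier-free guidance, a guidance scale blends the unconditional and conditional denoiser outputs. The sketch below uses the common convention in which `g = 0` recovers the unconditional (base) prediction and `g = 1` the purely conditional one. Whether ALM Gen uses exactly this parameterization is an assumption, and `cfg_combine` is a hypothetical helper name, not something taken from the repo.

```python
from typing import List, Sequence


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], g: float) -> List[float]:
    """Blend unconditional and conditional denoiser outputs with guidance scale g.

    Assumed convention (not confirmed by the source): out = uncond + g * (cond - uncond).
    """
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]   # toy unconditional prediction
    cond = [1.0, 3.0]     # toy conditional prediction
    print(cfg_combine(uncond, cond, 0.0))  # g = 0: falls back to the unconditional output
    print(cfg_combine(uncond, cond, 2.0))  # g > 1: extrapolates past the conditional output
```

Under this convention, a run at `g = 0` ignores the conditioning signal entirely, which is why it serves as the base-model comparison point.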

Download into `./checkpoints/` with `hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints`.
The **ALM Bench** dataset lives in `LearningMatter/ALM-Bench`. `mattergen_base` (ALM Gen's backbone) is
fetched from `microsoft/mattergen`; `alm-edit/csp_backbone/` (the CSP decoder) ships here.
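
If only some of the models are needed, a Python alternative to the CLI is `huggingface_hub.snapshot_download` with `allow_patterns`. The following is a minimal sketch that assumes the subdirectory names listed in the table; `subdir_patterns` and `download_subdirs` are hypothetical helper names introduced here for illustration.

```python
from typing import Iterable, List

REPO_ID = "LearningMatter/AtomisticLanguageModels"


def subdir_patterns(subdirs: Iterable[str]) -> List[str]:
    """Turn subdirectory names into glob patterns covering everything beneath them."""
    return [f"{name.strip('/')}/*" for name in subdirs]


def download_subdirs(subdirs: Iterable[str], local_dir: str = "./checkpoints") -> str:
    """Download only the given subdirectories and return the local folder path."""
    # Imported lazily so subdir_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=subdir_patterns(subdirs),
    )
```

For example, `download_subdirs(["alm-core"])` would fetch only the ALM Core files into `./checkpoints/` (network access required).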

## Links
Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License
Apache-2.0.

## Citation
```bibtex
@article{edamadaka2026atomistic,
  title = {Atomistic Language Models Understand and Generate Materials},
  author = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year = {2026}
}
```