---
license: apache-2.0
tags:
- materials
- qwen3
- lora
- instruction-tuning
---
# ALM Core · materials understanding

**ALM Core** reads a crystal as **soft tokens** (OrbV3 per-atom features projected and
spliced into the input sequence at `<atoms>`) and answers in natural language. It is
Qwen3-8B with a **LoRA adapter (r=128, α=256)** on `q/k/v/o/gate/up/down` plus the
structure-to-language projector, instruction-tuned on a materials mixture (property
prediction, structure description, Q&A; LLM4Mat-Bench + GPT-Narratives + MaScQA +
ChatML-formatted arXiv). It keeps GNN-level property accuracy through a language
interface while retaining zero-shot language ability.

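
The soft-token splice described above can be sketched in plain Python. This is an illustration under stated assumptions, not the ALM implementation: the projector is assumed to be a single linear layer, `<atoms>` is assumed to be one placeholder token that gets replaced by one embedding per atom, and the names `project_atoms`, `splice_soft_tokens`, and the token id `99` are hypothetical. Toy lists stand in for the OrbV3 features and the Qwen3-8B token embeddings.

```python
from typing import List, Sequence

Vector = List[float]


def project_atoms(
    atom_features: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[Vector]:
    """Map per-atom features (n_atoms x d_in) to the LLM width (n_atoms x d_model).

    Assumption: a single linear layer, with weight shaped d_in x d_model.
    """
    d_model = len(bias)
    return [
        [sum(f * row[j] for f, row in zip(atom, weight)) + bias[j] for j in range(d_model)]
        for atom in atom_features
    ]


def splice_soft_tokens(
    token_ids: Sequence[int],
    token_embeddings: Sequence[Sequence[float]],
    atom_embeddings: Sequence[Sequence[float]],
    atoms_token_id: int,
) -> List[Vector]:
    """Replace the single <atoms> placeholder with one embedding per atom."""
    ids = list(token_ids)
    if len(ids) != len(token_embeddings):
        raise ValueError("token_ids and token_embeddings must have equal length")
    if ids.count(atoms_token_id) != 1:
        raise ValueError("expected exactly one <atoms> placeholder")
    pos = ids.index(atoms_token_id)
    # Text embeddings before the placeholder, then the atoms, then the remaining text.
    return (
        [list(v) for v in token_embeddings[:pos]]
        + [list(v) for v in atom_embeddings]
        + [list(v) for v in token_embeddings[pos + 1:]]
    )


if __name__ == "__main__":
    ATOMS_ID = 99  # hypothetical id for the <atoms> placeholder
    ids = [11, 12, ATOMS_ID, 13]
    text_emb = [[0.1, 0.1], [0.2, 0.2], [0.0, 0.0], [0.3, 0.3]]  # d_model = 2
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 atoms, d_in = 3
    weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    bias = [0.5, 0.5]
    sequence = splice_soft_tokens(ids, text_emb, project_atoms(features, weight, bias), ATOMS_ID)
    print(len(sequence), sequence)
```
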
**Run it on a structure (inference):**
```bash
alm-generate understand --alm_checkpoint alm-core \
--structure my_crystal.cif \
--prompt "Predict the formation energy per atom and band gap of this material, and name a plausible application."
```
(Drop `--structure` for a text-only materials question.)

**Evaluate (property prediction on LLM4Mat-Bench):**
```bash
python -m alm.eval.understanding.eval_llm4mat --checkpoint alm-core --configs mp --split validation
python -m alm.eval.understanding.eval_mascqa --checkpoint alm-core # materials-science MCQ + numerical
```
The generation models load their own LLM and do not depend on this checkpoint.

## Links
Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License
Apache-2.0.

## Citation
```bibtex
@article{edamadaka2026atomistic,
title = {Atomistic Language Models Understand and Generate Materials},
author = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
journal = {arXiv preprint arXiv:2606.21395},
year = {2026}
}
```