---
license: apache-2.0
tags:
- materials
- qwen3
- lora
- instruction-tuning
---

# ALM Core · materials understanding

**ALM Core** reads a crystal as **soft tokens** (OrbV3 per-atom features projected and spliced into the input sequence at ``) and answers in natural language. It is Qwen3-8B with a **LoRA adapter (r=128, α=256)** on `q/k/v/o/gate/up/down` plus the structure-to-language projector, instruction-tuned on a materials mixture (property prediction, structure description, Q&A; LLM4Mat-Bench + GPT-Narratives + MaScQA + ChatML-formatted arXiv). It keeps GNN-level property accuracy through a language interface while retaining zero-shot language ability.

**Run it on a structure (inference):**

```bash
alm-generate understand --alm_checkpoint alm-core \
  --structure my_crystal.cif \
  --prompt "Predict the formation energy per atom and band gap of this material, and name a plausible application."
```

(Drop `--structure` for a text-only materials question.)

**Evaluate (property prediction on LLM4Mat-Bench):**

```bash
python -m alm.eval.understanding.eval_llm4mat --checkpoint alm-core --configs mp --split validation
python -m alm.eval.understanding.eval_mascqa --checkpoint alm-core  # materials-science MCQ + numerical
```

The generation models load their own LLM and do not depend on this checkpoint.

## Links

Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License

Apache-2.0.

## Citation

```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```
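
## Sketch: soft-token splicing

The model description says per-atom features are projected and spliced into the input sequence as soft tokens. The following is a minimal, dependency-free sketch of that idea, not the ALM implementation. The function names `project` and `splice_soft_tokens`, the single linear projector, and the choice to *replace* one placeholder position with one row per atom are all illustrative assumptions; the actual projector architecture and placeholder handling live in the linked repository.

```python
from typing import List

Vector = List[float]


def project(features: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Map each per-atom feature vector (length d_feat) into the LLM
    embedding space using a (d_feat x d_model) weight matrix.

    Assumption: a single bias-free linear layer stands in for the
    structure-to-language projector."""
    d_model = len(weight[0])
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(d_model)]
        for f in features
    ]


def splice_soft_tokens(
    token_embeds: List[Vector],
    soft_tokens: List[Vector],
    placeholder_index: int,
) -> List[Vector]:
    """Return a new sequence in which the embedding at placeholder_index
    is replaced by the soft-token rows (one row per atom).

    Assumption: the placeholder occupies exactly one position."""
    if not 0 <= placeholder_index < len(token_embeds):
        raise IndexError("placeholder_index out of range")
    return (
        token_embeds[:placeholder_index]
        + soft_tokens
        + token_embeds[placeholder_index + 1:]
    )


# Toy example: 3 text tokens (d_model = 2), 2 atoms (d_feat = 3).
token_embeds = [[0.1, 0.1], [9.0, 9.0], [0.2, 0.2]]  # index 1 is the placeholder
atom_feats = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]      # stand-in for OrbV3 features
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # hypothetical projector weights

spliced = splice_soft_tokens(token_embeds, project(atom_feats, weight), 1)
print(len(spliced))  # 3 text positions - 1 placeholder + 2 atoms
```

Under these assumptions the sequence grows by the number of atoms minus one, so a larger unit cell consumes more of the context window.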