license: apache-2.0
tags:
- materials
- qwen3
- lora
- instruction-tuning
ALM Core · materials understanding
ALM Core reads a crystal as soft tokens (OrbV3 per-atom features projected and
spliced into the input sequence at <atoms>) and answers in natural language. It is
Qwen3-8B with a LoRA adapter (r=128, α=256) on q/k/v/o/gate/up/down plus the
structure-to-language projector, instruction-tuned on a materials mixture (property
prediction, structure description, Q&A; LLM4Mat-Bench + GPT-Narratives + MaScQA +
ChatML-formatted arXiv). It keeps GNN-level property accuracy through a language
interface while retaining zero-shot language ability.
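The soft-token idea described above can be illustrated with a small, dependency-free sketch. It is not the ALM implementation: the function names, the toy dimensions, the placeholder id `ATOMS_ID`, and the choice to replace the `<atoms>` placeholder (rather than insert after it) are all assumptions made for illustration. Per-atom features are mapped by a linear projector into the LLM embedding space, then spliced into the sequence of token embeddings where `<atoms>` appears.

```python
from typing import List

Vector = List[float]

ATOMS_ID = 99  # hypothetical token id standing in for <atoms>


def project(features: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Linear map from per-atom feature space to the embedding space.

    `weight` has shape (d_model, d_feat); the real projector may be deeper.
    """
    return [[sum(w * x for w, x in zip(row, f)) for row in weight] for f in features]


def splice_atoms(
    token_ids: List[int],
    token_embeds: List[Vector],
    atom_embeds: List[Vector],
    atoms_id: int = ATOMS_ID,
) -> List[Vector]:
    """Swap the single <atoms> placeholder for one soft token per atom."""
    if token_ids.count(atoms_id) != 1:
        raise ValueError("expected exactly one <atoms> placeholder")
    pos = token_ids.index(atoms_id)
    return token_embeds[:pos] + atom_embeds + token_embeds[pos + 1:]


# Toy example: 3 text tokens (the middle one is <atoms>), 2 atoms,
# feature size 3, embedding size 2.
token_ids = [5, ATOMS_ID, 7]
token_embeds = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
atom_features = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
projector_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

atom_embeds = project(atom_features, projector_weight)
spliced = splice_atoms(token_ids, token_embeds, atom_embeds)
print(len(spliced))  # sequence grows from 3 positions to 4
```

The sequence length seen by the language model therefore depends on the number of atoms in the structure, which is why a text-only question needs no structure input.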
Run it on a structure (inference):
alm-generate understand --alm_checkpoint alm-core \
--structure my_crystal.cif \
--prompt "Predict the formation energy per atom and band gap of this material, and name a plausible application."
(Drop --structure for a text-only materials question.)
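For scripting over many structures, the command above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the package; it uses only the subcommand and flags shown above and omits `--structure` for text-only questions.

```python
from typing import List, Optional


def build_understand_cmd(
    prompt: str,
    structure: Optional[str] = None,
    checkpoint: str = "alm-core",
) -> List[str]:
    """Build the argument list for `alm-generate understand`.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    cmd = ["alm-generate", "understand", "--alm_checkpoint", checkpoint]
    if structure is not None:
        # Structure input is optional; leave it out for text-only questions.
        cmd += ["--structure", structure]
    cmd += ["--prompt", prompt]
    return cmd


with_structure = build_understand_cmd(
    "Predict the band gap of this material.", structure="my_crystal.cif"
)
text_only = build_understand_cmd("What is a perovskite?")
```

Passing the resulting list to `subprocess.run(cmd, check=True)` runs the command without shell quoting issues in the prompt string.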
Evaluate (property prediction on LLM4Mat-Bench):
python -m alm.eval.understanding.eval_llm4mat --checkpoint alm-core --configs mp --split validation
python -m alm.eval.understanding.eval_mascqa --checkpoint alm-core # materials-science MCQ + numerical
The generation models load their own LLM and do not depend on this checkpoint.
Links
Paper: arXiv · HuggingFace · Code: GitHub
License
Apache-2.0.
Citation
@article{edamadaka2026atomistic,
title = {Atomistic Language Models Understand and Generate Materials},
author = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
journal = {arXiv preprint arXiv:2606.21395},
year = {2026}
}