---
license: apache-2.0
tags:
  - materials
  - qwen3
  - lora
  - instruction-tuning
---

ALM Core · materials understanding

ALM Core reads a crystal as soft tokens and answers in natural language. The soft tokens are OrbV3 per-atom features, projected into the LLM embedding space and spliced into the input sequence at the <atoms> placeholder. The model is Qwen3-8B plus two trained components:

- a LoRA adapter (r=128, α=256) on the q/k/v/o attention projections and the gate/up/down MLP projections;
- the structure-to-language projector.

It is instruction-tuned on a mixture of materials tasks (property prediction, structure description, and Q&A) drawn from LLM4Mat-Bench, GPT-Narratives, MaScQA, and ChatML-formatted arXiv text. It matches GNN-level property-prediction accuracy through a natural-language interface while retaining the base model's zero-shot language ability.
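The pure-Python sketch below illustrates the soft-token idea with a projection step and a splice step. It is a minimal sketch under two assumptions. First, it uses a single linear projector. Second, it assumes a single <atoms> placeholder that expands into one soft token per atom. The names project and splice_atoms, and the token id 99, are hypothetical illustrations rather than the ALM code's API. The real projector and placeholder handling may differ.

```python
from typing import List, Sequence

Vector = List[float]


def project(features: Sequence[Vector], weight: Sequence[Vector], bias: Vector) -> List[Vector]:
    """Map each per-atom feature vector (length d_atom) to the LLM embedding size (d_model).

    ASSUMPTION: a single linear layer; weight has shape (d_model, d_atom).
    """
    return [
        [sum(w * f for w, f in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in features
    ]


def splice_atoms(token_ids: Sequence[int], token_embeddings: Sequence[Vector],
                 atoms_token_id: int, atom_embeddings: Sequence[Vector]) -> List[Vector]:
    """Replace the single <atoms> placeholder embedding with one soft token per atom."""
    if list(token_ids).count(atoms_token_id) != 1:
        raise ValueError("expected exactly one <atoms> placeholder")
    spliced: List[Vector] = []
    for tok, emb in zip(token_ids, token_embeddings):
        if tok == atoms_token_id:
            spliced.extend(list(v) for v in atom_embeddings)  # N atoms -> N positions
        else:
            spliced.append(list(emb))  # ordinary text token keeps its own embedding
    return spliced


# Toy example: 3 atoms with 2-dim features, projected to a 3-dim embedding space.
ATOMS_ID = 99  # hypothetical id of the <atoms> token
atom_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # (d_model=3, d_atom=2)
b = [0.0, 0.0, 0.5]
soft_tokens = project(atom_feats, W, b)

token_ids = [11, ATOMS_ID, 12]  # text token, placeholder, text token
token_embeddings = [[0.1] * 3, [0.0] * 3, [0.2] * 3]
sequence = splice_atoms(token_ids, token_embeddings, ATOMS_ID, soft_tokens)
```

Because one placeholder becomes N positions, the prompt length in this sketch grows with the number of atoms in the structure.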

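The adapter settings correspond to the standard LoRA update h = W x + (α/r) · B A x. With α=256 and r=128, that gives a scaling factor of 2. The sketch below shows this update and counts the adapter's parameters. It relies on assumptions not stated on this card:

- the usual Hugging Face Qwen3 module names (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj);
- Qwen3-8B config values of hidden size 4096, MLP size 12288, 32 query and 8 key/value heads of dimension 128, and 36 layers.

Check these values against the model's config.json. The count covers only the LoRA matrices, not the projector.

```python
from typing import Dict, List, Sequence, Tuple

R = 128              # LoRA rank (from this card)
ALPHA = 256          # LoRA alpha (from this card)
SCALING = ALPHA / R  # standard LoRA scaling factor alpha / r -> 2.0


def matvec(M: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x: Sequence[float], W, A, B, scaling: float) -> List[float]:
    """h = W x + scaling * B (A x), with W: (d_out, d_in), A: (r, d_in), B: (d_out, r).

    W stays frozen; only the low-rank factors A and B are trained.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [h + scaling * d for h, d in zip(base, delta)]


def lora_param_count(shapes: Dict[str, Tuple[int, int]], r: int, n_layers: int) -> int:
    """Each adapted d_in -> d_out linear adds A (r x d_in) plus B (d_out x r) parameters."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


# ASSUMPTION: Qwen3-8B config values, not stated on this card; verify against config.json.
HIDDEN, INTER, HEAD_DIM, N_Q, N_KV, N_LAYERS = 4096, 12288, 128, 32, 8, 36
TARGETS = {  # module name -> (d_in, d_out)
    "q_proj": (HIDDEN, N_Q * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV * HEAD_DIM),
    "o_proj": (N_Q * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTER),
    "up_proj": (HIDDEN, INTER),
    "down_proj": (INTER, HIDDEN),
}
n_lora = lora_param_count(TARGETS, R, N_LAYERS)  # 349,175,808 under these assumptions
print(f"scaling = {SCALING}, LoRA parameters ~ {n_lora / 1e6:.0f}M")
```

Under these assumptions the adapter adds about 349M trainable parameters on top of the frozen base weights.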
Run it on a structure (inference):

alm-generate understand --alm_checkpoint alm-core \
    --structure my_crystal.cif \
    --prompt "Predict the formation energy per atom and band gap of this material, and name a plausible application."

(Drop --structure for a text-only materials question.)
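To script several queries, a small Python wrapper can build the same command line. This sketch uses only the flags shown above. The helpers understand_cmd and run are hypothetical, and the file name and prompts are placeholders. run also assumes the answer is written to stdout, which this card does not state.

```python
import shlex
import subprocess
from typing import List, Optional


def understand_cmd(prompt: str, structure: Optional[str] = None,
                   checkpoint: str = "alm-core") -> List[str]:
    """Build an `alm-generate understand` argv using only the flags shown on this card."""
    cmd = ["alm-generate", "understand", "--alm_checkpoint", checkpoint]
    if structure is not None:  # omit --structure for a text-only question
        cmd += ["--structure", structure]
    cmd += ["--prompt", prompt]
    return cmd


def run(cmd: List[str]) -> str:
    """Run the CLI and return its stdout (ASSUMPTION: the answer is printed to stdout)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# One structure-conditioned and one text-only query (placeholder inputs).
queries = [
    ("Predict the band gap of this material.", "my_crystal.cif"),
    ("Which crystal structures are typical of perovskite oxides?", None),
]
for prompt, cif in queries:
    print(shlex.join(understand_cmd(prompt, cif)))
    # print(run(understand_cmd(prompt, cif)))  # uncomment once alm-generate is installed
```

Passing an argument list rather than a single shell string keeps prompts with quotes or spaces intact without manual escaping.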

Evaluate (property prediction on LLM4Mat-Bench; multiple-choice and numerical materials-science questions on MaScQA):

python -m alm.eval.understanding.eval_llm4mat --checkpoint alm-core --configs mp --split validation
python -m alm.eval.understanding.eval_mascqa  --checkpoint alm-core      # materials-science MCQ + numerical
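Both evaluation commands can also be driven from one Python script. This is a minimal sketch that assumes each script signals failure with a non-zero exit code, which is not documented here. EVALS and run_all are hypothetical names. Using sys.executable runs the modules with the same interpreter and environment as the driver. Call run_all(EVALS) to launch them.

```python
import subprocess
import sys
from typing import List

CHECKPOINT = "alm-core"

# The two evaluation commands from this card, expressed as argv lists.
EVALS: List[List[str]] = [
    [sys.executable, "-m", "alm.eval.understanding.eval_llm4mat",
     "--checkpoint", CHECKPOINT, "--configs", "mp", "--split", "validation"],
    [sys.executable, "-m", "alm.eval.understanding.eval_mascqa",
     "--checkpoint", CHECKPOINT],
]


def run_all(evals: List[List[str]]) -> None:
    """Run each evaluation in order; check=True raises CalledProcessError on a non-zero exit."""
    for cmd in evals:
        print("running:", " ".join(cmd[1:]))
        subprocess.run(cmd, check=True)
```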

The ALM materials-generation models load their own LLM and do not depend on this checkpoint.

Links

Paper: arXiv · HuggingFace · Code: GitHub

License

Apache-2.0.

Citation

@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}