---
license: apache-2.0
tags:
  - materials
  - qwen3
  - lora
  - instruction-tuning
---
# ALM Core · materials understanding

**ALM Core** reads a crystal as **soft tokens** (OrbV3 per-atom features, projected into the
LLM's embedding space and spliced into the input sequence at the `<atoms>` placeholder) and
answers in natural language. It is Qwen3-8B with a **LoRA adapter (r=128, α=256)** on the
`q/k/v/o/gate/up/down` projections plus the structure-to-language projector, both
instruction-tuned on a materials mixture (property prediction, structure description, and
Q&A drawn from LLM4Mat-Bench, GPT-Narratives, MaScQA, and ChatML-formatted arXiv text). It
attains GNN-level property-prediction accuracy through a language interface while retaining
zero-shot language ability.
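A minimal, dependency-free Python sketch of the two mechanisms described above: projecting per-atom features into soft tokens and splicing them in place of `<atoms>`, and the LoRA update with scale α/r = 256/128 = 2. Everything here is an illustrative assumption, not the ALM code: the single linear map stands in for the real structure-to-language projector (whose architecture this card does not specify), `ATOMS_ID` is a hypothetical placeholder id, and the toy dimensions are far smaller than real OrbV3 features or Qwen3-8B's hidden size.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]

ATOMS_ID = -1  # hypothetical token id for the <atoms> placeholder


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) product, to keep the sketch dependency-free."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def project_atoms(atom_feats: Matrix, weight: Matrix) -> Matrix:
    """Linear stand-in for the projector (assumption):
    (n_atoms x feat_dim) @ (feat_dim x hidden) -> one soft token per atom."""
    return matmul(atom_feats, weight)


def splice(token_ids: List[int], token_embeds: Matrix, soft_tokens: Matrix) -> Matrix:
    """Replace the single <atoms> embedding row with all soft-token rows."""
    if token_ids.count(ATOMS_ID) != 1:
        raise ValueError("expected exactly one <atoms> placeholder")
    i = token_ids.index(ATOMS_ID)
    return token_embeds[:i] + soft_tokens + token_embeds[i + 1:]


def lora_linear(x: Matrix, w0: Matrix, a: Matrix, b: Matrix,
                r: int = 128, alpha: int = 256) -> Matrix:
    """y = x @ W0 + (alpha / r) * x @ A @ B, with W0 frozen and only A, B trained.
    Shapes: x (n, d_in), W0 (d_in, d_out), A (d_in, r), B (r, d_out)."""
    base = matmul(x, w0)
    delta = matmul(matmul(x, a), b)
    scale = alpha / r  # 256 / 128 = 2.0 for this card's defaults
    return [[u + scale * v for u, v in zip(bu, dv)] for bu, dv in zip(base, delta)]


# Usage: a 4-token prompt with one <atoms> slot and a 2-atom structure.
token_ids = [101, 102, ATOMS_ID, 103]
token_embeds = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0], [0.5, 0.6]]
atom_feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # 2 atoms, 3 features each
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 features -> toy hidden size 2
seq = splice(token_ids, token_embeds, project_atoms(atom_feats, proj))
print(len(seq))  # 4 tokens - 1 placeholder + 2 soft tokens
```

Standard LoRA initializes `B` to zero, so an adapted layer starts out identical to its frozen base layer; only `A`, `B`, and the projector receive gradients during tuning.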

**Run it on a structure (inference):**
```bash
alm-generate understand --alm_checkpoint alm-core \
    --structure my_crystal.cif \
    --prompt "Predict the formation energy per atom and band gap of this material, and name a plausible application."
```
(Drop `--structure` for a text-only materials question.)
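When calling the CLI from a Python script, a small wrapper keeps the structure and text-only forms consistent. This is a hypothetical helper (`understand_argv` and `run_understand` are not part of the `alm` package) that uses only the three flags shown above; actually running it assumes `alm-generate` is installed and on `PATH`.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def understand_argv(prompt: str, structure: Optional[str] = None,
                    checkpoint: str = "alm-core") -> List[str]:
    """Build the `alm-generate understand` command shown above.
    Leaving `structure` as None gives the text-only form (no --structure)."""
    argv = ["alm-generate", "understand", "--alm_checkpoint", checkpoint]
    if structure is not None:
        if not Path(structure).is_file():
            raise FileNotFoundError(structure)
        argv += ["--structure", structure]
    argv += ["--prompt", prompt]
    return argv


def run_understand(prompt: str, structure: Optional[str] = None,
                   checkpoint: str = "alm-core") -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(understand_argv(prompt, structure, checkpoint),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Text-only question: print the equivalent shell command instead of running it.
print(shlex.join(understand_argv("What limits the band gap of silicon?")))
```

Passing the command as a list (no `shell=True`) avoids quoting problems when prompts contain spaces or quotes.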

**Evaluate (property prediction on LLM4Mat-Bench):**
```bash
python -m alm.eval.understanding.eval_llm4mat --checkpoint alm-core --configs mp --split validation
python -m alm.eval.understanding.eval_mascqa  --checkpoint alm-core      # materials-science MCQ + numerical
```
The ALM generation models load their own LLM and do not depend on this checkpoint.

## Links
Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License
Apache-2.0.

## Citation
```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```