arxiv:2601.02530

Multi-scale Graph Autoregressive Modeling: Molecular Property Prediction via Next Token Prediction

Published on Jan 5

Authors:

Abstract

Connection-Aware Motif Sequencing (CamS) enables Transformer models to learn molecular graphs by converting them into causal sequences that preserve topological information while maintaining scalability.

AI-generated summary

We present Connection-Aware Motif Sequencing (CamS), a graph-to-sequence representation that enables decoder-only Transformers to learn molecular graphs via standard next-token prediction (NTP). For molecular property prediction, SMILES-based NTP scales well but lacks explicit topology, whereas graph-native masked modeling captures connectivity but risks disrupting the pivotal chemical details (e.g., activity cliffs). CamS bridges this gap by serializing molecular graphs into structure-rich causal sequences. CamS first mines data-driven connection-aware motifs. It then serializes motifs via scaffold-rooted breadth-first search (BFS) to establish a stable core-to-periphery order. Crucially, CamS enables hierarchical modeling by concatenating sequences from fine to coarse motif scales, allowing the model to condition global scaffolds on dense, uncorrupted local structural evidence. We instantiate CamS-LLaMA by pre-training a vanilla LLaMA backbone on CamS sequences. It achieves state-of-the-art performance on MoleculeNet and the activity-cliff benchmark MoleculeACE, outperforming both SMILES-based language models and strong graph baselines. Interpretability analysis confirms that our multi-scale causal serialization effectively drives attention toward cliff-determining differences.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.02530 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.02530 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.02530 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.