Mol-LLM: Multimodal Generalist Molecular LLM
Mol-LLM is a multimodal generalist molecular large language model for chemistry that jointly uses 1D molecular sequences and 2D molecular graphs to solve a wide range of molecular tasks in a single unified framework. It introduces Molecular structure Preference Optimization (MolPO), which trains the LLM to prefer answers conditioned on the correct molecular graph over answers conditioned on perturbed graphs, mitigating the “graph-bypass” issue common in prior multimodal molecular LLMs, where the model ignores the graph input and relies on the 1D sequence alone.
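To make the preference objective concrete, here is a minimal DPO-style sketch of a structure-preference loss over correct vs. perturbed graph conditioning; this illustrates the idea, not the paper's exact MolPO formulation, and all names are hypothetical:

```python
import torch
import torch.nn.functional as F

def molpo_style_loss(logp_correct, logp_perturbed,
                     ref_logp_correct, ref_logp_perturbed, beta=0.1):
    """DPO-style preference loss over graph conditioning: push the policy
    to assign a higher answer likelihood under the correct graph than under
    a perturbed graph, relative to a frozen reference model."""
    margin = (logp_correct - logp_perturbed) - (ref_logp_correct - ref_logp_perturbed)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with per-example sequence log-likelihoods.
loss = molpo_style_loss(torch.tensor([-4.2]), torch.tensor([-6.1]),
                        torch.tensor([-4.5]), torch.tensor([-5.9]))
```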
Model summary
- Backbone: Mistral-7B-Instruct-v0.3.
- Modalities:
  - Text (natural language instructions).
  - 1D molecular sequences (SELFIES; SMILES supported via translation).
  - 2D molecular graphs encoded by a hybrid GNN (GINE + TokenGT).
- Architecture: BLIP-2–style; a Q-Former with 32 query tokens projects graph embeddings into the LLM token space.
LLM:
- Mistral-7B-Instruct-v0.3 as text backbone.
- Extended tokenizer with SELFIES and numeric tokens, plus task tags for heterogeneous outputs (discrete labels, floats, descriptions).
Hybrid graph encoder:
- GINE for local structural patterns.
- TokenGT (transformer-based) for global structural dependencies and large graphs.
- Both encoders produce graph-, node-, and edge-level embeddings; the concatenated embeddings are fed into the Q-Former (see the sketch after this block).
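A minimal sketch of the hybrid-encoder wiring, assuming two already-built encoder modules; the constructor arguments and dimensions are hypothetical stand-ins, not the released implementation:

```python
import torch
import torch.nn as nn

class HybridGraphEncoder(nn.Module):
    """Run a local message-passing encoder (GINE-style) and a global
    transformer encoder (TokenGT-style) on the same graph, then concatenate
    their per-node embeddings before handing them to the Q-Former."""

    def __init__(self, gine: nn.Module, tokengt: nn.Module,
                 d_gine: int, d_tokengt: int, d_out: int):
        super().__init__()
        self.gine = gine        # e.g. stacked torch_geometric GINEConv layers
        self.tokengt = tokengt  # transformer over node/edge tokens
        self.proj = nn.Linear(d_gine + d_tokengt, d_out)

    def forward(self, graph):
        h_local = self.gine(graph)      # [num_nodes, d_gine]
        h_global = self.tokengt(graph)  # [num_nodes, d_tokengt]
        return self.proj(torch.cat([h_local, h_global], dim=-1))
```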
Q-Former:
- 5-layer SciBERT-style transformer with 32 learnable queries.
- Cross-attends to graph embeddings and outputs fixed-length tokens appended after SELFIES tokens in the LLM input.
- Selected over an MLP projector for its better alignment and graph-token efficiency (see the sketch after this block).
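The projector's mechanics can be sketched as follows; this uses a plain `nn.TransformerDecoder` in place of the actual 5-layer SciBERT-style stack, so treat it as an illustration with assumed dimensions, not the released module:

```python
import torch
import torch.nn as nn

class QFormerProjector(nn.Module):
    """BLIP-2-style projector sketch: learnable queries cross-attend to
    graph embeddings and emit a fixed number of soft tokens in the LLM's
    embedding space (appended after the SELFIES tokens)."""

    def __init__(self, d_graph, d_llm, n_queries=32, n_layers=5, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_graph))
        layer = nn.TransformerDecoderLayer(d_model=d_graph, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.to_llm = nn.Linear(d_graph, d_llm)

    def forward(self, graph_embeds):                  # [B, N_nodes, d_graph]
        b = graph_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        out = self.decoder(tgt=q, memory=graph_embeds)  # cross-attend to graph
        return self.to_llm(out)                       # [B, 32, d_llm] soft tokens
```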
- Tokenizer extensions: ~3K SELFIES tokens, numeric tokens, and task tags ([SELFIES], [BOOLEAN], [FLOAT], [DESCRIPTION]), plus reaction-direction symbols (see the sketch below).
- Training data: ~3.3M instruction-tuning examples over 27 tasks, with ~40K held-out test instances.
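A minimal sketch of the two preprocessing steps implied above, using the open-source `selfies` package and the Hugging Face tokenizer API; the exact SELFIES vocabulary and tag list used in training are defined by the project, so this only illustrates the mechanism:

```python
import selfies as sf
from transformers import AutoTokenizer

# 1) SMILES -> SELFIES translation (the model consumes SELFIES natively).
smiles = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin, as an example input
selfies_str = sf.encoder(smiles)   # "[C][C][=Branch1]..."

# 2) Extend the backbone tokenizer with task tags. The ~3K SELFIES tokens
#    come from the training corpus and are omitted here.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
tok.add_special_tokens({
    "additional_special_tokens": ["[SELFIES]", "[BOOLEAN]", "[FLOAT]", "[DESCRIPTION]"]
})
# After extending the vocabulary, the LLM's embedding table must be resized:
# model.resize_token_embeddings(len(tok))
```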
On the most comprehensive benchmark suite evaluated to date, including out-of-distribution (OOD) settings, Mol-LLM achieves state-of-the-art or comparable performance among generalist molecular LLMs.
Intended use
Mol-LLM is intended to solve molecular tasks via a single multitask model.
Supported task families:
Reaction prediction:
- Forward synthesis (product prediction, FS)
- Retrosynthesis (reactant prediction, RS)
- Reagent prediction (RP)
Property prediction:
- Regression: LogS, LogD, HOMO, LUMO, HOMO–LUMO gap
- Classification: BACE, BBBP, ClinTox, HIV, SIDER
Text–molecule tasks:
- Description-guided molecule generation
- Molecule captioning
- IUPAC/SELFIES/formula translation as auxiliary tasks
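A hypothetical text-only usage sketch with the Hugging Face `transformers` API; the prompt format is assumed for illustration, and graph-conditioned inference requires the project's own preprocessing (graph encoder + Q-Former), so consult the repository for the official interface:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "KU-AGI/Mol-LLM"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative forward-synthesis instruction; the SELFIES string and the
# exact instruction template are placeholders, not the documented format.
prompt = (
    "Predict the product of the following reaction. "
    "Reactants: [SELFIES] [C][C][=Branch1][C][=O][O][C]"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```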
Model tree for KU-AGI/Mol-LLM
- Base model: mistralai/Mistral-7B-v0.3