---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: cc-by-nc-3.0
datasets:
- KU-AGI/Mol-LLM
language:
- en
metrics:
- bleu
- meteor
- rouge
- roc_auc
- mae
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
---
|
|
|
|
|
|
|
|
# Mol-LLM: Multimodal Generalist Molecular LLM
|
|
|
|
|
Mol-LLM is a multimodal generalist molecular large language model for chemistry that jointly uses 1D molecular sequences and 2D molecular graphs to solve a wide range of molecular tasks in a single unified framework.
It introduces **Molecular structure Preference Optimization (MolPO)**, which trains the LLM to prefer correct molecular graphs over perturbed ones, resolving the **“graph-bypass”** issue common in prior multimodal molecular LLMs.
|
|
|
|
|
## Model summary
|
|
|
|
|
- **Backbone**: Mistral-7B-Instruct-v0.3.
- **Modalities**:
  - Text (natural language instructions).
  - 1D molecular sequences (SELFIES; SMILES supported via translation, as shown in the sketch after this list).
  - 2D molecular graphs encoded by a hybrid GNN (GINE + TokenGT).
- **Architecture**: Mol-LLM uses a BLIP-2–style architecture in which a Q-Former (32 query tokens) projects graph embeddings into the LLM token space.
- **LLM**:
  - Mistral-7B-Instruct-v0.3 as the text backbone.
  - Extended tokenizer with SELFIES and numeric tokens, plus task tags for heterogeneous outputs (discrete labels, floats, descriptions).
- **Hybrid graph encoder**:
  - GINE for local structural patterns.
  - TokenGT (transformer-based) for global structural dependencies and large graphs.
  - Both encoders produce graph-, node-, and edge-level embeddings; the concatenated embeddings are fed into the Q-Former.
- **Q-Former**:
  - 5-layer SciBERT-style transformer with 32 learnable queries.
  - Cross-attends to graph embeddings and outputs a fixed-length set of tokens appended after the SELFIES tokens in the LLM input.
  - Selected over an MLP projector due to better alignment and graph-token efficiency.
- **Tokenizer extensions**: 3K SELFIES tokens, numeric tokens, and task tags for `[SELFIES]`, `[BOOLEAN]`, `[FLOAT]`, `[DESCRIPTION]`, and reaction-direction symbols.
- **Training data**: ~3.3M instruction-tuning examples over 27 tasks, with ~40K held-out test instances.
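
Since the model consumes molecules as SELFIES strings (with SMILES handled via translation), the open-source `selfies` package can perform the conversion. A minimal sketch is shown below; the exact preprocessing pipeline (canonicalization, tokenization into the extended vocabulary) used by Mol-LLM may differ.

```python
# Minimal SMILES -> SELFIES conversion sketch using the open-source `selfies` package.
# This only illustrates the 1D sequence format; Mol-LLM's actual preprocessing may differ.
import selfies as sf

smiles = "CC(=O)Oc1ccccc1C(=O)O"           # aspirin
selfies_str = sf.encoder(smiles)            # e.g. "[C][C][=Branch1]..."
recovered_smiles = sf.decoder(selfies_str)  # round-trip back to SMILES

print(selfies_str)
print(recovered_smiles)
```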
|
|
|
|
|
Mol-LLM achieves state-of-the-art or comparable performance among **generalist** molecular LLMs on the most comprehensive benchmark suite evaluated to date, including out-of-distribution (OOD) settings.
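
The `pytorch_model_hub_mixin` tag indicates the checkpoint was pushed with `PyTorchModelHubMixin`, so it can in principle be loaded through the mixin's `from_pretrained`. The sketch below is hypothetical: the model class name, its import path, and the repository id are assumptions, so consult the project code for the real entry point.

```python
# Hypothetical loading sketch. `MolLLM` stands in for the project's actual model class,
# which must subclass huggingface_hub.PyTorchModelHubMixin for `from_pretrained` to work;
# "KU-AGI/Mol-LLM" is an assumed repository id. Check the project code for the real names.
from mol_llm import MolLLM  # placeholder import path

model = MolLLM.from_pretrained("KU-AGI/Mol-LLM")  # downloads config and weights from the Hub
model.eval()
```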
|
|
|
|
|
## Intended use
|
|
|
|
|
Mol-LLM is intended to solve **molecular tasks** via a single multitask model.

Supported task families:
|
|
|
|
|
- **Reaction prediction**:
  - Forward synthesis (product prediction, FS)
  - Retrosynthesis (reactant prediction, RS)
  - Reagent prediction (RP)
- **Property prediction**:
  - Regression: LogS, LogD, HOMO, LUMO, HOMO–LUMO gap
  - Classification: BACE, BBBP, ClinTox, HIV, SIDER
- **Text–molecule tasks**:
  - Description-guided molecule generation
  - Molecule captioning
  - IUPAC/SELFIES/formula translation as auxiliary tasks
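
As a purely illustrative example, the sketch below combines an instruction, a SELFIES-encoded molecule, and one of the task tags listed above into a classification-style query. The actual instruction templates, tag placement, and the injection of graph tokens are defined by the Mol-LLM training data and code and are not shown here.

```python
# Hypothetical prompt sketch for a BBBP-style blood-brain-barrier classification query.
# The real Mol-LLM instruction templates and tag usage may differ; this only illustrates
# how a text instruction, a SELFIES input, and a task tag could be combined conceptually.
import selfies as sf

instruction = "Predict whether this molecule can penetrate the blood-brain barrier."
molecule = sf.encoder("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as a stand-in input

prompt = f"{instruction}\nMolecule: {molecule}\nRespond with a [BOOLEAN] answer."
print(prompt)
```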