---
license: mit
---

<div align="center">

<h3>InstructBioMol: A Multimodal LLM for Biomolecule Understanding and Design</h3>

<p align="center">
<a href="https://arxiv.org/abs/2410.07919">Paper</a> •
<a href="https://github.com/HICAI-ZJU/InstructBioMol">Project</a> •
<a href="#quickstart">Quickstart</a> •
<a href="#citation">Citation</a>
</p>
</div>

### Model Description

InstructBioMol is a multimodal large language model that bridges natural language and biomolecules (proteins and small molecules). Through comprehensive instruction tuning, it achieves any-to-any alignment among natural language, molecules, and proteins.

*For detailed information, please refer to our [paper](https://arxiv.org/abs/2410.07919) and [code repository](https://github.com/HICAI-ZJU/InstructBioMol).*

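With three modalities, "any-to-any" alignment can be pictured as every ordered pair of distinct modalities. This reading is an interpretation made here for illustration and is not a definition taken from the paper. Under it, there are six input-to-output directions, such as molecule → natural language or protein → molecule. The following minimal Python sketch enumerates them:

```python
from itertools import permutations

# The three modalities named in the model description.
MODALITIES = ("natural language", "molecule", "protein")

# Assumption (illustrative only): "any-to-any" is read as every ordered pair
# of distinct modalities, i.e. one input modality mapped to a different
# output modality.
DIRECTIONS = list(permutations(MODALITIES, 2))

for source, target in DIRECTIONS:
    print(f"{source} -> {target}")
```

This card does not say whether same-modality tasks (e.g. molecule → molecule) also count, so the sketch leaves them out.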

### Released Variants

| Model Name | Stage | Multimodal | Description |
|------------|-------|------------|-------------|
| [InstructBioMol-base](https://huggingface.co/hicai-zju/InstructBioMol-base) | Pretraining | ❌ | Continually pretrained model on molecular sequences, protein sequences, and scientific literature. |
| [InstructBioMol-instruct-stage1](https://huggingface.co/hicai-zju/InstructBioMol-instruct-stage1) | Instruction tuning (stage 1) | ✅ | Stage 1 instruction-tuned model with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
| [InstructBioMol-instruct](https://huggingface.co/hicai-zju/InstructBioMol-instruct) (*This Model*) | Instruction tuning (stages 1 and 2) | ✅ | Fully instruction-tuned model (stages 1 and 2) with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |

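The following minimal sketch maps the table rows to their Hugging Face repo IDs so a variant can be fetched programmatically. The short keys (`base`, `instruct-stage1`, `instruct`) and the helper names `model_url` and `download` are hypothetical. `snapshot_download` is the standard `huggingface_hub` download function. It only retrieves the repository files and does not load or run the model.

```python
# Hugging Face repo IDs of the released variants, as listed in the table.
# The short dictionary keys are hypothetical labels chosen for this sketch.
VARIANTS = {
    "base": "hicai-zju/InstructBioMol-base",
    "instruct-stage1": "hicai-zju/InstructBioMol-instruct-stage1",
    "instruct": "hicai-zju/InstructBioMol-instruct",
}


def model_url(variant: str) -> str:
    """Return the Hugging Face Hub page for a released variant."""
    return f"https://huggingface.co/{VARIANTS[variant]}"


def download(variant: str) -> str:
    """Download a variant's files and return the local snapshot directory.

    Requires the `huggingface_hub` package and network access. This only
    fetches files; it does not load the model.
    """
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(repo_id=VARIANTS[variant])


print(model_url("instruct"))
```

The import sits inside `download` so the mapping and URL helper work even where `huggingface_hub` is not installed.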
### Training Details

**Base Architecture**: InstructBioMol-instruct-stage1

**Training Data**:

1. Molecule - Natural Language Alignment:
   - 52K samples from ChEBI
2. Protein - Natural Language Alignment:
   - 2 million samples from UniProt (Swiss-Prot)
3. Molecule - Protein Alignment:
   - 1 million samples from BindingDB and Rhea

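The three sources differ in size by more than an order of magnitude. The short Python sketch below computes each task's raw share of the data. It assumes that each listed item is one instruction example and takes the rounded counts above at face value:

```python
# Approximate sample counts from the Training Data list (rounded, as stated).
# Assumption: each listed item corresponds to one instruction example.
COUNTS = {
    "molecule - natural language (ChEBI)": 52_000,
    "protein - natural language (UniProt/Swiss-Prot)": 2_000_000,
    "molecule - protein (BindingDB, Rhea)": 1_000_000,
}

total = sum(COUNTS.values())  # 3,052,000
shares = {task: n / total for task, n in COUNTS.items()}

for task, share in shares.items():
    print(f"{task}: {share:.1%}")
```

These are raw shares only. The card does not say how, or whether, the sources were reweighted during training.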

**Training Objective**: Instruction tuning

### Citation

```bibtex
@article{zhuang2025advancing,
  author    = {Xiang Zhuang and
               Keyan Ding and
               Tianwen Lyu and
               Yinuo Jiang and
               Xiaotong Li and
               Zhuoyi Xiang and
               Zeyuan Wang and
               Ming Qin and
               Kehua Feng and
               Jike Wang and
               Qiang Zhang and
               Huajun Chen},
  title     = {Advancing biomolecular understanding and design following human instructions},
  journal   = {Nature Machine Intelligence},
  pages     = {1--14},
  year      = {2025},
  publisher = {Nature Publishing Group UK London}
}
```