
	---
	license: mit
	---

	<div align="center">

	<h3>InstructBioMol: A Multimodal LLM for Biomolecule Understanding and Design</h3>

	<p align="center">
	<a href="https://arxiv.org/abs/2410.07919">Paper</a> •
	<a href="https://github.com/HICAI-ZJU/InstructBioMol">Project</a> •
	<a href="#quickstart">Quickstart</a> •
	<a href="#citation">Citation</a>
	</p>
	</div>

	### Model Description

	InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning.

	For detailed information, please refer to our [paper](https://arxiv.org/abs/2410.07919) and [code repository](https://github.com/HICAI-ZJU/InstructBioMol).

	### Released Variants

	| Model Name | Stage | Multimodal | Description |
	|------------|-------|------------|-------------|
	| [InstructBioMol-base](https://huggingface.co/hicai-zju/InstructBioMol-base) | Pretraining | ❎ | Continually pretrained model on molecular sequences, protein sequences, and scientific literature. |
	| [InstructBioMol-instruct-stage1](https://huggingface.co/hicai-zju/InstructBioMol-instruct-stage1) | Instruction tuning (stage 1) | ✅ | Stage 1 instruction-tuned model with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
	| [InstructBioMol-instruct](https://huggingface.co/hicai-zju/InstructBioMol-instruct) (this model) | Instruction tuning (stages 1 and 2) | ✅ | Fully instruction-tuned model (stages 1 and 2) with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
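The variants in the table can be captured as a small lookup, which is handy when a script needs to choose a checkpoint by stage. The sketch below is illustrative only: the `Variant` class and the `multimodal_variants` and `download` helpers are hypothetical names, not part of the InstructBioMol code base. The `download` helper assumes the `huggingface_hub` package is installed and only fetches the repository files; loading the weights for inference is not covered here (see the code repository for that).

```python
from dataclasses import dataclass

# Organisation name as it appears in the model links of the table above.
HF_ORG = "hicai-zju"


@dataclass(frozen=True)
class Variant:
    """One released checkpoint (hypothetical helper type for this sketch)."""
    name: str
    stage: str
    multimodal: bool

    @property
    def repo_id(self) -> str:
        # Hugging Face repository ids have the form "<org>/<model name>".
        return f"{HF_ORG}/{self.name}"


VARIANTS = [
    Variant("InstructBioMol-base", "pretraining", False),
    Variant("InstructBioMol-instruct-stage1", "instruction tuning (stage 1)", True),
    Variant("InstructBioMol-instruct", "instruction tuning (stages 1 and 2)", True),
]


def multimodal_variants():
    """Return the variants marked as multimodal in the table."""
    return [v for v in VARIANTS if v.multimodal]


def download(variant, local_dir=None):
    """Fetch the repository files for a variant (needs network access)."""
    # Imported lazily so the lookup above works without huggingface_hub.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=variant.repo_id, local_dir=local_dir)


if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.repo_id}: {v.stage}, multimodal={v.multimodal}")
```

Calling `download(VARIANTS[2])` would fetch the files of this model; it is left out of the `__main__` block because the checkpoint is large.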

	### Training Details

	Base Architecture: InstructBioMol-instruct-stage1

	Training Data:

	1. Molecule - Natural Language Alignment:
	- 52K samples from ChEBI

	2. Protein - Natural Language Alignment:
	- 2 million samples from UniProt (Swiss-Prot)

	3. Molecule - Protein Alignment:
	- 1 million samples from BindingDB and Rhea
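To get a feel for how the three alignment tasks are weighted, the rounded counts from the list above can be turned into approximate shares. This is a back-of-the-envelope sketch: it assumes the three sources are simply concatenated and sampled uniformly, which this card does not state, and the dictionary keys are labels chosen for this example.

```python
# Rounded sample counts as quoted in the list above.
TRAINING_DATA = {
    "molecule-text (ChEBI)": 52_000,
    "protein-text (UniProt/Swiss-Prot)": 2_000_000,
    "molecule-protein (BindingDB, Rhea)": 1_000_000,
}


def mixture_shares(counts):
    """Return each source's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(TRAINING_DATA.values())
    print(f"total: about {total:,} samples")
    for name, share in mixture_shares(TRAINING_DATA).items():
        print(f"{name}: {share:.1%}")
```

Under that assumption, protein-text data makes up roughly two thirds of the mixture and molecule-text data less than 2%.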


	Training Objective: Instruction tuning


	### Citation

	```bibtex
	@article{zhuang2025advancing,
	author = {Xiang Zhuang and
	Keyan Ding and
	Tianwen Lyu and
	Yinuo Jiang and
	Xiaotong Li and
	Zhuoyi Xiang and
	Zeyuan Wang and
	Ming Qin and
	Kehua Feng and
	Jike Wang and
	Qiang Zhang and
	Huajun Chen},
	title={Advancing biomolecular understanding and design following human instructions},
	journal={Nature Machine Intelligence},
	pages={1--14},
	year={2025},
	publisher={Nature Publishing Group UK London}
	}
	```