Improve model card: Add pipeline tag, library, and usage example
#1 opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+pipeline_tag: any-to-any
+library_name: transformers
 ---
 
 <div align="center">
@@ -16,12 +18,13 @@ license: mit
 
 ### Model Description
 
-InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning.
+InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning. It can integrate multimodal biomolecules as input, enabling researchers to articulate design goals in natural language and receive biomolecular outputs that meet precise biological needs.
 
 *For detailed information, please refer to our [paper](https://arxiv.org/abs/2410.07919) and [code repository](https://github.com/HICAI-ZJU/InstructBioMol).*
+
 ### Released Variants
 
-| Model Name | Stage |
+| Model Name | Stage | Multimodal | Description |
 |------------|-----------|-------|-------|
 | [InstructBioMol-base](https://huggingface.co/hicai-zju/InstructBioMol-base) | Pretraining | ❎ | Continual pretrained model on molecular sequences, protein sequences, and scientific literature. |
 | [InstructBioMol-instruct-stage1](https://huggingface.co/hicai-zju/InstructBioMol-instruct-stage1) | Instruction tuning (stage 1) | ✅ | Stage 1 instruction-tuned model with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
@@ -45,6 +48,25 @@ InstructBioMol is a multimodal large language model that bridges natural languag
 
 **Training Objective**: Instruction tuning
 
+### Quickstart
+
+You can easily load and use the model with the `transformers` library.
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("hicai-zju/InstructBioMol-instruct", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("hicai-zju/InstructBioMol-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
+
+# Example usage for text generation with a protein sequence
+input_text = "What is the function of the protein with sequence: <PROT>MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR<PROT>"
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
 ### Citation
 
@@ -65,7 +87,7 @@ InstructBioMol is a multimodal large language model that bridges natural languag
   title = {InstructBioMol: Advancing Biomolecule Understanding and Design Following
            Human Instructions},
   journal = {CoRR},
-  volume = {abs/2410
+  volume = {abs/2410.07919},
   year = {2024}
 }
 ```
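The front-matter change at the top of the card can be sanity-checked before merging. A minimal sketch, assuming a flat `---`-delimited metadata block; the `parse_front_matter` helper is hypothetical, not part of this PR or any library API:

```python
# Hypothetical helper (not part of this PR): checks that the model card's
# front matter carries the metadata this change introduces.

def parse_front_matter(text: str) -> dict:
    """Extract flat `key: value` pairs from a leading ----delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

readme = """---
license: mit
pipeline_tag: any-to-any
library_name: transformers
---

<div align="center">
"""

meta = parse_front_matter(readme)
assert meta["pipeline_tag"] == "any-to-any"
assert meta["library_name"] == "transformers"
```

On the Hub, these same keys are what populate the pipeline tag and library badge on the model page.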