Improve model card: Add pipeline tag, library, and usage example
#1 opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+pipeline_tag: any-to-any
+library_name: transformers
 ---
 
 <div align="center">
@@ -16,12 +18,13 @@ license: mit
 
 ### Model Description
 
-InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning.
+InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning. It can integrate multimodal biomolecules as input, enabling researchers to articulate design goals in natural language and receive biomolecular outputs that meet precise biological needs.
 
 *For detailed information, please refer to our [paper](https://arxiv.org/abs/2410.07919) and [code repository](https://github.com/HICAI-ZJU/InstructBioMol).*
+
 ### Released Variants
 
-| Model Name | Stage |
+| Model Name | Stage | Multimodal | Description |
 |------------|-----------|-------|-------|
 | [InstructBioMol-base](https://huggingface.co/hicai-zju/InstructBioMol-base) | Pretraining | ❎ | Continual pretrained model on molecular sequences, protein sequences, and scientific literature. |
 | [InstructBioMol-instruct-stage1](https://huggingface.co/hicai-zju/InstructBioMol-instruct-stage1) | Instruction tuning (stage 1) | ✅ | Stage 1 instruction-tuned model with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
@@ -45,6 +48,25 @@ InstructBioMol is a multimodal large language model that bridges natural languag
 
 **Training Objective**: Instruction tuning
 
+### Quickstart
+
+You can easily load and use the model with the `transformers` library.
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("hicai-zju/InstructBioMol-instruct", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("hicai-zju/InstructBioMol-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
+
+# Example usage for text generation with a protein sequence
+input_text = "What is the function of the protein with sequence: <PROT>MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR<PROT>"
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
 ### Citation
 
@@ -65,7 +87,7 @@ InstructBioMol is a multimodal large language model that bridges natural languag
   title = {InstructBioMol: Advancing Biomolecule Understanding and Design Following
            Human Instructions},
   journal = {CoRR},
-  volume = {abs/2410
+  volume = {abs/2410.07919},
   year = {2024}
 }
 ```
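The front-matter change at the top of the card can be sanity-checked before merging. A minimal sketch, assuming a flat `---`-delimited metadata block; the `parse_front_matter` helper is hypothetical, not part of this PR or any library API:

```python
# Hypothetical helper (not part of this PR): checks that the model card's
# front matter carries the metadata this change introduces.

def parse_front_matter(text: str) -> dict:
    """Extract flat `key: value` pairs from a leading ----delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

readme = """---
license: mit
pipeline_tag: any-to-any
library_name: transformers
---

<div align="center">
"""

meta = parse_front_matter(readme)
assert meta["pipeline_tag"] == "any-to-any"
assert meta["library_name"] == "transformers"
```

On the Hub, these same keys are what populate the pipeline tag and library badge on the model page.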