
Improve model card: Add pipeline tag, library, and usage example

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +25 -3
README.md CHANGED
@@ -1,5 +1,7 @@
  ---
  license: mit
  ---

  <div align="center">
@@ -16,12 +18,13 @@ license: mit

  ### Model Description

- InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning.

  *For detailed information, please refer to our [paper](https://arxiv.org/abs/2410.07919) and [code repository](https://github.com/HICAI-ZJU/InstructBioMol).*
  ### Released Variants

- | Model Name | Stage | Multimodal | Description |
  |------------|-------|------------|-------------|
  | [InstructBioMol-base](https://huggingface.co/hicai-zju/InstructBioMol-base) | Pretraining | ❎ | Continually pretrained model on molecular sequences, protein sequences, and scientific literature. |
  | [InstructBioMol-instruct-stage1](https://huggingface.co/hicai-zju/InstructBioMol-instruct-stage1) | Instruction tuning (stage 1) | ✅ | Stage 1 instruction-tuned model with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
@@ -45,6 +48,25 @@ InstructBioMol is a multimodal large language model that bridges natural languag

  **Training Objective**: Instruction tuning


  ### Citation

@@ -65,7 +87,7 @@ InstructBioMol is a multimodal large language model that bridges natural languag
  title = {InstructBioMol: Advancing Biomolecule Understanding and Design Following
  Human Instructions},
  journal = {CoRR},
- volume = {abs/2410.07919},
  year = {2024}
  }
  ```
 
  ---
  license: mit
+ pipeline_tag: any-to-any
+ library_name: transformers
  ---

  <div align="center">
 

  ### Model Description

+ InstructBioMol is a multimodal large language model that bridges natural language with biomolecules (proteins and small molecules). It achieves any-to-any alignment between natural language, molecules, and proteins through comprehensive instruction tuning. It can integrate multimodal biomolecules as input, enabling researchers to articulate design goals in natural language and receive biomolecular outputs that meet precise biological needs.

  *For detailed information, please refer to our [paper](https://arxiv.org/abs/2410.07919) and [code repository](https://github.com/HICAI-ZJU/InstructBioMol).*
+
  ### Released Variants

+ | Model Name | Stage | Multimodal | Description |
  |------------|-------|------------|-------------|
  | [InstructBioMol-base](https://huggingface.co/hicai-zju/InstructBioMol-base) | Pretraining | ❎ | Continually pretrained model on molecular sequences, protein sequences, and scientific literature. |
  | [InstructBioMol-instruct-stage1](https://huggingface.co/hicai-zju/InstructBioMol-instruct-stage1) | Instruction tuning (stage 1) | ✅ | Stage 1 instruction-tuned model with biomolecular multimodal processing capabilities (e.g., 3D molecules/proteins). |
 

  **Training Objective**: Instruction tuning

+ ### Quickstart
+
+ You can easily load and use the model with the `transformers` library.
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("hicai-zju/InstructBioMol-instruct", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("hicai-zju/InstructBioMol-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
+
+ # Example usage for text generation with a protein sequence
+ input_text = "What is the function of the protein with sequence: <PROT>MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR<PROT>"
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```

  ### Citation

 
  title = {InstructBioMol: Advancing Biomolecule Understanding and Design Following
  Human Instructions},
  journal = {CoRR},
+ volume = {abs/2410.07919},
  year = {2024}
  }
  ```
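For readers skimming the diff: the Quickstart above wraps the amino-acid sequence in `<PROT>` markers on both sides of the sequence. A minimal sketch of that prompt format, using a hypothetical helper (`build_protein_prompt` is illustrative and not part of the model repository):

```python
# Hypothetical helper (not part of the repo): builds a prompt in the format
# used by the Quickstart example, wrapping an amino-acid sequence in
# <PROT> markers on both sides.
def build_protein_prompt(question: str, sequence: str) -> str:
    return f"{question} <PROT>{sequence}<PROT>"

# Truncated sequence, for illustration only.
prompt = build_protein_prompt(
    "What is the function of the protein with sequence:",
    "MVLSPADKTNVKAAW",
)
print(prompt)
# → What is the function of the protein with sequence: <PROT>MVLSPADKTNVKAAW<PROT>
```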