joethequant committed
Commit 407002f · 1 Parent(s): 36e6877

Update README.md

Files changed (1): README.md (+50 −52)
sdk: static
pinned: false
---

## A Fine-Tuned GPT for De Novo Therapeutic Antibodies

### What are antibodies?

Antibodies are proteins that bind to a target protein (called an antigen) in order to mount an immune response.

They are incredibly **safe** and **effective** therapeutics against infectious diseases, cancer, and autoimmune disorders.
### Why aren’t there more antibodies on the market?

Current antibody discovery methods require significant capital, specialized expertise, and luck.

Generative AI opens up the possibility of moving from a paradigm of antibody discovery to antibody generation.

However, work is required to translate the advances of LLMs to the realm of drug discovery.
### What is AntibodyGPT?

AntibodyGPT is a fine-tuned GPT language model that researchers can use to rapidly generate functional, diverse antibodies for any given target sequence.
### Key Features

- Rapid generation
- Requires only the target sequence
- Outputs diverse, human-like antibodies
### Links

- [Web Demo](https://orca-app-ygzbp.ondigitalocean.app/Demo_Antibody_Generator)
- [Hugging Face Model Repository](https://huggingface.co/AntibodyGeneration)
- [Open-Source RunPod Serverless REST API](https://github.com/joethequant/docker_protein_generator)
- [The Code for this App](https://github.com/joethequant/docker_streamlit_antibody_protein_generation)
### Additional Resources and Links

- [ProGen Foundation Models](https://github.com/salesforce/progen)
- [ANARCI GitHub](https://github.com/oxpig/ANARCI)
- [ANARCI Web Server](http://opig.stats.ox.ac.uk/webapps/anarci/)
- [TAP: Therapeutic Antibody Profiler](https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/tap)
- [ESMFold](https://esmatlas.com/resources?action=fold)
### Example Code To Use AntibodyGPT

```python
import torch
from tokenizers import Tokenizer

from models.progen.modeling_progen import ProGenForCausalLM

# Model identifier on the Hugging Face Hub
model_path = 'AntibodyGeneration/fine-tuned-progen2-small'

# Load the model and tokenizer
model = ProGenForCausalLM.from_pretrained(model_path)
tokenizer = Tokenizer.from_file('tokenizer.json')

# Define the target (antigen) sequence and the number of antibodies to generate
target_sequence = 'MQIPQAPWPVVWAVLQLGWRPGWFLDSPDRPWNPPTFSPALLVVTEGDNATFTCSFSNTSESFVLNWYRMSPSNQTDKLAAFPEDRSQPGQDCRFRVTQLPNGRDFHMSVVRARRNDSGTYLCGAISLAPKAQIKESLRAELRVTERRAEVPTAHPSPSPRPAGQFQTLVVGVVGGLLGSLVLLVWVLAVICSRAARGTIGARRTGQPLKEDPSAVPVFSVDYGELDFQWREKTPEPPVPCVPEQTEYATIVFPSGMGTSSPARRGSADGPRSAQPLRPEDGHCSWPL'
number_of_sequences = 2

# Tokenize the sequence: tokenizers.Tokenizer.encode returns an Encoding,
# whose .ids attribute holds the token ids
input_ids = torch.tensor([tokenizer.encode(target_sequence).ids])

# Move model and input to CUDA if available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
input_ids = input_ids.to(device)

# Sample antibody sequences conditioned on the target
with torch.no_grad():
    output = model.generate(input_ids, max_length=1024, do_sample=True,
                            top_p=0.9, temperature=0.8,
                            num_return_sequences=number_of_sequences)

# Decode the generated token ids back into amino-acid sequences
generated_sequences = [tokenizer.decode(seq.tolist(), skip_special_tokens=True)
                       for seq in output]
```
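The generated sequences are plain amino-acid strings, so a natural next step is writing them to a FASTA file for downstream tools such as ANARCI (numbering) or TAP (developability profiling). A minimal sketch — `write_fasta` is a hypothetical helper, not part of AntibodyGPT, and the short placeholder sequences stand in for real output of `model.generate`:

```python
from pathlib import Path

def write_fasta(sequences, path, prefix='antibody'):
    """Write each sequence as a FASTA record named <prefix>_<i>."""
    records = [f'>{prefix}_{i}\n{seq}' for i, seq in enumerate(sequences)]
    Path(path).write_text('\n'.join(records) + '\n')

# Placeholder sequences; in practice pass `generated_sequences` from above
write_fasta(['EVQLVESGGGLVQ', 'QVQLQESGPGLVK'], 'generated_antibodies.fasta')
```

The resulting file can be uploaded directly to the ANARCI web server linked above.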
 
 