mjbuehler committed 66fbbc6 (1 parent: 0bb3e26)

Update README.md

Files changed (1): README.md (+7 −3)
3
 
4
  This model is a pretrained autoregressive transformer model in GPT-style, trained on a large number of protein sequences.
5
 
6
+ Dataset: https://huggingface.co/datasets/lamm-mit/GPTProteinPretrained
7
+
8
  Load pretrained model:
9
 
10
  ```python
 
Sample inference using the "Sequence<...>" task, where here the model will simply autocomplete the sequence starting with "ETAVPKLLQAL":

```python
import torch

device = 'cuda'
prompt = "Sequence<ETAVPKLLQAL"
generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
print(generated.shape, generated)
```
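The `generate` call below samples with `top_k=500`, `top_p=0.9`, and `temperature=1`. As a plain-Python sketch of what those two filters do to a next-token distribution (a toy 10-token vocabulary, not the real model's logits; the function name is ours, for illustration only):

```python
import math

def filter_top_k_top_p(logits, top_k, top_p):
    """Prune a next-token distribution the way top-k / top-p sampling does."""
    # top_k: keep only the k highest-scoring tokens.
    kth = sorted(logits, reverse=True)[top_k - 1]
    kept = [(i, l) for i, l in enumerate(logits) if l >= kth]

    # Softmax over the surviving tokens.
    z = sum(math.exp(l) for _, l in kept)
    probs = [(i, math.exp(l) / z) for i, l in kept]

    # top_p (nucleus): walk tokens from most to least likely and keep the
    # smallest prefix whose cumulative probability reaches top_p.
    probs.sort(key=lambda t: t[1], reverse=True)
    kept_p, cum = [], 0.0
    for i, p in probs:
        kept_p.append((i, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize whatever survived both filters.
    z = sum(p for _, p in kept_p)
    return {i: p / z for i, p in kept_p}

# Toy logits: token 0 is most likely; top_k=4 keeps tokens 0-3,
# and top_p=0.9 keeps the smallest prefix covering 90% of their mass.
dist = filter_top_k_top_p([2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5, -2.0, -2.5],
                          top_k=4, top_p=0.9)
print(dist)
```

With a large `top_k` like 500 the nucleus (`top_p`) filter does most of the pruning in practice; lowering `temperature` below 1 would additionally sharpen the distribution before sampling.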
 
```python
sample_outputs = model.generate(
    generated,  # input token ids
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=500,
    max_length=1024,
    top_p=0.9,
    num_return_sequences=6,
    temperature=1,
).to(device)
```
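To inspect the samples you would decode each one back to text, e.g. `tokenizer.decode(sample_outputs[i], skip_special_tokens=True)`. The helper below is a sketch for pulling the amino-acid string out of the decoded text; it assumes completions follow the `Sequence<...>` task format, with a closing `>` when the model reaches its end marker (the helper name and that closing-bracket convention are our assumptions, not part of the released code):

```python
def extract_sequence(text: str) -> str:
    """Return the amino-acid string inside a decoded "Sequence<...>" completion."""
    start = text.index("Sequence<") + len("Sequence<")
    end = text.find(">", start)
    # If generation stopped before emitting ">", keep everything after the tag.
    return text[start:] if end == -1 else text[start:end]

# Hypothetical decoded outputs, for illustration:
print(extract_sequence("Sequence<ETAVPKLLQALIAK>"))  # ETAVPKLLQALIAK
print(extract_sequence("Sequence<ETAVPKLLQAL"))      # ETAVPKLLQAL (no end marker)
```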