---
license: apache-2.0
base_model:
- nferruz/ProtGPT2
tags:
- axolotl
- lora
- transformers
- gpt2
- protein_design
---

# Model
This is a fine-tuned version of [ProtGPT2](https://huggingface.co/nferruz/ProtGPT2), a language model that speaks the protein language and can be used for de novo protein design and engineering.

<br/>

# Dataset
The protein dataset was retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). \
Only OXIDOREDUCTASE enzymes were used. \
The JSON-formatted data is available at [oxidos.json](https://github.com/zypchn/pLM/blob/main/data/oxidos.json).

<br/>

# How to Use
```python
>>> from transformers import pipeline
>>> pipe = pipeline("text-generation", model="zypchn/ProtGPT2-Oxido")
>>> sequences = pipe("", max_length=100, do_sample=True, top_k=950, repetition_penalty=1.2, num_return_sequences=5, eos_token_id=0)
# The prompt is left blank to encourage diverse outputs
```

```json
[
  {"generated_text": "SNANQAPQPQTPTRATDAKKGSYGHPADRVGMEDNKYQVGVFYYDGPNPSYAEWNRDTQFWVETAKTAEKGKFDSIFFADTLGIYDSFKGSFEANLRHGAQFPVNDPLVAISAIAGATTKLGLVATASTTYSEPFHIARRFASLDHLSNGRAGWNIVTSYLDSAARNFGRTEQMEHDERYAIAEEYIDVVYKLWEGSWEDNAVIKDKETGLFTDPAKVHQINHEGEHFRVAGPLNIPRSPQGHPVIFQAGTSERGRDFAARHAEAVFTAQLDLEAGREFYEDIKSRAAKLGRDPDDVKILPGISVFVGKTREEAERKFRELQSLIDEEGALTRFSSYTGTDLSTYDPDGPLPELAGIDPTTPIAKLEGLLGKSKMTVREIALKQGGVSLREYQPFVGATAGSALVGGTPEQIADFMQDWFIEGTVDGFNIMPPYLPDGLEDFVDHVVPELQRRGLFRTEYEGTTLREHLGLAKPLEHHHHHH"},
  {"generated_text": "MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGSMGPCLICRSTSLKCVFCVRDPNGYKKCSKCDAFFCSRECQTEHWQRHHKFECPAAVAQPQIPPLPKPQQKQLTAAELGMFMEVRNQFALLKTNLERLDYEIFILERNVKLANTVTPPTNRTYFQSTMRYAPNPLRPNMTDAMRQQYLDKNKSSAALEHDLKELIKFKCYLLNDEYVEKEREENPFIWEYFLNKEWRKRNVWGNK\n"},
  {"generated_text": "MGHHHHHHSSGLVPRGSHMTVEQAKKLRAEAEAQAQIQDKAKAIAQTHGKVEVMVDGKHRVVDLDATTRRQLTDGELQAIVVAAQEAAAKQLKAQRQALLEQHQDAELRKLALEGEIV\nAVITGAAQGIGRAIALRLAKDGFRVAVADIDLAAAEAVAAEIEAKGGKALVIEGDVSREEDVKRLVRKAIDQFGRLDYAVNNAGIQGPLAPTEELPLALWNKVIDVNLTGVFLCMKYEIAQMVKQGRGGAIVNTASVAGLSGQPGMVAYCASKHGVVGLTKTVAIEYAKHGIRINAVAPGFIDTPMVQKLPEEKRARIAAAIPMRRLGQPDEIAAVVAFLLSDDASFITGQCIAVDGGFTAGLLA"},
  {"generated_text": "MAASKAADSLAEGAAKLEHHHHHH"},
  {"generated_text": "GSKPQPGVQVEGAKCQVLQAVYDFTVQSASELSFKAGDVICVTGQYDPTLGWWLAEERRTGKSGLVPENYVELLSTGPAQHHHHHH"}
]
```
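
Some generated sequences in the output above contain embedded `\n` characters, so they usually need a cleanup step before being passed to downstream tools. The following is a minimal sketch of one way to do that, assuming the outputs are a list of dictionaries with a `generated_text` key as shown above. The helper names `clean_sequence` and `to_fasta`, the `generated` header prefix, and the 60-character line width are illustrative choices, not part of this model or of `transformers`.

```python
def clean_sequence(generated_text: str) -> str:
    """Remove all whitespace, including embedded newlines, from a sequence."""
    return "".join(generated_text.split())


def to_fasta(outputs, prefix="generated", width=60):
    """Format pipeline outputs as a FASTA string.

    `outputs` is assumed to be a list of {"generated_text": ...} dicts.
    `prefix` and `width` are hypothetical defaults chosen for illustration.
    """
    entries = []
    for i, item in enumerate(outputs, start=1):
        seq = clean_sequence(item["generated_text"])
        if not seq:
            continue  # skip empty generations
        # Split the sequence into fixed-width lines
        lines = [seq[j:j + width] for j in range(0, len(seq), width)]
        entries.append(f">{prefix}_{i}\n" + "\n".join(lines))
    return "\n".join(entries) + "\n" if entries else ""


# Two of the shorter outputs copied from the example above
example = [
    {"generated_text": "MAASKAADSLAEGAAKLEHHHHHH"},
    {"generated_text": "GSKPQPGVQVEGAKCQVLQAVYDFTVQSASELSFKAGDVICVTGQYDPTLGWWLAEERRTGKSGLVPENYVELLSTGPAQHHHHHH"},
]
print(to_fasta(example))
```

With the pipeline call shown earlier, `to_fasta(sequences)` could be used in the same way and the result written to a `.fasta` file.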