---
license: mit
tags:
- protein
- thermostability
---

__Purpose__: classifies a protein sequence as thermophilic (>60 °C) or mesophilic (<40 °C), based on the growth temperature of the host organism.

__Training__:

ProtBert (Rostlab/prot_bert) was fine-tuned on a class-balanced version of learn2therm (see [here]()), about 250k protein amino acid sequences.
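
The class-balancing step is not detailed here; as a minimal sketch (the column names `sequence` and `label` are our assumptions, not taken from the training repository), downsampling the majority class with pandas looks like:

```python
import pandas as pd

# hypothetical toy dataset; the real data comes from learn2therm
df = pd.DataFrame({
    "sequence": ["AAAE", "CCCT", "GGGA", "TTTC", "AACG"],
    "label":    [1, 1, 1, 0, 0],  # 1 = thermophilic, 0 = mesophilic
})

# downsample every class to the size of the smallest class
n = df["label"].value_counts().min()
balanced = df.groupby("label", group_keys=False).sample(n=n, random_state=0)
```

After this, `balanced` has the same number of rows per label, which is what a class-balanced fine-tuning set requires.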

Training parameters below:

TODO

See the [training repository](https://github.com/BeckResearchLab/learn2thermML) for code.

__Usage__:

Prepare sequences identically to the original pretrained model:

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch
import re

tokenizer = BertTokenizer.from_pretrained("evankomp/learn2therm", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("evankomp/learn2therm")

sequence_Example = "A E T C Z A O"
# map rare/ambiguous amino acids (U, Z, O, B) to X
sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)

encoded_input = tokenizer(sequence_Example, return_tensors="pt")
# the model returns a SequenceClassifierOutput; take the argmax of its logits
output = torch.argmax(model(**encoded_input).logits, dim=1)
```

A prediction of 1 indicates thermophilic, 0 mesophilic.
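
Raw (unspaced) sequences can be preprocessed into the space-separated form ProtBert expects, and the integer prediction mapped to a class name. This is a minimal sketch; the helper function and label map below are our additions, not part of the model:

```python
import re

LABELS = {0: "mesophilic", 1: "thermophilic"}

def preprocess(seq: str) -> str:
    """Space-separate residues and map rare amino acids (U, Z, O, B) to X."""
    return re.sub(r"[UZOB]", "X", " ".join(seq))

print(preprocess("AETCZAO"))  # A E T C X A X
```

The result of `preprocess` can be passed directly to the tokenizer shown above, and `LABELS[int(output)]` turns the model's prediction into a readable class name.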