Update README.md
README.md CHANGED
@@ -18,7 +18,7 @@ The dataset is ~500k sequences in total, and the model was trained on about 5% o
 The tokenizer uses bos, eos, and pad special tokens where each sequence is padded to length 512.
 
 
-The purpose of this model was simply to build my own version of NVIDIA's ProtGPT.
+The purpose of this model was simply to build my own version of NVIDIA's ProtGPT. After this model is completed, I will be looking to add control tags to generate sequences based on a given function for a specific organism.
 
 Upon completion of training, the model will be properly evaluated, looking at perplexity, energy of proteins generated, and AlphaFold 3 pLDDT/pTM scores
 
 
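For context on the padding scheme described in the hunk above, here is a minimal sketch of how a tokenizer with bos, eos, and pad special tokens, padding every sequence to length 512, might be set up. It assumes a Hugging Face-style fast tokenizer; the `tokenizer.json` path, the special-token strings, and the example protein sequence are all placeholders, not taken from this repository.

```python
# Sketch only: assumes a Hugging Face-style tokenizer; file path and
# special-token strings are hypothetical, not this repo's actual config.
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",  # hypothetical path to a trained tokenizer
    bos_token="<bos>",
    eos_token="<eos>",
    pad_token="<pad>",
)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # example protein sequence

# Wrap the sequence in bos/eos and pad it out to the fixed length of 512.
encoded = tokenizer(
    tokenizer.bos_token + sequence + tokenizer.eos_token,
    padding="max_length",
    max_length=512,
    truncation=True,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 512])
```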
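Of the evaluation metrics named in the hunk (perplexity, energy of generated proteins, AlphaFold 3 pLDDT/pTM), perplexity is the one computable directly from the model: it is the exponential of the average per-token cross-entropy on held-out sequences. Below is a hedged sketch under the assumption of a Hugging Face-style causal LM and the padded batches from the tokenizer above; the function, loop, and names are illustrative, not the author's evaluation code.

```python
# Sketch only: perplexity = exp(mean cross-entropy over non-pad tokens),
# assuming a Hugging Face-style causal LM (not the author's actual code).
import math
import torch

@torch.no_grad()
def perplexity(model, dataloader, pad_token_id):
    total_nll, total_tokens = 0.0, 0
    for batch in dataloader:          # batches of padded (batch, 512) input_ids
        input_ids = batch["input_ids"]
        labels = input_ids.clone()
        labels[labels == pad_token_id] = -100   # ignore pad positions in the loss
        out = model(input_ids=input_ids, labels=labels)
        # HF causal LMs shift labels internally, so only positions 1.. receive loss.
        n = (labels[:, 1:] != -100).sum().item()
        total_nll += out.loss.item() * n        # out.loss is the mean NLL per token
        total_tokens += n
    return math.exp(total_nll / total_tokens)
```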