Update README.md
README.md
CHANGED
@@ -21,13 +21,11 @@ ProtBert-BFD is based on Bert model which pretrained on a large corpus of protei
 This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
 publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
 
-One important difference between
-This means the Next
-The masking follows the original Bert training with randomly masks 15% of the amino acids in the input.
+One important difference between this Bert model and the original Bert version is the way of dealing with sequences as separate documents.
+This means the `Next Sentence Prediction` is not used, as each sequence is treated as a complete document. The masking follows the original Bert training with randomly masks 15% of the amino acids in the input.
 
 At the end, the feature extracted from this model revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein
-shape.
-This implied learning some of the grammar of the language of life realized in protein sequences.
+shape. This implied learning some of the grammar of the language of life realized in protein sequences.
 
 ## Intended uses & limitations
 
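
The README text in this diff says that 15% of the amino acids in each input sequence are randomly masked. A minimal sketch of that selection step is shown below. It is an illustration only, not the model's actual preprocessing code: the function name `mask_amino_acids`, the `[MASK]` string, and the example sequence are assumptions, and the sketch always substitutes the mask token, whereas the original BERT recipe sometimes keeps or randomly replaces the selected tokens.

```python
import random

# Assumed mask token string; the real tokenizer's token may differ.
MASK_TOKEN = "[MASK]"


def mask_amino_acids(sequence, mask_fraction=0.15, seed=None):
    """Mask a random fraction of residues in one protein sequence.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original amino acid the model would have to predict.
    """
    rng = random.Random(seed)
    tokens = list(sequence)  # one token per amino acid
    # Mask at least one residue, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_fraction)))
    positions = rng.sample(range(len(tokens)), n_mask)
    labels = {pos: tokens[pos] for pos in positions}
    masked = [MASK_TOKEN if i in labels else tok for i, tok in enumerate(tokens)]
    return masked, labels


# Hypothetical 20-residue sequence, so 15% corresponds to 3 masked positions.
example = "MKTAYIAKQRQISFVKSHFS"
masked_tokens, labels = mask_amino_acids(example, seed=0)
print(" ".join(masked_tokens))
print(labels)
```

Each sequence is masked on its own, which matches the README's point that every sequence is treated as a complete document and no next-sentence pairing is built.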