pavm595 committed on
Commit
dd52476
·
verified ·
1 Parent(s): 19ccbab

Update README.md

Files changed (1)
  1. README.md +3 -5
README.md CHANGED
@@ -21,13 +21,11 @@ ProtBert-BFD is based on Bert model which pretrained on a large corpus of protei
21
  This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
22
  publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
23
 
24
- One important difference between our Bert model and the original Bert version is the way of dealing with sequences as separate documents
25
- This means the Next sentence prediction is not used, as each sequence is treated as a complete document.
26
- The masking follows the original Bert training with randomly masks 15% of the amino acids in the input.
27
 
28
  At the end, the feature extracted from this model revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein
29
- shape.
30
- This implied learning some of the grammar of the language of life realized in protein sequences.
31
 
32
  ## Intended uses & limitations
33
 
 
21
  This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
22
  publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
23
 
24
+ One important difference between this Bert model and the original Bert version is the way of dealing with sequences as separate documents.
25
+ This means the `Next Sentence Prediction` is not used, as each sequence is treated as a complete document. The masking follows the original Bert training with randomly masks 15% of the amino acids in the input.
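The masking step described above can be sketched roughly as follows. This is a minimal illustration, not the ProtBert-BFD training code: the names `mask_sequence` and `MASK_TOKEN` are hypothetical, each sequence is handled on its own as a complete document, and every selected position is simply replaced by the mask token (the original BERT recipe additionally swaps some selected tokens for random ones or leaves them unchanged, which is omitted here).

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask token name, as in the original BERT vocabulary


def mask_sequence(sequence, mask_fraction=0.15, seed=None):
    """Mask roughly `mask_fraction` of the residues in one protein sequence.

    Returns (masked_tokens, labels). labels[i] holds the original residue at
    masked positions and None everywhere else.
    """
    tokens = list(sequence)  # one token per amino acid
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one residue so short sequences still yield a training label.
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_TOKEN if i in positions else aa for i, aa in enumerate(tokens)]
    labels = [aa if i in positions else None for i, aa in enumerate(tokens)]
    return masked, labels


# Example sequence chosen only for illustration.
example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
masked, labels = mask_sequence(example, seed=0)
print(" ".join(masked))
```

The model is then trained to predict the residues stored in `labels` from the masked input, which is how inputs and labels are generated automatically from raw sequences.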
 
26
 
27
  At the end, the feature extracted from this model revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein
28
+ shape. This implied learning some of the grammar of the language of life realized in protein sequences.
 
29
 
30
  ## Intended uses & limitations
31