damlab/HIV_BERT

damlab committed on Feb 24, 2022

Commit

93acdc5

1 Parent(s): 9f8a750

Update README.md

Files changed (1)

README.md CHANGED

@@ -41,13 +41,13 @@ As a masked language model this tool can be used to predict expected mutations u
 ## Training Data
-The dataset damlab/HIV_FLT was used to refine the original rostlab/Prot-bert-bfd. This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.
 ## Training Procedure
 ### Preprocessing
-As with the rostlab/Prot-bert-bfd model, the rare amino acids U, Z, O, and B were converted to X and spaces were added between each amino acid. All strings were concatenated and chunked into 256 token chunks for training. A random 20% of chunks were held for validation.
 ### Training

 ## Training Data
+The dataset [damlab/HIV_FLT](https://huggingface.co/datasets/damlab/HIV_FLT) was used to refine the original [rostlab/Prot-bert-bfd](https://huggingface.co/Rostlab/prot_bert_bfd). This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.
 ## Training Procedure
 ### Preprocessing
+As with the [rostlab/Prot-bert-bfd](https://huggingface.co/Rostlab/prot_bert_bfd) model, the rare amino acids U, Z, O, and B were converted to X and spaces were added between each amino acid. All strings were concatenated and chunked into 256 token chunks for training. A random 20% of chunks were held for validation.
 ### Training
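
For reference, the preprocessing steps described in the README text of this diff could be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `preprocess`, the `seed` parameter, the choice to treat each amino acid as one token, concatenating sequences without a separator, and keeping the final partial chunk are all assumptions made here.

```python
import random
import re


def preprocess(sequences, chunk_size=256, val_fraction=0.2, seed=0):
    """Hypothetical sketch of the README's preprocessing description.

    sequences: iterable of amino-acid strings (one letter per residue).
    Returns (train_chunks, validation_chunks) as space-separated strings.
    """
    # Convert the rare amino acids U, Z, O, and B to X.
    cleaned = [re.sub(r"[UZOB]", "X", seq) for seq in sequences]

    # Concatenate all strings; each residue is treated as one token (assumption).
    tokens = list("".join(cleaned))

    # Chunk into fixed-size windows and add spaces between amino acids.
    # The last chunk may be shorter than chunk_size (assumption: it is kept).
    chunks = [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]

    # Hold out a random fraction of chunks for validation.
    rng = random.Random(seed)
    rng.shuffle(chunks)
    n_val = int(len(chunks) * val_fraction)
    return chunks[n_val:], chunks[:n_val]
```

Calling `preprocess(list_of_protein_strings)` returns two lists of space-separated chunks, the format the README says the rostlab/Prot-bert-bfd model expects.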