Update README.md
README.md
CHANGED
@@ -5,7 +5,6 @@ license: mit
 # Model Card for [HIV-BERT]
 
 ## Table of Contents
-- [Table of Contents](#table-of-contents)
 - [Summary](#model-summary)
 - [Model Description](#model-description)
 - [Intended Uses & Limitations](#intended-uses-&-limitations)
@@ -23,7 +22,7 @@ license: mit
 
 ## Model Description
 
-[Like the original ProtBert-BFD model, this model encodes each amino acid as an individual token. This model was trained using Masked Language Modeling: a process in which a random set of tokens are masked with the model trained on their prediction. This model was trained using the damlab/
+[Like the original ProtBert-BFD model, this model encodes each amino acid as an individual token. This model was trained using Masked Language Modeling: a process in which a random set of tokens are masked with the model trained on their prediction. This model was trained using the damlab/hiv-flt dataset with 256 amino acid chunks and a 15% mask rate.]
 
 ## Intended Uses & Limitations
 
@@ -35,7 +34,7 @@ license: mit
 
 ## Training Data
 
-[The dataset damlab/
+[The dataset damlab/HIV-FLT was used to refine the original rostlab/Prot-bert-bfd. This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.]
 
 ## Training Procedure
 
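Note on the updated Model Description: the setup it describes (one token per amino acid, 256-residue chunks, a 15% mask rate) can be illustrated with a minimal sketch. This is not the authors' training code. The function names `chunk_sequence` and `mask_tokens`, the toy sequence, and the choice to mask an exact 15% of positions with a literal `[MASK]` string are all assumptions made for illustration; standard BERT-style masking typically selects tokens independently and sometimes substitutes random or unchanged tokens instead.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol, for illustration only


def chunk_sequence(seq, size=256):
    """Split an amino-acid string into consecutive chunks of at most `size` residues."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def mask_tokens(tokens, rate=0.15, rng=None):
    """Replace roughly `rate` of the tokens with MASK_TOKEN.

    Returns the masked token list and a dict mapping each masked
    position to the original token (the prediction target).
    """
    if rng is None:
        rng = random.Random()
    # Mask at least one token when the input is non-empty.
    n_mask = min(len(tokens), max(1, round(len(tokens) * rate)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = masked[pos]
        masked[pos] = MASK_TOKEN
    return masked, labels


# Toy sequence (not real HIV data): each residue becomes its own token.
toy_sequence = "MKVLAAGIRE" * 60
chunks = chunk_sequence(toy_sequence)
masked, labels = mask_tokens(list(chunks[0]), rng=random.Random(0))
```

During training, the model would see `masked` as input and be scored only on recovering the residues stored in `labels`.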