damlab committed
Commit 1801ae2 · 1 Parent(s): ad000e8

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -5,7 +5,6 @@ license: mit
 # Model Card for [HIV-BERT]
 
 ## Table of Contents
-- [Table of Contents](#table-of-contents)
 - [Summary](#model-summary)
 - [Model Description](#model-description)
 - [Intended Uses & Limitations](#intended-uses-&-limitations)
@@ -23,7 +22,7 @@ license: mit
 
 ## Model Description
 
-[Like the original ProtBert-BFD model, this model encodes each amino acid as an individual token. This model was trained using Masked Language Modeling: a process in which a random set of tokens are masked with the model trained on their prediction. This model was trained using the damlab/hiv_flt dataset with 256 amino acid chunks and a 15% mask rate.]
+[Like the original ProtBert-BFD model, this model encodes each amino acid as an individual token. This model was trained using Masked Language Modeling: a process in which a random set of tokens are masked with the model trained on their prediction. This model was trained using the damlab/hiv-flt dataset with 256 amino acid chunks and a 15% mask rate.]
 
 ## Intended Uses & Limitations
 
@@ -35,7 +34,7 @@ license: mit
 
 ## Training Data
 
-[The dataset damlab/HIV_FLT was used to refine the original rostlab/Prot-bert-bfd. This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.]
+[The dataset damlab/HIV-FLT was used to refine the original rostlab/Prot-bert-bfd. This dataset contains 1790 full HIV genomes from across the globe. When translated, these genomes contain approximately 3.9 million amino-acid tokens.]
 
 ## Training Procedure
 
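
The Model Description text in this diff mentions one token per amino acid, 256 amino acid chunks, and a 15% mask rate. As a rough illustration of what that preprocessing could look like, here is a minimal standard-library sketch. It is not the authors' training code: the helper names `chunk_sequence` and `mask_tokens`, the `[MASK]` token string, and the toy protein are all assumptions made for the example. It also masks a fixed count of positions per chunk and always substitutes the mask token, which is a simplification of typical BERT-style masking.

```python
import random

CHUNK_SIZE = 256        # amino acids per chunk, as stated in the model card
MASK_RATE = 0.15        # fraction of tokens masked, as stated in the model card
MASK_TOKEN = "[MASK]"   # assumption: BERT-style mask token string


def chunk_sequence(protein, size=CHUNK_SIZE):
    """Split a protein string into consecutive chunks of at most `size` residues."""
    return [protein[i:i + size] for i in range(0, len(protein), size)]


def mask_tokens(tokens, rate=MASK_RATE, rng=None):
    """Replace a random subset of tokens with MASK_TOKEN.

    Returns the masked token list and a dict mapping each masked
    position to the original token a model would be asked to predict.
    """
    rng = rng or random.Random()
    # Mask a fixed number of positions (at least one for non-empty input).
    n_mask = max(1, round(len(tokens) * rate)) if tokens else 0
    positions = rng.sample(range(len(tokens)), n_mask)
    labels = {i: tokens[i] for i in positions}
    masked = [MASK_TOKEN if i in labels else tok for i, tok in enumerate(tokens)]
    return masked, labels


if __name__ == "__main__":
    # Toy input for illustration only, not taken from the dataset.
    toy_protein = "MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERF" * 10
    chunks = chunk_sequence(toy_protein)
    tokens = list(chunks[0])  # one token per amino acid
    masked, labels = mask_tokens(tokens, rng=random.Random(0))
    print(" ".join(masked[:20]))
    print(len(labels), "of", len(tokens), "tokens masked")
```

In practice a library data collator would normally handle masking at batch time. The sketch only makes the chunk size and mask rate from the description concrete.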