michaelm16 committed (verified) · commit 06889f8 · 1 parent: 1f14681

Update README.md

Files changed (1): README.md (+1, −5)
README.md CHANGED
@@ -35,11 +35,7 @@ The foundation model is pretrained with [CRISPRviva-3B](https://huggingface.co/d
 ### Preprocessing
 
 #### Sequence encoding
-We utilized and modified the attention block, which is widely used in various research and industrial applications, to construct our sequence attention module, which functions as the building block of our CRISPR-viva system. The input of the module is a sequence of nucleotide characters composed of “A”, “C”, “G”, “T” and “N”; the length of the sequence should be 35 bp, and sequences with fewer than 35 bp are padded with “N”.
-
-Using this sequence encoding method, the encoded complementary strand can be easily calculated as 6 minus each encoded digit. Taking the above sequence as an example, “GGAAAGCAGCAGATGGCAGGACATGGGCTGGAGNN” can be encoded as “44111421421415442144121544425441433”, and the encoded complementary strand is thus “6” − “44111421421415442144121544425441433” = “22555245245251224522545122241225233”.
-
-
+We utilized and modified the attention block, which is widely used in various research and industrial applications, to construct our sequence attention module, which functions as the building block of our CRISPR-viva system. The input of the module is a sequence of nucleotide characters composed of “A”, “C”, “G”, “T” and “N”, where “N” acts as both the mask token for the pretraining task and the padding token for downstream tasks.
 
 ### Compute Infrastructure
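The encoding scheme removed in this commit can be reconstructed from its worked example. A minimal sketch, assuming the digit mapping A=1, C=2, N=3, G=4, T=5 (inferred from the example; `encode` and `complement` are hypothetical helper names, not part of the CRISPR-viva codebase):

```python
# Digit mapping inferred from the README's worked example: A=1, C=2, N=3, G=4, T=5.
# With this mapping the complementary strand is 6 minus each digit
# (A<->T is 1<->5, C<->G is 2<->4, and N=3 maps to itself).
CODE = {"A": "1", "C": "2", "N": "3", "G": "4", "T": "5"}

def encode(seq: str, length: int = 35) -> str:
    """Encode a nucleotide sequence, right-padding with 'N' up to `length` bp."""
    seq = seq.upper().ljust(length, "N")
    return "".join(CODE[base] for base in seq)

def complement(encoded: str) -> str:
    """Digit-wise complement: 6 minus each encoded digit."""
    return "".join(str(6 - int(d)) for d in encoded)

# 33 bp input, padded to 35 with "NN" as in the README example
enc = encode("GGAAAGCAGCAGATGGCAGGACATGGGCTGGAG")
# enc == "44111421421415442144121544425441433"
# complement(enc) == "22555245245251224522545122241225233"
```

Both outputs match the example strings in the removed README text, which is why the complement is described as a simple subtraction from 6.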