michaelm16 committed (verified) · commit 06889f8 · 1 parent: 1f14681

Update README.md

Files changed (1): README.md (+1, −5)
README.md CHANGED
@@ -35,11 +35,7 @@ The foundation model is pretrained with [CRISPRviva-3B](https://huggingface.co/d
 ### Preprocessing
 
 #### Sequence encoding
-We utilized and modified the attention block, which is widely used in various research and industrial applications, to construct our sequence attention module, which functions as the building block of our CRISPR-viva system. The input of the module is a sequence of nucleotide characters composed of “A”, “C”, “G”, “T” and “N”; the length of the sequence should be 35 bp, and sequences with fewer than 35 bp are padded with “N”.
-
-Using this sequence encoding method, the encoded complementary strand can be easily calculated as 6 minus each encoded digit. Taking the above sequence as an example, “GGAAAGCAGCAGATGGCAGGACATGGGCTGGAGNN” can be encoded as “44111421421415442144121544425441433”, and the encoded complementary strand is thus “6” − “44111421421415442144121544425441433” = “22555245245251224522545122241225233”.
-
-
+We utilized and modified the attention block, which is widely used in various research and industrial applications, to construct our sequence attention module, which functions as the building block of our CRISPR-viva system. The input of the module is a sequence of nucleotide characters composed of “A”, “C”, “G”, “T” and “N”, where “N” acts as both the mask token for the pretraining task and the padding token for downstream tasks.
 
 ### Compute Infrastructure
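The encoding scheme removed in this commit can be reconstructed from its worked example. A minimal sketch, assuming the digit mapping A=1, C=2, N=3, G=4, T=5 (inferred from the example; `encode` and `complement` are hypothetical helper names, not part of the CRISPR-viva codebase):

```python
# Digit mapping inferred from the README's worked example: A=1, C=2, N=3, G=4, T=5.
# With this mapping the complementary strand is 6 minus each digit
# (A<->T is 1<->5, C<->G is 2<->4, and N=3 maps to itself).
CODE = {"A": "1", "C": "2", "N": "3", "G": "4", "T": "5"}

def encode(seq: str, length: int = 35) -> str:
    """Encode a nucleotide sequence, right-padding with 'N' up to `length` bp."""
    seq = seq.upper().ljust(length, "N")
    return "".join(CODE[base] for base in seq)

def complement(encoded: str) -> str:
    """Digit-wise complement: 6 minus each encoded digit."""
    return "".join(str(6 - int(d)) for d in encoded)

# 33 bp input, padded to 35 with "NN" as in the README example
enc = encode("GGAAAGCAGCAGATGGCAGGACATGGGCTGGAG")
# enc == "44111421421415442144121544425441433"
# complement(enc) == "22555245245251224522545122241225233"
```

Both outputs match the example strings in the removed README text, which is why the complement is described as a simple subtraction from 6.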