Update README.md
README.md CHANGED
@@ -35,11 +35,7 @@ The foundation model is pretrained with [CRISPRviva-3B](https://huggingface.co/d
### Preprocessing

#### Sequence encoding

- We utilized a modified version of the attention block, which is widely used in research and industrial applications, to construct our sequence attention module, the building block of our CRISPR-viva system. The input of the module is a sequence of nucleotide characters drawn from “A”, “C”, “G”, “T” and “N”.
-
- Using this sequence encoding method, the encoded complementary strand can be easily calculated by subtracting each encoded digit from 6. Taking the sequence “GGAAAGCAGCAGATGGCAGGACATGGGCTGGAGNN” as an example, it is encoded as “44111421421415442144121544425441433”, and the encoded complementary strand is thus “22555245245251224522545122241225233”.

### Compute Infrastructure
### Preprocessing

#### Sequence encoding

+ We utilized a modified version of the attention block, which is widely used in research and industrial applications, to construct our sequence attention module, the building block of our CRISPR-viva system. The input of the module is a sequence of nucleotide characters drawn from “A”, “C”, “G”, “T” and “N”, where “N” acts both as the mask token for the pretraining task and as the padding token for downstream tasks.

### Compute Infrastructure
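The encoding scheme can be sketched in Python. The digit mapping below (A→1, C→2, N→3, G→4, T→5) is not stated explicitly in the README; it is inferred from the worked example, where subtracting each digit from 6 yields the complement (A↔T, C↔G, N↔N), so treat it as an assumption:

```python
# Assumed digit mapping, inferred from the README's worked example:
# A -> 1, C -> 2, N -> 3, G -> 4, T -> 5.
# With this mapping, complementing a base is digit-wise 6 - d.
ENCODE = {"A": "1", "C": "2", "N": "3", "G": "4", "T": "5"}

def encode(seq: str) -> str:
    """Map a nucleotide string to its digit encoding."""
    return "".join(ENCODE[base] for base in seq.upper())

def complement_encoded(encoded: str) -> str:
    """Subtract each encoded digit from 6 to get the complementary strand's encoding."""
    return "".join(str(6 - int(d)) for d in encoded)

seq = "GGAAAGCAGCAGATGGCAGGACATGGGCTGGAGNN"
enc = encode(seq)
print(enc)                      # 44111421421415442144121544425441433
print(complement_encoded(enc))  # 22555245245251224522545122241225233
```

The digit-wise subtraction reproduces both strings given in the README, which is what makes the inferred mapping plausible.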