sizhennnn committed
Commit 6b10979 · verified · 1 Parent(s): 75a89dd

Training in progress, step 838

config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "_name_or_path": "gpt2",
+   "activation_function": "gelu_new",
+   "architectures": [
+     "GPT2LMHeadModel"
+   ],
+   "attn_pdrop": 0.1,
+   "bos_token_id": 50256,
+   "embd_pdrop": 0.1,
+   "eos_token_id": 50256,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "gpt2",
+   "n_ctx": 1024,
+   "n_embd": 768,
+   "n_head": 12,
+   "n_inner": null,
+   "n_layer": 12,
+   "n_positions": 1024,
+   "reorder_and_upcast_attn": false,
+   "resid_pdrop": 0.1,
+   "scale_attn_by_inverse_layer_idx": false,
+   "scale_attn_weights": true,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "task_specific_params": {
+     "text-generation": {
+       "do_sample": true,
+       "max_length": 50
+     }
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.43.4",
+   "use_cache": true,
+   "vocab_size": 1028
+ }
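Review note: the config above keeps `bos_token_id` and `eos_token_id` at 50256 while `vocab_size` is 1028, so those ids fall outside the embedding table. Whether that matters depends on how generation is invoked, which this commit does not show. A minimal sketch of a sanity check, assuming only the three fields copied from the diff (the helper name `out_of_range_token_ids` is hypothetical):

```python
import json

# Fields copied from config.json in this commit; everything else is omitted.
CONFIG_JSON = """
{"bos_token_id": 50256, "eos_token_id": 50256, "vocab_size": 1028}
"""


def out_of_range_token_ids(config: dict) -> dict:
    """Return every *_token_id entry that is not a valid index into the vocab."""
    vocab_size = config["vocab_size"]
    return {
        key: value
        for key, value in config.items()
        if key.endswith("_token_id")
        and isinstance(value, int)
        and not 0 <= value < vocab_size
    }


config = json.loads(CONFIG_JSON)
problems = out_of_range_token_ids(config)
print(problems)
```

If the check reports entries, one option would be to set the ids to the values listed under `added_tokens` in tokenizer.json (BOS 1024, EOS 1025); that is a suggestion, not something this commit does.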
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adece6df89c36e0dbc302ce2d6e0cbec8667ee5cabc4819d4702720afdeed261
+ size 346542720
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "BOS",
+   "eos_token": "EOS",
+   "pad_token": "PAD",
+   "unk_token": "UNK"
+ }
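Review note: the four strings in this map correspond to the `added_tokens` entries in tokenizer.json from the same commit (ids 1024 to 1027), which sit directly after the 1024 BPE vocabulary entries (ids 0 to 1023). A small sketch of that lookup, with the values copied from the diff and the variable names chosen here for illustration:

```python
# Values copied from special_tokens_map.json and tokenizer.json in this commit.
special_tokens_map = {
    "bos_token": "BOS",
    "eos_token": "EOS",
    "pad_token": "PAD",
    "unk_token": "UNK",
}
added_tokens = [
    {"id": 1024, "content": "BOS"},
    {"id": 1025, "content": "EOS"},
    {"id": 1026, "content": "UNK"},
    {"id": 1027, "content": "PAD"},
]
base_vocab_size = 1024  # BPE vocab ids run from 0 to 1023

# Resolve each special-token role to its integer id.
id_by_content = {tok["content"]: tok["id"] for tok in added_tokens}
special_ids = {role: id_by_content[text] for role, text in special_tokens_map.items()}

# Base vocabulary plus added tokens should equal "vocab_size" in config.json.
total_vocab = base_vocab_size + len(added_tokens)
print(special_ids, total_vocab)
```

The total of 1028 agrees with `vocab_size` in config.json, and `pad_token` resolving to 1027 agrees with `pad_id` in the tokenizer's padding section.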
tokenizer.json ADDED
@@ -0,0 +1,2128 @@
+ {
+   "version": "1.0",
+   "truncation": {
+     "direction": "Right",
+     "max_length": 15,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
+   "padding": {
+     "strategy": {
+       "Fixed": 15
+     },
+     "direction": "Right",
+     "pad_to_multiple_of": null,
+     "pad_id": 1027,
+     "pad_type_id": 0,
+     "pad_token": "PAD"
+   },
+   "added_tokens": [
+     {
+       "id": 1024,
+       "content": "BOS",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 1025,
+       "content": "EOS",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 1026,
+       "content": "UNK",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 1027,
+       "content": "PAD",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     }
+   ],
+   "normalizer": {
+     "type": "NFKC"
+   },
+   "pre_tokenizer": {
+     "type": "Metaspace",
+     "replacement": "▁",
+     "prepend_scheme": "always",
+     "split": true
+   },
+   "post_processor": null,
+   "decoder": {
+     "type": "Metaspace",
+     "replacement": "▁",
+     "prepend_scheme": "always",
+     "split": true
+   },
+   "model": {
+     "type": "BPE",
+     "dropout": null,
+     "unk_token": "<unk>",
+     "continuing_subword_prefix": null,
+     "end_of_word_suffix": null,
+     "fuse_unk": false,
+     "byte_fallback": false,
+     "ignore_merges": false,
+     "vocab": {
83
+ "": 0,
84
+ "\n": 1,
85
+ "A": 2,
86
+ "C": 3,
87
+ "G": 4,
88
+ "T": 5,
89
+ "▁": 6,
90
+ "AA": 7,
91
+ "CC": 8,
92
+ "TT": 9,
93
+ "GG": 10,
94
+ "AC": 11,
95
+ "GC": 12,
96
+ "TC": 13,
97
+ "AG": 14,
98
+ "TG": 15,
99
+ "AT": 16,
100
+ "AAAA": 17,
101
+ "ACC": 18,
102
+ "AGG": 19,
103
+ "GT": 20,
104
+ "AAC": 21,
105
+ "ATT": 22,
106
+ "AGC": 23,
107
+ "ATC": 24,
108
+ "CCC": 25,
109
+ "ACG": 26,
110
+ "GGC": 27,
111
+ "ATG": 28,
112
+ "TGC": 29,
113
+ "TCC": 30,
114
+ "AAG": 31,
115
+ "TTC": 32,
116
+ "TGG": 33,
117
+ "GCC": 34,
118
+ "▁C": 35,
119
+ "TAC": 36,
120
+ "TAA": 37,
121
+ "TTTT": 38,
122
+ "TCG": 39,
123
+ "TAG": 40,
124
+ "AAAAAAAA": 41,
125
+ "GCG": 42,
126
+ "TTG": 43,
127
+ "CCG": 44,
128
+ "ACCC": 45,
129
+ "GGG": 46,
130
+ "TAT": 47,
131
+ "▁G": 48,
132
+ "AAAC": 49,
133
+ "AACC": 50,
134
+ "AGAC": 51,
135
+ "AGGC": 52,
136
+ "AGCC": 53,
137
+ "TGCC": 54,
138
+ "AAGC": 55,
139
+ "ACAC": 56,
140
+ "ATCC": 57,
141
+ "ATTC": 58,
142
+ "TACC": 59,
143
+ "ATGC": 60,
144
+ "AAAG": 61,
145
+ "TCCC": 62,
146
+ "ACGG": 63,
147
+ "ACGC": 64,
148
+ "AAGG": 65,
149
+ "TCGC": 66,
150
+ "TCGG": 67,
151
+ "TAAC": 68,
152
+ "ATGG": 69,
153
+ "ACCG": 70,
154
+ "AATT": 71,
155
+ "AATC": 72,
156
+ "TTCC": 73,
157
+ "AGCG": 74,
158
+ "▁CC": 75,
159
+ "ATCG": 76,
160
+ "TGGC": 77,
161
+ "ATAC": 78,
162
+ "TTGC": 79,
163
+ "AATG": 80,
164
+ "AGTC": 81,
165
+ "GTCC": 82,
166
+ "AACG": 83,
167
+ "TTTTTTTT": 84,
168
+ "TTGG": 85,
169
+ "ACTG": 86,
170
+ "TAGG": 87,
171
+ "ATTG": 88,
172
+ "AGTG": 89,
173
+ "TAGC": 90,
174
+ "CCCG": 91,
175
+ "AGTT": 92,
176
+ "GTGC": 93,
177
+ "▁GC": 94,
178
+ "TACG": 95,
179
+ "TCAC": 96,
180
+ "GTGG": 97,
181
+ "▁AC": 98,
182
+ "AAAAC": 99,
183
+ "AAAT": 100,
184
+ "TATT": 101,
185
+ "TATC": 102,
186
+ "TGAC": 103,
187
+ "TCCG": 104,
188
+ "TGCG": 105,
189
+ "TTCG": 106,
190
+ "GGCC": 107,
191
+ "AGAG": 108,
192
+ "TATG": 109,
193
+ "TGTC": 110,
194
+ "TAAG": 111,
195
+ "AGGG": 112,
196
+ "GGCG": 113,
197
+ "TTTC": 114,
198
+ "ACTC": 115,
199
+ "AGAA": 116,
200
+ "ATAA": 117,
201
+ "▁TC": 118,
202
+ "ACTT": 119,
203
+ "▁AA": 120,
204
+ "▁GG": 121,
205
+ "GCGG": 122,
206
+ "TGTG": 123,
207
+ "ATAG": 124,
208
+ "▁AAAAAAAA": 125,
209
+ "▁CG": 126,
210
+ "▁TT": 127,
211
+ "GCCG": 128,
212
+ "TAAAA": 129,
213
+ "TGTT": 130,
214
+ "ACAA": 131,
215
+ "TCTC": 132,
216
+ "GCGC": 133,
217
+ "▁AG": 134,
218
+ "ACAG": 135,
219
+ "TGGG": 136,
220
+ "AAGT": 137,
221
+ "ACCCC": 138,
222
+ "ACGCC": 139,
223
+ "TCTT": 140,
224
+ "TCTG": 141,
225
+ "TTTG": 142,
226
+ "ACGT": 143,
227
+ "AAAAG": 144,
228
+ "AAGCC": 145,
229
+ "ATGCC": 146,
230
+ "GGGG": 147,
231
+ "▁TG": 148,
232
+ "AAACC": 149,
233
+ "AGGCC": 150,
234
+ "TTAC": 151,
235
+ "GGGC": 152,
236
+ "ACTGC": 153,
237
+ "ATAT": 154,
238
+ "ACCGC": 155,
239
+ "TGAG": 156,
240
+ "GTAC": 157,
241
+ "TA": 158,
242
+ "TCAG": 159,
243
+ "ACCCG": 160,
244
+ "▁AT": 161,
245
+ "AACCC": 162,
246
+ "ACCGG": 163,
247
+ "AAAGC": 164,
248
+ "TTAG": 165,
249
+ "ATTCC": 166,
250
+ "▁AAAA": 167,
251
+ "AACGC": 168,
252
+ "TAACC": 169,
253
+ "GTAG": 170,
254
+ "AAAAAAAAAAAAAAAA": 171,
255
+ "ACAT": 172,
256
+ "TCCCC": 173,
257
+ "TGAA": 174,
258
+ "ATCGC": 175,
259
+ "TCAA": 176,
260
+ "TGGCC": 177,
261
+ "TCGT": 178,
262
+ "AACGG": 179,
263
+ "TAGCC": 180,
264
+ "TACCC": 181,
265
+ "ATTGC": 182,
266
+ "AGCGG": 183,
267
+ "AGCGC": 184,
268
+ "ATCGG": 185,
269
+ "CCCGC": 186,
270
+ "▁GT": 187,
271
+ "TTAA": 188,
272
+ "▁AAC": 189,
273
+ "ACGTC": 190,
274
+ "▁GCC": 191,
275
+ "TGCGG": 192,
276
+ "AGAT": 193,
277
+ "▁ACC": 194,
278
+ "TCGCC": 195,
279
+ "CCCGG": 196,
280
+ "ACATC": 197,
281
+ "TCCGC": 198,
282
+ "AGGCG": 199,
283
+ "TTGCC": 200,
284
+ "ATACC": 201,
285
+ "AATAA": 202,
286
+ "ATAAC": 203,
287
+ "AGGGC": 204,
288
+ "AGACC": 205,
289
+ "▁ATT": 206,
290
+ "ACACC": 207,
291
+ "ACAAC": 208,
292
+ "TTTTTTTTTTTTTTTT": 209,
293
+ "TTGT": 210,
294
+ "TCCGG": 211,
295
+ "AAATC": 212,
296
+ "TTCGC": 213,
297
+ "TTCGG": 214,
298
+ "TACGC": 215,
299
+ "TGCCC": 216,
300
+ "ATCCC": 217,
301
+ "▁CCC": 218,
302
+ "TGCGC": 219,
303
+ "AGAAC": 220,
304
+ "AGCCC": 221,
305
+ "TACGG": 222,
306
+ "AAATG": 223,
307
+ "AAAGG": 224,
308
+ "▁AAAAAAAAAAAAAAAA": 225,
309
+ "TATCC": 226,
310
+ "AAATT": 227,
311
+ "GGCGG": 228,
312
+ "▁AAG": 229,
313
+ "▁TAA": 230,
314
+ "ATTCG": 231,
315
+ "TGACC": 232,
316
+ "GGCGC": 233,
317
+ "ATTGG": 234,
318
+ "AATAC": 235,
319
+ "AATGC": 236,
320
+ "AAT": 237,
321
+ "▁ATC": 238,
322
+ "AGGGG": 239,
323
+ "▁AGC": 240,
324
+ "ATAGC": 241,
325
+ "ACAGC": 242,
326
+ "TCCCG": 243,
327
+ "ACGTT": 244,
328
+ "GCCCC": 245,
329
+ "▁AAAAAAAAAAAA": 246,
330
+ "AATCC": 247,
331
+ "AACCG": 248,
332
+ "TAAGC": 249,
333
+ "ACGTG": 250,
334
+ "▁CGC": 251,
335
+ "ACT": 252,
336
+ "TCTGC": 253,
337
+ "ACATT": 254,
338
+ "AAGGC": 255,
339
+ "ATATC": 256,
340
+ "AGAGC": 257,
341
+ "TGGGC": 258,
342
+ "AAAAAAAAAAAA": 259,
343
+ "ATGGC": 260,
344
+ "TATGC": 261,
345
+ "AGTCC": 262,
346
+ "TGAAC": 263,
347
+ "ATCCG": 264,
348
+ "TACCG": 265,
349
+ "TTTTC": 266,
350
+ "ACGGC": 267,
351
+ "TCGTC": 268,
352
+ "TAGGC": 269,
353
+ "TTACC": 270,
354
+ "GCGTC": 271,
355
+ "ACTCC": 272,
356
+ "ACATG": 273,
357
+ "AGTGC": 274,
358
+ "TGTCC": 275,
359
+ "▁ATG": 276,
360
+ "AAGTC": 277,
361
+ "TGGGG": 278,
362
+ "AGCCG": 279,
363
+ "AATAG": 280,
364
+ "ATAGG": 281,
365
+ "ACTAA": 282,
366
+ "TTAAC": 283,
367
+ "TGCCG": 284,
368
+ "TTCCC": 285,
369
+ "TCACC": 286,
370
+ "ACAGG": 287,
371
+ "AATTC": 288,
372
+ "AAACG": 289,
373
+ "ATAAG": 290,
374
+ "ACTAC": 291,
375
+ "AGT": 292,
376
+ "TGAGC": 293,
377
+ "TAACG": 294,
378
+ "ATATG": 295,
379
+ "AGATG": 296,
380
+ "ATATT": 297,
381
+ "AGATC": 298,
382
+ "TGTGC": 299,
383
+ "TTTCC": 300,
384
+ "AGAGG": 301,
385
+ "ATGCG": 302,
386
+ "▁TAC": 303,
387
+ "▁TCC": 304,
388
+ "▁TGC": 305,
389
+ "AGTAC": 306,
390
+ "TTTTG": 307,
391
+ "TGGCG": 308,
392
+ "AGTAA": 309,
393
+ "AGATT": 310,
394
+ "AGACG": 311,
395
+ "ACTGG": 312,
396
+ "ACAAG": 313,
397
+ "ACTTC": 314,
398
+ "CCCC": 315,
399
+ "▁GGC": 316,
400
+ "TCAAC": 317,
401
+ "TCGGC": 318,
402
+ "▁CCG": 319,
403
+ "TTTGC": 320,
404
+ "ATAAAA": 321,
405
+ "AGAAG": 322,
406
+ "TGTGG": 323,
407
+ "TGAGG": 324,
408
+ "TATGG": 325,
409
+ "ACA": 326,
410
+ "TCTAC": 327,
411
+ "AATGG": 328,
412
+ "GCCGC": 329,
413
+ "TCATC": 330,
414
+ "AGTGG": 331,
415
+ "ACTTG": 332,
416
+ "TCTCC": 333,
417
+ "TTGGC": 334,
418
+ "▁AGG": 335,
419
+ "GCCGG": 336,
420
+ "AAGCG": 337,
421
+ "▁CGG": 338,
422
+ "ATACG": 339,
423
+ "TTAGC": 340,
424
+ "AGTTC": 341,
425
+ "AGA": 342,
426
+ "TGT": 343,
427
+ "TGACG": 344,
428
+ "ACGCG": 345,
429
+ "GTCCC": 346,
430
+ "TCAGC": 347,
431
+ "AGAAAA": 348,
432
+ "TAAGG": 349,
433
+ "AGTAG": 350,
434
+ "AATCG": 351,
435
+ "ACTAG": 352,
436
+ "TGATC": 353,
437
+ "TTCCG": 354,
438
+ "TTTGG": 355,
439
+ "TTTAC": 356,
440
+ "ACACG": 357,
441
+ "AATTG": 358,
442
+ "TGATG": 359,
443
+ "GTACC": 360,
444
+ "TGA": 361,
445
+ "TGTAC": 362,
446
+ "TCTGG": 363,
447
+ "TGATT": 364,
448
+ "TATTC": 365,
449
+ "▁ACG": 366,
450
+ "ATGGG": 367,
451
+ "AGTCG": 368,
452
+ "▁TAG": 369,
453
+ "TTATC": 370,
454
+ "ACGGG": 371,
455
+ "TCAGG": 372,
456
+ "TGTTC": 373,
457
+ "TCGCG": 374,
458
+ "CCGC": 375,
459
+ "TCTTC": 376,
460
+ "ACTCG": 377,
461
+ "▁GCG": 378,
462
+ "TTTAA": 379,
463
+ "TTAGG": 380,
464
+ "ACAAAA": 381,
465
+ "TGTAG": 382,
466
+ "TAGCG": 383,
467
+ "ACTGCAGAC": 384,
468
+ "AGTTG": 385,
469
+ "GTGGC": 386,
470
+ "GTAAC": 387,
471
+ "CCGG": 388,
472
+ "GTCCG": 389,
473
+ "TTATT": 390,
474
+ "TTGCG": 391,
475
+ "TGAAG": 392,
476
+ "AAGGG": 393,
477
+ "TCATT": 394,
478
+ "▁TTC": 395,
479
+ "TGTCG": 396,
480
+ "TTTAG": 397,
481
+ "▁AAAC": 398,
482
+ "TCATG": 399,
483
+ "▁TTG": 400,
484
+ "TATCG": 401,
485
+ "TTTCG": 402,
486
+ "TAGGG": 403,
487
+ "TTATG": 404,
488
+ "▁TCG": 405,
489
+ "TGTAA": 406,
490
+ "TCTAA": 407,
491
+ "TCGGG": 408,
492
+ "AAAACC": 409,
493
+ "GTAGC": 410,
494
+ "TCTAG": 411,
495
+ "ACCCGC": 412,
496
+ "TGCAGAC": 413,
497
+ "TTAAG": 414,
498
+ "TCAT": 415,
499
+ "TCTCG": 416,
500
+ "TTACG": 417,
501
+ "AAAACG": 418,
502
+ "TGTTG": 419,
503
+ "AACCCC": 420,
504
+ "GTGCG": 421,
505
+ "TCAAG": 422,
506
+ "TGAT": 423,
507
+ "TCACG": 424,
508
+ "TTGGG": 425,
509
+ "GTATC": 426,
510
+ "TCTTG": 427,
511
+ "AGCCCC": 428,
512
+ "AATAT": 429,
513
+ "AAAAGC": 430,
514
+ "▁AAAG": 431,
515
+ "GTAGG": 432,
516
+ "AACCGC": 433,
517
+ "TGCCCC": 434,
518
+ "AGCCGC": 435,
519
+ "TATTG": 436,
520
+ "ACCAC": 437,
521
+ "GTAA": 438,
522
+ "ACCCGG": 439,
523
+ "AAACGC": 440,
524
+ "AACAC": 441,
525
+ "TCA": 442,
526
+ "TCT": 443,
527
+ "TGCCGC": 444,
528
+ "TTAT": 445,
529
+ "GTATT": 446,
530
+ "AGACGC": 447,
531
+ "GTATG": 448,
532
+ "ACCTG": 449,
533
+ "▁TAT": 450,
534
+ "AACCGG": 451,
535
+ "TCACTGCAGAC": 452,
536
+ "AGCCGG": 453,
537
+ "TGCCGG": 454,
538
+ "AGACGG": 455,
539
+ "GTGGG": 456,
540
+ "ACCTC": 457,
541
+ "ATCCCC": 458,
542
+ "AACAAC": 459,
543
+ "▁TGG": 460,
544
+ "GTACG": 461,
545
+ "AAGAC": 462,
546
+ "ACCTT": 463,
547
+ "ATGCGG": 464,
548
+ "ACCGCC": 465,
549
+ "▁AATT": 466,
550
+ "ATAAAC": 467,
551
+ "AAGGCC": 468,
552
+ "▁ACAC": 469,
553
+ "AAGAAC": 470,
554
+ "ACGGCC": 471,
555
+ "ATA": 472,
556
+ "TCGGCC": 473,
557
+ "▁CTT": 474,
558
+ "AAGCGC": 475,
559
+ "TAAAAC": 476,
560
+ "TCGCGG": 477,
561
+ "TTTTTTTTTTTTTTTTTTTT": 478,
562
+ "AAAGAC": 479,
563
+ "ATCCGG": 480,
564
+ "AAAAGG": 481,
565
+ "AGTAT": 482,
566
+ "ATCCGC": 483,
567
+ "AAA": 484,
568
+ "ACGAC": 485,
569
+ "AGCAC": 486,
570
+ "AGGCGC": 487,
571
+ "AAGTT": 488,
572
+ "ATCTAC": 489,
573
+ "AAACCC": 490,
574
+ "TACGCC": 491,
575
+ "▁AACC": 492,
576
+ "ACGA": 493,
577
+ "ATGTC": 494,
578
+ "AACTC": 495,
579
+ "ATGCGC": 496,
580
+ "AAACGG": 497,
581
+ "AACTG": 498,
582
+ "AGGCGG": 499,
583
+ "TCGCGC": 500,
584
+ "AAAAAAAAAAAAAAAAAAAA": 501,
585
+ "AAGTG": 502,
586
+ "AACTT": 503,
587
+ "ATGGCC": 504,
588
+ "AACATC": 505,
589
+ "TTTTTTTTTTTT": 506,
590
+ "ACGCCC": 507,
591
+ "AAGCGG": 508,
592
+ "ACTAT": 509,
593
+ "▁GGG": 510,
594
+ "TCCCGC": 511,
595
+ "▁ACCC": 512,
596
+ "AATAAC": 513,
597
+ "AGGTC": 514,
598
+ "AAAGCC": 515,
599
+ "ACCCCC": 516,
600
+ "ACGCGG": 517,
601
+ "ACGCGC": 518,
602
+ "▁AAAT": 519,
603
+ "TACCCC": 520,
604
+ "TACCGC": 521,
605
+ "GTAT": 522,
606
+ "AGAAAC": 523,
607
+ "GTA": 524,
608
+ "TTCCGC": 525,
609
+ "TTT": 526,
610
+ "AAGCCC": 527,
611
+ "TAAAAG": 528,
612
+ "TCGTG": 529,
613
+ "TTCACTGCAGAC": 530,
614
+ "AGCTG": 531,
615
+ "ATGAC": 532,
616
+ "TAAGCC": 533,
617
+ "AGCTC": 534,
618
+ "ACCAG": 535,
619
+ "ACACGC": 536,
620
+ "TCCCGG": 537,
621
+ "AGCTT": 538,
622
+ "TCGAC": 539,
623
+ "AATTGC": 540,
624
+ "AACACC": 541,
625
+ "ATCTG": 542,
626
+ "TCGCCC": 543,
627
+ "AACAG": 544,
628
+ "TTCCCC": 545,
629
+ "ATGCCC": 546,
630
+ "AAATGC": 547,
631
+ "▁CCCC": 548,
632
+ "TTCCGG": 549,
633
+ "TTGGCC": 550,
634
+ "ACAAAC": 551,
635
+ "ACGGGC": 552,
636
+ "ATCGCC": 553,
637
+ "ATTTG": 554,
638
+ "TCGGGC": 555,
639
+ "TTGCGG": 556,
640
+ "AACGCC": 557,
641
+ "▁AAGC": 558,
642
+ "AGCAG": 559,
643
+ "▁GTC": 560,
644
+ "ACCACC": 561,
645
+ "ATCAC": 562,
646
+ "ACGAG": 563,
647
+ "▁AATC": 564,
648
+ "AAGAAAA": 565,
649
+ "AGCGCC": 566,
650
+ "ATGTG": 567,
651
+ "AAGTCC": 568,
652
+ "ATAACC": 569,
653
+ "▁AATG": 570,
654
+ "ATCTT": 571,
655
+ "ATGGGC": 572,
656
+ "TTGCCC": 573,
657
+ "AAGGGC": 574,
658
+ "▁GCCC": 575,
659
+ "AAGAG": 576,
660
+ "▁CTC": 577,
661
+ "▁AGAC": 578,
662
+ "AGGAC": 579,
663
+ "TCGGGG": 580,
664
+ "TTGCGC": 581,
665
+ "▁GTT": 582,
666
+ "ACACGG": 583,
667
+ "ATCTC": 584,
668
+ "▁AGCC": 585,
669
+ "ATTCGC": 586,
670
+ "AATACC": 587,
671
+ "▁CAC": 588,
672
+ "ATTTC": 589,
673
+ "ACGACC": 590,
674
+ "TTCGCC": 591,
675
+ "ACGAAC": 592,
676
+ "▁CCGC": 593,
677
+ "ATGTT": 594,
678
+ "ATTCGG": 595,
679
+ "AGGTG": 596,
680
+ "TTTTCC": 597,
681
+ "TTGAC": 598,
682
+ "ACGGGG": 599,
683
+ "AAATAC": 600,
684
+ "ATTGCC": 601,
685
+ "GTAAG": 602,
686
+ "TTGTC": 603,
687
+ "TATGCC": 604,
688
+ "AATGCC": 605,
689
+ "AGGCCC": 606,
690
+ "AGAACC": 607,
691
+ "ACCGGC": 608,
692
+ "TTA": 609,
693
+ "AGCTGC": 610,
694
+ "ACGTCC": 611,
695
+ "AACTGC": 612,
696
+ "ACCTCC": 613,
697
+ "ATTTT": 614,
698
+ "TACCGG": 615,
699
+ "AAACCG": 616,
700
+ "AAATCC": 617,
701
+ "TTTTGC": 618,
702
+ "ATCTGC": 619,
703
+ "AAGTGC": 620,
704
+ "TAAAC": 621,
705
+ "AATTCC": 622,
706
+ "▁TAAC": 623,
707
+ "▁GAC": 624,
708
+ "▁CTG": 625,
709
+ "TCGTT": 626,
710
+ "GCGTG": 627,
711
+ "TGGCGG": 628,
712
+ "▁CTGC": 629,
713
+ "ACCGT": 630,
714
+ "ATGGGG": 631,
715
+ "AGACCC": 632,
716
+ "ACGAGC": 633,
717
+ "ATTAC": 634,
718
+ "▁GTCC": 635,
719
+ "TGGCGC": 636,
720
+ "ATACCC": 637,
721
+ "ATAGCC": 638,
722
+ "ATAAAG": 639,
723
+ "AATAAAA": 640,
724
+ "AGGTT": 641,
725
+ "▁CGCC": 642,
726
+ "TCGAG": 643,
727
+ "ACCTGC": 644,
728
+ "ACCAAC": 645,
729
+ "TAGGCC": 646,
730
+ "AAAGGC": 647,
731
+ "TTGTG": 648,
732
+ "▁CATC": 649,
733
+ "▁ATCC": 650,
734
+ "TCCGCC": 651,
735
+ "GTGGCC": 652,
736
+ "AGAAAG": 653,
737
+ "TCGA": 654,
738
+ "▁GGCC": 655,
739
+ "▁CTAA": 656,
740
+ "ATACAC": 657,
741
+ "AGAAGC": 658,
742
+ "ACCTAC": 659,
743
+ "▁ACGC": 660,
744
+ "▁CATT": 661,
745
+ "AGCATC": 662,
746
+ "AACGT": 663,
747
+ "AATTGG": 664,
748
+ "▁AAAAG": 665,
749
+ "AGGGCC": 666,
750
+ "▁ATAC": 667,
751
+ "ATGTCC": 668,
752
+ "▁TTTT": 669,
753
+ "ACCCCG": 670,
754
+ "AATAGC": 671,
755
+ "ATAAGC": 672,
756
+ "TTCTC": 673,
757
+ "ACAACC": 674,
758
+ "ACGAGG": 675,
759
+ "ACCTGG": 676,
760
+ "TAACGC": 677,
761
+ "AAGCCG": 678,
762
+ "▁CAAC": 679,
763
+ "TTGGGC": 680,
764
+ "ACGCCG": 681,
765
+ "AATCCC": 682,
766
+ "AATCGC": 683,
767
+ "▁AGTC": 684,
768
+ "TTCTG": 685,
769
+ "CCCTC": 686,
770
+ "ACCATC": 687,
771
+ "AGCAAC": 688,
772
+ "TTTAT": 689,
773
+ "▁ATGC": 690,
774
+ "GTCCGG": 691,
775
+ "ACCA": 692,
776
+ "CCCTG": 693,
777
+ "▁AAGG": 694,
778
+ "ATGAAC": 695,
779
+ "CCCAC": 696,
780
+ "TTAAAA": 697,
781
+ "GTGCGG": 698,
782
+ "ATAGAC": 699,
783
+ "AGGCCG": 700,
784
+ "▁CTAC": 701,
785
+ "TCCTG": 702,
786
+ "AGAGCC": 703,
787
+ "TAATC": 704,
788
+ "GTCCCC": 705,
789
+ "TGCTG": 706,
790
+ "AGCTAC": 707,
791
+ "AGCGGC": 708,
792
+ "▁GCGC": 709,
793
+ "AAGGGG": 710,
794
+ "▁AAAAC": 711,
795
+ "AACCCG": 712,
796
+ "ATACGC": 713,
797
+ "AGTGCC": 714,
798
+ "GCGTT": 715,
799
+ "TATCGC": 716,
800
+ "▁TACC": 717,
801
+ "TTGTT": 718,
802
+ "AAAAAAC": 719,
803
+ "GTCCGC": 720,
804
+ "TGCTC": 721,
805
+ "AAGACC": 722,
806
+ "AGCGT": 723,
807
+ "TCCTC": 724,
808
+ "ATGACC": 725,
809
+ "▁TTGC": 726,
810
+ "ACATCC": 727,
811
+ "AGGA": 728,
812
+ "TTCGGC": 729,
813
+ "ATCAG": 730,
814
+ "TTGGGG": 731,
815
+ "TGCGCC": 732,
816
+ "▁ACCG": 733,
817
+ "ATATCC": 734,
818
+ "▁TATT": 735,
819
+ "AAATGG": 736,
820
+ "TAGCGG": 737,
821
+ "▁AGTT": 738,
822
+ "TGAAAA": 739,
823
+ "TTCTT": 740,
824
+ "GGGTC": 741,
825
+ "AAGAGC": 742,
826
+ "GCGAC": 743,
827
+ "AACTAC": 744,
828
+ "TAGCGC": 745,
829
+ "ATGCCG": 746,
830
+ "AATAAG": 747,
831
+ "▁TGCC": 748,
832
+ "GGGA": 749,
833
+ "AACATT": 750,
834
+ "CCCGCC": 751,
835
+ "▁ACTG": 752,
836
+ "▁GTGC": 753,
837
+ "ACGATT": 754,
838
+ "TGGTC": 755,
839
+ "TCCAC": 756,
840
+ "ATGTGC": 757,
841
+ "AAAGCG": 758,
842
+ "AACGGC": 759,
843
+ "TTTTGG": 760,
844
+ "ACCATT": 761,
845
+ "AGGAG": 762,
846
+ "ATGAG": 763,
847
+ "ATAATC": 764,
848
+ "▁TCCC": 765,
849
+ "▁CCGG": 766,
850
+ "▁CAG": 767,
851
+ "ACGATC": 768,
852
+ "▁TAAAA": 769,
853
+ "GCGCCC": 770,
854
+ "AATATT": 771,
855
+ "AGCTCC": 772,
856
+ "TACAC": 773,
857
+ "AAGAAG": 774,
858
+ "▁AGGC": 775,
859
+ "GCGA": 776,
860
+ "▁TCGC": 777,
861
+ "ATCCCG": 778,
862
+ "▁CTCC": 779,
863
+ "ATACGG": 780,
864
+ "ACCTAA": 781,
865
+ "GGCTC": 782,
866
+ "▁ATTC": 783,
867
+ "TAACGG": 784,
868
+ "ATAATG": 785,
869
+ "AATATC": 786,
870
+ "AACTAA": 787,
871
+ "▁CGT": 788,
872
+ "ATCGT": 789,
873
+ "AAGTGG": 790,
874
+ "▁ATAA": 791,
875
+ "TCGACC": 792,
876
+ "TAGTC": 793,
877
+ "AACAGC": 794,
878
+ "▁GTG": 795,
879
+ "TTCAC": 796,
880
+ "ATGTGG": 797,
881
+ "ATGTAC": 798,
882
+ "TTCTGC": 799,
883
+ "AAATCG": 800,
884
+ "AATAGG": 801,
885
+ "TTGACC": 802,
886
+ "AATCGG": 803,
887
+ "ACTGCC": 804,
888
+ "ATAAGG": 805,
889
+ "▁CCCG": 806,
890
+ "▁CAAAA": 807,
891
+ "ATATGC": 808,
892
+ "CCGTC": 809,
893
+ "GGCTG": 810,
894
+ "ACAGCC": 811,
895
+ "AGACCG": 812,
896
+ "TTTGCC": 813,
897
+ "▁ACGG": 814,
898
+ "ATTTCC": 815,
899
+ "ATCTCC": 816,
900
+ "ATACCG": 817,
901
+ "TCGAGG": 818,
902
+ "TCGAAC": 819,
903
+ "▁GCGG": 820,
904
+ "AGCACC": 821,
905
+ "GTGCGC": 822,
906
+ "ATAACG": 823,
907
+ "▁AGTG": 824,
908
+ "TAACCC": 825,
909
+ "TTAACC": 826,
910
+ "ACACCC": 827,
911
+ "AAATTC": 828,
912
+ "▁GAAAA": 829,
913
+ "ATCGGC": 830,
914
+ "TGTAT": 831,
915
+ "AAATTG": 832,
916
+ "AGTCCC": 833,
917
+ "AACTCC": 834,
918
+ "AGCATT": 835,
919
+ "TGACGG": 836,
920
+ "AATGGC": 837,
921
+ "AGTACC": 838,
922
+ "▁ATTG": 839,
923
+ "GTAAAA": 840,
924
+ "AAGATT": 841,
925
+ "AATATG": 842,
926
+ "AGAAGG": 843,
927
+ "AAGATG": 844,
928
+ "▁TAGC": 845,
929
+ "ATAATT": 846,
930
+ "▁CTTC": 847,
931
+ "AAGAGG": 848,
932
+ "▁CTAG": 849,
933
+ "AGCCCG": 850,
934
+ "TACTC": 851,
935
+ "AGCTGG": 852,
936
+ "▁GACC": 853,
937
+ "▁GCCG": 854,
938
+ "TGACGC": 855,
939
+ "ATCTGG": 856,
940
+ "▁GATT": 857,
941
+ "ACAGAC": 858,
942
+ "▁AACG": 859,
943
+ "ACGTTC": 860,
944
+ "AGCTAA": 861,
945
+ "AGTCGC": 862,
946
+ "AGGTGC": 863,
947
+ "TATTGC": 864,
948
+ "AAGATC": 865,
949
+ "GGGAC": 866,
950
+ "TGACCC": 867,
951
+ "AATCCG": 868,
952
+ "TATCGG": 869,
953
+ "CCCTT": 870,
954
+ "▁TAAG": 871,
955
+ "▁GAAC": 872,
956
+ "AGTCGG": 873,
957
+ "GGCGCC": 874,
958
+ "TCGAGC": 875,
959
+ "▁TGAC": 876,
960
+ "▁AGAA": 877,
961
+ "AGAACG": 878,
962
+ "GGCAC": 879,
963
+ "AGAGAC": 880,
964
+ "TTCTCC": 881,
965
+ "TGAACC": 882,
966
+ "AGTTGC": 883,
967
+ "TGCAC": 884,
968
+ "▁CACC": 885,
969
+ "▁TCAC": 886,
970
+ "GGCCCC": 887,
971
+ "GCCTC": 888,
972
+ "AACAAAA": 889,
973
+ "AGCAGG": 890,
974
+ "GTGGGC": 891,
975
+ "▁TTTC": 892,
976
+ "ACGATG": 893,
977
+ "TCGATC": 894,
978
+ "TATTCC": 895,
979
+ "ATTCCC": 896,
980
+ "AGTAAC": 897,
981
+ "TAGGGC": 898,
982
+ "ATAGTC": 899,
983
+ "AGAGGC": 900,
984
+ "TTCGT": 901,
985
+ "TGCTT": 902,
986
+ "▁CGGC": 903,
987
+ "ATATAC": 904,
988
+ "GGCCGC": 905,
989
+ "ATAGGC": 906,
990
+ "TCTAT": 907,
991
+ "AGCAGC": 908,
992
+ "TGAGCC": 909,
993
+ "AACAGG": 910,
994
+ "GTGGGG": 911,
995
+ "TAGCCC": 912,
996
+ "GGCCGG": 913,
997
+ "TGTGCC": 914,
998
+ "AATTTT": 915,
999
+ "TCAAAA": 916,
1000
+ "TACCCG": 917,
1001
+ "AGATGC": 918,
1002
+ "ACAAAG": 919,
1003
+ "ATGAGG": 920,
1004
+ "AATACG": 921,
1005
+ "▁AGAG": 922,
1006
+ "TTGACG": 923,
1007
+ "TTGAGG": 924,
1008
+ "ACCTAG": 925,
1009
+ "TGCCCG": 926,
1010
+ "TTAAAC": 927,
1011
+ "ACAAGC": 928,
1012
+ "ACGTGC": 929,
1013
+ "TTCTGG": 930,
1014
+ "ATGT": 931,
1015
+ "ACCAGG": 932,
1016
+ "TTGAAC": 933,
1017
+ "ATCTAA": 934,
1018
+ "▁TATG": 935,
1019
+ "ACCAGC": 936,
1020
+ "ATTTGC": 937,
1021
+ "ATTTAC": 938,
1022
+ "AAAGAAAA": 939,
1023
+ "TCTACG": 940,
1024
+ "▁TATC": 941,
1025
+ "AGTAGC": 942,
1026
+ "AGTTCC": 943,
1027
+ "GGCTT": 944,
1028
+ "▁ATCG": 945,
1029
+ "GGGCCC": 946,
1030
+ "ACAATC": 947,
1031
+ "ACCT": 948,
1032
+ "ATCAAC": 949,
1033
+ "▁TGTC": 950,
1034
+ "AGAATC": 951,
1035
+ "ATCA": 952,
1036
+ "TGTCCC": 953,
1037
+ "ACCTTC": 954,
1038
+ "ACGTCG": 955,
1039
+ "ATCT": 956,
1040
+ "AAGGCG": 957,
1041
+ "TGAAGC": 958,
1042
+ "AGGTCC": 959,
1043
+ "AGGTAC": 960,
1044
+ "TCACTG": 961,
1045
+ "▁AGCG": 962,
1046
+ "AATGCG": 963,
1047
+ "TGGCCC": 964,
1048
+ "AACATG": 965,
1049
+ "AAGACG": 966,
1050
+ "TTTACC": 967,
1051
+ "TTGTCC": 968,
1052
+ "TAAAAAAAA": 969,
1053
+ "ACTACC": 970,
1054
+ "AACTAG": 971,
1055
+ "CCGTG": 972,
1056
+ "TTAGCC": 973,
1057
+ "TTACCC": 974,
1058
+ "AAGA": 975,
1059
+ "ATGTAA": 976,
1060
+ "TGGGCC": 977,
1061
+ "AGGT": 978,
1062
+ "GGGTG": 979,
1063
+ "ATAAAT": 980,
1064
+ "ATAGTG": 981,
1065
+ "ACATCG": 982,
1066
+ "ATGA": 983,
1067
+ "AGTTGG": 984,
1068
+ "ATGGCG": 985,
1069
+ "AGTCCG": 986,
1070
+ "AGCTAG": 987,
1071
+ "AGATGG": 988,
1072
+ "ATATGG": 989,
1073
+ "ATCATC": 990,
1074
+ "TGAAAC": 991,
1075
+ "▁TCGG": 992,
1076
+ "TCCCCC": 993,
1077
+ "TTTCACTGCAGAC": 994,
1078
+ "AGAATG": 995,
1079
+ "TTCACTG": 996,
1080
+ "AGGGGC": 997,
1081
+ "TATCCC": 998,
1082
+ "TTTCCC": 999,
1083
+ "ATTGGC": 1000,
1084
+ "ATTTGG": 1001,
1085
+ "CCGCCC": 1002,
1086
+ "ATTTAA": 1003,
1087
+ "TTCTAC": 1004,
1088
+ "TCGTCC": 1005,
1089
+ "ATTGT": 1006,
1090
+ "AACTGG": 1007,
1091
+ "ACGTGG": 1008,
1092
+ "ATATCG": 1009,
1093
+ "ACCATG": 1010,
1094
+ "AACA": 1011,
1095
+ "ATCGCG": 1012,
1096
+ "ACCGCG": 1013,
1097
+ "▁TTGG": 1014,
1098
+ "▁CAGC": 1015,
1099
+ "TAGGGG": 1016,
1100
+ "▁TTCC": 1017,
1101
+ "ACAAGG": 1018,
1102
+ "TGTCGC": 1019,
1103
+ "ACATGC": 1020,
1104
+ "AAGTTC": 1021,
1105
+ "ATGTAG": 1022,
1106
+ "▁GGGC": 1023
1107
+ },
1108
+ "merges": [
1109
+ "A A",
1110
+ "C C",
1111
+ "T T",
1112
+ "G G",
1113
+ "A C",
1114
+ "G C",
1115
+ "T C",
1116
+ "A G",
1117
+ "T G",
1118
+ "A T",
1119
+ "AA AA",
1120
+ "A CC",
1121
+ "A GG",
1122
+ "G T",
1123
+ "AA C",
1124
+ "A TT",
1125
+ "A GC",
1126
+ "A TC",
1127
+ "CC C",
1128
+ "AC G",
1129
+ "GG C",
1130
+ "A TG",
1131
+ "T GC",
1132
+ "T CC",
1133
+ "AA G",
1134
+ "TT C",
1135
+ "T GG",
1136
+ "G CC",
1137
+ "▁ C",
1138
+ "T AC",
1139
+ "T AA",
1140
+ "TT TT",
1141
+ "TC G",
1142
+ "T AG",
1143
+ "AAAA AAAA",
1144
+ "GC G",
1145
+ "TT G",
1146
+ "CC G",
1147
+ "ACC C",
1148
+ "GG G",
1149
+ "T AT",
1150
+ "▁ G",
1151
+ "AA AC",
1152
+ "AA CC",
1153
+ "AG AC",
1154
+ "AGG C",
1155
+ "AG CC",
1156
+ "TG CC",
1157
+ "AA GC",
1158
+ "AC AC",
1159
+ "AT CC",
1160
+ "ATT C",
1161
+ "T ACC",
1162
+ "AT GC",
1163
+ "AA AG",
1164
+ "T CCC",
1165
+ "AC GG",
1166
+ "AC GC",
1167
+ "AA GG",
1168
+ "TC GC",
1169
+ "TC GG",
1170
+ "T AAC",
1171
+ "AT GG",
1172
+ "ACC G",
1173
+ "AA TT",
1174
+ "AA TC",
1175
+ "TT CC",
1176
+ "AGC G",
1177
+ "▁ CC",
1178
+ "ATC G",
1179
+ "T GGC",
1180
+ "AT AC",
1181
+ "TT GC",
1182
+ "AA TG",
1183
+ "AG TC",
1184
+ "GT CC",
1185
+ "AAC G",
1186
+ "TTTT TTTT",
1187
+ "TT GG",
1188
+ "AC TG",
1189
+ "T AGG",
1190
+ "ATT G",
1191
+ "AG TG",
1192
+ "T AGC",
1193
+ "CCC G",
1194
+ "AG TT",
1195
+ "GT GC",
1196
+ "▁ GC",
1197
+ "T ACG",
1198
+ "TC AC",
1199
+ "GT GG",
1200
+ "▁ AC",
1201
+ "AAAA C",
1202
+ "AA AT",
1203
+ "T ATT",
1204
+ "T ATC",
1205
+ "TG AC",
1206
+ "TCC G",
1207
+ "TGC G",
1208
+ "TTC G",
1209
+ "GG CC",
1210
+ "AG AG",
1211
+ "T ATG",
1212
+ "TG TC",
1213
+ "T AAG",
1214
+ "AGG G",
1215
+ "GGC G",
1216
+ "TT TC",
1217
+ "AC TC",
1218
+ "AG AA",
1219
+ "AT AA",
1220
+ "▁ TC",
1221
+ "AC TT",
1222
+ "▁ AA",
1223
+ "▁ GG",
1224
+ "GC GG",
1225
+ "TG TG",
1226
+ "AT AG",
1227
+ "▁ AAAAAAAA",
1228
+ "▁C G",
1229
+ "▁ TT",
1230
+ "GCC G",
1231
+ "T AAAA",
1232
+ "TG TT",
1233
+ "AC AA",
1234
+ "TC TC",
1235
+ "GC GC",
1236
+ "▁ AG",
1237
+ "AC AG",
1238
+ "TGG G",
1239
+ "AA GT",
1240
+ "ACC CC",
1241
+ "ACG CC",
1242
+ "TC TT",
1243
+ "TC TG",
1244
+ "TT TG",
1245
+ "AC GT",
1246
+ "AAAA G",
1247
+ "AAG CC",
1248
+ "ATG CC",
1249
+ "GG GG",
1250
+ "▁ TG",
1251
+ "AA ACC",
1252
+ "AGG CC",
1253
+ "TT AC",
1254
+ "GG GC",
1255
+ "AC TGC",
1256
+ "AT AT",
1257
+ "ACC GC",
1258
+ "TG AG",
1259
+ "GT AC",
1260
+ "T A",
1261
+ "TC AG",
1262
+ "ACCC G",
1263
+ "▁ AT",
1264
+ "AA CCC",
1265
+ "ACC GG",
1266
+ "AA AGC",
1267
+ "TT AG",
1268
+ "ATT CC",
1269
+ "▁ AAAA",
1270
+ "AAC GC",
1271
+ "TAA CC",
1272
+ "GT AG",
1273
+ "AAAAAAAA AAAAAAAA",
1274
+ "AC AT",
1275
+ "TCC CC",
1276
+ "TG AA",
1277
+ "ATC GC",
1278
+ "TC AA",
1279
+ "TGG CC",
1280
+ "TC GT",
1281
+ "AAC GG",
1282
+ "TAG CC",
1283
+ "T ACCC",
1284
+ "ATT GC",
1285
+ "AGC GG",
1286
+ "AGC GC",
1287
+ "ATC GG",
1288
+ "CCC GC",
1289
+ "▁ GT",
1290
+ "TT AA",
1291
+ "▁ AAC",
1292
+ "ACG TC",
1293
+ "▁ GCC",
1294
+ "TGC GG",
1295
+ "AG AT",
1296
+ "▁ ACC",
1297
+ "TC GCC",
1298
+ "CCC GG",
1299
+ "AC ATC",
1300
+ "TCC GC",
1301
+ "AGGC G",
1302
+ "TT GCC",
1303
+ "AT ACC",
1304
+ "AA TAA",
1305
+ "AT AAC",
1306
+ "AGG GC",
1307
+ "AG ACC",
1308
+ "▁ ATT",
1309
+ "AC ACC",
1310
+ "AC AAC",
1311
+ "TTTTTTTT TTTTTTTT",
1312
+ "TT GT",
1313
+ "TCC GG",
1314
+ "AA ATC",
1315
+ "TTC GC",
1316
+ "TTC GG",
1317
+ "TAC GC",
1318
+ "TG CCC",
1319
+ "AT CCC",
1320
+ "▁ CCC",
1321
+ "TGC GC",
1322
+ "AG AAC",
1323
+ "AG CCC",
1324
+ "TAC GG",
1325
+ "AA ATG",
1326
+ "AA AGG",
1327
+ "▁AAAAAAAA AAAAAAAA",
1328
+ "TAT CC",
1329
+ "AA ATT",
1330
+ "GGC GG",
1331
+ "▁ AAG",
1332
+ "▁ TAA",
1333
+ "ATTC G",
1334
+ "TG ACC",
1335
+ "GGC GC",
1336
+ "ATT GG",
1337
+ "AA TAC",
1338
+ "AA TGC",
1339
+ "AA T",
1340
+ "▁ ATC",
1341
+ "AGG GG",
1342
+ "▁ AGC",
1343
+ "AT AGC",
1344
+ "AC AGC",
1345
+ "TCCC G",
1346
+ "ACG TT",
1347
+ "GCC CC",
1348
+ "▁AAAAAAAA AAAA",
1349
+ "AA TCC",
1350
+ "AA CCG",
1351
+ "TAA GC",
1352
+ "ACG TG",
1353
+ "▁C GC",
1354
+ "AC T",
1355
+ "TC TGC",
1356
+ "AC ATT",
1357
+ "AA GGC",
1358
+ "AT ATC",
1359
+ "AG AGC",
1360
+ "TGG GC",
1361
+ "AAAAAAAA AAAA",
1362
+ "AT GGC",
1363
+ "TAT GC",
1364
+ "AG TCC",
1365
+ "TG AAC",
1366
+ "AT CCG",
1367
+ "TACC G",
1368
+ "TT TTC",
1369
+ "AC GGC",
1370
+ "TCG TC",
1371
+ "T AGGC",
1372
+ "TT ACC",
1373
+ "GCG TC",
1374
+ "AC TCC",
1375
+ "AC ATG",
1376
+ "AG TGC",
1377
+ "TG TCC",
1378
+ "▁ ATG",
1379
+ "AAG TC",
1380
+ "TGG GG",
1381
+ "AG CCG",
1382
+ "AA TAG",
1383
+ "AT AGG",
1384
+ "AC TAA",
1385
+ "TT AAC",
1386
+ "TG CCG",
1387
+ "TT CCC",
1388
+ "TC ACC",
1389
+ "AC AGG",
1390
+ "AA TTC",
1391
+ "AA ACG",
1392
+ "AT AAG",
1393
+ "AC TAC",
1394
+ "AG T",
1395
+ "TG AGC",
1396
+ "TAAC G",
1397
+ "AT ATG",
1398
+ "AG ATG",
1399
+ "AT ATT",
1400
+ "AG ATC",
1401
+ "TG TGC",
1402
+ "TT TCC",
1403
+ "AG AGG",
1404
+ "AT GCG",
1405
+ "▁ TAC",
1406
+ "▁ TCC",
1407
+ "▁ TGC",
1408
+ "AG TAC",
1409
+ "TTTT G",
1410
+ "TGGC G",
1411
+ "AG TAA",
1412
+ "AG ATT",
1413
+ "AG ACG",
1414
+ "AC TGG",
1415
+ "AC AAG",
1416
+ "AC TTC",
1417
+ "CC CC",
1418
+ "▁ GGC",
1419
+ "TC AAC",
1420
+ "TC GGC",
1421
+ "▁ CCG",
1422
+ "TT TGC",
1423
+ "AT AAAA",
1424
+ "AG AAG",
1425
+ "TG TGG",
1426
+ "TG AGG",
1427
+ "TAT GG",
1428
+ "AC A",
1429
+ "TC TAC",
1430
+ "AA TGG",
1431
+ "GCC GC",
1432
+ "TC ATC",
1433
+ "AG TGG",
1434
+ "AC TTG",
1435
+ "TC TCC",
1436
+ "TT GGC",
1437
+ "▁ AGG",
1438
+ "GCC GG",
1439
+ "AA GCG",
1440
+ "▁C GG",
1441
+ "AT ACG",
1442
+ "TT AGC",
1443
+ "AG TTC",
1444
+ "AG A",
1445
+ "TG T",
1446
+ "TG ACG",
1447
+ "AC GCG",
1448
+ "GT CCC",
1449
+ "TC AGC",
1450
+ "AG AAAA",
1451
+ "TAA GG",
1452
+ "AG TAG",
1453
+ "AA TCG",
1454
+ "AC TAG",
1455
+ "TG ATC",
1456
+ "TT CCG",
1457
+ "TT TGG",
1458
+ "TT TAC",
1459
+ "AC ACG",
1460
+ "AA TTG",
1461
+ "TG ATG",
1462
+ "GT ACC",
1463
+ "TG A",
1464
+ "TG TAC",
1465
+ "TC TGG",
1466
+ "TG ATT",
1467
+ "T ATTC",
1468
+ "▁ ACG",
1469
+ "AT GGG",
1470
+ "AG TCG",
1471
+ "▁ TAG",
1472
+ "TT ATC",
1473
+ "AC GGG",
1474
+ "TC AGG",
1475
+ "TG TTC",
1476
+ "TC GCG",
1477
+ "CC GC",
1478
+ "TC TTC",
1479
+ "AC TCG",
1480
+ "▁ GCG",
1481
+ "TT TAA",
1482
+ "TT AGG",
1483
+ "AC AAAA",
1484
+ "TG TAG",
1485
+ "T AGCG",
1486
+ "ACTGC AGAC",
1487
+ "AG TTG",
1488
+ "GT GGC",
1489
+ "GT AAC",
1490
+ "CC GG",
1491
+ "GT CCG",
1492
+ "TT ATT",
1493
+ "TT GCG",
1494
+ "TG AAG",
1495
+ "AA GGG",
1496
+ "TC ATT",
1497
+ "▁ TTC",
1498
+ "TG TCG",
1499
+ "TT TAG",
1500
+ "▁ AAAC",
1501
+ "TC ATG",
1502
+ "▁ TTG",
1503
+ "T ATCG",
1504
+ "TT TCG",
1505
+ "TAGG G",
1506
+ "TT ATG",
1507
+ "▁ TCG",
1508
+ "TG TAA",
1509
+ "TC TAA",
1510
+ "TC GGG",
1511
+ "AAAA CC",
1512
+ "GT AGC",
1513
+ "TC TAG",
1514
+ "ACCC GC",
1515
+ "TGC AGAC",
1516
+ "TT AAG",
1517
+ "TC AT",
1518
+ "TC TCG",
1519
+ "TT ACG",
1520
+ "AAAAC G",
1521
+ "TG TTG",
1522
+ "AACC CC",
1523
+ "GT GCG",
1524
+ "TC AAG",
1525
+ "TG AT",
1526
+ "TC ACG",
1527
+ "TT GGG",
1528
+ "GT ATC",
1529
+ "TC TTG",
1530
+ "AGCC CC",
1531
+ "AA TAT",
1532
+ "AAAA GC",
1533
+ "▁ AAAG",
1534
+ "GT AGG",
1535
+ "AACC GC",
1536
+ "TGCC CC",
1537
+ "AGCC GC",
1538
+ "T ATTG",
1539
+ "ACC AC",
1540
+ "GT AA",
1541
+ "ACCC GG",
1542
+ "AAAC GC",
1543
+ "AAC AC",
1544
+ "TC A",
1545
+ "TC T",
1546
+ "TGCC GC",
1547
+ "TT AT",
1548
+ "GT ATT",
1549
+ "AGAC GC",
1550
+ "GT ATG",
1551
+ "ACC TG",
1552
+ "▁ TAT",
1553
+ "AACC GG",
1554
+ "TCAC TGCAGAC",
1555
+ "AGCC GG",
1556
+ "TGCC GG",
1557
+ "AGAC GG",
1558
+ "GT GGG",
1559
+ "ACC TC",
1560
+ "ATCC CC",
1561
+ "AAC AAC",
1562
+ "▁ TGG",
1563
+ "GT ACG",
1564
+ "AAG AC",
1565
+ "ACC TT",
1566
+ "ATGC GG",
1567
+ "ACC GCC",
1568
+ "▁ AATT",
1569
+ "AT AAAC",
1570
+ "AAGG CC",
1571
+ "▁ ACAC",
1572
+ "AAG AAC",
1573
+ "ACGG CC",
1574
+ "AT A",
1575
+ "TCGG CC",
1576
+ "▁C TT",
1577
+ "AAGC GC",
1578
+ "T AAAAC",
1579
+ "TCGC GG",
1580
+ "TTTTTTTTTTTTTTTT TTTT",
1581
+ "AA AGAC",
1582
+ "ATCC GG",
1583
+ "AAAA GG",
1584
+ "AG TAT",
1585
+ "ATCC GC",
1586
+ "AA A",
1587
+ "ACG AC",
1588
+ "AGC AC",
1589
+ "AGGC GC",
1590
+ "AAG TT",
1591
+ "ATC TAC",
1592
+ "AA ACCC",
1593
+ "TACG CC",
1594
+ "▁ AACC",
1595
+ "ACG A",
1596
+ "ATG TC",
1597
+ "AAC TC",
1598
+ "ATGC GC",
1599
+ "AAAC GG",
1600
+ "AAC TG",
1601
+ "AGGC GG",
1602
+ "TCGC GC",
1603
+ "AAAAAAAAAAAAAAAA AAAA",
1604
+ "AAG TG",
1605
+ "AAC TT",
1606
+ "ATGG CC",
1607
+ "AAC ATC",
1608
+ "TTTTTTTT TTTT",
1609
+ "ACG CCC",
1610
+ "AAGC GG",
1611
+ "AC TAT",
1612
+ "▁ GGG",
1613
+ "TCCC GC",
1614
+ "▁ ACCC",
1615
+ "AA TAAC",
1616
+ "AGG TC",
1617
+ "AA AGCC",
1618
+ "ACC CCC",
1619
+ "ACGC GG",
1620
+ "ACGC GC",
1621
+ "▁ AAAT",
1622
+ "TACC CC",
1623
+ "TACC GC",
1624
+ "GT AT",
1625
+ "AG AAAC",
1626
+ "GT A",
1627
+ "TTCC GC",
1628
+ "TT T",
1629
+ "AAG CCC",
1630
+ "TAAAA G",
1631
+ "TCG TG",
1632
+ "TTC ACTGCAGAC",
1633
+ "AGC TG",
1634
+ "ATG AC",
1635
+ "TAAG CC",
1636
+ "AGC TC",
1637
+ "ACC AG",
1638
+ "ACAC GC",
1639
+ "TCCC GG",
1640
+ "AGC TT",
1641
+ "TCG AC",
1642
+ "AATT GC",
1643
+ "AAC ACC",
1644
+ "ATC TG",
1645
+ "TCG CCC",
1646
+ "AAC AG",
1647
+ "TTCC CC",
1648
+ "ATG CCC",
1649
+ "AA ATGC",
1650
+ "▁CC CC",
1651
+ "TTCC GG",
1652
+ "TTGG CC",
1653
+ "AC AAAC",
1654
+ "ACGG GC",
1655
+ "ATC GCC",
1656
+ "ATT TG",
1657
+ "TCGG GC",
1658
+ "TTGC GG",
1659
+ "AAC GCC",
1660
+ "▁ AAGC",
1661
+ "AGC AG",
1662
+ "▁G TC",
1663
+ "ACC ACC",
1664
+ "ATC AC",
1665
+ "ACG AG",
1666
+ "▁ AATC",
1667
+ "AAG AAAA",
1668
+ "AGC GCC",
1669
+ "ATG TG",
1670
+ "AA GTCC",
1671
+ "AT AACC",
1672
+ "▁ AATG",
1673
+ "ATC TT",
1674
+ "ATGG GC",
1675
+ "TTG CCC",
1676
+ "AAGG GC",
1677
+ "▁G CCC",
1678
+ "AAG AG",
1679
+ "▁C TC",
1680
+ "▁ AGAC",
1681
+ "AGG AC",
1682
+ "TCGG GG",
1683
+ "TTGC GC",
1684
+ "▁G TT",
1685
+ "ACAC GG",
1686
+ "ATC TC",
1687
+ "▁ AGCC",
1688
+ "ATTC GC",
1689
+ "AA TACC",
1690
+ "▁C AC",
1691
+ "ATT TC",
1692
+ "ACG ACC",
1693
+ "TTC GCC",
1694
+ "ACG AAC",
1695
+ "▁CC GC",
1696
+ "ATG TT",
1697
+ "ATTC GG",
1698
+ "AGG TG",
1699
+ "TTTT CC",
1700
+ "TTG AC",
1701
+ "ACGG GG",
1702
+ "AA ATAC",
1703
+ "ATT GCC",
1704
+ "GT AAG",
1705
+ "TTG TC",
1706
+ "TATG CC",
1707
+ "AA TGCC",
1708
+ "AGG CCC",
1709
+ "AG AACC",
1710
+ "ACC GGC",
1711
+ "TT A",
1712
+ "AGC TGC",
1713
+ "AC GTCC",
1714
+ "AAC TGC",
1715
+ "ACC TCC",
1716
+ "ATT TT",
1717
+ "TACC GG",
1718
+ "AA ACCG",
1719
+ "AA ATCC",
1720
+ "TTTT GC",
1721
+ "ATC TGC",
1722
+ "AA GTGC",
1723
+ "TAA AC",
1724
+ "AATT CC",
1725
+ "▁ TAAC",
1726
+ "▁G AC",
1727
+ "▁C TG",
1728
+ "TCG TT",
1729
+ "GCG TG",
1730
+ "TGGC GG",
1731
+ "▁C TGC",
1732
+ "ACC GT",
1733
+ "ATGG GG",
1734
+ "AG ACCC",
1735
+ "ACG AGC",
1736
+ "ATT AC",
1737
+ "▁ GTCC",
1738
+ "TGGC GC",
1739
+ "AT ACCC",
1740
+ "AT AGCC",
1741
+ "AT AAAG",
1742
+ "AA TAAAA",
1743
+ "AGG TT",
1744
+ "▁C GCC",
1745
+ "TCG AG",
1746
+ "ACC TGC",
1747
+ "ACC AAC",
1748
+ "TAGG CC",
1749
+ "AA AGGC",
1750
+ "TTG TG",
1751
+ "▁C ATC",
1752
+ "▁ ATCC",
1753
+ "TCC GCC",
1754
+ "GTGG CC",
1755
+ "AG AAAG",
1756
+ "TCG A",
1757
+ "▁ GGCC",
1758
+ "▁C TAA",
1759
+ "AT ACAC",
1760
+ "AG AAGC",
1761
+ "ACC TAC",
1762
+ "▁ ACGC",
1763
+ "▁C ATT",
1764
+ "AGC ATC",
1765
+ "AAC GT",
1766
+ "AATT GG",
1767
+ "▁ AAAAG",
1768
+ "AGG GCC",
1769
+ "▁ ATAC",
1770
+ "ATG TCC",
1771
+ "▁ TTTT",
1772
+ "ACC CCG",
1773
+ "AA TAGC",
1774
+ "AT AAGC",
1775
+ "TTC TC",
1776
+ "AC AACC",
1777
+ "ACG AGG",
1778
+ "ACC TGG",
1779
+ "TAAC GC",
1780
+ "AAG CCG",
1781
+ "▁C AAC",
1782
+ "TTGG GC",
1783
+ "ACG CCG",
1784
+ "AA TCCC",
1785
+ "AA TCGC",
1786
+ "▁ AGTC",
1787
+ "TTC TG",
1788
+ "CCC TC",
1789
+ "ACC ATC",
1790
+ "AGC AAC",
1791
+ "TT TAT",
1792
+ "▁ ATGC",
1793
+ "GTCC GG",
1794
+ "ACC A",
1795
+ "CCC TG",
1796
+ "▁ AAGG",
1797
+ "ATG AAC",
1798
+ "CCC AC",
1799
+ "TT AAAA",
1800
+ "GTGC GG",
1801
+ "AT AGAC",
1802
+ "AGG CCG",
1803
+ "▁C TAC",
1804
+ "TCC TG",
1805
+ "AG AGCC",
1806
+ "TAA TC",
1807
+ "GTCC CC",
1808
+ "TGC TG",
1809
+ "AGC TAC",
1810
+ "AGC GGC",
1811
+ "▁GC GC",
1812
+ "AAGG GG",
1813
+ "▁ AAAAC",
1814
+ "AA CCCG",
1815
+ "AT ACGC",
1816
+ "AG TGCC",
1817
+ "GCG TT",
1818
+ "TATC GC",
1819
+ "▁ TACC",
1820
+ "TTG TT",
1821
+ "AAAA AAC",
1822
+ "GTCC GC",
1823
+ "TGC TC",
1824
+ "AAG ACC",
1825
+ "AGC GT",
1826
+ "TCC TC",
1827
+ "ATG ACC",
1828
+ "▁ TTGC",
1829
+ "AC ATCC",
1830
+ "AGG A",
1831
+ "TTC GGC",
1832
+ "ATC AG",
1833
+ "TTGG GG",
1834
+ "TGC GCC",
1835
+ "▁ ACCG",
1836
+ "AT ATCC",
1837
+ "▁ TATT",
1838
+ "AA ATGG",
1839
+ "TAGC GG",
1840
+ "▁ AGTT",
1841
+ "TG AAAA",
1842
+ "TTC TT",
1843
+ "GGG TC",
1844
+ "AAG AGC",
1845
+ "GCG AC",
1846
+ "AAC TAC",
1847
+ "TAGC GC",
1848
+ "ATG CCG",
1849
+ "AA TAAG",
1850
+ "▁ TGCC",
1851
+ "GGG A",
1852
+ "AAC ATT",
1853
+ "CCC GCC",
1854
+ "▁ ACTG",
1855
+ "▁ GTGC",
1856
+ "ACG ATT",
1857
+ "TGG TC",
1858
+ "TCC AC",
1859
+ "ATG TGC",
1860
+ "AA AGCG",
1861
+ "AAC GGC",
1862
+ "TTTT GG",
1863
+ "ACC ATT",
1864
+ "AGG AG",
1865
+ "ATG AG",
1866
+ "AT AATC",
1867
+ "▁ TCCC",
1868
+ "▁CC GG",
1869
+ "▁C AG",
1870
+ "ACG ATC",
1871
+ "▁ TAAAA",
1872
+ "GCG CCC",
1873
+ "AA TATT",
1874
+ "AGC TCC",
1875
+ "TAC AC",
1876
+ "AAG AAG",
1877
+ "▁ AGGC",
1878
+ "GCG A",
1879
+ "▁ TCGC",
1880
+ "AT CCCG",
1881
+ "▁C TCC",
1882
+ "AT ACGG",
1883
+ "ACC TAA",
1884
+ "GGC TC",
1885
+ "▁ ATTC",
1886
+ "TAAC GG",
1887
+ "AT AATG",
1888
+ "AA TATC",
1889
+ "AAC TAA",
1890
+ "▁C GT",
1891
+ "ATC GT",
1892
+ "AA GTGG",
1893
+ "▁ ATAA",
1894
+ "TCG ACC",
1895
+ "TAG TC",
1896
+ "AAC AGC",
1897
+ "▁G TG",
1898
+ "TTC AC",
1899
+ "ATG TGG",
1900
+ "ATG TAC",
1901
+ "TTC TGC",
1902
+ "AA ATCG",
1903
+ "AA TAGG",
1904
+ "TTG ACC",
1905
+ "AA TCGG",
1906
+ "AC TGCC",
1907
+ "AT AAGG",
1908
+ "▁ CCCG",
1909
+ "▁C AAAA",
1910
+ "AT ATGC",
1911
+ "CCG TC",
1912
+ "GGC TG",
1913
+ "AC AGCC",
1914
+ "AG ACCG",
1915
+ "TT TGCC",
1916
+ "▁ ACGG",
1917
+ "ATT TCC",
1918
+ "ATC TCC",
1919
+ "AT ACCG",
1920
+ "TCG AGG",
1921
+ "TCG AAC",
1922
+ "▁GC GG",
1923
+ "AGC ACC",
1924
+ "GTGC GC",
1925
+ "AT AACG",
1926
+ "▁ AGTG",
1927
+ "TAA CCC",
1928
+ "TT AACC",
1929
+ "AC ACCC",
1930
+ "AA ATTC",
1931
+ "▁G AAAA",
1932
+ "ATC GGC",
1933
+ "TG TAT",
1934
+ "AA ATTG",
1935
+ "AG TCCC",
1936
+ "AAC TCC",
1937
+ "AGC ATT",
1938
+ "TG ACGG",
1939
+ "AA TGGC",
1940
+ "AG TACC",
1941
+ "▁ ATTG",
1942
+ "GT AAAA",
1943
+ "AAG ATT",
1944
+ "AA TATG",
1945
+ "AG AAGG",
1946
+ "AAG ATG",
1947
+ "▁ TAGC",
1948
+ "AT AATT",
1949
+ "▁C TTC",
1950
+ "AAG AGG",
1951
+ "▁C TAG",
1952
+ "AG CCCG",
1953
+ "TAC TC",
1954
+ "AGC TGG",
1955
+ "▁G ACC",
1956
+ "▁ GCCG",
1957
+ "TG ACGC",
1958
+ "ATC TGG",
1959
+ "▁G ATT",
1960
+ "AC AGAC",
1961
+ "▁ AACG",
1962
+ "ACG TTC",
1963
+ "AGC TAA",
1964
+ "AG TCGC",
1965
+ "AGG TGC",
1966
+ "TATT GC",
1967
+ "AAG ATC",
1968
+ "GGG AC",
1969
+ "TG ACCC",
1970
+ "AA TCCG",
1971
+ "TATC GG",
1972
+ "CCC TT",
1973
+ "▁ TAAG",
1974
+ "▁G AAC",
1975
+ "AG TCGG",
1976
+ "GGC GCC",
1977
+ "TCG AGC",
1978
+ "▁ TGAC",
1979
+ "▁ AGAA",
1980
+ "AG AACG",
1981
+ "GGC AC",
1982
+ "AG AGAC",
1983
+ "TTC TCC",
1984
+ "TG AACC",
1985
+ "AG TTGC",
1986
+ "TGC AC",
1987
+ "▁C ACC",
1988
+ "▁ TCAC",
1989
+ "GGCC CC",
1990
+ "GCC TC",
1991
+ "AAC AAAA",
1992
+ "AGC AGG",
1993
+ "GTGG GC",
1994
+ "▁ TTTC",
1995
+ "ACG ATG",
1996
+ "TCG ATC",
1997
+ "TATT CC",
1998
+ "ATT CCC",
1999
+ "AG TAAC",
2000
+ "TAGG GC",
2001
+ "AT AGTC",
2002
+ "AG AGGC",
2003
+ "TTC GT",
2004
+ "TGC TT",
2005
+ "▁C GGC",
2006
+ "AT ATAC",
2007
+ "GGCC GC",
2008
+ "AT AGGC",
2009
+ "TC TAT",
2010
+ "AGC AGC",
2011
+ "TG AGCC",
2012
+ "AAC AGG",
2013
+ "GTGG GG",
2014
+ "TAG CCC",
2015
+ "GGCC GG",
2016
+ "TG TGCC",
2017
+ "AA TTTT",
2018
+ "TC AAAA",
2019
+ "T ACCCG",
2020
+ "AG ATGC",
2021
+ "AC AAAG",
2022
+ "ATG AGG",
2023
+ "AA TACG",
2024
+ "▁ AGAG",
2025
+ "TTG ACG",
2026
+ "TTG AGG",
2027
+ "ACC TAG",
2028
+ "TG CCCG",
2029
+ "TT AAAC",
2030
+ "AC AAGC",
2031
+ "AC GTGC",
2032
+ "TTC TGG",
2033
+ "ATG T",
2034
+ "ACC AGG",
2035
+ "TTG AAC",
2036
+ "ATC TAA",
2037
+ "▁ TATG",
2038
+ "ACC AGC",
2039
+ "ATT TGC",
2040
+ "ATT TAC",
2041
+ "AAAG AAAA",
2042
+ "TC TACG",
2043
+ "▁ TATC",
2044
+ "AG TAGC",
2045
+ "AG TTCC",
2046
+ "GGC TT",
2047
+ "▁ ATCG",
2048
+ "GGG CCC",
2049
+ "AC AATC",
2050
+ "ACC T",
2051
+ "ATC AAC",
2052
+ "▁ TGTC",
2053
+ "AG AATC",
2054
+ "ATC A",
2055
+ "TG TCCC",
2056
+ "ACC TTC",
2057
+ "ACG TCG",
2058
+ "ATC T",
2059
+ "AA GGCG",
2060
+ "TG AAGC",
2061
+ "AGG TCC",
2062
+ "AGG TAC",
2063
+ "TC ACTG",
2064
+ "▁ AGCG",
2065
+ "AA TGCG",
2066
+ "TGG CCC",
2067
+ "AAC ATG",
2068
+ "AAG ACG",
2069
+ "TT TACC",
2070
+ "TT GTCC",
2071
+ "T AAAAAAAA",
2072
+ "AC TACC",
2073
+ "AAC TAG",
2074
+ "CCG TG",
2075
+ "TT AGCC",
2076
+ "TT ACCC",
2077
+ "AAG A",
2078
+ "ATG TAA",
2079
+ "TGG GCC",
2080
+ "AGG T",
2081
+ "GGG TG",
2082
+ "AT AAAT",
2083
+ "AT AGTG",
2084
+ "AC ATCG",
2085
+ "ATG A",
2086
+ "AG TTGG",
2087
+ "AT GGCG",
2088
+ "AG TCCG",
2089
+ "AGC TAG",
2090
+ "AG ATGG",
2091
+ "AT ATGG",
2092
+ "ATC ATC",
2093
+ "TG AAAC",
2094
+ "▁ TCGG",
2095
+ "TCC CCC",
2096
+ "TT TCACTGCAGAC",
2097
+ "AG AATG",
2098
+ "TTC ACTG",
2099
+ "AGG GGC",
2100
+ "TAT CCC",
2101
+ "TT TCCC",
2102
+ "ATT GGC",
2103
+ "ATT TGG",
2104
+ "CCG CCC",
2105
+ "ATT TAA",
2106
+ "TTC TAC",
2107
+ "TC GTCC",
2108
+ "ATT GT",
2109
+ "AAC TGG",
2110
+ "AC GTGG",
2111
+ "AT ATCG",
2112
+ "ACC ATG",
2113
+ "AAC A",
2114
+ "ATC GCG",
2115
+ "ACC GCG",
2116
+ "▁ TTGG",
2117
+ "▁C AGC",
2118
+ "TAGG GG",
2119
+ "▁ TTCC",
2120
+ "AC AAGG",
2121
+ "TG TCGC",
2122
+ "AC ATGC",
2123
+ "AAG TTC",
2124
+ "ATG TAG",
2125
+ "▁GG GC"
2126
+ ]
2127
+ }
2128
+ }
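Each entry in the merges list above is a pair of symbols that the BPE tokenizer joins into one token, with earlier entries taking priority. The following is a minimal sketch of that procedure, not the tokenizer's actual implementation. `apply_bpe` is a hypothetical helper, and the first two merge rules are assumed to exist earlier in the list (they are not shown in this part of the diff); only `"AC TT"` is taken from the lines above.

```python
from typing import List, Tuple

# Merge rules in priority order (lower index = applied first).
# The first two are ASSUMED earlier merges, needed to build "AC" and "TT";
# the last one, "AC TT", appears in the merges list above.
MERGES: List[Tuple[str, str]] = [
    ("A", "C"),    # assumption: not shown in this part of the diff
    ("T", "T"),    # assumption: not shown in this part of the diff
    ("AC", "TT"),  # from the merges list above
]


def apply_bpe(symbols: List[str], merges: List[Tuple[str, str]]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest merge rank."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    no_rank = len(ranks)  # sentinel rank for pairs without a merge rule
    symbols = list(symbols)
    while len(symbols) > 1:
        # Rank every adjacent pair; ties resolve to the leftmost position.
        candidates = [
            (ranks.get((left, right), no_rank), index)
            for index, (left, right) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, index = min(candidates)
        if best_rank == no_rank:
            break  # no applicable merge rule remains
        symbols[index:index + 2] = [symbols[index] + symbols[index + 1]]
    return symbols


# Start from single nucleotides and apply the merges.
tokens = apply_bpe(list("ACTTG"), MERGES)
print(tokens)
```

The sketch ignores the `▁` word-boundary marker and any pre-tokenization step, both of which the real tokenizer would handle before merging.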
tokenizer_config.json ADDED
@@ -0,0 +1,45 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "1024": {
4
+ "content": "BOS",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1025": {
12
+ "content": "EOS",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "1026": {
20
+ "content": "UNK",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "1027": {
28
+ "content": "PAD",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ }
35
+ },
36
+ "bos_token": "BOS",
37
+ "clean_up_tokenization_spaces": true,
38
+ "eos_token": "EOS",
39
+ "model_max_length": 15,
40
+ "pad_token": "PAD",
41
+ "padding_side": "right",
42
+ "tokenizer_class": "PreTrainedTokenizerFast",
43
+ "truncation_side": "right",
44
+ "unk_token": "UNK"
45
+ }
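The special-token entries above can be cross-checked against the `vocab_size` of 1028 given in this commit's `config.json`. The sketch below inlines the relevant fields so it runs without any files on disk; `special_token_ids` is a hypothetical helper, not part of any library.

```python
import json

# Fields copied from the tokenizer_config.json diff above (per-token flags
# other than "content" and "special" are omitted for brevity).
TOKENIZER_CONFIG = json.loads("""
{
  "added_tokens_decoder": {
    "1024": {"content": "BOS", "special": true},
    "1025": {"content": "EOS", "special": true},
    "1026": {"content": "UNK", "special": true},
    "1027": {"content": "PAD", "special": true}
  },
  "bos_token": "BOS",
  "eos_token": "EOS",
  "pad_token": "PAD",
  "unk_token": "UNK",
  "model_max_length": 15
}
""")

VOCAB_SIZE = 1028  # "vocab_size" from config.json in the same commit


def special_token_ids(config: dict) -> dict:
    """Map each special-token role to the ID declared in added_tokens_decoder."""
    content_to_id = {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
    }
    roles = ("bos_token", "eos_token", "unk_token", "pad_token")
    return {role: content_to_id[config[role]] for role in roles}


ids = special_token_ids(TOKENIZER_CONFIG)
# Every special-token ID must be a valid index into the embedding table.
all_in_range = all(0 <= token_id < VOCAB_SIZE for token_id in ids.values())
print(ids, all_in_range)
```

Note that `config.json` in this commit still lists `bos_token_id` and `eos_token_id` as 50256, which lies outside a vocabulary of 1028 entries. Code that reads those two fields from the model config instead of the tokenizer may need them overridden with the IDs computed here.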
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f1307c23c4f4012e6e4e79ee8268512067fdcf00cb0134169c24061fb318e244
3
+ size 5112