ZYMScott committed
Commit 1072e2b · verified · 1 Parent(s): 8733431

Upload tokenizer

README.md ADDED
@@ -0,0 +1,201 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
added_tokens.json ADDED
@@ -0,0 +1,543 @@
+ {
+ "<0813-124 phase II>": 32352,
+ "<090008>": 32575,
+ "<1.2.20>": 32501,
+ "<15H5D-4a>": 32221,
+ "<1692>": 32349,
+ "<174/2>": 32395,
+ "<17Nkhm-UP2>": 32196,
+ "<18EpOKYJ>": 32596,
+ "<200023>": 32254,
+ "<21A>": 32508,
+ "<24.1>": 32388,
+ "<301>": 32638,
+ "<3347689II>": 32510,
+ "<3937>": 32289,
+ "<477>": 32419,
+ "<49125>": 32341,
+ "<5D>": 32268,
+ "<640>": 32512,
+ "<670-83>": 32261,
+ "<675>": 32187,
+ "<6D370>": 32519,
+ "<757>": 32534,
+ "<78-1320>": 32399,
+ "<7A>": 32312,
+ "<80813>": 32114,
+ "<>": 32522,
+ "<A1122>": 32407,
+ "<A1>": 32239,
+ "<A2-F21>": 32595,
+ "<A212-S19-A16>": 32493,
+ "<A23BA>": 32118,
+ "<A398-S21-F17>": 32392,
+ "<ACYC.E9L>": 32411,
+ "<ANU1>": 32168,
+ "<AR>": 32321,
+ "<ARAD>": 32160,
+ "<AR_0082>": 32544,
+ "<AS9>": 32104,
+ "<ATCC 13028>": 32106,
+ "<ATCC 39140>": 32502,
+ "<ATCC 43969>": 32339,
+ "<ATCC 51329>": 32307,
+ "<ATCC BAA-895>": 32630,
+ "<AVS0177>": 32225,
+ "<Annandia>": 32489,
+ "<ArsBeeUS>": 32401,
+ "<Arsenophonus apicola>": 32348,
+ "<Arsenophonus endosymbiont of Aleurodicus dispersus>": 32491,
+ "<Arsenophonus endosymbiont of Aphis craccivora>": 32266,
+ "<Arsenophonus nasoniae>": 32557,
+ "<Arsenophonus>": 32327,
+ "<Atlantibacter hermannii>": 32238,
+ "<Atlantibacter subterranea>": 32309,
+ "<Atlantibacter>": 32331,
+ "<BDA62-3>": 32621,
+ "<BHKY>": 32287,
+ "<BO-1>": 32245,
+ "<BPEN>": 32505,
+ "<BVAF>": 32573,
+ "<BY21311>": 32450,
+ "<Bacteria>": 32177,
+ "<Blochmannia endosymbiont of Camponotus (Colobopsis) obliquus>": 32448,
+ "<Blochmannia endosymbiont of Camponotus modoc>": 32328,
+ "<Blochmannia endosymbiont of Camponotus nipponensis>": 32387,
+ "<Blochmannia endosymbiont of Colobopsis nipponica>": 32315,
+ "<Blochmannia endosymbiont of Polyrhachis (Hedomyrma) turneri>": 32549,
+ "<Blochmannia>": 32369,
+ "<Brenneria goodwinii>": 32391,
+ "<Brenneria izadpanahii>": 32498,
+ "<Brenneria nigrifluens>": 32188,
+ "<Brenneria rubrifaciens>": 32606,
+ "<Brenneria ulupoensis>": 32201,
+ "<Brenneria>": 32425,
+ "<Bruguierivoracaceae>": 32364,
+ "<Buchnera aphidicola>": 32222,
+ "<Buchnera>": 32365,
+ "<Budviciaceae>": 32605,
+ "<Buttiauxella agrestis>": 32220,
+ "<Buttiauxella ferragutiae>": 32480,
+ "<Buttiauxella>": 32190,
+ "<C-002>": 32184,
+ "<C-005>": 32426,
+ "<C-006>": 32154,
+ "<C-050>": 32250,
+ "<C-7-2>": 32276,
+ "<CAVP490>": 32477,
+ "<CB>": 32452,
+ "<CCA6>": 32358,
+ "<CCUG 66741>": 32384,
+ "<CF-458>": 32284,
+ "<CFBP 3304>": 32371,
+ "<CFCC10813>": 32585,
+ "<CFPB1430>": 32470,
+ "<CFS1934>": 32507,
+ "<CQ10>": 32538,
+ "<CS-931>": 32146,
+ "<Candidatus Arsenophonus lipoptenae>": 32153,
+ "<Candidatus Blochmannia pennsylvanicus>": 32527,
+ "<Candidatus Blochmannia vafer>": 32255,
+ "<Candidatus Doolittlea endobia>": 32482,
+ "<Candidatus Fukatsuia symbiotica>": 32193,
+ "<Candidatus Gullanella endobia>": 32579,
+ "<Candidatus Hoaglandella endobia>": 32366,
+ "<Candidatus Mikella endobia>": 32342,
+ "<Candidatus Purcelliella pentastirinorum>": 32161,
+ "<Candidatus Riesia pediculicola>": 32445,
+ "<Candidatus Tachikawaea gelatinosa>": 32577,
+ "<Candidatus Westeberhardia cardiocondylae>": 32376,
+ "<Candidatus blochmannia chromaiodes>": 32570,
+ "<Candidatus ishikawaella capsulata>": 32155,
+ "<Candidatus moranella endobia>": 32159,
+ "<Candidatus sodalis pierantonius>": 32580,
+ "<Candidatus>": 32617,
+ "<Candidatus_antoea carbekii>": 32112,
+ "<Candidatus_ukatsuia>": 32240,
+ "<Cedecea lapagei>": 32624,
+ "<Cedecea neteri>": 32313,
+ "<Cedecea>": 32140,
+ "<Cf7303>": 32275,
+ "<Chania multitudinisentens>": 32337,
+ "<Chania>": 32282,
+ "<Citrobacter amalonaticus>": 32454,
+ "<Citrobacter arsenatis>": 32451,
+ "<Citrobacter braakii>": 32633,
+ "<Citrobacter freundii>": 32123,
+ "<Citrobacter koseri>": 32567,
+ "<Citrobacter portucalensis>": 32497,
+ "<Citrobacter rodentium>": 32363,
+ "<Citrobacter sedlakii>": 32340,
+ "<Citrobacter tructae>": 32598,
+ "<Citrobacter werkmanii>": 32403,
+ "<Citrobacter>": 32235,
+ "<Cp2>": 32380,
+ "<Cronobacter condimenti>": 32122,
+ "<Cronobacter dublinensis>": 32174,
+ "<Cronobacter malonaticus>": 32456,
+ "<Cronobacter muytjensii>": 32214,
+ "<Cronobacter sakazakii>": 32300,
+ "<Cronobacter universalis>": 32121,
+ "<Cronobacter>": 32209,
+ "<DH-S01>": 32243,
+ "<DSM 101947>": 32195,
+ "<DSM 102253>": 32421,
+ "<DSM 107547>": 32229,
+ "<DSM 15199>": 32476,
+ "<DSM 16636>": 32552,
+ "<DSM 16690>": 32145,
+ "<DSM 22758>": 32613,
+ "<DSM 32899>": 32548,
+ "<DSM 4481>": 32413,
+ "<DSM 4576>": 32111,
+ "<DSM 9389>": 32286,
+ "<Dickeya aquatica>": 32463,
+ "<Dickeya chrysanthemi>": 32137,
+ "<Dickeya dadantii>": 32223,
+ "<Dickeya dianthicola>": 32131,
+ "<Dickeya fangzhongdai>": 32267,
+ "<Dickeya parazeae>": 32461,
+ "<Dickeya poaceiphila>": 32152,
+ "<Dickeya solani>": 32170,
+ "<Dickeya zeae>": 32372,
+ "<Dickeya>": 32447,
+ "<Doolittlea>": 32215,
+ "<Duffyella gerundensis>": 32306,
+ "<Duffyella>": 32253,
+ "<EBP3064>": 32274,
+ "<EN-119>": 32562,
+ "<ERMR1:05>": 32490,
+ "<Eb661>": 32157,
+ "<Ech1591>": 32485,
+ "<Ech586>": 32346,
+ "<Ech703>": 32178,
+ "<Edwardsiella anguillarum>": 32234,
+ "<Edwardsiella hoshinae>": 32468,
+ "<Edwardsiella ictaluri>": 32410,
+ "<Edwardsiella piscicida>": 32280,
+ "<Edwardsiella tarda>": 32513,
+ "<Edwardsiella>": 32574,
+ "<Enterobacter asburiae>": 32236,
+ "<Enterobacter bugandensis>": 32607,
+ "<Enterobacter chengduensis>": 32555,
+ "<Enterobacter cloacae>": 32107,
+ "<Enterobacter hormaechei>": 32304,
+ "<Enterobacter huaxiensis>": 32294,
+ "<Enterobacter ludwigii>": 32571,
+ "<Enterobacter mori>": 32281,
+ "<Enterobacter oligotrophicus>": 32431,
+ "<Enterobacter pseudoroggenkampii>": 32465,
+ "<Enterobacter roggenkampii>": 32354,
+ "<Enterobacter sichuanensis>": 32241,
+ "<Enterobacter soli>": 32279,
+ "<Enterobacter>": 32224,
+ "<Enterobacterales>": 32635,
+ "<Enterobacteriaceae endosymbiont of Macroplea mutica>": 32486,
+ "<Enterobacteriaceae endosymbiont of Plateumaris pusilla>": 32433,
+ "<Enterobacteriaceae endosymbiont of_acroplea mutica>": 32409,
+ "<Enterobacteriaceae>": 32400,
+ "<EpK1/15>": 32386,
+ "<ErCicurvipes>": 32198,
+ "<Erwinia amylovora>": 32335,
+ "<Erwinia billingiae>": 32217,
+ "<Erwinia persicina>": 32568,
+ "<Erwinia pyrifoliae>": 32414,
+ "<Erwinia rhapontici>": 32408,
+ "<Erwinia sorbitola>": 32179,
+ "<Erwinia tasmaniensis>": 32189,
+ "<Erwinia tracheiphila>": 32277,
+ "<Erwinia>": 32546,
+ "<Erwiniaceae>": 32228,
+ "<Escherichia albertii>": 32484,
+ "<Escherichia coli >": 32212,
+ "<Escherichia fergusonii>": 32244,
+ "<Escherichia marmotae>": 32515,
+ "<Escherichia>": 32460,
+ "<Et1/99>": 32142,
+ "<FDAARGOS 1447>": 32231,
+ "<FDAARGOS_1499>": 32626,
+ "<FDAARGOS_165>": 32175,
+ "<FDAARGOS_186>": 32439,
+ "<FDAARGOS_392>": 32139,
+ "<FDAARGOS_408>": 32590,
+ "<FDAARGOS_500>": 32298,
+ "<FDAARGOS_616>": 32165,
+ "<FDAARGOS_730>": 32130,
+ "<FDAARGOS_926>": 32103,
+ "<FDAARGOS_940>": 32459,
+ "<FIN>": 32471,
+ "<FN20211>": 32581,
+ "<FRB141>": 32164,
+ "<FRB97>": 32351,
+ "<FRM16>": 32345,
+ "<FY-07>": 32338,
+ "<FY158>": 32148,
+ "<G5>": 32591,
+ "<G6>": 32499,
+ "<Gammaproteobacteria>": 32124,
+ "<Gibbsiella quercinecans>": 32397,
+ "<Gibbsiella>": 32584,
+ "<Gullanella>": 32258,
+ "<H4-C11>": 32604,
+ "<HI4320>": 32495,
+ "<HS11286>": 32323,
+ "<HS1>": 32488,
+ "<HYN0051>": 32329,
+ "<Hafnia alvei>": 32305,
+ "<Hafnia paralvei>": 32405,
+ "<Hafnia>": 32257,
+ "<Hafniaceae>": 32462,
+ "<Hoaglandella>": 32218,
+ "<IFB5427>": 32569,
+ "<IP32953>": 32105,
+ "<Iran 50>": 32582,
+ "<Ishikawaella>": 32554,
+ "<J780>": 32260,
+ "<JH01>": 32428,
+ "<JK2.1>": 32285,
+ "<JZ-GX1>": 32347,
+ "<JZB2120001>": 32389,
+ "<Jejubacter calystegiae>": 32210,
+ "<Jejubacter>": 32374,
+ "<K-12 substr. MG1655>": 32496,
+ "<K61>": 32110,
+ "<KACC 18508>": 32457,
+ "<KC-Pc-HB1>": 32566,
+ "<KMM821>": 32379,
+ "<KSNA2>": 32444,
+ "<KUDC3025>": 32186,
+ "<Ka37751>": 32472,
+ "<Kalro>": 32207,
+ "<Klebsiella aerogenes>": 32602,
+ "<Klebsiella africana>": 32319,
+ "<Klebsiella electrica>": 32283,
+ "<Klebsiella huaxiensis>": 32353,
+ "<Klebsiella michiganensis>": 32441,
+ "<Klebsiella oxytoca>": 32128,
+ "<Klebsiella pasteurii>": 32597,
+ "<Klebsiella pneumoniae>": 32265,
+ "<Klebsiella quasipneumoniae>": 32500,
+ "<Klebsiella variicola>": 32129,
+ "<Klebsiella>": 32615,
+ "<Kluyvera ascorbata>": 32362,
+ "<Kluyvera intermedia>": 32290,
+ "<Kluyvera>": 32611,
+ "<Kosakonia arachidis>": 32523,
+ "<Kosakonia cowanii>": 32325,
+ "<Kosakonia oryzae>": 32100,
+ "<Kosakonia oryzendophytica>": 32134,
+ "<Kosakonia pseudosacchari>": 32514,
+ "<Kosakonia radicincitans>": 32194,
+ "<Kosakonia sacchari>": 32125,
+ "<Kosakonia>": 32326,
+ "<KqPF26>": 32623,
+ "<L6>": 32603,
+ "<LEMB11>": 32518,
+ "<LF7a>": 32334,
+ "<LH84-a>": 32115,
+ "<LJ1>": 32436,
+ "<LMG 23823>": 32601,
+ "<LMG 23826>": 32543,
+ "<LMG 24197>": 32216,
+ "<LMG 24199>": 32310,
+ "<LMG 26250>": 32385,
+ "<LMG24200>": 32503,
+ "<LST-1>": 32430,
+ "<LT-1>": 32297,
+ "<LT2>": 32443,
+ "<LTYR-11Z>": 32422,
+ "<LY-1>": 32608,
+ "<Leclercia adecarboxylata>": 32564,
+ "<Leclercia pneumoniae>": 32269,
+ "<Leclercia>": 32204,
+ "<Lelliottia steviae>": 32539,
+ "<Lelliottia>": 32442,
+ "<Leminorella richardii>": 32427,
+ "<Leminorella>": 32271,
+ "<Limnobaculum parvum>": 32317,
+ "<Limnobaculum zhutongyuii>": 32248,
+ "<Limnobaculum>": 32226,
+ "<Lonsdalea britannica>": 32102,
+ "<Lonsdalea populi>": 32588,
+ "<Lonsdalea>": 32219,
+ "<Lsch>": 32143,
+ "<ME23>": 32199,
+ "<MS2>": 32589,
+ "<MiY-A>": 32494,
+ "<Mikella>": 32357,
+ "<Mixta gaviniae>": 32506,
+ "<Mixta hanseatica>": 32509,
+ "<Mixta intestinalis>": 32185,
+ "<Mixta>": 32394,
+ "<Moellerella wisconsensis>": 32246,
+ "<Moellerella>": 32202,
+ "<Moranella>": 32453,
+ "<Morganella morganii>": 32133,
+ "<Morganella>": 32432,
+ "<Morganellaceae>": 32318,
+ "<Mpkobe>": 32619,
+ "<Musicola paradisiaca>": 32542,
+ "<Musicola>": 32415,
+ "<N-5-1>": 32547,
+ "<N2-1>": 32583,
+ "<N268-08>": 32301,
+ "<NA>": 32171,
+ "<NCPPB 569>": 32356,
+ "<NCTC 14382>": 32303,
+ "<NCTC 9529>": 32213,
+ "<NCTC11466>": 32249,
+ "<NCTC12003>": 32360,
+ "<NCTC12148>": 32616,
+ "<NCTC12151>": 32474,
+ "<NCTC12284>": 32434,
+ "<NCTC13188>": 32437,
+ "<NIBIO1392>": 32181,
+ "<OLIH>": 32278,
+ "<Ola 51>": 32404,
+ "<PA13>": 32355,
+ "<PCVAL>": 32120,
+ "<PPO 9019>": 32438,
+ "<PR-310>": 32609,
+ "<PRI-2C>": 32233,
+ "<Pantoea agglomerans>": 32367,
+ "<Pantoea alfalfae>": 32101,
+ "<Pantoea alhagi>": 32191,
+ "<Pantoea ananatis>": 32550,
+ "<Pantoea deleyi>": 32370,
+ "<Pantoea dispersa>": 32594,
+ "<Pantoea eucalypti>": 32299,
+ "<Pantoea eucrina>": 32640,
+ "<Pantoea soli>": 32136,
+ "<Pantoea stewartii>": 32343,
+ "<Pantoea vagans>": 32627,
+ "<Pantoea>": 32537,
+ "<Pectobacteriaceae>": 32576,
+ "<Pectobacterium aquaticum>": 32176,
+ "<Pectobacterium aroidearum>": 32536,
+ "<Pectobacterium atrosepticum>": 32383,
+ "<Pectobacterium brasiliense>": 32556,
+ "<Pectobacterium cacticida>": 32205,
+ "<Pectobacterium carotovorum>": 32429,
+ "<Pectobacterium colocasium>": 32259,
+ "<Pectobacterium odoriferum>": 32359,
+ "<Pectobacterium parmentieri>": 32144,
+ "<Pectobacterium parvum>": 32563,
+ "<Pectobacterium polaris>": 32203,
+ "<Pectobacterium punjabense>": 32636,
+ "<Pectobacterium quasiaquaticum>": 32149,
+ "<Pectobacterium wasabiae>": 32417,
+ "<Pectobacterium>": 32593,
+ "<Photorhabdus akhurstii>": 32237,
+ "<Photorhabdus asymbiotica>": 32135,
+ "<Photorhabdus laumondii>": 32492,
+ "<Photorhabdus thracensis>": 32529,
+ "<Photorhabdus>": 32200,
+ "<Phytobacter diazotrophicus>": 32166,
+ "<Phytobacter>": 32147,
+ "<Plesiomonas shigelloides>": 32560,
+ "<Plesiomonas>": 32138,
+ "<Pluralibacter gergoviae>": 32586,
+ "<Pluralibacter>": 32578,
+ "<Pragia fontium>": 32475,
+ "<Pragia>": 32612,
+ "<Profftia>": 32406,
+ "<Proteus hauseri>": 32117,
+ "<Proteus mirabilis>": 32545,
+ "<Proteus penneri>": 32332,
+ "<Proteus terrae>": 32116,
+ "<Proteus>": 32610,
+ "<Providencia alcalifaciens>": 32520,
+ "<Providencia hangzhouensis>": 32629,
+ "<Providencia heimbachae>": 32467,
+ "<Providencia huaxiensis>": 32631,
+ "<Providencia rettgeri>": 32639,
+ "<Providencia stuartii>": 32251,
+ "<Providencia>": 32292,
+ "<Pseudocitrobacter corydidari>": 32504,
+ "<Pseudocitrobacter>": 32464,
+ "<Pseudomonadota>": 32420,
+ "<Purcelliella>": 32558,
+ "<RB-25>": 32525,
+ "<Rahnella aceris>": 32232,
+ "<Rahnella sikkimica>": 32416,
+ "<Rahnella victoriana>": 32440,
+ "<Rahnella>": 32390,
+ "<Raoultella planticola>": 32402,
+ "<Raoultella terrigena>": 32302,
+ "<Raoultella>": 32478,
+ "<Riesia>": 32158,
+ "<S07-698>": 32628,
+ "<S178-2>": 32192,
+ "<S1>": 32625,
+ "<S2-A69>": 32182,
+ "<SCPM-O-B-7604>": 32272,
+ "<SE6-1>": 32382,
+ "<SGAir0282>": 32458,
+ "<SII>": 32565,
+ "<SK>": 32126,
+ "<SNU WT2>": 32197,
+ "<SOPE>": 32273,
+ "<SRCM103226>": 32163,
+ "<SS95>": 32256,
+ "<SWHEFF_49>": 32109,
+ "<Sakai substr. RIMD 0509952>": 32263,
+ "<Salmonella bongori>": 32173,
+ "<Salmonella enterica>": 32435,
+ "<Salmonella>": 32264,
+ "<Sample 167>": 32291,
+ "<Sb-24>": 32533,
+ "<Scandinavium goeteborgense>": 32333,
+ "<Scandinavium>": 32141,
+ "<Schneideria>": 32618,
+ "<Serratia entomophila>": 32424,
+ "<Serratia ficaria>": 32614,
+ "<Serratia fonticola>": 32230,
+ "<Serratia inhibens>": 32368,
+ "<Serratia liquefaciens>": 32344,
+ "<Serratia nematodiphila>": 32620,
+ "<Serratia plymuthica>": 32156,
+ "<Serratia proteamaculans>": 32336,
+ "<Serratia quinivorans>": 32270,
+ "<Serratia rhizosphaerae>": 32418,
+ "<Serratia rubidaea>": 32211,
+ "<Serratia surfactantfaciens>": 32592,
+ "<Serratia symbiotica>": 32167,
+ "<Serratia ureilytica>": 32293,
+ "<Serratia>": 32541,
+ "<Shigella dysenteriae>": 32132,
+ "<Shigella flexneri>": 32516,
+ "<Shigella sonnei>": 32599,
+ "<Shigella>": 32481,
+ "<Shimwellia blattae>": 32469,
+ "<Shimwellia>": 32423,
+ "<Siccibacter colletis>": 32393,
+ "<Siccibacter>": 32183,
+ "<Sodalis endosymbiont of Henestaris halophilus>": 32487,
+ "<Sodalis glossinidius>": 32551,
+ "<Sodalis praecaptivus>": 32455,
+ "<Sodalis>": 32151,
+ "<SyEd1>": 32320,
+ "<Symbiopectobacterium purcellii>": 32483,
+ "<Symbiopectobacterium>": 32350,
+ "<T6>": 32377,
+ "<TA9759>": 32473,
+ "<TBY01>": 32361,
+ "<THO-011>": 32113,
+ "<TTO1>": 32540,
+ "<Tachikawaea>": 32528,
+ "<Tatumella citrea>": 32330,
+ "<Tatumella>": 32378,
+ "<Trabulsiella odontotermitis>": 32288,
+ "<Trabulsiella>": 32375,
+ "<US>": 32622,
+ "<USDA-ARS-USMARC-60222>": 32526,
+ "<USDA>": 32446,
+ "<UwTKB>": 32396,
+ "<VKH10>": 32553,
+ "<W65>": 32206,
+ "<WCHECl-C4 = WCHECh050004>": 32632,
+ "<WCHKl090001>": 32150,
+ "<WCHPr000369>": 32479,
+ "<WPP14>": 32242,
+ "<Westeberhardia>": 32308,
+ "<Wigglesworthia glossinidia>": 32535,
+ "<Wigglesworthia>": 32227,
+ "<Winslowiella toletana>": 32314,
+ "<Winslowiella>": 32252,
+ "<XL123>": 32637,
+ "<XL95>": 32247,
+ "<Xenorhabdus budapestensis>": 32466,
+ "<Xenorhabdus doucetiae>": 32316,
+ "<Xenorhabdus griffiniae>": 32532,
+ "<Xenorhabdus hominickii>": 32559,
+ "<Xenorhabdus nematophila>": 32531,
+ "<Xenorhabdus poinarii>": 32449,
+ "<Xenorhabdus>": 32600,
+ "<YD25>": 32561,
+ "<YF8>": 32169,
+ "<YRA>": 32381,
+ "<YSD YN2>": 32521,
+ "<Y_sim_228>": 32524,
+ "<Yersinia aldovae>": 32208,
+ "<Yersinia alsatica>": 32162,
+ "<Yersinia canariae>": 32295,
+ "<Yersinia hibernica>": 32412,
+ "<Yersinia intermedia>": 32398,
+ "<Yersinia mollaretii>": 32511,
+ "<Yersinia pestis>": 32572,
+ "<Yersinia pseudotuberculosis>": 32517,
+ "<Yersinia rohdei>": 32180,
+ "<Yersinia ruckeri>": 32587,
+ "<Yersinia similis>": 32530,
+ "<Yersinia>": 32322,
+ "<Yersiniaceae>": 32324,
+ "<ZJ-FGZX1>": 32373,
+ "<ZN2>": 32127,
+ "<[Enterobacter] lignolyticus>": 32119,
+ "<[Pantoea] beijingensis>": 32311,
+ "<morsitans>": 32634,
+ "<obscurior>": 32296,
+ "<secondary endosymbiont of Ctenarytaina eucalypti>": 32262,
+ "<secondary endosymbiont of Heteropsylla cubana>": 32108,
+ "<secondary endosymbiont of Trabutina mannipara>": 32172
+ }
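Review note: the mapping above assigns each bracketed taxonomy or strain label its own token id. A few entries look unusual, such as `"<>"` and `"<Escherichia coli >"` with a trailing space. The following is a minimal sketch, assuming only the JSON layout shown in the diff, of how such a mapping could be sanity-checked. The helper names `summarize_ids` and `suspicious_tokens` are hypothetical and not part of any library; the excerpt is a handful of entries copied from the diff, not the full file.

```python
import json

# Excerpt copied from the added_tokens.json diff above; the real file has many more entries.
ADDED_TOKENS_EXCERPT = """
{
  "<Kosakonia oryzae>": 32100,
  "<Pantoea alfalfae>": 32101,
  "<Lonsdalea britannica>": 32102,
  "<Escherichia coli >": 32212,
  "<>": 32522
}
"""


def summarize_ids(mapping):
    """Return basic statistics about the token ids in an added-tokens mapping."""
    ids = sorted(mapping.values())
    return {
        "count": len(ids),
        "min_id": ids[0],
        "max_id": ids[-1],
        "unique": len(set(ids)) == len(ids),
        # Contiguous means the ids fill the range [min_id, max_id] with no gaps.
        "contiguous": ids == list(range(ids[0], ids[0] + len(ids))),
    }


def suspicious_tokens(mapping):
    """List tokens whose text inside the angle brackets is empty or whitespace-padded."""
    flagged = []
    for token in mapping:
        wrapped = token.startswith("<") and token.endswith(">")
        inner = token[1:-1] if wrapped else token
        if inner == "" or inner != inner.strip():
            flagged.append(token)
    return flagged


added_tokens = json.loads(ADDED_TOKENS_EXCERPT)
summary = summarize_ids(added_tokens)
flagged = suspicious_tokens(added_tokens)
print(summary)
print(flagged)
```

Running the same two helpers on the full `added_tokens.json` would show whether the ids are unique and gap-free and which labels may need cleaning before training; whether the flagged entries are intentional is not stated in the commit.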
special_tokens_map.json ADDED
@@ -0,0 +1,107 @@
+ {
+ "additional_special_tokens": [
+ "<extra_id_0>",
+ "<extra_id_1>",
+ "<extra_id_2>",
+ "<extra_id_3>",
+ "<extra_id_4>",
+ "<extra_id_5>",
+ "<extra_id_6>",
+ "<extra_id_7>",
+ "<extra_id_8>",
+ "<extra_id_9>",
+ "<extra_id_10>",
+ "<extra_id_11>",
+ "<extra_id_12>",
+ "<extra_id_13>",
+ "<extra_id_14>",
+ "<extra_id_15>",
+ "<extra_id_16>",
+ "<extra_id_17>",
+ "<extra_id_18>",
+ "<extra_id_19>",
+ "<extra_id_20>",
+ "<extra_id_21>",
+ "<extra_id_22>",
+ "<extra_id_23>",
+ "<extra_id_24>",
+ "<extra_id_25>",
+ "<extra_id_26>",
+ "<extra_id_27>",
+ "<extra_id_28>",
+ "<extra_id_29>",
+ "<extra_id_30>",
+ "<extra_id_31>",
+ "<extra_id_32>",
+ "<extra_id_33>",
+ "<extra_id_34>",
+ "<extra_id_35>",
+ "<extra_id_36>",
+ "<extra_id_37>",
+ "<extra_id_38>",
+ "<extra_id_39>",
+ "<extra_id_40>",
+ "<extra_id_41>",
+ "<extra_id_42>",
+ "<extra_id_43>",
+ "<extra_id_44>",
+ "<extra_id_45>",
+ "<extra_id_46>",
+ "<extra_id_47>",
+ "<extra_id_48>",
+ "<extra_id_49>",
+ "<extra_id_50>",
+ "<extra_id_51>",
+ "<extra_id_52>",
+ "<extra_id_53>",
+ "<extra_id_54>",
+ "<extra_id_55>",
+ "<extra_id_56>",
+ "<extra_id_57>",
+ "<extra_id_58>",
+ "<extra_id_59>",
+ "<extra_id_60>",
+ "<extra_id_61>",
+ "<extra_id_62>",
+ "<extra_id_63>",
+ "<extra_id_64>",
+ "<extra_id_65>",
+ "<extra_id_66>",
+ "<extra_id_67>",
+ "<extra_id_68>",
+ "<extra_id_69>",
+ "<extra_id_70>",
+ "<extra_id_71>",
+ "<extra_id_72>",
+ "<extra_id_73>",
+ "<extra_id_74>",
+ "<extra_id_75>",
+ "<extra_id_76>",
+ "<extra_id_77>",
+ "<extra_id_78>",
+ "<extra_id_79>",
+ "<extra_id_80>",
+ "<extra_id_81>",
+ "<extra_id_82>",
+ "<extra_id_83>",
+ "<extra_id_84>",
+ "<extra_id_85>",
+ "<extra_id_86>",
+ "<extra_id_87>",
+ "<extra_id_88>",
+ "<extra_id_89>",
+ "<extra_id_90>",
+ "<extra_id_91>",
+ "<extra_id_92>",
+ "<extra_id_93>",
+ "<extra_id_94>",
+ "<extra_id_95>",
+ "<extra_id_96>",
+ "<extra_id_97>",
+ "<extra_id_98>",
+ "<extra_id_99>"
+ ],
+ "eos_token": "</s>",
+ "pad_token": "<pad>",
+ "unk_token": "<unk>"
+ }
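Review note: the special-tokens map above is fully regular, with one hundred sentinel tokens named `<extra_id_0>` through `<extra_id_99>` followed by the `eos`, `pad`, and `unk` entries. As a minimal sketch, the same structure can be rebuilt programmatically and compared against the committed file. The function name `build_special_tokens_map` is hypothetical and used only for illustration.

```python
import json


def build_special_tokens_map(num_sentinels=100):
    """Rebuild the structure shown in the special_tokens_map.json diff."""
    return {
        # Sentinel tokens are numbered from 0 up to num_sentinels - 1.
        "additional_special_tokens": [f"<extra_id_{i}>" for i in range(num_sentinels)],
        "eos_token": "</s>",
        "pad_token": "<pad>",
        "unk_token": "<unk>",
    }


special_tokens_map = build_special_tokens_map()
serialized = json.dumps(special_tokens_map, indent=2)
print(len(special_tokens_map["additional_special_tokens"]))
```

Loading the committed file with `json.load` and comparing it to `special_tokens_map` would confirm that no sentinel is missing or duplicated.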
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+ size 791656
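Review note: `spiece.model` is stored through Git LFS, so the diff shows only a pointer file with the spec version, a SHA-256 object id, and the size in bytes. The sketch below, using only the Python standard library, parses such a pointer and checks whether a downloaded blob matches it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical; the pointer text is copied from the diff.

```python
import hashlib

# Pointer text copied from the spiece.model diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes in `data` have the size and SHA-256 digest the pointer records."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("this sketch only handles sha256 object ids")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])
```

After fetching the real file, reading it in binary mode and passing the bytes to `matches_pointer` would confirm that the download is the object this commit refers to.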
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff