iszoke committed · Commit ff1ab6b (verified) · 1 parent: 3985846

Upload tokenizer

Files changed (4)
  1. README.md +199 -0
  2. special_tokens_map.json +8 -0
  3. tokenizer.json +2144 -0
  4. tokenizer_config.json +61 -0
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
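Review note: the added `README.md` appears to be the unmodified auto-generated template, so most sections still carry the `[More Information Needed]` placeholder. A minimal sketch for listing which headings are still unfilled, assuming the card text is available as a string; the helper name `unfilled_sections` and the sample text are hypothetical and not part of this commit.

```python
import re

# Placeholder string used throughout the auto-generated model card.
PLACEHOLDER = "[More Information Needed]"


def unfilled_sections(card_text):
    """Return headings whose section still contains the template placeholder."""
    unfilled = []
    heading = None
    for line in card_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # Remember the most recent Markdown heading.
            heading = match.group(2).strip()
        elif PLACEHOLDER in line and heading is not None and heading not in unfilled:
            unfilled.append(heading)
    return unfilled


# Hypothetical excerpt in the same shape as the card added in this commit.
sample = """## Uses

### Direct Use

[More Information Needed]

## Environmental Impact

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
"""

result = unfilled_sections(sample)
print(result)
```

Running the same function over the full card text would give a quick checklist of sections to fill in before the repository is shared more widely.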
special_tokens_map.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "bos_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "pad_token": "<pad>",
+ "sep_token": "▁",
+ "unk_token": "<unk>"
+ }
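Review note: `sep_token` is set to `▁` (U+2581), which is also the `replacement` character of the `Metaspace` pre-tokenizer and decoder in the `tokenizer.json` added by this commit. The commit does not say whether that overlap is intentional. A minimal sketch, using only the values shown in this diff, that flags special tokens equal to the Metaspace replacement character; the helper name `tokens_clashing_with_metaspace` is hypothetical.

```python
# Contents of special_tokens_map.json as added in this commit.
SPECIAL_TOKENS_MAP = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "mask_token": "<mask>",
    "pad_token": "<pad>",
    "sep_token": "▁",
    "unk_token": "<unk>",
}

# "replacement" value of the Metaspace pre_tokenizer/decoder in tokenizer.json.
METASPACE_REPLACEMENT = "▁"


def tokens_clashing_with_metaspace(special_tokens, replacement=METASPACE_REPLACEMENT):
    """Return the names of special tokens whose text equals the Metaspace marker."""
    return sorted(name for name, token in special_tokens.items() if token == replacement)


clashes = tokens_clashing_with_metaspace(SPECIAL_TOKENS_MAP)
print(clashes)
```

If the overlap is not intended, one option would be to pick a dedicated separator string; that would be a change to the tokenizer and is outside what this commit shows.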
tokenizer.json ADDED
@@ -0,0 +1,2144 @@
+ {
+ "version": "1.0",
+ "truncation": null,
+ "padding": null,
+ "added_tokens": [
+ {
+ "id": 0,
+ "content": "<s>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 1,
+ "content": "</s>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 2,
+ "content": "<unk>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 3,
+ "content": "<pad>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 4,
+ "content": "<mask>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 5,
+ "content": "▁",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ }
+ ],
+ "normalizer": null,
+ "pre_tokenizer": {
+ "type": "Metaspace",
+ "replacement": "▁",
+ "add_prefix_space": true,
+ "prepend_scheme": "always"
+ },
+ "post_processor": {
+ "type": "TemplateProcessing",
+ "single": [
+ {
+ "Sequence": {
+ "id": "A",
+ "type_id": 0
+ }
+ },
+ {
+ "SpecialToken": {
+ "id": "</s>",
+ "type_id": 0
+ }
+ }
+ ],
+ "pair": [
+ {
+ "Sequence": {
+ "id": "A",
+ "type_id": 0
+ }
+ },
+ {
+ "SpecialToken": {
+ "id": "</s>",
+ "type_id": 0
+ }
+ },
+ {
+ "Sequence": {
+ "id": "B",
+ "type_id": 1
+ }
+ },
+ {
+ "SpecialToken": {
+ "id": "</s>",
+ "type_id": 1
+ }
+ }
+ ],
+ "special_tokens": {
+ "</s>": {
+ "id": "</s>",
+ "ids": [
+ 1
+ ],
+ "tokens": [
+ "</s>"
+ ]
+ },
+ "<s>": {
+ "id": "<s>",
+ "ids": [
+ 0
+ ],
+ "tokens": [
+ "<s>"
+ ]
+ }
+ }
+ },
+ "decoder": {
+ "type": "Metaspace",
+ "replacement": "▁",
+ "add_prefix_space": true,
+ "prepend_scheme": "always"
+ },
+ "model": {
+ "type": "Unigram",
+ "unk_id": 2,
+ "vocab": [
+ [
+ "<s>",
+ 0.0
+ ],
+ [
+ "</s>",
+ 0.0
+ ],
+ [
+ "<unk>",
+ 0.0
+ ],
+ [
+ "<pad>",
+ 0.0
+ ],
+ [
+ "<mask>",
+ 0.0
+ ],
+ [
+ "▁",
+ -1.851753745508601
+ ],
+ [
+ ".",
+ -3.467229702727165
+ ],
+ [
+ ",",
+ -3.4994580525827654
+ ],
+ [
+ "i",
+ -3.501997281321488
+ ],
+ [
+ "s",
+ -3.6675149287765354
+ ],
+ [
+ "e",
+ -3.740981327672804
+ ],
+ [
+ "t",
+ -3.756271820779278
+ ],
+ [
+ "a",
+ -3.851924688587939
+ ],
+ [
+ "d",
+ -3.9663641245891097
+ ],
+ [
+ "l",
+ -4.187184666104034
+ ],
+ [
+ "u",
+ -4.362726253174099
+ ],
+ [
+ "n",
+ -4.365281003269589
+ ],
+ [
+ "te",
+ -4.465178781498764
+ ],
+ [
+ "▁on",
+ -4.485617356589534
+ ],
+ [
+ "ma",
+ -4.507881482554289
+ ],
+ [
+ "ta",
+ -4.577405199912896
+ ],
+ [
+ "▁ja",
+ -4.659369622365345
+ ],
+ [
+ "le",
+ -4.754738949830481
+ ],
+ [
+ "da",
+ -4.763160408595947
+ ],
+ [
+ "k",
+ -4.764517002378536
+ ],
+ [
+ "se",
+ -4.82558826621422
+ ],
+ [
+ "id",
+ -4.839389364800779
+ ],
+ [
+ "▁et",
+ -4.863079140826789
+ ],
+ [
+ "o",
+ -4.868643884100571
+ ],
+ [
+ "me",
+ -4.88054197361431
+ ],
+ [
+ "b",
+ -4.893280824678866
+ ],
+ [
+ "ks",
+ -4.920152635536866
+ ],
+ [
+ "st",
+ -4.924191079610448
+ ],
+ [
+ "h",
+ -4.932466993741617
+ ],
+ [
+ "ne",
+ -5.01631846222519
+ ],
+ [
+ "r",
+ -5.023747986110356
+ ],
+ [
+ "m",
+ -5.109002791615339
+ ],
+ [
+ "va",
+ -5.12864421288047
+ ],
+ [
+ "▁ka",
+ -5.229718375045312
+ ],
+ [
+ "▁me",
+ -5.2748688708727265
+ ],
+ [
+ "tu",
+ -5.276698600171198
+ ],
+ [
+ "ra",
+ -5.393875035872222
+ ],
+ [
+ "unk",
+ -5.439702709633529
+ ],
+ [
+ "ga",
+ -5.450489872645525
+ ],
+ [
+ "▁Euroopa",
+ -5.471992917432958
+ ],
+ [
+ ">",
+ -5.474539494326656
+ ],
+ [
+ "<",
+ -5.474539494326656
+ ],
+ [
+ "us",
+ -5.505121366103829
+ ],
+ [
+ "el",
+ -5.510869021953342
+ ],
+ [
+ "▁selle",
+ -5.515787923126501
+ ],
+ [
+ "gi",
+ -5.518294934145231
+ ],
+ [
+ "ja",
+ -5.5639407280087365
+ ],
+ [
+ "▁see",
+ -5.580429802191812
+ ],
+ [
+ "ri",
+ -5.58303685327208
+ ],
+ [
+ "▁ei",
+ -5.602513947003798
+ ],
+ [
+ "is",
+ -5.608091010654608
+ ],
+ [
+ "ku",
+ -5.628117948676826
+ ],
+ [
+ "ü",
+ -5.641484201038043
+ ],
+ [
+ "li",
+ -5.662861124299768
+ ],
+ [
+ "ä",
+ -5.668312759894716
+ ],
+ [
+ "al",
+ -5.720288671730149
+ ],
+ [
+ "ge",
+ -5.749641201899214
+ ],
+ [
+ "vad",
+ -5.758167850952873
+ ],
+ [
+ "nud",
+ -5.768339084430565
+ ],
+ [
+ "na",
+ -5.773048422065626
+ ],
+ [
+ "re",
+ -5.815876593305079
+ ],
+ [
+ "õ",
+ -5.8345310774588235
+ ],
+ [
+ "v",
+ -5.846347926959826
+ ],
+ [
+ "mi",
+ -5.861293593653736
+ ],
+ [
+ "es",
+ -5.868850711872177
+ ],
+ [
+ "▁kui",
+ -5.878640833097753
+ ],
+ [
+ "ka",
+ -5.90206827934912
+ ],
+ [
+ "p",
+ -5.906236406204082
+ ],
+ [
+ "tud",
+ -5.921101614937527
+ ],
+ [
+ "as",
+ -5.926380765302193
+ ],
+ [
+ "ni",
+ -5.936832847802426
+ ],
+ [
+ "mu",
+ -6.00104349505018
+ ],
+ [
+ "▁meie",
+ -6.006002804792711
+ ],
+ [
+ "oo",
+ -6.054297433851142
+ ],
+ [
+ "ar",
+ -6.059389661732906
+ ],
+ [
+ "ju",
+ -6.066723831028785
+ ],
+ [
+ "g",
+ -6.06972031845312
+ ],
+ [
+ "in",
+ -6.097018886344532
+ ],
+ [
+ "▁p",
+ -6.105883135456947
+ ],
+ [
+ "▁või",
+ -6.113215945750243
+ ],
+ [
+ "▁ko",
+ -6.1567957224840075
+ ],
+ [
+ "▁mis",
+ -6.163709356807537
+ ],
+ [
+ "▁seda",
+ -6.181008954335827
+ ],
+ [
+ "▁saa",
+ -6.191139071265793
+ ],
+ [
+ "gu",
+ -6.191471923067434
+ ],
+ [
+ "val",
+ -6.2173714816893995
+ ],
+ [
+ "f",
+ -6.220665964968466
+ ],
+ [
+ "▁meil",
+ -6.23212090037727
+ ],
+ [
+ "ning",
+ -6.237519418960195
+ ],
+ [
+ "▁ole",
+ -6.2479564513302694
+ ],
+ [
+ "▁suur",
+ -6.252243524612862
+ ],
+ [
+ "mise",
+ -6.265632913496582
+ ],
+ [
+ "sed",
+ -6.27064974762089
+ ],
+ [
+ "he",
+ -6.304391835981752
+ ],
+ [
+ "lt",
+ -6.335409182785835
+ ],
+ [
+ "ke",
+ -6.3636196051178615
+ ],
+ [
+ "an",
+ -6.384023554179979
+ ],
+ [
+ "▁oma",
+ -6.398085477822427
+ ],
+ [
+ "▁kõik",
+ -6.39996827514852
+ ],
+ [
+ "K",
+ -6.414973586325018
+ ],
+ [
+ "▁kes",
+ -6.421202461615208
+ ],
+ [
+ "▁siis",
+ -6.444601657873719
+ ],
+ [
+ "▁väga",
+ -6.454035424594898
+ ],
+ [
+ "hi",
+ -6.458606115690512
+ ],
+ [
+ "-",
+ -6.477737902834786
+ ],
+ [
+ "▁aga",
+ -6.4817659799717315
+ ],
+ [
+ "▁vastu",
+ -6.485698390203481
+ ],
+ [
+ "▁mõ",
+ -6.50483048310699
+ ],
+ [
+ "▁peame",
+ -6.530990629241519
+ ],
+ [
+ "and",
+ -6.533164741058254
+ ],
+ [
+ "vi",
+ -6.5462373497933415
+ ],
+ [
+ "õigus",
+ -6.551011238509859
+ ],
+ [
+ "▁inim",
+ -6.5526255794563255
+ ],
+ [
+ "ed",
+ -6.55349251836417
+ ],
+ [
+ "il",
+ -6.556444293553909
+ ],
+ [
+ "ega",
+ -6.569508682508571
+ ],
+ [
+ "▁aitäh",
+ -6.5696686580664645
+ ],
+ [
+ "mine",
+ -6.577925306447063
+ ],
+ [
+ "on",
+ -6.588528368804588
+ ],
+ [
+ "ki",
+ -6.5992733330127695
+ ],
+ [
+ "▁nüüd",
+ -6.609780744644912
+ ],
+ [
+ "▁nii",
+ -6.616353848353142
+ ],
+ [
+ "▁nä",
+ -6.623407421551942
+ ],
+ [
+ "or",
+ -6.623922180093733
+ ],
+ [
+ "▁tei",
+ -6.636676590850732
+ ],
+ [
+ "enda",
+ -6.638791613354643
+ ],
+ [
+ "ve",
+ -6.6415748515454665
+ ],
+ [
+ "▁üle",
+ -6.647944470101434
+ ],
+ [
+ "▁töö",
+ -6.654199659773511
+ ],
+ [
+ "ru",
+ -6.716031688131752
+ ],
+ [
+ "▁Liidu",
+ -6.723786818500097
+ ],
+ [
+ "▁mida",
+ -6.732665213219939
+ ],
+ [
+ "tus",
+ -6.735612527529548
+ ],
+ [
+ "▁jä",
+ -6.738229195594432
+ ],
+ [
+ "ik",
+ -6.7449176494288325
+ ],
+ [
+ "ro",
+ -6.748206833663559
+ ],
+ [
+ "vä",
+ -6.751940318635054
+ ],
+ [
+ "▁vaja",
+ -6.7524525791053005
+ ],
+ [
+ "▁siin",
+ -6.766116744429205
+ ],
+ [
+ "des",
+ -6.76747226826607
+ ],
+ [
+ "eeri",
+ -6.771378098927969
+ ],
+ [
+ "▁kü",
+ -6.7782718395747175
+ ],
+ [
+ "lis",
+ -6.781678415088463
+ ],
+ [
+ "võ",
+ -6.784203171074139
+ ],
+ [
+ "tada",
+ -6.794105292934015
+ ],
+ [
+ "▁tõ",
+ -6.796936419626327
+ ],
+ [
+ "1",
+ -6.797587845662779
+ ],
+ [
+ "▁sa",
+ -6.809845522349655
+ ],
+ [
+ "0",
+ -6.81883730271349
+ ],
+ [
+ "▁nende",
+ -6.820569614841288
+ ],
+ [
+ "lik",
+ -6.8234168942644144
+ ],
+ [
+ "ha",
+ -6.828330361919034
+ ],
+ [
+ "võt",
+ -6.8432293260622625
+ ],
+ [
+ "▁sõna",
+ -6.8451083900236664
+ ],
+ [
+ "liku",
+ -6.849308105959519
+ ],
+ [
+ "selt",
+ -6.851119605223998
+ ],
+ [
+ "SP",
+ -6.860939821102747
+ ],
+ [
+ "(",
+ -6.861035166200734
+ ],
+ [
+ ")",
+ -6.861035166200734
+ ],
+ [
+ "takse",
+ -6.8672976844058535
+ ],
+ [
+ "▁tä",
+ -6.871962385337401
+ ],
+ [
+ "sti",
+ -6.873892877814959
+ ],
+ [
+ "lus",
+ -6.890947736544694
+ ],
+ [
+ "pi",
+ -6.894817259412826
+ ],
+ [
+ "▁pea",
+ -6.905637081122985
+ ],
+ [
+ "▁aasta",
+ -6.922647426283789
+ ],
+ [
+ "sime",
+ -6.9267490837436725
+ ],
+ [
+ "ko",
+ -6.927167665828413
+ ],
+ [
+ "▁la",
+ -6.928952996218759
+ ],
+ [
+ "mist",
+ -6.951943978555935
+ ],
+ [
+ "▁loo",
+ -6.96356743182482
+ ],
+ [
+ "▁oluli",
+ -6.969273116086222
+ ],
+ [
+ "tele",
+ -6.9700193574881935
+ ],
+ [
+ "j",
+ -6.996019995014971
+ ],
+ [
+ "S",
+ -7.004659171302743
+ ],
+ [
+ "▁need",
+ -7.006300481384357
+ ],
+ [
+ "▁osa",
+ -7.010837211050541
+ ],
+ [
+ "2",
+ -7.0130916295435615
+ ],
+ [
+ "▁oleme",
+ -7.015688969173557
+ ],
+ [
+ "tsi",
+ -7.015706233633436
+ ],
+ [
+ "▁täna",
+ -7.025908723203951
+ ],
+ [
+ "▁pro",
+ -7.031902686171742
+ ],
+ [
+ "ati",
+ -7.035389283833982
+ ],
+ [
+ "pe",
+ -7.041590457263219
+ ],
+ [
+ "sid",
+ -7.0665264828464265
+ ],
+ [
+ "pa",
+ -7.073622922078695
+ ],
+ [
+ "▁tea",
+ -7.083996165068698
+ ],
+ [
+ "▁tule",
+ -7.08548354281601
+ ],
+ [
+ "▁sõ",
+ -7.112099118601101
+ ],
+ [
+ "▁hea",
+ -7.124428100537737
+ ],
+ [
+ "▁taga",
+ -7.126201136801537
+ ],
+ [
+ "▁palju",
+ -7.140501730475849
+ ],
+ [
+ "pool",
+ -7.143088797199312
+ ],
+ [
+ "öö",
+ -7.146652842804281
+ ],
+ [
+ "▁muu",
+ -7.1473409422107474
+ ],
+ [
+ "siooni",
+ -7.16538232466433
+ ],
+ [
+ "▁tuleb",
+ -7.176750279820614
+ ],
+ [
+ "▁praegu",
+ -7.1832214613862435
+ ],
+ [
+ "▁su",
+ -7.1904916385105375
+ ],
+ [
+ "▁olema",
+ -7.190648510025193
+ ],
+ [
+ "▁kaits",
+ -7.2038243185739095
+ ],
+ [
+ "▁eest",
+ -7.20615783878872
+ ],
+ [
+ "▁neid",
+ -7.222820865407993
+ ],
+ [
+ "▁soovi",
+ -7.225624764685974
+ ],
+ [
+ "sest",
+ -7.229534493929964
+ ],
+ [
+ "är",
+ -7.230920200661927
+ ],
+ [
+ "▁mitte",
+ -7.232771879165971
+ ],
+ [
+ "▁palun",
+ -7.2563375806500625
+ ],
+ [
+ "pärast",
+ -7.257018987890197
+ ],
+ [
+ "test",
+ -7.2590805376033485
+ ],
+ [
+ "▁tänu",
+ -7.260628020577477
+ ],
+ [
+ "A",
+ -7.265024834498487
+ ],
+ [
+ "▁vii",
+ -7.276656141623064
+ ],
+ [
+ "▁nad",
+ -7.277285758604584
+ ],
+ [
+ "mo",
+ -7.280006686594515
+ ],
+ [
+ "c",
+ -7.288622351775304
+ ],
+ [
+ "▁kasuta",
+ -7.306931739243767
+ ],
+ [
+ "▁saab",
+ -7.307567868517468
+ ],
+ [
+ "▁oli",
+ -7.308193303449649
+ ],
+ [
+ "konna",
+ -7.3112564383056355
+ ],
+ [
+ "▁välja",
+ -7.31533799901075
+ ],
+ [
+ "▁oleks",
+ -7.320820851170182
+ ],
+ [
+ "▁piir",
+ -7.330656839999385
+ ],
+ [
+ "sin",
+ -7.345395297260704
+ ],
+ [
+ "L",
+ -7.353228209786732
+ ],
+ [
+ "▁kuida",
+ -7.356606133286253
+ ],
+ [
+ "▁härra",
+ -7.359443632454275
+ ],
+ [
+ "▁selli",
+ -7.359715942471686
+ ],
+ [
+ "▁peaks",
+ -7.363189123734943
+ ],
+ [
+ "▁üks",
+ -7.365198697015411
+ ],
+ [
+ "▁president",
+ -7.37096632050233
+ ],
+ [
+ "tuse",
+ -7.371661403476696
+ ],
+ [
+ "▁komisjon",
+ -7.377244368853256
+ ],
+ [
+ "lise",
+ -7.397653388934487
+ ],
+ [
+ "▁taha",
+ -7.398481428668544
+ ],
+ [
+ "5",
+ -7.403681633031638
+ ],
+ [
+ "▁tee",
+ -7.4165127500234975
+ ],
+ [
+ "po",
+ -7.430886046958901
+ ],
+ [
+ "koha",
+ -7.434542691659096
+ ],
+ [
+ "▁kolleeg",
+ -7.444400046789832
+ ],
+ [
+ "▁kuu",
+ -7.448341189161981
+ ],
+ [
+ "B",
+ -7.456799852463824
+ ],
+ [
+ "T",
+ -7.459726683589317
+ ],
+ [
+ "port",
+ -7.459804963676152
+ ],
+ [
+ "▁nagu",
+ -7.463630500885808
+ ],
+ [
+ "▁veel",
+ -7.467243596592985
+ ],
+ [
+ "▁mille",
+ -7.4716779770900255
+ ],
+ [
+ "jõu",
+ -7.476650550787945
+ ],
+ [
+ "▁räägi",
+ -7.487465050234481
+ ],
+ [
+ "▁teie",
+ -7.498282299910474
+ ],
+ [
+ "?",
+ -7.501628071497216
+ ],
+ [
+ "kond",
+ -7.5072838761483425
+ ],
+ [
+ "▁nõu",
+ -7.508347581153869
+ ],
+ [
+ "▁hä",
+ -7.509940178394881
+ ],
+ [
+ "▁peab",
+ -7.514157530124637
+ ],
+ [
+ "ö",
+ -7.529977334542933
+ ],
+ [
+ "▁kus",
+ -7.530942437413842
+ ],
+ [
+ "olla",
+ -7.535484573683446
+ ],
+ [
+ "▁läh",
+ -7.5368917384336
+ ],
+ [
+ "▁volinik",
+ -7.538996196345765
+ ],
+ [
+ "▁juba",
+ -7.543344092975019
+ ],
+ [
+ "▁ainul",
+ -7.54538232221728
+ ],
+ [
+ "äär",
+ -7.550503153695386
+ ],
+ [
+ "M",
+ -7.556063273077461
+ ],
+ [
+ "▁ühe",
+ -7.558288010617597
+ ],
+ [
+ "▁järgmise",
+ -7.564996726088508
+ ],
+ [
+ "▁Liit",
+ -7.583344400869443
+ ],
+ [
+ "P",
+ -7.583523662988431
+ ],
+ [
+ "▁ütle",
+ -7.594689158795801
+ ],
+ [
+ "▁toeta",
+ -7.598975914976487
+ ],
+ [
+ "▁võimalik",
+ -7.606378339922619
+ ],
+ [
+ "▁20",
+ -7.608263328954198
+ ],
+ [
+ "▁samut",
+ -7.612629051004373
+ ],
+ [
+ "3",
+ -7.618201944282982
+ ],
+ [
+ "▁teha",
+ -7.625687640736374
+ ],
+ [
+ "!",
+ -7.64253686197588
+ ],
+ [
+ "ndus",
+ -7.652811125253395
+ ],
+ [
+ "▁tagasi",
+ -7.655001451281471
+ ],
+ [
+ "▁majandus",
+ -7.6555310358587585
+ ],
+ [
+ "▁pan",
+ -7.662817092027097
+ ],
+ [
+ "▁lugupee",
+ -7.663877349598068
+ ],
+ [
+ "endi",
+ -7.6682838102827695
+ ],
+ [
+ "järg",
+ -7.6702731909997715
+ ],
+ [
+ "R",
+ -7.677146517406778
+ ],
+ [
+ "▁proua",
+ -7.683755633695213
+ ],
+ [
+ "▁rohkem",
+ -7.6918258109590125
+ ],
+ [
+ "▁rääki",
+ -7.69552943542292
+ ],
+ [
+ "poliitika",
+ -7.699246862153686
+ ],
+ [
+ "▁kee",
+ -7.71035081838196
+ ],
+ [
+ "▁vasta",
+ -7.720792627568789
+ ],
+ [
+ "▁minu",
+ -7.721313058233462
+ ],
+ [
+ "▁nimel",
+ -7.725999039539368
+ ],
+ [
+ "C",
+ -7.728216138002894
+ ],
+ [
+ "▁jaok",
+ -7.742613687583955
+ ],
+ [
+ "E",
+ -7.747571578223431
+ ],
+ [
+ "riikide",
+ -7.748921592961078
+ ],
+ [
+ "▁ära",
+ -7.751550272045874
+ ],
+ [
+ "▁liikmesriikide",
+ -7.763337381722763
+ ],
+ [
+ "otsus",
+ -7.768813582554823
+ ],
+ [
+ "abi",
+ -7.77460317815571
+ ],
+ [
+ "päev",
+ -7.777166016853058
+ ],
+ [
+ "▁läbi",
+ -7.786093634869793
+ ],
+ [
+ "I",
+ -7.7888008157487505
+ ],
+ [
+ "▁ühis",
+ -7.793128995786816
+ ],
+ [
+ "H",
+ -7.7983498420922945
+ ],
+ [
+ "kord",
+ -7.802438609915359
+ ],
+ [
+ "▁raha",
+ -7.817724970376148
+ ],
+ [
+ "V",
+ -7.817728438029739
+ ],
+ [
+ "▁kodanike",
+ -7.82474151720375
+ ],
+ [
+ "▁võimalus",
+ -7.8248436973938915
+ ],
+ [
+ "pp",
+ -7.828948341425596
+ ],
+ [
+ "▁kaasa",
+ -7.838860701915816
+ ],
+ [
+ "▁üht",
+ -7.84927175024195
+ ],
+ [
+ "▁toimu",
+ -7.851852118476616
+ ],
+ [
+ "▁arutelu",
+ -7.854744853360222
+ ],
+ [
+ "▁edasi",
+ -7.865099837176058
+ ],
+ [
+ "▁tahaksin",
+ -7.866454614107026
+ ],
+ [
+ "y",
+ -7.870825558682595
+ ],
+ [
+ "kriis",
+ -7.872797176517786
+ ],
+ [
+ "tiiv",
+ -7.873766919130496
+ ],
+ [
+ "▁selge",
+ -7.887491094188169
+ ],
+ [
+ "4",
+ -7.902265268031748
+ ],
+ [
+ "tegevus",
+ -7.905317747481934
+ ],
+ [
+ "▁riigi",
+ -7.910124078416441
+ ],
+ [
+ "▁koostöö",
+ -7.933171945802684
+ ],
+ [
+ "▁näite",
+ -7.93767512556262
+ ],
+ [
+ "▁Ukraina",
+ -7.944196869253502
+ ],
+ [
+ "sugu",
+ -7.949664602550985
+ ],
+ [
+ "▁kõigi",
+ -7.960185783033719
+ ],
+ [
+ "▁kõr",
+ -7.96187865951702
+ ],
+ [
+ "z",
+ -7.966651696233292
+ ],
+ [
+ "▁Poola",
+ -7.983005722600312
+ ],
+ [
+ "▁parem",
+ -7.983448126209992
+ ],
+ [
+ "▁minuti",
+ -7.984012246543839
+ ],
+ [
+ "▁olnud",
+ -7.996454589141428
+ ],
+ [
+ "▁austa",
+ -7.998598146175586
+ ],
+ [
+ "O",
+ -8.002989294442951
+ ],
+ [
+ "▁olukord",
+ -8.01143752014794
+ ],
+ [
+ "▁noor",
+ -8.013685299347088
+ ],
+ [
+ "▁sotsiaal",
+ -8.025100139064476
+ ],
+ [
+ "▁tegutse",
+ -8.025101742197556
+ ],
+ [
+ "▁erineva",
+ -8.026841668408755
+ ],
+ [
+ "▁tegemist",
+ -8.032021278940359
+ ],
+ [
+ "'",
+ -8.033734663387879
+ ],
+ [
+ "aalse",
+ -8.044720379778864
+ ],
+ [
+ "süsteem",
+ -8.058321060568908
+ ],
+ [
+ "▁koos",
+ -8.065907559114482
+ ],
+ [
+ "vahend",
+ -8.070013478907969
+ ],
+ [
+ "D",
+ -8.074448240312734
+ ],
+ [
1558
+ "▁kiire",
1559
+ -8.080155939335281
1560
+ ],
1561
+ [
1562
+ "kirja",
1563
+ -8.085483843387486
1564
+ ],
1565
+ [
1566
+ "▁kõne",
1567
+ -8.09645824059033
1568
+ ],
1569
+ [
1570
+ "7",
1571
+ -8.098215348761183
1572
+ ],
1573
+ [
1574
+ "▁maailma",
1575
+ -8.116892680262733
1576
+ ],
1577
+ [
1578
+ "▁mitme",
1579
+ -8.130478337180136
1580
+ ],
1581
+ [
1582
+ "▁tegema",
1583
+ -8.132322638313033
1584
+ ],
1585
+ [
1586
+ "juht",
1587
+ -8.133910854692843
1588
+ ],
1589
+ [
1590
+ "ettepaneku",
1591
+ -8.153060943171713
1592
+ ],
1593
+ [
1594
+ "▁enam",
1595
+ -8.153496259606413
1596
+ ],
1597
+ [
1598
+ "▁tervis",
1599
+ -8.16716280449062
1600
+ ],
1601
+ [
1602
+ "▁tuleviku",
1603
+ -8.17312604590213
1604
+ ],
1605
+ [
1606
+ "N",
1607
+ -8.179106371392958
1608
+ ],
1609
+ [
1610
+ "▁vaba",
1611
+ -8.18554695203786
1612
+ ],
1613
+ [
1614
+ "U",
1615
+ -8.195284674981874
1616
+ ],
1617
+ [
1618
+ "▁liikme",
1619
+ -8.195299977087725
1620
+ ],
1621
+ [
1622
+ "▁liikmesriigi",
1623
+ -8.205530678548651
1624
+ ],
1625
+ [
1626
+ "väärtus",
1627
+ -8.207636073053033
1628
+ ],
1629
+ [
1630
+ "▁poliitili",
1631
+ -8.211729029904978
1632
+ ],
1633
+ [
1634
+ "▁tegelikul",
1635
+ -8.211731546539438
1636
+ ],
1637
+ [
1638
+ "6",
1639
+ -8.213803717507972
1640
+ ],
1641
+ [
1642
+ "9",
1643
+ -8.215882719586974
1644
+ ],
1645
+ [
1646
+ "▁parlamendi",
1647
+ -8.239041347186697
1648
+ ],
1649
+ [
1650
+ "▁kell",
1651
+ -8.245126955724867
1652
+ ],
1653
+ [
1654
+ "▁saavuta",
1655
+ -8.254082999898273
1656
+ ],
1657
+ [
1658
+ "▁naiste",
1659
+ -8.256518482578183
1660
+ ],
1661
+ [
1662
+ "▁milli",
1663
+ -8.270065231048854
1664
+ ],
1665
+ [
1666
+ "G",
1667
+ -8.273714656313357
1668
+ ],
1669
+ [
1670
+ "▁meetme",
1671
+ -8.27824204605172
1672
+ ],
1673
+ [
1674
+ "▁pandeemia",
1675
+ -8.284801241894122
1676
+ ],
1677
+ [
1678
+ "▁kinni",
1679
+ -8.291590035212373
1680
+ ],
1681
+ [
1682
+ "▁vahel",
1683
+ -8.292317235281203
1684
+ ],
1685
+ [
1686
+ "▁hääletus",
1687
+ -8.30053227054644
1688
+ ],
1689
+ [
1690
+ "▁Parlamendi",
1691
+ -8.305072195631286
1692
+ ],
1693
+ [
1694
+ "▁Venemaa",
1695
+ -8.314215128964609
1696
+ ],
1697
+ [
1698
+ "▁valitsus",
1699
+ -8.330419728199754
1700
+ ],
1701
+ [
1702
+ "▁lihtsalt",
1703
+ -8.33275561329036
1704
+ ],
1705
+ [
1706
+ "▁eesmärk",
1707
+ -8.332755619557188
1708
+ ],
1709
+ [
1710
+ "8",
1711
+ -8.33509753247365
1712
+ ],
1713
+ [
1714
+ "▁Komisjon",
1715
+ -8.33744495033076
1716
+ ],
1717
+ [
1718
+ "w",
1719
+ -8.346890116496834
1720
+ ],
1721
+ [
1722
+ "▁kliima",
1723
+ -8.346896983977077
1724
+ ],
1725
+ [
1726
+ "▁juurde",
1727
+ -8.349265469567563
1728
+ ],
1729
+ [
1730
+ "▁huvi",
1731
+ -8.37832982406571
1732
+ ],
1733
+ [
1734
+ "▁lõpe",
1735
+ -8.385619670939489
1736
+ ],
1737
+ [
1738
+ "▁juhataja",
1739
+ -8.400496190340625
1740
+ ],
1741
+ [
1742
+ "▁esindaja",
1743
+ -8.400496691475183
1744
+ ],
1745
+ [
1746
+ "▁seetõt",
1747
+ -8.408060413910206
1748
+ ],
1749
+ [
1750
+ "▁Hiina",
1751
+ -8.425847561973622
1752
+ ],
1753
+ [
1754
+ "toetus",
1755
+ -8.426075170681987
1756
+ ],
1757
+ [
1758
+ "▁olukorra",
1759
+ -8.428416683333666
1760
+ ],
1761
+ [
1762
+ "▁eriti",
1763
+ -8.42848490042811
1764
+ ],
1765
+ [
1766
+ "vabadus",
1767
+ -8.438130592165686
1768
+ ],
1769
+ [
1770
+ "F",
1771
+ -8.438766057684811
1772
+ ],
1773
+ [
1774
+ "▁seadus",
1775
+ -8.462483686628985
1776
+ ],
1777
+ [
1778
+ "▁puhul",
1779
+ -8.4921834820906
1780
+ ],
1781
+ [
1782
+ "▁investeeri",
1783
+ -8.494930541483374
1784
+ ],
1785
+ [
1786
+ "rühm",
1787
+ -8.551911253253767
1788
+ ],
1789
+ [
1790
+ "▁kõigepeal",
1791
+ -8.554438115578218
1792
+ ],
1793
+ [
1794
+ "▁tähendab",
1795
+ -8.557362476378172
1796
+ ],
1797
+ [
1798
+ "▁rahvusvaheli",
1799
+ -8.58106688848573
1800
+ ],
1801
+ [
1802
+ "▁kodanikud",
1803
+ -8.590103116754666
1804
+ ],
1805
+ [
1806
+ "š",
1807
+ -8.68526191306795
1808
+ ],
1809
+ [
1810
+ "▁konverents",
1811
+ -8.739976360344667
1812
+ ],
1813
+ [
1814
+ "institutsioon",
1815
+ -8.743497502049788
1816
+ ],
1817
+ [
1818
+ "strateegia",
1819
+ -8.77575833331177
1820
+ ],
1821
+ [
1822
+ "▁Putin",
1823
+ -8.786748009380753
1824
+ ],
1825
+ [
1826
+ "ž",
1827
+ -8.794141164436255
1828
+ ],
1829
+ [
1830
+ "tingimus",
1831
+ -8.824279039082151
1832
+ ],
1833
+ [
1834
+ "▁kontrolli",
1835
+ -8.843580830988564
1836
+ ],
1837
+ [
1838
+ "▁solidaarsus",
1839
+ -8.851408649185894
1840
+ ],
1841
+ [
1842
+ "▁kliimamuutus",
1843
+ -8.851408951706524
1844
+ ],
1845
+ [
1846
+ "▁parlament",
1847
+ -8.867250544177251
1848
+ ],
1849
+ [
1850
+ "▁demokraatlik",
1851
+ -8.871250543935075
1852
+ ],
1853
+ [
1854
+ "platvorm",
1855
+ -8.879298866631109
1856
+ ],
1857
+ [
1858
+ "▁eesmärgi",
1859
+ -8.88334745119473
1860
+ ],
1861
+ [
1862
+ "▁täiskogu",
1863
+ -8.887412568882546
1864
+ ],
1865
+ [
1866
+ "▁konkreetse",
1867
+ -8.907989318825308
1868
+ ],
1869
+ [
1870
+ "▁julgeoleku",
1871
+ -8.93325378290737
1872
+ ],
1873
+ [
1874
+ "▁Valgevene",
1875
+ -8.994811576658421
1876
+ ],
1877
+ [
1878
+ "▁läbirääkimis",
1879
+ -9.00851039715884
1880
+ ],
1881
+ [
1882
+ "tehnoloogia",
1883
+ -9.027072381702595
1884
+ ],
1885
+ [
1886
+ "▁Parlament",
1887
+ -9.027072381704857
1888
+ ],
1889
+ [
1890
+ "projekt",
1891
+ -9.031767772892652
1892
+ ],
1893
+ [
1894
+ "▁tähelepanu",
1895
+ -9.060408738983025
1896
+ ],
1897
+ [
1898
+ "programm",
1899
+ -9.060409336465462
1900
+ ],
1901
+ [
1902
+ "x",
1903
+ -9.120276864001164
1904
+ ],
1905
+ [
1906
+ "▁humanitaar",
1907
+ -9.14105678588626
1908
+ ],
1909
+ [
1910
+ "▁ambitsioon",
1911
+ -9.141056799891471
1912
+ ],
1913
+ [
1914
+ "▁euroopla",
1915
+ -9.162277691489956
1916
+ ],
1917
+ [
1918
+ "informatsioon",
1919
+ -9.200533627940196
1920
+ ],
1921
+ [
1922
+ "W",
1923
+ -9.25197339844144
1924
+ ],
1925
+ [
1926
+ "▁põllumajandus",
1927
+ -9.269725292099054
1928
+ ],
1929
+ [
1930
+ "▁asepresident",
1931
+ -9.287798018498288
1932
+ ],
1933
+ [
1934
+ "uudatusettepanek",
1935
+ -9.307158863802869
1936
+ ],
1937
+ [
1938
+ "▁korruptsioon",
1939
+ -9.396880037827938
1940
+ ],
1941
+ [
1942
+ "konventsioon",
1943
+ -9.396880090903965
1944
+ ],
1945
+ [
1946
+ "▁ebaseaduslik",
1947
+ -9.424373069872576
1948
+ ],
1949
+ [
1950
+ "▁Prantsusmaa",
1951
+ -9.431366076861606
1952
+ ],
1953
+ [
1954
+ "organisatsioon",
1955
+ -9.438408330409231
1956
+ ],
1957
+ [
1958
+ "%",
1959
+ -9.82116264581201
1960
+ ],
1961
+ [
1962
+ "mmmmmmmmmmmmmmm",
1963
+ -9.9305601459234
1964
+ ],
1965
+ [
1966
+ "Z",
1967
+ -9.953952758782524
1968
+ ],
1969
+ [
1970
+ "q",
1971
+ -10.13508903572938
1972
+ ],
1973
+ [
1974
+ "é",
1975
+ -10.44911261190077
1976
+ ],
1977
+ [
1978
+ "á",
1979
+ -10.617927172765626
1980
+ ],
1981
+ [
1982
+ "č",
1983
+ -10.689382754431072
1984
+ ],
1985
+ [
1986
+ "Q",
1987
+ -11.587196800138464
1988
+ ],
1989
+ [
1990
+ "+",
1991
+ -12.138957539400424
1992
+ ],
1993
+ [
1994
+ "ó",
1995
+ -12.684592460046924
1996
+ ],
1997
+ [
1998
+ "è",
1999
+ -12.88459246008688
2000
+ ],
2001
+ [
2002
+ "/",
2003
+ -12.884592460086884
2004
+ ],
2005
+ [
2006
+ "ń",
2007
+ -12.884592460086884
2008
+ ],
2009
+ [
2010
+ "ñ",
2011
+ -13.134592460086882
2012
+ ],
2013
+ [
2014
+ "İ",
2015
+ -13.467925793420209
2016
+ ],
2017
+ [
2018
+ "ê",
2019
+ -13.46792579342021
2020
+ ],
2021
+ [
2022
+ "ć",
2023
+ -13.46792579342021
2024
+ ],
2025
+ [
2026
+ "ç",
2027
+ -13.467925793420216
2028
+ ],
2029
+ [
2030
+ "à",
2031
+ -13.467925793420216
2032
+ ],
2033
+ [
2034
+ "ú",
2035
+ -13.96792579338026
2036
+ ],
2037
+ [
2038
+ "ò",
2039
+ -13.967925793420212
2040
+ ],
2041
+ [
2042
+ "ô",
2043
+ -13.967925793420212
2044
+ ],
2045
+ [
2046
+ "ș",
2047
+ -13.967925793420218
2048
+ ],
2049
+ [
2050
+ "í",
2051
+ -13.967925793420218
2052
+ ],
2053
+ [
2054
+ "ō",
2055
+ -13.967925793420218
2056
+ ],
2057
+ [
2058
+ "å",
2059
+ -14.966825793420217
2060
+ ],
2061
+ [
2062
+ "J",
2063
+ -14.966925793420216
2064
+ ],
2065
+ [
2066
+ "X",
2067
+ -14.967025793420216
2068
+ ],
2069
+ [
2070
+ "Ö",
2071
+ -14.967125793420216
2072
+ ],
2073
+ [
2074
+ "Y",
2075
+ -14.967225793420216
2076
+ ],
2077
+ [
2078
+ "Ä",
2079
+ -14.967325793420216
2080
+ ],
2081
+ [
2082
+ "Š",
2083
+ -14.967425793420215
2084
+ ],
2085
+ [
2086
+ "ą",
2087
+ -14.967525793420217
2088
+ ],
2089
+ [
2090
+ "ğ",
2091
+ -14.967625793420217
2092
+ ],
2093
+ [
2094
+ "Ü",
2095
+ -14.967725793420216
2096
+ ],
2097
+ [
2098
+ "ř",
2099
+ -14.967825793420216
2100
+ ],
2101
+ [
2102
+ "ś",
2103
+ -14.967925793420193
2104
+ ],
2105
+ [
2106
+ "Ž",
2107
+ -14.967925793420216
2108
+ ],
2109
+ [
2110
+ "ã",
2111
+ -14.967925793420216
2112
+ ],
2113
+ [
2114
+ ":",
2115
+ -14.967925793420216
2116
+ ],
2117
+ [
2118
+ "Õ",
2119
+ -14.967925793420216
2120
+ ],
2121
+ [
2122
+ "²",
2123
+ -14.967925793420216
2124
+ ],
2125
+ [
2126
+ "ł",
2127
+ -14.967925793420216
2128
+ ],
2129
+ [
2130
+ "Á",
2131
+ -14.967925793420216
2132
+ ],
2133
+ [
2134
+ "ū",
2135
+ -14.967925793420216
2136
+ ],
2137
+ [
2138
+ "ż",
2139
+ -14.967925793420216
2140
+ ]
2141
+ ],
2142
+ "byte_fallback": false
2143
+ }
2144
+ }
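The `[piece, score]` pairs above look like the vocabulary of a Unigram-style model, where each score is a log-probability and the tokenizer picks the segmentation with the highest total score. The sketch below illustrates that idea with a plain Viterbi search. It is a simplified illustration, not the `tokenizers` library's implementation: `viterbi_segment` is a hypothetical helper, unknown characters are not handled, and the score for `"s"` is a made-up placeholder because its real entry lies outside this excerpt.

```python
import math

# Scores copied from the listing above, except "s" (hypothetical placeholder).
scores = {
    "▁liikmesriikide": -7.763337381722763,
    "▁liikme": -8.195299977087725,
    "riikide": -7.748921592961078,
    "▁kliima": -8.346896983977077,
    "poliitika": -7.699246862153686,
    "s": -4.0,  # hypothetical value, NOT taken from tokenizer.json
}


def viterbi_segment(text, scores):
    """Return (pieces, total_score) for the best-scoring segmentation."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in scores and best[start] + scores[piece] > best[end]:
                best[end] = best[start] + scores[piece]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    pos = n
    while pos > 0:
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1], best[n]


# The whole-word piece beats "▁liikme" + "s" + "riikide" on total score.
pieces_a, total_a = viterbi_segment("▁liikmesriikide", scores)
# No whole-word piece exists here, so two pieces are combined.
pieces_b, total_b = viterbi_segment("▁kliimapoliitika", scores)
print(pieces_a, total_a)
print(pieces_b, total_b)
```

Because the scores are log-probabilities, adding them corresponds to multiplying piece probabilities, so a single frequent piece usually beats a chain of shorter ones.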
tokenizer_config.json ADDED
@@ -0,0 +1,61 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<mask>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "▁",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "sep_token": "▁",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "<unk>"
+ }
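The `model_max_length` value above is not a real limit: it equals `int(1e30)`, which `transformers` writes as a sentinel when no maximum length was set. The sketch below checks for that sentinel and collects the special-token fields. It works on a hand-copied subset of the config, not on the file itself, and the names `NO_LIMIT_SENTINEL` and `has_explicit_limit` are illustrative assumptions.

```python
import json

# Subset of the tokenizer_config.json fields shown in the diff above.
config_text = """
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "mask_token": "<mask>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<pad>",
  "sep_token": "▁",
  "unk_token": "<unk>"
}
"""

config = json.loads(config_text)

# int(1e30) is the "no limit configured" placeholder value.
NO_LIMIT_SENTINEL = int(1e30)
has_explicit_limit = config["model_max_length"] != NO_LIMIT_SENTINEL

# Collect every *_token entry to see which special tokens are declared.
special_tokens = {k: v for k, v in config.items() if k.endswith("_token")}

print("explicit max length set:", has_explicit_limit)
for name, token in sorted(special_tokens.items()):
    print(f"{name}: {token!r}")
```

If truncation matters downstream, it is probably safer to pass an explicit `max_length` when calling the tokenizer than to rely on this default.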