Commit 25e5bf3 · nmcuong · verified · 1 Parent(s): 52eb2a0

Upload ByT5 Vietnamese Normalization model

README.md ADDED
@@ -0,0 +1,113 @@
+ # ByT5-Vi-Normalization
+
+ ## Model Description
+
+ - **Model Type:** T5 (Text-to-Text Transfer Transformer)
+ - **Base Model:** google/byt5-small
+ - **Task:** Text Normalization
+ - **Language:** Vietnamese
+ - **License:** MIT
+
+ ByT5-Vi-Normalization is a fine-tuned version of Google's ByT5-small model for Vietnamese text normalization. It handles a wide range of normalization tasks, including time expressions, locations, numbers, youth slang, and context-dependent language. It is particularly useful for pre-processing text in Vietnamese NLP pipelines and speech applications.
+
+ ---
+
+ ## Training Data
+
+ - **Number of sentence pairs:** 73,000
+ - **Data source:** Automatically generated with Gemini (Google). Because the data is machine-generated, some sentence pairs may be inconsistent or noisy, and the model's outputs may reflect these inconsistencies in certain edge cases.
+ - **Content:** Each pair consists of an unnormalized Vietnamese sentence and its normalized form. The dataset covers various real-life scenarios, including informal language, numbers, dates, locations, abbreviations, slang, and contextual expressions.
+
+ ---
+
+ ## Applications
+
+ - Text-to-Speech (TTS) systems
+ - Speech Synthesis
+ - Preprocessing for NLP pipelines (chatbots, virtual assistants, etc.)
+ - Data normalization for downstream tasks such as ASR, NLU, and more
+
+ ---
+
+ ## Performance
+
+ - **Test hardware:** NVIDIA RTX 4090
+ - **Library:** Hugging Face Transformers
+ - **Average inference time:** about 1 second per sentence (one request) on GPU
+
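The latency figure above will vary with sentence length, beam width, and hardware. A minimal sketch of how one might measure it with only the standard library follows; `normalize_fn` is a hypothetical stand-in for whatever callable wraps `model.generate` and decoding, and the warm-up count is an arbitrary choice rather than something the README specifies.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def average_latency(normalize_fn: Callable[[str], str],
                    sentences: Iterable[str],
                    warmup: int = 1) -> float:
    """Return the mean wall-clock seconds per sentence for normalize_fn."""
    sentences = list(sentences)
    if not sentences:
        raise ValueError("need at least one sentence to time")
    # Warm-up calls are not timed: the first GPU call often pays one-off setup costs.
    for text in sentences[:warmup]:
        normalize_fn(text)
    timings: List[float] = []
    for text in sentences:
        start = time.perf_counter()
        normalize_fn(text)
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    # Placeholder callable so the sketch runs without the model (assumption).
    seconds = average_latency(str.upper, ["ngay 1/1/2024", "gia 20.000 dong"])
    print(f"{seconds:.6f} s per sentence")
```

Passing a function that runs the real model in place of `str.upper` would give a figure comparable to the one reported, provided the same hardware and generation settings are used.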
+ ---
+
+ ## Usage
+
+ You can use the model with Hugging Face Transformers as follows:
+
+ ```python
+ from transformers import T5ForConditionalGeneration, AutoTokenizer
+ import torch
+
+ # Load model and tokenizer from the model directory
+ model_dir = "ByT5-Vi-Normalization"
+ model = T5ForConditionalGeneration.from_pretrained(model_dir)
+ tokenizer = AutoTokenizer.from_pretrained(model_dir)
+
+ # Move model to GPU if available and cast the weights to bfloat16
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = model.to(device).to(dtype=torch.bfloat16)
+ model.eval()
+
+ # Example usage: the input starts with the "Normalize: " task prefix
+ input_text = "Normalize: Theo thông tư số 01/2023/TT-BTC, từ ngày 1/1/2024, Việt Nam sẽ áp dụng thuế giá trị gia tăng (VAT) mới cho các mặt hàng tiêu dùng."
+ inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(device)
+ with torch.no_grad():
+     # max_length is counted in byte-level tokens, not words
+     outputs = model.generate(**inputs, max_length=768, num_beams=2)
+ decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print("Result:", decoded)
+
+ # Result: Theo thông tư số không một, năm hai nghìn không trăm hai mươi ba, Thông tư của Bộ Tài chính, từ ngày một tháng một, năm hai nghìn không trăm hai mươi tư, Việt Nam sẽ áp dụng thuế giá trị gia tăng mới cho các mặt hàng tiêu dùng.
+ ```
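For more than one sentence, the same calls can be wrapped in a small batch helper. The sketch below is not part of the committed README. It assumes the `Normalize: ` prefix from the example above is expected on every input, and the helper names `with_prefix` and `normalize_batch` are illustrative. `torch` is imported inside the function so that the prefix helper can be used on its own.

```python
from typing import List

PREFIX = "Normalize: "  # task prefix taken from the usage example (assumed required)


def with_prefix(text: str) -> str:
    """Prepend the task prefix unless the text already starts with it."""
    text = text.strip()
    return text if text.startswith(PREFIX) else PREFIX + text


def normalize_batch(texts: List[str], model, tokenizer,
                    max_length: int = 768, num_beams: int = 2) -> List[str]:
    """Normalize several sentences in a single generate() call."""
    import torch  # imported lazily; only needed when the model is actually run

    prompts = [with_prefix(t) for t in texts]
    # padding=True pads every prompt to the longest one in the batch
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as shown above, `normalize_batch(["ngày 1/1/2024", "D/c: Hà Nội"], model, tokenizer)` would return one normalized string per input.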
+
+ ---
+
+ ## Example Inputs & Outputs
+
+ <!-- Add your own examples below -->
+ - **Example 1:**
+   - Input: `Normalize: ngày 19/05/1890 là ngày sinh của Chủ tịch Hồ Chí Minh, người đã lãnh đạo Việt Nam trong cuộc kháng chiến chống Pháp và Mỹ.`
+   - Output: `ngày mười chín tháng năm năm một nghìn tám trăm chín mươi là ngày sinh của Chủ tịch Hồ Chí Minh, người đã lãnh đạo Việt Nam trong cuộc kháng chiến chống Pháp và Mỹ.`
+
+ - **Example 2:**
+   - Input: `D/c: thôn An Phú, xã An Thượng, huyện Hoài Đức, Hà Nội. Số nhà 123/45. Đây là nơi sản xuất chính của nông trại.`
+   - Output: `Địa chỉ: thôn An Phú, xã An Thượng, huyện Hoài Đức, Hà Nội. Số nhà một trăm hai mươi ba, trên bốn mươi lăm. Đây là nơi sản xuất chính của nông trại.`
+
+ - **Example 3:**
+   - Input: `Thế kỷ XXI chứng kiến sự bùng nổ của CNTT và trí tuệ nhân tạo, làm thay đổi sâu sắc mọi mặt của đời sống xã hội. Những thách thức toàn cầu như biến đổi khí hậu cũng trở nên cấp bách hơn bao giờ hết trong giai đoạn này`
+   - Output: `Thế kỷ hai mươi mốt chứng kiến sự bùng nổ của công nghệ thông tin và trí tuệ nhân tạo, làm thay đổi sâu sắc mọi mặt của đời sống xã hội. Những thách thức toàn cầu như biến đổi khí hậu cũng trở nên cấp bách hơn bao giờ hết trong giai đoạn này.`
+
+ - **Example 4:**
+   - Input: `Hoa quả năm nay rẻ hơn năm ngoái, đặc biệt là táo và cam. Giá táo chỉ còn 20.000 đồng/kg, trong khi cam là 15.000 đồng/kg.`
+   - Output: `Hoa quả năm nay rẻ hơn năm ngoái, đặc biệt là táo và cam. Giá táo chỉ còn hai mươi nghìn đồng một ki lô gam, trong khi cam là mười lăm nghìn đồng một ki lô gam.`
+
+ ---
+
+ ## License
+
+ MIT License
+
+ Copyright (c) 2025 Nguyễn Mạnh Cường
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
added_tokens.json ADDED
@@ -0,0 +1,127 @@
+ {
+   "<extra_id_0>": 259,
+   "<extra_id_100>": 359,
+   "<extra_id_101>": 360,
+   "<extra_id_102>": 361,
+   "<extra_id_103>": 362,
+   "<extra_id_104>": 363,
+   "<extra_id_105>": 364,
+   "<extra_id_106>": 365,
+   "<extra_id_107>": 366,
+   "<extra_id_108>": 367,
+   "<extra_id_109>": 368,
+   "<extra_id_10>": 269,
+   "<extra_id_110>": 369,
+   "<extra_id_111>": 370,
+   "<extra_id_112>": 371,
+   "<extra_id_113>": 372,
+   "<extra_id_114>": 373,
+   "<extra_id_115>": 374,
+   "<extra_id_116>": 375,
+   "<extra_id_117>": 376,
+   "<extra_id_118>": 377,
+   "<extra_id_119>": 378,
+   "<extra_id_11>": 270,
+   "<extra_id_120>": 379,
+   "<extra_id_121>": 380,
+   "<extra_id_122>": 381,
+   "<extra_id_123>": 382,
+   "<extra_id_124>": 383,
+   "<extra_id_12>": 271,
+   "<extra_id_13>": 272,
+   "<extra_id_14>": 273,
+   "<extra_id_15>": 274,
+   "<extra_id_16>": 275,
+   "<extra_id_17>": 276,
+   "<extra_id_18>": 277,
+   "<extra_id_19>": 278,
+   "<extra_id_1>": 260,
+   "<extra_id_20>": 279,
+   "<extra_id_21>": 280,
+   "<extra_id_22>": 281,
+   "<extra_id_23>": 282,
+   "<extra_id_24>": 283,
+   "<extra_id_25>": 284,
+   "<extra_id_26>": 285,
+   "<extra_id_27>": 286,
+   "<extra_id_28>": 287,
+   "<extra_id_29>": 288,
+   "<extra_id_2>": 261,
+   "<extra_id_30>": 289,
+   "<extra_id_31>": 290,
+   "<extra_id_32>": 291,
+   "<extra_id_33>": 292,
+   "<extra_id_34>": 293,
+   "<extra_id_35>": 294,
+   "<extra_id_36>": 295,
+   "<extra_id_37>": 296,
+   "<extra_id_38>": 297,
+   "<extra_id_39>": 298,
+   "<extra_id_3>": 262,
+   "<extra_id_40>": 299,
+   "<extra_id_41>": 300,
+   "<extra_id_42>": 301,
+   "<extra_id_43>": 302,
+   "<extra_id_44>": 303,
+   "<extra_id_45>": 304,
+   "<extra_id_46>": 305,
+   "<extra_id_47>": 306,
+   "<extra_id_48>": 307,
+   "<extra_id_49>": 308,
+   "<extra_id_4>": 263,
+   "<extra_id_50>": 309,
+   "<extra_id_51>": 310,
+   "<extra_id_52>": 311,
+   "<extra_id_53>": 312,
+   "<extra_id_54>": 313,
+   "<extra_id_55>": 314,
+   "<extra_id_56>": 315,
+   "<extra_id_57>": 316,
+   "<extra_id_58>": 317,
+   "<extra_id_59>": 318,
+   "<extra_id_5>": 264,
+   "<extra_id_60>": 319,
+   "<extra_id_61>": 320,
+   "<extra_id_62>": 321,
+   "<extra_id_63>": 322,
+   "<extra_id_64>": 323,
+   "<extra_id_65>": 324,
+   "<extra_id_66>": 325,
+   "<extra_id_67>": 326,
+   "<extra_id_68>": 327,
+   "<extra_id_69>": 328,
+   "<extra_id_6>": 265,
+   "<extra_id_70>": 329,
+   "<extra_id_71>": 330,
+   "<extra_id_72>": 331,
+   "<extra_id_73>": 332,
+   "<extra_id_74>": 333,
+   "<extra_id_75>": 334,
+   "<extra_id_76>": 335,
+   "<extra_id_77>": 336,
+   "<extra_id_78>": 337,
+   "<extra_id_79>": 338,
+   "<extra_id_7>": 266,
+   "<extra_id_80>": 339,
+   "<extra_id_81>": 340,
+   "<extra_id_82>": 341,
+   "<extra_id_83>": 342,
+   "<extra_id_84>": 343,
+   "<extra_id_85>": 344,
+   "<extra_id_86>": 345,
+   "<extra_id_87>": 346,
+   "<extra_id_88>": 347,
+   "<extra_id_89>": 348,
+   "<extra_id_8>": 267,
+   "<extra_id_90>": 349,
+   "<extra_id_91>": 350,
+   "<extra_id_92>": 351,
+   "<extra_id_93>": 352,
+   "<extra_id_94>": 353,
+   "<extra_id_95>": 354,
+   "<extra_id_96>": 355,
+   "<extra_id_97>": 356,
+   "<extra_id_98>": 357,
+   "<extra_id_99>": 358,
+   "<extra_id_9>": 268
+ }
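The IDs in this file follow from ByT5's byte-level vocabulary. Three special tokens (`<pad>`, `</s>`, `<unk>`) take IDs 0 to 2, the 256 possible byte values take IDs 3 to 258, and the 125 sentinel tokens start at 259, which adds up to the `vocab_size` of 384 in `config.json`. The sketch below reproduces that layout in plain Python. It is an illustration, not the tokenizer's actual implementation, and the function names are made up for this example.

```python
from typing import List

NUM_SPECIAL = 3      # <pad>=0, </s>=1, <unk>=2
NUM_BYTES = 256      # one token per possible byte value
NUM_SENTINELS = 125  # <extra_id_0> .. <extra_id_124>
EOS_ID = 1


def encode(text: str) -> List[int]:
    """Map text to byte-level IDs (byte value + 3) and append </s>."""
    return [b + NUM_SPECIAL for b in text.encode("utf-8")] + [EOS_ID]


def sentinel_id(n: int) -> int:
    """ID of <extra_id_n>, matching the values listed in added_tokens.json."""
    if not 0 <= n < NUM_SENTINELS:
        raise ValueError("sentinel index out of range")
    return NUM_SPECIAL + NUM_BYTES + n


print(encode("a"))                       # [100, 1]
print(len(encode("Vi\u1ec7t")))          # 7: U+1EC7 takes three UTF-8 bytes, plus </s>
print(sentinel_id(0), sentinel_id(124))  # 259 383
```

Because every byte is a token, Vietnamese text with diacritics uses more tokens than its character count suggests, which is worth keeping in mind when choosing `max_length` for generation.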
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "architectures": [
+     "T5ForConditionalGeneration"
+   ],
+   "classifier_dropout": 0.0,
+   "d_ff": 3584,
+   "d_kv": 64,
+   "d_model": 1472,
+   "decoder_start_token_id": 0,
+   "dense_act_fn": "gelu_new",
+   "dropout_rate": 0.1,
+   "eos_token_id": 1,
+   "feed_forward_proj": "gated-gelu",
+   "gradient_checkpointing": false,
+   "initializer_factor": 1.0,
+   "is_encoder_decoder": true,
+   "is_gated_act": true,
+   "layer_norm_epsilon": 1e-06,
+   "model_type": "t5",
+   "num_decoder_layers": 4,
+   "num_heads": 6,
+   "num_layers": 12,
+   "pad_token_id": 0,
+   "relative_attention_max_distance": 128,
+   "relative_attention_num_buckets": 32,
+   "tie_word_embeddings": false,
+   "tokenizer_class": "ByT5Tokenizer",
+   "torch_dtype": "float32",
+   "transformers_version": "4.53.0",
+   "use_cache": true,
+   "vocab_size": 384
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "decoder_start_token_id": 0,
+   "eos_token_id": 1,
+   "pad_token_id": 0,
+   "transformers_version": "4.53.0"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a80d21de722518aa7d7c09239581cfb17793bcbe909199025e14d01868c796d
+ size 1198571496
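As a rough consistency check, the hyperparameters in `config.json` can be turned into a parameter count and compared with the 1,198,571,496-byte size in the `model.safetensors` pointer. The sketch below assumes the usual T5 v1.1 layout (bias-free linear layers, a gated feed-forward block with two input projections, one relative-attention bias table per stack, an untied output head) and float32 storage. It is an estimate under those assumptions, not a dump of the actual checkpoint.

```python
# Hyperparameters copied from config.json
cfg = {
    "d_model": 1472, "d_kv": 64, "d_ff": 3584, "num_heads": 6,
    "num_layers": 12, "num_decoder_layers": 4, "vocab_size": 384,
    "relative_attention_num_buckets": 32,
}


def estimate_parameters(c: dict) -> int:
    """Estimate the weight count of a T5 v1.1-style encoder-decoder."""
    inner = c["num_heads"] * c["d_kv"]       # attention projection width
    attn = 4 * c["d_model"] * inner          # q, k, v, o matrices
    ffn = 3 * c["d_model"] * c["d_ff"]       # wi_0, wi_1, wo (gated GELU)
    norm = c["d_model"]                      # one scale vector per layer norm
    enc_layer = attn + ffn + 2 * norm        # self-attention + feed-forward
    dec_layer = 2 * attn + ffn + 3 * norm    # adds cross-attention
    rel_bias = c["relative_attention_num_buckets"] * c["num_heads"]
    embeddings = 2 * c["vocab_size"] * c["d_model"]  # input table + untied lm_head
    return (c["num_layers"] * enc_layer
            + c["num_decoder_layers"] * dec_layer
            + 2 * rel_bias   # one table in the encoder, one in the decoder
            + 2 * norm       # final layer norm of each stack
            + embeddings)


params = estimate_parameters(cfg)
print(params)      # 299637760
print(params * 4)  # 1198551040 bytes at 4 bytes per float32 weight
```

The estimate falls 20,456 bytes short of the pointer size. That gap would be consistent with the safetensors file header, though this is an inference rather than something the commit states.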
special_tokens_map.json ADDED
@@ -0,0 +1,150 @@
+ {
+   "additional_special_tokens": [
+     "<extra_id_0>",
+     "<extra_id_1>",
+     "<extra_id_2>",
+     "<extra_id_3>",
+     "<extra_id_4>",
+     "<extra_id_5>",
+     "<extra_id_6>",
+     "<extra_id_7>",
+     "<extra_id_8>",
+     "<extra_id_9>",
+     "<extra_id_10>",
+     "<extra_id_11>",
+     "<extra_id_12>",
+     "<extra_id_13>",
+     "<extra_id_14>",
+     "<extra_id_15>",
+     "<extra_id_16>",
+     "<extra_id_17>",
+     "<extra_id_18>",
+     "<extra_id_19>",
+     "<extra_id_20>",
+     "<extra_id_21>",
+     "<extra_id_22>",
+     "<extra_id_23>",
+     "<extra_id_24>",
+     "<extra_id_25>",
+     "<extra_id_26>",
+     "<extra_id_27>",
+     "<extra_id_28>",
+     "<extra_id_29>",
+     "<extra_id_30>",
+     "<extra_id_31>",
+     "<extra_id_32>",
+     "<extra_id_33>",
+     "<extra_id_34>",
+     "<extra_id_35>",
+     "<extra_id_36>",
+     "<extra_id_37>",
+     "<extra_id_38>",
+     "<extra_id_39>",
+     "<extra_id_40>",
+     "<extra_id_41>",
+     "<extra_id_42>",
+     "<extra_id_43>",
+     "<extra_id_44>",
+     "<extra_id_45>",
+     "<extra_id_46>",
+     "<extra_id_47>",
+     "<extra_id_48>",
+     "<extra_id_49>",
+     "<extra_id_50>",
+     "<extra_id_51>",
+     "<extra_id_52>",
+     "<extra_id_53>",
+     "<extra_id_54>",
+     "<extra_id_55>",
+     "<extra_id_56>",
+     "<extra_id_57>",
+     "<extra_id_58>",
+     "<extra_id_59>",
+     "<extra_id_60>",
+     "<extra_id_61>",
+     "<extra_id_62>",
+     "<extra_id_63>",
+     "<extra_id_64>",
+     "<extra_id_65>",
+     "<extra_id_66>",
+     "<extra_id_67>",
+     "<extra_id_68>",
+     "<extra_id_69>",
+     "<extra_id_70>",
+     "<extra_id_71>",
+     "<extra_id_72>",
+     "<extra_id_73>",
+     "<extra_id_74>",
+     "<extra_id_75>",
+     "<extra_id_76>",
+     "<extra_id_77>",
+     "<extra_id_78>",
+     "<extra_id_79>",
+     "<extra_id_80>",
+     "<extra_id_81>",
+     "<extra_id_82>",
+     "<extra_id_83>",
+     "<extra_id_84>",
+     "<extra_id_85>",
+     "<extra_id_86>",
+     "<extra_id_87>",
+     "<extra_id_88>",
+     "<extra_id_89>",
+     "<extra_id_90>",
+     "<extra_id_91>",
+     "<extra_id_92>",
+     "<extra_id_93>",
+     "<extra_id_94>",
+     "<extra_id_95>",
+     "<extra_id_96>",
+     "<extra_id_97>",
+     "<extra_id_98>",
+     "<extra_id_99>",
+     "<extra_id_100>",
+     "<extra_id_101>",
+     "<extra_id_102>",
+     "<extra_id_103>",
+     "<extra_id_104>",
+     "<extra_id_105>",
+     "<extra_id_106>",
+     "<extra_id_107>",
+     "<extra_id_108>",
+     "<extra_id_109>",
+     "<extra_id_110>",
+     "<extra_id_111>",
+     "<extra_id_112>",
+     "<extra_id_113>",
+     "<extra_id_114>",
+     "<extra_id_115>",
+     "<extra_id_116>",
+     "<extra_id_117>",
+     "<extra_id_118>",
+     "<extra_id_119>",
+     "<extra_id_120>",
+     "<extra_id_121>",
+     "<extra_id_122>",
+     "<extra_id_123>",
+     "<extra_id_124>"
+   ],
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,1163 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<pad>",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "</s>",
13
+ "lstrip": false,
14
+ "normalized": true,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "<unk>",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "259": {
28
+ "content": "<extra_id_0>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "260": {
36
+ "content": "<extra_id_1>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "261": {
44
+ "content": "<extra_id_2>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "262": {
52
+ "content": "<extra_id_3>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "263": {
60
+ "content": "<extra_id_4>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "264": {
68
+ "content": "<extra_id_5>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "265": {
76
+ "content": "<extra_id_6>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "266": {
84
+ "content": "<extra_id_7>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "267": {
92
+ "content": "<extra_id_8>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "268": {
100
+ "content": "<extra_id_9>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "269": {
108
+ "content": "<extra_id_10>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "270": {
116
+ "content": "<extra_id_11>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "271": {
124
+ "content": "<extra_id_12>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "272": {
132
+ "content": "<extra_id_13>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "273": {
140
+ "content": "<extra_id_14>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "274": {
148
+ "content": "<extra_id_15>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "275": {
156
+ "content": "<extra_id_16>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "276": {
164
+ "content": "<extra_id_17>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "277": {
172
+ "content": "<extra_id_18>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "278": {
180
+ "content": "<extra_id_19>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "279": {
188
+ "content": "<extra_id_20>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "280": {
196
+ "content": "<extra_id_21>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "281": {
204
+ "content": "<extra_id_22>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "282": {
212
+ "content": "<extra_id_23>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "283": {
220
+ "content": "<extra_id_24>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "284": {
228
+ "content": "<extra_id_25>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "285": {
236
+ "content": "<extra_id_26>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "286": {
244
+ "content": "<extra_id_27>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "287": {
252
+ "content": "<extra_id_28>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "288": {
260
+ "content": "<extra_id_29>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "289": {
268
+ "content": "<extra_id_30>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "290": {
276
+ "content": "<extra_id_31>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "291": {
284
+ "content": "<extra_id_32>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "292": {
292
+ "content": "<extra_id_33>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "293": {
300
+ "content": "<extra_id_34>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "294": {
308
+ "content": "<extra_id_35>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "295": {
316
+ "content": "<extra_id_36>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "296": {
324
+ "content": "<extra_id_37>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "297": {
332
+ "content": "<extra_id_38>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "298": {
340
+ "content": "<extra_id_39>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "299": {
348
+ "content": "<extra_id_40>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "300": {
356
+ "content": "<extra_id_41>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "301": {
364
+ "content": "<extra_id_42>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "302": {
372
+ "content": "<extra_id_43>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "303": {
380
+ "content": "<extra_id_44>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "304": {
388
+ "content": "<extra_id_45>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "305": {
396
+ "content": "<extra_id_46>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "306": {
404
+ "content": "<extra_id_47>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "307": {
412
+ "content": "<extra_id_48>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "308": {
420
+ "content": "<extra_id_49>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "309": {
428
+ "content": "<extra_id_50>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "310": {
436
+ "content": "<extra_id_51>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "311": {
444
+ "content": "<extra_id_52>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "312": {
452
+ "content": "<extra_id_53>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "313": {
460
+ "content": "<extra_id_54>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "314": {
468
+ "content": "<extra_id_55>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "315": {
476
+ "content": "<extra_id_56>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "316": {
484
+ "content": "<extra_id_57>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "317": {
492
+ "content": "<extra_id_58>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "318": {
500
+ "content": "<extra_id_59>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "319": {
508
+ "content": "<extra_id_60>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "320": {
516
+ "content": "<extra_id_61>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "321": {
524
+ "content": "<extra_id_62>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "322": {
532
+ "content": "<extra_id_63>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "323": {
540
+ "content": "<extra_id_64>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "324": {
548
+ "content": "<extra_id_65>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "325": {
556
+ "content": "<extra_id_66>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "326": {
564
+ "content": "<extra_id_67>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "327": {
572
+ "content": "<extra_id_68>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "328": {
580
+ "content": "<extra_id_69>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "329": {
588
+ "content": "<extra_id_70>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "330": {
596
+ "content": "<extra_id_71>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "331": {
604
+ "content": "<extra_id_72>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "332": {
612
+ "content": "<extra_id_73>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "333": {
620
+ "content": "<extra_id_74>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "334": {
628
+ "content": "<extra_id_75>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "335": {
636
+ "content": "<extra_id_76>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "336": {
644
+ "content": "<extra_id_77>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "337": {
652
+ "content": "<extra_id_78>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "338": {
660
+ "content": "<extra_id_79>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "339": {
668
+ "content": "<extra_id_80>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "340": {
676
+ "content": "<extra_id_81>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "341": {
684
+ "content": "<extra_id_82>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "342": {
692
+ "content": "<extra_id_83>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "343": {
700
+ "content": "<extra_id_84>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "344": {
708
+ "content": "<extra_id_85>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "345": {
716
+ "content": "<extra_id_86>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "346": {
724
+ "content": "<extra_id_87>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "347": {
732
+ "content": "<extra_id_88>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "348": {
740
+ "content": "<extra_id_89>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "349": {
748
+ "content": "<extra_id_90>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "350": {
756
+ "content": "<extra_id_91>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "351": {
764
+ "content": "<extra_id_92>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "352": {
772
+ "content": "<extra_id_93>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "353": {
780
+ "content": "<extra_id_94>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "354": {
788
+ "content": "<extra_id_95>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "355": {
796
+ "content": "<extra_id_96>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "356": {
804
+ "content": "<extra_id_97>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "357": {
812
+ "content": "<extra_id_98>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "358": {
820
+ "content": "<extra_id_99>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "359": {
828
+ "content": "<extra_id_100>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "360": {
836
+ "content": "<extra_id_101>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "361": {
844
+ "content": "<extra_id_102>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "362": {
852
+ "content": "<extra_id_103>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "363": {
860
+ "content": "<extra_id_104>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "364": {
868
+ "content": "<extra_id_105>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "365": {
876
+ "content": "<extra_id_106>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "366": {
884
+ "content": "<extra_id_107>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "367": {
892
+ "content": "<extra_id_108>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "368": {
900
+ "content": "<extra_id_109>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "369": {
908
+ "content": "<extra_id_110>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "370": {
916
+ "content": "<extra_id_111>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "371": {
924
+ "content": "<extra_id_112>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "372": {
932
+ "content": "<extra_id_113>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "373": {
940
+ "content": "<extra_id_114>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "374": {
948
+ "content": "<extra_id_115>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "375": {
956
+ "content": "<extra_id_116>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "376": {
964
+ "content": "<extra_id_117>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "377": {
972
+ "content": "<extra_id_118>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "378": {
980
+ "content": "<extra_id_119>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "379": {
988
+ "content": "<extra_id_120>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "380": {
996
+ "content": "<extra_id_121>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "381": {
1004
+ "content": "<extra_id_122>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "382": {
1012
+ "content": "<extra_id_123>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "383": {
1020
+ "content": "<extra_id_124>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ }
1027
+ },
1028
+ "additional_special_tokens": [
1029
+ "<extra_id_0>",
1030
+ "<extra_id_1>",
1031
+ "<extra_id_2>",
1032
+ "<extra_id_3>",
1033
+ "<extra_id_4>",
1034
+ "<extra_id_5>",
1035
+ "<extra_id_6>",
1036
+ "<extra_id_7>",
1037
+ "<extra_id_8>",
1038
+ "<extra_id_9>",
1039
+ "<extra_id_10>",
1040
+ "<extra_id_11>",
1041
+ "<extra_id_12>",
1042
+ "<extra_id_13>",
1043
+ "<extra_id_14>",
1044
+ "<extra_id_15>",
1045
+ "<extra_id_16>",
1046
+ "<extra_id_17>",
1047
+ "<extra_id_18>",
1048
+ "<extra_id_19>",
1049
+ "<extra_id_20>",
1050
+ "<extra_id_21>",
1051
+ "<extra_id_22>",
1052
+ "<extra_id_23>",
1053
+ "<extra_id_24>",
1054
+ "<extra_id_25>",
1055
+ "<extra_id_26>",
1056
+ "<extra_id_27>",
1057
+ "<extra_id_28>",
1058
+ "<extra_id_29>",
1059
+ "<extra_id_30>",
1060
+ "<extra_id_31>",
1061
+ "<extra_id_32>",
1062
+ "<extra_id_33>",
1063
+ "<extra_id_34>",
1064
+ "<extra_id_35>",
1065
+ "<extra_id_36>",
1066
+ "<extra_id_37>",
1067
+ "<extra_id_38>",
1068
+ "<extra_id_39>",
1069
+ "<extra_id_40>",
1070
+ "<extra_id_41>",
1071
+ "<extra_id_42>",
1072
+ "<extra_id_43>",
1073
+ "<extra_id_44>",
1074
+ "<extra_id_45>",
1075
+ "<extra_id_46>",
1076
+ "<extra_id_47>",
1077
+ "<extra_id_48>",
1078
+ "<extra_id_49>",
1079
+ "<extra_id_50>",
1080
+ "<extra_id_51>",
1081
+ "<extra_id_52>",
1082
+ "<extra_id_53>",
1083
+ "<extra_id_54>",
1084
+ "<extra_id_55>",
1085
+ "<extra_id_56>",
1086
+ "<extra_id_57>",
1087
+ "<extra_id_58>",
1088
+ "<extra_id_59>",
1089
+ "<extra_id_60>",
1090
+ "<extra_id_61>",
1091
+ "<extra_id_62>",
1092
+ "<extra_id_63>",
1093
+ "<extra_id_64>",
1094
+ "<extra_id_65>",
1095
+ "<extra_id_66>",
1096
+ "<extra_id_67>",
1097
+ "<extra_id_68>",
1098
+ "<extra_id_69>",
1099
+ "<extra_id_70>",
1100
+ "<extra_id_71>",
1101
+ "<extra_id_72>",
1102
+ "<extra_id_73>",
1103
+ "<extra_id_74>",
1104
+ "<extra_id_75>",
1105
+ "<extra_id_76>",
1106
+ "<extra_id_77>",
1107
+ "<extra_id_78>",
1108
+ "<extra_id_79>",
1109
+ "<extra_id_80>",
1110
+ "<extra_id_81>",
1111
+ "<extra_id_82>",
1112
+ "<extra_id_83>",
1113
+ "<extra_id_84>",
1114
+ "<extra_id_85>",
1115
+ "<extra_id_86>",
1116
+ "<extra_id_87>",
1117
+ "<extra_id_88>",
1118
+ "<extra_id_89>",
1119
+ "<extra_id_90>",
1120
+ "<extra_id_91>",
1121
+ "<extra_id_92>",
1122
+ "<extra_id_93>",
1123
+ "<extra_id_94>",
1124
+ "<extra_id_95>",
1125
+ "<extra_id_96>",
1126
+ "<extra_id_97>",
1127
+ "<extra_id_98>",
1128
+ "<extra_id_99>",
1129
+ "<extra_id_100>",
1130
+ "<extra_id_101>",
1131
+ "<extra_id_102>",
1132
+ "<extra_id_103>",
1133
+ "<extra_id_104>",
1134
+ "<extra_id_105>",
1135
+ "<extra_id_106>",
1136
+ "<extra_id_107>",
1137
+ "<extra_id_108>",
1138
+ "<extra_id_109>",
1139
+ "<extra_id_110>",
1140
+ "<extra_id_111>",
1141
+ "<extra_id_112>",
1142
+ "<extra_id_113>",
1143
+ "<extra_id_114>",
1144
+ "<extra_id_115>",
1145
+ "<extra_id_116>",
1146
+ "<extra_id_117>",
1147
+ "<extra_id_118>",
1148
+ "<extra_id_119>",
1149
+ "<extra_id_120>",
1150
+ "<extra_id_121>",
1151
+ "<extra_id_122>",
1152
+ "<extra_id_123>",
1153
+ "<extra_id_124>"
1154
+ ],
1155
+ "clean_up_tokenization_spaces": false,
1156
+ "eos_token": "</s>",
1157
+ "extra_ids": 0,
1158
+ "extra_special_tokens": {},
1159
+ "model_max_length": 1000000000000000019884624838656,
1160
+ "pad_token": "<pad>",
1161
+ "tokenizer_class": "ByT5Tokenizer",
1162
+ "unk_token": "<unk>"
1163
+ }