Text-to-Speech
Transformers
Safetensors
English
parler_tts
text-generation
annotation
Kayabuki4 and sanchit-gandhi committed
Commit 76e8359 (verified) · 0 parent(s)

Duplicate from parler-tts/parler-tts-mini-expresso

Co-authored-by: Sanchit Gandhi <sanchit-gandhi@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
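
The patterns above route large binary artifacts through Git LFS. As a rough illustration, the sketch below checks which file names from this commit would match one of them. It is an approximation, not Git's own matcher: it applies Python's `fnmatch` to the bare file name only and skips the path-anchored `saved_model/**/*` pattern. `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Every pattern from the .gitattributes above except the path-anchored saved_model/**/*.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "*.tar.*", "*.tar",
    "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file's base name against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for f in ["model.safetensors", "config.json", "README.md", "run_prompt_creation_expresso.py"]:
    print(f, is_lfs_tracked(f))
```

In this commit only `model.safetensors` matches, which is why it appears below as an LFS pointer rather than as raw bytes.
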
README.md ADDED
@@ -0,0 +1,363 @@
---
library_name: transformers
tags:
- text-to-speech
- annotation
license: apache-2.0
language:
- en
pipeline_tag: text-to-speech
inference: false
datasets:
- ylacombe/expresso
- reach-vb/jenny_tts_dataset
- blabble-io/libritts_r
---

<img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left: auto; margin-right: auto; display: block;"/>


# Parler-TTS Mini: Expresso

<a target="_blank" href="https://huggingface.co/spaces/parler-tts/parler-tts-expresso">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
</a>

**Parler-TTS Mini: Expresso** is a fine-tuned version of [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1)
on the [Expresso](https://huggingface.co/datasets/ylacombe/expresso) dataset. It is a lightweight text-to-speech (TTS)
model that can generate high-quality, natural-sounding speech. Compared to the original model, Parler-TTS Expresso provides
superior control over **emotions** (happy, confused, laughing, sad) and **consistent voices** (Jerry, Thomas, Elisabeth, Talia).

It is part of the first release from the [Parler-TTS](https://github.com/huggingface/parler-tts) project, which aims to
provide the community with TTS training resources and dataset pre-processing code. Details for reproducing this entire
training run are provided in the section [Training Procedure](#training-procedure).

## Usage

Using Expresso is as simple as "bonjour". Install the library from source:

```sh
pip install git+https://github.com/huggingface/parler-tts.git
```

You can then use the model with the following inference snippet:

```py
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer, set_seed
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-expresso").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-expresso")

prompt = "Why do you make me do these examples? They're *so* generic."
description = "Thomas speaks moderately slowly in a sad tone with emphasis and high quality audio."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

set_seed(42)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```

**Tips**:
* Specify the name of a male speaker (Jerry, Thomas) or female speaker (Talia, Elisabeth) for consistent voices
* The model can generate in a range of emotions, including: "happy", "confused", "default" (meaning no particular emotion conveyed), "laughing", "sad", "whisper", "emphasis"
* Include the term "high quality audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
* Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech
* To emphasise particular words, wrap them in asterisks (e.g. `*so*` in the example above) and include "emphasis" in the description
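
The tips above can be combined programmatically. The sketch below is a hypothetical helper (`build_inputs` is not part of the `parler_tts` package) that wraps chosen words in asterisks in the transcript and assembles a description naming a speaker, an optional emotion, and "emphasis" when words are starred. The description wording is an assumption modeled on the Usage example, not a documented input format.

```python
import re

# Speaker names and tone-style emotions taken from the Tips above.
SPEAKERS = {"Jerry", "Thomas", "Elisabeth", "Talia"}
TONE_EMOTIONS = {"happy", "confused", "laughing", "sad"}


def build_inputs(text, speaker, emotion="default", emphasize=()):
    """Return a (prompt, description) pair (hypothetical helper, not a library API)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {sorted(SPEAKERS)}")
    if emotion != "default" and emotion not in TONE_EMOTIONS:
        raise ValueError(f"unsupported emotion {emotion!r}")

    prompt = text
    for word in emphasize:
        # Wrap each whole-word occurrence in asterisks, e.g. "so" -> "*so*".
        prompt = re.sub(rf"\b{re.escape(word)}\b", lambda m: f"*{m.group(0)}*", prompt)

    description = f"{speaker} speaks"
    if emotion != "default":
        description += f" in a {emotion} tone"
    # Per the Tips, "emphasis" goes in the description whenever words are starred.
    description += " with emphasis and" if emphasize else " with"
    description += " high quality audio."
    return prompt, description


prompt, description = build_inputs(
    "Why do you make me do these examples? They're so generic.", "Thomas", "sad", emphasize=["so"]
)
print(prompt)       # Why do you make me do these examples? They're *so* generic.
print(description)  # Thomas speaks in a sad tone with emphasis and high quality audio.
```

The two strings would then be tokenized exactly as `prompt` and `description` are in the Usage snippet.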

## Training Procedure

Expresso is a high-quality, expressive speech dataset that includes samples from four speakers (two male, two female).
By fine-tuning Parler-TTS Mini v0.1 on this dataset, we can train the model to follow emotion and speaker prompts.

To reproduce this fine-tuning run, we need to perform two steps:
1. Create text descriptions from the audio samples in the Expresso dataset
2. Train the model on the (text, audio) pairs

Step 1 is performed using the [DataSpeech](https://github.com/huggingface/dataspeech) library, and step 2 using
[Parler-TTS](https://github.com/huggingface/parler-tts). Should you wish to use the pre-annotated dataset from our
experiments, you can jump straight to [step 2](#step-2-fine-tune-the-model). In either case, start with step 0 to
set up your environment.

### Step 0: Set-Up

We'll start by creating a fresh Python environment:

```sh
python3 -m venv parler-env
source parler-env/bin/activate
```

Next, install PyTorch according to the [official instructions](https://pytorch.org/get-started/locally/). We can then
install DataSpeech and Parler-TTS sequentially:

```sh
git clone https://github.com/huggingface/dataspeech.git && cd dataspeech && pip install -r requirements.txt
cd ..
git clone https://github.com/huggingface/parler-tts.git && cd parler-tts && pip install -e ."[train]"
cd ..
```

You can link your Hugging Face account so that you can push model repositories to the Hub and share your trained
models with the community. To do so, run:

```sh
git config --global credential.helper store
huggingface-cli login
```

Then enter an authentication token from https://huggingface.co/settings/tokens. Create a new token if you do not
already have one, and make sure it has "write" privileges.

You can also configure Accelerate by running the following command. Set the number of GPUs you wish to use for
training/inference, and choose the data type (dtype) based on your device (e.g. bfloat16 on A100 GPUs, float16 on
V100 GPUs):

```sh
accelerate config
```

Optionally, you can also log in to Weights and Biases for automatic logging:

```sh
wandb login
```

### Step 1: Create Text Descriptions

Creating text descriptions for the dataset comprises three sub-stages from DataSpeech, which we'll cover below.

#### 1.A. Annotate the Expresso dataset

We'll use the [`main.py`](https://github.com/huggingface/dataspeech/blob/main/main.py) file from DataSpeech to label
the following continuous variables:
- Speaking rate
- Signal-to-noise ratio (SNR)
- Reverberation
- Speech monotony

This can be done with the following command:
```sh
python ./dataspeech/main.py "ylacombe/expresso" \
--configuration "default" \
--text_column_name "text" \
--audio_column_name "audio" \
--cpu_num_workers 8 \
--rename_column \
--repo_id "expresso-tags"
```

The script runs faster if you have GPUs at your disposal, and it automatically scales to every GPU available in your environment. To control which GPUs the script runs on, set the `CUDA_VISIBLE_DEVICES` environment variable.

The resulting dataset will be pushed to the Hugging Face Hub under your Hugging Face handle. Mine was pushed to [reach-vb/expresso-tags](https://huggingface.co/datasets/reach-vb/expresso-tags).
We can see that the dataset is annotated with continuous features like "speaking_rate" and "snr".

#### 1.B. Map annotations to text bins

The next step involves mapping the continuous variables to discrete ones. This is achieved by binning the continuous variables
into buckets and assigning each one a text label.

Since the ultimate goal here is to fine-tune the [Parler-TTS v0.1 checkpoint](https://huggingface.co/parler-tts/parler_tts_mini_v0.1)
on the Expresso dataset, we want to stay consistent with the text bins of the dataset on which the original model was trained.

To do this, we'll pass [`v01_bin_edges.json`](https://github.com/huggingface/dataspeech/blob/main/examples/tags_to_annotations/v01_bin_edges.json),
which holds the bin edges from the original dataset, as an input argument to our script:

```sh
python ./dataspeech/scripts/metadata_to_text.py \
"reach-vb/expresso-tags" \
--repo_id "expresso-tags" \
--configuration "default" \
--cpu_num_workers "8" \
--path_to_bin_edges "./dataspeech/examples/tags_to_annotations/v01_bin_edges.json" \
--avoid_pitch_computation
```

Since we leverage the bins from the original dataset, the above script only takes a few seconds. The resulting dataset
will be pushed to the Hugging Face Hub under your Hugging Face handle. Mine was pushed to [reach-vb/expresso-tags](https://huggingface.co/datasets/reach-vb/expresso-tags).

You'll notice that text bins such as "slightly noisy" and "quite monotone" have been added to the samples.
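
Conceptually, the binning step looks up which interval each continuous value falls into and returns that interval's label. The sketch below shows the idea with NumPy's `np.digitize`. The edges and labels are made-up placeholders, not the contents of `v01_bin_edges.json` (whose structure is not shown here), and `to_text_bin` is a hypothetical name, not a DataSpeech function.

```python
import numpy as np

# Placeholder edges and labels for illustration only; they are NOT the values
# stored in v01_bin_edges.json.
SPEAKING_RATE_EDGES = [8.0, 12.0, 16.0]
SPEAKING_RATE_LABELS = ["very slowly", "slowly", "moderate speed", "quickly"]


def to_text_bin(values, edges, labels):
    """Map each continuous value to the text label of the bin it falls into.

    With increasing edges, np.digitize returns i such that edges[i-1] <= x < edges[i],
    i.e. an index in 0..len(edges), so one more label than edges is needed.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than bin edges")
    indices = np.digitize(np.asarray(values, dtype=float), edges)
    return [labels[i] for i in indices]


print(to_text_bin([5.0, 13.2, 20.0], SPEAKING_RATE_EDGES, SPEAKING_RATE_LABELS))
# ['very slowly', 'moderate speed', 'quickly']
```

Reusing the original edges, rather than recomputing them from Expresso, is what keeps a label such as "moderate speed" meaning the same thing it meant when the base model was trained.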

#### 1.C. Create natural language descriptions from those text bins

Now that we have text bins associated with the Expresso dataset, the next step is to create natural language descriptions.
This involves passing the text bins to a large language model (LLM) and having it generate corresponding descriptions.

There is a template [prompt creation script](https://github.com/huggingface/dataspeech/blob/main/scripts/run_prompt_creation.py)
in DataSpeech that can be used to generate descriptions from the features tagged in [step 1.A](#1a-annotate-the-expresso-dataset) (reverberation, noise, speaking rate, etc.).

However, not all of these features are relevant for the Expresso dataset. For instance, Expresso was recorded in a
professional recording studio, so all the samples are high quality. Thus, we chose to create prompts with the following subset of features:
1. Name: we mapped the speaker IDs (ex1, ex2, ex3, ex4) to unique speaker names (Jerry, Elisabeth, Thomas, Talia). This encourages the model to learn specific speakers from the training data
2. Emotion: we include the emotion provided in the Expresso dataset
3. Speaking rate: we use the pre-computed text bins from the previous step
4. In addition, we hard-coded the audio quality to be "very high-quality", given the studio recording conditions.

As an example, if we passed:
1. Speaker: Jerry
2. Emotion: confused
3. Speaking rate: moderate speed

We would expect to generate a sample along the lines of: "Jerry speaks with a confused tone and at a moderate speed with high quality audio."
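
To make the three inputs concrete, the sketch below maps an Expresso speaker ID to its name and formats an instruction for the LLM. It assumes the IDs and names listed above correspond in order (ex1 to Jerry, ex2 to Elisabeth, and so on), and the instruction wording is illustrative only: the actual template lives in `run_prompt_creation_expresso.py` and is not reproduced here. `SPEAKER_NAMES` and `make_llm_prompt` are hypothetical names.

```python
# Assumed positional mapping of the speaker IDs and names listed above.
SPEAKER_NAMES = {"ex1": "Jerry", "ex2": "Elisabeth", "ex3": "Thomas", "ex4": "Talia"}


def make_llm_prompt(speaker_id: str, emotion: str, speaking_rate: str) -> str:
    """Build an illustrative instruction asking an LLM for a one-sentence description."""
    name = SPEAKER_NAMES[speaker_id]
    return (
        "Write one sentence describing a speech recording using these keywords.\n"
        f"Speaker: {name}\n"
        f"Emotion: {emotion}\n"
        f"Speaking rate: {speaking_rate}\n"
        "Audio quality: very high quality\n"  # hard-coded, as described above
        "Only mention the keywords above."
    )


print(make_llm_prompt("ex1", "confused", "moderate speed"))
```

Keeping the audio quality fixed in the template, rather than reading it from the annotations, reflects the studio recording conditions noted above.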

The modified prompt creation script can be found in this repository. You can download it into the `dataspeech` directory with the following Python command:

```python
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="parler-tts/parler-tts-mini-expresso", filename="run_prompt_creation_expresso.py", local_dir="./dataspeech")
```

You can then launch prompt creation using the [Mistral Instruct 7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
model with the following command:

```sh
accelerate launch ./dataspeech/run_prompt_creation_expresso.py \
--dataset_name "reach-vb/expresso-tags" \
--dataset_config_name "default" \
--model_name_or_path "mistralai/Mistral-7B-Instruct-v0.2" \
--per_device_eval_batch_size 32 \
--attn_implementation "sdpa" \
--dataloader_num_workers 8 \
--output_dir "./tmp_expresso" \
--load_in_4bit \
--push_to_hub \
--hub_dataset_id "expresso-tagged-w-speech-mistral" \
--preprocessing_num_workers 16
```

Note that the Mistral model is gated, so make sure you have accepted the terms of use on the [model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
You can find the annotated dataset under [reach-vb/expresso-tagged-w-speech-mistral](https://huggingface.co/datasets/reach-vb/expresso-tagged-w-speech-mistral),
which contains sensible descriptions generated from the features we passed in.

This step generally demands more resources and time, and should be run on one or more GPUs. Scaling to multiple GPUs using [distributed data parallelism (DDP)](https://pytorch.org/tutorials/beginner/ddp_series_theory.html)
is trivial: simply run `accelerate config` and select the multi-GPU option, specifying the IDs of the GPUs you wish to use. The
above script can then be run using DDP with no code changes.

If you are resource constrained and need to use a smaller model, [Gemma 2B](https://huggingface.co/google/gemma-2b-it)
is an excellent choice.

### Step 2: Fine-Tune the Model

Fine-tuning is performed using the Parler-TTS training script [run_parler_tts_training.py](https://github.com/huggingface/parler-tts/blob/main/training/run_parler_tts_training.py).
It is the same script used to pre-train the model, and can be used for fine-tuning without any code changes.

To preserve the model's ability to generate speech with generic voice descriptions, such as in the style of
[Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1), we fine-tuned the model
on a combination of three datasets, including the test splits of LibriTTS-R:
1. [Expresso](https://huggingface.co/datasets/ylacombe/expresso)
2. [Jenny](https://huggingface.co/datasets/reach-vb/jenny_tts_dataset)
3. [LibriTTS-R](https://huggingface.co/datasets/blabble-io/libritts_r)

This was achieved through the following command:

```sh
accelerate launch ./training/run_parler_tts_training.py \
--model_name_or_path "parler-tts/parler_tts_mini_v0.1" \
--feature_extractor_name "parler-tts/dac_44khZ_8kbps" \
--description_tokenizer_name "parler-tts/parler_tts_mini_v0.1" \
--prompt_tokenizer_name "parler-tts/parler_tts_mini_v0.1" \
--report_to "wandb" \
--overwrite_output_dir true \
--train_dataset_name "ylacombe/expresso+reach-vb/jenny_tts_dataset+blabble-io/libritts_r+blabble-io/libritts_r" \
--train_metadata_dataset_name "reach-vb/expresso-tagged-w-speech-mistral-v3+ylacombe/jenny-tts-10k-tagged+parler-tts/libritts_r_tags_tagged_10k_generated+parler-tts/libritts_r_tags_tagged_10k_generated" \
--train_dataset_config_name "read+default+clean+other" \
--train_split_name "train+train[:20%]+test.clean+test.other" \
--eval_dataset_name "ylacombe/expresso+reach-vb/jenny_tts_dataset+blabble-io/libritts_r+blabble-io/libritts_r" \
--eval_metadata_dataset_name "reach-vb/expresso-tagged-w-speech-mistral-v3+ylacombe/jenny-tts-10k-tagged+parler-tts/libritts_r_tags_tagged_10k_generated+parler-tts/libritts_r_tags_tagged_10k_generated" \
--eval_dataset_config_name "read+default+clean+other" \
--eval_split_name "train+train[:20%]+test.clean+test.other" \
--max_eval_samples 8 \
--per_device_eval_batch_size 16 \
--target_audio_column_name "audio" \
--description_column_name "text_description" \
--prompt_column_name "text" \
--max_duration_in_seconds 30.0 \
--min_duration_in_seconds 2.0 \
--max_text_length 400 \
--preprocessing_num_workers 2 \
--do_train true \
--num_train_epochs 8 \
--gradient_accumulation_steps 8 \
--gradient_checkpointing true \
--per_device_train_batch_size 16 \
--learning_rate 0.00008 \
--adam_beta1 0.9 \
--adam_beta2 0.99 \
--weight_decay 0.01 \
--lr_scheduler_type "cosine" \
--warmup_steps 250 \
--logging_steps 2 \
--freeze_text_encoder true \
--audio_encoder_per_device_batch_size 4 \
--dtype "bfloat16" \
--seed 456 \
--output_dir "./parler-tts-mini-expresso" \
--temporary_save_to_disk "./audio_code_tmp" \
--save_to_disk "./tmp_dataset_audio" \
--dataloader_num_workers 4 \
--do_eval \
--predict_with_generate \
--include_inputs_for_metrics \
--group_by_length true
```
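
The `+`-separated dataset arguments in the command above appear to be positional: the i-th entry of each list describes the same training source. The sketch below is a hypothetical helper, not code from `run_parler_tts_training.py`, that zips them together to make the pairing explicit. It shows, for example, that LibriTTS-R enters twice, once per test split, and that Jenny contributes only the first 20% of its train split.

```python
def pair_dataset_args(names: str, metadata: str, configs: str, splits: str):
    """Split '+'-separated argument strings and zip them position by position."""
    fields = [s.split("+") for s in (names, metadata, configs, splits)]
    if len({len(f) for f in fields}) != 1:
        raise ValueError("all '+'-separated arguments must have the same number of entries")
    return list(zip(*fields))


rows = pair_dataset_args(
    "ylacombe/expresso+reach-vb/jenny_tts_dataset+blabble-io/libritts_r+blabble-io/libritts_r",
    "reach-vb/expresso-tagged-w-speech-mistral-v3+ylacombe/jenny-tts-10k-tagged"
    "+parler-tts/libritts_r_tags_tagged_10k_generated+parler-tts/libritts_r_tags_tagged_10k_generated",
    "read+default+clean+other",
    "train+train[:20%]+test.clean+test.other",
)
for name, meta, config, split in rows:
    print(f"{name} [{config} / {split}] <- descriptions from {meta}")
```

A mismatch in the number of entries is the kind of mistake this check catches before a long training run starts.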

On a single 80GB A100 GPU, training took approximately 1.5 hours and returned a final evaluation loss of 4.0. Again, the
script can be configured for multiple GPUs by running `accelerate config` from the command line; no further
code changes are required.

Training performance is quite sensitive to learning rate and number of epochs: you should tune these according to your task
and the size of your dataset. In our experiments, we found the best performance to occur after 8 epochs of training
with a learning rate of 8e-5.
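
To make these settings concrete: on one GPU, the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps` = 16 × 8 = 128 samples per optimizer update. The sketch below also evaluates a linear-warmup-then-cosine-decay curve for `--learning_rate 0.00008`, `--warmup_steps 250` and `--lr_scheduler_type "cosine"`. It assumes the script builds its scheduler with Transformers' standard cosine schedule (half a cosine cycle after warmup) and re-implements that shape in plain Python for illustration; the total step count is a made-up placeholder, since it depends on dataset size.

```python
import math

PEAK_LR = 8e-5       # --learning_rate
WARMUP_STEPS = 250   # --warmup_steps


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Number of samples contributing to one optimizer update."""
    return per_device * grad_accum * num_gpus


def lr_at(step: int, total_steps: int, peak_lr: float = PEAK_LR, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero at total_steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 2_000  # hypothetical; the real value depends on dataset size and epochs
print(effective_batch_size(16, 8))  # 128
for s in (0, 125, 250, 1125, 2000):
    print(s, f"{lr_at(s, TOTAL_STEPS):.2e}")
```

Running on more GPUs multiplies the effective batch size, which is one reason the learning rate may need re-tuning when the hardware changes.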

If you followed these steps to the end, congratulations! You should now have a fine-tuned model that you can use in your
downstream applications with the [inference code example](#usage) above. You can try substituting your own dataset, or
run training using a single-speaker dataset, like the [Jenny example](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/Finetuning_Parler_TTS_on_a_single_speaker_dataset.ipynb).

## Motivation

Parler-TTS is a reproduction of work from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com) by Dan Lyth and Simon King, from Stability AI and the University of Edinburgh respectively.

Unlike other TTS models, Parler-TTS is a **fully open-source** release. All datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models.
Parler-TTS was released alongside:
* [The Parler-TTS repository](https://github.com/huggingface/parler-tts) - you can train and fine-tune your own version of the model.
* [The Data-Speech repository](https://github.com/huggingface/dataspeech) - a suite of utility scripts designed to annotate speech datasets.
* [The Parler-TTS organization](https://huggingface.co/parler-tts) - where you can find the annotated datasets as well as the future checkpoints.

## Citation

If you found this repository useful, please consider citing this work and also the original Stability AI paper:

```
@misc{lacombe-etal-2024-parler-tts,
author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
title = {Parler-TTS},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huggingface/parler-tts}}
}
```

```
@misc{lyth2024natural,
title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
author={Dan Lyth and Simon King},
year={2024},
eprint={2402.01912},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
```

## License

This model is permissively licensed under the Apache 2.0 license.
config.json ADDED
@@ -0,0 +1,276 @@
{
  "_name_or_path": "sanchit-gandhi/parler-tts-mini-v0.1-expresso-combined",
  "architectures": [
    "ParlerTTSForConditionalGeneration"
  ],
  "audio_encoder": {
    "_name_or_path": "ylacombe/dac_44khZ_8kbps",
    "add_cross_attention": false,
    "architectures": [
      "DACModel"
    ],
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "codebook_size": 1024,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": null,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "frame_rate": 86,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "latent_dim": 1024,
    "length_penalty": 1.0,
    "max_length": 20,
    "min_length": 0,
    "model_bitrate": 8,
    "model_type": "dac",
    "no_repeat_ngram_size": 0,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_codebooks": 9,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": null,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sampling_rate": 44100,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "float32",
    "torchscript": false,
    "typical_p": 1.0,
    "use_bfloat16": false
  },
  "decoder": {
    "_name_or_path": "/fsx/yoach/tmp/artefacts/decoder_400M/",
    "activation_dropout": 0.0,
    "activation_function": "gelu",
    "add_cross_attention": true,
    "architectures": [
      "ParlerTTSForCausalLM"
    ],
    "attention_dropout": 0.0,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": 1025,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "dropout": 0.1,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 1024,
    "exponential_decay_length_penalty": null,
    "ffn_dim": 4096,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_size": 1024,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_factor": 0.02,
    "is_decoder": true,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layerdrop": 0.0,
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 4096,
    "min_length": 0,
    "model_type": "parler_tts_decoder",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 16,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_codebooks": 9,
    "num_hidden_layers": 24,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 1024,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "scale_embedding": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "float32",
    "torchscript": false,
    "typical_p": 1.0,
    "use_bfloat16": false,
    "use_cache": true,
    "vocab_size": 1088
  },
  "decoder_start_token_id": 1025,
  "is_encoder_decoder": true,
  "model_type": "parler_tts",
  "pad_token_id": 1024,
  "text_encoder": {
    "_name_or_path": "google/flan-t5-base",
    "add_cross_attention": false,
    "architectures": [
      "T5ForConditionalGeneration"
    ],
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "classifier_dropout": 0.0,
    "cross_attention_hidden_size": null,
    "d_ff": 2048,
    "d_kv": 64,
    "d_model": 768,
    "decoder_start_token_id": 0,
    "dense_act_fn": "gelu_new",
    "diversity_penalty": 0.0,
    "do_sample": false,
    "dropout_rate": 0.1,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 1,
    "exponential_decay_length_penalty": null,
    "feed_forward_proj": "gated-gelu",
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_factor": 1.0,
    "is_decoder": false,
    "is_encoder_decoder": true,
    "is_gated_act": true,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_epsilon": 1e-06,
    "length_penalty": 1.0,
    "max_length": 20,
    "min_length": 0,
    "model_type": "t5",
    "n_positions": 512,
    "no_repeat_ngram_size": 0,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_decoder_layers": 12,
    "num_heads": 12,
    "num_layers": 12,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_past": true,
    "output_scores": false,
    "pad_token_id": 0,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "relative_attention_max_distance": 128,
    "relative_attention_num_buckets": 32,
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": {
      "summarization": {
        "early_stopping": true,
        "length_penalty": 2.0,
        "max_length": 200,
        "min_length": 30,
        "no_repeat_ngram_size": 3,
        "num_beams": 4,
        "prefix": "summarize: "
      },
      "translation_en_to_de": {
        "early_stopping": true,
        "max_length": 300,
        "num_beams": 4,
        "prefix": "translate English to German: "
      },
      "translation_en_to_fr": {
        "early_stopping": true,
        "max_length": 300,
        "num_beams": 4,
        "prefix": "translate English to French: "
      },
      "translation_en_to_ro": {
        "early_stopping": true,
        "max_length": 300,
        "num_beams": 4,
        "prefix": "translate English to Romanian: "
      }
    },
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": null,
    "torchscript": false,
    "typical_p": 1.0,
    "use_bfloat16": false,
    "use_cache": true,
    "vocab_size": 32128
  },
  "torch_dtype": "float32",
  "transformers_version": "4.41.0.dev0",
  "vocab_size": 32128
}
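
Several `audio_encoder` values above are tied together by simple arithmetic. The sketch below assumes the DAC 44 kHz codec advances 512 audio samples per frame (an assumption: the hop length is not stored in this config). Under that assumption, 44,100 / 512 ≈ 86.1 frames per second, matching `frame_rate: 86`, and 9 codebooks of 1024 entries (10 bits each) come to about 7.75 kbps, consistent with `model_bitrate: 8`.

```python
import math

SAMPLING_RATE = 44_100  # audio_encoder.sampling_rate
HOP_LENGTH = 512        # assumed DAC samples per frame; not stored in config.json
NUM_CODEBOOKS = 9       # audio_encoder.num_codebooks
CODEBOOK_SIZE = 1024    # audio_encoder.codebook_size

frames_per_second = SAMPLING_RATE / HOP_LENGTH   # about 86.13
bits_per_code = math.log2(CODEBOOK_SIZE)         # 10.0
bitrate_kbps = frames_per_second * NUM_CODEBOOKS * bits_per_code / 1000

print(f"{frames_per_second:.2f} frames/s, {bitrate_kbps:.2f} kbps")  # 86.13 frames/s, 7.75 kbps
```

The decoder's special tokens sit just past the codebook entries: codes use IDs 0 to 1023, `eos_token_id`/`pad_token_id` is 1024, and `bos_token_id`/`decoder_start_token_id` is 1025, all inside the decoder's `vocab_size` of 1088.
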
generation_config.json ADDED
@@ -0,0 +1,12 @@
{
  "_from_model_config": true,
  "bos_token_id": 1025,
  "decoder_start_token_id": 1025,
  "do_sample": true,
  "eos_token_id": 1024,
  "guidance_scale": 1.0,
  "max_length": 2580,
  "min_new_tokens": 50,
  "pad_token_id": 1024,
  "transformers_version": "4.41.0.dev0"
}
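
As a rough guide to what these defaults mean in seconds: each decoder step yields roughly one codec frame, and `config.json` gives the codec `frame_rate` as 86 frames per second. Ignoring the few extra steps taken by the start token and the codebook delay pattern (an approximation), `max_length: 2580` caps generation at about 30 seconds of audio, in line with the `--max_duration_in_seconds 30.0` used for training, while `min_new_tokens: 50` forces at least about 0.6 seconds. `frames_to_seconds` is a hypothetical helper.

```python
FRAME_RATE = 86       # audio_encoder.frame_rate in config.json
MAX_LENGTH = 2580     # generation_config.json
MIN_NEW_TOKENS = 50   # generation_config.json


def frames_to_seconds(num_frames: int, frame_rate: int = FRAME_RATE) -> float:
    """Approximate audio duration for a given number of decoder steps."""
    return num_frames / frame_rate


print(f"max ≈ {frames_to_seconds(MAX_LENGTH):.1f} s, min ≈ {frames_to_seconds(MIN_NEW_TOKENS):.2f} s")
# max ≈ 30.0 s, min ≈ 0.58 s
```

Since `do_sample` is `true` here, generation is stochastic by default, which is why the Usage snippet calls `set_seed` before `model.generate`.
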
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ab690bfb1b231f648dc3c141200f7a753fe4c84b4e3e30307f4491bccd5e7ee7
size 2588215392
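
The pointer's `size` field gives a quick sanity check on model size. Assuming every tensor is stored as float32 (the top-level `torch_dtype` in `config.json`) and ignoring the small safetensors header, dividing by 4 bytes per value gives roughly 647 million parameters. Treat this as an estimate, not an official count.

```python
FILE_SIZE_BYTES = 2_588_215_392  # "size" field of the Git LFS pointer above
BYTES_PER_FLOAT32 = 4            # assumes every tensor is stored as float32

# Slight overestimate under that assumption: the file also holds a small JSON header.
approx_params = FILE_SIZE_BYTES // BYTES_PER_FLOAT32
print(f"about {approx_params / 1e6:.0f}M parameters")  # about 647M parameters
```
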
preprocessor_config.json ADDED
@@ -0,0 +1,10 @@
{
  "chunk_length_s": null,
  "feature_extractor_type": "EncodecFeatureExtractor",
  "feature_size": 1,
  "overlap": null,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": true,
  "sampling_rate": 44100
}
run_prompt_creation_expresso.py ADDED
@@ -0,0 +1,520 @@
+ import json
+ import logging
+ import os
+ import re
+ import shutil
+ import sys
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ from typing import Any, Dict, List, Optional, Tuple, Union
+
+ import numpy as np
+ import torch
+ from accelerate import Accelerator, skip_first_batches
+ from accelerate.logging import get_logger
+ from datasets import DatasetDict, load_dataset
+ from torch.utils.data import DataLoader
+ from tqdm import tqdm
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+     BitsAndBytesConfig,
+     HfArgumentParser,
+ )
+
+
+ logger = get_logger(__name__, log_level="INFO")
+
+
+ @dataclass
+ class ModelArguments:
+     """
+     Arguments pertaining to which model, tokenizer and generation settings we are going to use for prompt annotation.
+     """
+
+     model_name_or_path: str = field(
+         metadata={"help": "The name of the model to use (via the transformers library) for the prompt annotation."},
+     )
+     per_device_eval_batch_size: int = field(
+         metadata={"help": "The per-device batch size to use for inference."},
+     )
+     model_variant: str = field(
+         default=None,
+         metadata={"help": "If specified load weights from `variant` filename, *e.g.* pytorch_model.<variant>.bin. "},
+     )
+     model_revision: str = field(
+         default="main",
+         metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
+     )
+     cache_dir: Optional[str] = field(
+         default=None,
+         metadata={"help": "Where to store the pretrained models downloaded from huggingface.co"},
+     )
+     torch_dtype: Optional[str] = field(
+         default="float16",
+         metadata={
+             "help": (
+                 "Floating-point format in which the model weights should be initialized"
+                 " and the computations run. Choose one of `[float32, float16, bfloat16]`."
+             )
+         },
+     )
+     attn_implementation: Optional[str] = field(
+         default="sdpa",
+         metadata={"help": "Which attn type to use: ['eager', 'sdpa', 'flash_attention_2']"},
+     )
+     load_in_8bit: Optional[bool] = field(
+         default=False, metadata={"help": "Whether to use 8-bit precision for inference."}
+     )
+     load_in_4bit: Optional[bool] = field(
+         default=False, metadata={"help": "Whether to use 4-bit precision for inference."}
+     )
+     bnb_4bit_quant_type: Optional[str] = field(
+         default="nf4", metadata={"help": "The 4-bit quantization type to use (fp4 or nf4)."}
+     )
+     use_bnb_nested_quant: Optional[bool] = field(default=False, metadata={"help": "Whether to use nested quantization."})
+     trust_remote_code: Optional[bool] = field(
+         default=False,
+         metadata={
+             "help": (
+                 "Whether or not to allow for custom models defined on the Hub in their own modeling files. This option "
+                 "should only be set to `True` for repositories you trust and in which you have read the code, as it will "
+                 "execute code present on the Hub on your local machine."
+             )
+         },
+     )
+     use_fast_tokenizer: Optional[bool] = field(
+         default=True, metadata={"help": "Use fast tokenizer for encoding/decoding input ids"}
+     )
+     token: Optional[bool] = field(
+         default=True,
+         metadata={
+             "help": "Whether or not to use an authentication token when loading/uploading from the Hugging Face Hub"
+         },
+     )
+     do_sample: Optional[bool] = field(default=True, metadata={"help": "Whether to use sampling mode for generation"})
+     temperature: Optional[float] = field(default=0.6, metadata={"help": "Temperature for sampling-based generation"})
+     max_new_tokens: Optional[int] = field(
+         default=256, metadata={"help": "Maximum number of new tokens during generation"}
+     )
+     torch_compile: Optional[bool] = field(
+         default=False,
+         metadata={
+             "help": "Whether to compile the forward pass (not sampling) in generate. Only compatible with Gemma and Llama."
+         },
+     )
+
+
+ @dataclass
+ class DataArguments:
+     """
+     Arguments pertaining to the dataset we are going to annotate and where to save the generated descriptions.
+     """
+
+     output_dir: str = field(
+         metadata={
+             "help": "Where to save the processed dataset to disk. Per-split generation checkpoints are also written "
+             "to sub-directories of this path."
+         },
+     )
+     dataset_name: str = field(
+         default=None,
+         metadata={"help": "The name of the dataset to use (via the datasets library)"},
+     )
+     dataset_config_name: Optional[str] = field(
+         default=None,
+         metadata={"help": "The configuration name of the dataset to use (via the datasets library)."},
+     )
+     dataset_split_name: Optional[str] = field(
+         default=None,
+         metadata={"help": "The split name of the dataset to use (via the datasets library)."},
+     )
+     dataset_cache_dir: Optional[str] = field(
+         default=None,
+         metadata={"help": "Path to cache directory for saving and loading datasets"},
+     )
+     max_eval_samples: Optional[int] = field(
+         default=None,
+         metadata={"help": "Maximum number of samples for generation - use for debugging purposes."},
+     )
+     overwrite_cache: bool = field(
+         default=False,
+         metadata={"help": "Overwrite the cached training and evaluation sets"},
+     )
+     preprocessing_num_workers: Optional[int] = field(
+         default=None,
+         metadata={"help": "The number of processes to use for the preprocessing."},
+     )
+     dataloader_num_workers: Optional[int] = field(
+         default=0,
+         metadata={"help": "The number of processes to use for the dataloader."},
+     )
+     push_to_hub: Optional[bool] = field(
+         default=False,
+         metadata={"help": "Whether or not to push the processed dataset to the Hub."},
+     )
+     hub_dataset_id: Optional[str] = field(
+         default=None,
+         metadata={"help": "Repository namespace if pushing to the Hugging Face Hub."},
+     )
+     overwrite_output_dir: Optional[bool] = field(
+         default=False,
+         metadata={"help": "Overwrite the content of the output directory each time the script is run."},
+     )
+     save_steps: Optional[int] = field(
+         default=500,
+         metadata={"help": "Save the generated prompts every save_steps."},
+     )
+     save_total_limit: Optional[int] = field(
+         default=1, metadata={"help": ("If a value is passed, will limit the total number of saved checkpoints")}
+     )
+
+     def __post_init__(self):
+         if self.push_to_hub and self.hub_dataset_id is None:
+             raise ValueError("You must specify the `hub_dataset_id` when setting `--push_to_hub=True`")
+
+
+ def get_quantization_config(model_args: ModelArguments) -> Union[BitsAndBytesConfig, None]:
+     if model_args.load_in_4bit:
+         compute_dtype = torch.float16
+         if model_args.torch_dtype not in {"auto", None}:
+             compute_dtype = getattr(torch, model_args.torch_dtype)
+
+         quantization_config = BitsAndBytesConfig(
+             load_in_4bit=True,
+             bnb_4bit_compute_dtype=compute_dtype,
+             bnb_4bit_quant_type=model_args.bnb_4bit_quant_type,
+             bnb_4bit_use_double_quant=model_args.use_bnb_nested_quant,
+         )
+     elif model_args.load_in_8bit:
+         quantization_config = BitsAndBytesConfig(
+             load_in_8bit=True,
+         )
+     else:
+         quantization_config = None
+
+     return quantization_config
+
+
+ def get_current_device() -> Union[int, str]:
+     """Get the current device. For GPU we return the local process index to enable multi-GPU inference."""
+     return Accelerator().local_process_index if torch.cuda.is_available() else "cpu"
+
+
+ def get_kbit_device_map() -> Union[Dict[str, int], None]:
+     """Useful for running inference with quantized models by setting `device_map=get_kbit_device_map()`"""
+     return {"": get_current_device()} if torch.cuda.is_available() else None
+
+
+ CHECKPOINT_PREFIX = "checkpoint"
+ _RE_CHECKPOINT = re.compile(r"^checkpoint-(\d+)\.json$")
+
+
+ def save_checkpoint(output_dir, all_generated_ids, step):
+     checkpoint_path = f"{CHECKPOINT_PREFIX}-{step}.json"
+     output_path = os.path.join(output_dir, checkpoint_path)
+     all_generated_ids = [ids.tolist() for ids in all_generated_ids]
+     with open(output_path, "w") as file:
+         json.dump(all_generated_ids, file)
+
+
+ def load_checkpoint(checkpoint_path):
+     with open(checkpoint_path, "r") as file:
+         all_generated_ids = json.load(file)
+     all_generated_ids = [np.array(lst) for lst in all_generated_ids]
+     return all_generated_ids
+
+
+ def sorted_checkpoints(output_dir=None) -> List[str]:
+     """Helper function to sort saved checkpoints from oldest to newest."""
+     ordering_and_checkpoint_path = []
+
+     glob_checkpoints = [str(x) for x in Path(output_dir).glob(f"{CHECKPOINT_PREFIX}-*")]
+
+     for path in glob_checkpoints:
+         regex_match = re.match(f".*{CHECKPOINT_PREFIX}-([0-9]+)", path)
+         if regex_match is not None and regex_match.groups() is not None:
+             ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))
+
+     checkpoints_sorted = sorted(ordering_and_checkpoint_path)
+     checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
+     return checkpoints_sorted
+
+
+ def rotate_checkpoints(save_total_limit=None, output_dir=None) -> None:
+     """Helper function to delete old checkpoints."""
+     if save_total_limit is None or save_total_limit <= 0:
+         return
+     # Check if we should delete older checkpoint(s)
+     checkpoints_sorted = sorted_checkpoints(output_dir=output_dir)
+     if len(checkpoints_sorted) <= save_total_limit:
+         return
+
+     number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - save_total_limit)
+     checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
+     for checkpoint in checkpoints_to_be_deleted:
+         logger.info(f"Deleting older checkpoint [{checkpoint}] due to args.save_total_limit")
+         os.remove(checkpoint)
+
+
+ def get_last_checkpoint(folder) -> Tuple[List, int]:
+     if not os.path.exists(folder) or not os.path.isdir(folder):
+         os.makedirs(folder, exist_ok=True)
+         return [], 0
+     content = os.listdir(folder)
+     checkpoints = [path for path in content if _RE_CHECKPOINT.search(path) is not None]
+     if len(checkpoints) == 0:
+         return [], 0
+     last_checkpoint = os.path.join(folder, max(checkpoints, key=lambda x: int(_RE_CHECKPOINT.search(x).groups()[0])))
+     # Find num steps saved state string pattern
+     pattern = r"checkpoint-(\d+)\.json"
+     match = re.search(pattern, last_checkpoint)
+     cur_step = int(match.group(1))
+     # load corresponding generated ids
+     all_generated_ids = load_checkpoint(last_checkpoint)
+     return all_generated_ids, cur_step
+
+
+ @dataclass
+ class DataCollatorWithPadding:
+     """
+     Data collator that will dynamically pad the inputs received to the longest sequence in the batch.
+     """
+
+     tokenizer: Any
+
+     def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
+         # only the prompt input_ids need padding (there are no labels here);
+         # the padding side (left) is fixed when the tokenizer is loaded in main()
+         input_ids = {"input_ids": [feature["input_ids"] for feature in features]}
+         batch = self.tokenizer.pad(input_ids, return_tensors="pt", padding="longest", return_attention_mask=True)
+         return batch
+
+ id_to_name = {
+     "ex01": "Jerry",
+     "ex02": "Elisabeth",
+     "ex03": "Thomas",
+     "ex04": "Talia"
+ }
+
+ PROMPT = """You will be given a name and an enunciation style related to an audio sample of someone speaking.
+ 1. The name will be one of: Jerry, Elisabeth, Thomas, Talia.
+ 2. The enunciation style will be one of: 'enunciated', 'happy', 'confused', 'default' (meaning no particular emotion conveyed), 'laughing', 'sad', 'whisper', 'emphasis'.
+ 3. The pace of the speaker's delivery (e.g., very slowly, quite slowly, slightly slowly, moderate speed, slightly fast, quite fast, very fast)
+
+ Your task is to create a simple text description using these keywords that accurately describes the audio sample. Ensure that the generated description is grammatically correct, easy to understand, and most importantly, concise.
+
+ For example, given the following keywords: 'Talia', 'happy', 'quite slowly', a valid description would be: 'Talia speaks happily and quite slowly with high quality.'. Another valid description would be: 'Talia delivers her words happily and quite slowly with high quality audio.'. Another example, given the following keywords: 'Jerry', 'emphasis', 'slightly slowly': 'Jerry speaks with emphasis on certain words and slightly slowly with high quality audio.'
+
+ Each description is appended with 'with high quality'.
+
+ You are free to change the order of the information, and replace synonymous terms. Give one description and nothing else. No alternatives or repeating the task. Remember to prioritise conciseness and simplicity.
+
+ For the information: '[speaker_id]', '[style]', '[speaking_rate]' the corresponding description is:"""
+
+
+ def main():
+     # 1. Parse input arguments
+     parser = HfArgumentParser((ModelArguments, DataArguments))
+     if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
+         # If we pass only one argument to the script and it's the path to a json file,
+         # let's parse it to get our arguments.
+         model_args, data_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
+     else:
+         model_args, data_args = parser.parse_args_into_dataclasses()
+
+     # 2. Setup logging
+     # Make one log on every process with the configuration for debugging.
+     logging.basicConfig(
+         format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
+         datefmt="%m/%d/%Y %H:%M:%S",
+         handlers=[logging.StreamHandler(sys.stdout)],
+     )
+
+     accelerator = Accelerator()
+
+     if data_args.overwrite_output_dir and os.path.exists(data_args.output_dir) and os.path.isdir(data_args.output_dir):
+         logger.info("Cleaning output dir from previous run...")
+         shutil.rmtree(data_args.output_dir)
+
+     # 3. Load annotated dataset
+     logger.info("*** Load annotated dataset ***")
+     if data_args.dataset_split_name is not None:
+         raw_datasets = DatasetDict()
+         data_splits = data_args.dataset_split_name.split("+")
+         # load on a split-wise basis
+         for split in data_splits:
+             with accelerator.local_main_process_first():
+                 raw_datasets[split] = load_dataset(
+                     data_args.dataset_name,
+                     data_args.dataset_config_name,
+                     split=split,
+                     cache_dir=model_args.cache_dir,
+                     token=model_args.token,
+                     num_proc=data_args.preprocessing_num_workers,
+                 )
+     else:
+         with accelerator.local_main_process_first():
+             # load all splits for annotation
+             raw_datasets = load_dataset(
+                 data_args.dataset_name,
+                 data_args.dataset_config_name,
+                 cache_dir=model_args.cache_dir,
+                 token=model_args.token,
+                 num_proc=data_args.preprocessing_num_workers,
+             )
+
+     raw_datasets_features = set(raw_datasets[next(iter(raw_datasets))].features.keys())
+
+     if data_args.max_eval_samples is not None:
+         for split in raw_datasets:
+             raw_datasets[split] = raw_datasets[split].select(range(data_args.max_eval_samples))
+
+     # TODO(SG): add accent
+     EXPECTED_COLUMNS = {"speaker_id", "style", "speaking_rate"}
+     if not EXPECTED_COLUMNS.issubset(raw_datasets_features):
+         missing_columns = EXPECTED_COLUMNS - raw_datasets_features
+         raise ValueError(
+             f"Missing columns {missing_columns} from the dataset features. Got dataset features {raw_datasets_features}"
+         )
+
+     # 4. Load pre-trained model
+     logger.info("*** Load pretrained model ***")
+     torch_dtype = (
+         model_args.torch_dtype if model_args.torch_dtype in ["auto", None] else getattr(torch, model_args.torch_dtype)
+     )
+     quantization_config = get_quantization_config(model_args)
+
+     model = AutoModelForCausalLM.from_pretrained(
+         model_args.model_name_or_path,
+         revision=model_args.model_revision,
+         variant=model_args.model_variant,
+         trust_remote_code=model_args.trust_remote_code,
+         attn_implementation=model_args.attn_implementation,
+         torch_dtype=torch_dtype,
+         device_map=get_kbit_device_map() if quantization_config is not None else None,
+         quantization_config=quantization_config,
+         low_cpu_mem_usage=True,
+         token=model_args.token,
+     ).eval()
+
+     if model_args.torch_compile:
+         # torch compile only compatible with gemma and llama
+         if not callable(getattr(model, "_setup_cache", None)):
+             raise ValueError(
+                 f"Static k/v cache is not compatible with the model {model.__class__.__name__}. Set `--torch_compile=False` "
+                 "for dynamic k/v cache"
+             )
+         model.generation_config.cache_implementation = "static"
+         # compile the forward pass (but not the top-{p,k} sampling)
+         model = torch.compile(model, mode="reduce-overhead", fullgraph=True)
+
+     tokenizer = AutoTokenizer.from_pretrained(
+         model_args.model_name_or_path,
+         revision=model_args.model_revision,
+         trust_remote_code=model_args.trust_remote_code,
+         use_fast=model_args.use_fast_tokenizer,
+         padding_side="left",
+     )
+     if tokenizer.pad_token_id is None:
+         tokenizer.pad_token_id = tokenizer.bos_token_id
+         model.generation_config.pad_token_id = model.generation_config.eos_token_id
+
+
+     def prepare_dataset(sample):
+         sample_prompt = PROMPT
+         sample["speaker_id"] = id_to_name[sample["speaker_id"]]
+         for key in EXPECTED_COLUMNS:
+             sample_prompt = sample_prompt.replace(f"[{key}]", sample[key])
+         sample_prompt = [{"role": "user", "content": sample_prompt}]
+         token_ids = tokenizer.apply_chat_template(sample_prompt)
+         sample["input_ids"] = token_ids
+         return sample
+
+     with accelerator.local_main_process_first():
+         vectorized_datasets = raw_datasets.map(
+             prepare_dataset, num_proc=data_args.preprocessing_num_workers, desc="Preparing prompts"
+         )
+
+     # Prepare everything with our `accelerator`
+     model = accelerator.prepare(model)
+     data_collator = DataCollatorWithPadding(tokenizer)
+
+     def generate_step(batch):
+         output_ids = accelerator.unwrap_model(model).generate(
+             batch["input_ids"],
+             attention_mask=batch["attention_mask"],
+             do_sample=model_args.do_sample,
+             temperature=model_args.temperature,
+             max_new_tokens=model_args.max_new_tokens,
+         )
+         output_ids = accelerator.pad_across_processes(output_ids, dim=1, pad_index=tokenizer.pad_token_id)
+         return output_ids
+
+     def postprocess_dataset(sample):
+         prompt_text = tokenizer.decode(sample["input_ids"], skip_special_tokens=True)
+         generated_text = tokenizer.decode(sample["generated_ids"], skip_special_tokens=True)
+         sample["text_description"] = generated_text[len(prompt_text) :]
+         return sample
+
+     for split in vectorized_datasets:
+         data_loader = DataLoader(
+             vectorized_datasets[split],
+             batch_size=model_args.per_device_eval_batch_size,
+             collate_fn=data_collator,
+             num_workers=data_args.dataloader_num_workers,
+             pin_memory=True,
+         )
+         data_loader = accelerator.prepare(data_loader)
+         total_inference_steps = len(data_loader)
+         progress_bar = tqdm(
+             range(total_inference_steps), desc=" ... ", position=0, disable=not accelerator.is_local_main_process
+         )
+
+         split_output_dir = os.path.join(data_args.output_dir, split)
+         all_generated_ids, cur_step = get_last_checkpoint(split_output_dir)
+
+         if cur_step > 0:
+             logger.info(f"Resuming {split} from step {cur_step}")
+             # efficiently skip the first n batches
+             data_loader = skip_first_batches(data_loader, cur_step)
+             progress_bar.update(cur_step)
+
+         while cur_step < total_inference_steps:
+             for batch in data_loader:
+                 generated_ids = generate_step(batch)
+                 generated_ids = accelerator.gather_for_metrics(generated_ids)
+                 all_generated_ids.extend(generated_ids.cpu().numpy())
+
+                 cur_step += 1
+                 progress_bar.update(1)
+
+                 if (cur_step % data_args.save_steps == 0) or (cur_step == total_inference_steps):
+                     save_checkpoint(split_output_dir, all_generated_ids, cur_step)
+                     rotate_checkpoints(data_args.save_total_limit, output_dir=split_output_dir)
+
+         vectorized_datasets[split] = vectorized_datasets[split].add_column("generated_ids", all_generated_ids)
+
+         if accelerator.is_main_process:
+             vectorized_datasets[split] = vectorized_datasets[split].map(
+                 postprocess_dataset,
+                 num_proc=data_args.preprocessing_num_workers,
+                 desc="Postprocessing dataset",
+                 remove_columns=["input_ids", "generated_ids"],
+             )
+         accelerator.wait_for_everyone()
+
+     if accelerator.is_main_process:
+         vectorized_datasets.save_to_disk(data_args.output_dir)
+         if data_args.push_to_hub:
+             vectorized_datasets.push_to_hub(
+                 data_args.hub_dataset_id,
+                 config_name=data_args.dataset_config_name if data_args.dataset_config_name is not None else "default",
+                 token=model_args.token,
+             )
+     accelerator.wait_for_everyone()
+     accelerator.end_training()
+
+
+ if __name__ == "__main__":
+     main()
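The resume logic in `run_prompt_creation_expresso.py` hinges on file names: every `save_steps` batches (and after the last batch) each split directory receives `checkpoint-<step>.json` holding all generated ids so far, and on restart the highest step is loaded and that many batches are skipped with `skip_first_batches`. A self-contained sketch of the same bookkeeping on plain file names, with the `.json` dot escaped and steps compared as integers so that `checkpoint-1000.json` sorts after `checkpoint-500.json` (the function names are illustrative, not from the script):

```python
import re
from typing import List, Optional, Tuple

# Escape the dot so that only a literal ".json" suffix matches.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)\.json$")


def sort_checkpoints(filenames: List[str]) -> List[str]:
    """Order checkpoint files by step number, oldest first, ignoring unrelated files."""
    steps = []
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match:
            steps.append((int(match.group(1)), name))
    return [name for _, name in sorted(steps)]


def latest_checkpoint(filenames: List[str]) -> Tuple[Optional[str], int]:
    """Return the newest checkpoint and its step, or (None, 0) to start from scratch."""
    ordered = sort_checkpoints(filenames)
    if not ordered:
        return None, 0
    newest = ordered[-1]
    return newest, int(CHECKPOINT_RE.match(newest).group(1))


def checkpoints_to_delete(filenames: List[str], save_total_limit: Optional[int]) -> List[str]:
    """Keep the newest `save_total_limit` checkpoints, as rotate_checkpoints does."""
    if save_total_limit is None or save_total_limit <= 0:
        return []
    ordered = sort_checkpoints(filenames)
    return ordered[: max(0, len(ordered) - save_total_limit)]


files = ["checkpoint-1000.json", "checkpoint-500.json", "checkpoint-1500.json", "notes.txt"]
print(latest_checkpoint(files))  # ('checkpoint-1500.json', 1500)
print(checkpoints_to_delete(files, 1))  # ['checkpoint-500.json', 'checkpoint-1000.json']
```

Because the saved step counts batches rather than samples, resuming this way presumably requires the same dataset, batch size and number of processes as the interrupted run.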
special_tokens_map.json ADDED
@@ -0,0 +1,125 @@
+ {
+   "additional_special_tokens": [
+     "<extra_id_0>",
+     "<extra_id_1>",
+     "<extra_id_2>",
+     "<extra_id_3>",
+     "<extra_id_4>",
+     "<extra_id_5>",
+     "<extra_id_6>",
+     "<extra_id_7>",
+     "<extra_id_8>",
+     "<extra_id_9>",
+     "<extra_id_10>",
+     "<extra_id_11>",
+     "<extra_id_12>",
+     "<extra_id_13>",
+     "<extra_id_14>",
+     "<extra_id_15>",
+     "<extra_id_16>",
+     "<extra_id_17>",
+     "<extra_id_18>",
+     "<extra_id_19>",
+     "<extra_id_20>",
+     "<extra_id_21>",
+     "<extra_id_22>",
+     "<extra_id_23>",
+     "<extra_id_24>",
+     "<extra_id_25>",
+     "<extra_id_26>",
+     "<extra_id_27>",
+     "<extra_id_28>",
+     "<extra_id_29>",
+     "<extra_id_30>",
+     "<extra_id_31>",
+     "<extra_id_32>",
+     "<extra_id_33>",
+     "<extra_id_34>",
+     "<extra_id_35>",
+     "<extra_id_36>",
+     "<extra_id_37>",
+     "<extra_id_38>",
+     "<extra_id_39>",
+     "<extra_id_40>",
+     "<extra_id_41>",
+     "<extra_id_42>",
+     "<extra_id_43>",
+     "<extra_id_44>",
+     "<extra_id_45>",
+     "<extra_id_46>",
+     "<extra_id_47>",
+     "<extra_id_48>",
+     "<extra_id_49>",
+     "<extra_id_50>",
+     "<extra_id_51>",
+     "<extra_id_52>",
+     "<extra_id_53>",
+     "<extra_id_54>",
+     "<extra_id_55>",
+     "<extra_id_56>",
+     "<extra_id_57>",
+     "<extra_id_58>",
+     "<extra_id_59>",
+     "<extra_id_60>",
+     "<extra_id_61>",
+     "<extra_id_62>",
+     "<extra_id_63>",
+     "<extra_id_64>",
+     "<extra_id_65>",
+     "<extra_id_66>",
+     "<extra_id_67>",
+     "<extra_id_68>",
+     "<extra_id_69>",
+     "<extra_id_70>",
+     "<extra_id_71>",
+     "<extra_id_72>",
+     "<extra_id_73>",
+     "<extra_id_74>",
+     "<extra_id_75>",
+     "<extra_id_76>",
+     "<extra_id_77>",
+     "<extra_id_78>",
+     "<extra_id_79>",
+     "<extra_id_80>",
+     "<extra_id_81>",
+     "<extra_id_82>",
+     "<extra_id_83>",
+     "<extra_id_84>",
+     "<extra_id_85>",
+     "<extra_id_86>",
+     "<extra_id_87>",
+     "<extra_id_88>",
+     "<extra_id_89>",
+     "<extra_id_90>",
+     "<extra_id_91>",
+     "<extra_id_92>",
+     "<extra_id_93>",
+     "<extra_id_94>",
+     "<extra_id_95>",
+     "<extra_id_96>",
+     "<extra_id_97>",
+     "<extra_id_98>",
+     "<extra_id_99>"
+   ],
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
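`additional_special_tokens` lists the 100 T5-style sentinels `<extra_id_0>` to `<extra_id_99>`. Judging from `tokenizer_config.json` in this commit, they are numbered in reverse: id 32000 is `<extra_id_99>` and id 32031 is `<extra_id_68>`. Assuming that pattern continues for the entries not shown here, the id of `<extra_id_n>` is 32099 - n, which the hypothetical helpers below encode:

```python
# Layout read off special_tokens_map.json and tokenizer_config.json in this commit;
# ids beyond "32031" are assumed to follow the same reversed pattern.
SENTINEL_COUNT = 100  # <extra_id_0> ... <extra_id_99>
FIRST_SENTINEL_ID = 32000  # tokenizer_config.json maps "32000" to <extra_id_99>


def sentinel_id(n: int) -> int:
    """Token id of <extra_id_n> under the reversed numbering."""
    if not 0 <= n < SENTINEL_COUNT:
        raise ValueError(f"sentinel index out of range: {n}")
    return FIRST_SENTINEL_ID + (SENTINEL_COUNT - 1 - n)


def sentinel_token(token_id: int) -> str:
    """Inverse mapping from a sentinel id back to its <extra_id_n> string."""
    n = FIRST_SENTINEL_ID + SENTINEL_COUNT - 1 - token_id
    if not 0 <= n < SENTINEL_COUNT:
        raise ValueError(f"not a sentinel id: {token_id}")
    return f"<extra_id_{n}>"


print(sentinel_id(99), sentinel_id(68), sentinel_id(0))  # 32000 32031 32099
print(sentinel_token(32031))  # <extra_id_68>
```

The text encoder config in this commit declares `vocab_size` 32128, so under the same assumption ids 32100 to 32127 would be embedding rows with no token attached.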
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+ size 791656
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,941 @@
+ {
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32000": {
+       "content": "<extra_id_99>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32001": {
+       "content": "<extra_id_98>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32002": {
+       "content": "<extra_id_97>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32003": {
+       "content": "<extra_id_96>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32004": {
+       "content": "<extra_id_95>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32005": {
+       "content": "<extra_id_94>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32006": {
+       "content": "<extra_id_93>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32007": {
+       "content": "<extra_id_92>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32008": {
+       "content": "<extra_id_91>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32009": {
+       "content": "<extra_id_90>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32010": {
+       "content": "<extra_id_89>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32011": {
+       "content": "<extra_id_88>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32012": {
+       "content": "<extra_id_87>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32013": {
+       "content": "<extra_id_86>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32014": {
+       "content": "<extra_id_85>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32015": {
+       "content": "<extra_id_84>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32016": {
+       "content": "<extra_id_83>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32017": {
+       "content": "<extra_id_82>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32018": {
+       "content": "<extra_id_81>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32019": {
+       "content": "<extra_id_80>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32020": {
+       "content": "<extra_id_79>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32021": {
+       "content": "<extra_id_78>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32022": {
+       "content": "<extra_id_77>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32023": {
+       "content": "<extra_id_76>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32024": {
+       "content": "<extra_id_75>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32025": {
+       "content": "<extra_id_74>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32026": {
+       "content": "<extra_id_73>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32027": {
+       "content": "<extra_id_72>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32028": {
+       "content": "<extra_id_71>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32029": {
+       "content": "<extra_id_70>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32030": {
+       "content": "<extra_id_69>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32031": {
+       "content": "<extra_id_68>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "32032": {
285
+ "content": "<extra_id_67>",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "32033": {
293
+ "content": "<extra_id_66>",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "32034": {
301
+ "content": "<extra_id_65>",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "32035": {
309
+ "content": "<extra_id_64>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "32036": {
317
+ "content": "<extra_id_63>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "32037": {
325
+ "content": "<extra_id_62>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "32038": {
333
+ "content": "<extra_id_61>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "32039": {
341
+ "content": "<extra_id_60>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "32040": {
349
+ "content": "<extra_id_59>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "32041": {
357
+ "content": "<extra_id_58>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "32042": {
365
+ "content": "<extra_id_57>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "32043": {
373
+ "content": "<extra_id_56>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "32044": {
381
+ "content": "<extra_id_55>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "32045": {
389
+ "content": "<extra_id_54>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "32046": {
397
+ "content": "<extra_id_53>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "32047": {
405
+ "content": "<extra_id_52>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "32048": {
413
+ "content": "<extra_id_51>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "32049": {
421
+ "content": "<extra_id_50>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "32050": {
429
+ "content": "<extra_id_49>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "32051": {
437
+ "content": "<extra_id_48>",
438
+ "lstrip": false,
439
+ "normalized": false,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "32052": {
445
+ "content": "<extra_id_47>",
446
+ "lstrip": false,
447
+ "normalized": false,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "32053": {
453
+ "content": "<extra_id_46>",
454
+ "lstrip": false,
455
+ "normalized": false,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "32054": {
461
+ "content": "<extra_id_45>",
462
+ "lstrip": false,
463
+ "normalized": false,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "32055": {
469
+ "content": "<extra_id_44>",
470
+ "lstrip": false,
471
+ "normalized": false,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "32056": {
477
+ "content": "<extra_id_43>",
478
+ "lstrip": false,
479
+ "normalized": false,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "32057": {
485
+ "content": "<extra_id_42>",
486
+ "lstrip": false,
487
+ "normalized": false,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "32058": {
493
+ "content": "<extra_id_41>",
494
+ "lstrip": false,
495
+ "normalized": false,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "32059": {
501
+ "content": "<extra_id_40>",
502
+ "lstrip": false,
503
+ "normalized": false,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "32060": {
509
+ "content": "<extra_id_39>",
510
+ "lstrip": false,
511
+ "normalized": false,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "32061": {
517
+ "content": "<extra_id_38>",
518
+ "lstrip": false,
519
+ "normalized": false,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "32062": {
525
+ "content": "<extra_id_37>",
526
+ "lstrip": false,
527
+ "normalized": false,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "32063": {
533
+ "content": "<extra_id_36>",
534
+ "lstrip": false,
535
+ "normalized": false,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "32064": {
541
+ "content": "<extra_id_35>",
542
+ "lstrip": false,
543
+ "normalized": false,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "32065": {
549
+ "content": "<extra_id_34>",
550
+ "lstrip": false,
551
+ "normalized": false,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "32066": {
557
+ "content": "<extra_id_33>",
558
+ "lstrip": false,
559
+ "normalized": false,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "32067": {
565
+ "content": "<extra_id_32>",
566
+ "lstrip": false,
567
+ "normalized": false,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "32068": {
573
+ "content": "<extra_id_31>",
574
+ "lstrip": false,
575
+ "normalized": false,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "32069": {
581
+ "content": "<extra_id_30>",
582
+ "lstrip": false,
583
+ "normalized": false,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "32070": {
589
+ "content": "<extra_id_29>",
590
+ "lstrip": false,
591
+ "normalized": false,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "32071": {
597
+ "content": "<extra_id_28>",
598
+ "lstrip": false,
599
+ "normalized": false,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "32072": {
605
+ "content": "<extra_id_27>",
606
+ "lstrip": false,
607
+ "normalized": false,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "32073": {
613
+ "content": "<extra_id_26>",
614
+ "lstrip": false,
615
+ "normalized": false,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "32074": {
621
+ "content": "<extra_id_25>",
622
+ "lstrip": false,
623
+ "normalized": false,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "32075": {
629
+ "content": "<extra_id_24>",
630
+ "lstrip": false,
631
+ "normalized": false,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "32076": {
637
+ "content": "<extra_id_23>",
638
+ "lstrip": false,
639
+ "normalized": false,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "32077": {
645
+ "content": "<extra_id_22>",
646
+ "lstrip": false,
647
+ "normalized": false,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "32078": {
653
+ "content": "<extra_id_21>",
654
+ "lstrip": false,
655
+ "normalized": false,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "32079": {
661
+ "content": "<extra_id_20>",
662
+ "lstrip": false,
663
+ "normalized": false,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "32080": {
669
+ "content": "<extra_id_19>",
670
+ "lstrip": false,
671
+ "normalized": false,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "32081": {
677
+ "content": "<extra_id_18>",
678
+ "lstrip": false,
679
+ "normalized": false,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "32082": {
685
+ "content": "<extra_id_17>",
686
+ "lstrip": false,
687
+ "normalized": false,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "32083": {
693
+ "content": "<extra_id_16>",
694
+ "lstrip": false,
695
+ "normalized": false,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "32084": {
701
+ "content": "<extra_id_15>",
702
+ "lstrip": false,
703
+ "normalized": false,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "32085": {
709
+ "content": "<extra_id_14>",
710
+ "lstrip": false,
711
+ "normalized": false,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "32086": {
717
+ "content": "<extra_id_13>",
718
+ "lstrip": false,
719
+ "normalized": false,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "32087": {
725
+ "content": "<extra_id_12>",
726
+ "lstrip": false,
727
+ "normalized": false,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "32088": {
733
+ "content": "<extra_id_11>",
734
+ "lstrip": false,
735
+ "normalized": false,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "32089": {
741
+ "content": "<extra_id_10>",
742
+ "lstrip": false,
743
+ "normalized": false,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "32090": {
749
+ "content": "<extra_id_9>",
750
+ "lstrip": false,
751
+ "normalized": false,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "32091": {
757
+ "content": "<extra_id_8>",
758
+ "lstrip": false,
759
+ "normalized": false,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "32092": {
765
+ "content": "<extra_id_7>",
766
+ "lstrip": false,
767
+ "normalized": false,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "32093": {
773
+ "content": "<extra_id_6>",
774
+ "lstrip": false,
775
+ "normalized": false,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "32094": {
781
+ "content": "<extra_id_5>",
782
+ "lstrip": false,
783
+ "normalized": false,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "32095": {
789
+ "content": "<extra_id_4>",
790
+ "lstrip": false,
791
+ "normalized": false,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "32096": {
797
+ "content": "<extra_id_3>",
798
+ "lstrip": false,
799
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "32097": {
805
+ "content": "<extra_id_2>",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "32098": {
813
+ "content": "<extra_id_1>",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "32099": {
821
+ "content": "<extra_id_0>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ }
828
+ },
829
+ "additional_special_tokens": [
+ "<extra_id_0>",
+ "<extra_id_1>",
+ "<extra_id_2>",
+ "<extra_id_3>",
+ "<extra_id_4>",
+ "<extra_id_5>",
+ "<extra_id_6>",
+ "<extra_id_7>",
+ "<extra_id_8>",
+ "<extra_id_9>",
+ "<extra_id_10>",
+ "<extra_id_11>",
+ "<extra_id_12>",
+ "<extra_id_13>",
+ "<extra_id_14>",
+ "<extra_id_15>",
+ "<extra_id_16>",
+ "<extra_id_17>",
+ "<extra_id_18>",
+ "<extra_id_19>",
+ "<extra_id_20>",
+ "<extra_id_21>",
+ "<extra_id_22>",
+ "<extra_id_23>",
+ "<extra_id_24>",
+ "<extra_id_25>",
+ "<extra_id_26>",
+ "<extra_id_27>",
+ "<extra_id_28>",
+ "<extra_id_29>",
+ "<extra_id_30>",
+ "<extra_id_31>",
+ "<extra_id_32>",
+ "<extra_id_33>",
+ "<extra_id_34>",
+ "<extra_id_35>",
+ "<extra_id_36>",
+ "<extra_id_37>",
+ "<extra_id_38>",
+ "<extra_id_39>",
+ "<extra_id_40>",
+ "<extra_id_41>",
+ "<extra_id_42>",
+ "<extra_id_43>",
+ "<extra_id_44>",
+ "<extra_id_45>",
+ "<extra_id_46>",
+ "<extra_id_47>",
+ "<extra_id_48>",
+ "<extra_id_49>",
+ "<extra_id_50>",
+ "<extra_id_51>",
+ "<extra_id_52>",
+ "<extra_id_53>",
+ "<extra_id_54>",
+ "<extra_id_55>",
+ "<extra_id_56>",
+ "<extra_id_57>",
+ "<extra_id_58>",
+ "<extra_id_59>",
+ "<extra_id_60>",
+ "<extra_id_61>",
+ "<extra_id_62>",
+ "<extra_id_63>",
+ "<extra_id_64>",
+ "<extra_id_65>",
+ "<extra_id_66>",
+ "<extra_id_67>",
+ "<extra_id_68>",
+ "<extra_id_69>",
+ "<extra_id_70>",
+ "<extra_id_71>",
+ "<extra_id_72>",
+ "<extra_id_73>",
+ "<extra_id_74>",
+ "<extra_id_75>",
+ "<extra_id_76>",
+ "<extra_id_77>",
+ "<extra_id_78>",
+ "<extra_id_79>",
+ "<extra_id_80>",
+ "<extra_id_81>",
+ "<extra_id_82>",
+ "<extra_id_83>",
+ "<extra_id_84>",
+ "<extra_id_85>",
+ "<extra_id_86>",
+ "<extra_id_87>",
+ "<extra_id_88>",
+ "<extra_id_89>",
+ "<extra_id_90>",
+ "<extra_id_91>",
+ "<extra_id_92>",
+ "<extra_id_93>",
+ "<extra_id_94>",
+ "<extra_id_95>",
+ "<extra_id_96>",
+ "<extra_id_97>",
+ "<extra_id_98>",
+ "<extra_id_99>"
+ ],
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "</s>",
+ "extra_ids": 100,
+ "legacy": true,
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "padding_side": "left",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "T5Tokenizer",
+ "unk_token": "<unk>"
+ }
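The sentinel entries in this `tokenizer_config.json` follow a fixed pattern: `<extra_id_n>` sits at id `32099 - n`, so `<extra_id_0>` has the highest id (32099) and, given `"extra_ids": 100`, `<extra_id_99>` would land at 32000. Every sentinel is marked `"special": true` and `"normalized": false`. The Python sketch below rebuilds that portion of the config to make the pattern explicit. It is illustrative only: the helper names (`sentinel_id`, `added_tokens_decoder`) and the total of 32100 ids are assumptions inferred from the ids listed above, not part of the Transformers API.

```python
# Illustrative sketch only: rebuilds the sentinel-token entries seen in the config above.
# Assumption: 32100 total ids, inferred from <extra_id_0> -> 32099 and "extra_ids": 100.
TOTAL_IDS = 32100
EXTRA_IDS = 100  # matches "extra_ids": 100


def sentinel_id(n: int) -> int:
    """Return the token id of <extra_id_n>; sentinels count downward from the top id."""
    if not 0 <= n < EXTRA_IDS:
        raise ValueError(f"extra_id index out of range: {n}")
    return TOTAL_IDS - 1 - n


def added_tokens_decoder() -> dict:
    """Hypothetical helper: map str(id) -> token properties, mirroring the JSON entries."""
    return {
        str(sentinel_id(n)): {
            "content": f"<extra_id_{n}>",
            "lstrip": False,
            "normalized": False,
            "rstrip": False,
            "single_word": False,
            "special": True,
        }
        for n in range(EXTRA_IDS)
    }


# "additional_special_tokens" lists the sentinels in ascending n order.
additional_special_tokens = [f"<extra_id_{n}>" for n in range(EXTRA_IDS)]

decoder = added_tokens_decoder()
print(decoder["32010"]["content"])  # <extra_id_89>
print(decoder["32099"]["content"])  # <extra_id_0>
```

Note that `additional_special_tokens` runs in ascending `n` while the ids run in descending order, which is why the `added_tokens_decoder` block and the list above appear in opposite orders.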