rahul7star committed · verified
Commit d8de062 · 1 Parent(s): 8a069ff

Chatterbox fine-tuned model + logs

runs/Feb07_05-45-58_r-rahul7star-chatterbox-train-632mlseq-2bdf1-8pdgh/events.out.tfevents.1770439558.r-rahul7star-chatterbox-train-632mlseq-2bdf1-8pdgh.174.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00979b5e54b04eb381f24b2cdd35f2c2681622ddc91d939d29b01259bc4aa09f
+ size 4097
training.log CHANGED
@@ -1,8 +1,14 @@
  loaded PerthNet (Implicit) at step 250,000
 
  /usr/local/lib/python3.13/site-packages/perth/perth_net/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename
- 02/07/2026 05:32:57 - INFO - __main__ - Training/evaluation parameters CustomTrainingArguments(
  accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
  adam_beta1=0.9,
  adam_beta2=0.999,
@@ -82,7 +88,7 @@ num_train_epochs=1.0,
  optim=OptimizerNames.ADAMW_TORCH_FUSED,
  optim_args=None,
  optim_target_modules=None,
- output_dir=./checkpoints/v1,
  parallelism_config=None,
  per_device_eval_batch_size=8,
  per_device_train_batch_size=1,
@@ -114,53 +120,116 @@ warmup_ratio=None,
  warmup_steps=1.0,
  weight_decay=0.0,
  )
- 02/07/2026 05:32:57 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='ResembleAI/chatterbox', local_model_dir=None, cache_dir=None, freeze_voice_encoder=True, freeze_s3gen=True)
- 02/07/2026 05:32:57 - INFO - __main__ - Data parameters DataArguments(language='hi', dataset_dir=None, metadata_file=None, dataset_name='rahul7star/hindi-speech-dataset', dataset_config_name=None, train_split_name='train', eval_split_name='validation', text_column_name='text_scribe', audio_column_name='audio', max_text_len=256, max_speech_len=800, audio_prompt_duration_s=3.0, eval_split_size=0.0002, preprocessing_num_workers=None, ignore_verifications=False)
- 02/07/2026 05:32:57 - INFO - __main__ - Loading ChatterboxTTS model...
- 02/07/2026 05:32:57 - INFO - __main__ - Loading model from Hugging Face Hub: ResembleAI/chatterbox
- /usr/local/lib/python3.13/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.
- warnings.warn(
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/ve.safetensors "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/t3_mtl23ls_v2.safetensors "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/s3gen.safetensors "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/mtl_tokenizer.json "HTTP/1.1 307 Temporary Redirect"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/mtl_tokenizer.json "HTTP/1.1 200 OK"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/conds.pt "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/revision/main "HTTP/1.1 200 OK"
 
 
  Downloading (incomplete total...): 0.00B [00:00, ?B/s]
 
- Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]
- Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 32313.59it/s]
 
 
- Download complete: : 0.00B [00:00, ?B/s]
- Download complete: : 0.00B [00:00, ?B/s]
  /usr/local/lib/python3.13/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
- 02/07/2026 05:33:10 - INFO - root - input frame rate=25
- 02/07/2026 05:33:15 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/Cangjie5_TC.json "HTTP/1.1 307 Temporary Redirect"
- 02/07/2026 05:33:15 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
- 02/07/2026 05:33:17 - INFO - __main__ - Voice Encoder frozen.
- 02/07/2026 05:33:17 - INFO - __main__ - S3Gen model frozen.
- 02/07/2026 05:33:17 - INFO - __main__ - T3 model set to trainable.
- 02/07/2026 05:33:17 - INFO - __main__ - Loading and processing dataset...
- 02/07/2026 05:33:17 - INFO - __main__ - Loading dataset 'rahul7star/hindi-speech-dataset' from Hugging Face Hub.
- 02/07/2026 05:33:17 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/rahul7star/hindi-speech-dataset/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/README.md "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/hindi-speech-dataset.py "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/rahul7star/hindi-speech-dataset/rahul7star/hindi-speech-dataset.py "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/rahul7star/hindi-speech-dataset/revision/0bfd5e2e4555ec80d7dd74b10442836d2e169be6 "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/.huggingface.yaml "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=rahul7star/hindi-speech-dataset "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/rahul7star/hindi-speech-dataset/tree/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/data?recursive=true&expand=false "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/rahul7star/hindi-speech-dataset/tree/0bfd5e2e4555ec80d7dd74b10442836d2e169be6?recursive=false&expand=false "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/dataset_infos.json "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - __main__ - *** Training T3 model ***
-
-
- 0%| | 0/145152 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 849, in <module>
  main()
  ~~~~^^
@@ -185,60 +254,15 @@ Download complete: : 0.00B [00:00, ?B/s]
  ...<5 lines>...
  metric_key_prefix=metric_key_prefix,
  )
- File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 4468, in evaluation_loop
- for step, inputs in enumerate(dataloader):
- ~~~~~~~~~^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/accelerate/data_loader.py", line 567, in __iter__
- current_batch = next(dataloader_iter)
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/dataloader.py", line 741, in __next__
- data = self._next_data()
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/dataloader.py", line 1548, in _next_data
- return self._process_data(data, worker_id)
- ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/dataloader.py", line 1586, in _process_data
- data.reraise()
- ~~~~~~~~~~~~^^
- File "/usr/local/lib/python3.13/site-packages/torch/_utils.py", line 775, in reraise
- raise exception
- ImportError: Caught ImportError in DataLoader worker process 0.
- Original Traceback (most recent call last):
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/_utils/worker.py", line 358, in _worker_loop
- data = fetcher.fetch(index) # type: ignore[possibly-undefined]
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
- data = [self.dataset[idx] for idx in possibly_batched_index]
- ~~~~~~~~~~~~^^^^^
- File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 239, in __getitem__
- wav_16k, text = self._load_audio_text_from_item(idx)
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
- File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 187, in _load_audio_text_from_item
- item = self.dataset_source[idx]
- ~~~~~~~~~~~~~~~~~~~^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/arrow_dataset.py", line 2878, in __getitem__
- return self._getitem(key)
- ~~~~~~~~~~~~~^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/arrow_dataset.py", line 2860, in _getitem
- formatted_output = format_table(
- pa_subtable, key, formatter=formatter, format_columns=format_columns, output_all_columns=output_all_columns
  )
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 658, in format_table
- return formatter(pa_table, query_type=query_type)
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 411, in __call__
- return self.format_row(pa_table)
- ~~~~~~~~~~~~~~~^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 460, in format_row
- row = self.python_features_decoder.decode_row(row)
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 224, in decode_row
- return self.features.decode_example(row, token_per_repo_id=self.token_per_repo_id) if self.features else row
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/features/features.py", line 2111, in decode_example
- column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
- ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/features/features.py", line 1419, in decode_nested_example
- return schema.decode_example(obj, token_per_repo_id=token_per_repo_id) if obj is not None else None
- ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/features/audio.py", line 186, in decode_example
- raise ImportError("To support decoding audio data, please install 'torchcodec'.")
- ImportError: To support decoding audio data, please install 'torchcodec'.
-
-
- 0%| | 0/145152 [00:02<?, ?it/s]
+
+ Resolved paths:
+ - model_name_or_path: /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9
+ - output_dir: /app/checkpoints/v1
+
+ Starting training...
  loaded PerthNet (Implicit) at step 250,000
 
  /usr/local/lib/python3.13/site-packages/perth/perth_net/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename
+ 02/07/2026 05:45:31 - INFO - __main__ - Training/evaluation parameters CustomTrainingArguments(
  accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
  adam_beta1=0.9,
  adam_beta2=0.999,

  optim=OptimizerNames.ADAMW_TORCH_FUSED,
  optim_args=None,
  optim_target_modules=None,
+ output_dir=/app/checkpoints/v1,
  parallelism_config=None,
  per_device_eval_batch_size=8,
  per_device_train_batch_size=1,

  warmup_steps=1.0,
  weight_decay=0.0,
  )
+ 02/07/2026 05:45:31 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9', local_model_dir=None, cache_dir=None, freeze_voice_encoder=True, freeze_s3gen=True)
+ 02/07/2026 05:45:31 - INFO - __main__ - Data parameters DataArguments(language='hi', dataset_dir=None, metadata_file=None, dataset_name='dare43321/hindi', dataset_config_name=None, train_split_name='train', eval_split_name='validation', text_column_name='text', audio_column_name='audio', max_text_len=256, max_speech_len=800, audio_prompt_duration_s=3.0, eval_split_size=0.0002, preprocessing_num_workers=None, ignore_verifications=False)
+ 02/07/2026 05:45:31 - INFO - __main__ - Loading ChatterboxTTS model...
+ 02/07/2026 05:45:31 - INFO - __main__ - Loading model from Hugging Face Hub: /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download ve.safetensors from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download t3_mtl23ls_v2.safetensors from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download s3gen.safetensors from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download mtl_tokenizer.json from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - INFO - __main__ - conds.pt not found on Hub or failed to download for this model.
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/revision/main "HTTP/1.1 200 OK"
 
 
  Downloading (incomplete total...): 0.00B [00:00, ?B/s]
 
+ Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/conds.pt "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 307 Temporary Redirect"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/s3gen.pt "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/ve.pt "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/t3_mtl23ls_v2.safetensors "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/107k [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/1.06G [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/3.20G [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/3.20G [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 70.0k/3.21G [00:00<1:02:25, 856kB/s]
+
+ Downloading (incomplete total...): 0%| | 13.5M/3.21G [00:02<11:15, 4.73MB/s]
+
+ Downloading (incomplete total...): 3%|▎ | 80.6M/3.21G [00:04<02:34, 20.3MB/s]
+
+ Downloading (incomplete total...): 5%|▍ | 148M/3.21G [00:07<02:22, 21.4MB/s]
+
+ Downloading (incomplete total...): 20%|█▉ | 637M/3.21G [00:08<00:22, 116MB/s]
+
+ Downloading (incomplete total...): 37%|███▋ | 1.18G/3.21G [00:10<00:11, 183MB/s]
 
+ Downloading (incomplete total...): 45%|████▌ | 1.45G/3.21G [00:11<00:09, 179MB/s]
 
+ Downloading (incomplete total...): 69%|██████▉ | 2.21G/3.21G [00:12<00:03, 300MB/s]
+
+ Fetching 5 files: 60%|██████ | 3/5 [00:13<00:08, 4.38s/it]
+ Fetching 5 files: 100%|██████████| 5/5 [00:13<00:00, 2.73s/it]
+
+
+ Download complete: 100%|██████████| 3.21G/3.21G [00:13<00:00, 300MB/s]
+ Download complete: 100%|██████████| 3.21G/3.21G [00:20<00:00, 156MB/s]
  /usr/local/lib/python3.13/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
+ 02/07/2026 05:45:53 - INFO - root - input frame rate=25
+ 02/07/2026 05:45:54 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/Cangjie5_TC.json "HTTP/1.1 307 Temporary Redirect"
+ 02/07/2026 05:45:54 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:54 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
+
+
+ Cangjie5_TC.json: 0%| | 0.00/1.92M [00:00<?, ?B/s]
+ Cangjie5_TC.json: 100%|██████████| 1.92M/1.92M [00:00<00:00, 89.4MB/s]
+ Downloading: "https://github.com/explosion/spacy-pkuseg/releases/download/v0.0.26/spacy_ontonotes.zip" to /root/.pkuseg/spacy_ontonotes.zip
+
+
+ 0%| | 0/34567143 [00:00<?, ?it/s]
+ 100%|██████████| 34567143/34567143 [00:00<00:00, 188751231.77it/s]
+ 02/07/2026 05:45:57 - INFO - __main__ - Voice Encoder frozen.
+ 02/07/2026 05:45:57 - INFO - __main__ - S3Gen model frozen.
+ 02/07/2026 05:45:57 - INFO - __main__ - T3 model set to trainable.
+ 02/07/2026 05:45:57 - INFO - __main__ - Loading and processing dataset...
+ 02/07/2026 05:45:57 - INFO - __main__ - Loading dataset 'dare43321/hindi' from Hugging Face Hub.
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/dare43321/hindi/a3feda7a1dc916a46b8e50462c6104c34a497d95/README.md "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/dare43321/hindi/a3feda7a1dc916a46b8e50462c6104c34a497d95/README.md "HTTP/1.1 200 OK"
+
+
+ README.md: 0%| | 0.00/312 [00:00<?, ?B/s]
+ README.md: 100%|██████████| 312/312 [00:00<00:00, 1.23MB/s]
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/hindi.py "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/dare43321/hindi/dare43321/hindi.py "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/revision/a3feda7a1dc916a46b8e50462c6104c34a497d95 "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/.huggingface.yaml "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=dare43321/hindi "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/tree/a3feda7a1dc916a46b8e50462c6104c34a497d95/data?recursive=true&expand=false "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/tree/a3feda7a1dc916a46b8e50462c6104c34a497d95?recursive=false&expand=false "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/dataset_infos.json "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/data/train-00000-of-00001.parquet "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/xet-read-token/a3feda7a1dc916a46b8e50462c6104c34a497d95 "HTTP/1.1 200 OK"
+
+
+ data/train-00000-of-00001.parquet: 0%| | 0.00/13.0M [00:00<?, ?B/s]
+ data/train-00000-of-00001.parquet: 100%|██████████| 13.0M/13.0M [00:00<00:00, 32.4MB/s]
+
+
+ Generating train split: 0%| | 0/65 [00:00<?, ? examples/s]
+ Generating train split: 100%|██████████| 65/65 [00:00<00:00, 508.48 examples/s]
+ 02/07/2026 05:45:58 - INFO - __main__ - Splitting train dataset for evaluation with ratio 0.0002
+ 02/07/2026 05:45:58 - INFO - __main__ - Evaluation set size: 1
+ 02/07/2026 05:45:58 - INFO - __main__ - *** Training T3 model ***
+
+
+ 0%| | 0/64 [00:00<?, ?it/s]02/07/2026 05:45:59 - ERROR - __main__ - Unexpected audio data format for item 0: <class 'datasets.features._torchcodec.AudioDecoder'>. Skipping.
+ 02/07/2026 05:45:59 - WARNING - __main__ - SpeechDataCollator received no valid features. Returning empty batch.
+ Traceback (most recent call last):
  File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 849, in <module>
  main()
  ~~~~^^

  ...<5 lines>...
  metric_key_prefix=metric_key_prefix,
  )
+ File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 4478, in evaluation_loop
+ losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
+ ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 4649, in prediction_step
+ inputs = self._prepare_inputs(inputs)
+ File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 3610, in _prepare_inputs
+ raise ValueError(
+ ...<2 lines>...
  )
+ ValueError: The batch received was empty, your model won't be able to train on it. Double-check that your training dataset contains keys expected by the model: text_tokens,text_token_lens,speech_tokens,speech_token_lens,t3_cond_speaker_emb,t3_cond_prompt_speech_tokens,t3_cond_emotion_adv,labels_text,labels_speech,labels_speech,label_ids,label.
+
+ 0%| | 0/64 [00:02<?, ?it/s]
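The four `Could not download ... Repo id must be in the form 'repo_name' or 'namespace/repo_name'` warnings in the second run come from passing a resolved local snapshot directory to `hf_hub_download`, which only accepts Hub repo ids. A minimal sketch of one way to branch on that; the function name `resolve_model_file` is hypothetical (only `hf_hub_download(repo_id=..., filename=...)` is the real `huggingface_hub` call), and this is not taken from `finetune_t3.py` itself:

```python
import os


def resolve_model_file(model_name_or_path, filename):
    """Return a local path to `filename`, reading a snapshot dir directly
    when one is given, and falling back to the Hub for real repo ids.

    hf_hub_download() validates repo_id as 'name' or 'namespace/name', so a
    snapshot path like /app/hf_cache/models--ResembleAI--chatterbox/snapshots/<sha>
    must be treated as a plain directory instead of a repo id.
    """
    if os.path.isdir(model_name_or_path):
        local = os.path.join(model_name_or_path, filename)
        # Return None (rather than raising) when the file is absent locally.
        return local if os.path.isfile(local) else None
    # Lazy import: only needed on the genuine repo-id path.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=model_name_or_path, filename=filename)
```

With this shape, the log's `Loading model from Hugging Face Hub: /app/hf_cache/...` case would hit the directory branch and never trip the repo-id validator.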
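The final failure (`Unexpected audio data format for item 0: <class 'datasets.features._torchcodec.AudioDecoder'>` followed by the empty-batch `ValueError`) happens because newer `datasets` releases decode audio lazily through torchcodec and hand back an `AudioDecoder` object instead of the classic `{'array': ..., 'sampling_rate': ...}` dict, so every item is skipped and the collator returns nothing. A hedged sketch of a normalizer accepting both shapes; the helper name `extract_audio` is hypothetical (not from the training script), while `AudioDecoder.get_all_samples()` returning samples with `data` and `sample_rate` is the documented torchcodec API:

```python
import numpy as np


def extract_audio(value):
    """Normalize an HF `datasets` audio value to (waveform, sampling_rate).

    Accepts either the classic eager dict format or a torchcodec
    AudioDecoder object (what `datasets` yields once torchcodec-based
    lazy decoding is active).
    """
    if isinstance(value, dict) and "array" in value:
        # Classic format: {'array': ..., 'sampling_rate': ...}
        return np.asarray(value["array"]), value["sampling_rate"]
    # torchcodec AudioDecoder: get_all_samples() returns an AudioSamples
    # object carrying a `data` tensor and a `sample_rate` attribute.
    samples = value.get_all_samples()
    return samples.data.numpy(), samples.sample_rate
```

Routing `_load_audio_text_from_item` through a check like this would let the run proceed past item 0 instead of emptying the batch.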