rahul7star committed · verified
Commit d8de062 · 1 Parent(s): 8a069ff

Chatterbox fine-tuned model + logs

runs/Feb07_05-45-58_r-rahul7star-chatterbox-train-632mlseq-2bdf1-8pdgh/events.out.tfevents.1770439558.r-rahul7star-chatterbox-train-632mlseq-2bdf1-8pdgh.174.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00979b5e54b04eb381f24b2cdd35f2c2681622ddc91d939d29b01259bc4aa09f
+ size 4097
training.log CHANGED
@@ -1,8 +1,14 @@
  loaded PerthNet (Implicit) at step 250,000
 
  /usr/local/lib/python3.13/site-packages/perth/perth_net/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename
- 02/07/2026 05:32:57 - INFO - __main__ - Training/evaluation parameters CustomTrainingArguments(
  accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
  adam_beta1=0.9,
  adam_beta2=0.999,
@@ -82,7 +88,7 @@ num_train_epochs=1.0,
  optim=OptimizerNames.ADAMW_TORCH_FUSED,
  optim_args=None,
  optim_target_modules=None,
- output_dir=./checkpoints/v1,
  parallelism_config=None,
  per_device_eval_batch_size=8,
  per_device_train_batch_size=1,
@@ -114,53 +120,116 @@ warmup_ratio=None,
  warmup_steps=1.0,
  weight_decay=0.0,
  )
- 02/07/2026 05:32:57 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='ResembleAI/chatterbox', local_model_dir=None, cache_dir=None, freeze_voice_encoder=True, freeze_s3gen=True)
- 02/07/2026 05:32:57 - INFO - __main__ - Data parameters DataArguments(language='hi', dataset_dir=None, metadata_file=None, dataset_name='rahul7star/hindi-speech-dataset', dataset_config_name=None, train_split_name='train', eval_split_name='validation', text_column_name='text_scribe', audio_column_name='audio', max_text_len=256, max_speech_len=800, audio_prompt_duration_s=3.0, eval_split_size=0.0002, preprocessing_num_workers=None, ignore_verifications=False)
- 02/07/2026 05:32:57 - INFO - __main__ - Loading ChatterboxTTS model...
- 02/07/2026 05:32:57 - INFO - __main__ - Loading model from Hugging Face Hub: ResembleAI/chatterbox
- /usr/local/lib/python3.13/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.
- warnings.warn(
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/ve.safetensors "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/t3_mtl23ls_v2.safetensors "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/s3gen.safetensors "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/mtl_tokenizer.json "HTTP/1.1 307 Temporary Redirect"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/mtl_tokenizer.json "HTTP/1.1 200 OK"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/conds.pt "HTTP/1.1 302 Found"
- 02/07/2026 05:32:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/revision/main "HTTP/1.1 200 OK"
 
 
  Downloading (incomplete total...): 0.00B [00:00, ?B/s]
 
- Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]
- Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 32313.59it/s]
 
 
- Download complete: : 0.00B [00:00, ?B/s]
- Download complete: : 0.00B [00:00, ?B/s]
  /usr/local/lib/python3.13/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
- 02/07/2026 05:33:10 - INFO - root - input frame rate=25
- 02/07/2026 05:33:15 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/Cangjie5_TC.json "HTTP/1.1 307 Temporary Redirect"
- 02/07/2026 05:33:15 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
- 02/07/2026 05:33:17 - INFO - __main__ - Voice Encoder frozen.
- 02/07/2026 05:33:17 - INFO - __main__ - S3Gen model frozen.
- 02/07/2026 05:33:17 - INFO - __main__ - T3 model set to trainable.
- 02/07/2026 05:33:17 - INFO - __main__ - Loading and processing dataset...
- 02/07/2026 05:33:17 - INFO - __main__ - Loading dataset 'rahul7star/hindi-speech-dataset' from Hugging Face Hub.
- 02/07/2026 05:33:17 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/rahul7star/hindi-speech-dataset/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/README.md "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/hindi-speech-dataset.py "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/rahul7star/hindi-speech-dataset/rahul7star/hindi-speech-dataset.py "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/rahul7star/hindi-speech-dataset/revision/0bfd5e2e4555ec80d7dd74b10442836d2e169be6 "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/.huggingface.yaml "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=rahul7star/hindi-speech-dataset "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/rahul7star/hindi-speech-dataset/tree/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/data?recursive=true&expand=false "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/rahul7star/hindi-speech-dataset/tree/0bfd5e2e4555ec80d7dd74b10442836d2e169be6?recursive=false&expand=false "HTTP/1.1 200 OK"
- 02/07/2026 05:33:18 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/rahul7star/hindi-speech-dataset/resolve/0bfd5e2e4555ec80d7dd74b10442836d2e169be6/dataset_infos.json "HTTP/1.1 404 Not Found"
- 02/07/2026 05:33:18 - INFO - __main__ - *** Training T3 model ***
-
-
- 0%| | 0/145152 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 849, in <module>
  main()
  ~~~~^^
@@ -185,60 +254,15 @@ Download complete: : 0.00B [00:00, ?B/s]
  ...<5 lines>...
  metric_key_prefix=metric_key_prefix,
  )
- File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 4468, in evaluation_loop
- for step, inputs in enumerate(dataloader):
- ~~~~~~~~~^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/accelerate/data_loader.py", line 567, in __iter__
- current_batch = next(dataloader_iter)
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/dataloader.py", line 741, in __next__
- data = self._next_data()
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/dataloader.py", line 1548, in _next_data
- return self._process_data(data, worker_id)
- ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/dataloader.py", line 1586, in _process_data
- data.reraise()
- ~~~~~~~~~~~~^^
- File "/usr/local/lib/python3.13/site-packages/torch/_utils.py", line 775, in reraise
- raise exception
- ImportError: Caught ImportError in DataLoader worker process 0.
- Original Traceback (most recent call last):
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/_utils/worker.py", line 358, in _worker_loop
- data = fetcher.fetch(index) # type: ignore[possibly-undefined]
- File "/usr/local/lib/python3.13/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
- data = [self.dataset[idx] for idx in possibly_batched_index]
- ~~~~~~~~~~~~^^^^^
- File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 239, in __getitem__
- wav_16k, text = self._load_audio_text_from_item(idx)
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
- File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 187, in _load_audio_text_from_item
- item = self.dataset_source[idx]
- ~~~~~~~~~~~~~~~~~~~^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/arrow_dataset.py", line 2878, in __getitem__
- return self._getitem(key)
- ~~~~~~~~~~~~~^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/arrow_dataset.py", line 2860, in _getitem
- formatted_output = format_table(
- pa_subtable, key, formatter=formatter, format_columns=format_columns, output_all_columns=output_all_columns
  )
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 658, in format_table
- return formatter(pa_table, query_type=query_type)
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 411, in __call__
- return self.format_row(pa_table)
- ~~~~~~~~~~~~~~~^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 460, in format_row
- row = self.python_features_decoder.decode_row(row)
- File "/usr/local/lib/python3.13/site-packages/datasets/formatting/formatting.py", line 224, in decode_row
- return self.features.decode_example(row, token_per_repo_id=self.token_per_repo_id) if self.features else row
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/features/features.py", line 2111, in decode_example
- column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
- ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/features/features.py", line 1419, in decode_nested_example
- return schema.decode_example(obj, token_per_repo_id=token_per_repo_id) if obj is not None else None
- ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/usr/local/lib/python3.13/site-packages/datasets/features/audio.py", line 186, in decode_example
- raise ImportError("To support decoding audio data, please install 'torchcodec'.")
- ImportError: To support decoding audio data, please install 'torchcodec'.
-
-
- 0%| | 0/145152 [00:02<?, ?it/s]
+
+ Resolved paths:
+ - model_name_or_path: /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9
+ - output_dir: /app/checkpoints/v1
+
+ Starting training...
  loaded PerthNet (Implicit) at step 250,000
 
  /usr/local/lib/python3.13/site-packages/perth/perth_net/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename
+ 02/07/2026 05:45:31 - INFO - __main__ - Training/evaluation parameters CustomTrainingArguments(
  accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
  adam_beta1=0.9,
  adam_beta2=0.999,

  optim=OptimizerNames.ADAMW_TORCH_FUSED,
  optim_args=None,
  optim_target_modules=None,
+ output_dir=/app/checkpoints/v1,
  parallelism_config=None,
  per_device_eval_batch_size=8,
  per_device_train_batch_size=1,

  warmup_steps=1.0,
  weight_decay=0.0,
  )
+ 02/07/2026 05:45:31 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9', local_model_dir=None, cache_dir=None, freeze_voice_encoder=True, freeze_s3gen=True)
+ 02/07/2026 05:45:31 - INFO - __main__ - Data parameters DataArguments(language='hi', dataset_dir=None, metadata_file=None, dataset_name='dare43321/hindi', dataset_config_name=None, train_split_name='train', eval_split_name='validation', text_column_name='text', audio_column_name='audio', max_text_len=256, max_speech_len=800, audio_prompt_duration_s=3.0, eval_split_size=0.0002, preprocessing_num_workers=None, ignore_verifications=False)
+ 02/07/2026 05:45:31 - INFO - __main__ - Loading ChatterboxTTS model...
+ 02/07/2026 05:45:31 - INFO - __main__ - Loading model from Hugging Face Hub: /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download ve.safetensors from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download t3_mtl23ls_v2.safetensors from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download s3gen.safetensors from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - WARNING - __main__ - Could not download mtl_tokenizer.json from /app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/app/hf_cache/models--ResembleAI--chatterbox/snapshots/05e904af2b5c7f8e482687a9d7336c5c824467d9'. Use `repo_type` argument if needed..
+ 02/07/2026 05:45:31 - INFO - __main__ - conds.pt not found on Hub or failed to download for this model.
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/revision/main "HTTP/1.1 200 OK"
 
 
  Downloading (incomplete total...): 0.00B [00:00, ?B/s]
 
+ Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/conds.pt "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 307 Temporary Redirect"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/s3gen.pt "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/ve.pt "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/t3_mtl23ls_v2.safetensors "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/107k [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/1.06G [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/3.20G [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 0.00/3.20G [00:00<?, ?B/s]02/07/2026 05:45:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
+
+
+ Downloading (incomplete total...): 0%| | 70.0k/3.21G [00:00<1:02:25, 856kB/s]
+
+ Downloading (incomplete total...): 0%| | 13.5M/3.21G [00:02<11:15, 4.73MB/s]
+
+ Downloading (incomplete total...): 3%|▎ | 80.6M/3.21G [00:04<02:34, 20.3MB/s]
+
+ Downloading (incomplete total...): 5%|▍ | 148M/3.21G [00:07<02:22, 21.4MB/s]
+
+ Downloading (incomplete total...): 20%|█▉ | 637M/3.21G [00:08<00:22, 116MB/s]
+
+ Downloading (incomplete total...): 37%|███▋ | 1.18G/3.21G [00:10<00:11, 183MB/s]
 
+ Downloading (incomplete total...): 45%|████▌ | 1.45G/3.21G [00:11<00:09, 179MB/s]
 
+ Downloading (incomplete total...): 69%|██████▉ | 2.21G/3.21G [00:12<00:03, 300MB/s]
+
+ Fetching 5 files: 60%|██████ | 3/5 [00:13<00:08, 4.38s/it]
+ Fetching 5 files: 100%|██████████| 5/5 [00:13<00:00, 2.73s/it]
+
+
+ Download complete: 100%|██████████| 3.21G/3.21G [00:13<00:00, 300MB/s]
+ Download complete: 100%|██████████| 3.21G/3.21G [00:20<00:00, 156MB/s]
  /usr/local/lib/python3.13/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
+ 02/07/2026 05:45:53 - INFO - root - input frame rate=25
+ 02/07/2026 05:45:54 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/Cangjie5_TC.json "HTTP/1.1 307 Temporary Redirect"
+ 02/07/2026 05:45:54 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:54 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
+
+
+ Cangjie5_TC.json: 0%| | 0.00/1.92M [00:00<?, ?B/s]
+ Cangjie5_TC.json: 100%|██████████| 1.92M/1.92M [00:00<00:00, 89.4MB/s]
+ Downloading: "https://github.com/explosion/spacy-pkuseg/releases/download/v0.0.26/spacy_ontonotes.zip" to /root/.pkuseg/spacy_ontonotes.zip
+
+
+ 0%| | 0/34567143 [00:00<?, ?it/s]
+ 100%|██████████| 34567143/34567143 [00:00<00:00, 188751231.77it/s]
+ 02/07/2026 05:45:57 - INFO - __main__ - Voice Encoder frozen.
+ 02/07/2026 05:45:57 - INFO - __main__ - S3Gen model frozen.
+ 02/07/2026 05:45:57 - INFO - __main__ - T3 model set to trainable.
+ 02/07/2026 05:45:57 - INFO - __main__ - Loading and processing dataset...
+ 02/07/2026 05:45:57 - INFO - __main__ - Loading dataset 'dare43321/hindi' from Hugging Face Hub.
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/dare43321/hindi/a3feda7a1dc916a46b8e50462c6104c34a497d95/README.md "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/dare43321/hindi/a3feda7a1dc916a46b8e50462c6104c34a497d95/README.md "HTTP/1.1 200 OK"
+
+
+ README.md: 0%| | 0.00/312 [00:00<?, ?B/s]
+ README.md: 100%|██████████| 312/312 [00:00<00:00, 1.23MB/s]
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/hindi.py "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/dare43321/hindi/dare43321/hindi.py "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/revision/a3feda7a1dc916a46b8e50462c6104c34a497d95 "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/.huggingface.yaml "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=dare43321/hindi "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/tree/a3feda7a1dc916a46b8e50462c6104c34a497d95/data?recursive=true&expand=false "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/tree/a3feda7a1dc916a46b8e50462c6104c34a497d95?recursive=false&expand=false "HTTP/1.1 200 OK"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/dataset_infos.json "HTTP/1.1 404 Not Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/datasets/dare43321/hindi/resolve/a3feda7a1dc916a46b8e50462c6104c34a497d95/data/train-00000-of-00001.parquet "HTTP/1.1 302 Found"
+ 02/07/2026 05:45:57 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/datasets/dare43321/hindi/xet-read-token/a3feda7a1dc916a46b8e50462c6104c34a497d95 "HTTP/1.1 200 OK"
+
+
+ data/train-00000-of-00001.parquet: 0%| | 0.00/13.0M [00:00<?, ?B/s]
+ data/train-00000-of-00001.parquet: 100%|██████████| 13.0M/13.0M [00:00<00:00, 32.4MB/s]
+
+
+ Generating train split: 0%| | 0/65 [00:00<?, ? examples/s]
+ Generating train split: 100%|██████████| 65/65 [00:00<00:00, 508.48 examples/s]
+ 02/07/2026 05:45:58 - INFO - __main__ - Splitting train dataset for evaluation with ratio 0.0002
+ 02/07/2026 05:45:58 - INFO - __main__ - Evaluation set size: 1
+ 02/07/2026 05:45:58 - INFO - __main__ - *** Training T3 model ***
+
+
+ 0%| | 0/64 [00:00<?, ?it/s]02/07/2026 05:45:59 - ERROR - __main__ - Unexpected audio data format for item 0: <class 'datasets.features._torchcodec.AudioDecoder'>. Skipping.
+ 02/07/2026 05:45:59 - WARNING - __main__ - SpeechDataCollator received no valid features. Returning empty batch.
+ Traceback (most recent call last):
  File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 849, in <module>
  main()
  ~~~~^^

  ...<5 lines>...
  metric_key_prefix=metric_key_prefix,
  )
+ File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 4478, in evaluation_loop
+ losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
+ ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 4649, in prediction_step
+ inputs = self._prepare_inputs(inputs)
+ File "/usr/local/lib/python3.13/site-packages/transformers/trainer.py", line 3610, in _prepare_inputs
+ raise ValueError(
+ ...<2 lines>...
  )
+ ValueError: The batch received was empty, your model won't be able to train on it. Double-check that your training dataset contains keys expected by the model: text_tokens,text_token_lens,speech_tokens,speech_token_lens,t3_cond_speaker_emb,t3_cond_prompt_speech_tokens,t3_cond_emotion_adv,labels_text,labels_speech,labels_speech,label_ids,label.
+
+ 0%| | 0/64 [00:02<?, ?it/s]
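The four `Could not download ... Repo id must be in the form 'repo_name' or 'namespace/repo_name'` warnings in the second run come from passing a resolved local snapshot directory to `hf_hub_download`, which only accepts Hub repo ids. A minimal sketch of one way to branch on that; the function name `resolve_model_file` is hypothetical (only `hf_hub_download(repo_id=..., filename=...)` is the real `huggingface_hub` call), and this is not taken from `finetune_t3.py` itself:

```python
import os


def resolve_model_file(model_name_or_path, filename):
    """Return a local path to `filename`, reading a snapshot dir directly
    when one is given, and falling back to the Hub for real repo ids.

    hf_hub_download() validates repo_id as 'name' or 'namespace/name', so a
    snapshot path like /app/hf_cache/models--ResembleAI--chatterbox/snapshots/<sha>
    must be treated as a plain directory instead of a repo id.
    """
    if os.path.isdir(model_name_or_path):
        local = os.path.join(model_name_or_path, filename)
        # Return None (rather than raising) when the file is absent locally.
        return local if os.path.isfile(local) else None
    # Lazy import: only needed on the genuine repo-id path.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=model_name_or_path, filename=filename)
```

With this shape, the log's `Loading model from Hugging Face Hub: /app/hf_cache/...` case would hit the directory branch and never trip the repo-id validator.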
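The final failure (`Unexpected audio data format for item 0: <class 'datasets.features._torchcodec.AudioDecoder'>` followed by the empty-batch `ValueError`) happens because newer `datasets` releases decode audio lazily through torchcodec and hand back an `AudioDecoder` object instead of the classic `{'array': ..., 'sampling_rate': ...}` dict, so every item is skipped and the collator returns nothing. A hedged sketch of a normalizer accepting both shapes; the helper name `extract_audio` is hypothetical (not from the training script), while `AudioDecoder.get_all_samples()` returning samples with `data` and `sample_rate` is the documented torchcodec API:

```python
import numpy as np


def extract_audio(value):
    """Normalize an HF `datasets` audio value to (waveform, sampling_rate).

    Accepts either the classic eager dict format or a torchcodec
    AudioDecoder object (what `datasets` yields once torchcodec-based
    lazy decoding is active).
    """
    if isinstance(value, dict) and "array" in value:
        # Classic format: {'array': ..., 'sampling_rate': ...}
        return np.asarray(value["array"]), value["sampling_rate"]
    # torchcodec AudioDecoder: get_all_samples() returns an AudioSamples
    # object carrying a `data` tensor and a `sample_rate` attribute.
    samples = value.get_all_samples()
    return samples.data.numpy(), samples.sample_rate
```

Routing `_load_audio_text_from_item` through a check like this would let the run proceed past item 0 instead of emptying the batch.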