beng / training.log
Chatterbox fine-tuned model + logs
06f6230 verified
/usr/local/lib/python3.13/site-packages/perth/perth_net/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import resource_filename
02/06/2026 05:27:31 - INFO - __main__ - Training/evaluation parameters CustomTrainingArguments(
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=True,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=8,
dataloader_persistent_workers=False,
dataloader_pin_memory=False,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
early_stopping_patience=None,
enable_jit_checkpoint=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=True,
eval_steps=2,
eval_strategy=IntervalStrategy.STEPS,
eval_use_gather_object=False,
fp16=True,
fp16_full_eval=False,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=None,
hub_revision=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_for_metrics=[],
include_num_input_tokens_seen=no,
label_names=['labels_speech'],
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
liger_kernel_config=None,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=None,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs=None,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
neftune_noise_alpha=None,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_TORCH_FUSED,
optim_args=None,
optim_target_modules=None,
output_dir=./checkpoints/v1,
parallelism_config=None,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
project=huggingface,
push_to_hub=False,
remove_unused_columns=True,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=None,
save_on_each_node=False,
save_only_model=False,
save_steps=1.0,
save_strategy=SaveStrategy.STEPS,
save_total_limit=1,
seed=42,
skip_memory_metrics=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
trackio_space_id=trackio,
use_cache=False,
use_cpu=False,
use_liger_kernel=False,
warmup_ratio=None,
warmup_steps=1.0,
weight_decay=0.0,
)
02/06/2026 05:27:31 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='ResembleAI/chatterbox', local_model_dir=None, cache_dir=None, freeze_voice_encoder=True, freeze_s3gen=True)
02/06/2026 05:27:31 - INFO - __main__ - Data parameters DataArguments(language='bn', dataset_dir=None, metadata_file=None, dataset_name=' maddi99/bengali-banspeech', dataset_config_name=None, train_split_name='train', eval_split_name='validation', text_column_name='text_scribe', audio_column_name='audio', max_text_len=256, max_speech_len=800, audio_prompt_duration_s=3.0, eval_split_size=0.0002, preprocessing_num_workers=None, ignore_verifications=False)
02/06/2026 05:27:31 - INFO - __main__ - Loading ChatterboxTTS model...
02/06/2026 05:27:31 - INFO - __main__ - Loading model from Hugging Face Hub: ResembleAI/chatterbox
/usr/local/lib/python3.13/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.
warnings.warn(
02/06/2026 05:27:31 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/ve.safetensors "HTTP/1.1 302 Found"
02/06/2026 05:27:31 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/xet-read-token/05e904af2b5c7f8e482687a9d7336c5c824467d9 "HTTP/1.1 200 OK"
ve.safetensors: 0%| | 0.00/5.70M [00:00<?, ?B/s]
ve.safetensors: 100%|██████████| 5.70M/5.70M [00:00<00:00, 21.1MB/s]
02/06/2026 05:27:32 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/t3_mtl23ls_v2.safetensors "HTTP/1.1 302 Found"
t3_mtl23ls_v2.safetensors: 0%| | 0.00/2.14G [00:00<?, ?B/s]
t3_mtl23ls_v2.safetensors: 0%| | 7.60M/2.14G [00:01<07:04, 5.04MB/s]
t3_mtl23ls_v2.safetensors: 4%|▍ | 80.9M/2.14G [00:05<02:07, 16.2MB/s]
t3_mtl23ls_v2.safetensors: 27%|██▋ | 579M/2.14G [00:06<00:13, 118MB/s]
t3_mtl23ls_v2.safetensors: 37%|███▋ | 792M/2.14G [00:07<00:10, 128MB/s]
t3_mtl23ls_v2.safetensors: 60%|██████ | 1.29G/2.14G [00:09<00:04, 211MB/s]
t3_mtl23ls_v2.safetensors: 100%|██████████| 2.14G/2.14G [00:09<00:00, 221MB/s]
02/06/2026 05:27:41 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/s3gen.safetensors "HTTP/1.1 302 Found"
s3gen.safetensors: 0%| | 0.00/1.06G [00:00<?, ?B/s]
s3gen.safetensors: 6%|▋ | 67.0M/1.06G [00:01<00:29, 33.9MB/s]
s3gen.safetensors: 56%|█████▌ | 587M/1.06G [00:03<00:02, 229MB/s]
s3gen.safetensors: 100%|██████████| 1.06G/1.06G [00:03<00:00, 322MB/s]
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/mtl_tokenizer.json "HTTP/1.1 307 Temporary Redirect"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/mtl_tokenizer.json "HTTP/1.1 200 OK"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/mtl_tokenizer.json "HTTP/1.1 200 OK"
mtl_tokenizer.json: 0%| | 0.00/68.1k [00:00<?, ?B/s]
mtl_tokenizer.json: 100%|██████████| 68.1k/68.1k [00:00<00:00, 134MB/s]
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/main/conds.pt "HTTP/1.1 302 Found"
conds.pt: 0%| | 0.00/107k [00:00<?, ?B/s]
conds.pt: 100%|██████████| 107k/107k [00:00<00:00, 1.31MB/s]
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/ResembleAI/chatterbox/revision/main "HTTP/1.1 200 OK"
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 307 Temporary Redirect"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/conds.pt "HTTP/1.1 302 Found"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/s3gen.pt "HTTP/1.1 302 Found"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/ve.pt "HTTP/1.1 302 Found"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/t3_mtl23ls_v2.safetensors "HTTP/1.1 302 Found"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/ResembleAI/chatterbox/resolve/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 307 Temporary Redirect"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
Downloading (incomplete total...): 0%| | 0.00/2.14G [00:00<?, ?B/s]
Downloading (incomplete total...): 0%| | 0.00/3.20G [00:00<?, ?B/s]
Downloading (incomplete total...): 0%| | 0.00/3.21G [00:00<?, ?B/s]
Downloading (incomplete total...): 0%| | 0.00/3.21G [00:00<?, ?B/s]
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/Cangjie5_TC.json "HTTP/1.1 200 OK"
Downloading (incomplete total...): 0%| | 0.00/3.21G [00:00<?, ?B/s]
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: HEAD https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 200 OK"
02/06/2026 05:27:45 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/resolve-cache/models/ResembleAI/chatterbox/05e904af2b5c7f8e482687a9d7336c5c824467d9/grapheme_mtl_merged_expanded_v1.json "HTTP/1.1 200 OK"
Downloading (incomplete total...): 0%| | 1.92M/3.21G [00:00<02:36, 20.5MB/s]
Downloading (incomplete total...): 0%| | 15.4M/3.21G [00:01<05:28, 9.71MB/s]
Downloading (incomplete total...): 3%|▎ | 86.5M/3.21G [00:06<03:42, 14.0MB/s]
Downloading (incomplete total...): 5%|▍ | 158M/3.21G [00:07<02:13, 22.9MB/s]
Downloading (incomplete total...): 9%|▊ | 280M/3.21G [00:09<01:17, 37.6MB/s]
Downloading (incomplete total...): 28%|██▊ | 908M/3.21G [00:10<00:14, 155MB/s]
Downloading (incomplete total...): 44%|████▎ | 1.40G/3.21G [00:11<00:08, 223MB/s]
Fetching 6 files: 67%|██████▋ | 4/6 [00:12<00:06, 3.09s/it]
Downloading (incomplete total...): 87%|████████▋ | 2.78G/3.21G [00:12<00:00, 493MB/s]
Fetching 6 files: 100%|██████████| 6/6 [00:13<00:00, 2.20s/it]
Download complete: 100%|██████████| 3.21G/3.21G [00:13<00:00, 493MB/s]
Traceback (most recent call last):
  File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 848, in <module>
    main()
    ~~~~^^
  File "/app/chatterbox-multilingual-finetuning/src/finetune_t3.py", line 616, in main
    chatterbox_model = ChatterboxMultilingualTTS.from_pretrained(device="cpu")
  File "/app/chatterbox-multilingual-finetuning/src/chatterbox/mtl_tts.py", line 188, in from_pretrained
    return cls.from_local(ckpt_dir, device)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
TypeError: ChatterboxMultilingualTTS.from_local() takes 2 positional arguments but 3 were given
Download complete: 100%|██████████| 3.21G/3.21G [00:13<00:00, 239MB/s]