| 2025-03-31 12:36:41,400 INFO MainThread:1805682 [wandb_setup.py:_flush():67] Current SDK version is 0.19.8 | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_setup.py:_flush():67] Configure stats pid to 1805682 | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_setup.py:_flush():67] Loading settings from /home/yangyaodong/.config/wandb/settings | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_setup.py:_flush():67] Loading settings from /aifs4su/yaodong/hantao/align-anything/scripts/wandb/settings | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_setup.py:_flush():67] Loading settings from environment variables | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_init.py:setup_run_log_directory():647] Logging user logs to ../outputs/chameleon_sft/top1-40/wandb/run-20250331_123641-9q25qf42/logs/debug.log | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_init.py:setup_run_log_directory():648] Logging internal logs to ../outputs/chameleon_sft/top1-40/wandb/run-20250331_123641-9q25qf42/logs/debug-internal.log | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_init.py:init():761] calling init triggers | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_init.py:init():766] wandb.init called with sweep_config: {} | |
| config: {'train_cfgs': {'save_checkpoint': True, 'load_checkpoint': False, 'ds_cfgs': 'ds_z3_config.json', 'epochs': 3, 'seed': 42, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 16, 'gradient_checkpointing': True, 'learning_rate': 2e-05, 'lr_scheduler_type': 'cosine', 'lr_warmup_ratio': 0.03, 'weight_decay': 0.0, 'adam_betas': [0.9, 0.95], 'adam_epsilon': 1e-08, 'bf16': True, 'fp16': False, 'eval_strategy': 'steps', 'eval_interval': 1000, 'freeze_language_model': False, 'max_grad_norm': 1.0}, 'data_cfgs': {'train_datasets': '/aifs4su/yaodong/hantao/datasets/MMInstruct-GPT4V_mistral-7b_cosi_cut/merged//top1-40', 'train_template': {}, 'train_size': {}, 'train_split': {}, 'train_name': {}, 'train_data_files': 'pre_tokenized/train.pt', 'train_optional_args': [], 'eval_datasets': {}, 'eval_template': {}, 'eval_size': {}, 'eval_split': {}, 'eval_subset': {}, 'eval_data_files': {}, 'eval_optional_args': []}, 'logger_cfgs': {'log_type': 'wandb', 'log_project': 'align-anything', 'log_run_name': 'sft', 'output_dir': '../outputs/chameleon_sft/top1-40', 'cache_dir': {}, 'save_total_limit': 12}, 'model_cfgs': {'model_name_or_path': '/aifs4su/yaodong/hantao/models/chameleon-7b-hf', 'trust_remote_code': True, 'model_max_length': 4096}, 'special_tokens': {}, '_wandb': {}} | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_init.py:init():784] starting backend | |
| 2025-03-31 12:36:41,401 INFO MainThread:1805682 [wandb_init.py:init():788] sending inform_init request | |
| 2025-03-31 12:36:41,411 INFO MainThread:1805682 [backend.py:_multiprocessing_setup():101] multiprocessing start_methods=fork,spawn,forkserver, using: spawn | |
| 2025-03-31 12:36:41,411 INFO MainThread:1805682 [wandb_init.py:init():798] backend started and connected | |
| 2025-03-31 12:36:41,412 INFO MainThread:1805682 [wandb_init.py:init():891] updated telemetry | |
| 2025-03-31 12:36:41,432 INFO MainThread:1805682 [wandb_init.py:init():915] communicating run to backend with 90.0 second timeout | |
| 2025-03-31 12:36:41,952 INFO MainThread:1805682 [wandb_init.py:init():990] starting run threads in backend | |
| 2025-03-31 12:36:42,326 INFO MainThread:1805682 [wandb_run.py:_console_start():2375] atexit reg | |
| 2025-03-31 12:36:42,326 INFO MainThread:1805682 [wandb_run.py:_redirect():2227] redirect: wrap_raw | |
| 2025-03-31 12:36:42,326 INFO MainThread:1805682 [wandb_run.py:_redirect():2292] Wrapping output streams. | |
| 2025-03-31 12:36:42,326 INFO MainThread:1805682 [wandb_run.py:_redirect():2315] Redirects installed. | |
| 2025-03-31 12:36:42,330 INFO MainThread:1805682 [wandb_init.py:init():1032] run started, returning control to user process | |
| 2025-03-31 21:51:16,752 INFO MainThread:1805682 [wandb_run.py:_finish():2112] finishing run htlou/align-anything/9q25qf42 | |
| 2025-03-31 21:51:16,757 INFO MainThread:1805682 [wandb_run.py:_atexit_cleanup():2340] got exitcode: 0 | |
| 2025-03-31 21:51:16,761 INFO MainThread:1805682 [wandb_run.py:_restore():2322] restore | |
| 2025-03-31 21:51:16,762 INFO MainThread:1805682 [wandb_run.py:_restore():2328] restore done | |
| 2025-03-31 21:51:17,768 INFO MainThread:1805682 [wandb_run.py:_restore():2322] restore | |
| 2025-03-31 21:51:17,773 INFO MainThread:1805682 [wandb_run.py:_restore():2328] restore done | |
| 2025-03-31 21:51:17,777 ERROR MainThread:1805682 [wandb_run.py:_atexit_cleanup():2361] Problem finishing run | |
| Traceback (most recent call last): | |
| File "/aifs4su/yaodong/miniconda3/envs/hantao_llama/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup | |
| self._on_finish() | |
| File "/aifs4su/yaodong/miniconda3/envs/hantao_llama/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish | |
| wait_with_progress( | |
| File "/aifs4su/yaodong/miniconda3/envs/hantao_llama/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress | |
| return wait_all_with_progress( | |
| ^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/aifs4su/yaodong/miniconda3/envs/hantao_llama/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress | |
| return asyncio_compat.run(progress_loop_with_timeout) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/aifs4su/yaodong/miniconda3/envs/hantao_llama/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run | |
| future = executor.submit(runner.run, fn) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/aifs4su/yaodong/miniconda3/envs/hantao_llama/lib/python3.11/concurrent/futures/thread.py", line 169, in submit | |
| raise RuntimeError('cannot schedule new futures after ' | |
| RuntimeError: cannot schedule new futures after interpreter shutdown | |
| 2025-03-31 21:51:18,260 INFO MsgRouterThr:1805682 [mailbox.py:close():129] Closing mailbox, abandoning 1 handles. | |