[2025-03-10 22:02:04,241] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:04,342] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:06,636] [WARNING] [runner.py:215:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2025-03-10 22:02:06,649] [INFO] [runner.py:607:main] cmd = /opt/conda/envs/aligner/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=26795 --module --enable_each_rank_log=None training.finetune --train_datasets correction-json::/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/data/train_data.json --model_name_or_path /inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/warmup_model/warmup-qwen-7b --max_length 2048 --trust_remote_code True --epochs 3 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --gradient_checkpointing --learning_rate 2e-5 --lr_scheduler_type cosine --lr_warmup_ratio 0.03 --weight_decay 0.0 --seed 42 --output_dir /inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/sft_model/thought-aligner-qwen-7b --log_type wandb --log_project Aligner-SFT --zero_stage 3 --offload none --bf16 True --tf32 True --save_16bit
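The `--world_info` value in the launcher command above is an encoded host-to-GPU mapping. The minimal sketch below decodes it with the standard library only. It assumes the value is URL-safe base64 of a JSON object. That assumption is consistent with the WORLD INFO DICT the launcher prints a few lines later.

```python
import base64
import json

# Value copied verbatim from the --world_info flag in the launcher command.
WORLD_INFO = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119"


def decode_world_info(encoded: str) -> dict:
    """Decode a base64-encoded JSON mapping of host -> list of GPU indices.

    Assumption: URL-safe base64 over UTF-8 JSON (for this string, standard
    and URL-safe alphabets coincide, since it contains no '+', '/', '-', '_').
    """
    return json.loads(base64.urlsafe_b64decode(encoded).decode("utf-8"))


if __name__ == "__main__":
    info = decode_world_info(WORLD_INFO)
    print(info)  # {'localhost': [0, 1, 2, 3]}
    # One process per listed GPU, matching dist_world_size=4 in the log.
    print(sum(len(gpus) for gpus in info.values()))  # 4
```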
[2025-03-10 22:02:08,185] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:08,335] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:10,633] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2025-03-10 22:02:10,633] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=4, node_rank=0
[2025-03-10 22:02:10,633] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2025-03-10 22:02:10,633] [INFO] [launch.py:164:main] dist_world_size=4
[2025-03-10 22:02:10,633] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2025-03-10 22:02:10,649] [INFO] [launch.py:256:main] process 2336 spawned with command: ['/opt/conda/envs/aligner/bin/python', '-u', '-m', 'training.finetune', '--local_rank=0', '--train_datasets', 'correction-json::/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/data/train_data.json', '--model_name_or_path', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/warmup_model/warmup-qwen-7b', '--max_length', '2048', '--trust_remote_code', 'True', '--epochs', '3', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '8', '--gradient_checkpointing', '--learning_rate', '2e-5', '--lr_scheduler_type', 'cosine', '--lr_warmup_ratio', '0.03', '--weight_decay', '0.0', '--seed', '42', '--output_dir', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/sft_model/thought-aligner-qwen-7b', '--log_type', 'wandb', '--log_project', 'Aligner-SFT', '--zero_stage', '3', '--offload', 'none', '--bf16', 'True', '--tf32', 'True', '--save_16bit']
[2025-03-10 22:02:10,664] [INFO] [launch.py:256:main] process 2337 spawned with command: ['/opt/conda/envs/aligner/bin/python', '-u', '-m', 'training.finetune', '--local_rank=1', '--train_datasets', 'correction-json::/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/data/train_data.json', '--model_name_or_path', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/warmup_model/warmup-qwen-7b', '--max_length', '2048', '--trust_remote_code', 'True', '--epochs', '3', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '8', '--gradient_checkpointing', '--learning_rate', '2e-5', '--lr_scheduler_type', 'cosine', '--lr_warmup_ratio', '0.03', '--weight_decay', '0.0', '--seed', '42', '--output_dir', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/sft_model/thought-aligner-qwen-7b', '--log_type', 'wandb', '--log_project', 'Aligner-SFT', '--zero_stage', '3', '--offload', 'none', '--bf16', 'True', '--tf32', 'True', '--save_16bit']
[2025-03-10 22:02:10,678] [INFO] [launch.py:256:main] process 2338 spawned with command: ['/opt/conda/envs/aligner/bin/python', '-u', '-m', 'training.finetune', '--local_rank=2', '--train_datasets', 'correction-json::/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/data/train_data.json', '--model_name_or_path', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/warmup_model/warmup-qwen-7b', '--max_length', '2048', '--trust_remote_code', 'True', '--epochs', '3', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '8', '--gradient_checkpointing', '--learning_rate', '2e-5', '--lr_scheduler_type', 'cosine', '--lr_warmup_ratio', '0.03', '--weight_decay', '0.0', '--seed', '42', '--output_dir', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/sft_model/thought-aligner-qwen-7b', '--log_type', 'wandb', '--log_project', 'Aligner-SFT', '--zero_stage', '3', '--offload', 'none', '--bf16', 'True', '--tf32', 'True', '--save_16bit']
[2025-03-10 22:02:10,692] [INFO] [launch.py:256:main] process 2339 spawned with command: ['/opt/conda/envs/aligner/bin/python', '-u', '-m', 'training.finetune', '--local_rank=3', '--train_datasets', 'correction-json::/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/data/train_data.json', '--model_name_or_path', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/warmup_model/warmup-qwen-7b', '--max_length', '2048', '--trust_remote_code', 'True', '--epochs', '3', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '8', '--gradient_checkpointing', '--learning_rate', '2e-5', '--lr_scheduler_type', 'cosine', '--lr_warmup_ratio', '0.03', '--weight_decay', '0.0', '--seed', '42', '--output_dir', '/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/sft_model/thought-aligner-qwen-7b', '--log_type', 'wandb', '--log_project', 'Aligner-SFT', '--zero_stage', '3', '--offload', 'none', '--bf16', 'True', '--tf32', 'True', '--save_16bit']
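Each of the four ranks above receives the same batch arguments. Under data parallelism, the samples consumed per optimizer step are the per-device micro-batch times the accumulation steps times the world size. The sketch below does that arithmetic with the values from these command lines. The token figure is only an upper bound. It assumes every sequence reaches `--max_length`, which the log does not confirm.

```python
# Values taken from the per-rank command lines above.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WORLD_SIZE = 4          # dist_world_size reported by the launcher
MAX_LENGTH = 2048       # --max_length


def global_batch_size(per_device: int, accum_steps: int, world_size: int) -> int:
    """Samples consumed per optimizer step across all data-parallel ranks."""
    return per_device * accum_steps * world_size


def max_tokens_per_step(batch: int, max_length: int) -> int:
    """Upper bound on tokens per optimizer step if every sample is max_length long."""
    return batch * max_length


if __name__ == "__main__":
    batch = global_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE,
                              GRADIENT_ACCUMULATION_STEPS, WORLD_SIZE)
    print(batch)                                    # 128
    print(max_tokens_per_step(batch, MAX_LENGTH))   # 262144
```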
[2025-03-10 22:02:12,244] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,359] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,418] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,437] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,479] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,532] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,593] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:12,641] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-03-10 22:02:15,960] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-10 22:02:15,965] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-10 22:02:16,009] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-10 22:02:16,009] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-03-10 22:02:16,076] [INFO] [comm.py:658:init_distributed] cdb=None
Set logger level to WARNING.
[1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/DeepSpeed/csrc/includes -I/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/DeepSpeed/csrc/adam -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/aligner/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -std=c++17 -c /inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/DeepSpeed/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/DeepSpeed/csrc/includes -I/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/DeepSpeed/csrc/adam -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/aligner/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/aligner/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/DeepSpeed/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o 
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/opt/conda/envs/aligner/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Time to load fused_adam op: 26.22399663925171 seconds
Time to load fused_adam op: 26.236260175704956 seconds
Time to load fused_adam op: 26.23628854751587 seconds
Time to load fused_adam op: 26.23655343055725 seconds
Parameter Offload: Total persistent parameters: 333312 in 141 params
***** Running training *****
Saving model to "/inspire/hdd/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0/public-project/jiangchangyue-240114020171/workspace/jcy/reasoning_safety/Thought_Aligner/sft_model/thought-aligner-qwen-7b" ...
Saving 16-bit model...
[2025-03-10 22:28:00,198] [INFO] [launch.py:351:main] Process 2338 exits successfully.
[2025-03-10 22:28:00,198] [INFO] [launch.py:351:main] Process 2339 exits successfully.
[2025-03-10 22:28:00,198] [INFO] [launch.py:351:main] Process 2337 exits successfully.
Model saved!
[2025-03-10 22:28:10,209] [INFO] [launch.py:351:main] Process 2336 exits successfully.
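The timestamped lines bracket the whole run, so its wall-clock span can be read off directly. The sketch below parses the first and last timestamps from this log using Python's `datetime.strptime`. The span covers more than training alone. It also includes launcher startup, the roughly 26 s `fused_adam` JIT build, and the 16-bit save. The helper name `parse_log_timestamp` is hypothetical.

```python
from datetime import datetime

# DeepSpeed-style prefix as seen in this log: "[YYYY-MM-DD HH:MM:SS,mmm]".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

FIRST_LINE = ("[2025-03-10 22:02:04,241] [INFO] [real_accelerator.py:239:get_accelerator] "
              "Setting ds_accelerator to cuda (auto detect)")
LAST_LINE = ("[2025-03-10 22:28:10,209] [INFO] [launch.py:351:main] "
             "Process 2336 exits successfully.")


def parse_log_timestamp(line: str) -> datetime:
    """Extract and parse the bracketed timestamp at the start of a log line.

    %f accepts the 3-digit millisecond field and right-pads it to microseconds.
    """
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, LOG_TS_FORMAT)


if __name__ == "__main__":
    elapsed = parse_log_timestamp(LAST_LINE) - parse_log_timestamp(FIRST_LINE)
    print(elapsed)                  # 0:26:05.968000
    print(elapsed.total_seconds())  # 1565.968
```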