/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
[INFO|2026-01-30 12:12:42] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:47407
/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
W0130 12:12:43.204000 1791939 site-packages/torch/distributed/run.py:774]
W0130 12:12:43.204000 1791939 site-packages/torch/distributed/run.py:774] *****************************************
W0130 12:12:43.204000 1791939 site-packages/torch/distributed/run.py:774] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0130 12:12:43.204000 1791939 site-packages/torch/distributed/run.py:774] *****************************************
/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead.
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. import pynvml # type: ignore[import] /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. import pynvml # type: ignore[import] /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. import pynvml # type: ignore[import] /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. import pynvml # type: ignore[import] /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. import pynvml # type: ignore[import] /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
  import pynvml # type: ignore[import]
/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
[2026-01-30 12:12:51,268] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:51,528] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:51,614] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:51,679] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:51,949] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:51,949] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:52,425] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:52.999972680 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:52,826] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:52.400517899 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:52,856] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:52.430127932 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:52,867] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:52,878] [INFO] [comm.py:669:init_distributed] cdb=None
[2026-01-30 12:12:52,878] [INFO] [comm.py:700:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W130 12:12:52.453372269 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:53,114] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2026-01-30 12:12:53,329] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:53.903828929 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:53,353] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:53.928190235 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:53,929] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:53.503041418 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[2026-01-30 12:12:54,246] [INFO] [comm.py:669:init_distributed] cdb=None
[W130 12:12:54.820017974 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated.
(function operator())
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 4, world size: 8, device: cuda:4, distributed training: True, compute dtype: torch.bfloat16
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 0, world size: 8, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,335 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,335 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,336 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,336 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,336 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,336 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,336 >> loading file chat_template.jinja
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 6, world size: 8, device: cuda:6, distributed training: True, compute dtype: torch.bfloat16
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 1, world size: 8, device: cuda:1, distributed training: True, compute dtype: torch.bfloat16
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 3, world size: 8, device: cuda:3, distributed training: True, compute dtype: torch.bfloat16
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 7, world size: 8, device: cuda:7, distributed training: True, compute dtype: torch.bfloat16
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 2, world size: 8, device: cuda:2, distributed training: True, compute dtype: torch.bfloat16
[INFO|2026-01-30 12:12:57] llamafactory.hparams.parser:465 >> Process rank: 5, world size: 8, device: cuda:5, distributed
training: True, compute dtype: torch.bfloat16 [INFO|tokenization_utils_base.py:2337] 2026-01-30 12:12:57,592 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|image_processing_base.py:374] 2026-01-30 12:12:57,592 >> loading configuration file /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface/preprocessor_config.json [INFO|image_processing_base.py:374] 2026-01-30 12:12:57,594 >> loading configuration file /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface/preprocessor_config.json [INFO|image_processing_base.py:421] 2026-01-30 12:12:57,595 >> Image processor Qwen2VLImageProcessor { "crop_size": null, "data_format": "channels_first", "default_to_square": true, "device": null, "disable_grouping": null, "do_center_crop": null, "do_convert_rgb": true, "do_normalize": true, "do_pad": null, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "input_data_format": null, "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "pad_size": null, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "return_tensors": null, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "temporal_patch_size": 2 } [INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file vocab.json [INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file merges.txt [INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file tokenizer_config.json 
[INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2066] 2026-01-30 12:12:57,598 >> loading file chat_template.jinja [INFO|tokenization_utils_base.py:2337] 2026-01-30 12:12:57,736 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|video_processing_utils.py:727] 2026-01-30 12:12:57,737 >> loading configuration file /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface/video_preprocessor_config.json [INFO|video_processing_utils.py:773] 2026-01-30 12:12:57,739 >> Video processor Qwen2VLVideoProcessor { "crop_size": null, "data_format": "channels_first", "default_to_square": true, "device": null, "do_center_crop": null, "do_convert_rgb": true, "do_normalize": true, "do_pad": null, "do_rescale": true, "do_resize": true, "do_sample_frames": false, "fps": null, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "input_data_format": null, "max_frames": 768, "max_pixels": 12845056, "merge_size": 2, "min_frames": 4, "min_pixels": 3136, "num_frames": null, "pad_size": null, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "return_metadata": false, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "size_divisor": null, "temporal_patch_size": 2, "video_metadata": null, "video_processor_type": "Qwen2VLVideoProcessor" } [INFO|processing_utils.py:1051] 2026-01-30 12:12:57,739 >> loading configuration file None /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
warnings.warn( # warn only once [INFO|processing_utils.py:1136] 2026-01-30 12:12:57,991 >> Processor Qwen2_5_VLProcessor: - image_processor: Qwen2VLImageProcessor { "crop_size": null, "data_format": "channels_first", "default_to_square": true, "device": null, "disable_grouping": null, "do_center_crop": null, "do_convert_rgb": true, "do_normalize": true, "do_pad": null, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "input_data_format": null, "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "pad_size": null, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "return_tensors": null, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "temporal_patch_size": 2 } - tokenizer: Qwen2Tokenizer(name_or_path='/mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface', vocab_size=151643, model_max_length=131072, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={ 151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, 
normalized=False, special=True), 151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151657: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151658: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 
151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), } ) - video_processor: Qwen2VLVideoProcessor { "crop_size": null, "data_format": "channels_first", "default_to_square": true, "device": null, "do_center_crop": null, "do_convert_rgb": true, "do_normalize": true, "do_pad": null, "do_rescale": true, "do_resize": true, "do_sample_frames": false, "fps": null, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "input_data_format": null, "max_frames": 768, "max_pixels": 12845056, "merge_size": 2, "min_frames": 4, "min_pixels": 3136, "num_frames": null, "pad_size": null, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "return_metadata": false, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "size_divisor": null, "temporal_patch_size": 2, "video_metadata": null, "video_processor_type": "Qwen2VLVideoProcessor" } { "processor_class": "Qwen2_5_VLProcessor" } [INFO|2026-01-30 12:12:58] llamafactory.data.loader:143 >> Loading dataset multi_turn_action_gen.json... /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
  warnings.warn( # warn only once
/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
  warnings.warn( # warn only once
[rank4]:[W130 12:12:58.720205755 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
[rank7]:[W130 12:12:58.020678286 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
[rank6]:[W130 12:12:58.021496872 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
[rank3]:[W130 12:12:58.022788519 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
/home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
  warnings.warn( # warn only once
[rank1]:[W130 12:12:58.058315833 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown.
This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. /home/ubuntu/miniconda3/envs/viewsuite/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. warnings.warn( # warn only once [rank5]:[W130 12:12:58.085599525 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. [rank2]:[W130 12:12:58.136571768 ProcessGroupNCCL.cpp:5023] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. Setting num_proc from 4 back to 1 for the train split to disable multiprocessing as it only contains one shard. WARNING:datasets.builder:Setting num_proc from 4 back to 1 for the train split to disable multiprocessing as it only contains one shard. Generating train split: 0 examples [00:00, ? examples/s] Generating train split: 1468 examples [00:00, 20884.81 examples/s] Converting format of dataset (num_proc=4): 0%| | 0/1468 [00:00> Loading dataset forward_dynamics.json... Setting num_proc from 4 back to 1 for the train split to disable multiprocessing as it only contains one shard. WARNING:datasets.builder:Setting num_proc from 4 back to 1 for the train split to disable multiprocessing as it only contains one shard. Generating train split: 0 examples [00:00, ? 
examples/s] Generating train split: 1431 examples [00:00, 62050.15 examples/s] Converting format of dataset (num_proc=4): 0%| | 0/1431 [00:00> Loading dataset action_gen.json... Setting num_proc from 4 back to 1 for the train split to disable multiprocessing as it only contains one shard. WARNING:datasets.builder:Setting num_proc from 4 back to 1 for the train split to disable multiprocessing as it only contains one shard. Generating train split: 0 examples [00:00, ? examples/s] Generating train split: 2896 examples [00:00, 34487.30 examples/s] Converting format of dataset (num_proc=4): 0%| | 0/2896 [00:00system You are a spatial reasoning agent navigating through a 3D scene. You are given an initial view and a target view. Navigate step by step to reach the target. Each action moves 0.5m or rotates 30.0 degrees. Output your action in the format: action1, action2, ... When you reach the target view, output: answer(tx, ty, tz, rx, ry, rz)<|im_end|> <|im_start|>user Navigate from the initial view 
<|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_
pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|imag
e_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> to the target view <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|imag
e_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|im
age_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|>. Top-down reference: <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|
image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|>
<|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> Current camera 6-DoF (c2w, Euler XYZ, DEGREES): [tx=2.7451, ty=1.4536, tz=1.5408, rx=-120.00°, ry=0.00°, rz=-120.00°] Step 1/3<|im_end|> <|im_start|>assistant turn_right<|im_end|> <|im_start|>user format: ok Current camera 6-DoF (c2w, Euler XYZ, DEGREES): [tx=2.7451, ty=1.4536, tz=1.5408, rx=-120.00°, ry=0.00°, rz=-150.00°] 
<|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_
pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|imag
e_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> Step 2/3<|im_end|> <|im_start|>assistant turn_right<|im_end|> <|im_start|>user format: ok Current camera 6-DoF (c2w, Euler XYZ, DEGREES): [tx=2.7451, ty=1.4536, tz=1.5408, rx=-120.00°, ry=0.00°, rz=180.00°] <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|>
<|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad
|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> Step 3/3<|im_end|> <|im_start|>assistant answer(2.7, 1.5, 1.5, -120.0, 0.0, 180.0)<|im_end|> label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 27, 1311, 29, 412, 10539, 522, 1311, 29, 151645, 198, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 27, 1311, 29, 412, 10539, 522, 1311, 29, 151645, 198, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 27, 1311, 29, 9217, 7, 17, 13, 22, 11, 220, 16, 13, 20, 11, 220, 16, 13, 20, 11, 481, 16, 17, 15, 13, 15, 11, 220, 15, 13, 15, 11, 220, 16, 23, 15, 13, 15, 12533, 1311, 29, 151645, 198] labels: turn_right<|im_end|> turn_right<|im_end|> answer(2.7, 1.5, 1.5, -120.0, 0.0, 180.0)<|im_end|> [INFO|configuration_utils.py:763] 2026-01-30 12:16:01,158 >> loading configuration file /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface/config.json [INFO|configuration_utils.py:839] 2026-01-30 12:16:01,169 >> Model config Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "dtype": "float32", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": 
"qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "pad_token_id": 151643, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "_name_or_path": "/mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_1/rl/verl_checkpoints/global_step_50/actor/huggingface", "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "dtype": "float32", "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": null, "initializer_range": 0.02, "intermediate_size": 18944, "layer_types": [ "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention" ], "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "pad_token_id": 151643, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": null, "use_cache": false, "use_sliding_window": false, "video_token_id": null, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "transformers_version": "4.56.1", "use_cache": false, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "dtype": "float32", "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, 
"in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 } [INFO|2026-01-30 12:16:01] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training. [WARNING|logging.py:328] 2026-01-30 12:16:01,671 >> `torch_dtype` is deprecated! Use `dtype` instead! [INFO|modeling_utils.py:1277] 2026-01-30 12:16:01,671 >> loading weights file /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface/model.safetensors.index.json [INFO|modeling_utils.py:1351] 2026-01-30 12:16:01,671 >> Will use dtype=torch.float32 as defined in model's config object [INFO|modeling_utils.py:2466] 2026-01-30 12:16:01,671 >> Instantiating Qwen2_5_VLForConditionalGeneration model under default dtype torch.float32. [INFO|modeling_utils.py:4489] 2026-01-30 12:16:01,671 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model [2026-01-30 12:16:01,671] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 [INFO|configuration_utils.py:1055] 2026-01-30 12:16:01,679 >> Generate config GenerationConfig { "eos_token_id": 151645, "pad_token_id": 151643, "use_cache": false } [INFO|modeling_utils.py:2466] 2026-01-30 12:16:01,680 >> Instantiating Qwen2_5_VisionTransformerPretrainedModel model under default dtype torch.float32. `torch_dtype` is deprecated! Use `dtype` instead! [2026-01-30 12:16:02,896] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 `torch_dtype` is deprecated! Use `dtype` instead! [2026-01-30 12:16:02,981] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 `torch_dtype` is deprecated! Use `dtype` instead! 
[2026-01-30 12:16:02,984] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 `torch_dtype` is deprecated! Use `dtype` instead! `torch_dtype` is deprecated! Use `dtype` instead! [2026-01-30 12:16:02,997] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 [2026-01-30 12:16:02,998] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 `torch_dtype` is deprecated! Use `dtype` instead! [2026-01-30 12:16:03,005] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 `torch_dtype` is deprecated! Use `dtype` instead! [2026-01-30 12:16:04,350] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 [INFO|modeling_utils.py:2466] 2026-01-30 12:16:04,479 >> Instantiating Qwen2_5_VLTextModel model under default dtype torch.float32. Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing Qwen2_5_VLForConditionalGeneration. [INFO|modeling_utils.py:5729] 2026-01-30 12:16:18,424 >> All the weights of Qwen2_5_VLForConditionalGeneration were initialized from the model checkpoint at /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2_5_VLForConditionalGeneration for predictions without further training. [INFO|configuration_utils.py:1008] 2026-01-30 12:16:18,427 >> loading configuration file /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/rl/verl_checkpoints/global_step_100/actor/huggingface/generation_config.json [INFO|configuration_utils.py:1055] 2026-01-30 12:16:18,427 >> Generate config GenerationConfig { "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 1e-06 } [INFO|2026-01-30 12:16:18] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. 
[INFO|2026-01-30 12:16:18] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. [INFO|2026-01-30 12:16:18] llamafactory.model.adapter:143 >> DeepSpeed ZeRO3 detected, remaining trainable params in float32. [INFO|2026-01-30 12:16:18] llamafactory.model.adapter:143 >> Fine-tuning method: Full [INFO|2026-01-30 12:16:18] llamafactory.model.model_utils.visual:143 >> Set vision model not trainable: ['visual.patch_embed', 'visual.blocks']. [INFO|2026-01-30 12:16:18] llamafactory.model.model_utils.visual:143 >> Set multi model projector not trainable: visual.merger. [INFO|2026-01-30 12:16:18] llamafactory.model.loader:143 >> trainable params: 7,615,616,512 || all params: 8,292,166,656 || trainable%: 91.8411 [INFO|trainer.py:757] 2026-01-30 12:16:18,449 >> Using auto half precision backend WARNING:accelerate.accelerator:Gradient accumulation steps mismatch: GradientAccumulationPlugin has 1, DeepSpeed config has 2. Using DeepSpeed's value. [2026-01-30 12:16:18,842] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed info: version=0.16.9, git-hash=unknown, git-branch=unknown [2026-01-30 12:16:18,842] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8 [2026-01-30 12:16:18,850] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2026-01-30 12:16:18,851] [INFO] [logging.py:107:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2026-01-30 12:16:18,851] [INFO] [logging.py:107:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2026-01-30 12:16:18,862] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW [2026-01-30 12:16:18,862] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type= [2026-01-30 12:16:18,862] [INFO] [logging.py:107:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False [2026-01-30 12:16:18,862] [INFO] 
[logging.py:107:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer [2026-01-30 12:16:19,048] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning [2026-01-30 12:16:19,049] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 4.85 GB CA 1.93 GB Max_CA 5 GB [2026-01-30 12:16:19,049] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 549.48 GB, percent = 15.6% [2026-01-30 12:16:19,050] [INFO] [stage3.py:170:__init__] Reduce bucket size 12845056 [2026-01-30 12:16:19,050] [INFO] [stage3.py:171:__init__] Prefetch bucket size 11560550 [2026-01-30 12:16:19,190] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin] [2026-01-30 12:16:19,190] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.93 GB Max_CA 2 GB [2026-01-30 12:16:19,190] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 549.48 GB, percent = 15.6% Parameter Offload: Total persistent parameters: 848896 in 368 params [2026-01-30 12:16:19,383] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end] [2026-01-30 12:16:19,384] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.93 GB Max_CA 2 GB [2026-01-30 12:16:19,384] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 549.47 GB, percent = 15.6% [2026-01-30 12:16:19,526] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions [2026-01-30 12:16:19,526] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.93 GB Max_CA 2 GB [2026-01-30 12:16:19,526] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 549.46 GB, percent = 15.6% [2026-01-30 12:16:21,240] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 2 [2026-01-30 12:16:21,241] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.97 GB Max_CA 2 GB [2026-01-30 12:16:21,241] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 561.85 GB, percent = 15.9% [2026-01-30 
12:16:21,395] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions [2026-01-30 12:16:21,395] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.97 GB Max_CA 2 GB [2026-01-30 12:16:21,395] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 560.91 GB, percent = 15.9% [2026-01-30 12:16:21,655] [INFO] [utils.py:781:see_memory_usage] After creating fp32 partitions [2026-01-30 12:16:21,656] [INFO] [utils.py:782:see_memory_usage] MA 5.48 GB Max_MA 7.25 GB CA 7.28 GB Max_CA 7 GB [2026-01-30 12:16:21,656] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 559.75 GB, percent = 15.9% [2026-01-30 12:16:21,796] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states [2026-01-30 12:16:21,797] [INFO] [utils.py:782:see_memory_usage] MA 5.48 GB Max_MA 5.48 GB CA 7.28 GB Max_CA 7 GB [2026-01-30 12:16:21,797] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 559.76 GB, percent = 15.9% [2026-01-30 12:16:21,970] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states [2026-01-30 12:16:21,970] [INFO] [utils.py:782:see_memory_usage] MA 5.48 GB Max_MA 9.02 GB CA 10.84 GB Max_CA 11 GB [2026-01-30 12:16:21,970] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 559.77 GB, percent = 15.9% [2026-01-30 12:16:21,971] [INFO] [stage3.py:534:_setup_for_real_optimizer] optimizer state initialized [2026-01-30 12:16:22,206] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer [2026-01-30 12:16:22,207] [INFO] [utils.py:782:see_memory_usage] MA 7.27 GB Max_MA 9.3 GB CA 10.84 GB Max_CA 11 GB [2026-01-30 12:16:22,207] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 560.12 GB, percent = 15.9% [2026-01-30 12:16:22,207] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer_Stage3 [2026-01-30 12:16:22,207] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = None 
[2026-01-30 12:16:22,207] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed LR Scheduler = None [2026-01-30 12:16:22,207] [INFO] [logging.py:107:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2026-01-30 12:16:22,208] [INFO] [config.py:1003:print] DeepSpeedEngine configuration: [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'intra_op_parallelism': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False} [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] amp_enabled .................. False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] amp_params ................... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] bfloat16_enabled ............. 
True [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] bfloat16_immediate_grad_update True [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] checkpoint_parallel_write_pipeline False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] checkpoint_tag_validation_enabled True [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] checkpoint_tag_validation_fail False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] comms_config ................. [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] communication_data_type ...... None [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] compile_config ............... deepcompile=False free_activation=False offload_activation=False offload_opt_states=False double_buffer=True symmetric_memory=False debug_log=False offload_parameters=False sync_before_reduce=False sync_after_reduce=False sync_before_allgather=False sync_after_allgather=False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] curriculum_enabled_legacy .... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] curriculum_params_legacy ..... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'pin_memory': False, 'curriculum_learning': {'enabled': False}, 'dynamic_batching': {'enabled': False, 'lr_scaling_method': 'linear', 'min_batch_size': 1, 'max_batch_size': None, 'sequence_picking_order': 'dataloader', 'verbose': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] data_efficiency_enabled ...... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] dataloader_drop_last ......... 
False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] disable_allgather ............ False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] dump_state ................... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] dynamic_loss_scale_args ...... None [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_enabled ........... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_gas_boundary_resolution 1 [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_layer_name ........ bert.encoder.layer [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_layer_num ......... 0 [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_max_iter .......... 100 [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_stability ......... 1e-06 [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_tol ............... 0.01 [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] eigenvalue_verbose ........... False [2026-01-30 12:16:22,209] [INFO] [config.py:1007:print] elasticity_enabled ........... False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] fp16_auto_cast ............... None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] fp16_enabled ................. False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] fp16_master_weights_and_gradients False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] global_rank .................. 0 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] grad_accum_dtype ............. None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] gradient_accumulation_steps .. 2 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] gradient_clipping ............ 
1.0 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] gradient_predivide_factor .... 1.0 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] graph_harvesting ............. False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] initial_dynamic_scale ........ 1 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] load_universal_checkpoint .... False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] loss_scale ................... 1.0 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] memory_breakdown ............. False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] mics_hierarchial_params_gather False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] mics_shard_size .............. -1 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] optimizer_legacy_fusion ...... False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] optimizer_name ............... None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] optimizer_params ............. 
None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] pld_enabled .................. False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] pld_params ................... False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] prescale_gradients ........... False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] scheduler_name ............... None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] scheduler_params ............. None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] seq_parallel_communication_data_type torch.float32 [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] sparse_attention ............. None [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] sparse_gradients_enabled ..... False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] steps_per_print .............. inf [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] tensor_parallel_config ....... dtype=torch.float16 autotp_size=0 tp_overlap_comm=False tensor_parallel=TPConfig(tp_size=1, tp_grain_size=1, mpu=None, tp_group=None) injection_policy_tuple=None keep_module_on_host=False replace_with_kernel_inject=False [2026-01-30 12:16:22,210] [INFO] [config.py:1007:print] timers_config ................ enabled=True synchronized=True [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] train_batch_size ............. 16 [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] train_micro_batch_size_per_gpu 1 [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] use_data_before_expert_parallel_ False [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] use_node_local_storage ....... False [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] wall_clock_breakdown ......... 
False [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] weight_quantization_config ... None [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] world_size ................... 8 [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] zero_allow_untested_optimizer True [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=12845056 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=11560550 param_persistence_threshold=35840 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True log_trace_cache_warnings=False [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] zero_enabled ................. True [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] zero_force_ds_cpu_optimizer .. True [2026-01-30 12:16:22,211] [INFO] [config.py:1007:print] zero_optimization_stage ...... 
3
[2026-01-30 12:16:22,211] [INFO] [config.py:993:print_user_config] json = {
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": false,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": false,
        "contiguous_gradients": true,
        "sub_group_size": 1.000000e+09,
        "reduce_bucket_size": 1.284506e+07,
        "stage3_prefetch_bucket_size": 1.156055e+07,
        "stage3_param_persistence_threshold": 3.584000e+04,
        "stage3_max_live_parameters": 1.000000e+09,
        "stage3_max_reuse_distance": 1.000000e+09,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "steps_per_print": inf
}
[INFO|trainer.py:2523] 2026-01-30 12:16:22,212 >> ***** Running training *****
[INFO|trainer.py:2524] 2026-01-30 12:16:22,212 >> Num examples = 5,795
[INFO|trainer.py:2525] 2026-01-30 12:16:22,212 >> Num Epochs = 1
[INFO|trainer.py:2526] 2026-01-30 12:16:22,212 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2529] 2026-01-30 12:16:22,212 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2530] 2026-01-30 12:16:22,212 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2531] 2026-01-30 12:16:22,212 >> Total optimization steps = 363
[INFO|trainer.py:2532] 2026-01-30 12:16:22,214 >> Number of trainable parameters = 7,615,616,512
[INFO|integration_utils.py:869] 2026-01-30 12:16:22,215 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from WANDB_API_KEY.
wandb: Currently logged in as: kangrw (ragen-V) to https://api.wandb.ai.
Use `wandb login --relogin` to force relogin wandb: setting up run eiq1f5w1 wandb: Tracking run with wandb version 0.24.0 wandb: Run data is saved locally in /home/ubuntu/projects/viewsuite/LLaMA-Factory/wandb/run-20260130_121623-eiq1f5w1 wandb: Run `wandb offline` to turn off syncing. wandb: Syncing run skilled-universe-25 wandb: ⭐️ View project at https://wandb.ai/ragen-V/llamafactory wandb: 🚀 View run at https://wandb.ai/ragen-V/llamafactory/runs/eiq1f5w1 0%| | 0/363 [00:00> Saving model checkpoint to /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100 [INFO|configuration_utils.py:491] 2026-01-30 12:20:06,523 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/config.json [INFO|configuration_utils.py:826] 2026-01-30 12:20:06,524 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/generation_config.json [INFO|modeling_utils.py:4305] 2026-01-30 12:20:22,513 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2394] 2026-01-30 12:20:22,514 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/chat_template.jinja [INFO|tokenization_utils_base.py:2563] 2026-01-30 12:20:22,514 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/tokenizer_config.json [INFO|tokenization_utils_base.py:2572] 2026-01-30 12:20:22,515 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/special_tokens_map.json [INFO|tokenization_utils_base.py:2623] 2026-01-30 12:20:22,515 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/added_tokens.json [INFO|image_processing_base.py:253] 2026-01-30 12:20:22,735 >> Image processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/preprocessor_config.json [INFO|tokenization_utils_base.py:2394] 2026-01-30 12:20:22,736 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/chat_template.jinja [INFO|tokenization_utils_base.py:2563] 2026-01-30 12:20:22,736 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/tokenizer_config.json [INFO|tokenization_utils_base.py:2572] 2026-01-30 12:20:22,736 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/special_tokens_map.json [INFO|tokenization_utils_base.py:2623] 2026-01-30 12:20:22,736 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/added_tokens.json [INFO|video_processing_utils.py:610] 2026-01-30 12:20:22,908 >> Video processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/video_preprocessor_config.json [INFO|processing_utils.py:752] 
2026-01-30 12:20:22,908 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-100/chat_template.jinja
 28%|██▊ | 101/363 [03:59<41:57, 9.61s/it] {'loss': 0.0809, 'grad_norm': 1.43840041921586, 'learning_rate': 1.8212958445580623e-05, 'epoch': 0.28}
 28%|██▊ | 102/363 [04:01<31:53, 7.33s/it] {'loss': 0.0987, 'grad_norm': 1.5624416473825227, 'learning_rate': 1.815759982516061e-05, 'epoch': 0.28}
 28%|██▊ | 103/363 [04:03<24:42, 5.70s/it] {'loss': 0.1055, 'grad_norm': 1.3722413173440258, 'learning_rate': 1.8101483633322255e-05, 'epoch': 0.28}
 29%|██▊ | 104/363 [04:05<20:07, 4.66s/it] {'loss': 0.12, 'grad_norm': 1.5055322976041905, 'learning_rate': 1.8044615081405153e-05, 'epoch': 0.29}
 29%|██▉ | 105/363 [04:07<16:39, 3.87s/it] {'loss': 0.1293, 'grad_norm': 1.9238100509571232, 'learning_rate': 1.7986999450618295e-05, 'epoch': 0.29}
 29%|██▉ | 106/363 [04:09<14:00, 3.27s/it] {'loss': 0.0945, 'grad_norm': 1.4169309751550185, 'learning_rate': 1.7928642091549616e-05, 'epoch': 0.29}
 29%|██▉ | 107/363 [04:11<12:12, 2.86s/it] {'loss': 0.1124, 'grad_norm': 1.7127369268478376, 'learning_rate': 1.7869548423669075e-05, 'epoch': 0.3}
 30%|██▉ | 108/363 [04:13<10:40, 2.51s/it] {'loss': 0.1182, 'grad_norm': 1.859903898855105, 'learning_rate': 1.7809723934825405e-05, 'epoch': 0.3}
 30%|███ | 109/363 [04:15<09:36, 2.27s/it] {'loss': 0.0857, 'grad_norm': 1.4775939025780847, 'learning_rate': 1.7749174180736443e-05, 'epoch': 0.3}
 30%|███ | 110/363 [04:16<09:01, 2.14s/it] {'loss': 0.0781, 'grad_norm': 1.4354799536836982, 'learning_rate': 1.768790478447319e-05, 'epoch': 0.3}
 31%|███ | 111/363 [04:18<08:43, 2.08s/it] {'loss': 0.0713, 'grad_norm': 1.0273549786544405, 'learning_rate': 1.762592143593764e-05, 'epoch': 0.31}
 31%|███ | 112/363 [04:20<08:20, 1.99s/it] {'loss': 0.1301, 'grad_norm': 1.9640486386494376, 'learning_rate': 1.756322989133434e-05, 'epoch': 0.31}
 31%|███ | 113/363 [04:22<08:03, 1.94s/it] {'loss': 0.0681, 'grad_norm': 1.7089277562540073, 'learning_rate': 1.749983597263586e-05, 'epoch': 0.31}
 31%|███▏ | 114/363 [04:24<07:54, 1.91s/it] {'loss': 0.1331, 'grad_norm': 1.9429869321530793, 'learning_rate': 1.7435745567042096e-05, 'epoch': 0.31}
 32%|███▏ | 115/363 [04:25<07:37, 1.84s/it] {'loss': 0.0604, 'grad_norm': 1.26536650090277, 'learning_rate': 1.737096462643357e-05, 'epoch': 0.32}
 32%|███▏ | 116/363 [04:27<07:34, 1.84s/it] {'loss': 0.1494, 'grad_norm': 2.087216494160791, 'learning_rate': 1.730549916681868e-05, 'epoch': 0.32}
 32%|███▏ | 117/363 [04:29<07:32, 1.84s/it] {'loss': 0.1092, 'grad_norm': 1.9975034441736954, 'learning_rate': 1.723935526777502e-05, 'epoch': 0.32}
 33%|███▎ | 118/363 [04:31<07:31, 1.84s/it] {'loss': 0.1082, 'grad_norm': 2.038235893575754, 'learning_rate': 1.717253907188477e-05, 'epoch': 0.33}
 33%|███▎ | 119/363 [04:33<07:24, 1.82s/it] {'loss': 0.1703, 'grad_norm': 3.3184113391653396, 'learning_rate': 1.7105056784164295e-05, 'epoch': 0.33}
 33%|███▎ | 120/363 [04:34<07:18, 1.81s/it] {'loss': 0.1303, 'grad_norm': 2.0270058507038966, 'learning_rate': 1.7036914671487854e-05, 'epoch': 0.33}
 33%|███▎ | 121/363 [04:36<07:11, 1.78s/it] {'loss': 0.1077, 'grad_norm': 1.8154593381383104, 'learning_rate': 1.6968119062005644e-05, 'epoch': 0.33}
 34%|███▎ | 122/363 [04:38<07:11, 1.79s/it] {'loss': 0.0924, 'grad_norm': 1.2350092975243878, 'learning_rate': 1.689867634455612e-05, 'epoch': 0.34}
 34%|███▍ | 123/363 [04:40<07:22, 1.84s/it] {'loss': 0.1221, 'grad_norm': 1.792900391384705, 'learning_rate': 1.682859296807268e-05, 'epoch': 0.34}
 34%|███▍ | 124/363 [04:42<07:29, 1.88s/it] {'loss': 0.1106, 'grad_norm': 1.9404357508554042, 'learning_rate': 1.675787544098477e-05, 'epoch': 0.34}
 34%|███▍ | 125/363 [04:44<07:50, 1.98s/it] {'loss': 0.1122, 'grad_norm': 1.7814508590384097, 'learning_rate': 1.6686530330613472e-05, 'epoch': 0.34}
 35%|███▍ | 126/363 [04:47<08:17, 2.10s/it] {'loss': 0.087, 'grad_norm': 1.53815702992395, 'learning_rate': 1.661456426256161e-05, 'epoch': 0.35}
 35%|███▍ | 127/363 [04:49<08:36, 2.19s/it] {'loss': 0.087, 'grad_norm': 1.9077349161199095, 'learning_rate': 1.6541983920098462e-05, 'epoch': 0.35}
 35%|███▌ | 128/363 [04:51<08:15, 2.11s/it] {'loss': 0.1254, 'grad_norm': 2.2996250559408704, 'learning_rate': 1.6468796043539082e-05, 'epoch': 0.35}
 36%|███▌ | 129/363 [04:52<07:41, 1.97s/it] {'loss': 0.0498, 'grad_norm': 1.2031295382564096, 'learning_rate': 1.639500742961838e-05, 'epoch': 0.36}
 36%|███▌ | 130/363 [04:54<07:14, 1.87s/it] {'loss': 0.077, 'grad_norm': 1.3436281870681093, 'learning_rate': 1.6320624930859905e-05, 'epoch': 0.36}
 36%|███▌ | 131/363 [04:56<07:11, 1.86s/it] {'loss': 0.0352, 'grad_norm': 0.6314939772663396, 'learning_rate': 1.6245655454939474e-05, 'epoch': 0.36}
 36%|███▋ | 132/363 [04:58<06:55, 1.80s/it] {'loss': 0.1228, 'grad_norm': 2.420910473317855, 'learning_rate': 1.6170105964043698e-05, 'epoch': 0.36}
 37%|███▋ | 133/363 [05:00<07:18, 1.91s/it] {'loss': 0.1799, 'grad_norm': 2.80050435155021, 'learning_rate': 1.6093983474223392e-05, 'epoch': 0.37}
 37%|███▋ | 134/363 [05:02<07:49, 2.05s/it] {'loss': 0.1088, 'grad_norm': 1.9733179937634817, 'learning_rate': 1.6017295054742045e-05, 'epoch': 0.37}
 37%|███▋ | 135/363 [05:05<08:27, 2.22s/it] {'loss': 0.0722, 'grad_norm': 0.8866446082715613, 'learning_rate': 1.5940047827419305e-05, 'epoch': 0.37}
 37%|███▋ | 136/363 [05:07<08:19, 2.20s/it] {'loss': 0.0902, 'grad_norm': 1.4747083716633578, 'learning_rate': 1.5862248965969604e-05, 'epoch': 0.38}
 38%|███▊ | 137/363 [05:09<08:08, 2.16s/it] {'loss': 0.0763, 'grad_norm': 1.203872871772734, 'learning_rate': 1.5783905695335947e-05, 'epoch': 0.38}
 38%|███▊ | 138/363 [05:11<07:47, 2.08s/it] {'loss': 0.0582, 'grad_norm': 1.295759494638433, 'learning_rate': 1.570502529101896e-05, 'epoch': 0.38}
 38%|███▊ | 139/363 [05:13<07:18, 1.96s/it] {'loss': 0.1131, 'grad_norm': 2.082719193654199, 'learning_rate': 1.5625615078401244e-05, 'epoch': 0.38}
 39%|███▊ | 140/363 [05:14<06:54, 1.86s/it] {'loss': 0.1319, 'grad_norm': 2.4303248947365046, 'learning_rate': 1.5545682432067068e-05, 'epoch': 0.39}
 39%|███▉ | 141/363 [05:16<06:55, 1.87s/it] {'loss': 0.1232, 'grad_norm': 2.308746027421277, 'learning_rate': 1.5465234775117538e-05, 'epoch': 0.39}
 39%|███▉ | 142/363 [05:18<06:54, 1.88s/it] {'loss': 0.1537, 'grad_norm': 2.9364629760972907, 'learning_rate': 1.5384279578481223e-05, 'epoch': 0.39}
 39%|███▉ | 143/363 [05:20<07:03, 1.92s/it] {'loss': 0.1161, 'grad_norm': 1.8350345374035169, 'learning_rate': 1.5302824360220352e-05, 'epoch': 0.39}
 40%|███▉ | 144/363 [05:22<06:52, 1.89s/it] {'loss': 0.1054, 'grad_norm': 1.8562459888809821, 'learning_rate': 1.522087668483264e-05, 'epoch': 0.4}
 40%|███▉ | 145/363 [05:24<06:55, 1.91s/it] {'loss': 0.072, 'grad_norm': 2.2050577332052317, 'learning_rate': 1.5138444162548791e-05, 'epoch': 0.4}
 40%|████ | 146/363 [05:26<06:59, 1.93s/it] {'loss': 0.1956, 'grad_norm': 2.576862971551694, 'learning_rate': 1.5055534448625766e-05, 'epoch': 0.4}
 40%|████ | 147/363 [05:28<06:50, 1.90s/it] {'loss': 0.1163, 'grad_norm': 2.526412596716041, 'learning_rate': 1.4972155242635853e-05, 'epoch': 0.41}
 41%|████ | 148/363 [05:30<07:39, 2.13s/it] {'loss': 0.1588, 'grad_norm': 2.4122326865926325, 'learning_rate': 1.488831428775164e-05, 'epoch': 0.41}
 41%|████ | 149/363 [05:32<07:17, 2.05s/it] {'loss': 0.0874, 'grad_norm': 1.5900334279122388, 'learning_rate': 1.4804019370026927e-05, 'epoch': 0.41}
 41%|████▏ | 150/363 [05:35<07:53, 2.22s/it] {'loss': 0.1225, 'grad_norm': 2.2583849148863284, 'learning_rate': 1.4719278317673655e-05, 'epoch': 0.41}
 42%|████▏ | 151/363 [05:37<07:38, 2.16s/it] {'loss': 0.0628, 'grad_norm': 1.5043786808378643, 'learning_rate': 1.4634099000334932e-05, 'epoch': 0.42}
 42%|████▏ | 152/363 [05:39<07:15, 2.06s/it] {'loss': 0.1119, 'grad_norm': 2.6867593671378707, 'learning_rate': 1.4548489328354197e-05, 'epoch': 0.42}
 42%|████▏ | 153/363 [05:41<07:26, 2.13s/it] {'loss': 0.1371, 'grad_norm': 1.892183158003583, 'learning_rate': 1.4462457252040606e-05, 'epoch': 0.42}
 42%|████▏ | 154/363 [05:43<07:12, 2.07s/it] {'loss': 0.0715, 'grad_norm': 0.7376897959603039, 'learning_rate': 1.437601076093073e-05, 'epoch': 0.42}
 43%|████▎ | 155/363 [05:45<07:02, 2.03s/it] {'loss': 0.0671, 'grad_norm': 1.2313754414547655, 'learning_rate': 1.4289157883046567e-05, 'epoch': 0.43}
 43%|████▎ | 156/363 [05:46<06:41, 1.94s/it] {'loss': 0.0937, 'grad_norm': 2.443959323743539, 'learning_rate': 1.420190668415002e-05, 'epoch': 0.43}
 43%|████▎ | 157/363 [05:49<07:02, 2.05s/it] {'loss': 0.1114, 'grad_norm': 1.8279711072151712, 'learning_rate': 1.4114265266993847e-05, 'epoch': 0.43}
 44%|████▎ | 158/363 [05:51<07:07, 2.09s/it] {'loss': 0.1388, 'grad_norm': 2.722814716999782, 'learning_rate': 1.4026241770569198e-05, 'epoch': 0.44}
 44%|████▍ | 159/363 [05:53<07:26, 2.19s/it] {'loss': 0.0862, 'grad_norm': 2.522323402694141, 'learning_rate': 1.3937844369349736e-05, 'epoch': 0.44}
 44%|████▍ | 160/363 [05:56<08:04, 2.39s/it] {'loss': 0.0752, 'grad_norm': 1.0652613518383625, 'learning_rate': 1.3849081272532545e-05, 'epoch': 0.44}
 44%|████▍ | 161/363 [05:59<08:02, 2.39s/it] {'loss': 0.1456, 'grad_norm': 2.6542750874407335, 'learning_rate': 1.375996072327573e-05, 'epoch': 0.44}
 45%|████▍ | 162/363 [06:01<07:37, 2.28s/it] {'loss': 0.109, 'grad_norm': 1.8544625800539327, 'learning_rate': 1.3670490997932922e-05, 'epoch': 0.45}
 45%|████▍ | 163/363 [06:03<07:13, 2.17s/it] {'loss': 0.1144, 'grad_norm': 1.9735546120483556, 'learning_rate': 1.3580680405284666e-05, 'epoch': 0.45}
 45%|████▌ | 164/363 [06:04<06:54, 2.08s/it] {'loss': 0.0838, 'grad_norm': 1.2177687301571833, 'learning_rate': 1.3490537285766809e-05, 'epoch': 0.45}
 45%|████▌ | 165/363 [06:06<06:45, 2.05s/it] {'loss': 0.1309, 'grad_norm': 2.866049013585789, 'learning_rate': 1.3400070010695966e-05, 'epoch': 0.46}
 46%|████▌ | 166/363 [06:09<06:48, 2.07s/it] {'loss': 0.1336, 'grad_norm': 1.7714947880406966, 'learning_rate': 1.3309286981492084e-05, 'epoch': 0.46}
 46%|████▌ | 167/363 [06:11<06:41, 2.05s/it] {'loss': 0.0878, 'grad_norm': 2.3528937198669966, 'learning_rate': 1.3218196628898232e-05, 'epoch': 0.46}
 46%|████▋ | 168/363 [06:13<06:39, 2.05s/it] {'loss': 0.0866, 'grad_norm': 1.32763797639713, 'learning_rate': 1.3126807412197666e-05, 'epoch': 0.46}
 47%|████▋ | 169/363 [06:15<06:43, 2.08s/it] {'loss': 0.0863, 'grad_norm': 1.0489060017302254, 'learning_rate': 1.3035127818428239e-05, 'epoch': 0.47}
 47%|████▋ | 170/363 [06:17<06:57, 2.16s/it] {'loss': 0.1393, 'grad_norm': 1.4769228384530726, 'learning_rate': 1.2943166361594242e-05, 'epoch': 0.47}
 47%|████▋ | 171/363 [06:19<06:38, 2.08s/it] {'loss': 0.1762, 'grad_norm': 1.9367121483246261, 'learning_rate': 1.2850931581875723e-05, 'epoch': 0.47}
 47%|████▋ | 172/363 [06:21<06:21, 2.00s/it] {'loss': 0.0599, 'grad_norm': 1.5440366202640854, 'learning_rate': 1.275843204483539e-05, 'epoch': 0.47}
 48%|████▊ | 173/363 [06:23<06:14, 1.97s/it] {'loss': 0.0938, 'grad_norm': 1.691410284752133, 'learning_rate': 1.2665676340623172e-05, 'epoch': 0.48}
 48%|████▊ | 174/363 [06:25<06:04, 1.93s/it] {'loss': 0.098, 'grad_norm': 1.2010801724859, 'learning_rate': 1.2572673083178448e-05, 'epoch': 0.48}
 48%|████▊ | 175/363 [06:27<06:09, 1.97s/it] {'loss': 0.0936, 'grad_norm': 2.223070626201405, 'learning_rate': 1.2479430909430109e-05, 'epoch': 0.48}
 48%|████▊ | 176/363 [06:29<06:11, 1.99s/it] {'loss': 0.1279, 'grad_norm': 2.1176352077764107, 'learning_rate': 1.2385958478494487e-05, 'epoch': 0.49}
 49%|████▉ | 177/363 [06:31<06:15, 2.02s/it] {'loss': 0.0783, 'grad_norm': 1.3628256498935367, 'learning_rate': 1.2292264470871183e-05, 'epoch': 0.49}
 49%|████▉ | 178/363 [06:33<06:23, 2.07s/it] {'loss': 0.0525, 'grad_norm': 1.064266823164612, 'learning_rate': 1.2198357587636958e-05, 'epoch': 0.49}
 49%|████▉ | 179/363 [06:35<06:37, 2.16s/it] {'loss': 0.0978, 'grad_norm': 2.00945904980985, 'learning_rate': 1.2104246549637683e-05, 'epoch': 0.49}
 50%|████▉ | 180/363 [06:37<06:31, 2.14s/it] {'loss': 0.0963, 'grad_norm': 1.3141391526330848, 'learning_rate': 1.2009940096678451e-05, 'epoch': 0.5}
 50%|████▉ | 181/363 [06:39<06:23, 2.11s/it] {'loss': 0.1092, 'grad_norm': 1.6531005740449383, 'learning_rate': 1.1915446986711953e-05, 'epoch': 0.5}
 50%|█████ | 182/363 [06:42<06:21, 2.11s/it] {'loss': 0.0667, 'grad_norm': 1.4945974838001637, 'learning_rate': 1.1820775995025147e-05, 'epoch': 0.5}
 50%|█████ | 183/363 [06:44<06:14, 2.08s/it] {'loss': 0.0882, 'grad_norm': 1.3762608443334923, 
'learning_rate': 1.172593591342432e-05, 'epoch': 0.5} 50%|█████ | 183/363 [06:44<06:14, 2.08s/it] 51%|█████ | 184/363 [06:45<06:06, 2.05s/it] {'loss': 0.0865, 'grad_norm': 1.5415877422060749, 'learning_rate': 1.1630935549418627e-05, 'epoch': 0.51} 51%|█████ | 184/363 [06:46<06:06, 2.05s/it] 51%|█████ | 185/363 [06:48<06:22, 2.15s/it] {'loss': 0.1353, 'grad_norm': 2.0042018778537622, 'learning_rate': 1.1535783725402163e-05, 'epoch': 0.51} 51%|█████ | 185/363 [06:48<06:22, 2.15s/it] 51%|█████ | 186/363 [06:50<06:16, 2.13s/it] {'loss': 0.1393, 'grad_norm': 2.6818407651740266, 'learning_rate': 1.1440489277834645e-05, 'epoch': 0.51} 51%|█████ | 186/363 [06:50<06:16, 2.13s/it] 52%|█████▏ | 187/363 [06:52<06:19, 2.16s/it] {'loss': 0.1169, 'grad_norm': 2.3807831512162374, 'learning_rate': 1.134506105642081e-05, 'epoch': 0.52} 52%|█████▏ | 187/363 [06:52<06:19, 2.16s/it] 52%|█████▏ | 188/363 [06:55<06:56, 2.38s/it] {'loss': 0.1241, 'grad_norm': 2.202610904326787, 'learning_rate': 1.1249507923288563e-05, 'epoch': 0.52} 52%|█████▏ | 188/363 [06:55<06:56, 2.38s/it] 52%|█████▏ | 189/363 [06:58<07:16, 2.51s/it] {'loss': 0.0945, 'grad_norm': 1.8110167173727552, 'learning_rate': 1.115383875216598e-05, 'epoch': 0.52} 52%|█████▏ | 189/363 [06:58<07:16, 2.51s/it] 52%|█████▏ | 190/363 [07:00<07:07, 2.47s/it] {'loss': 0.0791, 'grad_norm': 1.2395129394737805, 'learning_rate': 1.105806242755723e-05, 'epoch': 0.52} 52%|█████▏ | 190/363 [07:00<07:07, 2.47s/it] 53%|█████▎ | 191/363 [07:03<07:22, 2.57s/it] {'loss': 0.0995, 'grad_norm': 2.269164321603063, 'learning_rate': 1.0962187843917498e-05, 'epoch': 0.53} 53%|█████▎ | 191/363 [07:03<07:22, 2.57s/it] 53%|█████▎ | 192/363 [07:05<06:53, 2.42s/it] {'loss': 0.0978, 'grad_norm': 2.1091615955126217, 'learning_rate': 1.0866223904826992e-05, 'epoch': 0.53} 53%|█████▎ | 192/363 [07:05<06:53, 2.42s/it] 53%|█████▎ | 193/363 [07:08<06:49, 2.41s/it] {'loss': 0.0654, 'grad_norm': 1.1661181979597077, 'learning_rate': 1.0770179522164079e-05, 'epoch': 
0.53} 53%|█████▎ | 193/363 [07:08<06:49, 2.41s/it] 53%|█████▎ | 194/363 [07:10<06:27, 2.29s/it] {'loss': 0.0795, 'grad_norm': 1.529220319087916, 'learning_rate': 1.0674063615277681e-05, 'epoch': 0.54} 53%|█████▎ | 194/363 [07:10<06:27, 2.29s/it] 54%|█████▎ | 195/363 [07:12<06:26, 2.30s/it] {'loss': 0.0722, 'grad_norm': 1.61462843303553, 'learning_rate': 1.0577885110158959e-05, 'epoch': 0.54} 54%|█████▎ | 195/363 [07:12<06:26, 2.30s/it] 54%|█████▍ | 196/363 [07:14<06:19, 2.27s/it] {'loss': 0.1059, 'grad_norm': 1.0229094996985268, 'learning_rate': 1.0481652938612374e-05, 'epoch': 0.54} 54%|█████▍ | 196/363 [07:14<06:19, 2.27s/it] 54%|█████▍ | 197/363 [07:16<06:04, 2.19s/it] {'loss': 0.1007, 'grad_norm': 1.8905308810375994, 'learning_rate': 1.0385376037426227e-05, 'epoch': 0.54} 54%|█████▍ | 197/363 [07:16<06:04, 2.19s/it] 55%|█████▍ | 198/363 [07:18<05:46, 2.10s/it] {'loss': 0.0997, 'grad_norm': 1.4659160591839386, 'learning_rate': 1.0289063347542727e-05, 'epoch': 0.55} 55%|█████▍ | 198/363 [07:18<05:46, 2.10s/it] 55%|█████▍ | 199/363 [07:20<05:51, 2.15s/it] {'loss': 0.1803, 'grad_norm': 3.3227958982974637, 'learning_rate': 1.0192723813227672e-05, 'epoch': 0.55} 55%|█████▍ | 199/363 [07:20<05:51, 2.15s/it] 55%|█████▌ | 200/363 [07:22<05:48, 2.14s/it] {'loss': 0.1275, 'grad_norm': 1.963767479554977, 'learning_rate': 1.0096366381239808e-05, 'epoch': 0.55} 55%|█████▌ | 200/363 [07:22<05:48, 2.14s/it][INFO|trainer.py:4289] 2026-01-30 12:23:56,365 >> Saving model checkpoint to /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200 [INFO|configuration_utils.py:491] 2026-01-30 12:23:56,370 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/config.json [INFO|configuration_utils.py:826] 2026-01-30 12:23:56,370 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/generation_config.json [INFO|modeling_utils.py:4305] 2026-01-30 12:24:12,335 
>> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2394] 2026-01-30 12:24:12,337 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/chat_template.jinja
[INFO|tokenization_utils_base.py:2563] 2026-01-30 12:24:12,340 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2572] 2026-01-30 12:24:12,341 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2623] 2026-01-30 12:24:12,341 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/added_tokens.json
[INFO|image_processing_base.py:253] 2026-01-30 12:24:12,986 >> Image processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/preprocessor_config.json
[INFO|tokenization_utils_base.py:2394] 2026-01-30 12:24:12,987 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/chat_template.jinja
[INFO|tokenization_utils_base.py:2563] 2026-01-30 12:24:12,987 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2572] 2026-01-30 12:24:12,987 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2623] 2026-01-30 12:24:12,988 >> added tokens file saved in
/mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/added_tokens.json [INFO|video_processing_utils.py:610] 2026-01-30 12:24:13,181 >> Video processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/video_preprocessor_config.json [INFO|processing_utils.py:752] 2026-01-30 12:24:13,181 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-200/chat_template.jinja 55%|█████▌ | 201/363 [07:50<26:11, 9.70s/it] {'loss': 0.0784, 'grad_norm': 1.6295769776645304, 'learning_rate': 1e-05, 'epoch': 0.55} 55%|█████▌ | 201/363 [07:50<26:11, 9.70s/it] 56%|█████▌ | 202/363 [07:52<19:55, 7.43s/it] {'loss': 0.0884, 'grad_norm': 2.440021613726331, 'learning_rate': 9.903633618760195e-06, 'epoch': 0.56} 56%|█████▌ | 202/363 [07:52<19:55, 7.43s/it] 56%|█████▌ | 203/363 [07:54<15:43, 5.90s/it] {'loss': 0.0852, 'grad_norm': 1.4242019780978539, 'learning_rate': 9.807276186772335e-06, 'epoch': 0.56} 56%|█████▌ | 203/363 [07:54<15:43, 5.90s/it] 56%|█████▌ | 204/363 [07:56<12:39, 4.78s/it] {'loss': 0.0528, 'grad_norm': 0.9963914089293635, 'learning_rate': 9.710936652457276e-06, 'epoch': 0.56} 56%|█████▌ | 204/363 [07:56<12:39, 4.78s/it] 56%|█████▋ | 205/363 [07:59<10:33, 4.01s/it] {'loss': 0.1213, 'grad_norm': 2.0171601668683863, 'learning_rate': 9.614623962573776e-06, 'epoch': 0.57} 56%|█████▋ | 205/363 [07:59<10:33, 4.01s/it] 57%|█████▋ | 206/363 [08:00<08:49, 3.37s/it] {'loss': 0.0813, 'grad_norm': 1.8302600165169711, 'learning_rate': 9.518347061387629e-06, 'epoch': 0.57} 57%|█████▋ | 206/363 [08:00<08:49, 3.37s/it] 57%|█████▋ | 207/363 [08:02<07:45, 2.99s/it] {'loss': 0.1114, 'grad_norm': 2.135574797850369, 'learning_rate': 9.422114889841045e-06, 'epoch': 0.57} 57%|█████▋ | 207/363 [08:02<07:45, 2.99s/it] 57%|█████▋ | 208/363 [08:05<07:11, 2.78s/it] {'loss': 0.1072, 'grad_norm': 1.856147430619363, 'learning_rate': 9.325936384722322e-06, 'epoch': 0.57} 57%|█████▋ | 
208/363 [08:05<07:11, 2.78s/it] 58%|█████▊ | 209/363 [08:07<06:36, 2.57s/it] {'loss': 0.0629, 'grad_norm': 1.3734175489711284, 'learning_rate': 9.229820477835926e-06, 'epoch': 0.58} 58%|█████▊ | 209/363 [08:07<06:36, 2.57s/it] 58%|█████▊ | 210/363 [08:09<06:19, 2.48s/it] {'loss': 0.0798, 'grad_norm': 1.4955582653472665, 'learning_rate': 9.133776095173015e-06, 'epoch': 0.58} 58%|█████▊ | 210/363 [08:09<06:19, 2.48s/it] 58%|█████▊ | 211/363 [08:11<06:01, 2.38s/it] {'loss': 0.0832, 'grad_norm': 2.2098641671152075, 'learning_rate': 9.037812156082503e-06, 'epoch': 0.58} 58%|█████▊ | 211/363 [08:11<06:01, 2.38s/it] 58%|█████▊ | 212/363 [08:14<05:55, 2.36s/it] {'loss': 0.0912, 'grad_norm': 2.1234204791453446, 'learning_rate': 8.941937572442773e-06, 'epoch': 0.58} 58%|█████▊ | 212/363 [08:14<05:55, 2.36s/it] 59%|█████▊ | 213/363 [08:16<05:51, 2.34s/it] {'loss': 0.0757, 'grad_norm': 1.7048969824409477, 'learning_rate': 8.846161247834024e-06, 'epoch': 0.59} 59%|█████▊ | 213/363 [08:16<05:51, 2.34s/it] 59%|█████▉ | 214/363 [08:18<05:40, 2.29s/it] {'loss': 0.0607, 'grad_norm': 1.2386625705533991, 'learning_rate': 8.750492076711439e-06, 'epoch': 0.59} 59%|█████▉ | 214/363 [08:18<05:40, 2.29s/it] 59%|█████▉ | 215/363 [08:20<05:29, 2.23s/it] {'loss': 0.1315, 'grad_norm': 2.4577783979073518, 'learning_rate': 8.654938943579194e-06, 'epoch': 0.59} 59%|█████▉ | 215/363 [08:20<05:29, 2.23s/it] 60%|█████▉ | 216/363 [08:22<05:17, 2.16s/it] {'loss': 0.1015, 'grad_norm': 1.7822551401830198, 'learning_rate': 8.55951072216536e-06, 'epoch': 0.6} 60%|█████▉ | 216/363 [08:22<05:17, 2.16s/it] 60%|█████▉ | 217/363 [08:24<04:53, 2.01s/it] {'loss': 0.083, 'grad_norm': 1.627034418887004, 'learning_rate': 8.464216274597839e-06, 'epoch': 0.6} 60%|█████▉ | 217/363 [08:24<04:53, 2.01s/it] 60%|██████ | 218/363 [08:26<04:39, 1.93s/it] {'loss': 0.144, 'grad_norm': 2.4236805021780907, 'learning_rate': 8.369064450581374e-06, 'epoch': 0.6} 60%|██████ | 218/363 [08:26<04:39, 1.93s/it] 60%|██████ | 219/363 
[08:27<04:26, 1.85s/it] {'loss': 0.1187, 'grad_norm': 2.0852606023844547, 'learning_rate': 8.274064086575682e-06, 'epoch': 0.6} 60%|██████ | 219/363 [08:27<04:26, 1.85s/it] 61%|██████ | 220/363 [08:29<04:16, 1.79s/it] {'loss': 0.101, 'grad_norm': 1.554124619394613, 'learning_rate': 8.179224004974857e-06, 'epoch': 0.61} 61%|██████ | 220/363 [08:29<04:16, 1.79s/it] 61%|██████ | 221/363 [08:31<04:38, 1.96s/it] {'loss': 0.0562, 'grad_norm': 1.0086858963161975, 'learning_rate': 8.084553013288048e-06, 'epoch': 0.61} 61%|██████ | 221/363 [08:31<04:38, 1.96s/it] 61%|██████ | 222/363 [08:33<04:37, 1.97s/it] {'loss': 0.1144, 'grad_norm': 1.744957411692858, 'learning_rate': 7.990059903321554e-06, 'epoch': 0.61} 61%|██████ | 222/363 [08:33<04:37, 1.97s/it] 61%|██████▏ | 223/363 [08:35<04:40, 2.00s/it] {'loss': 0.0431, 'grad_norm': 1.115741854823727, 'learning_rate': 7.89575345036232e-06, 'epoch': 0.62} 61%|██████▏ | 223/363 [08:35<04:40, 2.00s/it] 62%|██████▏ | 224/363 [08:37<04:33, 1.97s/it] {'loss': 0.1239, 'grad_norm': 2.4048467960954523, 'learning_rate': 7.801642412363042e-06, 'epoch': 0.62} 62%|██████▏ | 224/363 [08:37<04:33, 1.97s/it] 62%|██████▏ | 225/363 [08:39<04:28, 1.95s/it] {'loss': 0.0861, 'grad_norm': 1.7045994682708523, 'learning_rate': 7.707735529128819e-06, 'epoch': 0.62} 62%|██████▏ | 225/363 [08:39<04:28, 1.95s/it] 62%|██████▏ | 226/363 [08:41<04:23, 1.92s/it] {'loss': 0.0939, 'grad_norm': 1.5844433019812807, 'learning_rate': 7.614041521505517e-06, 'epoch': 0.62} 62%|██████▏ | 226/363 [08:41<04:23, 1.92s/it] 63%|██████▎ | 227/363 [08:43<04:16, 1.89s/it] {'loss': 0.0961, 'grad_norm': 1.817274462365455, 'learning_rate': 7.520569090569894e-06, 'epoch': 0.63} 63%|██████▎ | 227/363 [08:43<04:16, 1.89s/it] 63%|██████▎ | 228/363 [08:45<04:27, 1.98s/it] {'loss': 0.1141, 'grad_norm': 2.169489761841419, 'learning_rate': 7.427326916821557e-06, 'epoch': 0.63} 63%|██████▎ | 228/363 [08:45<04:27, 1.98s/it] 63%|██████▎ | 229/363 [08:47<04:18, 1.93s/it] {'loss': 0.0631, 
'grad_norm': 1.2611061121212517, 'learning_rate': 7.3343236593768295e-06, 'epoch': 0.63} 63%|██████▎ | 229/363 [08:47<04:18, 1.93s/it] 63%|██████▎ | 230/363 [08:49<04:11, 1.89s/it] {'loss': 0.0607, 'grad_norm': 1.4981316942094398, 'learning_rate': 7.24156795516461e-06, 'epoch': 0.63} 63%|██████▎ | 230/363 [08:49<04:11, 1.89s/it] 64%|██████▎ | 231/363 [08:51<04:28, 2.04s/it] {'loss': 0.1243, 'grad_norm': 1.889688470210782, 'learning_rate': 7.149068418124281e-06, 'epoch': 0.64} 64%|██████▎ | 231/363 [08:51<04:28, 2.04s/it] 64%|██████▍ | 232/363 [08:53<04:15, 1.95s/it] {'loss': 0.1154, 'grad_norm': 1.6091621479461797, 'learning_rate': 7.056833638405762e-06, 'epoch': 0.64} 64%|██████▍ | 232/363 [08:53<04:15, 1.95s/it] 64%|██████▍ | 233/363 [08:55<04:11, 1.93s/it] {'loss': 0.0782, 'grad_norm': 1.7584988460897566, 'learning_rate': 6.964872181571765e-06, 'epoch': 0.64} 64%|██████▍ | 233/363 [08:55<04:11, 1.93s/it] 64%|██████▍ | 234/363 [08:56<03:59, 1.85s/it] {'loss': 0.1088, 'grad_norm': 1.9608264553670172, 'learning_rate': 6.87319258780234e-06, 'epoch': 0.65} 64%|██████▍ | 234/363 [08:56<03:59, 1.85s/it] 65%|██████▍ | 235/363 [08:58<03:59, 1.87s/it] {'loss': 0.0635, 'grad_norm': 1.1583598512074385, 'learning_rate': 6.781803371101774e-06, 'epoch': 0.65} 65%|██████▍ | 235/363 [08:58<03:59, 1.87s/it] 65%|██████▌ | 236/363 [09:00<03:57, 1.87s/it] {'loss': 0.065, 'grad_norm': 1.5744270190667782, 'learning_rate': 6.690713018507917e-06, 'epoch': 0.65} 65%|██████▌ | 236/363 [09:00<03:57, 1.87s/it] 65%|██████▌ | 237/363 [09:02<04:05, 1.95s/it] {'loss': 0.0941, 'grad_norm': 2.0489496115781147, 'learning_rate': 6.599929989304034e-06, 'epoch': 0.65} 65%|██████▌ | 237/363 [09:02<04:05, 1.95s/it] 66%|██████▌ | 238/363 [09:04<04:00, 1.92s/it] {'loss': 0.1049, 'grad_norm': 1.3833240601648478, 'learning_rate': 6.509462714233194e-06, 'epoch': 0.66} 66%|██████▌ | 238/363 [09:04<04:00, 1.92s/it] 66%|██████▌ | 239/363 [09:06<03:49, 1.85s/it] {'loss': 0.0795, 'grad_norm': 1.105761243006651, 
'learning_rate': 6.419319594715338e-06, 'epoch': 0.66} 66%|██████▌ | 239/363 [09:06<03:49, 1.85s/it] 66%|██████▌ | 240/363 [09:08<03:50, 1.88s/it] {'loss': 0.1389, 'grad_norm': 2.2243011538195323, 'learning_rate': 6.32950900206708e-06, 'epoch': 0.66} 66%|██████▌ | 240/363 [09:08<03:50, 1.88s/it] 66%|██████▋ | 241/363 [09:10<03:59, 1.97s/it] {'loss': 0.0904, 'grad_norm': 1.6204675243320001, 'learning_rate': 6.240039276724273e-06, 'epoch': 0.66} 66%|██████▋ | 241/363 [09:10<03:59, 1.97s/it] 67%|██████▋ | 242/363 [09:12<03:52, 1.92s/it] {'loss': 0.0988, 'grad_norm': 1.7416445698806022, 'learning_rate': 6.150918727467455e-06, 'epoch': 0.67} 67%|██████▋ | 242/363 [09:12<03:52, 1.92s/it] 67%|██████▋ | 243/363 [09:13<03:47, 1.90s/it] {'loss': 0.0887, 'grad_norm': 1.5865476897173794, 'learning_rate': 6.062155630650265e-06, 'epoch': 0.67} 67%|██████▋ | 243/363 [09:13<03:47, 1.90s/it] 67%|██████▋ | 244/363 [09:16<03:50, 1.94s/it] {'loss': 0.0406, 'grad_norm': 1.0908181878235528, 'learning_rate': 5.973758229430806e-06, 'epoch': 0.67} 67%|██████▋ | 244/363 [09:16<03:50, 1.94s/it] 67%|██████▋ | 245/363 [09:18<03:50, 1.95s/it] {'loss': 0.0613, 'grad_norm': 1.2409267373149415, 'learning_rate': 5.8857347330061545e-06, 'epoch': 0.68} 67%|██████▋ | 245/363 [09:18<03:50, 1.95s/it] 68%|██████▊ | 246/363 [09:19<03:44, 1.92s/it] {'loss': 0.0621, 'grad_norm': 1.5366160191792417, 'learning_rate': 5.798093315849984e-06, 'epoch': 0.68} 68%|██████▊ | 246/363 [09:19<03:44, 1.92s/it] 68%|██████▊ | 247/363 [09:21<03:42, 1.92s/it] {'loss': 0.135, 'grad_norm': 2.401971583677582, 'learning_rate': 5.7108421169534376e-06, 'epoch': 0.68} 68%|██████▊ | 247/363 [09:21<03:42, 1.92s/it] 68%|██████▊ | 248/363 [09:23<03:40, 1.92s/it] {'loss': 0.0912, 'grad_norm': 1.7382959732076737, 'learning_rate': 5.623989239069275e-06, 'epoch': 0.68} 68%|██████▊ | 248/363 [09:23<03:40, 1.92s/it] 69%|██████▊ | 249/363 [09:25<03:41, 1.94s/it] {'loss': 0.0481, 'grad_norm': 1.1164138593496515, 'learning_rate': 
5.5375427479593945e-06, 'epoch': 0.69} 69%|██████▊ | 249/363 [09:25<03:41, 1.94s/it] 69%|██████▉ | 250/363 [09:27<03:39, 1.94s/it] {'loss': 0.0915, 'grad_norm': 1.9207044725285578, 'learning_rate': 5.451510671645806e-06, 'epoch': 0.69} 69%|██████▉ | 250/363 [09:27<03:39, 1.94s/it] 69%|██████▉ | 251/363 [09:29<03:34, 1.92s/it] {'loss': 0.0932, 'grad_norm': 1.4554330468757273, 'learning_rate': 5.3659009996650704e-06, 'epoch': 0.69} 69%|██████▉ | 251/363 [09:29<03:34, 1.92s/it] 69%|██████▉ | 252/363 [09:31<03:30, 1.90s/it] {'loss': 0.1258, 'grad_norm': 1.5637751353358034, 'learning_rate': 5.280721682326349e-06, 'epoch': 0.7} 69%|██████▉ | 252/363 [09:31<03:30, 1.90s/it] 70%|██████▉ | 253/363 [09:33<03:24, 1.86s/it] {'loss': 0.136, 'grad_norm': 1.6864022667103549, 'learning_rate': 5.195980629973077e-06, 'epoch': 0.7} 70%|██████▉ | 253/363 [09:33<03:24, 1.86s/it] 70%|██████▉ | 254/363 [09:35<03:38, 2.00s/it] {'loss': 0.0744, 'grad_norm': 1.5424417657042435, 'learning_rate': 5.111685712248364e-06, 'epoch': 0.7} 70%|██████▉ | 254/363 [09:35<03:38, 2.00s/it] 70%|███████ | 255/363 [09:37<03:32, 1.97s/it] {'loss': 0.1436, 'grad_norm': 2.4152541945255965, 'learning_rate': 5.02784475736415e-06, 'epoch': 0.7} 70%|███████ | 255/363 [09:37<03:32, 1.97s/it] 71%|███████ | 256/363 [09:39<03:25, 1.92s/it] {'loss': 0.0864, 'grad_norm': 1.8800510082119968, 'learning_rate': 4.944465551374238e-06, 'epoch': 0.71} 71%|███████ | 256/363 [09:39<03:25, 1.92s/it] 71%|███████ | 257/363 [09:41<03:23, 1.92s/it] {'loss': 0.0931, 'grad_norm': 1.6391965299287088, 'learning_rate': 4.861555837451213e-06, 'epoch': 0.71} 71%|███████ | 257/363 [09:41<03:23, 1.92s/it] 71%|███████ | 258/363 [09:42<03:20, 1.91s/it] {'loss': 0.1589, 'grad_norm': 2.1164230178199688, 'learning_rate': 4.779123315167362e-06, 'epoch': 0.71} 71%|███████ | 258/363 [09:42<03:20, 1.91s/it] 71%|███████▏ | 259/363 [09:44<03:15, 1.88s/it] {'loss': 0.075, 'grad_norm': 2.1736287568680615, 'learning_rate': 4.6971756397796506e-06, 'epoch': 
0.71} 71%|███████▏ | 259/363 [09:44<03:15, 1.88s/it] 72%|███████▏ | 260/363 [09:46<03:13, 1.88s/it] {'loss': 0.0423, 'grad_norm': 0.981201558647046, 'learning_rate': 4.61572042151878e-06, 'epoch': 0.72} 72%|███████▏ | 260/363 [09:46<03:13, 1.88s/it] 72%|███████▏ | 261/363 [09:48<03:10, 1.87s/it] {'loss': 0.0538, 'grad_norm': 1.2068505093305162, 'learning_rate': 4.534765224882463e-06, 'epoch': 0.72} 72%|███████▏ | 261/363 [09:48<03:10, 1.87s/it] 72%|███████▏ | 262/363 [09:50<03:08, 1.87s/it] {'loss': 0.0591, 'grad_norm': 0.9541230040360105, 'learning_rate': 4.4543175679329345e-06, 'epoch': 0.72} 72%|███████▏ | 262/363 [09:50<03:08, 1.87s/it] 72%|███████▏ | 263/363 [09:52<03:05, 1.86s/it] {'loss': 0.0965, 'grad_norm': 1.7162320823485342, 'learning_rate': 4.37438492159876e-06, 'epoch': 0.73} 72%|███████▏ | 263/363 [09:52<03:05, 1.86s/it] 73%|███████▎ | 264/363 [09:54<03:03, 1.86s/it] {'loss': 0.0383, 'grad_norm': 1.184768855122373, 'learning_rate': 4.294974708981041e-06, 'epoch': 0.73} 73%|███████▎ | 264/363 [09:54<03:03, 1.86s/it] 73%|███████▎ | 265/363 [09:56<03:14, 1.99s/it] {'loss': 0.0952, 'grad_norm': 1.5098290827125584, 'learning_rate': 4.216094304664056e-06, 'epoch': 0.73} 73%|███████▎ | 265/363 [09:56<03:14, 1.99s/it] 73%|███████▎ | 266/363 [09:58<03:08, 1.94s/it] {'loss': 0.0687, 'grad_norm': 1.3641276614652302, 'learning_rate': 4.1377510340304e-06, 'epoch': 0.73} 73%|███████▎ | 266/363 [09:58<03:08, 1.94s/it] 74%|███████▎ | 267/363 [09:59<03:03, 1.91s/it] {'loss': 0.1397, 'grad_norm': 1.9886328182742048, 'learning_rate': 4.059952172580694e-06, 'epoch': 0.74} 74%|███████▎ | 267/363 [09:59<03:03, 1.91s/it] 74%|███████▍ | 268/363 [10:01<02:57, 1.87s/it] {'loss': 0.1271, 'grad_norm': 2.0623334755221494, 'learning_rate': 3.982704945257957e-06, 'epoch': 0.74} 74%|███████▍ | 268/363 [10:01<02:57, 1.87s/it] 74%|███████▍ | 269/363 [10:03<02:52, 1.84s/it] {'loss': 0.0736, 'grad_norm': 2.027142606530138, 'learning_rate': 3.9060165257766116e-06, 'epoch': 0.74} 
74%|███████▍ | 269/363 [10:03<02:52, 1.84s/it] 74%|███████▍ | 270/363 [10:05<02:46, 1.79s/it] {'loss': 0.1243, 'grad_norm': 1.8884432574197467, 'learning_rate': 3.829894035956306e-06, 'epoch': 0.74} 74%|███████▍ | 270/363 [10:05<02:46, 1.79s/it] 75%|███████▍ | 271/363 [10:06<02:43, 1.78s/it] {'loss': 0.0971, 'grad_norm': 1.5259208901463874, 'learning_rate': 3.754344545060529e-06, 'epoch': 0.75} 75%|███████▍ | 271/363 [10:06<02:43, 1.78s/it] 75%|███████▍ | 272/363 [10:08<02:44, 1.81s/it] {'loss': 0.1226, 'grad_norm': 1.454648528529707, 'learning_rate': 3.6793750691400996e-06, 'epoch': 0.75} 75%|███████▍ | 272/363 [10:08<02:44, 1.81s/it] 75%|███████▌ | 273/363 [10:10<02:44, 1.82s/it] {'loss': 0.0573, 'grad_norm': 1.1852288533881499, 'learning_rate': 3.604992570381621e-06, 'epoch': 0.75} 75%|███████▌ | 273/363 [10:10<02:44, 1.82s/it] 75%|███████▌ | 274/363 [10:12<02:49, 1.91s/it] {'loss': 0.0357, 'grad_norm': 0.9888790128432269, 'learning_rate': 3.5312039564609203e-06, 'epoch': 0.76} 75%|███████▌ | 274/363 [10:12<02:49, 1.91s/it] 76%|███████▌ | 275/363 [10:14<02:50, 1.93s/it] {'loss': 0.0841, 'grad_norm': 1.9227683916123242, 'learning_rate': 3.458016079901544e-06, 'epoch': 0.76} 76%|███████▌ | 275/363 [10:14<02:50, 1.93s/it] 76%|███████▌ | 276/363 [10:16<02:42, 1.87s/it] {'loss': 0.0749, 'grad_norm': 1.6925865088399465, 'learning_rate': 3.3854357374383905e-06, 'epoch': 0.76} 76%|███████▌ | 276/363 [10:16<02:42, 1.87s/it] 76%|███████▋ | 277/363 [10:18<02:42, 1.89s/it] {'loss': 0.1234, 'grad_norm': 2.482457504595889, 'learning_rate': 3.313469669386532e-06, 'epoch': 0.76} 76%|███████▋ | 277/363 [10:18<02:42, 1.89s/it] 77%|███████▋ | 278/363 [10:20<02:37, 1.85s/it] {'loss': 0.0752, 'grad_norm': 1.2830007481095762, 'learning_rate': 3.242124559015234e-06, 'epoch': 0.77} 77%|███████▋ | 278/363 [10:20<02:37, 1.85s/it] 77%|███████▋ | 279/363 [10:22<02:35, 1.85s/it] {'loss': 0.0991, 'grad_norm': 2.0768576257082114, 'learning_rate': 3.171407031927325e-06, 'epoch': 0.77} 
77%|███████▋ | 279/363 [10:22<02:35, 1.85s/it] 77%|███████▋ | 280/363 [10:24<02:48, 2.03s/it] {'loss': 0.109, 'grad_norm': 1.8987466668093202, 'learning_rate': 3.101323655443882e-06, 'epoch': 0.77} 77%|███████▋ | 280/363 [10:24<02:48, 2.03s/it] 77%|███████▋ | 281/363 [10:26<02:41, 1.97s/it] {'loss': 0.0599, 'grad_norm': 1.6001709673447158, 'learning_rate': 3.0318809379943594e-06, 'epoch': 0.78} 77%|███████▋ | 281/363 [10:26<02:41, 1.97s/it] 78%|███████▊ | 282/363 [10:28<02:37, 1.95s/it] {'loss': 0.0641, 'grad_norm': 1.309542383665605, 'learning_rate': 2.9630853285121506e-06, 'epoch': 0.78} 78%|███████▊ | 282/363 [10:28<02:37, 1.95s/it] 78%|███████▊ | 283/363 [10:30<02:35, 1.95s/it] {'loss': 0.0909, 'grad_norm': 1.5565630583605414, 'learning_rate': 2.8949432158357083e-06, 'epoch': 0.78} 78%|███████▊ | 283/363 [10:30<02:35, 1.95s/it] 78%|███████▊ | 284/363 [10:31<02:30, 1.90s/it] {'loss': 0.1223, 'grad_norm': 1.2776706094735155, 'learning_rate': 2.8274609281152322e-06, 'epoch': 0.78} 78%|███████▊ | 284/363 [10:32<02:30, 1.90s/it] 79%|███████▊ | 285/363 [10:33<02:26, 1.87s/it] {'loss': 0.087, 'grad_norm': 1.6300368148462467, 'learning_rate': 2.7606447322249876e-06, 'epoch': 0.79} 79%|███████▊ | 285/363 [10:33<02:26, 1.87s/it] 79%|███████▉ | 286/363 [10:35<02:24, 1.87s/it] {'loss': 0.1002, 'grad_norm': 1.5096693670735182, 'learning_rate': 2.694500833181323e-06, 'epoch': 0.79} 79%|███████▉ | 286/363 [10:35<02:24, 1.87s/it] 79%|███████▉ | 287/363 [10:37<02:22, 1.88s/it] {'loss': 0.0915, 'grad_norm': 1.719399247233759, 'learning_rate': 2.629035373566433e-06, 'epoch': 0.79} 79%|███████▉ | 287/363 [10:37<02:22, 1.88s/it] 79%|███████▉ | 288/363 [10:39<02:22, 1.90s/it] {'loss': 0.0555, 'grad_norm': 1.1425139227274304, 'learning_rate': 2.5642544329579088e-06, 'epoch': 0.79} 79%|███████▉ | 288/363 [10:39<02:22, 1.90s/it] 80%|███████▉ | 289/363 [10:41<02:20, 1.90s/it] {'loss': 0.0555, 'grad_norm': 1.2082856945935607, 'learning_rate': 2.500164027364147e-06, 'epoch': 0.8} 
80%|███████▉ | 289/363 [10:41<02:20, 1.90s/it] 80%|███████▉ | 290/363 [10:43<02:15, 1.86s/it] {'loss': 0.104, 'grad_norm': 2.028172728126609, 'learning_rate': 2.4367701086656625e-06, 'epoch': 0.8} 80%|███████▉ | 290/363 [10:43<02:15, 1.86s/it] 80%|████████ | 291/363 [10:44<02:13, 1.85s/it] {'loss': 0.0936, 'grad_norm': 1.498164820422529, 'learning_rate': 2.374078564062364e-06, 'epoch': 0.8} 80%|████████ | 291/363 [10:45<02:13, 1.85s/it] 80%|████████ | 292/363 [10:46<02:10, 1.84s/it] {'loss': 0.0898, 'grad_norm': 1.3254283577777912, 'learning_rate': 2.312095215526814e-06, 'epoch': 0.81} 80%|████████ | 292/363 [10:46<02:10, 1.84s/it] 81%|████████ | 293/363 [10:48<02:09, 1.85s/it] {'loss': 0.073, 'grad_norm': 1.3758428200048072, 'learning_rate': 2.2508258192635614e-06, 'epoch': 0.81} 81%|████████ | 293/363 [10:48<02:09, 1.85s/it] 81%|████████ | 294/363 [10:50<02:08, 1.86s/it] {'loss': 0.0675, 'grad_norm': 1.2825112587820704, 'learning_rate': 2.190276065174596e-06, 'epoch': 0.81} 81%|████████ | 294/363 [10:50<02:08, 1.86s/it] 81%|████████▏ | 295/363 [10:52<02:05, 1.85s/it] {'loss': 0.0871, 'grad_norm': 1.704844614821693, 'learning_rate': 2.130451576330925e-06, 'epoch': 0.81} 81%|████████▏ | 295/363 [10:52<02:05, 1.85s/it] 82%|████████▏ | 296/363 [10:54<02:04, 1.86s/it] {'loss': 0.075, 'grad_norm': 1.8236093781558738, 'learning_rate': 2.0713579084503877e-06, 'epoch': 0.82} 82%|████████▏ | 296/363 [10:54<02:04, 1.86s/it] 82%|████████▏ | 297/363 [10:56<02:01, 1.83s/it] {'loss': 0.0726, 'grad_norm': 1.7159210186184939, 'learning_rate': 2.0130005493817063e-06, 'epoch': 0.82} 82%|████████▏ | 297/363 [10:56<02:01, 1.83s/it] 82%|████████▏ | 298/363 [10:58<02:01, 1.87s/it] {'loss': 0.0585, 'grad_norm': 1.5402879375212146, 'learning_rate': 1.9553849185948514e-06, 'epoch': 0.82} 82%|████████▏ | 298/363 [10:58<02:01, 1.87s/it] 82%|████████▏ | 299/363 [10:59<01:57, 1.83s/it] {'loss': 0.1192, 'grad_norm': 2.088587364963122, 'learning_rate': 1.8985163666777473e-06, 'epoch': 0.82} 
82%|████████▏ | 299/363 [10:59<01:57, 1.83s/it] 83%|████████▎ | 300/363 [11:01<01:56, 1.85s/it] {'loss': 0.0639, 'grad_norm': 1.385102288521542, 'learning_rate': 1.8424001748393905e-06, 'epoch': 0.83} 83%|████████▎ | 300/363 [11:01<01:56, 1.85s/it][INFO|trainer.py:4289] 2026-01-30 12:27:35,968 >> Saving model checkpoint to /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300 [INFO|configuration_utils.py:491] 2026-01-30 12:27:35,972 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/config.json [INFO|configuration_utils.py:826] 2026-01-30 12:27:35,973 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/generation_config.json [INFO|modeling_utils.py:4305] 2026-01-30 12:27:52,068 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2394] 2026-01-30 12:27:52,070 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/chat_template.jinja
[INFO|tokenization_utils_base.py:2563] 2026-01-30 12:27:52,070 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2572] 2026-01-30 12:27:52,071 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/special_tokens_map.json
[INFO|tokenization_utils_base.py:2623] 2026-01-30 12:27:52,071 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/added_tokens.json
[INFO|image_processing_base.py:253] 2026-01-30 12:27:52,271 >> Image processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/preprocessor_config.json
[INFO|tokenization_utils_base.py:2394] 2026-01-30 12:27:52,272 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/chat_template.jinja
[INFO|tokenization_utils_base.py:2563] 2026-01-30 12:27:52,273 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2572] 2026-01-30 12:27:52,273 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/special_tokens_map.json
[INFO|tokenization_utils_base.py:2623] 2026-01-30 12:27:52,274 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/added_tokens.json
[INFO|video_processing_utils.py:610] 2026-01-30 12:27:52,896 >> Video processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/video_preprocessor_config.json
[INFO|processing_utils.py:752]
2026-01-30 12:27:52,897 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-300/chat_template.jinja 83%|████████▎ | 301/363 [11:29<10:04, 9.74s/it] {'loss': 0.0644, 'grad_norm': 1.199103692035072, 'learning_rate': 1.7870415544193808e-06, 'epoch': 0.83} 83%|████████▎ | 301/363 [11:29<10:04, 9.74s/it] 83%|████████▎ | 302/363 [11:31<07:30, 7.38s/it] {'loss': 0.0806, 'grad_norm': 1.576550399440244, 'learning_rate': 1.7324456464039751e-06, 'epoch': 0.83} 83%|████████▎ | 302/363 [11:31<07:30, 7.38s/it] 83%|████████▎ | 303/363 [11:33<05:44, 5.75s/it] {'loss': 0.1166, 'grad_norm': 1.9033469567963237, 'learning_rate': 1.6786175209486565e-06, 'epoch': 0.84} 83%|████████▎ | 303/363 [11:33<05:44, 5.75s/it] 84%|████████▎ | 304/363 [11:35<04:35, 4.66s/it] {'loss': 0.0883, 'grad_norm': 1.5481679752062283, 'learning_rate': 1.6255621769072805e-06, 'epoch': 0.84} 84%|████████▎ | 304/363 [11:35<04:35, 4.66s/it] 84%|████████▍ | 305/363 [11:37<03:47, 3.93s/it] {'loss': 0.105, 'grad_norm': 1.7374538978001977, 'learning_rate': 1.5732845413678477e-06, 'epoch': 0.84} 84%|████████▍ | 305/363 [11:37<03:47, 3.93s/it] 84%|████████▍ | 306/363 [11:39<03:08, 3.30s/it] {'loss': 0.0618, 'grad_norm': 1.3465892642582866, 'learning_rate': 1.521789469194952e-06, 'epoch': 0.84} 84%|████████▍ | 306/363 [11:39<03:08, 3.30s/it] 85%|████████▍ | 307/363 [11:41<02:41, 2.88s/it] {'loss': 0.0992, 'grad_norm': 2.312489005340463, 'learning_rate': 1.4710817425789015e-06, 'epoch': 0.85} 85%|████████▍ | 307/363 [11:41<02:41, 2.88s/it] 85%|████████▍ | 308/363 [11:43<02:18, 2.53s/it] {'loss': 0.0458, 'grad_norm': 1.1318530850342379, 'learning_rate': 1.4211660705916286e-06, 'epoch': 0.85} 85%|████████▍ | 308/363 [11:43<02:18, 2.53s/it] 85%|████████▌ | 309/363 [11:45<02:04, 2.31s/it] {'loss': 0.0516, 'grad_norm': 1.5063935617388766, 'learning_rate': 1.372047088749372e-06, 'epoch': 0.85} 85%|████████▌ | 309/363 [11:45<02:04, 2.31s/it] 85%|████████▌ | 310/363 
[11:46<01:54, 2.16s/it] {'loss': 0.1156, 'grad_norm': 1.4001415457936668, 'learning_rate': 1.3237293585821786e-06, 'epoch': 0.86} 85%|████████▌ | 310/363 [11:46<01:54, 2.16s/it] 86%|████████▌ | 311/363 [11:48<01:46, 2.04s/it] {'loss': 0.1332, 'grad_norm': 2.487040806365276, 'learning_rate': 1.2762173672102996e-06, 'epoch': 0.86} 86%|████████▌ | 311/363 [11:48<01:46, 2.04s/it] 86%|████████▌ | 312/363 [11:50<01:41, 1.99s/it] {'loss': 0.0735, 'grad_norm': 1.514447174356807, 'learning_rate': 1.2295155269274827e-06, 'epoch': 0.86} 86%|████████▌ | 312/363 [11:50<01:41, 1.99s/it] 86%|████████▌ | 313/363 [11:52<01:37, 1.94s/it] {'loss': 0.0825, 'grad_norm': 1.9664878278885487, 'learning_rate': 1.1836281747912125e-06, 'epoch': 0.86} 86%|████████▌ | 313/363 [11:52<01:37, 1.94s/it] 87%|████████▋ | 314/363 [11:54<01:35, 1.95s/it] {'loss': 0.0945, 'grad_norm': 1.7248118984472842, 'learning_rate': 1.1385595722199438e-06, 'epoch': 0.87} 87%|████████▋ | 314/363 [11:54<01:35, 1.95s/it] 87%|████████▋ | 315/363 [11:56<01:32, 1.92s/it] {'loss': 0.0761, 'grad_norm': 1.232176840002336, 'learning_rate': 1.094313904597355e-06, 'epoch': 0.87} 87%|████████▋ | 315/363 [11:56<01:32, 1.92s/it] 87%|████████▋ | 316/363 [11:58<01:27, 1.86s/it] {'loss': 0.1104, 'grad_norm': 2.3846939660082636, 'learning_rate': 1.0508952808836682e-06, 'epoch': 0.87} 87%|████████▋ | 316/363 [11:58<01:27, 1.86s/it] 87%|████████▋ | 317/363 [12:00<01:27, 1.91s/it] {'loss': 0.137, 'grad_norm': 1.5107849502345858, 'learning_rate': 1.0083077332340563e-06, 'epoch': 0.87} 87%|████████▋ | 317/363 [12:00<01:27, 1.91s/it] 88%|████████▊ | 318/363 [12:01<01:23, 1.85s/it] {'loss': 0.1395, 'grad_norm': 2.5218758909483077, 'learning_rate': 9.665552166241965e-07, 'epoch': 0.88} 88%|████████▊ | 318/363 [12:01<01:23, 1.85s/it] 88%|████████▊ | 319/363 [12:03<01:22, 1.87s/it] {'loss': 0.1331, 'grad_norm': 1.913603235284768, 'learning_rate': 9.256416084829778e-07, 'epoch': 0.88} 88%|████████▊ | 319/363 [12:03<01:22, 1.87s/it] 
88%|████████▊ | 320/363 [12:05<01:20, 1.87s/it] {'loss': 0.1077, 'grad_norm': 2.4049905043471806, 'learning_rate': 8.855707083324183e-07, 'epoch': 0.88} 88%|████████▊ | 320/363 [12:05<01:20, 1.87s/it] 88%|████████▊ | 321/363 [12:07<01:19, 1.88s/it] {'loss': 0.0638, 'grad_norm': 1.108589495662786, 'learning_rate': 8.46346237434813e-07, 'epoch': 0.89} 88%|████████▊ | 321/363 [12:07<01:19, 1.88s/it] 89%|████████▊ | 322/363 [12:09<01:19, 1.94s/it] {'loss': 0.0515, 'grad_norm': 1.222025656507595, 'learning_rate': 8.079718384471557e-07, 'epoch': 0.89} 89%|████████▊ | 322/363 [12:09<01:19, 1.94s/it] 89%|████████▉ | 323/363 [12:11<01:20, 2.00s/it] {'loss': 0.0623, 'grad_norm': 1.6319377780473996, 'learning_rate': 7.704510750828542e-07, 'epoch': 0.89} 89%|████████▉ | 323/363 [12:11<01:20, 2.00s/it] 89%|████████▉ | 324/363 [12:13<01:15, 1.93s/it] {'loss': 0.0516, 'grad_norm': 1.402983153361783, 'learning_rate': 7.337874317807803e-07, 'epoch': 0.89} 89%|████████▉ | 324/363 [12:13<01:15, 1.93s/it] 90%|████████▉ | 325/363 [12:15<01:12, 1.91s/it] {'loss': 0.0612, 'grad_norm': 1.4500270750731776, 'learning_rate': 6.979843133816744e-07, 'epoch': 0.9} 90%|████████▉ | 325/363 [12:15<01:12, 1.91s/it] 90%|████████▉ | 326/363 [12:17<01:15, 2.05s/it] {'loss': 0.0333, 'grad_norm': 1.095211249638046, 'learning_rate': 6.630450448119618e-07, 'epoch': 0.9} 90%|████████▉ | 326/363 [12:17<01:15, 2.05s/it] 90%|█████████ | 327/363 [12:19<01:13, 2.05s/it] {'loss': 0.0953, 'grad_norm': 1.8925065532997027, 'learning_rate': 6.289728707749609e-07, 'epoch': 0.9} 90%|█████████ | 327/363 [12:19<01:13, 2.05s/it] 90%|█████████ | 328/363 [12:21<01:12, 2.06s/it] {'loss': 0.0893, 'grad_norm': 1.747624078137272, 'learning_rate': 5.957709554495683e-07, 'epoch': 0.9} 90%|█████████ | 328/363 [12:21<01:12, 2.06s/it] 91%|█████████ | 329/363 [12:24<01:15, 2.21s/it] {'loss': 0.0423, 'grad_norm': 0.9113610050884592, 'learning_rate': 5.634423821964074e-07, 'epoch': 0.91} 91%|█████████ | 329/363 [12:24<01:15, 2.21s/it] 
91%|█████████ | 330/363 [12:26<01:16, 2.33s/it] {'loss': 0.1055, 'grad_norm': 1.6463257868897792, 'learning_rate': 5.319901532714877e-07, 'epoch': 0.91} 91%|█████████ | 330/363 [12:26<01:16, 2.33s/it] 91%|█████████ | 331/363 [12:29<01:16, 2.40s/it] {'loss': 0.1232, 'grad_norm': 1.84950657136513, 'learning_rate': 5.014171895473929e-07, 'epoch': 0.91} 91%|█████████ | 331/363 [12:29<01:16, 2.40s/it] 91%|█████████▏| 332/363 [12:32<01:16, 2.48s/it] {'loss': 0.076, 'grad_norm': 1.5561236996647523, 'learning_rate': 4.717263302420283e-07, 'epoch': 0.92} 91%|█████████▏| 332/363 [12:32<01:16, 2.48s/it] 92%|█████████▏| 333/363 [12:34<01:15, 2.53s/it] {'loss': 0.0835, 'grad_norm': 1.2285714349711996, 'learning_rate': 4.429203326549525e-07, 'epoch': 0.92} 92%|█████████▏| 333/363 [12:34<01:15, 2.53s/it] 92%|█████████▏| 334/363 [12:37<01:10, 2.43s/it] {'loss': 0.0734, 'grad_norm': 2.8092718918728288, 'learning_rate': 4.150018719113147e-07, 'epoch': 0.92} 92%|█████████▏| 334/363 [12:37<01:10, 2.43s/it] 92%|█████████▏| 335/363 [12:38<01:01, 2.18s/it] {'loss': 0.0768, 'grad_norm': 1.766273016183224, 'learning_rate': 3.8797354071342443e-07, 'epoch': 0.92} 92%|█████████▏| 335/363 [12:38<01:01, 2.18s/it] 93%|█████████▎| 336/363 [12:40<00:54, 2.03s/it] {'loss': 0.1027, 'grad_norm': 2.1729531887535285, 'learning_rate': 3.618378490999719e-07, 'epoch': 0.93} 93%|█████████▎| 336/363 [12:40<00:54, 2.03s/it] 93%|█████████▎| 337/363 [12:42<00:52, 2.03s/it] {'loss': 0.1176, 'grad_norm': 1.7482937409483954, 'learning_rate': 3.365972242129378e-07, 'epoch': 0.93} 93%|█████████▎| 337/363 [12:42<00:52, 2.03s/it] 93%|█████████▎| 338/363 [12:44<00:49, 1.96s/it] {'loss': 0.068, 'grad_norm': 1.2860014321281648, 'learning_rate': 3.122540100721794e-07, 'epoch': 0.93} 93%|█████████▎| 338/363 [12:44<00:49, 1.96s/it] 93%|█████████▎| 339/363 [12:46<00:46, 1.95s/it] {'loss': 0.0826, 'grad_norm': 1.8682982161376196, 'learning_rate': 2.888104673577574e-07, 'epoch': 0.94} 93%|█████████▎| 339/363 [12:46<00:46, 
1.95s/it] 94%|█████████▎| 340/363 [12:47<00:42, 1.84s/it] {'loss': 0.1094, 'grad_norm': 2.1192579972688548, 'learning_rate': 2.66268773199988e-07, 'epoch': 0.94} 94%|█████████▎| 340/363 [12:47<00:42, 1.84s/it] 94%|█████████▍| 341/363 [12:49<00:40, 1.85s/it] {'loss': 0.1397, 'grad_norm': 1.812620322889224, 'learning_rate': 2.4463102097726843e-07, 'epoch': 0.94} 94%|█████████▍| 341/363 [12:49<00:40, 1.85s/it] 94%|█████████▍| 342/363 [12:51<00:37, 1.80s/it] {'loss': 0.0883, 'grad_norm': 1.55323368762906, 'learning_rate': 2.2389922012165944e-07, 'epoch': 0.94} 94%|█████████▍| 342/363 [12:51<00:37, 1.80s/it] 94%|█████████▍| 343/363 [12:52<00:35, 1.80s/it] {'loss': 0.0537, 'grad_norm': 1.3537994599803411, 'learning_rate': 2.0407529593228114e-07, 'epoch': 0.95} 94%|█████████▍| 343/363 [12:52<00:35, 1.80s/it] 95%|█████████▍| 344/363 [12:54<00:33, 1.78s/it] {'loss': 0.0841, 'grad_norm': 1.9963802514280435, 'learning_rate': 1.8516108939651945e-07, 'epoch': 0.95} 95%|█████████▍| 344/363 [12:54<00:33, 1.78s/it] 95%|█████████▌| 345/363 [12:56<00:31, 1.75s/it] {'loss': 0.1278, 'grad_norm': 1.9489250790151131, 'learning_rate': 1.6715835701905604e-07, 'epoch': 0.95} 95%|█████████▌| 345/363 [12:56<00:31, 1.75s/it] 95%|█████████▌| 346/363 [12:58<00:31, 1.83s/it] {'loss': 0.062, 'grad_norm': 1.208403698921063, 'learning_rate': 1.5006877065874338e-07, 'epoch': 0.95} 95%|█████████▌| 346/363 [12:58<00:31, 1.83s/it] 96%|█████████▌| 347/363 [13:00<00:29, 1.83s/it] {'loss': 0.1123, 'grad_norm': 1.9902454269295837, 'learning_rate': 1.3389391737335112e-07, 'epoch': 0.96} 96%|█████████▌| 347/363 [13:00<00:29, 1.83s/it] 96%|█████████▌| 348/363 [13:02<00:27, 1.83s/it] {'loss': 0.0859, 'grad_norm': 1.5549649154456184, 'learning_rate': 1.1863529927217731e-07, 'epoch': 0.96} 96%|█████████▌| 348/363 [13:02<00:27, 1.83s/it] 96%|█████████▌| 349/363 [13:04<00:27, 1.97s/it] {'loss': 0.0664, 'grad_norm': 1.5672800792400794, 'learning_rate': 1.0429433337655115e-07, 'epoch': 0.96} 96%|█████████▌| 349/363 
[13:04<00:27, 1.97s/it] 96%|█████████▋| 350/363 [13:06<00:27, 2.10s/it] {'loss': 0.0978, 'grad_norm': 1.980266389039632, 'learning_rate': 9.08723514882437e-08, 'epoch': 0.97} 96%|█████████▋| 350/363 [13:06<00:27, 2.10s/it] 97%|█████████▋| 351/363 [13:09<00:26, 2.18s/it] {'loss': 0.082, 'grad_norm': 1.924180909114094, 'learning_rate': 7.837060006577801e-08, 'epoch': 0.97} 97%|█████████▋| 351/363 [13:09<00:26, 2.18s/it] 97%|█████████▋| 352/363 [13:11<00:24, 2.24s/it] {'loss': 0.0937, 'grad_norm': 1.360244814652006, 'learning_rate': 6.679024010868617e-08, 'epoch': 0.97} 97%|█████████▋| 352/363 [13:11<00:24, 2.24s/it] 97%|█████████▋| 353/363 [13:14<00:23, 2.36s/it] {'loss': 0.0944, 'grad_norm': 1.3638474049460683, 'learning_rate': 5.6132347049679955e-08, 'epoch': 0.97} 97%|█████████▋| 353/363 [13:14<00:23, 2.36s/it] 98%|█████████▊| 354/363 [13:16<00:21, 2.37s/it] {'loss': 0.0373, 'grad_norm': 1.0043906816455541, 'learning_rate': 4.639791065478738e-08, 'epoch': 0.98} 98%|█████████▊| 354/363 [13:16<00:21, 2.37s/it] 98%|█████████▊| 355/363 [13:18<00:18, 2.33s/it] {'loss': 0.0938, 'grad_norm': 1.3136551040176567, 'learning_rate': 3.758783493142737e-08, 'epoch': 0.98} 98%|█████████▊| 355/363 [13:18<00:18, 2.33s/it] 98%|█████████▊| 356/363 [13:20<00:15, 2.22s/it] {'loss': 0.1083, 'grad_norm': 1.2330770172633143, 'learning_rate': 2.9702938044468e-08, 'epoch': 0.98} 98%|█████████▊| 356/363 [13:20<00:15, 2.22s/it] 98%|█████████▊| 357/363 [13:22<00:12, 2.15s/it] {'loss': 0.0667, 'grad_norm': 1.4318266541987408, 'learning_rate': 2.274395224023618e-08, 'epoch': 0.98} 98%|█████████▊| 357/363 [13:22<00:12, 2.15s/it] 99%|█████████▊| 358/363 [13:24<00:10, 2.09s/it] {'loss': 0.0907, 'grad_norm': 1.4889881534099365, 'learning_rate': 1.671152377852092e-08, 'epoch': 0.99} 99%|█████████▊| 358/363 [13:24<00:10, 2.09s/it] 99%|█████████▉| 359/363 [13:26<00:08, 2.01s/it] {'loss': 0.0937, 'grad_norm': 1.9950572073445185, 'learning_rate': 1.1606212872559142e-08, 'epoch': 0.99} 99%|█████████▉| 
359/363 [13:26<00:08, 2.01s/it] 99%|█████████▉| 360/363 [13:28<00:05, 1.93s/it] {'loss': 0.0863, 'grad_norm': 1.2536334792292063, 'learning_rate': 7.42849363700282e-09, 'epoch': 0.99} 99%|█████████▉| 360/363 [13:28<00:05, 1.93s/it] 99%|█████████▉| 361/363 [13:30<00:03, 1.91s/it] {'loss': 0.0593, 'grad_norm': 1.2816200143172458, 'learning_rate': 4.178754043898669e-09, 'epoch': 1.0} 99%|█████████▉| 361/363 [13:30<00:03, 1.91s/it] 100%|█████████▉| 362/363 [13:31<00:01, 1.86s/it] {'loss': 0.0859, 'grad_norm': 0.7637431538197891, 'learning_rate': 1.8572958866514e-09, 'epoch': 1.0} 100%|█████████▉| 362/363 [13:31<00:01, 1.86s/it] 100%|██████████| 363/363 [13:32<00:00, 1.62s/it] {'loss': 0.0658, 'grad_norm': 2.3907025925168472, 'learning_rate': 4.643347520005836e-10, 'epoch': 1.0} 100%|██████████| 363/363 [13:32<00:00, 1.62s/it][INFO|trainer.py:4289] 2026-01-30 12:30:06,422 >> Saving model checkpoint to /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363 [INFO|configuration_utils.py:491] 2026-01-30 12:30:06,426 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/config.json [INFO|configuration_utils.py:826] 2026-01-30 12:30:06,427 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/generation_config.json [INFO|modeling_utils.py:4305] 2026-01-30 12:30:25,726 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2394] 2026-01-30 12:30:25,728 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/chat_template.jinja [INFO|tokenization_utils_base.py:2563] 2026-01-30 12:30:25,728 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/tokenizer_config.json [INFO|tokenization_utils_base.py:2572] 2026-01-30 12:30:25,729 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/special_tokens_map.json [INFO|tokenization_utils_base.py:2623] 2026-01-30 12:30:25,729 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/added_tokens.json [INFO|image_processing_base.py:253] 2026-01-30 12:30:26,775 >> Image processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/preprocessor_config.json [INFO|tokenization_utils_base.py:2394] 2026-01-30 12:30:26,776 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/chat_template.jinja [INFO|tokenization_utils_base.py:2563] 2026-01-30 12:30:26,776 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/tokenizer_config.json [INFO|tokenization_utils_base.py:2572] 2026-01-30 12:30:26,777 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/special_tokens_map.json [INFO|tokenization_utils_base.py:2623] 2026-01-30 12:30:26,777 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/added_tokens.json [INFO|video_processing_utils.py:610] 2026-01-30 12:30:27,084 >> Video processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/video_preprocessor_config.json [INFO|processing_utils.py:752] 
2026-01-30 12:30:27,084 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/checkpoint-363/chat_template.jinja [INFO|trainer.py:2808] 2026-01-30 12:30:27,576 >> Training completed. Do not forget to share your model on huggingface.co/models =) {'train_runtime': 845.3624, 'train_samples_per_second': 6.855, 'train_steps_per_second': 0.429, 'train_loss': 0.09636665488652289, 'epoch': 1.0} 100%|██████████| 363/363 [14:02<00:00, 1.62s/it] 100%|██████████| 363/363 [14:02<00:00, 2.32s/it] [INFO|image_processing_base.py:253] 2026-01-30 12:30:27,596 >> Image processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/preprocessor_config.json [INFO|tokenization_utils_base.py:2394] 2026-01-30 12:30:27,597 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/chat_template.jinja [INFO|tokenization_utils_base.py:2563] 2026-01-30 12:30:27,597 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/tokenizer_config.json [INFO|tokenization_utils_base.py:2572] 2026-01-30 12:30:27,597 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/special_tokens_map.json [INFO|tokenization_utils_base.py:2623] 2026-01-30 12:30:27,598 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/added_tokens.json [INFO|video_processing_utils.py:610] 2026-01-30 12:30:27,913 >> Video processor saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/video_preprocessor_config.json [INFO|processing_utils.py:752] 2026-01-30 12:30:27,913 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/chat_template.jinja [INFO|trainer.py:4289] 2026-01-30 12:30:38,047 >> Saving model checkpoint to /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model [INFO|configuration_utils.py:491] 2026-01-30 12:30:38,053 >> 
Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/config.json [INFO|configuration_utils.py:826] 2026-01-30 12:30:38,054 >> Configuration saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/generation_config.json [INFO|modeling_utils.py:4305] 2026-01-30 12:30:56,476 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2394] 2026-01-30 12:30:56,536 >> chat template saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/chat_template.jinja [INFO|tokenization_utils_base.py:2563] 2026-01-30 12:30:56,536 >> tokenizer config file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/tokenizer_config.json [INFO|tokenization_utils_base.py:2572] 2026-01-30 12:30:56,537 >> Special tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/special_tokens_map.json [INFO|tokenization_utils_base.py:2623] 2026-01-30 12:30:56,537 >> added tokens file saved in /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/added_tokens.json ***** train metrics ***** epoch = 1.0 total_flos = 47269GF train_loss = 0.0964 train_runtime = 0:14:05.36 train_samples_per_second = 6.855 train_steps_per_second = 0.429 Figure saved at: /mnt/disk1/exps/verl_vagen/iterative_ppo_sft/iteration_4/sft/model/training_loss.png [WARNING|2026-01-30 12:30:57] llamafactory.extras.ploting:148 >> No metric eval_loss to plot. [WARNING|2026-01-30 12:30:57] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot. 
[INFO|modelcard.py:456] 2026-01-30 12:30:57,408 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}} wandb: wandb: 🚀 View run skilled-universe-25 at: https://wandb.ai/ragen-V/llamafactory/runs/eiq1f5w1 wandb: Find logs at: wandb/run-20260130_121623-eiq1f5w1/logs