ray init kwargs: {'num_cpus': None, 'runtime_env': {'env_vars': {'TOKENIZERS_PARALLELISM': 'true', 'NCCL_DEBUG': 'WARN', 'VLLM_LOGGING_LEVEL': 'WARN', 'VLLM_ALLOW_RUNTIME_LORA_UPDATING': 'true', 'CUDA_DEVICE_MAX_CONNECTIONS': '1', 'NCCL_CUMEM_ENABLE': '0', 'VLLM_DISABLE_COMPILE_CACHE': '1', 'HCCL_HOST_SOCKET_PORT_RANGE': 'auto', 'HCCL_NPU_SOCKET_PORT_RANGE': 'auto'}, 'working_dir': None}}
2026-04-12 11:27:02,205 WARNING services.py:2168 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 8360747008 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2026-04-12 11:27:05,380 INFO worker.py:2004 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
/storage/workspace/server-5/rl/miniconda3/envs/verl/lib/python3.10/site-packages/ray/_private/worker.py:2052: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
(TaskRunner pid=2823680) TaskRunner hostname: gpu-ssh-server-5, PID: 2823680
(TaskRunner pid=2823680) /storage/workspace/server-5/rl/jeremy/efficiency/verl/verl/trainer/main_ppo.py:302: UserWarning: Disabled critic as algorithm.adv_estimator != gae.
If it is not intended, please set critic.enable=True (TaskRunner pid=2823680) use_critic=need_critic(config), (TaskRunner pid=2823680) {'actor_rollout_ref': {'actor': {'_target_': 'verl.workers.config.FSDPActorConfig', (TaskRunner pid=2823680) 'calculate_entropy': True, (TaskRunner pid=2823680) 'calculate_sum_pi_squared': False, (TaskRunner pid=2823680) 'checkpoint': {'_target_': 'verl.trainer.config.CheckpointConfig', (TaskRunner pid=2823680) 'async_save': False, (TaskRunner pid=2823680) 'load_contents': ['model', (TaskRunner pid=2823680) 'optimizer', (TaskRunner pid=2823680) 'extra'], (TaskRunner pid=2823680) 'mbridge_config': {}, (TaskRunner pid=2823680) 'save_contents': ['model', (TaskRunner pid=2823680) 'optimizer', (TaskRunner pid=2823680) 'extra']}, (TaskRunner pid=2823680) 'clip_ratio': 0.2, (TaskRunner pid=2823680) 'clip_ratio_c': 3.0, (TaskRunner pid=2823680) 'clip_ratio_high': 0.2, (TaskRunner pid=2823680) 'clip_ratio_low': 0.2, (TaskRunner pid=2823680) 'data_loader_seed': 42, (TaskRunner pid=2823680) 'entropy_checkpointing': False, (TaskRunner pid=2823680) 'entropy_coeff': 0, (TaskRunner pid=2823680) 'entropy_from_logits_with_chunking': False, (TaskRunner pid=2823680) 'freeze_vision_tower': False, (TaskRunner pid=2823680) 'fsdp_config': {'_target_': 'verl.workers.config.FSDPEngineConfig', (TaskRunner pid=2823680) 'dtype': 'bfloat16', (TaskRunner pid=2823680) 'entropy_checkpointing': False, (TaskRunner pid=2823680) 'entropy_from_logits_with_chunking': False, (TaskRunner pid=2823680) 'forward_only': False, (TaskRunner pid=2823680) 'forward_prefetch': False, (TaskRunner pid=2823680) 'fsdp_size': -1, (TaskRunner pid=2823680) 'full_determinism': False, (TaskRunner pid=2823680) 'model_dtype': 'fp32', (TaskRunner pid=2823680) 'offload_policy': False, (TaskRunner pid=2823680) 'optimizer_offload': False, (TaskRunner pid=2823680) 'param_offload': False, (TaskRunner pid=2823680) 'qat': {'_target_': 'verl.workers.config.QATEngineConfig', (TaskRunner pid=2823680) 
'activation_observer': 'static_minmax', (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'group_size': 16, (TaskRunner pid=2823680) 'ignore_patterns': ['lm_head', (TaskRunner pid=2823680) 'embed_tokens', (TaskRunner pid=2823680) 're:.*mlp.gate$'], (TaskRunner pid=2823680) 'mode': 'w4a16', (TaskRunner pid=2823680) 'quantization_config_path': None}, (TaskRunner pid=2823680) 'reshard_after_forward': True, (TaskRunner pid=2823680) 'seed': 42, (TaskRunner pid=2823680) 'strategy': 'fsdp', (TaskRunner pid=2823680) 'ulysses_sequence_parallel_size': 1, (TaskRunner pid=2823680) 'use_orig_params': False, (TaskRunner pid=2823680) 'use_torch_compile': True, (TaskRunner pid=2823680) 'wrap_policy': {'min_num_params': 0}}, (TaskRunner pid=2823680) 'grad_clip': 1.0, (TaskRunner pid=2823680) 'kl_loss_coef': 0.0, (TaskRunner pid=2823680) 'kl_loss_type': 'low_var_kl', (TaskRunner pid=2823680) 'loss_agg_mode': 'token-mean', (TaskRunner pid=2823680) 'loss_scale_factor': None, (TaskRunner pid=2823680) 'optim': {'_target_': 'verl.workers.config.FSDPOptimizerConfig', (TaskRunner pid=2823680) 'betas': [0.9, 0.999], (TaskRunner pid=2823680) 'clip_grad': 1.0, (TaskRunner pid=2823680) 'lr': 1e-06, (TaskRunner pid=2823680) 'lr_scheduler_type': 'constant', (TaskRunner pid=2823680) 'lr_warmup_steps': -1, (TaskRunner pid=2823680) 'lr_warmup_steps_ratio': 0.0, (TaskRunner pid=2823680) 'min_lr_ratio': 0.0, (TaskRunner pid=2823680) 'num_cycles': 0.5, (TaskRunner pid=2823680) 'optimizer': 'AdamW', (TaskRunner pid=2823680) 'optimizer_impl': 'torch.optim', (TaskRunner pid=2823680) 'override_optimizer_config': None, (TaskRunner pid=2823680) 'total_training_steps': -1, (TaskRunner pid=2823680) 'warmup_style': None, (TaskRunner pid=2823680) 'weight_decay': 0.01, (TaskRunner pid=2823680) 'zero_indexed_step': True}, (TaskRunner pid=2823680) 'policy_loss': {'_target_': 'verl.workers.config.PolicyLossConfig', (TaskRunner pid=2823680) 'clip_cov_lb': 1.0, (TaskRunner pid=2823680) 
'clip_cov_ratio': 0.0002, (TaskRunner pid=2823680) 'clip_cov_ub': 5.0, (TaskRunner pid=2823680) 'kl_cov_ratio': 0.0002, (TaskRunner pid=2823680) 'loss_mode': 'vanilla', (TaskRunner pid=2823680) 'ppo_kl_coef': 0.1}, (TaskRunner pid=2823680) 'ppo_epochs': 1, (TaskRunner pid=2823680) 'ppo_max_token_len_per_gpu': 10240, (TaskRunner pid=2823680) 'ppo_micro_batch_size': None, (TaskRunner pid=2823680) 'ppo_micro_batch_size_per_gpu': 4, (TaskRunner pid=2823680) 'ppo_mini_batch_size': 8, (TaskRunner pid=2823680) 'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig', (TaskRunner pid=2823680) 'all_ranks': False, (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'ranks': [], (TaskRunner pid=2823680) 'save_path': 'outputs/profile', (TaskRunner pid=2823680) 'tool': None, (TaskRunner pid=2823680) 'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig', (TaskRunner pid=2823680) 'analysis': True, (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False, (TaskRunner pid=2823680) 'level': 'level0'}, (TaskRunner pid=2823680) 'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig', (TaskRunner pid=2823680) 'discrete': False}, (TaskRunner pid=2823680) 'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig', (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False}, (TaskRunner pid=2823680) 'torch_memory': {'_target_': 'verl.utils.profiler.config.TorchMemoryToolConfig', (TaskRunner pid=2823680) 'stack_depth': 32, (TaskRunner pid=2823680) 'trace_alloc_max_entries': 100000}}}, (TaskRunner pid=2823680) 'qat': {'activation_observer': 'static_minmax', (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'group_size': 16, (TaskRunner pid=2823680) 'ignore_patterns': ['lm_head', (TaskRunner pid=2823680) 'embed_tokens', (TaskRunner pid=2823680) 're:.*mlp.gate$'], (TaskRunner pid=2823680) 'mode': 'w4a16', (TaskRunner pid=2823680) 
'quantization_config_path': None}, (TaskRunner pid=2823680) 'rollout_n': 8, (TaskRunner pid=2823680) 'router_replay': {'_target_': 'verl.workers.config.RouterReplayConfig', (TaskRunner pid=2823680) 'mode': 'disabled', (TaskRunner pid=2823680) 'record_file': None, (TaskRunner pid=2823680) 'replay_file': None}, (TaskRunner pid=2823680) 'shuffle': False, (TaskRunner pid=2823680) 'strategy': 'fsdp', (TaskRunner pid=2823680) 'sum_pi_squared_checkpointing': False, (TaskRunner pid=2823680) 'tau_neg': 1.05, (TaskRunner pid=2823680) 'tau_pos': 1.0, (TaskRunner pid=2823680) 'ulysses_sequence_parallel_size': 1, (TaskRunner pid=2823680) 'use_dynamic_bsz': False, (TaskRunner pid=2823680) 'use_fused_kernels': False, (TaskRunner pid=2823680) 'use_kl_loss': False, (TaskRunner pid=2823680) 'use_prefix_grouper': False, (TaskRunner pid=2823680) 'use_remove_padding': True, (TaskRunner pid=2823680) 'use_torch_compile': True}, (TaskRunner pid=2823680) 'hybrid_engine': True, (TaskRunner pid=2823680) 'model': {'_target_': 'verl.workers.config.HFModelConfig', (TaskRunner pid=2823680) 'custom_chat_template': None, (TaskRunner pid=2823680) 'enable_activation_offload': False, (TaskRunner pid=2823680) 'enable_gradient_checkpointing': True, (TaskRunner pid=2823680) 'exclude_modules': None, (TaskRunner pid=2823680) 'external_lib': None, (TaskRunner pid=2823680) 'fused_kernel_options': {'impl_backend': 'torch'}, (TaskRunner pid=2823680) 'hf_config_path': None, (TaskRunner pid=2823680) 'lora_adapter_path': None, (TaskRunner pid=2823680) 'lora_alpha': 16, (TaskRunner pid=2823680) 'lora_rank': 0, (TaskRunner pid=2823680) 'mtp': {'_target_': 'verl.workers.config.MtpConfig', (TaskRunner pid=2823680) 'detach_encoder': False, (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'enable_rollout': False, (TaskRunner pid=2823680) 'enable_train': False, (TaskRunner pid=2823680) 'method': 'mtp', (TaskRunner pid=2823680) 'mtp_loss_scaling_factor': 0.1, (TaskRunner pid=2823680) 
'num_speculative_tokens': 1, (TaskRunner pid=2823680) 'speculative_algorithm': 'EAGLE', (TaskRunner pid=2823680) 'speculative_eagle_topk': 1, (TaskRunner pid=2823680) 'speculative_num_draft_tokens': 4, (TaskRunner pid=2823680) 'speculative_num_steps': 3}, (TaskRunner pid=2823680) 'override_config': {'attn_implementation': 'flash_attention_2'}, (TaskRunner pid=2823680) 'path': 'RoadQAQ/Qwen2.5-Math-1.5B-16k-think', (TaskRunner pid=2823680) 'target_modules': 'all-linear', (TaskRunner pid=2823680) 'tiled_mlp': {'enabled': False, (TaskRunner pid=2823680) 'num_shards': 4}, (TaskRunner pid=2823680) 'tokenizer_path': None, (TaskRunner pid=2823680) 'trust_remote_code': False, (TaskRunner pid=2823680) 'use_fused_kernels': False, (TaskRunner pid=2823680) 'use_liger': False, (TaskRunner pid=2823680) 'use_remove_padding': True, (TaskRunner pid=2823680) 'use_shm': False}, (TaskRunner pid=2823680) 'nccl_timeout': 600, (TaskRunner pid=2823680) 'ref': {'_target_': 'verl.workers.config.FSDPActorConfig', (TaskRunner pid=2823680) 'entropy_checkpointing': False, (TaskRunner pid=2823680) 'entropy_from_logits_with_chunking': False, (TaskRunner pid=2823680) 'fsdp_config': {'_target_': 'verl.workers.config.FSDPEngineConfig', (TaskRunner pid=2823680) 'dtype': 'bfloat16', (TaskRunner pid=2823680) 'entropy_checkpointing': False, (TaskRunner pid=2823680) 'entropy_from_logits_with_chunking': False, (TaskRunner pid=2823680) 'forward_only': True, (TaskRunner pid=2823680) 'forward_prefetch': False, (TaskRunner pid=2823680) 'fsdp_size': -1, (TaskRunner pid=2823680) 'full_determinism': False, (TaskRunner pid=2823680) 'model_dtype': 'fp32', (TaskRunner pid=2823680) 'offload_policy': False, (TaskRunner pid=2823680) 'optimizer_offload': False, (TaskRunner pid=2823680) 'param_offload': False, (TaskRunner pid=2823680) 'qat': {'_target_': 'verl.workers.config.QATEngineConfig', (TaskRunner pid=2823680) 'activation_observer': 'static_minmax', (TaskRunner pid=2823680) 'enable': False, (TaskRunner 
pid=2823680) 'group_size': 16, (TaskRunner pid=2823680) 'ignore_patterns': ['lm_head', (TaskRunner pid=2823680) 'embed_tokens', (TaskRunner pid=2823680) 're:.*mlp.gate$'], (TaskRunner pid=2823680) 'mode': 'w4a16', (TaskRunner pid=2823680) 'quantization_config_path': None}, (TaskRunner pid=2823680) 'reshard_after_forward': True, (TaskRunner pid=2823680) 'seed': 42, (TaskRunner pid=2823680) 'strategy': 'fsdp', (TaskRunner pid=2823680) 'ulysses_sequence_parallel_size': 1, (TaskRunner pid=2823680) 'use_orig_params': False, (TaskRunner pid=2823680) 'use_torch_compile': True, (TaskRunner pid=2823680) 'wrap_policy': {'min_num_params': 0}}, (TaskRunner pid=2823680) 'log_prob_max_token_len_per_gpu': 10240, (TaskRunner pid=2823680) 'log_prob_micro_batch_size': None, (TaskRunner pid=2823680) 'log_prob_micro_batch_size_per_gpu': 32, (TaskRunner pid=2823680) 'log_prob_use_dynamic_bsz': False, (TaskRunner pid=2823680) 'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig', (TaskRunner pid=2823680) 'all_ranks': False, (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'ranks': [], (TaskRunner pid=2823680) 'save_path': 'outputs/profile', (TaskRunner pid=2823680) 'tool': None, (TaskRunner pid=2823680) 'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig', (TaskRunner pid=2823680) 'analysis': True, (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False, (TaskRunner pid=2823680) 'level': 'level0'}, (TaskRunner pid=2823680) 'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig', (TaskRunner pid=2823680) 'discrete': False}, (TaskRunner pid=2823680) 'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig', (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False}, (TaskRunner pid=2823680) 'torch_memory': {'_target_': 'verl.utils.profiler.config.TorchMemoryToolConfig', (TaskRunner pid=2823680) 'stack_depth': 32, (TaskRunner pid=2823680) 
'trace_alloc_max_entries': 100000}}}, (TaskRunner pid=2823680) 'rollout_n': 8, (TaskRunner pid=2823680) 'router_replay': {'_target_': 'verl.workers.config.RouterReplayConfig', (TaskRunner pid=2823680) 'mode': 'disabled', (TaskRunner pid=2823680) 'record_file': None, (TaskRunner pid=2823680) 'replay_file': None}, (TaskRunner pid=2823680) 'strategy': 'fsdp', (TaskRunner pid=2823680) 'ulysses_sequence_parallel_size': 1, (TaskRunner pid=2823680) 'use_torch_compile': True}, (TaskRunner pid=2823680) 'rollout': {'_target_': 'verl.workers.config.RolloutConfig', (TaskRunner pid=2823680) 'agent': {'_target_': 'verl.workers.config.AgentLoopConfig', (TaskRunner pid=2823680) 'agent_loop_config_path': None, (TaskRunner pid=2823680) 'custom_async_server': {'_target_': 'verl.workers.config.CustomAsyncServerConfig', (TaskRunner pid=2823680) 'name': None, (TaskRunner pid=2823680) 'path': None}, (TaskRunner pid=2823680) 'default_agent_loop': 'single_turn_agent', (TaskRunner pid=2823680) 'num_workers': 8}, (TaskRunner pid=2823680) 'calculate_log_probs': False, (TaskRunner pid=2823680) 'checkpoint_engine': {'_target_': 'verl.workers.config.CheckpointEngineConfig', (TaskRunner pid=2823680) 'backend': 'naive', (TaskRunner pid=2823680) 'engine_kwargs': {}, (TaskRunner pid=2823680) 'update_weights_bucket_megabytes': 4096}, (TaskRunner pid=2823680) 'cudagraph_capture_sizes': None, (TaskRunner pid=2823680) 'data_parallel_size': 1, (TaskRunner pid=2823680) 'disable_log_stats': True, (TaskRunner pid=2823680) 'do_sample': True, (TaskRunner pid=2823680) 'dtype': 'bfloat16', (TaskRunner pid=2823680) 'enable_chunked_prefill': True, (TaskRunner pid=2823680) 'enable_prefix_caching': True, (TaskRunner pid=2823680) 'enable_rollout_routing_replay': False, (TaskRunner pid=2823680) 'enforce_eager': False, (TaskRunner pid=2823680) 'engine_kwargs': {'sglang': {}, (TaskRunner pid=2823680) 'trtllm': {}, (TaskRunner pid=2823680) 'vllm': {'distributed_executor_backend': 'uni'}}, (TaskRunner pid=2823680) 
'expert_parallel_size': 1, (TaskRunner pid=2823680) 'free_cache_engine': True, (TaskRunner pid=2823680) 'gpu_memory_utilization': 0.6, (TaskRunner pid=2823680) 'ignore_eos': False, (TaskRunner pid=2823680) 'layered_summon': False, (TaskRunner pid=2823680) 'load_format': 'dummy', (TaskRunner pid=2823680) 'log_prob_max_token_len_per_gpu': 10240, (TaskRunner pid=2823680) 'log_prob_micro_batch_size': None, (TaskRunner pid=2823680) 'log_prob_micro_batch_size_per_gpu': 4, (TaskRunner pid=2823680) 'log_prob_use_dynamic_bsz': False, (TaskRunner pid=2823680) 'logprobs_mode': 'processed_logprobs', (TaskRunner pid=2823680) 'max_model_len': 16384, (TaskRunner pid=2823680) 'max_num_batched_tokens': 8192, (TaskRunner pid=2823680) 'max_num_seqs': 1024, (TaskRunner pid=2823680) 'mode': 'async', (TaskRunner pid=2823680) 'mtp': {'_target_': 'verl.workers.config.MtpConfig', (TaskRunner pid=2823680) 'detach_encoder': False, (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'enable_rollout': False, (TaskRunner pid=2823680) 'enable_train': False, (TaskRunner pid=2823680) 'method': 'mtp', (TaskRunner pid=2823680) 'mtp_loss_scaling_factor': 0.1, (TaskRunner pid=2823680) 'num_speculative_tokens': 1, (TaskRunner pid=2823680) 'speculative_algorithm': 'EAGLE', (TaskRunner pid=2823680) 'speculative_eagle_topk': 1, (TaskRunner pid=2823680) 'speculative_num_draft_tokens': 4, (TaskRunner pid=2823680) 'speculative_num_steps': 3}, (TaskRunner pid=2823680) 'multi_stage_wake_up': False, (TaskRunner pid=2823680) 'multi_turn': {'_target_': 'verl.workers.config.MultiTurnConfig', (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'format': 'hermes', (TaskRunner pid=2823680) 'interaction_config_path': None, (TaskRunner pid=2823680) 'max_assistant_turns': None, (TaskRunner pid=2823680) 'max_parallel_calls': 1, (TaskRunner pid=2823680) 'max_tool_response_length': 256, (TaskRunner pid=2823680) 'max_user_turns': None, (TaskRunner pid=2823680) 'num_repeat_rollouts': None, 
(TaskRunner pid=2823680) 'tokenization_sanity_check_mode': 'strict', (TaskRunner pid=2823680) 'tool_config_path': None, (TaskRunner pid=2823680) 'tool_response_truncate_side': 'middle', (TaskRunner pid=2823680) 'use_inference_chat_template': False}, (TaskRunner pid=2823680) 'n': 8, (TaskRunner pid=2823680) 'n_gpus_per_node': 4, (TaskRunner pid=2823680) 'name': 'vllm', (TaskRunner pid=2823680) 'nnodes': 0, (TaskRunner pid=2823680) 'over_sample_rate': 0, (TaskRunner pid=2823680) 'pipeline_model_parallel_size': 1, (TaskRunner pid=2823680) 'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig', (TaskRunner pid=2823680) 'all_ranks': False, (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'ranks': [], (TaskRunner pid=2823680) 'save_path': 'outputs/profile', (TaskRunner pid=2823680) 'tool': None, (TaskRunner pid=2823680) 'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig', (TaskRunner pid=2823680) 'analysis': True, (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False, (TaskRunner pid=2823680) 'level': 'level0'}, (TaskRunner pid=2823680) 'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig', (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False}}}, (TaskRunner pid=2823680) 'prometheus': {'_target_': 'verl.workers.config.PrometheusConfig', (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'file': '/tmp/ray/session_latest/metrics/prometheus/prometheus.yml', (TaskRunner pid=2823680) 'port': 9090, (TaskRunner pid=2823680) 'served_model_name': 'RoadQAQ/Qwen2.5-Math-1.5B-16k-think'}, (TaskRunner pid=2823680) 'prompt_length': 2048, (TaskRunner pid=2823680) 'qat': {'_target_': 'verl.workers.config.QATEngineConfig', (TaskRunner pid=2823680) 'activation_observer': 'static_minmax', (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'group_size': 16, (TaskRunner pid=2823680) 'ignore_patterns': ['lm_head', (TaskRunner 
pid=2823680) 'embed_tokens', (TaskRunner pid=2823680) 're:.*mlp.gate$'], (TaskRunner pid=2823680) 'mode': 'w4a16', (TaskRunner pid=2823680) 'quantization_config_path': None}, (TaskRunner pid=2823680) 'quantization': None, (TaskRunner pid=2823680) 'quantization_config_file': None, (TaskRunner pid=2823680) 'response_length': 8192, (TaskRunner pid=2823680) 'scheduling_policy': 'fcfs', (TaskRunner pid=2823680) 'skip_dump_dir': '/tmp/rollout_dump', (TaskRunner pid=2823680) 'skip_rollout': False, (TaskRunner pid=2823680) 'skip_tokenizer_init': True, (TaskRunner pid=2823680) 'temperature': 1.0, (TaskRunner pid=2823680) 'tensor_model_parallel_size': 1, (TaskRunner pid=2823680) 'top_k': -1, (TaskRunner pid=2823680) 'top_p': 1, (TaskRunner pid=2823680) 'trace': {'_target_': 'verl.workers.config.TraceConfig', (TaskRunner pid=2823680) 'backend': None, (TaskRunner pid=2823680) 'experiment_name': 'efficiency_qwen2_5_math_1_5b_16k_dapo_math_grpo_quarter_frontier-20260412.112633', (TaskRunner pid=2823680) 'max_samples_per_step_per_worker': None, (TaskRunner pid=2823680) 'project_name': 'efficiency', (TaskRunner pid=2823680) 'token2text': False}, (TaskRunner pid=2823680) 'val_kwargs': {'_target_': 'verl.workers.config.SamplingConfig', (TaskRunner pid=2823680) 'do_sample': True, (TaskRunner pid=2823680) 'n': 16, (TaskRunner pid=2823680) 'n_per_data_source': {'aime2024': 16, (TaskRunner pid=2823680) 'aime2025': 16, (TaskRunner pid=2823680) 'math500': 4}, (TaskRunner pid=2823680) 'temperature': 0.7, (TaskRunner pid=2823680) 'top_k': -1, (TaskRunner pid=2823680) 'top_p': 0.8}, (TaskRunner pid=2823680) 'val_response_length': 14336}}, (TaskRunner pid=2823680) 'algorithm': {'_target_': 'verl.trainer.config.AlgoConfig', (TaskRunner pid=2823680) 'adv_estimator': 'grpo', (TaskRunner pid=2823680) 'gamma': 1.0, (TaskRunner pid=2823680) 'kl_ctrl': {'_target_': 'verl.trainer.config.KLControlConfig', (TaskRunner pid=2823680) 'horizon': 10000, (TaskRunner pid=2823680) 'kl_coef': 0.001, (TaskRunner 
pid=2823680) 'target_kl': 0.1, (TaskRunner pid=2823680) 'type': 'fixed'}, (TaskRunner pid=2823680) 'kl_penalty': 'kl', (TaskRunner pid=2823680) 'lam': 1.0, (TaskRunner pid=2823680) 'norm_adv_by_std_in_grpo': True, (TaskRunner pid=2823680) 'pf_ppo': {'reweight_method': 'pow', 'weight_pow': 2.0}, (TaskRunner pid=2823680) 'rollout_correction': {'bypass_mode': False, (TaskRunner pid=2823680) 'loss_type': 'ppo_clip', (TaskRunner pid=2823680) 'rollout_is': None, (TaskRunner pid=2823680) 'rollout_is_batch_normalize': False, (TaskRunner pid=2823680) 'rollout_is_threshold': 2.0, (TaskRunner pid=2823680) 'rollout_rs': None, (TaskRunner pid=2823680) 'rollout_rs_threshold': None}, (TaskRunner pid=2823680) 'use_kl_in_reward': True, (TaskRunner pid=2823680) 'use_pf_ppo': False}, (TaskRunner pid=2823680) 'critic': {'_target_': 'verl.workers.config.FSDPCriticConfig', (TaskRunner pid=2823680) 'checkpoint': {'_target_': 'verl.trainer.config.CheckpointConfig', (TaskRunner pid=2823680) 'async_save': False, (TaskRunner pid=2823680) 'load_contents': ['model', 'optimizer', 'extra'], (TaskRunner pid=2823680) 'mbridge_config': {}, (TaskRunner pid=2823680) 'save_contents': ['model', 'optimizer', 'extra']}, (TaskRunner pid=2823680) 'cliprange_value': 0.5, (TaskRunner pid=2823680) 'data_loader_seed': 42, (TaskRunner pid=2823680) 'enable': None, (TaskRunner pid=2823680) 'forward_max_token_len_per_gpu': 32768, (TaskRunner pid=2823680) 'forward_micro_batch_size': None, (TaskRunner pid=2823680) 'forward_micro_batch_size_per_gpu': None, (TaskRunner pid=2823680) 'grad_clip': 1.0, (TaskRunner pid=2823680) 'loss_agg_mode': 'token-mean', (TaskRunner pid=2823680) 'model': {'_target_': 'verl.workers.config.FSDPCriticModelCfg', (TaskRunner pid=2823680) 'enable_activation_offload': False, (TaskRunner pid=2823680) 'enable_gradient_checkpointing': True, (TaskRunner pid=2823680) 'external_lib': None, (TaskRunner pid=2823680) 'fsdp_config': {'_target_': 'verl.workers.config.FSDPEngineConfig', (TaskRunner 
pid=2823680) 'dtype': 'bfloat16', (TaskRunner pid=2823680) 'entropy_checkpointing': False, (TaskRunner pid=2823680) 'entropy_from_logits_with_chunking': False, (TaskRunner pid=2823680) 'forward_only': False, (TaskRunner pid=2823680) 'forward_prefetch': False, (TaskRunner pid=2823680) 'fsdp_size': -1, (TaskRunner pid=2823680) 'full_determinism': False, (TaskRunner pid=2823680) 'model_dtype': 'fp32', (TaskRunner pid=2823680) 'offload_policy': False, (TaskRunner pid=2823680) 'optimizer_offload': False, (TaskRunner pid=2823680) 'param_offload': False, (TaskRunner pid=2823680) 'qat': {'_target_': 'verl.workers.config.QATEngineConfig', (TaskRunner pid=2823680) 'activation_observer': 'static_minmax', (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'group_size': 16, (TaskRunner pid=2823680) 'ignore_patterns': ['lm_head', (TaskRunner pid=2823680) 'embed_tokens', (TaskRunner pid=2823680) 're:.*mlp.gate$'], (TaskRunner pid=2823680) 'mode': 'w4a16', (TaskRunner pid=2823680) 'quantization_config_path': None}, (TaskRunner pid=2823680) 'reshard_after_forward': True, (TaskRunner pid=2823680) 'seed': 42, (TaskRunner pid=2823680) 'strategy': 'fsdp', (TaskRunner pid=2823680) 'ulysses_sequence_parallel_size': 1, (TaskRunner pid=2823680) 'use_orig_params': False, (TaskRunner pid=2823680) 'use_torch_compile': True, (TaskRunner pid=2823680) 'wrap_policy': {'min_num_params': 0}}, (TaskRunner pid=2823680) 'lora_alpha': 16, (TaskRunner pid=2823680) 'lora_rank': 0, (TaskRunner pid=2823680) 'override_config': {}, (TaskRunner pid=2823680) 'path': '~/models/deepseek-llm-7b-chat', (TaskRunner pid=2823680) 'target_modules': 'all-linear', (TaskRunner pid=2823680) 'tiled_mlp': {'enabled': False, 'num_shards': 4}, (TaskRunner pid=2823680) 'tokenizer_path': 'RoadQAQ/Qwen2.5-Math-1.5B-16k-think', (TaskRunner pid=2823680) 'trust_remote_code': False, (TaskRunner pid=2823680) 'use_remove_padding': False, (TaskRunner pid=2823680) 'use_shm': False}, (TaskRunner pid=2823680) 'optim': 
{'_target_': 'verl.workers.config.FSDPOptimizerConfig', (TaskRunner pid=2823680) 'betas': [0.9, 0.999], (TaskRunner pid=2823680) 'clip_grad': 1.0, (TaskRunner pid=2823680) 'lr': 1e-05, (TaskRunner pid=2823680) 'lr_scheduler_type': 'constant', (TaskRunner pid=2823680) 'lr_warmup_steps': -1, (TaskRunner pid=2823680) 'lr_warmup_steps_ratio': 0.0, (TaskRunner pid=2823680) 'min_lr_ratio': 0.0, (TaskRunner pid=2823680) 'num_cycles': 0.5, (TaskRunner pid=2823680) 'optimizer': 'AdamW', (TaskRunner pid=2823680) 'optimizer_impl': 'torch.optim', (TaskRunner pid=2823680) 'override_optimizer_config': None, (TaskRunner pid=2823680) 'total_training_steps': -1, (TaskRunner pid=2823680) 'warmup_style': None, (TaskRunner pid=2823680) 'weight_decay': 0.01, (TaskRunner pid=2823680) 'zero_indexed_step': True}, (TaskRunner pid=2823680) 'ppo_epochs': 1, (TaskRunner pid=2823680) 'ppo_max_token_len_per_gpu': 32768, (TaskRunner pid=2823680) 'ppo_micro_batch_size': None, (TaskRunner pid=2823680) 'ppo_micro_batch_size_per_gpu': None, (TaskRunner pid=2823680) 'ppo_mini_batch_size': 8, (TaskRunner pid=2823680) 'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig', (TaskRunner pid=2823680) 'all_ranks': False, (TaskRunner pid=2823680) 'enable': False, (TaskRunner pid=2823680) 'ranks': [], (TaskRunner pid=2823680) 'save_path': 'outputs/profile', (TaskRunner pid=2823680) 'tool': None, (TaskRunner pid=2823680) 'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig', (TaskRunner pid=2823680) 'analysis': True, (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False, (TaskRunner pid=2823680) 'level': 'level0'}, (TaskRunner pid=2823680) 'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig', (TaskRunner pid=2823680) 'discrete': False}, (TaskRunner pid=2823680) 'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig', (TaskRunner pid=2823680) 'contents': [], (TaskRunner pid=2823680) 'discrete': False}, (TaskRunner 
pid=2823680) 'torch_memory': {'_target_': 'verl.utils.profiler.config.TorchMemoryToolConfig', (TaskRunner pid=2823680) 'stack_depth': 32, (TaskRunner pid=2823680) 'trace_alloc_max_entries': 100000}}}, (TaskRunner pid=2823680) 'rollout_n': 8, (TaskRunner pid=2823680) 'shuffle': False, (TaskRunner pid=2823680) 'strategy': 'fsdp', (TaskRunner pid=2823680) 'ulysses_sequence_parallel_size': 1, (TaskRunner pid=2823680) 'use_dynamic_bsz': False}, (TaskRunner pid=2823680) 'data': {'apply_chat_template_kwargs': {}, (TaskRunner pid=2823680) 'cluster_sampler': {'acc_high_threshold': 0.6875, (TaskRunner pid=2823680) 'acc_low_threshold': 0.3125, (TaskRunner pid=2823680) 'active_clusters': 16, (TaskRunner pid=2823680) 'allow_batch_cluster_fallback': True, (TaskRunner pid=2823680) 'cluster_key': 'cluster_id', (TaskRunner pid=2823680) 'cluster_size_key': 'cluster_size', (TaskRunner pid=2823680) 'consecutive_mid_threshold': 2, (TaskRunner pid=2823680) 'frontier_advance_consecutive_mid': 16, (TaskRunner pid=2823680) 'frontier_advance_hard': 16, (TaskRunner pid=2823680) 'frontier_advance_high': 16, (TaskRunner pid=2823680) 'frontier_advance_mid': 0, (TaskRunner pid=2823680) 'log_top_k': 10, (TaskRunner pid=2823680) 'prob_snapshot_log_interval': 1, (TaskRunner pid=2823680) 'rank_key': 'rank_in_cluster', (TaskRunner pid=2823680) 'sample_id_key': 'sample_id', (TaskRunner pid=2823680) 'samples_per_cluster': 8, (TaskRunner pid=2823680) 'score_beta': 0.3, (TaskRunner pid=2823680) 'score_init': 2.0, (TaskRunner pid=2823680) 'score_max': 5.0, (TaskRunner pid=2823680) 'score_min': 1.0, (TaskRunner pid=2823680) 'seed': 42, (TaskRunner pid=2823680) 'target_score_high': 5, (TaskRunner pid=2823680) 'target_score_low': 1, (TaskRunner pid=2823680) 'target_score_mid': 3, (TaskRunner pid=2823680) 'window_size': 32}, (TaskRunner pid=2823680) 'custom_cls': {'name': None, 'path': None}, (TaskRunner pid=2823680) 'datagen': {'name': None, 'path': None}, (TaskRunner pid=2823680) 'dataloader_num_workers': 
0, (TaskRunner pid=2823680) 'filter_overlong_prompts': True, (TaskRunner pid=2823680) 'filter_overlong_prompts_workers': 1, (TaskRunner pid=2823680) 'image_key': 'images', (TaskRunner pid=2823680) 'image_patch_size': 14, (TaskRunner pid=2823680) 'max_prompt_length': 2048, (TaskRunner pid=2823680) 'max_response_length': 8192, (TaskRunner pid=2823680) 'prompt_key': 'prompt', (TaskRunner pid=2823680) 'return_full_prompt': False, (TaskRunner pid=2823680) 'return_multi_modal_inputs': True, (TaskRunner pid=2823680) 'return_raw_chat': True, (TaskRunner pid=2823680) 'return_raw_input_ids': False, (TaskRunner pid=2823680) 'reward_fn_key': 'data_source', (TaskRunner pid=2823680) 'sampler': {'class_name': 'FrontierCurriculumSampler', (TaskRunner pid=2823680) 'class_path': 'pkg://verl.experimental.dataset.frontier_sampler'}, (TaskRunner pid=2823680) 'seed': None, (TaskRunner pid=2823680) 'shuffle': True, (TaskRunner pid=2823680) 'tokenizer': None, (TaskRunner pid=2823680) 'tool_config_path': None, (TaskRunner pid=2823680) 'train_batch_size': 128, (TaskRunner pid=2823680) 'train_files': '/storage/workspace/server-5/rl/jeremy/efficiency/outputs/openr1_math_46k_8192_quarter_1_5b_roadqaq_cot/phase5/train_clustered_sorted_11448.parquet', (TaskRunner pid=2823680) 'train_max_samples': -1, (TaskRunner pid=2823680) 'truncation': 'error', (TaskRunner pid=2823680) 'trust_remote_code': False, (TaskRunner pid=2823680) 'use_shm': False, (TaskRunner pid=2823680) 'val_batch_size': None, (TaskRunner pid=2823680) 'val_files': ['/storage/workspace/server-5/rl/jeremy/efficiency/dataset/aime2024/test.parquet', (TaskRunner pid=2823680) '/storage/workspace/server-5/rl/jeremy/efficiency/dataset/aime25/test.parquet', (TaskRunner pid=2823680) '/storage/workspace/server-5/rl/jeremy/efficiency/dataset/math500/test.parquet'], (TaskRunner pid=2823680) 'val_max_samples': -1, (TaskRunner pid=2823680) 'validation_shuffle': False, (TaskRunner pid=2823680) 'video_key': 'videos'}, (TaskRunner pid=2823680) 
'global_profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig', (TaskRunner pid=2823680) 'global_tool_config': {'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig', (TaskRunner pid=2823680) 'controller_nsight_options': {'cuda-graph-trace': 'graph', (TaskRunner pid=2823680) 'cuda-memory-usage': 'true', (TaskRunner pid=2823680) 'trace': 'cuda,nvtx,cublas,ucx'}, (TaskRunner pid=2823680) 'discrete': False, (TaskRunner pid=2823680) 'worker_nsight_options': {'capture-range': 'cudaProfilerApi', (TaskRunner pid=2823680) 'capture-range-end': None, (TaskRunner pid=2823680) 'cuda-graph-trace': 'graph', (TaskRunner pid=2823680) 'cuda-memory-usage': 'true', (TaskRunner pid=2823680) 'kill': 'none', (TaskRunner pid=2823680) 'trace': 'cuda,nvtx,cublas,ucx'}}, (TaskRunner pid=2823680) 'torch_memory': {'context': 'all', (TaskRunner pid=2823680) 'kw_args': {}, (TaskRunner pid=2823680) 'stack_depth': 32, (TaskRunner pid=2823680) 'stacks': 'all', (TaskRunner pid=2823680) 'trace_alloc_max_entries': 100000}}, (TaskRunner pid=2823680) 'profile_continuous_steps': False, (TaskRunner pid=2823680) 'save_path': 'outputs/profile', (TaskRunner pid=2823680) 'steps': None, (TaskRunner pid=2823680) 'tool': None}, (TaskRunner pid=2823680) 'ray_kwargs': {'ray_init': {'num_cpus': None}, 'timeline_json_file': None}, (TaskRunner pid=2823680) 'reward': {'custom_reward_function': {'name': 'compute_score', (TaskRunner pid=2823680) 'path': '/storage/workspace/server-5/rl/jeremy/efficiency/verl/examples/grpo_trainer/reward_boxed_binary.py'}, (TaskRunner pid=2823680) 'num_workers': 8, (TaskRunner pid=2823680) 'reward_manager': {'_target_': 'verl.workers.config.reward_model.RewardManagerConfig', (TaskRunner pid=2823680) 'module': {'_target_': 'verl.trainer.config.config.ModuleConfig', (TaskRunner pid=2823680) 'name': 'custom_reward_manager', (TaskRunner pid=2823680) 'path': None}, (TaskRunner pid=2823680) 'name': 'naive', (TaskRunner pid=2823680) 'source': 'register'}, (TaskRunner 
pid=2823680) 'reward_model': {'enable': False, (TaskRunner pid=2823680) 'enable_resource_pool': False, (TaskRunner pid=2823680) 'model_path': None, (TaskRunner pid=2823680) 'n_gpus_per_node': 8, (TaskRunner pid=2823680) 'nnodes': 0, (TaskRunner pid=2823680) 'rollout': {'_target_': 'verl.workers.config.RolloutConfig', (TaskRunner pid=2823680) 'cudagraph_capture_sizes': None, (TaskRunner pid=2823680) 'data_parallel_size': 1, (TaskRunner pid=2823680) 'disable_log_stats': True, (TaskRunner pid=2823680) 'dtype': 'bfloat16', (TaskRunner pid=2823680) 'enable_chunked_prefill': True, (TaskRunner pid=2823680) 'enable_prefix_caching': True, (TaskRunner pid=2823680) 'enforce_eager': True, (TaskRunner pid=2823680) 'engine_kwargs': {}, (TaskRunner pid=2823680) 'expert_parallel_size': 1, (TaskRunner pid=2823680) 'free_cache_engine': True, (TaskRunner pid=2823680) 'gpu_memory_utilization': 0.5, (TaskRunner pid=2823680) 'limit_images': None, (TaskRunner pid=2823680) 'load_format': 'auto', (TaskRunner pid=2823680) 'max_model_len': None, (TaskRunner pid=2823680) 'max_num_batched_tokens': 8192, (TaskRunner pid=2823680) 'max_num_seqs': 1024, (TaskRunner pid=2823680) 'name': '???', (TaskRunner pid=2823680) 'prompt_length': 2048, (TaskRunner pid=2823680) 'response_length': 2048, (TaskRunner pid=2823680) 'skip_tokenizer_init': False, (TaskRunner pid=2823680) 'tensor_model_parallel_size': 2}}, (TaskRunner pid=2823680) 'sandbox_fusion': {'max_concurrent': 64, (TaskRunner pid=2823680) 'memory_limit_mb': 1024, (TaskRunner pid=2823680) 'url': None}}, (TaskRunner pid=2823680) 'trainer': {'balance_batch': True, (TaskRunner pid=2823680) 'best_ckpt_metric_key': 'val-core/aime2025/acc/best@16/mean', (TaskRunner pid=2823680) 'best_ckpt_mode': 'max', (TaskRunner pid=2823680) 'critic_warmup': 0, (TaskRunner pid=2823680) 'default_hdfs_dir': None, (TaskRunner pid=2823680) 'default_local_dir': 
'/storage/workspace/server-5/rl/jeremy/efficiency/verl/verl/checkpoints/efficiency_qwen2_5_math_1_5b_16k_dapo_math_grpo_quarter_frontier/efficiency_qwen2_5_math_1_5b_16k_dapo_math_grpo_quarter_frontier-20260412.112633', (TaskRunner pid=2823680) 'del_local_ckpt_after_load': False, (TaskRunner pid=2823680) 'device': 'cuda', (TaskRunner pid=2823680) 'esi_redundant_time': 0, (TaskRunner pid=2823680) 'experiment_name': 'efficiency_qwen2_5_math_1_5b_16k_dapo_math_grpo_quarter_frontier-20260412.112633', (TaskRunner pid=2823680) 'log_val_generations': 0, (TaskRunner pid=2823680) 'logger': ['console', 'wandb'], (TaskRunner pid=2823680) 'max_actor_ckpt_to_keep': None, (TaskRunner pid=2823680) 'max_critic_ckpt_to_keep': None, (TaskRunner pid=2823680) 'n_gpus_per_node': 4, (TaskRunner pid=2823680) 'nnodes': 1, (TaskRunner pid=2823680) 'project_name': 'efficiency', (TaskRunner pid=2823680) 'ray_wait_register_center_timeout': 300, (TaskRunner pid=2823680) 'resume_from_path': None, (TaskRunner pid=2823680) 'resume_mode': 'disable', (TaskRunner pid=2823680) 'rollout_data_dir': None, (TaskRunner pid=2823680) 'save_best_val_checkpoint': True, (TaskRunner pid=2823680) 'save_freq': 25, (TaskRunner pid=2823680) 'test_freq': 25, (TaskRunner pid=2823680) 'total_epochs': 10, (TaskRunner pid=2823680) 'total_training_steps': 800, (TaskRunner pid=2823680) 'use_legacy_worker_impl': 'auto', (TaskRunner pid=2823680) 'val_before_train': True, (TaskRunner pid=2823680) 'val_only': False, (TaskRunner pid=2823680) 'validation_data_dir': None}, (TaskRunner pid=2823680) 'transfer_queue': {'enable': False}} (TaskRunner pid=2823680) [validate_config] All configuration checks passed successfully! (TaskRunner pid=2823680) Using dataset class: RLHFDataset (TaskRunner pid=2823680) Generating train split: 0 examples [00:00, ? 
examples/s] (TaskRunner pid=2823680) Generating train split: 11448 examples [00:00, 28935.33 examples/s] (TaskRunner pid=2823680) Generating train split: 11448 examples [00:00, 12331.73 examples/s] (TaskRunner pid=2823680) dataset len: 11448 (TaskRunner pid=2823680) Setting TOKENIZERS_PARALLELISM=false for forked processes. (TaskRunner pid=2823680) WARNING:2026-04-12 11:27:21,177:Setting TOKENIZERS_PARALLELISM=false for forked processes. (TaskRunner pid=2823680) Filtering prompts longer than 2048 tokens (num_proc=1): 0%| | 0/11448 [00:00 (WorkerDict pid=2825158) [Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3 (WorkerDict pid=2825157) reference model: RoadQAQ/Qwen2.5-Math-1.5B-16k-think (WorkerDict pid=2825157) Model config after override: Qwen2Config { (WorkerDict pid=2825157) "architectures": [ (WorkerDict pid=2825157) "Qwen2ForCausalLM" (WorkerDict pid=2825157) ], (WorkerDict pid=2825157) "attention_dropout": 0.0, (WorkerDict pid=2825157) "attn_implementation": "flash_attention_2", (WorkerDict pid=2825157) "dtype": "bfloat16", (WorkerDict pid=2825157) "eos_token_id": 151643, (WorkerDict pid=2825157) "hidden_act": "silu", (WorkerDict pid=2825157) "hidden_size": 1536, (WorkerDict pid=2825157) "initializer_range": 0.02, (WorkerDict pid=2825157) "intermediate_size": 8960, (WorkerDict pid=2825157) "layer_types": [ (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict 
pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention" (WorkerDict pid=2825157) ], (WorkerDict pid=2825157) "max_position_embeddings": 16384, (WorkerDict pid=2825157) "max_window_layers": 21, (WorkerDict pid=2825157) "model_type": "qwen2", (WorkerDict pid=2825157) "num_attention_heads": 12, (WorkerDict pid=2825157) "num_hidden_layers": 28, (WorkerDict pid=2825157) "num_key_value_heads": 2, (WorkerDict pid=2825157) "pad_token_id": 151643, (WorkerDict pid=2825157) "rms_norm_eps": 1e-06, (WorkerDict pid=2825157) "rope_scaling": null, (WorkerDict pid=2825157) "rope_theta": 40000, (WorkerDict pid=2825157) "sliding_window": null, (WorkerDict pid=2825157) "tie_word_embeddings": true, (WorkerDict pid=2825157) "transformers_version": "4.57.6", (WorkerDict pid=2825157) "use_cache": true, (WorkerDict pid=2825157) "use_mrope": false, (WorkerDict pid=2825157) "use_sliding_window": false, (WorkerDict pid=2825157) "vocab_size": 151936 (WorkerDict pid=2825157) } (WorkerDict pid=2825157) (WorkerDict pid=2825160) `torch_dtype` is deprecated! Use `dtype` instead! (WorkerDict pid=2825160) Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. 
Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)` (WorkerDict pid=2825158) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention (WorkerDict pid=2825158) Skipping monkey patch for Qwen2ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch (WorkerDict pid=2825159) [Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3 [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.) (WorkerDict pid=2825157) Qwen2ForCausalLM contains 1.54B parameters (WorkerDict pid=2825157) wrap_policy: functools.partial(, policies=[functools.partial(, transformer_layer_cls={})]) (WorkerDict pid=2825157) NCCL version 2.27.5+cuda12.9 (WorkerDict pid=2825157) Ref use_remove_padding=True (WorkerDict pid=2825157) Ref use_fused_kernels=False (WorkerDict pid=2825157) Ref use_prefix_grouper=False (WorkerDict pid=2825159) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention [repeated 3x across cluster] (WorkerDict pid=2825159) Skipping monkey patch for Qwen2ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch [repeated 3x across cluster] (WorkerDict pid=2825157) Model config after override: Qwen2Config { (WorkerDict pid=2825157) "architectures": [ (WorkerDict pid=2825157) "Qwen2ForCausalLM" (WorkerDict pid=2825157) ], (WorkerDict pid=2825157) "attention_dropout": 0.0, (WorkerDict pid=2825157) "attn_implementation": "flash_attention_2", (WorkerDict pid=2825157) "dtype": "bfloat16", (WorkerDict pid=2825157) "eos_token_id": 151643, (WorkerDict pid=2825157) "hidden_act": "silu", (WorkerDict pid=2825157) "hidden_size": 1536, (WorkerDict pid=2825157) "initializer_range": 0.02, (WorkerDict pid=2825157) 
"intermediate_size": 8960, (WorkerDict pid=2825157) "layer_types": [ (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention", (WorkerDict pid=2825157) "full_attention" (WorkerDict pid=2825157) ], (WorkerDict pid=2825157) "max_position_embeddings": 16384, (WorkerDict pid=2825157) "max_window_layers": 21, (WorkerDict pid=2825157) "model_type": "qwen2", (WorkerDict pid=2825157) "num_attention_heads": 12, (WorkerDict pid=2825157) "num_hidden_layers": 28, (WorkerDict pid=2825157) "num_key_value_heads": 2, (WorkerDict pid=2825157) "pad_token_id": 151643, (WorkerDict pid=2825157) "rms_norm_eps": 1e-06, (WorkerDict pid=2825157) "rope_scaling": null, (WorkerDict pid=2825157) "rope_theta": 40000, (WorkerDict pid=2825157) "sliding_window": null, (WorkerDict pid=2825157) "tie_word_embeddings": true, (WorkerDict pid=2825157) "transformers_version": "4.57.6", (WorkerDict pid=2825157) "use_cache": 
true, (WorkerDict pid=2825157) "use_mrope": false, (WorkerDict pid=2825157) "use_sliding_window": false, (WorkerDict pid=2825157) "vocab_size": 151936 (WorkerDict pid=2825157) } (WorkerDict pid=2825157) (WorkerDict pid=2825158) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention (WorkerDict pid=2825158) Skipping monkey patch for Qwen2ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch (WorkerDict pid=2825160) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention (WorkerDict pid=2825160) Skipping monkey patch for Qwen2ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch (WorkerDict pid=2825157) Qwen2ForCausalLM contains 1.54B parameters (WorkerDict pid=2825157) wrap_policy: functools.partial(, policies=[functools.partial(, transformer_layer_cls={})]) (WorkerDict pid=2825157) Total steps: 800, num_warmup_steps: 0 (WorkerDict pid=2825157) Actor use_remove_padding=True (WorkerDict pid=2825157) Actor use_fused_kernels=False (WorkerDict pid=2825157) Actor use_prefix_grouper=False (WorkerDict pid=2825160) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 (WorkerDict pid=2825160) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 (WorkerDict pid=2825160) /storage/workspace/server-5/rl/jeremy/efficiency/verl/verl/workers/fsdp_workers.py:727: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . (WorkerDict pid=2825160) FSDP.set_state_dict_type( (WorkerDict pid=2825159) `torch_dtype` is deprecated! Use `dtype` instead! 
[repeated 3x across cluster] (WorkerDict pid=2825159) Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)` [repeated 7x across cluster] (TaskRunner pid=2823680) W0412 11:28:11.798000 2823680 site-packages/torch/utils/cpp_extension.py:118] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' (vLLMHttpServer pid=2827322) ['serve', (vLLMHttpServer pid=2827322) 'RoadQAQ/Qwen2.5-Math-1.5B-16k-think', (vLLMHttpServer pid=2827322) '--dtype', (vLLMHttpServer pid=2827322) 'bfloat16', (vLLMHttpServer pid=2827322) '--load_format', (vLLMHttpServer pid=2827322) 'dummy', (vLLMHttpServer pid=2827322) '--distributed_executor_backend', (vLLMHttpServer pid=2827322) 'uni', (vLLMHttpServer pid=2827322) '--worker_extension_cls', (vLLMHttpServer pid=2827322) 'verl.workers.rollout.vllm_rollout.utils.vLLMColocateWorkerExtension', (vLLMHttpServer pid=2827322) '--max_model_len', (vLLMHttpServer pid=2827322) '16384', (vLLMHttpServer pid=2827322) '--max_num_seqs', (vLLMHttpServer pid=2827322) '1024', (vLLMHttpServer pid=2827322) '--enable_chunked_prefill', (vLLMHttpServer pid=2827322) '--max_num_batched_tokens', (vLLMHttpServer pid=2827322) '8192', (vLLMHttpServer pid=2827322) '--enable_prefix_caching', (vLLMHttpServer pid=2827322) '--enable_sleep_mode', (vLLMHttpServer pid=2827322) '--logprobs_mode', (vLLMHttpServer pid=2827322) 'processed_logprobs', (vLLMHttpServer pid=2827322) '--gpu_memory_utilization', (vLLMHttpServer pid=2827322) '0.6', (vLLMHttpServer pid=2827322) '--disable_log_stats', (vLLMHttpServer pid=2827322) '--tensor_parallel_size', (vLLMHttpServer pid=2827322) '1', (vLLMHttpServer 
pid=2827322) '--seed', (vLLMHttpServer pid=2827322) '0', (vLLMHttpServer pid=2827322) '--override_generation_config', (vLLMHttpServer pid=2827322) '{"temperature": 1.0, "top_k": -1, "top_p": 1, "repetition_penalty": 1.0, ' (vLLMHttpServer pid=2827322) '"max_new_tokens": 8192}', (vLLMHttpServer pid=2827322) '--hf_overrides', (vLLMHttpServer pid=2827322) '{}', (vLLMHttpServer pid=2827322) '--scheduling_policy', (vLLMHttpServer pid=2827322) 'fcfs', (vLLMHttpServer pid=2827322) '--compilation_config', (vLLMHttpServer pid=2827322) '{"cudagraph_mode": "FULL_AND_PIECEWISE"}'] (WorkerDict pid=2825159) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention [repeated 2x across cluster] (WorkerDict pid=2825159) Skipping monkey patch for Qwen2ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch [repeated 2x across cluster] (WorkerDict pid=2825157) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 [repeated 6x across cluster] (vLLMHttpServer pid=2827322) WARNING 04-12 11:28:34 [system_utils.py:152] We must use the `spawn` multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reasons: In a Ray actor and can only be spawned; CUDA is initialized (vLLMHttpServer pid=2827320) (EngineCore_DP0 pid=2828264) :1184: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (vLLMHttpServer pid=2827320) (EngineCore_DP0 pid=2828264) :1184: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. 
(WorkerDict pid=2825159) /storage/workspace/server-5/rl/jeremy/efficiency/verl/verl/workers/fsdp_workers.py:727: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 3x across cluster] (WorkerDict pid=2825159) FSDP.set_state_dict_type( [repeated 3x across cluster] (vLLMHttpServer pid=2827321) (EngineCore_DP0 pid=2828243) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00:1184: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. [repeated 3x across cluster] (vLLMHttpServer pid=2827322) (EngineCore_DP0 pid=2828236) :1184: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. 
[repeated 3x across cluster] (vLLMHttpServer pid=2827321) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:04<00:00, 9.00it/s] Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:04<00:00, 11.71it/s] (vLLMHttpServer pid=2827321) (EngineCore_DP0 pid=2828243) Capturing CUDA graphs (decode, FULL): 0%| | 0/51 [00:00