INFO 10-26 08:02:52 [__init__.py:235] Automatically detected platform cuda.
[2025-10-26 08:02:53,703] [ INFO]: --- INIT SEEDS --- (pipeline.py:249)
[2025-10-26 08:02:53,704] [ INFO]: --- LOADING TASKS --- (pipeline.py:210)
[2025-10-26 08:02:53,707] [ WARNING]: Careful, the task math_500 is using evaluation data to build the few shot examples. (lighteval_task.py:269)
[2025-10-26 08:02:58,277] [ INFO]: --- LOADING MODEL --- (pipeline.py:177)
`torch_dtype` is deprecated! Use `dtype` instead!
[2025-10-26 08:03:06,028] [ INFO]: Using max model len 32768 (config.py:1604)
[2025-10-26 08:03:06,860] [ INFO]: Chunked prefill is enabled with max_num_batched_tokens=2048. (config.py:2434)
INFO 10-26 08:03:11 [__init__.py:235] Automatically detected platform cuda.
INFO 10-26 08:03:13 [core.py:572] Waiting for init message from front-end.
INFO 10-26 08:03:13 [core.py:71] Initializing a V1 LLM engine (v0.10.0) with config: model='/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562', speculative_config=None, tokenizer='/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562', skip_tokenizer_init=False, tokenizer_mode=auto, revision=main, override_neuron_config={}, tokenizer_revision=main, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1234, served_model_name=/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
INFO 10-26 08:03:17 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 10-26 08:03:17 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 10-26 08:03:17 [gpu_model_runner.py:1843] Starting to load model /mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562...
INFO 10-26 08:03:18 [gpu_model_runner.py:1875] Loading model from scratch...
INFO 10-26 08:03:18 [cuda.py:290] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00
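For reference, the engine settings reported in the log above correspond to arguments of vLLM's offline `LLM` constructor. The sketch below is an assumption-laden illustration, not the command that produced this log: lighteval's vllm backend builds the engine internally from a model-args string, so you would not normally write this by hand. The checkpoint path is the one from the log; every value is taken from the engine config printed above.

```python
from vllm import LLM

# Minimal sketch (not the lighteval invocation itself): the engine
# configuration from the log, expressed as vLLM LLM constructor args.
llm = LLM(
    model="/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562",
    dtype="bfloat16",              # dtype=torch.bfloat16 in the engine config
    max_model_len=32768,           # "Using max model len 32768"
    tensor_parallel_size=1,        # rank 0 in world size 1, TP rank 0
    enforce_eager=True,            # enforce_eager=True (no CUDA graphs)
    enable_prefix_caching=True,    # enable_prefix_caching=True
    enable_chunked_prefill=True,   # "Chunked prefill is enabled"
    max_num_batched_tokens=2048,   # max_num_batched_tokens=2048
    seed=1234,                     # seed=1234
)
```

Note that `enforce_eager=True` trades some throughput for faster startup and lower memory overhead by skipping CUDA graph capture, which matches the `compilation_config` in the log showing compilation level 0.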