INFO 10-26 08:02:52 [__init__.py:235] Automatically detected platform cuda.
[2025-10-26 08:02:53,805] [ INFO]: --- INIT SEEDS --- (pipeline.py:249)
[2025-10-26 08:02:53,806] [ INFO]: --- LOADING TASKS --- (pipeline.py:210)
[2025-10-26 08:02:53,808] [ INFO]: Found 1 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/ifeval/main.py (registry.py:260)
[2025-10-26 08:02:53,809] [ INFO]: Found 2 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/ifbench/main.py (registry.py:260)
[2025-10-26 08:02:53,810] [ INFO]: Found 6 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/tiny_benchmarks/main.py (registry.py:260)
[2025-10-26 08:02:53,811] [ INFO]: Found 1 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/mt_bench/main.py (registry.py:260)
[2025-10-26 08:02:53,812] [ INFO]: Found 4 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/mix_eval/main.py (registry.py:260)
[2025-10-26 08:02:53,813] [ INFO]: Found 5 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/olympiade_bench/main.py (registry.py:260)
[2025-10-26 08:02:53,814] [ INFO]: Found 1 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/hle/main.py (registry.py:260)
[2025-10-26 08:02:53,815] [ INFO]: Found 23 custom tasks in /mnt/public/wucanhui/lighteval/src/lighteval/tasks/extended/lcb/main.py (registry.py:260)
[2025-10-26 08:02:53,817] [ WARNING]: Careful, the task lcb:codegeneration_v6 is using evaluation data to build the few shot examples. (lighteval_task.py:269)
[2025-10-26 08:03:00,696] [ INFO]: --- LOADING MODEL --- (pipeline.py:177)
`torch_dtype` is deprecated! Use `dtype` instead!
[2025-10-26 08:03:06,813] [ INFO]: Using max model len 32768 (config.py:1604)
[2025-10-26 08:03:07,320] [ INFO]: Chunked prefill is enabled with max_num_batched_tokens=2048. (config.py:2434)
INFO 10-26 08:03:11 [__init__.py:235] Automatically detected platform cuda.
INFO 10-26 08:03:13 [core.py:572] Waiting for init message from front-end.
INFO 10-26 08:03:13 [core.py:71] Initializing a V1 LLM engine (v0.10.0) with config: model='/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562', speculative_config=None, tokenizer='/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562', skip_tokenizer_init=False, tokenizer_mode=auto, revision=main, override_neuron_config={}, tokenizer_revision=main, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1234, served_model_name=/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
INFO 10-26 08:03:17 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 10-26 08:03:17 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 10-26 08:03:17 [gpu_model_runner.py:1843] Starting to load model /mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562...
INFO 10-26 08:03:17 [gpu_model_runner.py:1875] Loading model from scratch...
INFO 10-26 08:03:17 [cuda.py:290] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00
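Note on the "`torch_dtype` is deprecated! Use `dtype` instead!" warning above: recent transformers releases renamed the `torch_dtype` argument of `from_pretrained` to `dtype`. If this log comes from code that still passes `torch_dtype`, a minimal shim can rename the kwarg before forwarding it (the helper name `migrate_dtype_kwarg` is hypothetical, not part of any library):

```python
def migrate_dtype_kwarg(kwargs: dict) -> dict:
    """Return a copy of kwargs with the deprecated `torch_dtype`
    key renamed to `dtype`, keeping an explicit `dtype` if both
    are present (the newer key wins)."""
    out = dict(kwargs)
    if "torch_dtype" in out:
        value = out.pop("torch_dtype")
        out.setdefault("dtype", value)  # only set if `dtype` absent
    return out

# Example: kwargs destined for AutoModelForCausalLM.from_pretrained(...)
cleaned = migrate_dtype_kwarg({"torch_dtype": "bfloat16"})
# cleaned == {"dtype": "bfloat16"}
```

Passing the cleaned kwargs silences the deprecation warning without changing the loaded dtype.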