| INFO 10-26 08:02:52 [__init__.py:235] Automatically detected platform cuda. | |
| [2025-10-26 08:02:53,753] [INFO]: --- INIT SEEDS --- (pipeline.py:249) | |
| [2025-10-26 08:02:53,754] [INFO]: --- LOADING TASKS --- (pipeline.py:210) | |
| [2025-10-26 08:02:53,757] [WARNING]: Careful, the task gpqa:diamond is using evaluation data to build the few shot examples. (lighteval_task.py:269) | |
| [2025-10-26 08:03:00,271] [INFO]: --- LOADING MODEL --- (pipeline.py:177) | |
| `torch_dtype` is deprecated! Use `dtype` instead! | |
| [2025-10-26 08:03:06,633] [INFO]: Using max model len 32768 (config.py:1604) | |
| [2025-10-26 08:03:07,276] [INFO]: Chunked prefill is enabled with max_num_batched_tokens=2048. (config.py:2434) | |
| INFO 10-26 08:03:11 [__init__.py:235] Automatically detected platform cuda. | |
| INFO 10-26 08:03:13 [core.py:572] Waiting for init message from front-end. | |
| INFO 10-26 08:03:13 [core.py:71] Initializing a V1 LLM engine (v0.10.0) with config: model='/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562', speculative_config=None, tokenizer='/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562', skip_tokenizer_init=False, tokenizer_mode=auto, revision=main, override_neuron_config={}, tokenizer_revision=main, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1234, served_model_name=/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null} | |
| INFO 10-26 08:03:17 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 | |
| WARNING 10-26 08:03:17 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer. | |
| INFO 10-26 08:03:17 [gpu_model_runner.py:1843] Starting to load model /mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562... | |
| INFO 10-26 08:03:18 [gpu_model_runner.py:1875] Loading model from scratch... | |
| INFO 10-26 08:03:18 [cuda.py:290] Using Flash Attention backend on V1 engine. | |
| Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s] | |
| Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:30<00:30, 30.81s/it] | |
| Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:52<00:00, 25.24s/it] | |
| Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:52<00:00, 26.07s/it] | |
| INFO 10-26 08:04:11 [default_loader.py:262] Loading weights took 52.58 seconds | |
| INFO 10-26 08:04:11 [gpu_model_runner.py:1892] Model loading took 7.5552 GiB and 53.175289 seconds | |
| INFO 10-26 08:04:12 [gpu_worker.py:255] Available KV cache memory: 117.60 GiB | |
| INFO 10-26 08:04:12 [kv_cache_utils.py:833] GPU KV cache size: 856,336 tokens | |
| INFO 10-26 08:04:12 [kv_cache_utils.py:837] Maximum concurrency for 32,768 tokens per request: 26.13x | |
| INFO 10-26 08:04:13 [core.py:193] init engine (profile, create kv cache, warmup model) took 1.35 seconds | |
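The KV cache figures reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers printed in the log (117.60 GiB, 856,336 tokens, 32,768 tokens per request). The function names are illustrative and not part of vLLM, and reading "maximum concurrency" as a plain ratio is an assumption that happens to reproduce the logged value.

```python
GIB = 2**30  # bytes in one GiB


def max_concurrency(cache_tokens: int, tokens_per_request: int) -> float:
    """How many full-length requests fit in the KV cache at once (assumed: plain ratio)."""
    return cache_tokens / tokens_per_request


def kv_bytes_per_token(cache_gib: float, cache_tokens: int) -> float:
    """Average KV cache bytes consumed by one token, derived from the logged totals."""
    return cache_gib * GIB / cache_tokens


# Values copied from the log lines above.
concurrency = max_concurrency(856_336, 32_768)
per_token = kv_bytes_per_token(117.60, 856_336)

print(f"{concurrency:.2f}x")                     # 26.13x, matching the log
print(f"{per_token / 1024:.1f} KiB per token")   # 144.0 KiB per token
```

Because 117.60 GiB is itself a rounded figure, the per-token size is approximate.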
| [2025-10-26 08:04:13,641] [INFO]: [CACHING] Initializing data cache (cache_management.py:105) | |
| [2025-10-26 08:04:13,649] [INFO]: --- RUNNING MODEL --- (pipeline.py:330) | |
| [2025-10-26 08:04:13,650] [INFO]: Running SamplingMethod.GENERATIVE requests (pipeline.py:313) | |
| [2025-10-26 08:04:19,459] [INFO]: Cache: Starting to process 198/198 samples (not found in cache) for tasks lighteval|gpqa:diamond|0 (cd454966f6c36e03, GENERATIVE) (cache_management.py:399) | |
| [2025-10-26 08:04:19,461] [WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:206) | |
| Splits: 0%| | 0/1 [00:00<?, ?it/s] | |
| [2025-10-26 08:04:19,562] [WARNING]: context_size + max_new_tokens=35595 which is greater than self.max_length=32768. Truncating context to 0 tokens. (vllm_model.py:367) | |
| Adding requests: 100%|██████████| 198/198 [00:00<00:00, 14780.77it/s] | |
| Processed prompts: 100%|██████████| 198/198 [07:49<00:00, 2.37s/it, est. speed input: 115.14 toks/s, output: 1447.26 toks/s] | |
| Splits: 100%|██████████| 1/1 [07:49<00:00, 469.64s/it] | |
| Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 18.94ba/s] | |
| [2025-10-26 08:12:12,128] [INFO]: Cached 198 samples of lighteval|gpqa:diamond|0 (cd454966f6c36e03, GENERATIVE) at /mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562/0619260e1176b049/lighteval|gpqa:diamond|0/cd454966f6c36e03/GENERATIVE.parquet. (cache_management.py:345) | |
| Generating train split: 198 examples [00:00, 5401.52 examples/s] | |
| [rank0]:[W1026 08:12:16.799573779 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
| [2025-10-26 08:12:17,647] [INFO]: --- POST-PROCESSING MODEL RESPONSES --- (pipeline.py:344) | |
| [2025-10-26 08:12:17,650] [INFO]: --- COMPUTING METRICS --- (pipeline.py:371) | |
| [2025-10-26 08:12:17,651] [WARNING]: n undefined in the pass@k. We assume it's the same as the sample's number of predictions. (metrics_sample.py:1302) | |
| [2025-10-26 08:12:17,807] [INFO]: --- DISPLAYING RESULTS --- (pipeline.py:432) | |
| [2025-10-26 08:12:17,817] [INFO]: --- SAVING AND PUSHING RESULTS --- (pipeline.py:422) | |
| [2025-10-26 08:12:17,818] [INFO]: Saving experiment tracker (evaluation_tracker.py:246) | |
| [2025-10-26 08:12:18,285] [INFO]: Saving results to /mnt/public/wucanhui/lighteval/results/results/mnt/public/wucanhui/outputs/Qwen3-4B-math-reasoning/checkpoint-2562/results_2025-10-26T08-12-17.819223.json (evaluation_tracker.py:310) | |
| | Task |Version| Metric |Value | |Stderr| | |
| |------------------------|-------|------------------|-----:|---|-----:| | |
| |all | |gpqa_pass@k_with_k|0.3889|± |0.0347| | |
| |lighteval:gpqa:diamond:0| |gpqa_pass@k_with_k|0.3889|± |0.0347| | |
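As a sanity check on the table above, the reported standard error is consistent with the standard error of the mean over 198 binary per-sample scores. The sketch below assumes 77 correct samples (the only integer count out of 198 that rounds to 0.3889) and a sample variance with n - 1 in the denominator. Both are inferences from the printed values rather than something the log states, and `mean_and_stderr` is a hypothetical helper, not a lighteval function.

```python
import math


def mean_and_stderr(scores: list[float]) -> tuple[float, float]:
    """Return the mean and the standard error of the mean (sample variance, ddof=1)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(variance / n)


# Assumed breakdown: 77 correct and 121 incorrect answers out of 198 samples.
scores = [1.0] * 77 + [0.0] * 121
mean, stderr = mean_and_stderr(scores)

print(f"{mean:.4f} ± {stderr:.4f}")  # 0.3889 ± 0.0347
```

Dividing the variance by n instead of n - 1 would give 0.0346, so the logged 0.0347 points to the n - 1 form under these assumptions.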