nohup: ignoring input
============================================
Running DFlash eval: denoise_steps=1
GPUs: 8, Samples: 500
============================================
W0405 12:46:30.755000 12948 site-packages/torch/distributed/run.py:803]
W0405 12:46:30.755000 12948 site-packages/torch/distributed/run.py:803] *****************************************
W0405 12:46:30.755000 12948 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0405 12:46:30.755000 12948 site-packages/torch/distributed/run.py:803] *****************************************
Set TORCH_CUDA_ARCH_LIST to 9.0    [repeated once per rank, 8 ranks]
/workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(    [repeated once per rank, 8 ranks]
:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
    [both FutureWarnings repeated once per rank, 8 ranks]
============================================================
DFlash Evaluation (Multi-GPU Data Parallel)
============================================================
Target model: /workspace/models/Qwen3-8B
Draft model: /workspace/models/Qwen3-8B-DFlash-b16
Dataset: math500
Max samples: 500
Max new tokens: 512
Denoise steps: 1
Temperature: 0.0
GPUs: 8
Dtype: bfloat16
============================================================
[1/4] Loading tokenizer...
`torch_dtype` is deprecated! Use `dtype` instead!    [x4]
[2/4] Loading target model on 8 GPUs...
`torch_dtype` is deprecated! Use `dtype` instead!    [x4]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/hanrui/idea1/scripts/eval_dflash.py", line 372, in <module>
    main()
  File "/workspace/hanrui/idea1/scripts/eval_dflash.py", line 254, in main
    all_prompts = load_eval_data(args)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/hanrui/idea1/scripts/eval_dflash.py", line 76, in load_eval_data
    dataset = load_dataset("HuggingFaceH4/MATH-500")["test"]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 1485, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 1130, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 1029, in dataset_module_factory
    raise e1 from None
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 971, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'HuggingFaceH4/MATH-500' on the Hub (SSLError)
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /datasets/HuggingFaceH4/MATH-500/resolve/main/README.md (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1016)')))"), '(Request ID: 79a748b0-c5ea-42c9-8374-f5e0cc42cd6e)')' thrown while requesting HEAD https://huggingface.co/datasets/HuggingFaceH4/MATH-500/resolve/main/README.md
WARNING:huggingface_hub.utils._http:'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /datasets/HuggingFaceH4/MATH-500/resolve/main/README.md (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1016)')))"), '(Request ID: 79a748b0-c5ea-42c9-8374-f5e0cc42cd6e)')' thrown while requesting HEAD https://huggingface.co/datasets/HuggingFaceH4/MATH-500/resolve/main/README.md
    [the identical ConnectionError traceback and MaxRetryError warning pair repeat for each of the 8 ranks, differing only in Request ID]
W0405 12:48:37.276000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13068 closing signal SIGTERM
W0405 12:48:37.277000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13069 closing signal SIGTERM
W0405 12:48:37.277000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13070 closing signal SIGTERM
W0405 12:48:37.277000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13071 closing signal SIGTERM
W0405 12:48:37.277000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13072 closing signal SIGTERM
W0405 12:48:37.278000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13073 closing signal SIGTERM
W0405 12:48:37.278000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13075 closing signal SIGTERM
E0405 12:48:38.043000 12948 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 6 (pid: 13074) of binary: /workspace/miniconda3/envs/specforge/bin/python3.11
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/bin/torchrun", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/workspace/hanrui/idea1/scripts/eval_dflash.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2026-04-05_12:48:37 host : job-006ce80a7c47-20260302193512-7694985998-5ng4c rank : 6 (local_rank: 6) exitcode : 1 (pid: 13074) error_file: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html ============================================================ ============================================ Running DFlash eval: denoise_steps=2 GPUs: 8, Samples: 500 ============================================ W0405 12:48:40.372000 13135 site-packages/torch/distributed/run.py:803] W0405 12:48:40.372000 13135 site-packages/torch/distributed/run.py:803] ***************************************** W0405 12:48:40.372000 13135 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0405 12:48:40.372000 13135 site-packages/torch/distributed/run.py:803] ***************************************** Set TORCH_CUDA_ARCH_LIST to 9.0 /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( Set TORCH_CUDA_ARCH_LIST to 9.0 /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( Set TORCH_CUDA_ARCH_LIST to 9.0 Set TORCH_CUDA_ARCH_LIST to 9.0 Set TORCH_CUDA_ARCH_LIST to 9.0 Set TORCH_CUDA_ARCH_LIST to 9.0 Set TORCH_CUDA_ARCH_LIST to 9.0 /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. 
Please install flash_attn if you want to use the flash attention backend. warnings.warn( /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( Set TORCH_CUDA_ARCH_LIST to 9.0 /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. warnings.warn( :1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. :1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. `torch_dtype` is deprecated! Use `dtype` instead! 
:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. :1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. Loading checkpoint shards: 0%| | 0/5 [00:00:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. :1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. :1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. :1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. :1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. 
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/hanrui/idea1/scripts/eval_dflash.py", line …, in <module>
    main()
  File "/workspace/hanrui/idea1/scripts/eval_dflash.py", line 254, in main
    all_prompts = load_eval_data(args)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/hanrui/idea1/scripts/eval_dflash.py", line 76, in load_eval_data
    dataset = load_dataset("HuggingFaceH4/MATH-500")["test"]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 1485, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 1130, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 1029, in dataset_module_factory
    raise e1 from None
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/datasets/load.py", line 971, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'HuggingFaceH4/MATH-500' on the Hub (SSLError)
W0405 12:50:14.046000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13205 closing signal SIGTERM
W0405 12:50:14.046000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13206 closing signal SIGTERM
W0405 12:50:14.047000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13207 closing signal SIGTERM
W0405 12:50:14.047000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13208 closing signal SIGTERM
W0405 12:50:14.047000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13209 closing signal SIGTERM
W0405 12:50:14.048000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13210 closing signal SIGTERM
W0405 12:50:14.048000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 13211 closing signal SIGTERM
E0405 12:50:15.214000 13135 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 7 (pid: 13212) of binary: /workspace/miniconda3/envs/specforge/bin/python3.11
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/bin/torchrun", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/workspace/hanrui/idea1/scripts/eval_dflash.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2026-04-05_12:50:14
  host      : job-006ce80a7c47-20260302193512-7694985998-5ng4c
  rank      : 7 (local_rank: 7)
  exitcode  : 1 (pid: 13212)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

============================================
Running DFlash eval: denoise_steps=3
GPUs: 8, Samples: 500
============================================
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.22it/s]
============================================================
DFlash Evaluation (Multi-GPU Data Parallel)
============================================================
Target model: /workspace/models/Qwen3-8B
Draft model: /workspace/models/Qwen3-8B-DFlash-b16
Dataset: math500
Max samples: 500
Max new tokens: 512
Denoise steps: 3
Temperature: 0.0
GPUs: 8
Dtype: bfloat16
============================================================
[1/4] Loading tokenizer...
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/bin/torchrun", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 284, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 717, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 881, in _invoke_run
    time.sleep(monitor_interval)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 13271 got signal: 15

============================================
All evaluations complete!
Results in: /workspace/hanrui/idea1/results/dflash_eval/
============================================

Quick comparison:
steps=  avg_tau=N/A