nohup: ignoring input
============================================
Running DFlash eval: denoise_steps=1
GPUs: 8, Samples: 500
============================================
W0405 13:06:29.225000 14266 site-packages/torch/distributed/run.py:803]
W0405 13:06:29.225000 14266 site-packages/torch/distributed/run.py:803] *****************************************
W0405 13:06:29.225000 14266 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0405 13:06:29.225000 14266 site-packages/torch/distributed/run.py:803] *****************************************
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards:   0%|          | 0/5 [00:00
    sys.exit(main())
             ^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 284, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 717, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 881, in _invoke_run
    time.sleep(monitor_interval)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 14266 got signal: 15
============================================
Running DFlash eval: denoise_steps=2
GPUs: 8, Samples: 500
============================================
============================================
Running DFlash eval: denoise_steps=3
GPUs: 8, Samples: 500
============================================
W0405 13:07:18.843000 14859 site-packages/torch/distributed/run.py:803]
W0405 13:07:18.843000 14859 site-packages/torch/distributed/run.py:803] *****************************************
W0405 13:07:18.843000 14859 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0405 13:07:18.843000 14859 site-packages/torch/distributed/run.py:803] *****************************************
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards:   0%|          | 0/5 [00:00