nohup: ignoring input
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/syxin_old/Specforge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Loading checkpoint shards: 0%| | 0/5 [00:00", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 940, in
    main()
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 284, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 717, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 881, in _invoke_run
    time.sleep(monitor_interval)
  File "/workspace/hanrui/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 62051 got signal: 15