nohup: ignoring input
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803]
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] *****************************************
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] *****************************************
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
INFO:specforge.utils:rank 7: bind to device 7
INFO:specforge.utils:rank 7: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 7: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 5: bind to device 5
INFO:specforge.utils:rank 5: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 5: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.85it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 146.67it/s]
INFO:specforge.utils:rank 2: bind to device 2
INFO:specforge.utils:rank 2: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 2: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.95it/s]
INFO:specforge.utils:rank 6: bind to device 6
INFO:specforge.utils:rank 6: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 6: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
INFO:specforge.utils:rank 4: bind to device 4
INFO:specforge.utils:rank 4: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 0: bind to device 0
INFO:specforge.utils:rank 4: Initialized distributed
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
`torch_dtype` is deprecated! Use `dtype` instead!
INFO:specforge.utils:rank 0: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 0: Initialized distributed
INFO:specforge.utils:Loading target model from /workspace/models/Qwen3-8B using hf backend
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 1: bind to device 1
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 1: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 1: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 3: bind to device 3
INFO:specforge.utils:rank 3: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 3: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 147.05it/s]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.71it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 147.19it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 147.49it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.23it/s]
INFO:specforge.utils:Loaded draft config from /workspace/hanrui/SpecForge/configs/qwen3-8b-dflash.json
INFO:specforge.utils:Using attention backend: flex_attention
INFO:specforge.utils:Draft config: block_size=16, num_hidden_layers=5, num_target_layers=36
INFO:specforge.utils:Draft model parameters: 1,048,626,432
INFO:specforge.utils:Using mask_token_id: 151669
INFO:specforge.utils:dflash_config: {'mask_token_id': 151669, 'target_layer_ids': [1, 9, 17, 25, 33]}
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 1837 examples [00:00, 11687.06 examples/s]
Generating train split: 3552 examples [00:00, 12905.26 examples/s]
Generating train split: 5305 examples [00:00, 13685.64 examples/s]
Generating train split: 7092 examples [00:00, 14087.24 examples/s]
Generating train split: 8810 examples [00:00, 13875.88 examples/s]
Generating train split: 10577 examples [00:00, 14070.80 examples/s]
Generating train split: 12339 examples [00:00, 14338.59 examples/s]
Generating train split: 14119 examples [00:01, 14408.97 examples/s]
Generating train split: 15875 examples [00:01, 13772.55 examples/s]
Generating train split: 18146 examples [00:01, 13631.08 examples/s]
Generating train split: 19821 examples [00:01, 13593.93 examples/s]
Generating train split: 21639 examples [00:01, 14037.45 examples/s]
Generating train split: 23383 examples [00:01, 14050.69 examples/s]
Generating train split: 25099 examples [00:01, 14084.22 examples/s]
Generating train split: 26883 examples [00:01, 14187.82 examples/s]
Generating train split: 28585 examples [00:02, 13753.75 examples/s]
Generating train split: 30239 examples [00:02, 13572.18 examples/s]
Generating train split: 31983 examples [00:02, 13849.88 examples/s]
Generating train split: 33781 examples [00:02, 14154.00 examples/s]
Generating train split: 35574 examples [00:02, 14123.50 examples/s]
Generating train split: 37211 examples [00:02, 13928.80 examples/s]
Generating train split: 38849 examples [00:02, 13744.18 examples/s]
Generating train split: 40492 examples [00:02, 13641.21 examples/s]
Generating train split: 42163 examples [00:03, 13830.61 examples/s]
Generating train split: 43858 examples [00:03, 13117.61 examples/s]
Generating train split: 45529 examples [00:03, 13362.01 examples/s]
Generating train split: 47168 examples [00:03, 13406.32 examples/s]
Generating train split: 48845 examples [00:03, 13647.43 examples/s]
Generating train split: 50514 examples [00:03, 13685.47 examples/s]
Generating train split: 52177 examples [00:03, 13816.16 examples/s]
Generating train split: 53848 examples [00:03, 13338.72 examples/s]
Generating train split: 55490 examples [00:04, 13486.62 examples/s]
Generating train split: 57140 examples [00:04, 13073.50 examples/s]
Generating train split: 58765 examples [00:04, 13223.92 examples/s]
Generating train split: 60428 examples [00:04, 13284.92 examples/s]
Generating train split: 62103 examples [00:04, 13510.17 examples/s]
Generating train split: 63757 examples [00:04, 13534.27 examples/s]
Generating train split: 65373 examples [00:04, 13635.48 examples/s]
Generating train split: 67054 examples [00:04, 13778.71 examples/s]
Generating train split: 68728 examples [00:05, 13958.56 examples/s]
Generating train split: 70334 examples [00:05, 13449.56 examples/s]
Generating train split: 71933 examples [00:05, 13524.01 examples/s]
Generating train split: 73524 examples [00:05, 13487.56 examples/s]
Generating train split: 75146 examples [00:05, 13619.17 examples/s]
Generating train split: 76875 examples [00:05, 13802.52 examples/s]
Generating train split: 78532 examples [00:05, 13914.99 examples/s]
Generating train split: 78809 examples [00:05, 13690.41 examples/s]
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32): 0%| | 0/78809 [00:00<?, ? examples/s]
W0405 10:41:54.414000 1866 site-packages/torch/distributed/elastic/agent/server/api.py:725] Received 15 death signal, shutting down workers
W0405 10:41:54.415000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1952 closing signal SIGTERM
W0405 10:41:54.613000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1953 closing signal SIGTERM
W0405 10:41:54.613000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1954 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1955 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1956 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1957 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1958 closing signal SIGTERM
W0405 10:41:54.615000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1959 closing signal SIGTERM
W0405 10:41:58.415000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1956 closing signal SIGTERM
W0405 10:41:58.416000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1957 closing signal SIGTERM
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 717, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 881, in _invoke_run
    time.sleep(monitor_interval)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1866 got signal: 15
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/bin/torchrun", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 284, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 726, in run
    self._shutdown(e.sigval)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 369, in _shutdown
    self._pcontext.close(death_sig)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 578, in close
    self._close(death_sig=death_sig, timeout=timeout)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 920, in _close
    handler.proc.wait(time_to_wait)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/subprocess.py", line 2047, in _wait
    time.sleep(delay)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1866 got signal: 15
terminate called without an active exception
Fatal Python error: Aborted
Thread 0x00007f5a9a0a0740 (most recent call first):
<no Python frame>
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 13)
examples/run_qwen3_8b_dflash_hf.sh: line 47: 1866 Aborted (core dumped) torchrun --standalone --nproc_per_node $NUM_GPUS $ROOT_DIR/scripts/train_dflash.py --target-model-path /workspace/models/Qwen3-8B --draft-config-path $ROOT_DIR/configs/qwen3-8b-dflash.json --train-data-path /workspace/hanrui/qwen3-8b_dflash_regen/sharegpt_train_regenerated.jsonl --output-dir $ROOT_DIR/outputs/qwen3-8b-dflash-hf --num-epochs 6 --batch-size 4 --learning-rate 6e-4 --warmup-ratio 0.04 --max-grad-norm 1.0 --max-length 3072 --chat-template qwen --attention-backend $ATTENTION_BACKEND --num-anchors 512 --loss-decay-gamma 7.0 --log-interval 50 --save-interval 1000 --report-to none --target-model-backend hf --block-size 16 --num-anchors 512