212a146

nohup: ignoring input
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] 
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] *****************************************
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0405 10:25:31.190000 1866 site-packages/torch/distributed/run.py:803] *****************************************
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
/workspace/hanrui/SpecForge/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
INFO:specforge.utils:rank 7: bind to device 7
INFO:specforge.utils:rank 7: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 7: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 5: bind to device 5
INFO:specforge.utils:rank 5: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 5: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.85it/s]

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 146.67it/s]
INFO:specforge.utils:rank 2: bind to device 2
INFO:specforge.utils:rank 2: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 2: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.95it/s]
INFO:specforge.utils:rank 6: bind to device 6
INFO:specforge.utils:rank 6: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 6: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
INFO:specforge.utils:rank 4: bind to device 4
INFO:specforge.utils:rank 4: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 0: bind to device 0
INFO:specforge.utils:rank 4: Initialized distributed
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
`torch_dtype` is deprecated! Use `dtype` instead!
INFO:specforge.utils:rank 0: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 0: Initialized distributed
INFO:specforge.utils:Loading target model from /workspace/models/Qwen3-8B using hf backend
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 1: bind to device 1
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 1: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 1: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
INFO:specforge.utils:rank 3: bind to device 3
INFO:specforge.utils:rank 3: device mesh: DeviceMesh((dp=8, tp=1), device: 'cuda', stride: (1, 1))
INFO:specforge.utils:rank 3: Initialized distributed
`torch_dtype` is deprecated! Use `dtype` instead!
The following generation flags are not valid and may be ignored: ['output_hidden_states']. Set `TRANSFORMERS_VERBOSITY=info` for more details.

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 147.05it/s]

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.71it/s]

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 147.19it/s]

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 147.49it/s]

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 144.23it/s]
INFO:specforge.utils:Loaded draft config from /workspace/hanrui/SpecForge/configs/qwen3-8b-dflash.json
INFO:specforge.utils:Using attention backend: flex_attention
INFO:specforge.utils:Draft config: block_size=16, num_hidden_layers=5, num_target_layers=36
INFO:specforge.utils:Draft model parameters: 1,048,626,432
INFO:specforge.utils:Using mask_token_id: 151669
INFO:specforge.utils:dflash_config: {'mask_token_id': 151669, 'target_layer_ids': [1, 9, 17, 25, 33]}
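The `target_layer_ids` logged above, [1, 9, 17, 25, 33], are five taps into the 36-layer target. Their count matches `num_hidden_layers=5`, and they are evenly spaced from layer 1 to layer `num_target_layers - 3` = 33. The sketch below reproduces them under that spacing rule. The function name `evenly_spaced_layer_ids` and the start/end offsets are hypothetical: they are inferred from the logged values, not taken from SpecForge's source.

```python
def evenly_spaced_layer_ids(num_target_layers: int, num_draft_layers: int) -> list[int]:
    # Hypothetical rule inferred from the log: first tap at layer 1,
    # last tap at num_target_layers - 3, the rest evenly spaced between.
    start = 1
    end = num_target_layers - 3
    if num_draft_layers == 1:
        return [(start + end) // 2]
    span = end - start
    return [
        round(start + i * span / (num_draft_layers - 1))
        for i in range(num_draft_layers)
    ]

print(evenly_spaced_layer_ids(36, 5))  # [1, 9, 17, 25, 33]
```

With 36 target layers and 5 taps the stride is (33 - 1) / 4 = 8 layers. The rule matches this one logged configuration. Other model sizes could use a different rule.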

Generating train split: 78809 examples [00:05, 13690.41 examples/s]
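The log reports 78,809 training examples. It also records the launch flags in the command at the end (`--batch-size 4`, `--num-epochs 6`, `--warmup-ratio 0.04`, `--save-interval 1000`) and a data-parallel mesh of `dp=8`. Together these give a rough step budget. This is a sketch under stated assumptions, not SpecForge's actual scheduler:

- `--batch-size` is per rank.
- There is no gradient accumulation.
- The last partial batch is kept.
- No examples are dropped by `--max-length 3072` filtering.

```python
import math

def step_budget(num_examples: int, per_rank_batch: int, world_size: int,
                num_epochs: int, warmup_ratio: float, save_interval: int) -> dict:
    # Assumption: --batch-size is per rank, with no gradient accumulation.
    global_batch = per_rank_batch * world_size
    # Assumption: partial last batch kept, nothing filtered by --max-length.
    steps_per_epoch = math.ceil(num_examples / global_batch)
    total_steps = steps_per_epoch * num_epochs
    # Assumption: warmup steps = floor(total_steps * warmup_ratio).
    warmup_steps = int(total_steps * warmup_ratio)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "checkpoints": total_steps // save_interval,
    }

budget = step_budget(78809, per_rank_batch=4, world_size=8, num_epochs=6,
                     warmup_ratio=0.04, save_interval=1000)
print(budget)
# {'global_batch': 32, 'steps_per_epoch': 2463, 'total_steps': 14778, 'warmup_steps': 591, 'checkpoints': 14}
```

The launch command also passes `--num-anchors 512` twice. If the script parses flags with argparse's default `store` action, the last occurrence wins. Both values here are 512, so the duplicate has no effect.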
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
dataset is cached at ./cache/processed_dataset/1ca66c4ec30f16c9add30cc4fc5f1b5a.pkl
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
Map (num_proc=32):   0%|          | 0/78809 [00:00<?, ? examples/s]
W0405 10:41:54.414000 1866 site-packages/torch/distributed/elastic/agent/server/api.py:725] Received 15 death signal, shutting down workers
W0405 10:41:54.415000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1952 closing signal SIGTERM
W0405 10:41:54.613000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1953 closing signal SIGTERM
W0405 10:41:54.613000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1954 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1955 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1956 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1957 closing signal SIGTERM
W0405 10:41:54.614000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1958 closing signal SIGTERM
W0405 10:41:54.615000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1959 closing signal SIGTERM
W0405 10:41:58.415000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1956 closing signal SIGTERM
W0405 10:41:58.416000 1866 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 1957 closing signal SIGTERM
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 717, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 881, in _invoke_run
    time.sleep(monitor_interval)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1866 got signal: 15

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/miniconda3/envs/specforge/bin/torchrun", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 284, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 726, in run
    self._shutdown(e.sigval)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 369, in _shutdown
    self._pcontext.close(death_sig)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 578, in close
    self._close(death_sig=death_sig, timeout=timeout)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 920, in _close
    handler.proc.wait(time_to_wait)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/subprocess.py", line 2047, in _wait
    time.sleep(delay)
  File "/workspace/miniconda3/envs/specforge/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 85, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1866 got signal: 15
terminate called without an active exception
Fatal Python error: Aborted

Thread 0x00007f5a9a0a0740 (most recent call first):
  <no Python frame>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 13)
examples/run_qwen3_8b_dflash_hf.sh: line 47:  1866 Aborted                 (core dumped) torchrun --standalone --nproc_per_node $NUM_GPUS $ROOT_DIR/scripts/train_dflash.py --target-model-path /workspace/models/Qwen3-8B --draft-config-path $ROOT_DIR/configs/qwen3-8b-dflash.json --train-data-path /workspace/hanrui/qwen3-8b_dflash_regen/sharegpt_train_regenerated.jsonl --output-dir $ROOT_DIR/outputs/qwen3-8b-dflash-hf --num-epochs 6 --batch-size 4 --learning-rate 6e-4 --warmup-ratio 0.04 --max-grad-norm 1.0 --max-length 3072 --chat-template qwen --attention-backend $ATTENTION_BACKEND --num-anchors 512 --loss-decay-gamma 7.0 --log-interval 50 --save-interval 1000 --report-to none --target-model-backend hf --block-size 16 --num-anchors 512
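The shutdown sequence above reads as an external stop rather than a training error. The sequence is:

1. The torchrun agent (PID 1866) received signal 15 (SIGTERM) at 10:41:54. That is about 16 minutes after its first log line at 10:25:31.
2. It forwarded SIGTERM to the eight workers (PIDs 1952 to 1959), whose last recorded `Map (num_proc=32)` progress was 0/78809.
3. Four seconds later it re-sent SIGTERM to PIDs 1956 and 1957.
4. It then died with "terminate called without an active exception", which the shell reports as "Aborted (core dumped)", i.e. SIGABRT.

The signal most likely came from outside the job, for example a manual `kill`, a scheduler, or a container runtime. Note that `nohup` only makes a command ignore SIGHUP and does not shield it from SIGTERM. The small sketch below decodes the signal numbers and the elapsed time from the log's own timestamps. It uses the standard library only and assumes Linux signal numbering.

```python
import signal
from datetime import datetime

def describe_shutdown(start: str, stop: str, received: int, final: int) -> dict:
    # Timestamps are the HH:MM:SS.ffffff part of torchrun's W0405 log prefix.
    fmt = "%H:%M:%S.%f"
    elapsed = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return {
        "received": signal.Signals(received).name,
        "final": signal.Signals(final).name,
        "elapsed_s": elapsed.total_seconds(),
        # A shell reports death-by-signal N as exit status 128 + N.
        "shell_exit_status": 128 + int(final),
    }

info = describe_shutdown("10:25:31.190000", "10:41:54.414000",
                         received=signal.SIGTERM, final=signal.SIGABRT)
print(info)
```

On Linux this yields `SIGTERM`, `SIGABRT`, roughly 983.2 seconds, and an exit status of 134.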