Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
  warnings.warn(
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
============================================================
DFlash Evaluation (Multi-GPU Data Parallel)
============================================================
  Target model:       /workspace/models/Qwen3-8B
  Draft model:        /workspace/models/Qwen3-8B-DFlash-b16
  Dataset:            math500
  Max samples:        2
  Max new tokens:     64
  Denoise steps:      2
  Temperature:        0.0
  GPUs:               1
  Dtype:              bfloat16
============================================================

[1/4] Loading tokenizer...
[2/4] Loading target model on 1 GPUs...
`torch_dtype` is deprecated! Use `dtype` instead!

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 151.62it/s]
[3/4] Loading draft model on 1 GPUs...
  Draft layers:       5
  Draft block_size:   16
  Draft mask_token:   151669
  Draft layer_ids:    [1, 9, 17, 25, 33]
[4/4] Loading evaluation data...
Using the latest cached version of the dataset since HuggingFaceH4/MATH-500 couldn't be found on the Hugging Face Hub (offline mode is enabled).
Found the latest cached dataset configuration 'default' at /workspace/hanrui/datasets/HuggingFaceH4___math-500/default/0.0.0/6e4ed1a2a79af7d8630a6b768ec859cb5af4d3be (last modified on Tue Mar 17 13:17:15 2026).
  Total prompts: 2, ~2 per GPU

============================================================
Running evaluation...
============================================================
  [GPU 0] Sample 1/2 | tokens=64 | tau=1.56 | time=2.2s | <think> Okay, so I need to convert the rectangular coordinates (0, 3) to polar c...
  [GPU 0] Sample 2/2 | tokens=64 | tau=1.76 | time=1.4s | <think> Okay, so I need to find a way to express the double sum $\sum_{j = 1}^\i...

============================================================
RESULTS SUMMARY
============================================================
  Denoise steps:              2
  GPUs used:                  1
  Samples evaluated:          2
  Total blocks:               78
  Total generated tokens:     128
  Total GPU-time:             3.58s
  Wall-clock time (approx):   2.15s
  ---
  Avg acceptance length (tau): 1.65
  Median acceptance length:    1.0
  Per-sample avg tau:          ['1.56', '1.76']
  Min per-sample tau:          1.56
  Max per-sample tau:          1.76
============================================================

Results saved to /workspace/hanrui/idea1/results/dflash_eval/math500_steps2.json