| Set TORCH_CUDA_ARCH_LIST to 9.0 |
| /workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend. |
| warnings.warn( |
| <frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. |
| <frozen importlib._bootstrap_external>:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. |
| ============================================================ |
| DFlash Evaluation (Multi-GPU Data Parallel) |
| ============================================================ |
| Target model: /workspace/models/Qwen3-8B |
| Draft model: /workspace/models/Qwen3-8B-DFlash-b16 |
| Dataset: math500 |
| Max samples: 2 |
| Max new tokens: 64 |
| Denoise steps: 2 |
| Temperature: 0.0 |
| GPUs: 1 |
| Dtype: bfloat16 |
| ============================================================ |
|
|
| [1/4] Loading tokenizer... |
| [2/4] Loading target model on 1 GPUs... |
| `torch_dtype` is deprecated! Use `dtype` instead! |
|
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 151.62it/s] |
| [3/4] Loading draft model on 1 GPUs... |
| Draft layers: 5 |
| Draft block_size: 16 |
| Draft mask_token: 151669 |
| Draft layer_ids: [1, 9, 17, 25, 33] |
| [4/4] Loading evaluation data... |
| Using the latest cached version of the dataset since HuggingFaceH4/MATH-500 couldn't be found on the Hugging Face Hub (offline mode is enabled). |
| Found the latest cached dataset configuration 'default' at /workspace/hanrui/datasets/HuggingFaceH4___math-500/default/0.0.0/6e4ed1a2a79af7d8630a6b768ec859cb5af4d3be (last modified on Tue Mar 17 13:17:15 2026). |
| Total prompts: 2, ~2 per GPU |
|
|
| ============================================================ |
| Running evaluation... |
| ============================================================ |
| [GPU 0] Sample 1/2 | tokens=64 | tau=1.56 | time=2.2s | <think> Okay, so I need to convert the rectangular coordinates (0, 3) to polar c... |
| [GPU 0] Sample 2/2 | tokens=64 | tau=1.76 | time=1.4s | <think> Okay, so I need to find a way to express the double sum $\sum_^\i... |
|
|
| ============================================================ |
| RESULTS SUMMARY |
| ============================================================ |
| Denoise steps: 2 |
| GPUs used: 1 |
| Samples evaluated: 2 |
| Total blocks: 78 |
| Total generated tokens: 128 |
| Total GPU-time: 3.58s |
| Wall-clock time (approx): 2.15s |
| --- |
| Avg acceptance length (tau): 1.65 |
| Median acceptance length: 1.0 |
| Per-sample avg tau: ['1.56', '1.76'] |
| Min per-sample tau: 1.56 |
| Max per-sample tau: 1.76 |
| ============================================================ |
|
|
| Results saved to /workspace/hanrui/idea1/results/dflash_eval/math500_steps2.json |
|
|