Set TORCH_CUDA_ARCH_LIST to 9.0
/workspace/hanrui/idea1/specforge/modeling/draft/llama3_eagle.py:29: UserWarning: flash_attn is not found, falling back to flex_attention. Please install flash_attn if you want to use the flash attention backend.
warnings.warn(
:1241: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
:1241: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
============================================================
DFlash Evaluation (Multi-GPU Data Parallel)
============================================================
Target model: /workspace/models/Qwen3-8B
Draft model: /workspace/models/Qwen3-8B-DFlash-b16
Dataset: math500
Max samples: 2
Max new tokens: 64
Denoise steps: 2
Temperature: 0.0
GPUs: 1
Dtype: bfloat16
============================================================
[1/4] Loading tokenizer...
[2/4] Loading target model on 1 GPUs...
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/5 [00:00 Okay, so I need to convert the rectangular coordinates (0, 3) to polar c...
[GPU 0] Sample 2/2 | tokens=64 | tau=1.76 | time=1.4s | Okay, so I need to find a way to express the double sum $\sum_{j = 1}^\i...
============================================================
RESULTS SUMMARY
============================================================
Denoise steps: 2
GPUs used: 1
Samples evaluated: 2
Total blocks: 78
Total generated tokens: 128
Total GPU-time: 3.58s
Wall-clock time (approx): 2.15s
---
Avg acceptance length (tau): 1.65
Median acceptance length: 1.0
Per-sample avg tau: ['1.56', '1.76']
Min per-sample tau: 1.56
Max per-sample tau: 1.76
============================================================
Results saved to /workspace/hanrui/idea1/results/dflash_eval/math500_steps2.json