BitNet decode roofline on your GPU

Measures your GPU's achievable VRAM read bandwidth, then computes the batch-1 decode ceiling for BitNet-2B (0.69 GB read per token) and reports how far the current kernel falls short of that ceiling.
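The ceiling arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code. It assumes batch-1 decode is purely bandwidth-bound, so the ceiling is the measured bandwidth divided by the bytes read per token. The function names and the 500 GB/s and 300 tokens/s figures are hypothetical placeholders. Only the 0.69 GB per token figure comes from the description above.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_per_token: float = 0.69) -> float:
    """Upper bound on batch-1 decode speed, in tokens per second.

    Assumes decode is limited only by VRAM reads: every token requires
    reading gb_per_token gigabytes, so the ceiling is bandwidth / bytes.
    """
    if bandwidth_gb_s <= 0 or gb_per_token <= 0:
        raise ValueError("bandwidth and bytes per token must be positive")
    return bandwidth_gb_s / gb_per_token


def fraction_of_ceiling(measured_tok_s: float, ceiling_tok_s: float) -> float:
    """Share of the roofline that a kernel actually reaches (0.0 to 1.0)."""
    if ceiling_tok_s <= 0:
        raise ValueError("ceiling must be positive")
    return measured_tok_s / ceiling_tok_s


if __name__ == "__main__":
    # Hypothetical inputs: substitute the bandwidth and decode speed
    # measured on your own GPU.
    measured_bandwidth_gb_s = 500.0
    measured_decode_tok_s = 300.0

    ceiling = decode_ceiling_tok_s(measured_bandwidth_gb_s)
    share = fraction_of_ceiling(measured_decode_tok_s, ceiling)
    print(f"ceiling: {ceiling:.1f} tok/s, kernel reaches {share:.0%} of it")
```

Using the measured (achievable) bandwidth rather than the datasheet peak keeps the ceiling realistic, since no kernel can read faster than the memory system actually delivers.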
