TensorBend Shader Tests & Profiler

Validates every WGSL compute shader against CPU reference implementations.
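A minimal sketch of what checking a shader against a CPU reference can look like, assuming the shader output has already been read back into a `Float32Array`. The helper name `compareToReference` and the default tolerances are hypothetical and not taken from TensorBend.

```typescript
// Hypothetical helper: element-wise comparison of a GPU result with a CPU reference.
interface CompareResult {
  pass: boolean;      // true when every element is within tolerance
  maxAbsErr: number;  // largest absolute difference seen (NaN differences are not counted here)
  worstIndex: number; // index of the largest difference, or -1 if all elements match exactly
}

function compareToReference(
  gpu: Float32Array,
  cpu: Float32Array,
  absTol: number = 1e-4, // assumed default, tune per shader
  relTol: number = 1e-3, // assumed default, tune per shader
): CompareResult {
  if (gpu.length !== cpu.length) {
    throw new Error(`length mismatch: ${gpu.length} vs ${cpu.length}`);
  }
  let pass = true;
  let maxAbsErr = 0;
  let worstIndex = -1;
  for (let i = 0; i < cpu.length; i++) {
    const err = Math.abs(gpu[i] - cpu[i]);
    // Written as a negated <= so that a NaN difference also counts as a failure.
    if (!(err <= absTol + relTol * Math.abs(cpu[i]))) {
      pass = false;
    }
    if (err > maxAbsErr) {
      maxAbsErr = err;
      worstIndex = i;
    }
  }
  return { pass, maxAbsErr, worstIndex };
}
```

A combined absolute and relative tolerance is used because GPU and CPU floating-point results rarely match bit for bit, for example when sums are accumulated in a different order.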

Shader Tests
Profiler
Benchmark

Profiles a single forward pass with per-operation timing. Requires a loaded model: enter the repo below and load the model first.
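Per-operation timing can be sketched as a small accumulator that wraps each operation and records elapsed time under its name. The class `OpProfiler` and its methods are hypothetical, not the page's actual profiler. The sketch also assumes each wrapped operation resolves only once its GPU work has finished, since otherwise it would measure submission time rather than execution time.

```typescript
// Hypothetical per-operation profiler; the clock is injectable so it can be tested.
type Clock = () => number; // returns a time in milliseconds

interface OpStats {
  calls: number;
  totalMs: number;
}

class OpProfiler {
  private stats = new Map<string, OpStats>();
  private clock: Clock;

  // Date.now is a coarse default; in a browser, pass () => performance.now().
  constructor(clock: Clock = Date.now) {
    this.clock = clock;
  }

  // Adds one timing sample for the named operation.
  record(name: string, elapsedMs: number): void {
    const s = this.stats.get(name) ?? { calls: 0, totalMs: 0 };
    s.calls += 1;
    s.totalMs += elapsedMs;
    this.stats.set(name, s);
  }

  // Runs op, waits for it to settle, and records how long that took.
  async time<T>(name: string, op: () => Promise<T> | T): Promise<T> {
    const start = this.clock();
    try {
      return await op();
    } finally {
      this.record(name, this.clock() - start);
    }
  }

  // Returns per-operation totals, slowest first.
  report(): Array<{ name: string; calls: number; totalMs: number }> {
    return Array.from(this.stats.entries())
      .map(([name, s]) => ({ name, calls: s.calls, totalMs: s.totalMs }))
      .sort((a, b) => b.totalMs - a.totalMs);
  }
}
```

Sorting the report by total time puts the operations that dominate the forward pass at the top.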

Measures GPTQ matvec memory bandwidth at model-realistic sizes. Compares regular vs split-K dispatch strategies.
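The sketch below illustrates the two ideas in this description: turning a measured kernel time into an effective bandwidth figure, and what a split-K reduction computes. Every name here is hypothetical. The byte accounting assumes packed quantized weights, fp16 scales, packed zero points, and f32 input and output vectors, and the actual benchmark may count differently. The split-K function works on a dense f32 matrix on the CPU purely to show the reduction pattern, whereas the real shaders operate on quantized weights.

```typescript
// Hypothetical shape description for one quantized matvec (y = x * W, W is k by n).
interface GptqShape {
  k: number;         // input features
  n: number;         // output features
  bits: number;      // bits per quantized weight, e.g. 4
  groupSize: number; // rows of W sharing one scale and zero point
}

// Estimates bytes read and written by one matvec under the assumptions above.
function gptqBytesMoved(s: GptqShape): number {
  const groups = Math.ceil(s.k / s.groupSize);
  const weightBytes = (s.k * s.n * s.bits) / 8;  // packed quantized weights
  const scaleBytes = groups * s.n * 2;           // assumed fp16 scales
  const zeroBytes = (groups * s.n * s.bits) / 8; // assumed packed zero points
  const inputBytes = s.k * 4;                    // assumed f32 input vector
  const outputBytes = s.n * 4;                   // assumed f32 output vector
  return weightBytes + scaleBytes + zeroBytes + inputBytes + outputBytes;
}

// Converts bytes moved and elapsed milliseconds into GB/s (1 GB = 1e9 bytes).
function bandwidthGBps(bytes: number, ms: number): number {
  return bytes / (ms * 1e6);
}

// CPU illustration of split-K: each split sums a slice of the K dimension,
// then the partial results are reduced. With splits = 1 this is a regular matvec.
function splitKMatvec(
  x: Float32Array,
  w: Float32Array, // row-major, k rows by n columns
  k: number,
  n: number,
  splits: number,
): Float32Array {
  const chunk = Math.ceil(k / splits);
  const partials = new Float64Array(splits * n);
  for (let s = 0; s < splits; s++) {
    const k0 = s * chunk;
    const k1 = Math.min(k, k0 + chunk);
    for (let row = k0; row < k1; row++) {
      for (let col = 0; col < n; col++) {
        partials[s * n + col] += x[row] * w[row * n + col];
      }
    }
  }
  // Reduction step: add the per-split partial sums for each output column.
  const y = new Float32Array(n);
  for (let col = 0; col < n; col++) {
    let acc = 0;
    for (let s = 0; s < splits; s++) acc += partials[s * n + col];
    y[col] = acc;
  }
  return y;
}
```

On a GPU, splitting K lets more workgroups run in parallel when the output dimension alone does not provide enough work, at the cost of an extra reduction over the partial sums.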