Ternary GEMV limiter profile — your GPU

Every variant runs the same GEMV access pattern and changes one knob at a time. read-only = the reads with zero math (the memory ceiling for this pattern). dot ×1 = the real kernel. ×2/×4 = the same reads with more ALU work. If read-only runs much faster than dot (read-only ≫ dot), the ALU is the wall (int8 is the lever); if read-only ≈ dot, the memory access pattern is the wall (int8 won't help).
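
The decision rule above can be written down as a small helper. This is a minimal sketch, not the profiler's own code: the name `classify_limiter` and the cutoff `alu_ratio` are hypothetical, the inputs are assumed to be effective throughputs in the same units (higher is faster), and the sample numbers are made up for illustration.

```python
def classify_limiter(read_only, dot_x1, alu_ratio=1.5):
    """Classify the wall from two throughput readings (same units, higher is faster).

    alu_ratio is an assumed cutoff for "read-only >> dot"; tune it per device.
    """
    if read_only <= 0 or dot_x1 <= 0:
        raise ValueError("throughputs must be positive")
    ratio = read_only / dot_x1
    if ratio >= alu_ratio:
        return "alu"     # math is the wall; int8 is the lever
    return "memory"      # access pattern is the wall; int8 won't help


# Illustrative inputs only, not measurements.
print(classify_limiter(400.0, 120.0))  # read-only far ahead of dot
print(classify_limiter(400.0, 380.0))  # read-only roughly equal to dot
```

The ×2/×4 rows can serve as a cross-check: when the ALU is the wall, extra math should slow the kernel further; when memory is the wall, they should stay close to dot ×1.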

starting…