grouped-moe-gemv

Native CUDA FlashRT grouped MoE GEMV kernels for BF16 activations and NVFP4 weights.
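To make the data types concrete, here is a minimal CPU reference sketch of what a GEMV with 4-bit weights and higher-precision activations computes. It assumes the standard NVFP4 convention: E2M1 4-bit values with one shared scale per block of 16 elements, plus an optional per-tensor scale. The helper names (`decode_e2m1`, `dequantize_row`, `gemv_reference`) are hypothetical and are not part of this kernel's API. The sketch takes codes already unpacked to one integer each and block scales as plain floats, because the packing and scale layout the CUDA kernels expect are not documented on this page.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one scale is shared by 16 consecutive elements


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code (0..15) to a Python float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def dequantize_row(codes, block_scales, tensor_scale=1.0):
    """Expand 4-bit codes to floats using one scale per 16-element block.

    Scales are plain floats here; a real NVFP4 tensor stores them in FP8.
    """
    if len(codes) != BLOCK_SIZE * len(block_scales):
        raise ValueError("expected exactly one block scale per 16 codes")
    return [
        decode_e2m1(code) * block_scales[i // BLOCK_SIZE] * tensor_scale
        for i, code in enumerate(codes)
    ]


def gemv_reference(weight_codes, weight_scales, x, tensor_scale=1.0):
    """Compute y[n] = sum_k dequant(W)[n][k] * x[k] in Python floats."""
    return [
        sum(w * xi for w, xi in zip(dequantize_row(codes, scales, tensor_scale), x))
        for codes, scales in zip(weight_codes, weight_scales)
    ]
```

A reference like this is mainly useful for checking a GPU result on small shapes. It accumulates in double precision, so agreement with a BF16 kernel should be checked with a tolerance rather than exact equality.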

Available functions:

w4a16_decode_gemv_bf16
grouped_w4a16_gemv_bf16
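The page does not document the signatures of these functions. As a hedged illustration of what a grouped MoE GEMV typically means, the sketch below applies a different expert weight matrix to each activation vector, selected by an expert index. The name `grouped_gemv_reference` and its argument layout are assumptions for illustration only, and weights are plain floats rather than quantized values.

```python
def gemv(weight, x):
    """Dense matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def grouped_gemv_reference(expert_weights, expert_ids, activations):
    """For each i, compute expert_weights[expert_ids[i]] @ activations[i].

    Hypothetical layout: expert_weights is a list of matrices, one per expert;
    expert_ids and activations have one entry per routed token.
    """
    if len(expert_ids) != len(activations):
        raise ValueError("expected one expert id per activation vector")
    return [
        gemv(expert_weights[expert_id], x)
        for expert_id, x in zip(expert_ids, activations)
    ]


# Two toy experts: the identity and a scale-by-two matrix.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
outputs = grouped_gemv_reference(experts, [1, 0], [[1.0, 2.0], [3.0, 4.0]])
print(outputs)  # [[2.0, 4.0], [3.0, 4.0]]
```

A GPU kernel would handle all of these products in a single launch instead of looping over tokens, which is the usual motivation for grouping during decode, where each expert sees only a few vectors.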

Tags: cuda, native-cuda, flashrt, Mixture of Experts, nvfp4, blackwell

License: apache-2.0

Supported hardware

CUDA 12.0a

DGX Spark:
GB10 (128GB)

GPU:
RTX PRO 6000 WS (96GB)
RTX PRO 6000 Max-Q (96GB)
RTX PRO 5000 (48GB)
RTX PRO 4500 WS (32GB)
RTX PRO 4000 (24GB)
RTX PRO 4000 SFF (24GB)
RTX PRO 2000 (16GB)

RTX:
RTX 5090 (32GB)
RTX 5090 D (32GB)
RTX 5090 Mobile (24GB)
RTX 5080 (16GB)
RTX 5080 Mobile (16GB)
RTX 5070 (12GB)
RTX 5070 Mobile (8GB)
RTX 5070 Ti (16GB)
RTX 5070 Ti Mobile (12GB)
RTX 5060 Ti (16GB)
RTX 5060 (8GB)
RTX 5060 Mobile (8GB)

OS: linux

Arch: x86_64