This is the repository card for rootonchair/nunchaku-lite-kernels, which has been pushed to the Hub. It was built for use with the kernels library. This card was automatically generated.
How to use
# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel
kernel_module = get_kernel("rootonchair/nunchaku-lite-kernels", version=1)
attention_fp16_cuda = kernel_module.attention_fp16_cuda
attention_fp16_cuda(...)
Available functions
- attention_fp16_cuda
- awq_gemm_w4a16_g128_int16
- awq_gemm_w4a16_g64_int32
- awq_gemv_w4a16_cuda
- svdq_gemm_w4a4_cuda
- svdq_quantize_w4a4_act_fuse_lora_cuda
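As a quick sanity check after loading, the function names this card lists can be compared against what the loaded module actually exposes. The sketch below is only an illustration: `EXPECTED_FUNCTIONS`, `missing_functions`, and `load_and_check` are hypothetical helper names rather than part of the kernels library, and the only assumption about the module is that each function is a callable attribute, as in the usage snippet above.

```python
from types import SimpleNamespace

# Names copied from this card's list of available functions.
EXPECTED_FUNCTIONS = (
    "attention_fp16_cuda",
    "awq_gemm_w4a16_g128_int16",
    "awq_gemm_w4a16_g64_int32",
    "awq_gemv_w4a16_cuda",
    "svdq_gemm_w4a4_cuda",
    "svdq_quantize_w4a4_act_fuse_lora_cuda",
)


def missing_functions(module, names=EXPECTED_FUNCTIONS):
    """Return the names that `module` does not expose as callables (hypothetical helper)."""
    return [name for name in names if not callable(getattr(module, name, None))]


def load_and_check():
    """Load the kernel and fail early if a listed function is absent (hypothetical helper)."""
    # Imported lazily so `missing_functions` can be used without `kernels` installed.
    from kernels import get_kernel

    kernel_module = get_kernel("rootonchair/nunchaku-lite-kernels", version=1)
    missing = missing_functions(kernel_module)
    if missing:
        raise RuntimeError(f"kernel module is missing: {missing}")
    return kernel_module


# Stand-in object for illustration only; it is not the real kernel module.
fake_module = SimpleNamespace(attention_fp16_cuda=lambda *args: None)
print(missing_functions(fake_module))
```

Calling `load_and_check()` downloads the kernel, so it is left uncalled here. The stand-in object only demonstrates how the check behaves when some names are absent.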
Benchmarks
A benchmarking script is available for this kernel. Run `kernels benchmark rootonchair/nunchaku-lite-kernels --version 1`.
License: apache-2.0
Supported hardware
- CUDA
- OS: linux
- Arch: x86_64
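Since the card lists only CUDA on linux / x86_64, it may be worth checking the host before trying to load the kernel. The following is a minimal sketch using the standard library. `is_supported_platform` and `cuda_available` are hypothetical helper names, and the CUDA check assumes PyTorch is the framework in use, which this card does not state.

```python
import platform


def is_supported_platform(system=None, machine=None):
    """Return True if the OS/arch pair matches the card (linux, x86_64).

    Arguments default to the current host; pass explicit values to test other pairs.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return system == "linux" and machine == "x86_64"


def cuda_available():
    """Best-effort CUDA check; assumes PyTorch, returns False if it is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if not is_supported_platform():
    print("This host does not match the OS/arch listed on the card.")
```

The OS/arch check is kept separate from the CUDA check so that the former works even in environments without PyTorch installed.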




