How to use kernels-community/vllm-flash-attn3 with Kernels:
    # !pip install kernels
    from kernels import get_kernel
    kernel = get_kernel("kernels-community/vllm-flash-attn3")
Support for B200s?
#7
by shriramc - opened
I'm trying to use this kernel with B200s and am running into the error:
CUDA error (/build/source/flash-attn/flash_fwd_launch_template.h:191): no kernel image is available for execution on the device
Is there a timeline for supporting this kernel on B200s?
Same error on an RTX PRO 6000 Blackwell. I think it mainly affects Blackwell GPUs.
Strangely, it runs without issue on an H200.
Maybe the error is coming from somewhere else?
FlashAttention3 doesn't support Blackwell; it was only made to work on Hopper GPUs. Blackwell support will come with FlashAttention4.
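To avoid hitting "no kernel image is available" at launch time, one option is to check the GPU's compute capability before loading the kernel. The following is a minimal sketch, not part of this repository: `is_hopper` and `load_fa3_or_none` are hypothetical helper names, and it assumes Hopper parts (H100, H200) report compute capability 9.x while Blackwell parts report 10.x (B200) or 12.x (RTX PRO 6000 Blackwell).

```python
from typing import Tuple

# Assumption: Hopper GPUs (H100/H200) report major compute capability 9.
HOPPER_MAJOR = 9


def is_hopper(capability: Tuple[int, int]) -> bool:
    """Return True when a (major, minor) compute capability is Hopper."""
    major, _minor = capability
    return major == HOPPER_MAJOR


def load_fa3_or_none():
    """Hypothetical helper: load the FA3 kernel only on Hopper, else None."""
    # Imported lazily so is_hopper() stays usable without torch installed.
    import torch
    from kernels import get_kernel

    if not torch.cuda.is_available():
        return None
    if not is_hopper(torch.cuda.get_device_capability()):
        # Non-Hopper device: the caller should pick another attention backend.
        return None
    return get_kernel("kernels-community/vllm-flash-attn3")
```

A caller could treat a `None` result as a signal to fall back to a different attention implementation instead of failing inside the CUDA launch.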
@mgoin is there a timeline you're aware of for FlashAttention4? Seems like this would be blocking for models that currently require FA3.