attention sinks & backward
#3
by acforvs - opened
Hi, thanks for all the amazing work!
I noticed that a new s_aux parameter has been added to the fwd pass to support attention sinks. However, I wasn't able to find any related changes in the backward pass.
Does the current implementation support training as well? If not, are there any plans to add attention-sink support to the bwd pass?
Many thanks,
Vlad
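For context, here is a rough sketch of what a backward pass for the sink would need to compute. It is plain Python for a single query row, not the kernel's code. It assumes that s_aux acts as one extra logit per head that joins the softmax denominator but contributes no value vector; that is an assumption about the semantics, and all function names below are hypothetical. Under that assumption the gradient with respect to the sink logit is -p_sink * (grad_out · out), which the finite-difference helper can be used to check.

```python
import math


def attention_row_with_sink(scores, values, sink_logit):
    """One query row: softmax over [scores..., sink_logit], sink column dropped.

    Assumption: the sink only enlarges the softmax denominator and has no
    value vector of its own.
    """
    m = max(max(scores), sink_logit)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - m) for s in scores]
    exp_sink = math.exp(sink_logit - m)
    denom = sum(exp_scores) + exp_sink
    probs = [e / denom for e in exp_scores]
    p_sink = exp_sink / denom
    dim = len(values[0])
    out = [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
    return out, p_sink


def sink_grad(scores, values, sink_logit, grad_out):
    """Analytic dL/d(sink_logit) given the upstream gradient dL/d(out)."""
    out, p_sink = attention_row_with_sink(scores, values, sink_logit)
    return -p_sink * sum(g * o for g, o in zip(grad_out, out))


def sink_grad_numeric(scores, values, sink_logit, grad_out, eps=1e-6):
    """Central finite difference of L = grad_out . out, used as a check."""
    def loss(a):
        out, _ = attention_row_with_sink(scores, values, a)
        return sum(g * o for g, o in zip(grad_out, out))
    return (loss(sink_logit + eps) - loss(sink_logit - eps)) / (2 * eps)
```

A fused backward kernel would have to accumulate this term per head across all query rows, in addition to the usual dQ, dK and dV.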
Same question here! Is support for this planned?
Hi~ Is there any support for the backward pass now? Or should we still avoid training with it?