Instructions to use JingzeShi/flash-sparse-attention with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Kernels
How to use JingzeShi/flash-sparse-attention with Kernels:
# !pip install kernels from kernels import get_kernel kernel = get_kernel("JingzeShi/flash-sparse-attention") - Notebooks
- Google Colab
- Kaggle
| library_name: kernels | |
| license: bsd-3-clause | |
| # flash-sparse-attention | |
| Flash Sparse Attention Triton kernels — dense, sparse, and gated attention | |
| with forward, backward, and decode paths. | |
| ## Usage | |
| ```python | |
| from kernels import get_kernel | |
| fsa = get_kernel("JingzeShi/flash-sparse-attention", version=1) | |
| # Dense forward | |
| out = fsa.flash_dense_attn_func(q, k, v, is_causal=True) | |
| # Decode with KV cache | |
| out = fsa.flash_dense_attn_with_kvcache_func(q, k, v) | |
| # Sparse attention | |
| out = fsa.flash_sparse_attn_func(q, k, v, is_causal=True, softmax_threshold=0.01) | |
| # Gated attention | |
| out = fsa.flash_gated_attn_func(q, k, v, alpha, delta, is_causal=True) | |
| ``` | |
| ## Available functions | |
| - `flash_dense_attn_func` | |
| - `flash_dense_attn_with_kvcache_func` | |
| - `flash_dense_attn_varlen_func` | |
| - `flash_dense_attn_varlen_with_kvcache_func` | |
| - `flash_sparse_attn_func` | |
| - `flash_sparse_attn_with_kvcache_func` | |
| - `flash_sparse_attn_varlen_func` | |
| - `flash_sparse_attn_varlen_with_kvcache_func` | |
| - `flash_gated_attn_func` | |
| - `flash_gated_attn_with_kvcache_func` | |
| - `flash_gated_attn_varlen_func` | |
| - `flash_gated_attn_varlen_with_kvcache_func` | |
| ## Source | |
| Originally from [HKUSTDial/flash-sparse-attention](https://github.com/HKUSTDial/flash-sparse-attention). |