---
library_name: kernels
license: apache-2.0
---

This is the repository card of {repo_id}, which has been pushed to the Hub. It was built to be used with the [`kernels` library](https://github.com/huggingface/kernels). This card was automatically generated.

## How to use

```python
# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel

kernel_module = get_kernel("REPO_ID")  # <- change the ID if needed
flash_attn_combine = kernel_module.flash_attn_combine

flash_attn_combine(...)
```

## Available functions

- `flash_attn_combine`
- `flash_attn_func`
- `flash_attn_qkvpacked_func`
- `flash_attn_varlen_func`
- `flash_attn_with_kvcache`
- `get_scheduler_metadata`

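As a hedged sketch of calling one of these, assuming the functions mirror the upstream flash-attn API (`q`, `k`, `v` tensors of shape `[batch, seqlen, nheads, headdim]` in fp16/bf16 on a CUDA device) — the `check_qkv_shapes` helper below is purely illustrative, not part of the `kernels` API:

```python
# Illustrative helper (not part of the kernels API): sanity-check that
# q, k, v follow the flash-attn [batch, seqlen, nheads, headdim] layout.
# Simplified: it ignores grouped-query attention, where k/v may have
# fewer heads than q.

def check_qkv_shapes(q_shape, k_shape, v_shape):
    """Return True if q, k and v have compatible 4-D attention shapes.
    k and v must agree exactly; q may differ only in sequence length."""
    if len(q_shape) != 4 or len(k_shape) != 4 or len(v_shape) != 4:
        return False
    if tuple(k_shape) != tuple(v_shape):
        return False
    # batch, nheads and headdim must match between q and k
    return (q_shape[0], q_shape[2], q_shape[3]) == (k_shape[0], k_shape[2], k_shape[3])

# On a CUDA machine with PyTorch installed (assumption: flash-attn-style API):
# import torch
# from kernels import get_kernel
# kernel_module = get_kernel("REPO_ID")  # <- change the ID if needed
# q = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
# k, v = torch.randn_like(q), torch.randn_like(q)
# assert check_qkv_shapes(q.shape, k.shape, v.shape)
# out = kernel_module.flash_attn_func(q, k, v, causal=True)  # same shape as q
```
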
## Supported backends

- cuda

## CUDA Capabilities

- 8.0
- 9.0a

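The capability strings above can be matched against the local GPU before loading the kernel. A minimal sketch — the `capability_supported` helper is hypothetical, not part of the `kernels` API, and it simply ignores architecture-specific build suffixes such as the `a` in `9.0a`:

```python
# Illustrative helper (not part of the kernels API): check whether the
# local GPU's compute capability matches one this kernel was built for.

def capability_supported(major, minor, supported=("8.0", "9.0a")):
    """Return True if (major, minor) matches a supported capability string.
    Architecture-specific suffixes such as the 'a' in '9.0a' are ignored."""
    for cap in supported:
        base = cap.rstrip("abcdefghijklmnopqrstuvwxyz")
        cap_major, cap_minor = (int(part) for part in base.split("."))
        if (major, minor) == (cap_major, cap_minor):
            return True
    return False

# With PyTorch on a CUDA machine you could gate kernel loading on this check:
# import torch
# if torch.cuda.is_available() and capability_supported(*torch.cuda.get_device_capability()):
#     kernel_module = get_kernel("REPO_ID")
```
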
## Benchmarks

[TODO: provide benchmarks if available]

## Code source

[TODO: provide original code source and other relevant citations if available]

## Notes

[TODO: provide additional notes about this kernel if needed]