| --- |
| license: bsd-3-clause |
| tags: |
| - flash-attention |
| - flash-attn |
| - windows |
| - cuda |
| - blackwell |
| - rtx-5090 |
| - rtx-4090 |
| - prebuilt-wheels |
| library_name: flash-attn |
| --- |
| |
| # Flash Attention Prebuilt Wheels for Windows |
|
|
| Prebuilt `flash-attn` wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support). |
|
|
| All wheels are built from the official [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell. |
|
|
| ## Available Wheels |
|
|
| | flash-attn | CUDA | PyTorch | Python | cxx11 ABI | File | |
| |---|---|---|---|---|---| |
| | 2.8.3 | 13.0 | 2.11.0 | 3.12 | True | [`flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl`](./flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl) | |
|
|
| > Want a combination that isn't here? Open a discussion on the **Community** tab. |
|
|
| ## Install |
|
|
| Pick the wheel matching your environment and run: |
|
|
| ```bash |
| pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME> |
| ``` |
|
|
| Example: |
|
|
| ```bash |
| pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl |
| ``` |
|
|
| ## Picking the Right Wheel |
|
|
| The filename encodes everything you need to match: |
|
|
| ``` |
| flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl |
|            └─2.8.3─┘   └─130─┘    └─2.11.0─┘     └─TRUE─┘ └─312─┘ |
| ``` |
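
To check a filename against this template programmatically, a regular expression can pull out each tag. This is a minimal sketch: `WHEEL_RE` and `parse_wheel_name` are hypothetical names used for illustration and are not part of `flash-attn` or pip.

```python
import re

# Pattern mirroring the filename template above; the group names are illustrative.
WHEEL_RE = re.compile(
    r"^flash_attn-(?P<version>[^+]+)"   # flash-attn version, e.g. 2.8.3
    r"\+cu(?P<cuda>\d+)"                # CUDA tag without the dot, e.g. 130
    r"torch(?P<torch>[\d.]+)"           # PyTorch version, e.g. 2.11.0
    r"cxx11abi(?P<abi>TRUE|FALSE)"      # C++11 ABI flag
    r"-cp(?P<py>\d+)-cp(?P=py)"         # CPython tag, repeated twice
    r"-win_amd64\.whl$"
)


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into its tags (hypothetical helper)."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel filename: {filename}")
    return match.groupdict()


if __name__ == "__main__":
    name = "flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl"
    print(parse_wheel_name(name))
```

The returned dictionary holds the `version`, `cuda`, `torch`, `abi`, and `py` fields as strings, which can then be compared with the values reported by your environment.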
|
|
| Verify your environment first: |
|
|
| ```bash |
| python -c "import torch; print(torch.__version__, torch.version.cuda)" |
| # e.g. "2.11.0+cu130 13.0" → use the cu130 + torch2.11.0 wheel |
| ``` |
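
The Python version has to match the `cp` tag as well. As a rough sketch, the tags to look for can be derived from the running interpreter and the installed PyTorch build. `expected_tags` is a hypothetical helper written for this README, and it assumes the version strings have the usual `2.11.0+cu130` and `13.0` shapes.

```python
import sys


def expected_tags(torch_version, cuda_version, py_version=sys.version_info):
    """Return the (cuda, torch, python) tags to look for in a wheel filename.

    Hypothetical helper; assumes version strings like "2.11.0+cu130" and "13.0".
    """
    torch_base = torch_version.split("+")[0]         # "2.11.0+cu130" -> "2.11.0"
    cuda_tag = "cu" + cuda_version.replace(".", "")  # "13.0" -> "cu130"
    py_tag = f"cp{py_version[0]}{py_version[1]}"     # (3, 12, ...) -> "cp312"
    return cuda_tag, f"torch{torch_base}", py_tag


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.version.cuda is None:
            # CPU-only PyTorch builds report no CUDA version.
            print("This PyTorch build has no CUDA support")
        else:
            print(expected_tags(torch.__version__, torch.version.cuda))
```

All three tags should appear in the filename of the wheel you pick.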
|
|
| ## GPU Support |
|
|
| Wheels are built with `TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0` unless noted otherwise, covering: |
|
|
| | Compute Capability | Architecture | Examples | |
| |---|---|---| |
| | 8.0 | Ampere | A100 | |
| | 8.6 | Ampere | RTX 30-series, A6000 | |
| | 8.9 | Ada Lovelace | RTX 40-series, L40S | |
| | 9.0 | Hopper | H100, H200 | |
| | 12.0 | Consumer Blackwell | **RTX 5090**, RTX Pro 6000 | |
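
To see whether your GPU is in this list, you can compare its compute capability with the architectures the wheel was built for. The sketch below is deliberately conservative and only accepts exact matches. `parse_arch_list` and `is_covered` are hypothetical helpers; `torch.cuda.get_device_capability()` is the PyTorch call that reports the `(major, minor)` pair.

```python
def parse_arch_list(arch_list):
    """Turn "8.0;8.6;..." into a list of (major, minor) tuples."""
    archs = []
    for item in arch_list.split(";"):
        major, minor = item.strip().split(".")
        archs.append((int(major), int(minor)))
    return archs


# The default arch list stated above.
BUILT_ARCHS = parse_arch_list("8.0;8.6;8.9;9.0;12.0")


def is_covered(capability, built_archs=BUILT_ARCHS):
    """Exact-match check (a conservative assumption for this sketch)."""
    return tuple(capability) in built_archs


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability()
            print(torch.cuda.get_device_name(), cap, "covered:", is_covered(cap))
        else:
            print("No CUDA device visible to PyTorch")
```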
|
|
| ## Verify Installation |
|
|
| ```python |
| import torch |
| from flash_attn import flash_attn_func |
| |
| # flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16 or bf16 on a CUDA device |
| q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device='cuda') |
| print(flash_attn_func(q, q, q).shape) |
| # Expected: torch.Size([1, 128, 8, 64]) |
| ``` |
|
|
| ## Build Environment |
|
|
| - **OS**: Windows 11 (64-bit) |
| - **Compiler**: MSVC 14.44 (Visual Studio 2022 Community) |
| - **CUDA Toolkit**: matches the `cu*` tag in each wheel filename |
| - **Source**: official Dao-AILab/flash-attention release tag matching the wheel version |
| - **No source patches**: these are stock builds with the documented CUDA 13 + MSVC preprocessor flag (`NVCC_FLAGS=--compiler-options /Zc:preprocessor`). |
|
|
| ## License |
|
|
| [BSD-3-Clause](https://github.com/Dao-AILab/flash-attention/blob/main/LICENSE), matching upstream Flash Attention. |
|
|
| ## Disclaimer |
|
|
| Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA. |