---
license: bsd-3-clause
tags:
  - flash-attention
  - flash-attn
  - windows
  - cuda
  - blackwell
  - rtx-5090
  - rtx-4090
  - prebuilt-wheels
library_name: flash-attn
---

# Flash Attention Prebuilt Wheels for Windows

Prebuilt `flash-attn` wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support).

All wheels are built from the official [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell.

## Available Wheels

| flash-attn | CUDA | PyTorch | Python | ABI | File |
|---|---|---|---|---|---|
| 2.8.3 | 13.0 | 2.11.0 | 3.12 | True | [`flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl`](./flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl) |

> Want a combination that isn't here? Open a discussion on the **Community** tab.

## Install

Pick the wheel matching your environment and run:

```bash
pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>
```

Example:

```bash
pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl
```

## Picking the Right Wheel

The filename encodes everything you need to match:

```
flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
            └─2.8.3─┘ └─130─┘   └─2.11.0─┘     └─TRUE─┘   └─312─┘
```

Verify your environment first:

```bash
python -c "import sys, torch; print(sys.version_info[:2], torch.__version__, torch.version.cuda)"
# e.g. "(3, 12) 2.11.0+cu130 13.0" → use the cp312 + cu130 + torch2.11.0 wheel
```
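
The same comparison can be scripted. This is a sketch under stated assumptions: `expected_wheel_fields` is a hypothetical helper, and it assumes `torch.__version__` carries a `+cuXXX` local suffix and that the wheel's `cu` tag is `torch.version.cuda` with the dot removed, as in the `13.0` → `cu130` example above.

```python
import sys


def expected_wheel_fields(torch_version: str, cuda_version: str,
                          py_major: int, py_minor: int) -> dict[str, str]:
    """Map an environment's versions onto the wheel filename fields (hypothetical helper)."""
    # "2.11.0+cu130" -> "2.11.0": the wheel's torch tag has no local suffix.
    torch_tag = torch_version.split("+", 1)[0]
    # "13.0" -> "130": the cu tag is the CUDA version without the dot.
    cuda_tag = cuda_version.replace(".", "")
    return {"cuda": cuda_tag, "torch": torch_tag, "python": f"{py_major}{py_minor}"}


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if torch.version.cuda is None:
        # CPU-only PyTorch builds report no CUDA version.
        print("This PyTorch build is CPU-only; install a CUDA build first")
        return
    f = expected_wheel_fields(str(torch.__version__), torch.version.cuda,
                              sys.version_info.major, sys.version_info.minor)
    print(f"Look for: flash_attn-<version>+cu{f['cuda']}torch{f['torch']}"
          f"cxx11abi<ABI>-cp{f['python']}-cp{f['python']}-win_amd64.whl")


if __name__ == "__main__":
    main()
```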

## GPU Support

Wheels are built with `TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0` unless noted otherwise, covering:

| Compute Capability | Architecture | Examples |
|---|---|---|
| 8.0 | Ampere | A100 |
| 8.6 | Ampere | RTX 30-series, A6000 |
| 8.9 | Ada Lovelace | RTX 40-series, L40S |
| 9.0 | Hopper | H100, H200 |
| 12.0 | Consumer Blackwell | **RTX 5090**, RTX Pro 6000 |
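
To check whether your own card falls in that list, you could compare `torch.cuda.get_device_capability()` against the arch list. This is a minimal sketch: `has_matching_kernels` is a hypothetical helper that applies CUDA's general binary-compatibility rule (a cubin built for X.y also runs on X.z when z ≥ y), not anything flash-attn itself checks, and it ignores PTX.

```python
# Arch list copied from this README; individual wheels may note a different one.
BUILT_ARCHS = "8.0;8.6;8.9;9.0;12.0"


def parse_arch_list(arch_list: str) -> list[tuple[int, int]]:
    """Turn "8.0;8.6;..." into [(8, 0), (8, 6), ...]."""
    archs = []
    for item in arch_list.split(";"):
        major, minor = item.strip().split(".")
        archs.append((int(major), int(minor)))
    return archs


def has_matching_kernels(capability: tuple[int, int], arch_list: str = BUILT_ARCHS) -> bool:
    # Binary compatibility: same major version, built minor <= device minor.
    # A different major version needs its own build, so e.g. 9.0 kernels do not cover 10.0.
    major, minor = capability
    return any(a_major == major and a_minor <= minor
               for a_major, a_minor in parse_arch_list(arch_list))


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch")
        return
    cap = torch.cuda.get_device_capability()
    status = "covered" if has_matching_kernels(cap) else "NOT covered"
    print(f"{torch.cuda.get_device_name()}: sm_{cap[0]}{cap[1]} is {status} by {BUILT_ARCHS}")


if __name__ == "__main__":
    main()
```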
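
## Verify Installation

```python
import torch
from flash_attn import flash_attn_func

# flash_attn_func expects (batch, seqlen, nheads, headdim) fp16/bf16 tensors on a CUDA device.
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device='cuda')  # 128 tokens, 8 heads, head dim 64
print(flash_attn_func(q, q, q).shape)
# Expected: torch.Size([1, 128, 8, 64])
```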
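
`flash_attn_func` computes exact attention, softmax(QKᵀ · softmax_scale) · V, with `softmax_scale` defaulting to 1/sqrt(headdim). For a stronger check than the shape test, one option is to compare a single head against a slow pure-Python reference. The sketch below is illustrative (`reference_attention` is a hypothetical helper, not part of flash-attn); it handles one batch element and one head, and its causal mask assumes equal query and key lengths.

```python
import math


def reference_attention(q, k, v, softmax_scale=None, causal=False):
    """Naive attention for ONE batch element and ONE head.

    q is [seqlen_q][headdim], k and v are [seqlen_k][headdim] (lists of rows).
    The causal mask assumes seqlen_q == seqlen_k.
    """
    headdim = len(q[0])
    if softmax_scale is None:
        softmax_scale = 1.0 / math.sqrt(headdim)  # same default as flash_attn_func
    out = []
    for i, q_row in enumerate(q):
        # Keys this query may attend to.
        visible = range(i + 1) if causal else range(len(k))
        scores = [softmax_scale * sum(a * b for a, b in zip(q_row, k[j])) for j in visible]
        # Numerically stable softmax: subtract the max before exponentiating.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of the visible value rows.
        out.append([sum(w * v[j][d] for w, j in zip(weights, visible)) / total
                    for d in range(len(v[0]))])
    return out


def main() -> None:
    try:
        import torch
        from flash_attn import flash_attn_func
    except ImportError:
        print("torch / flash-attn not installed; skipping GPU comparison")
        return
    if not torch.cuda.is_available():
        print("No CUDA device; skipping GPU comparison")
        return
    # Small shapes keep the pure-Python reference fast: (batch, seqlen, nheads, headdim).
    q, k, v = (torch.randn(1, 16, 2, 32, dtype=torch.float16, device="cuda") for _ in range(3))
    out = flash_attn_func(q, k, v, causal=True)
    h = 0  # compare a single head
    ref = reference_attention(q[0, :, h].float().tolist(), k[0, :, h].float().tolist(),
                              v[0, :, h].float().tolist(), causal=True)
    err = (out[0, :, h].float().cpu() - torch.tensor(ref)).abs().max().item()
    print(f"max abs error vs reference: {err:.2e}")


if __name__ == "__main__":
    main()
```

Expect small, nonzero differences: the GPU path runs in fp16 while the reference works in Python floats.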

## Notes on Flash Attention 3 / 4 for RTX 5090

Neither FA3 nor FA4 will run on consumer Blackwell GPUs (sm_120):

- **FA3** is Hopper-only (sm_90), built around hardware features the RTX 5090 doesn't have.
- **FA4** requires the TMEM (Tensor Memory) subsystem present only on datacenter Blackwell (sm_100/sm_103, e.g., B200/B300). Consumer Blackwell (sm_120) lacks TMEM.

For RTX 5090 and other sm_120 cards, **Flash Attention 2 is the correct target**. If FA3/FA4 ever gain sm_120 support, wheels will be added here.
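
The rules above can be condensed into a lookup table. This sketch encodes only what this section and the GPU Support table state (FA3 on sm_90, FA4 on sm_100/sm_103, FA2 from these wheels on the listed architectures); `usable_generations` is a hypothetical helper, not an upstream API. "FA2" here means FA2 from these wheels, which is why sm_100 shows only FA4.

```python
# Architecture rules as stated in this README (not an upstream API).
FA2_WHEEL_ARCHS = {(8, 0), (8, 6), (8, 9), (9, 0), (12, 0)}  # archs built into these FA2 wheels
FA3_ARCHS = {(9, 0)}                # FA3: Hopper only
FA4_ARCHS = {(10, 0), (10, 3)}      # FA4: datacenter Blackwell with TMEM


def usable_generations(capability: tuple[int, int]) -> list[str]:
    """Flash Attention generations this README says can run on a compute capability."""
    options = []
    if capability in FA2_WHEEL_ARCHS:
        options.append("FA2")
    if capability in FA3_ARCHS:
        options.append("FA3")
    if capability in FA4_ARCHS:
        options.append("FA4")
    return options


if __name__ == "__main__":
    for name, cap in [("RTX 5090", (12, 0)), ("H100", (9, 0))]:
        print(f"{name} sm_{cap[0]}{cap[1]}: {usable_generations(cap)}")
```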

## Build Environment

- **OS**: Windows 11 (64-bit)
- **Compiler**: MSVC 14.44 (Visual Studio 2022 Community)
- **CUDA Toolkit**: matches the `cu*` tag in each wheel filename
- **Source**: official Dao-AILab/flash-attention release tag matching the wheel version
- **No source patches**: these are stock builds, compiled with the MSVC conforming-preprocessor flag that CUDA 13 requires (`NVCC_FLAGS=--compiler-options /Zc:preprocessor`).
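
If you want to reproduce a build, the settings above can be collected in a small script. This is a sketch under assumptions: `build_env` and `build_command` are hypothetical helpers; the `pip wheel . --no-build-isolation` invocation is one common way to build from a checkout, not necessarily the exact command used for these wheels; and it assumes a Dao-AILab/flash-attention checkout at the matching tag, in a shell where MSVC and the CUDA toolkit are on `PATH`. `FLASH_ATTENTION_FORCE_BUILD` is read by upstream's `setup.py`, and `MAX_JOBS` is the upstream-documented way to cap parallel compile jobs.

```python
import os
import sys

# Values copied from this README; adjust per wheel.
ARCH_LIST = "8.0;8.6;8.9;9.0;12.0"
NVCC_FLAGS = "--compiler-options /Zc:preprocessor"


def build_env(base=None, max_jobs=None):
    """Return a copy of `base` (default: os.environ) with the build settings applied."""
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = ARCH_LIST
    # The CUDA 13 + MSVC preprocessor flag listed under Build Environment.
    env["NVCC_FLAGS"] = NVCC_FLAGS
    # Compile from source rather than letting setup.py try to fetch a prebuilt wheel.
    env["FLASH_ATTENTION_FORCE_BUILD"] = "TRUE"
    if max_jobs is not None:
        # Caps parallel compile jobs (upstream suggests this for machines with limited RAM).
        env["MAX_JOBS"] = str(max_jobs)
    return env


def build_command(out_dir="dist"):
    """pip invocation that builds a wheel from the current directory into out_dir."""
    return [sys.executable, "-m", "pip", "wheel", ".",
            "--no-build-isolation", "--no-deps", "-w", out_dir]


if __name__ == "__main__":
    # Dry run: show what would be executed instead of starting a long compile.
    env = build_env()
    print(" ".join(build_command()))
    for key in ("TORCH_CUDA_ARCH_LIST", "NVCC_FLAGS", "FLASH_ATTENTION_FORCE_BUILD"):
        print(f"{key}={env[key]}")
```

From the checkout root, the actual build would then be `subprocess.run(build_command(), env=build_env(max_jobs=4), check=True)`.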

## License

[BSD-3-Clause](https://github.com/Dao-AILab/flash-attention/blob/main/LICENSE), matching upstream Flash Attention.

## Disclaimer

Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.