# Flash Attention Prebuilt Wheels for Windows
Prebuilt flash-attn wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support).
All wheels are built from the official Dao-AILab/flash-attention source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell.
## Available Wheels
| flash-attn | CUDA | PyTorch | Python | cxx11 ABI | File |
|---|---|---|---|---|---|
| 2.8.3 | 13.0 | 2.11.0 | 3.12 | True | flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl |
Want a combination that isn't here? Open a discussion on the Community tab.
## Install
Pick the wheel matching your environment and run:
```
pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>
```
Example:
```
pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl
```
## Picking the Right Wheel
The filename encodes everything you need to match:
```
flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
```

For the example wheel above: `VERSION` = 2.8.3, `CUDA` = 130, `TORCH` = 2.11.0, `ABI` = TRUE, `PY` = 312.
Verify your environment first:
```
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# e.g. "2.11.0+cu130 13.0" → use the cu130 + torch2.11.0 wheel
```
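If you'd rather not decode the filename by hand, a small helper like the following (illustrative only, not shipped with these wheels) prints the tags to look for:

```python
import sys
import torch

# Illustrative helper (not part of flash-attn): print the filename tags
# a compatible wheel should carry. Assumes a CUDA build of PyTorch.
torch_tag = torch.__version__.split("+")[0]                     # e.g. "2.11.0"
cuda_tag = "cu" + torch.version.cuda.replace(".", "")           # e.g. "cu130"
py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"  # e.g. "cp312"
print(f"Match: +{cuda_tag}torch{torch_tag} ... -{py_tag}-{py_tag}-win_amd64.whl")
```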
## GPU Support
Wheels are built with `TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0` unless noted otherwise, covering:
| Compute Capability | Architecture | Examples |
|---|---|---|
| 8.0 | Ampere | A100 |
| 8.6 | Ampere | RTX 30-series, A6000 |
| 8.9 | Ada Lovelace | RTX 40-series, L40S |
| 9.0 | Hopper | H100, H200 |
| 12.0 | Consumer Blackwell | RTX 5090, RTX Pro 6000 |
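To check where your card falls in this table, query its compute capability with PyTorch's standard API:

```python
import torch

# Report the compute capability of the current GPU,
# e.g. (8, 9) for an RTX 4090 or (12, 0) for an RTX 5090.
major, minor = torch.cuda.get_device_capability()
print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor}")
```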
## Verify Installation
```python
import torch
from flash_attn import flash_attn_func

# flash_attn_func expects (batch, seqlen, nheads, headdim) tensors
# in fp16/bf16 on the GPU.
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device='cuda')
print(flash_attn_func(q, q, q).shape)
# Expected: torch.Size([1, 128, 8, 64])
```
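For an extra numerical sanity check, you can compare the output against PyTorch's built-in attention. This sketch reuses `q` from above and transposes between flash-attn's (batch, seqlen, nheads, headdim) layout and SDPA's (batch, nheads, seqlen, headdim):

```python
import torch.nn.functional as F

# Reference result from PyTorch's scaled_dot_product_attention.
ref = F.scaled_dot_product_attention(
    q.transpose(1, 2), q.transpose(1, 2), q.transpose(1, 2)
).transpose(1, 2)
out = flash_attn_func(q, q, q)
print(torch.allclose(out, ref, atol=2e-3))  # fp16 tolerance; expected: True
```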
## Build Environment
- OS: Windows 11 (64-bit)
- Compiler: MSVC 14.44 (Visual Studio 2022 Community)
- CUDA Toolkit: matches the `cu*` tag in each wheel filename
- Source: official Dao-AILab/flash-attention release tag matching the wheel version
- No source patches; these are stock builds with the documented CUDA 13 + MSVC preprocessor flag (`NVCC_FLAGS=--compiler-options /Zc:preprocessor`)
## License
BSD-3-Clause, matching upstream Flash Attention.
## Disclaimer
Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.