Flash Attention Prebuilt Wheels for Windows

Prebuilt flash-attn wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support).

All wheels are built from the official Dao-AILab/flash-attention source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell.

Available Wheels

flash-attn   CUDA   PyTorch   Python   ABI    File
2.8.3        13.0   2.11.0    3.12     TRUE   flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl

Want a combination that isn't here? Open a discussion on the Community tab.

Install

Pick the wheel matching your environment and run:

pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>

Example:

pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl

Picking the Right Wheel

The filename encodes everything you need to match:

flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
           └─2.8.3─┘   └─130─┘   └─2.11.0─┘     └─TRUE─┘ └─312─┘

Verify your environment first:

python -c "import torch; print(torch.__version__, torch.version.cuda)"
# e.g. "2.11.0+cu130 13.0" β†’ use cu130 + torch2.11.0 wheel

GPU Support

Wheels are built with TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0 unless noted otherwise, covering:

Compute Capability   Architecture         Examples
8.0                  Ampere               A100
8.6                  Ampere               RTX 30-series, A6000
8.9                  Ada Lovelace         RTX 40-series, L40S
9.0                  Hopper               H100, H200
12.0                 Consumer Blackwell   RTX 5090, RTX Pro 6000
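
To check where your card falls, you can query its compute capability directly (a quick sketch; the set below mirrors the arch list above):

import torch

# Compute capabilities covered by the prebuilt kernels (mirrors TORCH_CUDA_ARCH_LIST above)
SUPPORTED = {(8, 0), (8, 6), (8, 9), (9, 0), (12, 0)}

major, minor = torch.cuda.get_device_capability(0)
status = "covered" if (major, minor) in SUPPORTED else "not in the prebuilt arch list"
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor} ({status})")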

Verify Installation

import torch
from flash_attn import flash_attn_func

q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device='cuda')  # (batch, seqlen, nheads, headdim)
print(flash_attn_func(q, q, q).shape)
# Expected: torch.Size([1, 8, 128, 64])
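
For a slightly stronger check, you can compare against PyTorch's built-in scaled_dot_product_attention. This is a rough numerical sanity test, not part of any official verification; the tolerance is just typical fp16 rounding noise:

import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func

torch.manual_seed(0)
# flash_attn_func takes (batch, seqlen, nheads, headdim)
q = torch.randn(2, 256, 8, 64, dtype=torch.float16, device='cuda')
k = torch.randn_like(q)
v = torch.randn_like(q)

out_fa = flash_attn_func(q, k, v)
# SDPA takes (batch, nheads, seqlen, headdim), so transpose in and out
out_ref = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

print("max abs diff:", (out_fa - out_ref).abs().max().item())  # typically on the order of 1e-3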

Build Environment

  • OS: Windows 11 (64-bit)
  • Compiler: MSVC 14.44 (Visual Studio 2022 Community)
  • CUDA Toolkit: matches the cu* tag in each wheel filename
  • Source: official Dao-AILab/flash-attention release tag matching the wheel version
  • No source patches: these are stock builds with the documented CUDA 13 + MSVC preprocessor flag (NVCC_FLAGS=--compiler-options /Zc:preprocessor); a reproduction sketch follows below.
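
For reference, a minimal reproduction sketch of such a build, assuming Visual Studio 2022 x64 tools, a matching CUDA Toolkit, and ninja are already on PATH (the exact pip invocation and version pin are illustrative):

import os
import subprocess

env = dict(os.environ)
env["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6;8.9;9.0;12.0"
env["NVCC_FLAGS"] = "--compiler-options /Zc:preprocessor"  # MSVC conformant preprocessor, needed for CUDA 13
env["FLASH_ATTENTION_FORCE_BUILD"] = "TRUE"                # build from source instead of fetching a prebuilt wheel

subprocess.run(
    ["pip", "wheel", "flash-attn==2.8.3", "--no-build-isolation", "--no-deps", "-w", "dist"],
    env=env, check=True,
)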

License

BSD-3-Clause, matching upstream Flash Attention.

Disclaimer

Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.
