Flash Attention Prebuilt Wheels for Windows

Prebuilt flash-attn wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support).

All wheels are built from the official Dao-AILab/flash-attention source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell.

Available Wheels

flash-attn   CUDA   PyTorch   Python   ABI    File
2.8.3        13.0   2.11.0    3.12     TRUE   flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl

Want a combination that isn't here? Open a discussion on the Community tab.

Install

Pick the wheel matching your environment and run:

pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>

Example:

pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl

Picking the Right Wheel

The filename encodes everything you need to match:

flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
           └─2.8.3─┘   └─130─┘   └─2.11.0─┘     └─TRUE─┘ └─312─┘

Verify your environment first:

python -c "import torch; print(torch.__version__, torch.version.cuda)"
# e.g. "2.11.0+cu130 13.0" β†’ use cu130 + torch2.11.0 wheel

GPU Support

Wheels are built with TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0 unless noted otherwise, covering:

Compute Capability   Architecture         Examples
8.0                  Ampere               A100
8.6                  Ampere               RTX 30-series, A6000
8.9                  Ada Lovelace         RTX 40-series, L40S
9.0                  Hopper               H100, H200
12.0                 Consumer Blackwell   RTX 5090, RTX Pro 6000
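
To check where your card falls, you can query its compute capability directly (a quick sketch; the set below mirrors the arch list above):

import torch

# Compute capabilities covered by the prebuilt kernels (mirrors TORCH_CUDA_ARCH_LIST above)
SUPPORTED = {(8, 0), (8, 6), (8, 9), (9, 0), (12, 0)}

major, minor = torch.cuda.get_device_capability(0)
status = "covered" if (major, minor) in SUPPORTED else "not in the prebuilt arch list"
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor} ({status})")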

Verify Installation

import torch
from flash_attn import flash_attn_func

q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device='cuda')  # (batch, seqlen, nheads, headdim)
print(flash_attn_func(q, q, q).shape)
# Expected: torch.Size([1, 8, 128, 64])
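
For a slightly stronger check, you can compare against PyTorch's built-in scaled_dot_product_attention. This is a rough numerical sanity test, not part of any official verification; the tolerance is just typical fp16 rounding noise:

import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func

torch.manual_seed(0)
# flash_attn_func takes (batch, seqlen, nheads, headdim)
q = torch.randn(2, 256, 8, 64, dtype=torch.float16, device='cuda')
k = torch.randn_like(q)
v = torch.randn_like(q)

out_fa = flash_attn_func(q, k, v)
# SDPA takes (batch, nheads, seqlen, headdim), so transpose in and out
out_ref = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

print("max abs diff:", (out_fa - out_ref).abs().max().item())  # typically on the order of 1e-3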

Build Environment

  • OS: Windows 11 (64-bit)
  • Compiler: MSVC 14.44 (Visual Studio 2022 Community)
  • CUDA Toolkit: matches the cu* tag in each wheel filename
  • Source: official Dao-AILab/flash-attention release tag matching the wheel version
  • No source patches: these are stock builds with the documented CUDA 13 + MSVC preprocessor flag (NVCC_FLAGS=--compiler-options /Zc:preprocessor); a reproduction sketch follows below.
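
For reference, a minimal reproduction sketch of such a build, assuming Visual Studio 2022 x64 tools, a matching CUDA Toolkit, and ninja are already on PATH (the exact pip invocation and version pin are illustrative):

import os
import subprocess

env = dict(os.environ)
env["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6;8.9;9.0;12.0"
env["NVCC_FLAGS"] = "--compiler-options /Zc:preprocessor"  # MSVC conformant preprocessor, needed for CUDA 13
env["FLASH_ATTENTION_FORCE_BUILD"] = "TRUE"                # build from source instead of fetching a prebuilt wheel

subprocess.run(
    ["pip", "wheel", "flash-attn==2.8.3", "--no-build-isolation", "--no-deps", "-w", "dist"],
    env=env, check=True,
)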

License

BSD-3-Clause, matching upstream Flash Attention.

Disclaimer

Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.
