	---
	license: bsd-3-clause
	tags:
	- flash-attention
	- flash-attn
	- windows
	- cuda
	- blackwell
	- rtx-5090
	- rtx-4090
	- prebuilt-wheels
	library_name: flash-attn
	---

	# Flash Attention Prebuilt Wheels for Windows

	Prebuilt `flash-attn` wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support).

	All wheels are built from the official [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell.

	## Available Wheels

	| flash-attn | CUDA | PyTorch | Python | ABI | File |
	|---|---|---|---|---|---|
	| 2.8.3 | 13.0 | 2.11.0 | 3.12 | True | [`flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl`](./flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl) |

	> Want a combination that isn't here? Open a discussion on the Community tab.

	## Install

	Pick the wheel matching your environment and run:

	```bash
	pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>
	```

	Example:

	```bash
	pip install https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl
	```

	## Picking the Right Wheel

	The filename encodes everything you need to match:

	```
	flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
	  VERSION = 2.8.3   CUDA = 130   TORCH = 2.11.0   ABI = TRUE   PY = 312
	```
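
As a rough illustration of the naming scheme above, the fields can be pulled out of a filename with a regular expression. This is a minimal sketch: the pattern, the group names, and the helper `parse_wheel_name` are hypothetical and only assume the layout shown in the template.

```python
import re

# Pattern mirrors the filename template above; group names are illustrative.
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<py>\d+)-cp(?P=py)-win_amd64\.whl"
)


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into its version tags (hypothetical helper)."""
    match = WHEEL_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


info = parse_wheel_name(
    "flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl"
)
print(info)
```

Each value comes back as a string (for example `cuda` is `"130"`, not `13.0`), so compare it against tags formatted the same way.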

	Verify your environment first:

	```bash
	python -c "import torch; print(torch.__version__, torch.version.cuda)"
	# e.g. "2.11.0+cu130 13.0" → use cu130 + torch2.11.0 wheel
	```
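
The two values printed above can be turned into the filename to look for. The sketch below is an assumption-laden helper, not part of this repository: `expected_wheel` is a hypothetical name, the `flash_version` and `abi` defaults are simply copied from the one wheel currently listed, and the CUDA tag is assumed to be the version with the dot removed (as in `13.0` → `cu130`).

```python
import sys

BASE_URL = "https://huggingface.co/Sumitc13/flash-attn-windows-wheels/resolve/main/"


def expected_wheel(torch_version, cuda_version, py=None,
                   flash_version="2.8.3", abi="TRUE"):
    """Build the wheel filename for an environment (hypothetical helper).

    torch_version: value of torch.__version__, e.g. "2.11.0+cu130"
    cuda_version:  value of torch.version.cuda, e.g. "13.0" (None on CPU builds)
    py:            (major, minor) tuple; defaults to the running interpreter
    """
    if cuda_version is None:
        raise ValueError("this PyTorch build has no CUDA support")
    if py is None:
        py = (sys.version_info.major, sys.version_info.minor)
    torch_tag = torch_version.split("+")[0]    # drop the "+cu130" local suffix
    cuda_tag = cuda_version.replace(".", "")   # "13.0" -> "130"
    py_tag = f"cp{py[0]}{py[1]}"
    return (f"flash_attn-{flash_version}+cu{cuda_tag}torch{torch_tag}"
            f"cxx11abi{abi}-{py_tag}-{py_tag}-win_amd64.whl")


name = expected_wheel("2.11.0+cu130", "13.0", py=(3, 12))
print(BASE_URL + name)
```

In a real environment you would pass `torch.__version__` and `torch.version.cuda` instead of the literals, then check that the resulting filename actually appears in the Available Wheels table before installing.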

	## GPU Support

	Wheels are built with `TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0` unless noted otherwise, covering:

	| Compute Capability | Architecture | Examples |
	|---|---|---|
	| 8.0 | Ampere | A100 |
	| 8.6 | Ampere | RTX 30-series, A6000 |
	| 8.9 | Ada Lovelace | RTX 40-series, L40S |
	| 9.0 | Hopper | H100, H200 |
	| 12.0 | Consumer Blackwell | RTX 5090, RTX Pro 6000 |
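
To check whether a given card falls in the table above, its compute capability can be compared against the build's architecture list. This is a minimal sketch with hypothetical helper names (`parse_arch_list`, `is_covered`); it uses a conservative exact-match test and assumes the default list quoted above.

```python
def parse_arch_list(arch_list: str) -> set:
    """Turn "8.0;8.6" into {(8, 0), (8, 6)} (hypothetical helper)."""
    return {
        tuple(int(part) for part in item.split("."))
        for item in arch_list.split(";")
        if item
    }


def is_covered(capability, arch_list="8.0;8.6;8.9;9.0;12.0") -> bool:
    """Return True if (major, minor) appears exactly in the arch list."""
    return tuple(capability) in parse_arch_list(arch_list)


print(is_covered((8, 9)))   # Ada Lovelace, e.g. RTX 40-series
print(is_covered((7, 5)))   # Turing, not in the list
```

On a machine with PyTorch and a GPU, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in.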

	## Verify Installation

	```python
	import torch
	from flash_attn import flash_attn_func

	# flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16 or bf16 on a CUDA device
	q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device='cuda')
	print(flash_attn_func(q, q, q).shape)
	# Expected: torch.Size([1, 8, 128, 64])
	```

	## Build Environment

	- OS: Windows 11 (64-bit)
	- Compiler: MSVC 14.44 (Visual Studio 2022 Community)
	- CUDA Toolkit: matches the `cu*` tag in each wheel filename
	- Source: official Dao-AILab/flash-attention release tag matching the wheel version
	- No source patches: these are stock builds that only add the documented CUDA 13 + MSVC preprocessor flag (`NVCC_FLAGS=--compiler-options /Zc:preprocessor`).

	## License

	[BSD-3-Clause](https://github.com/Dao-AILab/flash-attention/blob/main/LICENSE), matching upstream Flash Attention.

	## Disclaimer

	Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.