Update README.md
README.md
CHANGED
|
@@ -1,3 +1,105 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: bsd-3-clause
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: bsd-3-clause
|
| 3 |
+
tags:
|
| 4 |
+
- flash-attention
|
| 5 |
+
- flash-attn
|
| 6 |
+
- windows
|
| 7 |
+
- cuda
|
| 8 |
+
- blackwell
|
| 9 |
+
- rtx-5090
|
| 10 |
+
- rtx-4090
|
| 11 |
+
- prebuilt-wheels
|
| 12 |
+
library_name: flash-attn
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
# Flash Attention Prebuilt Wheels for Windows
|
| 16 |
+
|
| 17 |
+
Prebuilt `flash-attn` wheels for Windows, focused on combinations that aren't published anywhere else (newer PyTorch versions, CUDA 13, Python 3.12+, consumer Blackwell support).
|
| 18 |
+
|
| 19 |
+
All wheels are built from the official [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) source and bundle kernels for multiple GPU architectures, so a single wheel works across most NVIDIA cards from Ampere through consumer Blackwell.
|
| 20 |
+
|
| 21 |
+
## Available Wheels
|
| 22 |
+
|
| 23 |
+
| flash-attn | CUDA | PyTorch | Python | CXX11 ABI | File |
|
| 24 |
+
|---|---|---|---|---|---|
|
| 25 |
+
| 2.8.3 | 13.0 | 2.11.0 | 3.12 | True | [`flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl`](./flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl) |
|
| 26 |
+
|
| 27 |
+
> Want a combination that isn't here? Open a discussion on the **Community** tab.
|
| 28 |
+
|
| 29 |
+
## Install
|
| 30 |
+
|
| 31 |
+
Pick the wheel matching your environment and run:
|
| 32 |
+
|
| 33 |
+
```bash
|
| 34 |
+
pip install https://huggingface.co/YOUR_USERNAME/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
Example:
|
| 38 |
+
|
| 39 |
+
```bash
|
| 40 |
+
pip install https://huggingface.co/YOUR_USERNAME/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
## Picking the Right Wheel
|
| 44 |
+
|
| 45 |
+
The filename encodes everything you need to match:
|
| 46 |
+
|
| 47 |
+
```
|
| 48 |
+
flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
|
| 49 |
+
           └─2.8.3─┘   └─130─┘    └─2.11.0─┘     └─TRUE─┘ └─312─┘
|
| 50 |
+
```
|
| 51 |
+
|
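To check a filename programmatically, the template above can be turned into a regular expression. This is a minimal sketch, not part of the wheels or of `flash-attn`: the pattern, the group names, and the `parse_wheel_name` helper are all hypothetical and only mirror the naming scheme shown here.

```python
import re

# Hypothetical pattern mirroring the filename template above.
# Group names are illustrative, not an official convention.
WHEEL_RE = re.compile(
    r"^flash_attn-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<py>\d+)-cp(?P=py)-win_amd64\.whl$"
)


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into its version, CUDA, torch, ABI and Python tags."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel filename: {filename}")
    return match.groupdict()


if __name__ == "__main__":
    name = "flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl"
    print(parse_wheel_name(name))
```

The parsed `cuda`, `torch`, and `py` fields are the values to compare against your own environment before installing.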
| 52 |
+
Verify your environment first:
|
| 53 |
+
|
| 54 |
+
```bash
|
| 55 |
+
python -c "import torch; print(torch.__version__, torch.version.cuda)"
|
| 56 |
+
# e.g. "2.11.0+cu130 13.0" → use the cu130 + torch2.11.0 wheel
|
| 57 |
+
```
|
| 58 |
+
|
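The same check can be scripted so the tags come out in the form the filenames use. The sketch below assumes the version strings look like the example above (`2.11.0+cu130` and `13.0`); the `expected_wheel_tags` helper is hypothetical and covers only the torch, CUDA, and Python tags, not the ABI flag.

```python
import sys


def expected_wheel_tags(torch_version: str, cuda_version: str, py_version=None) -> dict:
    """Turn environment version strings into the tags used in the wheel filename."""
    if py_version is None:
        py_version = sys.version_info[:2]
    return {
        # "2.11.0+cu130" -> "torch2.11.0" (drop the local "+cu130" suffix)
        "torch": "torch" + torch_version.split("+")[0],
        # "13.0" -> "cu130"
        "cuda": "cu" + cuda_version.replace(".", ""),
        # (3, 12) -> "cp312"
        "python": f"cp{py_version[0]}{py_version[1]}",
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.version.cuda is None:
            # CPU-only PyTorch builds report no CUDA version.
            print("This PyTorch build has no CUDA support")
        else:
            print(expected_wheel_tags(torch.__version__, torch.version.cuda))
```

All three tags should appear in the filename of the wheel you pick.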
| 59 |
+
## GPU Support
|
| 60 |
+
|
| 61 |
+
Wheels are built with `TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0` unless noted otherwise, covering:
|
| 62 |
+
|
| 63 |
+
| Compute Capability | Architecture | Examples |
|
| 64 |
+
|---|---|---|
|
| 65 |
+
| 8.0 | Ampere | A100 |
|
| 66 |
+
| 8.6 | Ampere | RTX 30-series, A6000 |
|
| 67 |
+
| 8.9 | Ada Lovelace | RTX 40-series, L40S |
|
| 68 |
+
| 9.0 | Hopper | H100, H200 |
|
| 69 |
+
| 12.0 | Consumer Blackwell | **RTX 5090**, RTX PRO 6000 Blackwell |
|
| 70 |
+
|
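To see whether your GPU's compute capability is one of the listed targets, a short script can compare it against the arch list. This is a sketch under assumptions: `ARCH_LIST` is copied from the default stated above, the helper names are hypothetical, and the check is deliberately conservative in that it only reports exact matches with the table.

```python
# Default arch list stated above; wheels marked otherwise may differ.
ARCH_LIST = "8.0;8.6;8.9;9.0;12.0"


def parse_arch_list(arch_list: str) -> set:
    """Convert "8.0;8.6" into {(8, 0), (8, 6)}."""
    archs = set()
    for entry in arch_list.split(";"):
        major, minor = entry.strip().split(".")
        archs.add((int(major), int(minor)))
    return archs


def has_matching_kernels(capability, arch_list: str = ARCH_LIST) -> bool:
    """Return True if the (major, minor) capability is listed exactly."""
    return tuple(capability) in parse_arch_list(arch_list)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(cap, "listed" if has_matching_kernels(cap) else "not listed")
        else:
            print("No CUDA device visible")
```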
| 71 |
+
## Verify Installation
|
| 72 |
+
|
| 73 |
+
```python
|
| 74 |
+
import torch
|
| 75 |
+
from flash_attn import flash_attn_func
|
| 76 |
+
|
| 77 |
+
q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device='cuda')  # (batch, seqlen, nheads, headdim)
|
| 78 |
+
print(flash_attn_func(q, q, q).shape)
|
| 79 |
+
# Expected: torch.Size([1, 8, 128, 64])
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
## Notes on Flash Attention 3 / 4 for RTX 5090
|
| 83 |
+
|
| 84 |
+
Neither FA3 nor FA4 will run on consumer Blackwell GPUs (sm_120):
|
| 85 |
+
|
| 86 |
+
- **FA3** is Hopper-only (sm_90), built around hardware features the RTX 5090 doesn't have.
|
| 87 |
+
- **FA4** requires the TMEM (Tensor Memory) subsystem present only on datacenter Blackwell (sm_100/sm_103, e.g., B200/B300). Consumer Blackwell (sm_120) lacks TMEM.
|
| 88 |
+
|
| 89 |
+
For RTX 5090 and other sm_120 cards, **Flash Attention 2 is the correct target**. If FA3/FA4 ever gain sm_120 support, wheels will be added here.
|
| 90 |
+
|
| 91 |
+
## Build Environment
|
| 92 |
+
|
| 93 |
+
- **OS**: Windows 11 (64-bit)
|
| 94 |
+
- **Compiler**: MSVC 14.44 (Visual Studio 2022 Community)
|
| 95 |
+
- **CUDA Toolkit**: matches the `cu*` tag in each wheel filename
|
| 96 |
+
- **Source**: official Dao-AILab/flash-attention release tag matching the wheel version
|
| 97 |
+
- **No source patches**: these are stock builds with the documented CUDA 13 + MSVC preprocessor flag (`NVCC_FLAGS=--compiler-options /Zc:preprocessor`).
|
| 98 |
+
|
| 99 |
+
## License
|
| 100 |
+
|
| 101 |
+
[BSD-3-Clause](https://github.com/Dao-AILab/flash-attention/blob/main/LICENSE), matching upstream Flash Attention.
|
| 102 |
+
|
| 103 |
+
## Disclaimer
|
| 104 |
+
|
| 105 |
+
Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.
|