Sumitc13 committed on
Commit f850e79 · verified · 1 parent: a7a2097

Update README.md

Files changed (1): README.md +105 -3
README.md CHANGED
---
license: bsd-3-clause
tags:
- flash-attention
- flash-attn
- windows
- cuda
- blackwell
- rtx-5090
- rtx-4090
- prebuilt-wheels
library_name: flash-attn
---

# Flash Attention Prebuilt Wheels for Windows

Prebuilt `flash-attn` wheels for Windows, focused on combinations that are hard or impossible to find elsewhere: newer PyTorch versions, CUDA 13, Python 3.12+, and consumer Blackwell support.

All wheels are built from the official [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) source and bundle kernels for multiple GPU architectures, so a single wheel covers NVIDIA Ampere, Ada Lovelace, Hopper, and consumer Blackwell cards (see GPU Support below for the exact list).

## Available Wheels

| flash-attn | CUDA | PyTorch | Python | C++11 ABI | File |
|---|---|---|---|---|---|
| 2.8.3 | 13.0 | 2.11.0 | 3.12 | True | [`flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl`](./flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl) |

> Want a combination that isn't listed here? Open a discussion on the **Community** tab.


## Install

First install the PyTorch build the wheel was compiled against (same PyTorch version and CUDA tag), then pick the matching wheel from the table above and run:

```bash
pip install https://huggingface.co/YOUR_USERNAME/flash-attn-windows-wheels/resolve/main/<WHEEL_FILENAME>
```

Example:

```bash
pip install https://huggingface.co/YOUR_USERNAME/flash-attn-windows-wheels/resolve/main/flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl
```
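
If you script installs, running pip through the interpreter that already has PyTorch (`python -m pip`) avoids installing into a different Python, which can easily happen on Windows machines with several Pythons on `PATH`. A minimal sketch; `wheel_url` and `install_wheel` are hypothetical helper names, and `REPO_ID` keeps the `YOUR_USERNAME` placeholder from the commands above.

```python
import subprocess
import sys

# Placeholder repo id copied from the README commands; substitute the real one.
REPO_ID = "YOUR_USERNAME/flash-attn-windows-wheels"


def wheel_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build a Hugging Face 'resolve' URL for a file in the repo (same form as above)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def install_wheel(filename: str) -> None:
    """Run pip with the current interpreter so the wheel lands next to its PyTorch."""
    subprocess.run([sys.executable, "-m", "pip", "install", wheel_url(filename)], check=True)


if __name__ == "__main__":
    # Only prints the URL; call install_wheel(...) yourself to actually install.
    print(wheel_url("flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl"))
```

Usage: `install_wheel("<WHEEL_FILENAME>")` with a filename from the table, once `REPO_ID` points at the real repository.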
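
## Picking the Right Wheel

The filename encodes everything you need to match:

```
flash_attn-{VERSION}+cu{CUDA}torch{TORCH}cxx11abi{ABI}-cp{PY}-cp{PY}-win_amd64.whl
flash_attn-2.8.3+cu130torch2.11.0cxx11abiTRUE-cp312-cp312-win_amd64.whl
```

- `VERSION`: flash-attn release (`2.8.3`)
- `CUDA`: CUDA version without the dot (`130` = CUDA 13.0)
- `TORCH`: PyTorch version (`2.11.0`)
- `ABI`: C++11 ABI flag (`TRUE`)
- `PY`: CPython version without the dot (`312` = Python 3.12)

When deciding whether a wheel fits, pip only checks the `cp312` and `win_amd64` tags. It does not interpret the CUDA or PyTorch part of the filename, so a mismatched wheel can install cleanly and then fail when `flash_attn` is imported.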
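
To check a filename programmatically, the template can be turned into a regular expression. This is a sketch that assumes every wheel in this repo follows exactly the template shown; `parse_wheel_name` is a hypothetical helper, not part of `flash-attn` or pip.

```python
import re

# Assumes the naming template documented above; not a general wheel-filename parser.
WHEEL_RE = re.compile(
    r"^flash_attn-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<py>\d+)-cp(?P=py)-win_amd64\.whl$"  # both cp tags must be identical
)


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into the fields listed in the template."""
    m = WHEEL_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    fields = m.groupdict()
    # "130" -> "13.0" (CUDA minor is the last digit); "312" -> "3.12" (Python major is the first).
    fields["cuda"] = f"{fields['cuda'][:-1]}.{fields['cuda'][-1]}"
    fields["py"] = f"{fields['py'][0]}.{fields['py'][1:]}"
    return fields
```

The digit-splitting rules for `cuda` and `py` are assumptions that hold for current CUDA (single-digit minor) and CPython (single-digit major) versions.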
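
Verify your environment first:

```bash
python -c "import sys, torch; print(sys.version_info[:2], torch.__version__, torch.version.cuda)"
# e.g. "(3, 12) 2.11.0+cu130 13.0" → use the cp312 + cu130 + torch2.11.0 wheel
```

If the last value prints `None`, you have a CPU-only PyTorch build and need to install a CUDA build first.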
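
The same check can be automated by rebuilding the expected filename from the running environment and comparing it with the table. A sketch, assuming the naming convention above holds for every wheel and that the ABI flag is `TRUE` as in the current wheel; `expected_wheel_name` is a hypothetical helper.

```python
import sys


def expected_wheel_name(fa_version: str, torch_version: str, cuda_version: str,
                        py: tuple, abi: str = "TRUE") -> str:
    """Rebuild the wheel filename this repo's naming scheme would use."""
    torch_base = torch_version.split("+")[0]  # "2.11.0+cu130" -> "2.11.0"
    cuda_tag = cuda_version.replace(".", "")  # "13.0" -> "130"
    py_tag = f"{py[0]}{py[1]}"                # (3, 12) -> "312"
    return (f"flash_attn-{fa_version}+cu{cuda_tag}torch{torch_base}"
            f"cxx11abi{abi}-cp{py_tag}-cp{py_tag}-win_amd64.whl")


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; install a CUDA build first.")
        return
    if torch.version.cuda is None:
        print("This is a CPU-only PyTorch build; install a CUDA build first.")
        return
    print(expected_wheel_name("2.8.3", torch.__version__, torch.version.cuda, sys.version_info[:2]))


if __name__ == "__main__":
    main()
```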

## GPU Support

Wheels are built with `TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0;12.0` unless noted otherwise, covering:

| Compute Capability | Architecture | Examples |
|---|---|---|
| 8.0 | Ampere | A100 |
| 8.6 | Ampere | RTX 30-series, RTX A6000 |
| 8.9 | Ada Lovelace | RTX 40-series, L40S |
| 9.0 | Hopper | H100, H200 |
| 12.0 | Consumer Blackwell | **RTX 5090**, RTX PRO 6000 Blackwell |

Datacenter Blackwell (compute capability 10.0, e.g. B200) is not in this list.
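
To check your own card, compare its compute capability with the list above. The sketch below applies the CUDA binary-compatibility rule (machine code built for X.y runs on X.z when z ≥ y, within the same major version) and assumes the wheel ships machine code only for the listed architectures with no PTX fallback, which the build notes do not state either way. `BUILT_ARCHS` and `can_run` are hypothetical names. It only tells you whether compatible machine code should be present; it does not replace the Verify Installation test.

```python
# Architectures from TORCH_CUDA_ARCH_LIST above, as (major, minor) pairs.
BUILT_ARCHS = [(8, 0), (8, 6), (8, 9), (9, 0), (12, 0)]


def can_run(capability: tuple) -> bool:
    """True if some built architecture is binary-compatible with this device.

    CUDA rule: machine code for X.y runs on X.z only when z >= y (same major X).
    Assumes no PTX is embedded, so there is no JIT fallback for other majors.
    """
    major, minor = capability
    return any(b_major == major and b_minor <= minor for b_major, b_minor in BUILT_ARCHS)


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch.")
        return
    cap = torch.cuda.get_device_capability()  # (major, minor) of the current device
    name = torch.cuda.get_device_name()
    verdict = "covered" if can_run(cap) else "not covered by this wheel"
    print(f"{name}: sm_{cap[0]}{cap[1]} -> {verdict}")


if __name__ == "__main__":
    main()
```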
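
## Verify Installation

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func

# flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16/bf16 on a CUDA device
q, k, v = (torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda") for _ in range(3))
out = flash_attn_func(q, k, v)
print(out.shape)  # Expected: torch.Size([1, 128, 8, 64])

# Cross-check against PyTorch SDPA, which expects (batch, nheads, seqlen, headdim)
ref = F.scaled_dot_product_attention(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)).transpose(1, 2)
print(torch.allclose(out, ref, atol=1e-2, rtol=1e-2))  # Expected: True
```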

## Notes on Flash Attention 3 / 4 for RTX 5090

Neither FA3 nor FA4 will run on consumer Blackwell GPUs (sm_120):

- **FA3** is Hopper-only (sm_90), built around hardware features the RTX 5090 doesn't have.
- **FA4** requires the TMEM (Tensor Memory) subsystem present only on datacenter Blackwell (sm_100/sm_103, e.g., B200/B300). Consumer Blackwell (sm_120) lacks TMEM.

For RTX 5090 and other sm_120 cards, **Flash Attention 2 is the correct target**. If FA3/FA4 ever gain sm_120 support, wheels will be added here.

## Build Environment

- **OS**: Windows 11 (64-bit)
- **Compiler**: MSVC 14.44 (Visual Studio 2022 Community)
- **CUDA Toolkit**: matches the `cu*` tag in each wheel filename
- **Source**: official Dao-AILab/flash-attention release tag matching the wheel version
- **No source patches**: these are stock builds, compiled with the documented CUDA 13 + MSVC preprocessor flag (`NVCC_FLAGS=--compiler-options /Zc:preprocessor`).

## License

[BSD-3-Clause](https://github.com/Dao-AILab/flash-attention/blob/main/LICENSE), matching upstream Flash Attention.

## Disclaimer

Unofficial community builds provided as-is with no warranty. Not affiliated with Dao-AILab or NVIDIA.