zbrl committed on
Commit 301ce4a · verified · 1 Parent(s): 17e2275

Upload flash-attn3/README.md with huggingface_hub

Files changed (1)
  1. flash-attn3/README.md +35 -0
flash-attn3/README.md ADDED
@@ -0,0 +1,35 @@
+ # flash-attn3
+
+ Flash Attention 3 wheel compiled from [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) (`hopper/` directory).
+
+ ## Build environment
+
+ | Component | Version |
+ |-----------|---------|
+ | Base image | NGC 25.11 |
+ | CUDA | 13.0 |
+ | PyTorch | 2.10 |
+ | Python | 3.12 |
+ | GPU target | NVIDIA H200 (SM90) |
+
+ ## Wheel naming
+
+ `flash_attn_3-3.0.0-cp39-abi3-linux_x86_64.whl`
+
+ - `cp39-abi3` = Python **Stable ABI**, compatible with Python >= 3.9 (including 3.12).
+ - CUDA and PyTorch versions are linked at compile time and **not** encoded in the filename.
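
The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the committed README: the helpers `parse_wheel_filename` and `is_abi3_compatible` are hypothetical names written for this illustration, and they only split the filename on `-` in the standard wheel layout (name, version, optional build tag, Python tag, ABI tag, platform tag). The third-party `packaging` library offers a more thorough parser if it is available.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": py_tag,
        "abi_tag": abi_tag,
        "platform_tag": plat_tag,
    }


def is_abi3_compatible(tags: dict, py_version: tuple) -> bool:
    """Return True if a cp3X-abi3 wheel targets this CPython version or older.

    Only the Python and ABI tags are inspected; the platform tag, CUDA and
    PyTorch versions are NOT checked here.
    """
    py_tag = tags["python_tag"]
    if tags["abi_tag"] != "abi3" or not py_tag.startswith("cp3"):
        return False
    minimum = (3, int(py_tag[3:]))  # "cp39" -> (3, 9)
    return tuple(py_version[:2]) >= minimum


tags = parse_wheel_filename("flash_attn_3-3.0.0-cp39-abi3-linux_x86_64.whl")
print(tags["python_tag"], tags["abi_tag"], tags["platform_tag"])
print(is_abi3_compatible(tags, (3, 12)))
```

Passing the filename check says nothing about CUDA or PyTorch compatibility, since neither appears in the filename; those still have to match the build environment table.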
21
+
22
+ ## Usage
23
+
24
+ ```python
25
+ from flash_attn_interface import flash_attn_func
26
+ ```
27
+
28
+ In HuggingFace Transformers:
29
+
30
+ ```python
31
+ model = AutoModel.from_pretrained(
32
+ model_id,
33
+ attn_implementation="flash_attention_3",
34
+ )
35
+ ```
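
Because the wheel only loads where it is installed and matches the runtime, a caller may want to fall back to another attention backend when `flash_attn_interface` is missing. The sketch below is an assumption-laden illustration, not part of the committed README: `pick_attn_implementation` is a hypothetical helper, and `"sdpa"` is assumed to be an acceptable fallback for the model in question. It only checks that the module can be found, not that the GPU is a Hopper part or that the CUDA and PyTorch versions match the build.

```python
import importlib.util


def pick_attn_implementation(fallback: str = "sdpa") -> str:
    """Return "flash_attention_3" if flash_attn_interface is installed.

    Hypothetical helper: it does not verify GPU architecture or that the
    installed wheel matches the running CUDA/PyTorch versions.
    """
    if importlib.util.find_spec("flash_attn_interface") is not None:
        return "flash_attention_3"
    return fallback


attn_impl = pick_attn_implementation()
print(attn_impl)

# Intended use (model_id is a placeholder, as in the README):
# model = AutoModel.from_pretrained(model_id, attn_implementation=attn_impl)
```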