zbrl/whl

zbrl committed on Apr 10

Commit

301ce4a

1 Parent(s): 17e2275

Upload flash-attn3/README.md with huggingface_hub

Files changed (1)

flash-attn3/README.md ADDED

+# flash-attn3
+Flash Attention 3 wheel compiled from [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) (`hopper/` directory).
+## Build environment
+| Component | Version |
+|-----------|---------|
+| Base image | NGC 25.11 |
+| CUDA | 13.0 |
+| PyTorch | 2.10 |
+| Python | 3.12 |
+| GPU target | NVIDIA H200 (SM90) |
+## Wheel naming
+`flash_attn_3-3.0.0-cp39-abi3-linux_x86_64.whl`
+- `cp39-abi3` = Python **Stable ABI**, compatible with Python >= 3.9 (including 3.12).
+- CUDA and PyTorch versions are linked at compile time and **not** encoded in the filename.
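The naming scheme above can be checked mechanically. The following is a minimal sketch, using only the standard library, that splits a wheel filename into its tags and derives the minimum CPython version from the `cpXY` tag. The helper names `split_wheel_filename` and `min_cpython` are hypothetical and are not part of this repository or of flash-attn.

```python
def split_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its dash-separated components.

    Layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def min_cpython(python_tag: str) -> tuple:
    """Turn a tag such as 'cp39' or 'cp312' into (major, minor)."""
    if not python_tag.startswith("cp") or not python_tag[2:].isdigit():
        raise ValueError(f"not a CPython tag: {python_tag!r}")
    digits = python_tag[2:]
    # The first digit is the major version, the remainder is the minor version.
    return int(digits[0]), int(digits[1:])


tags = split_wheel_filename("flash_attn_3-3.0.0-cp39-abi3-linux_x86_64.whl")
print(tags["python_tag"], tags["abi_tag"], tags["platform_tag"])
print(min_cpython(tags["python_tag"]))
```

With an `abi3` wheel, an interpreter qualifies when its version is at least the tuple returned by `min_cpython`, which can be compared against `sys.version_info[:2]`. Nothing in the filename says which CUDA or PyTorch version the wheel was built against, so that still has to be checked against the build environment table.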
+## Usage
+```python
+from flash_attn_interface import flash_attn_func
+```
+In HuggingFace Transformers:
+```python
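+from transformers import AutoModel
+
+# model_id: a Hub repo ID or local path for the model to load
+model = AutoModel.from_pretrained(
+    model_id,
+    attn_implementation="flash_attention_3",
+)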
+```
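If the wheel may not be installed on every machine, the `attn_implementation` value can be chosen at runtime. The sketch below assumes that the presence of the `flash_attn_interface` module is a good enough signal that the wheel is usable, and that falling back to Transformers' `"sdpa"` implementation is acceptable. The helper name `pick_attn_implementation` is hypothetical.

```python
from importlib.util import find_spec


def pick_attn_implementation(module_name: str = "flash_attn_interface") -> str:
    """Return "flash_attention_3" if the module can be found, else "sdpa".

    find_spec only locates the module; it does not import it, so this check
    does not prove that the compiled extension loads on the current GPU.
    """
    if find_spec(module_name) is not None:
        return "flash_attention_3"
    return "sdpa"  # assumed fallback, not prescribed by the README


print(pick_attn_implementation())
```

The returned string can be passed directly as `attn_implementation=pick_attn_implementation()` in the `from_pretrained` call shown above.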