---
license: apache-2.0
tags:
- prebuilt-wheels
- cuda
- triton
- sageattention
- pytorch
language:
- en
---

# Prebuilt CUDA Wheels — Triton 3.6.0 & SageAttention 2.2.0

Pre-compiled Python wheels for **Linux x86_64**, built against **CUDA 12.8** with **Python 3.12**.

No compilation needed — just `pip install` the `.whl` file matching your setup.

## Available Wheels

### Triton 3.6.0

| Wheel | Size | PyTorch | GPU |
|---|---|---|---|
| `triton-3.6.0-cp312-cp312-linux_x86_64.whl` | 339 MB | Any | All |

Triton is **PyTorch-version independent** — one wheel works with both PyTorch 2.7 and 2.10.

### SageAttention 2.2.0

| Wheel | Size | PyTorch | GPU Arch |
|---|---|---|---|
| `sageattention-2.2.0+cu128torch2.10.0sm90-…` | 21.1 MB | 2.10.0 | Hopper (sm90) |
| `sageattention-2.2.0+cu128torch2.10.0sm120-…` | 15.6 MB | 2.10.0 | Blackwell (sm120) |
| `sageattention-2.2.0+cu128torch2.7.0sm90-…` | 20.2 MB | 2.7.0 | Hopper (sm90) |
| `sageattention-2.2.0+cu128torch2.7.0sm120-…` | 14.9 MB | 2.7.0 | Blackwell (sm120) |

> **Pick the wheel matching your PyTorch version AND GPU architecture.**

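Each SageAttention filename encodes its constraints in the local version segment (`+cu128torch…sm…`). As a sanity check before downloading, a small helper (hypothetical, not shipped with these wheels) can parse that segment:

```python
import re

def parse_sage_wheel(filename: str) -> dict:
    """Extract the CUDA, PyTorch, and GPU-arch tags from a SageAttention wheel name."""
    # e.g. sageattention-2.2.0+cu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl
    m = re.match(
        r"sageattention-(?P<version>[\d.]+)\+cu(?P<cuda>\d+)"
        r"torch(?P<torch>[\d.]+?)sm(?P<sm>\d+)-",
        filename,
    )
    if m is None:
        raise ValueError(f"not a recognized SageAttention wheel name: {filename}")
    return {
        "version": m.group("version"),
        "cuda": m.group("cuda"),
        "torch": m.group("torch"),
        "sm": "sm" + m.group("sm"),
    }

print(parse_sage_wheel(
    "sageattention-2.2.0+cu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl"
))
# prints {'version': '2.2.0', 'cuda': '128', 'torch': '2.7.0', 'sm': 'sm90'}
```

Compare the `torch` and `sm` fields against your installed PyTorch version and your GPU before installing.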
## Quick Install

```bash
# Install Triton
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/triton-3.6.0-cp312-cp312-linux_x86_64.whl

# Install SageAttention — pick ONE matching your setup:

# PyTorch 2.10 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.10 + Blackwell (RTX 5090, RTX 5080, other RTX 50-series)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm120-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Blackwell (RTX 5090, RTX 5080, other RTX 50-series)
pip install https://huggingface.co/yo9otatatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm120-cp312-cp312-linux_x86_64.whl
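After installing, you can confirm which builds pip actually resolved without touching the GPU. A minimal check (the function name and return format here are my own, not part of any of these packages):

```python
from importlib import metadata

def installed_versions(packages):
    """Return the installed version of each package, or None if it is absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

# A None value means the package did not install into this environment.
print(installed_versions(["triton", "sageattention", "torch"]))
```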
## Requirements

- **OS**: Linux x86_64
- **Python**: 3.12
- **CUDA**: 12.8
- **PyTorch**: 2.7.0 or 2.10.0 (match the wheel)

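These constraints can also be checked programmatically before attempting an install. A small sketch (the helper and its message format are assumptions, not part of the wheels):

```python
import platform
import sys

def compat_issues(system, machine, py_version):
    """Return a list of mismatches against the wheel requirements above."""
    issues = []
    if (system, machine) != ("Linux", "x86_64"):
        issues.append(f"wheels target Linux x86_64, found {system} {machine}")
    if tuple(py_version) != (3, 12):
        issues.append(
            f"wheels target Python 3.12, found {py_version[0]}.{py_version[1]}"
        )
    return issues

# Check the current interpreter; an empty list means this machine matches.
print(compat_issues(platform.system(), platform.machine(), sys.version_info[:2]))
```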
## Which GPU wheel do I need?

| GPU | Architecture | Wheel suffix |
|---|---|---|
| H100, H200 | Hopper | `sm90` |
| RTX 5090, RTX 5080, other RTX 50-series | Blackwell | `sm120` |

Note: `sm120` is compute capability 12.0, i.e. consumer (GeForce) Blackwell. Datacenter Blackwell parts (B100, B200, GB200) are `sm100` and are not covered by these wheels.

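At runtime, `torch.cuda.get_device_capability()` returns the compute capability as a `(major, minor)` pair. The mapping to the table above can be sketched as (the helper itself is hypothetical):

```python
def wheel_suffix(major: int, minor: int) -> str:
    """Map a CUDA compute capability to the matching prebuilt wheel suffix."""
    suffixes = {
        (9, 0): "sm90",    # Hopper: H100, H200
        (12, 0): "sm120",  # Blackwell (consumer): RTX 50-series
    }
    try:
        return suffixes[(major, minor)]
    except KeyError:
        raise ValueError(
            f"no prebuilt wheel for compute capability {major}.{minor}"
        ) from None

print(wheel_suffix(9, 0))  # prints "sm90"
```

On a machine with a working CUDA setup, feed it `*torch.cuda.get_device_capability()` directly.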
## Build Info

- Built from source in a Docker container (`nvidia/cuda:12.8.0-devel-ubuntu22.04`)
- SageAttention source: [SageAttention v2.2.0](https://github.com/thu-ml/SageAttention)
- Triton source: [Triton v3.6.0](https://github.com/triton-lang/triton)
- Split-arch build policy: each SageAttention wheel targets exactly one GPU architecture

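A split-arch build of this kind might look roughly as follows inside the stated container. This is a sketch under assumptions, not the author's recorded build script: `TORCH_CUDA_ARCH_LIST` is the standard knob PyTorch's extension builder reads to restrict target architectures, but SageAttention's own `setup.py` may select architectures differently.

```shell
# Inside nvidia/cuda:12.8.0-devel-ubuntu22.04, with Python 3.12 and the
# matching PyTorch (2.7.0 or 2.10.0) already installed:
git clone --branch v2.2.0 https://github.com/thu-ml/SageAttention
cd SageAttention

# Split-arch policy: compile kernels for exactly one compute capability
# per wheel, producing one sm90 wheel and one sm120 wheel.
TORCH_CUDA_ARCH_LIST="9.0"  python setup.py bdist_wheel   # Hopper (sm90)
TORCH_CUDA_ARCH_LIST="12.0" python setup.py bdist_wheel   # Blackwell (sm120)
```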
## License

- Triton: MIT License
- SageAttention: Apache 2.0 License