Prebuilt CUDA Wheels — Triton 3.6.0 & SageAttention 2.2.0

Pre-compiled Python wheels for Linux x86_64, built against CUDA 12.8 with Python 3.12.

No compilation needed — just pip install the .whl file matching your setup.

Available Wheels

Triton 3.6.0

Wheel                                       Size    PyTorch  GPU
triton-3.6.0-cp312-cp312-linux_x86_64.whl   339 MB  Any      All

Triton is PyTorch-version independent — one wheel works with both PyTorch 2.7 and 2.10.

SageAttention 2.2.0

Wheel                                        Size     PyTorch  GPU Arch
sageattention-2.2.0+cu128torch2.10.0sm90-…   21.1 MB  2.10.0   Hopper (sm90)
sageattention-2.2.0+cu128torch2.10.0sm120-…  15.6 MB  2.10.0   Blackwell (sm120)
sageattention-2.2.0+cu128torch2.7.0sm90-…    20.2 MB  2.7.0    Hopper (sm90)
sageattention-2.2.0+cu128torch2.7.0sm120-…   14.9 MB  2.7.0    Blackwell (sm120)

Pick the wheel matching your PyTorch version AND GPU architecture.
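The selection rule above can be sketched as a small helper. This is a hypothetical convenience script, not part of either package: `sageattention_wheel` and `ARCH_BY_CAPABILITY` are names invented here. On the target machine the two inputs would come from `torch.__version__` and `torch.cuda.get_device_capability()`; the capability-to-suffix mapping follows the arch table in this card.

```python
# Hypothetical helper: build the SageAttention wheel filename for this repo
# from the installed PyTorch version and the GPU's compute capability.
# Capability tuples per this card's arch table (Hopper = sm90, Blackwell = sm120).
ARCH_BY_CAPABILITY = {
    (9, 0): "sm90",    # Hopper: H100, H200
    (12, 0): "sm120",  # Blackwell: B100, B200, GB200
}

def sageattention_wheel(torch_version: str, capability: tuple[int, int]) -> str:
    """Return the wheel filename matching a PyTorch version and compute capability."""
    if torch_version not in ("2.7.0", "2.10.0"):
        raise ValueError(f"no prebuilt wheel for PyTorch {torch_version}")
    arch = ARCH_BY_CAPABILITY.get(capability)
    if arch is None:
        raise ValueError(f"no prebuilt wheel for compute capability {capability}")
    return (f"sageattention-2.2.0+cu128torch{torch_version}{arch}"
            "-cp312-cp312-linux_x86_64.whl")

# On the target machine:
#   version = torch.__version__.split("+")[0]
#   capability = torch.cuda.get_device_capability()
print(sageattention_wheel("2.10.0", (9, 0)))
# -> sageattention-2.2.0+cu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl
```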

Quick Install

# Install Triton
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/triton-3.6.0-cp312-cp312-linux_x86_64.whl

# Install SageAttention — pick ONE matching your setup:

# PyTorch 2.10 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.10 + Blackwell (B100, B200, GB200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm120-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Blackwell (B100, B200, GB200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm120-cp312-cp312-linux_x86_64.whl
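After installing, a quick sanity check confirms the packages are importable. This is a generic sketch using only the standard library (`installed` is a name invented here); it only checks that the import system can find each package, not that the CUDA kernels load.

```python
# Minimal post-install check: verify each package is discoverable by the
# import system without actually importing it (so a broken extension
# module doesn't crash the check).
import importlib.util

def installed(pkg: str) -> bool:
    """Return True if the import system can locate the top-level package."""
    return importlib.util.find_spec(pkg) is not None

for pkg in ("torch", "triton", "sageattention"):
    print(f"{pkg}: {'found' if installed(pkg) else 'MISSING'}")
```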

Requirements

  • OS: Linux x86_64
  • Python: 3.12
  • CUDA: 12.8
  • PyTorch: 2.7.0 or 2.10.0 (match the wheel)

Which GPU wheel do I need?

GPU                Architecture  Wheel suffix
H100, H200         Hopper        sm90
B100, B200, GB200  Blackwell     sm120

Build Info

  • Built from source in a Docker container (nvidia/cuda:12.8.0-devel-ubuntu22.04)
  • SageAttention source: SageAttention v2.2.0
  • Triton source: Triton v3.6.0
  • Split-arch build policy: each SageAttention wheel targets exactly one GPU architecture

License

  • Triton: MIT License
  • SageAttention: Apache 2.0 License