Prebuilt CUDA Wheels — Triton 3.6.0 & SageAttention 2.2.0

Pre-compiled Python wheels for Linux x86_64, built against CUDA 12.8 with Python 3.12.

No compilation needed — just pip install the .whl file matching your setup.

Available Wheels

Triton 3.6.0

Wheel                                       Size    PyTorch  GPU
triton-3.6.0-cp312-cp312-linux_x86_64.whl   339 MB  Any      All

Triton is PyTorch-version independent — one wheel works with both PyTorch 2.7 and 2.10.

SageAttention 2.2.0

Wheel                                        Size     PyTorch  GPU Arch
sageattention-2.2.0+cu128torch2.10.0sm90-…   21.1 MB  2.10.0   Hopper (sm90)
sageattention-2.2.0+cu128torch2.10.0sm120-…  15.6 MB  2.10.0   Blackwell (sm120)
sageattention-2.2.0+cu128torch2.7.0sm90-…    20.2 MB  2.7.0    Hopper (sm90)
sageattention-2.2.0+cu128torch2.7.0sm120-…   14.9 MB  2.7.0    Blackwell (sm120)

Pick the wheel matching your PyTorch version AND GPU architecture.
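The selection rule above can be sketched as a small helper. This is a hypothetical convenience script, not part of either package: `sageattention_wheel` and `ARCH_BY_CAPABILITY` are names invented here. On the target machine the two inputs would come from `torch.__version__` and `torch.cuda.get_device_capability()`; the capability-to-suffix mapping follows the arch table in this card.

```python
# Hypothetical helper: build the SageAttention wheel filename for this repo
# from the installed PyTorch version and the GPU's compute capability.
# Capability tuples per this card's arch table (Hopper = sm90, Blackwell = sm120).
ARCH_BY_CAPABILITY = {
    (9, 0): "sm90",    # Hopper: H100, H200
    (12, 0): "sm120",  # Blackwell: B100, B200, GB200
}

def sageattention_wheel(torch_version: str, capability: tuple[int, int]) -> str:
    """Return the wheel filename matching a PyTorch version and compute capability."""
    if torch_version not in ("2.7.0", "2.10.0"):
        raise ValueError(f"no prebuilt wheel for PyTorch {torch_version}")
    arch = ARCH_BY_CAPABILITY.get(capability)
    if arch is None:
        raise ValueError(f"no prebuilt wheel for compute capability {capability}")
    return (f"sageattention-2.2.0+cu128torch{torch_version}{arch}"
            "-cp312-cp312-linux_x86_64.whl")

# On the target machine:
#   version = torch.__version__.split("+")[0]
#   capability = torch.cuda.get_device_capability()
print(sageattention_wheel("2.10.0", (9, 0)))
# -> sageattention-2.2.0+cu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl
```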

Quick Install

# Install Triton
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/triton-3.6.0-cp312-cp312-linux_x86_64.whl

# Install SageAttention — pick ONE matching your setup:

# PyTorch 2.10 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.10 + Blackwell (B100, B200, GB200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm120-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Blackwell (B100, B200, GB200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm120-cp312-cp312-linux_x86_64.whl
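After installing, a quick sanity check confirms the packages are importable. This is a generic sketch using only the standard library (`installed` is a name invented here); it only checks that the import system can find each package, not that the CUDA kernels load.

```python
# Minimal post-install check: verify each package is discoverable by the
# import system without actually importing it (so a broken extension
# module doesn't crash the check).
import importlib.util

def installed(pkg: str) -> bool:
    """Return True if the import system can locate the top-level package."""
    return importlib.util.find_spec(pkg) is not None

for pkg in ("torch", "triton", "sageattention"):
    print(f"{pkg}: {'found' if installed(pkg) else 'MISSING'}")
```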

Requirements

  • OS: Linux x86_64
  • Python: 3.12
  • CUDA: 12.8
  • PyTorch: 2.7.0 or 2.10.0 (match the wheel)

Which GPU wheel do I need?

GPU                Architecture  Wheel suffix
H100, H200         Hopper        sm90
B100, B200, GB200  Blackwell     sm120

Build Info

  • Built from source in a Docker container (nvidia/cuda:12.8.0-devel-ubuntu22.04)
  • SageAttention source: SageAttention v2.2.0
  • Triton source: Triton v3.6.0
  • Split-arch build policy: each SageAttention wheel targets exactly one GPU architecture

License

  • Triton: MIT License
  • SageAttention: Apache 2.0 License