	---
	license: apache-2.0
	tags:
	- prebuilt-wheels
	- cuda
	- triton
	- sageattention
	- pytorch
	language:
	- en
	---

	# Prebuilt CUDA Wheels — Triton 3.6.0 & SageAttention 2.2.0

	Pre-compiled Python wheels for Linux x86_64, built against CUDA 12.8 with Python 3.12.

	No compilation needed — just `pip install` the `.whl` file matching your setup.

	## Available Wheels

	### Triton 3.6.0

	| Wheel | Size | PyTorch | GPU |
	|---|---|---|---|
	| `triton-3.6.0-cp312-cp312-linux_x86_64.whl` | 339 MB | Any | All |

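	The Triton wheel is not built against a particular PyTorch version, so the same file works with both PyTorch 2.7 and 2.10. PyTorch releases pin their own Triton version, so `pip` may report a dependency conflict when this wheel replaces the pinned one.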

	### SageAttention 2.2.0

	| Wheel | Size | PyTorch | GPU Arch |
	|---|---|---|---|
	| `sageattention-2.2.0+cu128torch2.10.0sm90-…` | 21.1 MB | 2.10.0 | Hopper (sm90) |
	| `sageattention-2.2.0+cu128torch2.10.0sm120-…` | 15.6 MB | 2.10.0 | Blackwell (sm120) |
	| `sageattention-2.2.0+cu128torch2.7.0sm90-…` | 20.2 MB | 2.7.0 | Hopper (sm90) |
	| `sageattention-2.2.0+cu128torch2.7.0sm120-…` | 14.9 MB | 2.7.0 | Blackwell (sm120) |

	> Pick the wheel matching your PyTorch version AND GPU architecture.
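To make the matching rule concrete, here is a minimal sketch that decodes the local version tag of a SageAttention wheel name. It assumes the tag always follows the `cu<CUDA>torch<PyTorch>sm<arch>` layout seen in the table and that the last digit of the CUDA part is the minor version; `parse_wheel_tag` and `wheel_matches` are hypothetical helper names, not part of any package.

```python
import re

# Assumed tag layout, inferred from the wheel names in the table:
#   +cu<CUDA digits>torch<major.minor.patch>sm<arch digits>
_TAG = re.compile(r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+\.\d+)sm(?P<sm>\d+)")


def parse_wheel_tag(name: str) -> dict:
    """Split a SageAttention wheel name into CUDA, PyTorch, and arch parts."""
    match = _TAG.search(name)
    if match is None:
        raise ValueError(f"no cu/torch/sm tag found in {name!r}")
    cuda = match["cuda"]
    return {
        "cuda": f"{cuda[:-1]}.{cuda[-1]}",  # assumption: "128" means 12.8
        "torch": match["torch"],
        "arch": f"sm{match['sm']}",
    }


def wheel_matches(name: str, torch_version: str, arch: str) -> bool:
    """True when the wheel targets the given PyTorch version AND GPU arch."""
    tag = parse_wheel_tag(name)
    # torch.__version__ may carry a local suffix such as "+cu128"; drop it.
    return tag["torch"] == torch_version.split("+", 1)[0] and tag["arch"] == arch
```

A truncated name from the table is enough, because only the tag is inspected: `wheel_matches("sageattention-2.2.0+cu128torch2.7.0sm120-", "2.7.0+cu128", "sm120")` is true, while the same call with `"sm90"` is false.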

	## Quick Install

	```bash
	# Install Triton
	pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/triton-3.6.0-cp312-cp312-linux_x86_64.whl

	# Install SageAttention — pick ONE matching your setup:

	# PyTorch 2.10 + Hopper (H100, H200)
	pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl

	# PyTorch 2.10 + Blackwell sm120 (RTX 50 series; B100/B200/GB200 are sm100, not sm120)
	pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm120-cp312-cp312-linux_x86_64.whl

	# PyTorch 2.7 + Hopper (H100, H200)
	pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl

	# PyTorch 2.7 + Blackwell sm120 (RTX 50 series; B100/B200/GB200 are sm100, not sm120)
	pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm120-cp312-cp312-linux_x86_64.whl
	```
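The install URLs above all follow one pattern, with the `+` in the version tag percent-encoded as `%2B`. The sketch below rebuilds a URL from a PyTorch version and an architecture suffix. The helper names `sageattention_wheel` and `wheel_url` are hypothetical, and the filename template is simply copied from the commands above.

```python
from urllib.parse import quote

BASE_URL = "https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main"
SUPPORTED_TORCH = ("2.7.0", "2.10.0")
SUPPORTED_ARCH = ("sm90", "sm120")


def sageattention_wheel(torch_version: str, arch: str) -> str:
    """Return the wheel filename for a PyTorch version and arch suffix."""
    # torch.__version__ may look like "2.10.0+cu128"; keep the public part.
    base = torch_version.split("+", 1)[0]
    if base not in SUPPORTED_TORCH:
        raise ValueError(f"no wheel for PyTorch {base}")
    if arch not in SUPPORTED_ARCH:
        raise ValueError(f"no wheel for architecture {arch}")
    return f"sageattention-2.2.0+cu128torch{base}{arch}-cp312-cp312-linux_x86_64.whl"


def wheel_url(filename: str) -> str:
    """Percent-encode the filename ('+' becomes '%2B') and append it to the repo URL."""
    return f"{BASE_URL}/{quote(filename)}"


if __name__ == "__main__":
    print("pip install " + wheel_url(sageattention_wheel("2.10.0", "sm90")))
```

Raising on unsupported combinations instead of guessing a filename keeps a typo from turning into a 404 during `pip install`.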

	## Requirements

	- OS: Linux x86_64
	- Python: 3.12
	- CUDA: 12.8
	- PyTorch: 2.7.0 or 2.10.0 (match the wheel)
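
Most of these requirements can be checked before downloading anything. The following is a rough sketch using only the standard library; `check_requirements` is a hypothetical helper, and it deliberately leaves out the CUDA 12.8 check, which would need PyTorch or the NVIDIA tools to be installed.

```python
import platform
import sys


def check_requirements(system, machine, python_version, torch_version=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if system != "Linux" or machine != "x86_64":
        problems.append(f"need Linux x86_64, found {system} {machine}")
    if tuple(python_version[:2]) != (3, 12):
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"need Python 3.12, found {found}")
    if torch_version is not None:
        # Ignore a local suffix such as "+cu128" when comparing.
        base = torch_version.split("+", 1)[0]
        if base not in ("2.7.0", "2.10.0"):
            problems.append(f"need PyTorch 2.7.0 or 2.10.0, found {base}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        platform.system(), platform.machine(), sys.version_info
    )
    print("\n".join(issues) if issues else "platform and Python version look compatible")
```

Passing the values in as arguments, rather than reading them inside the function, makes it easy to check a remote machine's configuration from a description.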

	## Which GPU wheel do I need?

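	| GPU | Architecture | Wheel suffix |
	|---|---|---|
	| H100, H200 | Hopper | `sm90` |
	| RTX 50 series (e.g. RTX 5090) | Blackwell | `sm120` |

	> Data-center Blackwell GPUs (B100, B200, GB200) report compute capability 10.0 (sm100), which none of the wheels listed here target.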
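
If you are unsure which suffix applies, the GPU can be asked directly. This sketch assumes the wheel suffix encodes the CUDA compute capability (`sm90` for 9.0, `sm120` for 12.0); `wheel_suffix` and `detect_suffix` are hypothetical helper names, and `detect_suffix` needs an installed PyTorch with a visible CUDA device.

```python
# Compute capability (major, minor) -> wheel suffix, for the two
# architectures these wheels are built for.
ARCH_SUFFIX = {
    (9, 0): "sm90",
    (12, 0): "sm120",
}


def wheel_suffix(capability):
    """Map a (major, minor) compute capability to a wheel suffix."""
    major, minor = capability
    try:
        return ARCH_SUFFIX[(major, minor)]
    except KeyError:
        raise ValueError(
            f"no prebuilt wheel for compute capability {major}.{minor}"
        ) from None


def detect_suffix(device_index=0):
    """Query the local GPU through PyTorch and return its wheel suffix."""
    import torch  # deferred so wheel_suffix() works without PyTorch

    return wheel_suffix(torch.cuda.get_device_capability(device_index))
```

Any capability outside the two listed ones raises an error instead of falling back to the nearest architecture, since each wheel targets exactly one.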

	## Build Info

	- Built from source in a Docker container (`nvidia/cuda:12.8.0-devel-ubuntu22.04`)
	- SageAttention source: [SageAttention v2.2.0](https://github.com/thu-ml/SageAttention)
	- Triton source: [Triton v3.6.0](https://github.com/triton-lang/triton)
	- Split-arch build policy: each SageAttention wheel targets exactly one GPU architecture

	## License

	- Triton: MIT License
	- SageAttention: Apache 2.0 License