|
|
--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
# ONNX Runtime GPU 1.24.0 - CUDA 13.0 Build with Blackwell Support |
|
|
|
|
|
## Overview |
|
|
|
|
|
Custom-built **ONNX Runtime GPU 1.24.0** for Windows with full **CUDA 13.0** and **Blackwell architecture (sm_120)** support. This build addresses the `cudaErrorNoKernelImageForDevice` error that occurs with RTX 5060 Ti and other Blackwell-generation GPUs when using official PyPI distributions. |
|
|
|
|
|
## Build Specifications |
|
|
|
|
|
### Environment |
|
|
- **OS**: Windows 10/11 x64 |
|
|
- **CUDA Toolkit**: 13.0 |
|
|
- **cuDNN**: 9.13 (CUDA 13.0 compatible) |
|
|
- **Visual Studio**: 2022 (v17.x) with the "Desktop development with C++" workload
|
|
- **Python**: 3.13 |
|
|
- **CMake**: 3.26+ |
|
|
|
|
|
### Supported GPU Architectures |
|
|
- **sm_89**: Ada Lovelace (RTX 4060, 4070, 4090, etc.)


- **sm_90**: Hopper (H100)
|
|
- **sm_120**: Blackwell (RTX 5060 Ti, 5080, 5090) |
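To check whether a given card is covered, its compute capability (as reported by the NVIDIA driver) can be matched against the targets above. A minimal sketch; the table below covers only the three architectures this wheel is compiled for:

```python
# Compute capability strings (as the driver reports them) mapped to the
# sm_* targets compiled into this wheel.
SUPPORTED_ARCHS = {
    "8.9": "sm_89 (Ada Lovelace)",
    "9.0": "sm_90 (Hopper)",
    "12.0": "sm_120 (Blackwell)",
}

def check_support(compute_cap: str) -> str:
    """Report whether a compute capability is natively covered by this build."""
    arch = SUPPORTED_ARCHS.get(compute_cap)
    if arch is None:
        return f"compute capability {compute_cap}: NOT covered by this build"
    return f"compute capability {compute_cap}: covered as {arch}"

print(check_support("12.0"))
# compute capability 12.0: covered as sm_120 (Blackwell)
```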
|
|
|
|
|
### Build Configuration |
|
|
|
|
|
```cmake |
|
|
CMAKE_CUDA_ARCHITECTURES=89;90;120 |
|
|
onnxruntime_USE_FLASH_ATTENTION=OFF |
|
|
CUDA_VERSION=13.0 |
|
|
``` |
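For context, CMake expands each bare number in `CMAKE_CUDA_ARCHITECTURES` into an nvcc `-gencode` flag carrying both real device code (`sm_N`) and PTX (`compute_N`). Roughly, a sketch of that expansion:

```python
def gencode_flags(archs: list[int]) -> list[str]:
    # Each bare entry N yields device code for sm_N plus embedded PTX for
    # compute_N; the PTX lets even newer GPUs JIT-compile at load time.
    return [f"-gencode=arch=compute_{a},code=[compute_{a},sm_{a}]"
            for a in archs]

print(gencode_flags([89, 90, 120]))
```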
|
|
|
|
|
**Note**: Flash Attention is disabled because ONNX Runtime 1.24.0's Flash Attention kernels are sm_80-specific and incompatible with sm_90/sm_120 architectures. |
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install onnxruntime_gpu-1.24.0-cp313-cp313-win_amd64.whl |
|
|
``` |
|
|
|
|
|
### Verify Installation |
|
|
|
|
|
```python |
|
|
import onnxruntime as ort |
|
|
print(f"Version: {ort.__version__}") |
|
|
print(f"Providers: {ort.get_available_providers()}") |
|
|
# Expected output: ['CUDAExecutionProvider', 'CPUExecutionProvider'] |
|
|
``` |
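If the CUDA provider cannot load (for example on a machine without a compatible driver), ONNX Runtime falls back to CPU. A small helper, shown as a sketch, that builds the provider priority list from whatever this installation actually exposes:

```python
def choose_providers(available: list[str]) -> list[str]:
    """Build a provider priority list: prefer CUDA, keep CPU as fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]

# Typical use ("model.onnx" is a placeholder for any ONNX model you have):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=choose_providers(ort.get_available_providers()))
```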
|
|
|
|
|
## Key Features |
|
|
|
|
|
✅ **Blackwell GPU Support**: Full compatibility with RTX 5060 Ti, 5080, 5090


✅ **CUDA 13.0 Optimized**: Built with the latest CUDA toolkit for optimal performance


✅ **Multi-Architecture**: A single build targets Ada Lovelace (sm_89), Hopper (sm_90), and Blackwell (sm_120)


✅ **Stable for Inference**: Tested with WD14Tagger and Stable Diffusion pipelines
|
|
|
|
|
## Known Limitations |
|
|
|
|
|
⚠️ **Flash Attention Disabled**: Due to sm_80-only kernel implementation in ONNX Runtime 1.24.0, Flash Attention is not available. This has minimal impact on most inference workloads (e.g., WD14Tagger, image generation models). |
|
|
|
|
|
⚠️ **Windows Only**: This build is specifically for Windows x64. Linux users should build from source with similar configurations. |
|
|
|
|
|
## Performance |
|
|
|
|
|
Compared to CPU-only execution: |
|
|
- **Image tagging (WD14Tagger)**: 10-50x faster |
|
|
- **Inference latency**: Significant reduction on GPU-accelerated operations |
|
|
- **Memory**: makes efficient use of the 16 GB VRAM on an RTX 5060 Ti
|
|
|
|
|
## Use Cases |
|
|
|
|
|
- **ComfyUI**: WD14Tagger nodes |
|
|
- **Stable Diffusion Forge**: ONNX-based models |
|
|
- **General ONNX Model Inference**: Any ONNX model requiring CUDA acceleration |
|
|
|
|
|
## Technical Background |
|
|
|
|
|
### Why This Build is Necessary |
|
|
|
|
|
Official ONNX Runtime GPU distributions (PyPI) are typically built for older CUDA versions (11.x/12.x) and do not include sm_120 (Blackwell) architecture support. When running inference on Blackwell GPUs with official builds, users encounter: |
|
|
|
|
|
``` |
|
|
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device |
|
|
``` |
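One way to confirm the mismatch before digging further is to ask the driver which compute capability each GPU reports (Blackwell cards report `12.0`). A stdlib-only sketch; it returns an empty list on machines where `nvidia-smi` is missing or fails:

```python
import subprocess

def gpu_compute_caps() -> list[str]:
    """Return the compute capability of each visible GPU, e.g. ['12.0'].

    Returns [] when nvidia-smi is unavailable or errors out, so the check
    is safe to run on machines without an NVIDIA driver installed."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```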
|
|
|
|
|
This custom build resolves the issue by: |
|
|
1. Compiling with CUDA 13.0 |
|
|
2. Explicitly targeting sm_89, sm_90, sm_120 |
|
|
3. Disabling incompatible Flash Attention kernels |
|
|
|
|
|
### Flash Attention Status |
|
|
|
|
|
ONNX Runtime's Flash Attention implementation currently supports only:


- **sm_80**: Ampere (A100; RTX 30-series cards are sm_86 but can run sm_80 kernels)


- Kernels are hardcoded with `*_sm80.cu` file naming
|
|
|
|
|
Future ONNX Runtime versions may add sm_90/sm_120 support, but as of 1.24.0, this remains unavailable. |
|
|
|
|
|
## Build Script |
|
|
|
|
|
For those who want to replicate this build: |
|
|
|
|
|
```batch |
|
|
build.bat ^ |
|
|
--config Release ^ |
|
|
--build_shared_lib ^ |
|
|
--parallel ^ |
|
|
--use_cuda ^ |
|
|
--cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0" ^ |
|
|
--cudnn_home "C:\Program Files\NVIDIA\CUDNN\v9.13" ^ |
|
|
--cuda_version=13.0 ^ |
|
|
--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="89;90;120" ^ |
|
|
CUDNN_INCLUDE_DIR="C:\Program Files\NVIDIA\CUDNN\v9.13\include\13.0" ^ |
|
|
CUDNN_LIBRARY="C:\Program Files\NVIDIA\CUDNN\v9.13\lib\13.0\x64\cudnn.lib" ^ |
|
|
onnxruntime_USE_FLASH_ATTENTION=OFF ^ |
|
|
--build_wheel ^ |
|
|
--skip_tests |
|
|
``` |
|
|
|
|
|
## Credits |
|
|
|
|
|
Built by [@ussoewwin](https://huggingface.co/ussoewwin) for the community facing Blackwell GPU compatibility issues with ONNX Runtime. |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 (same as ONNX Runtime) |
|
|
|