---
license: apache-2.0
---

# ONNX Runtime GPU 1.24.0 - CUDA 13.0 Build with Blackwell Support

## Overview

Custom-built **ONNX Runtime GPU 1.24.0** for Windows with full **CUDA 13.0** and **Blackwell architecture (sm_120)** support. This build addresses the `cudaErrorNoKernelImageForDevice` error that occurs with the RTX 5060 Ti and other Blackwell-generation GPUs when using the official PyPI distributions.

## Build Specifications

### Environment

- **OS**: Windows 10/11 x64
- **CUDA Toolkit**: 13.0
- **cuDNN**: 9.13 (CUDA 13.0 compatible)
- **Visual Studio**: 2022 (v17.x) with the "Desktop development with C++" workload
- **Python**: 3.13
- **CMake**: 3.26+

### Supported GPU Architectures

- **sm_89**: Ada Lovelace (RTX 4060, 4070, 4090, etc.)
- **sm_90**: Hopper (H100)
- **sm_120**: Blackwell (RTX 5060 Ti, 5080, 5090)

### Build Configuration

```cmake
CMAKE_CUDA_ARCHITECTURES=89;90;120
onnxruntime_USE_FLASH_ATTENTION=OFF
CUDA_VERSION=13.0
```

**Note**: Flash Attention is disabled because ONNX Runtime 1.24.0's Flash Attention kernels are sm_80-specific and incompatible with the sm_90/sm_120 architectures.

## Installation

```bash
pip install onnxruntime_gpu-1.24.0-cp313-cp313-win_amd64.whl
```

### Verify Installation

```python
import onnxruntime as ort
print(f"Version: {ort.__version__}")
print(f"Providers: {ort.get_available_providers()}")
# Expected output: ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

## Key Features

- ✅ **Blackwell GPU Support**: Full compatibility with the RTX 5060 Ti, 5080, and 5090
- ✅ **CUDA 13.0 Optimized**: Built with the latest CUDA toolkit for optimal performance
- ✅ **Multi-Architecture**: A single build supports both Ada Lovelace and Blackwell
- ✅ **Stable for Inference**: Tested with WD14Tagger and Stable Diffusion pipelines

## Known Limitations

⚠️ **Flash Attention Disabled**: Due to the sm_80-only kernel implementation in ONNX Runtime 1.24.0, Flash Attention is not available.
This has minimal impact on most inference workloads (e.g., WD14Tagger, image-generation models).

⚠️ **Windows Only**: This build is specifically for Windows x64. Linux users should build from source with a similar configuration.

## Performance

Compared to CPU-only execution:

- **Image tagging (WD14Tagger)**: 10-50x faster
- **Inference latency**: Significantly reduced for GPU-accelerated operations
- **Memory**: Efficiently utilizes the 16 GB of VRAM on the RTX 5060 Ti

## Use Cases

- **ComfyUI**: WD14Tagger nodes
- **Stable Diffusion Forge**: ONNX-based models
- **General ONNX Model Inference**: Any ONNX model requiring CUDA acceleration

## Technical Background

### Why This Build is Necessary

Official ONNX Runtime GPU distributions (PyPI) are typically built for older CUDA versions (11.x/12.x) and do not include sm_120 (Blackwell) architecture support. When running inference on Blackwell GPUs with official builds, users encounter:

```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
```

This custom build resolves the issue by:

1. Compiling with CUDA 13.0
2. Explicitly targeting sm_89, sm_90, and sm_120
3. Disabling the incompatible Flash Attention kernels

### Flash Attention Status

ONNX Runtime's Flash Attention implementation currently supports only:

- **sm_80**: Ampere (A100, RTX 3090)
- Kernels are hardcoded with `*_sm80.cu` file naming

Future ONNX Runtime versions may add sm_90/sm_120 support, but as of 1.24.0 it remains unavailable.
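The kernel-image matching behind `cudaErrorNoKernelImageForDevice` can be sketched in plain Python. The helper below is a hypothetical illustration, not real CUDA or ONNX Runtime API code: a CUDA fatbinary embeds compiled kernel images (cubins) for the specific SM versions it was built for, optionally plus PTX that the driver can JIT-compile forward for newer architectures.

```python
# Hypothetical sketch of the CUDA loader's kernel-image lookup; the
# function name and simplified matching rule are illustrative assumptions.

def has_kernel_image(device_sm: int,
                     cubin_sms: tuple[int, ...],
                     ptx_sms: tuple[int, ...] = ()) -> bool:
    """True if a device of compute capability `device_sm` can run a binary
    embedding cubins for `cubin_sms` and PTX for `ptx_sms`."""
    # A cubin runs only on the architecture it was compiled for
    # (simplification: same-major binary compatibility is ignored here).
    if device_sm in cubin_sms:
        return True
    # Embedded PTX can be JIT-compiled forward to any newer architecture.
    return any(sm <= device_sm for sm in ptx_sms)

# Official wheel built without sm_120 (and no forward-compatible PTX):
print(has_kernel_image(120, (80, 86, 89, 90)))  # False -> cudaErrorNoKernelImageForDevice
# This build targets sm_89/sm_90/sm_120 explicitly:
print(has_kernel_image(120, (89, 90, 120)))     # True
```

This is why recompiling with `CMAKE_CUDA_ARCHITECTURES=89;90;120` resolves the error: the wheel then ships a kernel image the Blackwell device can actually execute.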
## Build Script

For those who want to replicate this build:

```batch
build.bat ^
  --config Release ^
  --build_shared_lib ^
  --parallel ^
  --use_cuda ^
  --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0" ^
  --cudnn_home "C:\Program Files\NVIDIA\CUDNN\v9.13" ^
  --cuda_version=13.0 ^
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="89;90;120" ^
                        CUDNN_INCLUDE_DIR="C:\Program Files\NVIDIA\CUDNN\v9.13\include\13.0" ^
                        CUDNN_LIBRARY="C:\Program Files\NVIDIA\CUDNN\v9.13\lib\13.0\x64\cudnn.lib" ^
                        onnxruntime_USE_FLASH_ATTENTION=OFF ^
  --build_wheel ^
  --skip_tests
```

## Credits

Built by [@ussoewwin](https://huggingface.co/ussoewwin) for the community facing Blackwell GPU compatibility issues with ONNX Runtime.

## License

Apache 2.0 (same as ONNX Runtime)
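## Troubleshooting: Missing Runtime DLLs

If the wheel installs but `CUDAExecutionProvider` fails to load, a common cause (separate from the kernel-image issue this build fixes) is that the CUDA 13.0 / cuDNN 9.x runtime DLLs are not on `PATH`. The stdlib-only pre-flight check below is a sketch; the DLL base names are assumptions based on CUDA/cuDNN naming conventions — verify them against the files your toolkit actually installs.

```python
import ctypes.util

# Assumed DLL base names for CUDA 13.0 / cuDNN 9.x; adjust to match
# the libraries shipped with your installation.
REQUIRED_LIBS = ("cudart64_13", "cublas64_13", "cudnn64_9")

def missing_cuda_libs(names=REQUIRED_LIBS):
    """Return the runtime libraries the system loader cannot locate."""
    return [n for n in names if ctypes.util.find_library(n) is None]

missing = missing_cuda_libs()
if missing:
    print("Not found on the loader search path:", missing)
else:
    print("All CUDA/cuDNN runtime libraries located.")
```

Running this before `import onnxruntime` distinguishes a missing-DLL failure from a genuine architecture mismatch.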