---
license: apache-2.0
---
# ONNX Runtime GPU 1.24.0 - CUDA 13.0 Build with Blackwell Support

## Overview

Custom-built **ONNX Runtime GPU 1.24.0** for Windows with full **CUDA 13.0** and **Blackwell architecture (sm_120)** support. This build addresses the `cudaErrorNoKernelImageForDevice` error that occurs with the RTX 5060 Ti and other Blackwell-generation GPUs when using the official PyPI distributions.

## Build Specifications

### Environment
- **OS**: Windows 10/11 x64
- **CUDA Toolkit**: 13.0
- **cuDNN**: 9.13 (CUDA 13.0 compatible)
- **Visual Studio**: 2022 (v17.x) with the "Desktop development with C++" workload
- **Python**: 3.13
- **CMake**: 3.26+

### Supported GPU Architectures
- **sm_89**: Ada Lovelace (RTX 4060, 4070, 4090, etc.)
- **sm_90**: Hopper (H100)
- **sm_120**: Blackwell (RTX 5060 Ti, 5080, 5090)

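The mapping from a GPU's compute capability to these sm_XX targets can be sketched in a few lines of Python (a hypothetical helper, not part of this package); the capability itself can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```python
# Hypothetical helper: check whether a GPU's compute capability is covered
# by the sm_XX targets compiled into this wheel (sm_89, sm_90, sm_120).
SUPPORTED_ARCHS = {"sm_89", "sm_90", "sm_120"}

def sm_arch(compute_cap: str) -> str:
    """Map a compute capability string like '12.0' to an 'sm_120' tag."""
    major, minor = compute_cap.strip().split(".")
    return f"sm_{major}{minor}"

def is_supported(compute_cap: str) -> bool:
    return sm_arch(compute_cap) in SUPPORTED_ARCHS

print(is_supported("12.0"))  # RTX 5060 Ti (Blackwell, 12.0) -> True
print(is_supported("8.6"))   # RTX 3090 (Ampere, 8.6) -> False, not targeted
```

A capability outside this set reproduces the `cudaErrorNoKernelImageForDevice` failure described below.
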
### Build Configuration

```cmake
CMAKE_CUDA_ARCHITECTURES=89;90;120
onnxruntime_USE_FLASH_ATTENTION=OFF
CUDA_VERSION=13.0
```

**Note**: Flash Attention is disabled because ONNX Runtime 1.24.0's Flash Attention kernels are sm_80-specific and incompatible with the sm_90/sm_120 architectures.

## Installation

```bash
pip install onnxruntime_gpu-1.24.0-cp313-cp313-win_amd64.whl
```

### Verify Installation

```python
import onnxruntime as ort
print(f"Version: {ort.__version__}")
print(f"Providers: {ort.get_available_providers()}")
# Expected output: ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

## Key Features

✅ **Blackwell GPU Support**: Full compatibility with the RTX 5060 Ti, 5080, and 5090
✅ **CUDA 13.0 Optimized**: Built with the latest CUDA toolkit for optimal performance
✅ **Multi-Architecture**: A single build supports Ada Lovelace, Hopper, and Blackwell
✅ **Stable for Inference**: Tested with WD14Tagger and Stable Diffusion pipelines

## Known Limitations

⚠️ **Flash Attention Disabled**: Because the Flash Attention kernels in ONNX Runtime 1.24.0 are implemented for sm_80 only, Flash Attention is not available. This has minimal impact on most inference workloads (e.g., WD14Tagger, image generation models).

⚠️ **Windows Only**: This build is specifically for Windows x64. Linux users should build from source with a similar configuration.

## Performance

Compared to CPU-only execution:
- **Image tagging (WD14Tagger)**: 10-50x faster
- **Inference latency**: significantly reduced for GPU-accelerated operations
- **Memory**: efficiently utilizes the 16 GB of VRAM on the RTX 5060 Ti

## Use Cases

- **ComfyUI**: WD14Tagger nodes
- **Stable Diffusion Forge**: ONNX-based models
- **General ONNX model inference**: any ONNX model requiring CUDA acceleration

## Technical Background

### Why This Build Is Necessary

Official ONNX Runtime GPU distributions on PyPI are typically built for older CUDA versions (11.x/12.x) and do not include sm_120 (Blackwell) architecture support. When running inference on Blackwell GPUs with official builds, users encounter:

```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
```

This custom build resolves the issue by:
1. Compiling with CUDA 13.0
2. Explicitly targeting sm_89, sm_90, and sm_120
3. Disabling the incompatible Flash Attention kernels

### Flash Attention Status

ONNX Runtime's Flash Attention implementation currently supports only:
- **sm_80**: Ampere (A100, RTX 3090)
- The kernels are hardcoded with `*_sm80.cu` file naming

Future ONNX Runtime versions may add sm_90/sm_120 support, but as of 1.24.0 it remains unavailable.

## Build Script

For those who want to replicate this build:

```batch
build.bat ^
  --config Release ^
  --build_shared_lib ^
  --parallel ^
  --use_cuda ^
  --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0" ^
  --cudnn_home "C:\Program Files\NVIDIA\CUDNN\v9.13" ^
  --cuda_version=13.0 ^
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="89;90;120" ^
    CUDNN_INCLUDE_DIR="C:\Program Files\NVIDIA\CUDNN\v9.13\include\13.0" ^
    CUDNN_LIBRARY="C:\Program Files\NVIDIA\CUDNN\v9.13\lib\13.0\x64\cudnn.lib" ^
    onnxruntime_USE_FLASH_ATTENTION=OFF ^
  --build_wheel ^
  --skip_tests
```

## Credits

Built by [@ussoewwin](https://huggingface.co/ussoewwin) for the community facing Blackwell GPU compatibility issues with ONNX Runtime.

## License

Apache 2.0 (same as ONNX Runtime)

## Related Projects

- [Flash-Attention-2 for Windows](https://huggingface.co/ussoewwin/Flash-Attention-2_for_Windows)
- [MediaPipe 0.10.21 Python 3.13](https://huggingface.co/ussoewwin/mediapipe-0.10.21-Python3.13)
- [Nunchaku 1.0.1 torch2.9 cp313](https://huggingface.co/ussoewwin/nunchaku-1.0.1-torch2.9-cp313-cp313-win_amd64)

---

**For issues or questions**: open an issue in the [community discussions](https://huggingface.co/ussoewwin/onnxruntime-gpu-1.24.0/discussions)