---
license: apache-2.0
---
# ONNX Runtime GPU 1.24.0 - CUDA 13.0 Build with Blackwell Support

## Overview

Custom-built **ONNX Runtime GPU 1.24.0** for Windows with full **CUDA 13.0** and **Blackwell architecture (sm_120)** support. This build addresses the `cudaErrorNoKernelImageForDevice` error that occurs with the RTX 5060 Ti and other Blackwell-generation GPUs when using the official PyPI distributions.

## Build Specifications

### Environment
- **OS**: Windows 10/11 x64
- **CUDA Toolkit**: 13.0
- **cuDNN**: 9.13 (CUDA 13.0 compatible)
- **Visual Studio**: 2022 (v17.x) with the "Desktop development with C++" workload
- **Python**: 3.13
- **CMake**: 3.26+

### Supported GPU Architectures
- **sm_89**: Ada Lovelace (RTX 4060, 4070, 4090, etc.)
- **sm_90**: Hopper (H100)
- **sm_120**: Blackwell (RTX 5060 Ti, 5080, 5090)

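The mapping from a GPU's compute capability to these sm_XX targets can be sketched in a few lines of Python (a hypothetical helper, not part of this package); the capability itself can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```python
# Hypothetical helper: check whether a GPU's compute capability is covered
# by the sm_XX targets compiled into this wheel (sm_89, sm_90, sm_120).
SUPPORTED_ARCHS = {"sm_89", "sm_90", "sm_120"}

def sm_arch(compute_cap: str) -> str:
    """Map a compute capability string like '12.0' to an 'sm_120' tag."""
    major, minor = compute_cap.strip().split(".")
    return f"sm_{major}{minor}"

def is_supported(compute_cap: str) -> bool:
    return sm_arch(compute_cap) in SUPPORTED_ARCHS

print(is_supported("12.0"))  # RTX 5060 Ti (Blackwell, 12.0) -> True
print(is_supported("8.6"))   # RTX 3090 (Ampere, 8.6) -> False, not targeted
```

A capability outside this set reproduces the `cudaErrorNoKernelImageForDevice` failure described below.
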
### Build Configuration

```cmake
CMAKE_CUDA_ARCHITECTURES=89;90;120
onnxruntime_USE_FLASH_ATTENTION=OFF
CUDA_VERSION=13.0
```

**Note**: Flash Attention is disabled because ONNX Runtime 1.24.0's Flash Attention kernels are sm_80-specific and incompatible with the sm_90/sm_120 architectures.

## Installation

```bash
pip install onnxruntime_gpu-1.24.0-cp313-cp313-win_amd64.whl
```

### Verify Installation

```python
import onnxruntime as ort
print(f"Version: {ort.__version__}")
print(f"Providers: {ort.get_available_providers()}")
# Expected output: ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

## Key Features

✅ **Blackwell GPU Support**: Full compatibility with the RTX 5060 Ti, 5080, and 5090
✅ **CUDA 13.0 Optimized**: Built with the latest CUDA toolkit for optimal performance
✅ **Multi-Architecture**: A single build supports Ada Lovelace, Hopper, and Blackwell
✅ **Stable for Inference**: Tested with WD14Tagger and Stable Diffusion pipelines

## Known Limitations

⚠️ **Flash Attention Disabled**: Because the Flash Attention kernels in ONNX Runtime 1.24.0 are implemented for sm_80 only, Flash Attention is not available. This has minimal impact on most inference workloads (e.g., WD14Tagger, image generation models).

⚠️ **Windows Only**: This build is specifically for Windows x64. Linux users should build from source with a similar configuration.

## Performance

Compared to CPU-only execution:
- **Image tagging (WD14Tagger)**: 10-50x faster
- **Inference latency**: significantly reduced for GPU-accelerated operations
- **Memory**: efficiently utilizes the 16 GB of VRAM on the RTX 5060 Ti

## Use Cases

- **ComfyUI**: WD14Tagger nodes
- **Stable Diffusion Forge**: ONNX-based models
- **General ONNX model inference**: any ONNX model requiring CUDA acceleration

## Technical Background

### Why This Build Is Necessary

Official ONNX Runtime GPU distributions on PyPI are typically built for older CUDA versions (11.x/12.x) and do not include sm_120 (Blackwell) architecture support. When running inference on Blackwell GPUs with official builds, users encounter:

```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
```

This custom build resolves the issue by:
1. Compiling with CUDA 13.0
2. Explicitly targeting sm_89, sm_90, and sm_120
3. Disabling the incompatible Flash Attention kernels

### Flash Attention Status

ONNX Runtime's Flash Attention implementation currently supports only:
- **sm_80**: Ampere (A100, RTX 3090)
- The kernels are hardcoded with `*_sm80.cu` file naming

Future ONNX Runtime versions may add sm_90/sm_120 support, but as of 1.24.0 it remains unavailable.

## Build Script

For those who want to replicate this build:

```batch
build.bat ^
  --config Release ^
  --build_shared_lib ^
  --parallel ^
  --use_cuda ^
  --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0" ^
  --cudnn_home "C:\Program Files\NVIDIA\CUDNN\v9.13" ^
  --cuda_version=13.0 ^
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="89;90;120" ^
    CUDNN_INCLUDE_DIR="C:\Program Files\NVIDIA\CUDNN\v9.13\include\13.0" ^
    CUDNN_LIBRARY="C:\Program Files\NVIDIA\CUDNN\v9.13\lib\13.0\x64\cudnn.lib" ^
    onnxruntime_USE_FLASH_ATTENTION=OFF ^
  --build_wheel ^
  --skip_tests
```

## Credits

Built by [@ussoewwin](https://huggingface.co/ussoewwin) for the community facing Blackwell GPU compatibility issues with ONNX Runtime.

## License

Apache 2.0 (same as ONNX Runtime)

## Related Projects

- [Flash-Attention-2 for Windows](https://huggingface.co/ussoewwin/Flash-Attention-2_for_Windows)
- [MediaPipe 0.10.21 Python 3.13](https://huggingface.co/ussoewwin/mediapipe-0.10.21-Python3.13)
- [Nunchaku 1.0.1 torch2.9 cp313](https://huggingface.co/ussoewwin/nunchaku-1.0.1-torch2.9-cp313-cp313-win_amd64)

---

**For issues or questions**: open an issue in the [community discussions](https://huggingface.co/ussoewwin/onnxruntime-gpu-1.24.0/discussions)