|
|
--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
# ONNX Runtime GPU 1.24.0 - CUDA 13.0 Build with Blackwell Support |
|
|
|
|
|
## Overview |
|
|
|
|
|
Custom-built **ONNX Runtime GPU 1.24.0** for Windows with full **CUDA 13.0** and **Blackwell architecture (sm_120)** support. This build addresses the `cudaErrorNoKernelImageForDevice` error that occurs with RTX 5060 Ti and other Blackwell-generation GPUs when using official PyPI distributions. |
|
|
|
|
|
## Build Specifications |
|
|
|
|
|
### Environment |
|
|
- **OS**: Windows 10/11 x64 |
|
|
- **CUDA Toolkit**: 13.0 |
|
|
- **cuDNN**: 9.13 (CUDA 13.0 compatible) |
|
|
- **Visual Studio**: 2022 (v17.x) with the "Desktop development with C++" workload
|
|
- **Python**: 3.13 |
|
|
- **CMake**: 3.26+ |
|
|
|
|
|
### Supported GPU Architectures |
|
|
- **sm_89**: Ada Lovelace (RTX 4060, 4070, 4090, etc.)


- **sm_90**: Hopper (H100)
|
|
- **sm_120**: Blackwell (RTX 5060 Ti, 5080, 5090) |
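To check whether a given card is covered, its compute capability (as reported by the NVIDIA driver) can be matched against the targets above. A minimal sketch; the table below covers only the three architectures this wheel is compiled for:

```python
# Compute capability strings (as the driver reports them) mapped to the
# sm_* targets compiled into this wheel.
SUPPORTED_ARCHS = {
    "8.9": "sm_89 (Ada Lovelace)",
    "9.0": "sm_90 (Hopper)",
    "12.0": "sm_120 (Blackwell)",
}

def check_support(compute_cap: str) -> str:
    """Report whether a compute capability is natively covered by this build."""
    arch = SUPPORTED_ARCHS.get(compute_cap)
    if arch is None:
        return f"compute capability {compute_cap}: NOT covered by this build"
    return f"compute capability {compute_cap}: covered as {arch}"

print(check_support("12.0"))
# compute capability 12.0: covered as sm_120 (Blackwell)
```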
|
|
|
|
|
### Build Configuration |
|
|
|
|
|
```cmake |
|
|
CMAKE_CUDA_ARCHITECTURES=89;90;120 |
|
|
onnxruntime_USE_FLASH_ATTENTION=OFF |
|
|
CUDA_VERSION=13.0 |
|
|
``` |
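For context, CMake expands each bare number in `CMAKE_CUDA_ARCHITECTURES` into an nvcc `-gencode` flag carrying both real device code (`sm_N`) and PTX (`compute_N`). Roughly, a sketch of that expansion:

```python
def gencode_flags(archs: list[int]) -> list[str]:
    # Each bare entry N yields device code for sm_N plus embedded PTX for
    # compute_N; the PTX lets even newer GPUs JIT-compile at load time.
    return [f"-gencode=arch=compute_{a},code=[compute_{a},sm_{a}]"
            for a in archs]

print(gencode_flags([89, 90, 120]))
```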
|
|
|
|
|
**Note**: Flash Attention is disabled because ONNX Runtime 1.24.0's Flash Attention kernels are sm_80-specific and incompatible with sm_90/sm_120 architectures. |
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install onnxruntime_gpu-1.24.0-cp313-cp313-win_amd64.whl |
|
|
``` |
|
|
|
|
|
### Verify Installation |
|
|
|
|
|
```python |
|
|
import onnxruntime as ort |
|
|
print(f"Version: {ort.__version__}") |
|
|
print(f"Providers: {ort.get_available_providers()}") |
|
|
# Expected output: ['CUDAExecutionProvider', 'CPUExecutionProvider'] |
|
|
``` |
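If the CUDA provider cannot load (for example on a machine without a compatible driver), ONNX Runtime falls back to CPU. A small helper, shown as a sketch, that builds the provider priority list from whatever this installation actually exposes:

```python
def choose_providers(available: list[str]) -> list[str]:
    """Build a provider priority list: prefer CUDA, keep CPU as fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]

# Typical use ("model.onnx" is a placeholder for any ONNX model you have):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=choose_providers(ort.get_available_providers()))
```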
|
|
|
|
|
## Key Features |
|
|
|
|
|
✅ **Blackwell GPU Support**: Full compatibility with RTX 5060 Ti, 5080, 5090


✅ **CUDA 13.0 Optimized**: Built with the latest CUDA toolkit for optimal performance


✅ **Multi-Architecture**: A single build targets Ada Lovelace (sm_89), Hopper (sm_90), and Blackwell (sm_120)


✅ **Stable for Inference**: Tested with WD14Tagger and Stable Diffusion pipelines
|
|
|
|
|
## Known Limitations |
|
|
|
|
|
⚠️ **Flash Attention Disabled**: Due to sm_80-only kernel implementation in ONNX Runtime 1.24.0, Flash Attention is not available. This has minimal impact on most inference workloads (e.g., WD14Tagger, image generation models). |
|
|
|
|
|
⚠️ **Windows Only**: This build is specifically for Windows x64. Linux users should build from source with similar configurations. |
|
|
|
|
|
## Performance |
|
|
|
|
|
Compared to CPU-only execution: |
|
|
- **Image tagging (WD14Tagger)**: 10-50x faster |
|
|
- **Inference latency**: Significant reduction on GPU-accelerated operations |
|
|
- **Memory**: makes efficient use of the 16 GB VRAM on an RTX 5060 Ti
|
|
|
|
|
## Use Cases |
|
|
|
|
|
- **ComfyUI**: WD14Tagger nodes |
|
|
- **Stable Diffusion Forge**: ONNX-based models |
|
|
- **General ONNX Model Inference**: Any ONNX model requiring CUDA acceleration |
|
|
|
|
|
## Technical Background |
|
|
|
|
|
### Why This Build is Necessary |
|
|
|
|
|
Official ONNX Runtime GPU distributions (PyPI) are typically built for older CUDA versions (11.x/12.x) and do not include sm_120 (Blackwell) architecture support. When running inference on Blackwell GPUs with official builds, users encounter: |
|
|
|
|
|
``` |
|
|
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device |
|
|
``` |
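One way to confirm the mismatch before digging further is to ask the driver which compute capability each GPU reports (Blackwell cards report `12.0`). A stdlib-only sketch; it returns an empty list on machines where `nvidia-smi` is missing or fails:

```python
import subprocess

def gpu_compute_caps() -> list[str]:
    """Return the compute capability of each visible GPU, e.g. ['12.0'].

    Returns [] when nvidia-smi is unavailable or errors out, so the check
    is safe to run on machines without an NVIDIA driver installed."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```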
|
|
|
|
|
This custom build resolves the issue by: |
|
|
1. Compiling with CUDA 13.0 |
|
|
2. Explicitly targeting sm_89, sm_90, sm_120 |
|
|
3. Disabling incompatible Flash Attention kernels |
|
|
|
|
|
### Flash Attention Status |
|
|
|
|
|
ONNX Runtime's Flash Attention implementation currently supports only:


- **sm_80**: Ampere (A100; RTX 30-series cards are sm_86 but can run sm_80 kernels)


- Kernels are hardcoded with `*_sm80.cu` file naming
|
|
|
|
|
Future ONNX Runtime versions may add sm_90/sm_120 support, but as of 1.24.0, this remains unavailable. |
|
|
|
|
|
## Build Script |
|
|
|
|
|
For those who want to replicate this build: |
|
|
|
|
|
```batch |
|
|
build.bat ^ |
|
|
--config Release ^ |
|
|
--build_shared_lib ^ |
|
|
--parallel ^ |
|
|
--use_cuda ^ |
|
|
--cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0" ^ |
|
|
--cudnn_home "C:\Program Files\NVIDIA\CUDNN\v9.13" ^ |
|
|
--cuda_version=13.0 ^ |
|
|
--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="89;90;120" ^ |
|
|
CUDNN_INCLUDE_DIR="C:\Program Files\NVIDIA\CUDNN\v9.13\include\13.0" ^ |
|
|
CUDNN_LIBRARY="C:\Program Files\NVIDIA\CUDNN\v9.13\lib\13.0\x64\cudnn.lib" ^ |
|
|
onnxruntime_USE_FLASH_ATTENTION=OFF ^ |
|
|
--build_wheel ^ |
|
|
--skip_tests |
|
|
``` |
|
|
|
|
|
## Credits |
|
|
|
|
|
Built by [@ussoewwin](https://huggingface.co/ussoewwin) for the community facing Blackwell GPU compatibility issues with ONNX Runtime. |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 (same as ONNX Runtime) |
|
|
|