# CUDA
Install the latest version of **CUDA** that matches the CUDA major version your **PyTorch** build targets
For example, CUDA 11.8 can be used with PyTorch compiled for CUDA 11.7, but CUDA 12.0 *cannot*
- <https://developer.nvidia.com/cuda-downloads>
Install the latest version of **cuDNN** compatible with the chosen CUDA version
- <https://developer.nvidia.com/rdp/cudnn-download>
Currently the best options are **CUDA 11.8** with **cuDNN 8.7**
Note that **CUDA 12** is not yet supported by PyTorch stable releases, only by nightly builds
## PyTorch
*Note*: Uninstall `torch` and `triton` before attempting any new installs
> pip uninstall torch torchvision torchaudio triton -y
### Stable
**PyTorch 2.0.0** compiled with **CUDA 11.8**:
> pip install torch torchaudio torchvision triton --force --extra-index-url https://download.pytorch.org/whl/cu118
> pip show torch
> 2.0.0
### Nightly
**PyTorch 2.1-nightly** compiled with **CUDA 12.1**:
> pip install --pre torch triton torchvision torchaudio --force --extra-index-url https://download.pytorch.org/whl/nightly/cu121
> pip show torch
> 2.1.0.dev20230305+cu121
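Regardless of channel, a quick sanity check that the install is functional (assumes a CUDA-capable GPU is visible):
> python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'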
### From source
Read <https://github.com/pytorch/pytorch#from-source>
Note: **PyTorch** heavily relies on **Anaconda** for its build process
### Monkey-patching
Torch comes with its own version of `cuDNN`, which is great for simplicity,
but not so great if your performance is 50% of what's expected
First make sure that your `cuDNN` is installed correctly and that `ldconfig` can find it
Then, remove `cuDNN` from the `torch` package:
> rm ~/.local/lib/python3.10/site-packages/torch/lib/libcudnn*
Now check whether the correct `cuDNN` libraries are found
> sudo ldconfig
> ldconfig -p | grep cudnn
If not, modify `LD_LIBRARY_PATH` to include the `cuDNN` libraries and repeat the `ldconfig` commands
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
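To confirm which `cuDNN` build `torch` actually loads after the monkey-patch, a minimal check (assumes `torch` imports cleanly):
> python -c 'import torch; print(torch.backends.cudnn.version(), torch.backends.cudnn.is_available())'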
## SDP cross-attention optimization
SDP (scaled dot-product attention) is built into **PyTorch 2.0** as `torch.nn.functional.scaled_dot_product_attention`
Recommended if you are using **PyTorch 2.0**
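A minimal smoke test of the built-in SDP kernel and its backend flags (tensor shapes are arbitrary and only for illustration; assumes PyTorch 2.0 and a CUDA device):
> python -c 'import torch; q = k = v = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16); print(torch.nn.functional.scaled_dot_product_attention(q, k, v).shape)'
> python -c 'import torch; print(torch.backends.cuda.flash_sdp_enabled(), torch.backends.cuda.mem_efficient_sdp_enabled())'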
## Xformers cross-attention optimization
`xformers` is a library of optimized attention kernels for PyTorch
Highly recommended for a significant performance boost when using **PyTorch 1.x**
Not required when using **PyTorch 2.0**
### xFormers Stable
When using the release version of **PyTorch 1.13.1**, simply install `xformers` from `PyPI`:
> pip install -U xformers
### xFormers From Source
Otherwise, the build process takes a bit longer...
Set your environment so `xformers` can be optimized for *your* GPU
> python -c 'import torch; print(torch.cuda.get_device_capability())'
> (8, 6)
> export TORCH_CUDA_ARCH_LIST="8.6"
Rebuild `xformers`
> sudo apt install pybind11-dev
> pip install ninja setuptools pybind11
> pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
This will compile `xformers` for your system, which is preferred over using a pre-built wheel
Check functionality using:
> python -m xformers.info
Make sure that all fields marked with `memory_efficient` are set to `available`
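To exercise the memory-efficient kernel end-to-end, a minimal sketch (tensor shapes are arbitrary; assumes a CUDA device and a working `xformers` install):
> python -c 'import torch, xformers.ops; q = k = v = torch.randn(1, 64, 8, 64, device="cuda", dtype=torch.float16); print(xformers.ops.memory_efficient_attention(q, k, v).shape)'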
## Triton
### Triton Stable
There are separate `torchtriton` and `triton` packages as well as different sources for `triton`
To avoid confusion, uninstall any existing `triton` packages before installing `torch` and install `triton` in the same install command as `torch`
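After installing, a quick check that a single importable `triton` package is present (a minimal sketch):
> python -c 'import triton; print(triton.__version__)'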
### Triton From Source
The default `triton` package is good enough for a fully functional system
unless you want to experiment further with the torch `dynamo` just-in-time compiler,
in which case you may need to build & install the <https://github.com/openai/triton> package from source
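If you do experiment with `dynamo`, a minimal `torch.compile` smoke test that exercises the default `inductor`/`triton` path (assumes PyTorch 2.0 and a CUDA device; the lambda is just a placeholder workload):
> python -c 'import torch; fn = torch.compile(lambda x: torch.sin(x) ** 2 + torch.cos(x) ** 2); print(fn(torch.randn(16, device="cuda")).sum())'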
## Accelerate
Recommended to run in **FP16** mode with **Dynamo** accelerators
But...**Dynamo** is only supported with **Torch 2.0**!
Otherwise, run without **Dynamo**
> pip install accelerate
> accelerate config
In which compute environment are you running? This machine
Which type of machine are you using? No distributed training
Do you want to run your training on CPU only (even if a GPU is available)? [yes/NO]: no
Do you wish to optimize your script with torch dynamo?[yes/NO]: yes
Which dynamo backend would you like to use? inductor <- only if using torch 2.0+, otherwise no
Do you want to use DeepSpeed? [yes/NO]: no
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]: all
Do you wish to use FP16 or BF16 (mixed precision)? fp16
> accelerate test
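For a programmatic sanity check that `accelerate` picked up the GPU and FP16 settings chosen above (a minimal sketch, not part of the official `accelerate` CLI flow):
> python -c 'from accelerate import Accelerator; acc = Accelerator(mixed_precision="fp16"); print(acc.device, acc.mixed_precision)'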
## Python
PyTorch is **NOT** compatible with Python 3.11, use 3.10 instead
Install as usual, or build from source as described below
### Build
You can install `python` itself from source
Download from <https://www.python.org/downloads/source/>
Configure:
> export CFLAGS="-march=native -O3 -pipe -Wno-unused-value -Wno-empty-body -DNDEBUG"
> ./configure --prefix /usr --enable-optimizations --with-lto --enable-loadable-sqlite-extensions
> time make -j32
Check:
> ./python --version
> ./python -c 'import sysconfig; print(sysconfig.get_config_var("PY_CFLAGS"))'
Do a side-by-side install:
> sudo make altinstall
> sudo update-alternatives --install /bin/python3 python3 /bin/python3.11 100
> sudo update-alternatives --list python3
Switch to the new `python`:
> sudo update-alternatives --config python3
> python -m pip install --upgrade pip
> python -m pip uninstall torch torchaudio triton pytorch_triton -y
> python -m pip install --pre torch triton torchaudio torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu118 --force
> python -c 'import torch; print(torch.__path__, torch.__version__)'
## nVidia CUDA
### Windows WSL2
Requirements:
- Latest version of Windows: CUDA support is not included in the original RTM release
  Note: Insider builds are no longer required as CUDA support is present in Beta builds
- Updated WSL kernel: `wsl --update`, minimum **4.19.121**, recommended **5.15.74**
- Updated nVidia drivers: minimum **460**, recommended **510**
Links:
- [nVidia install docs](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
- [Ubuntu install docs](https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2)
- [CUDA download](https://developer.nvidia.com/cuda-downloads)
### Install
Install both `CUDA` and `cuDNN`
- Note: Do not install drivers if running in a VM; leave the host drivers as-is
  The driver version can be higher than the runtime version, but not the opposite
- Example: driver 510 supports CUDA 12 and is compatible with CUDA 11.6
Install using either:
- Add nVidia repository and install using `apt`
- Download installer and install manually
### Check
Check whether CUDA is detected and which versions are installed:
> apt list cuda*
List is long, but minimum packages are:
cuda/now 11.6.1-1
cuda-11-6/now 11.6.1-1
cuda-cccl-11-6/now 11.6.55-1
cuda-command-line-tools-11-6/now 11.6.1-1
cuda-compiler-11-6/now 11.6.1-1
cuda-cudart-11-6/now 11.6.55-1
cuda-cupti-11-6/now 11.6.112-1
cuda-libraries-11-6/now 11.6.1-1
cuda-nvcc-11-6/now 11.6.112-1
cuda-runtime-11-6/now 11.6.1-1
cuda-toolkit-11-6/now 11.6.1-1
cuda-tools-11-6/now 11.6.1-1
> apt list libcudnn*
libcudnn8/now 8.3.2.44-1+cuda11.5
> nvidia-smi
NVIDIA-SMI 510.85.02 Driver Version: 526.98 CUDA Version: 12.0
> head /usr/local/cuda/version.json
"cuda" : {
  "name" : "CUDA SDK",
  "version" : "11.6.1"
},
### NVCC
Test:
> git clone https://github.com/NVIDIA/cuda-samples
Edit the `Makefile` as needed to specify the compute level and run `make`
> ./Samples/1_Utilities/deviceQuery/deviceQuery
Device 0: "NVIDIA GeForce RTX 3060"
CUDA Driver Version / Runtime Version 12.0 / 11.6
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 12288 MBytes (12884377600 bytes)
(028) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1777 MHz (1.78 GHz)
Memory Clock rate: 7501 Mhz
Memory Bus Width: 192-bit
...
## Stable Diffusion
Stable Diffusion needs `CUDA` support for compute level **SM86**, so CUDA versions older than 11 are insufficient
## TensorFlow
Install:
> pip3 install tensorflow
TensorFlow dynamically links to the CUDA libraries, so as long as the major version matches, it should work (e.g. TensorFlow 2.10 uses CUDA 11.x).
But mixing different major versions between TensorFlow and CUDA does not work
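A quick inline check of the CUDA version TensorFlow was built against and whether the GPU is visible, without the helper script used below (a minimal sketch):
> python -c 'import tensorflow as tf; print(tf.__version__, tf.sysconfig.get_build_info().get("cuda_version"), tf.config.list_physical_devices("GPU"))'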
Check:
> wget https://raw.githubusercontent.com/vladmandic/tfjs-utils/main/src/tfinfo.py
> python tfinfo.py
sysconfig: [
  ('cpu_compiler', '/dt9/usr/bin/gcc'),
  ('cuda_compute_capabilities', ['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']),
  ('cuda_version', '11.2'),
  ('cudnn_version', '8'),
  ('is_cuda_build', True),
  ('is_rocm_build', False),
  ('is_tensorrt_build', True)
]
gpu device: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU') {
  'compute_capability': (8, 6),
  'device_name': 'NVIDIA GeForce RTX 3060'
}
logical device: LogicalDevice(name='/device:GPU:0', device_type='GPU')
## PyTorch
Install **PyTorch** linked to the *exact* major/minor version of **CUDA**:
> pip3 uninstall torch torchvision torchaudio
> pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
Note that `cu116` at the end refers to `CUDA` **11.6**, which should match the `CUDA` installation on your system
Check:
> wget https://raw.githubusercontent.com/vladmandic/tfjs-utils/main/src/torchinfo.py
> python torchinfo.py
torch version: 1.12.1+cu116
cuda available: True
cuda version: 11.6
cuda arch list: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
device: NVIDIA GeForce RTX 3060
## XFormers
Download:
> git clone https://github.com/facebookresearch/xformers.git
> cd xformers
> git submodule update --init --recursive
Compile:
> export FORCE_CUDA="1"
> export TORCH_CUDA_ARCH_LIST=8.6
> pip install ninja pyre-extensions einops
> python setup.py build develop
> python setup.py bdist_wheel
Install:
> pip install dist/*
> python -m xformers.info