AMD support

#14

by 12letter - opened Mar 7, 2025

Mar 7, 2025

It's asking for nvcc when installing from this repo.
ComfyUI produces only a black screen for me.
Has anyone managed to run this on an AMD card?

HappyLion2

Mar 13, 2025


edited Mar 13, 2025

Manually add --no-deps to the install command (pip install -r requirements.txt --no-deps) and make sure you have a ROCm-specific torch install. I have it running on my RX 7900 XTX, but 25 steps take ~24 mins at 832*480, and the VAE decode is non-trivial too. I need to get sage-attn installed again, as I did when I played with Hunyuan.
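For anyone who wants to confirm the "ROCm-specific torch install" part, here is a minimal sketch. It assumes ROCm wheels carry a "+rocm…" suffix in their version string, as the builds quoted in this thread do; the helper name rocm_version is made up for illustration.

```python
import re


def rocm_version(torch_version: str):
    """Return the ROCm version embedded in a torch version string, or None.

    Assumption: ROCm wheels use a local version suffix such as "+rocm6.3".
    """
    match = re.search(r"\+rocm(\d+(?:\.\d+)*)", torch_version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch", torch.__version__, "-> ROCm", rocm_version(torch.__version__))
        # torch.version.hip is a version string on ROCm builds and None otherwise.
        print("HIP runtime:", torch.version.hip)
        # ROCm builds expose the GPU through the torch.cuda API.
        print("GPU visible:", torch.cuda.is_available())
```

If the suffix is missing or the HIP runtime prints None, the installed torch is probably a CUDA or CPU build rather than a ROCm one.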

12letter

Mar 14, 2025

When you talk about "non-trivial VAE decode" you mean ComfyUI?
The --no-deps option didn't help, unfortunately:
Wan2.1# pip3.12 install -r requirements.txt --no-deps --break-system-packages
Requirement already satisfied: torch>=2.4.0 in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 1)) (2.7.0.dev20250119+rocm6.3)
Requirement already satisfied: torchvision>=0.19.0 in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 2)) (0.22.0.dev20250119+rocm6.3)
Requirement already satisfied: opencv-python>=4.9.0.80 in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 3)) (4.11.0.86)
Requirement already satisfied: diffusers>=0.31.0 in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 4)) (0.32.2)
Collecting transformers>=4.49.0 (from -r requirements.txt (line 5))
Downloading transformers-4.49.0-py3-none-any.whl.metadata (44 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.0/44.0 kB 321.4 kB/s eta 0:00:00
Requirement already satisfied: tokenizers>=0.20.3 in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 6)) (0.21.0)
Requirement already satisfied: accelerate>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 7)) (1.3.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 8)) (4.67.1)
Collecting imageio (from -r requirements.txt (line 9))
Downloading imageio-2.37.0-py3-none-any.whl.metadata (5.2 kB)
Collecting easydict (from -r requirements.txt (line 10))
Downloading easydict-1.13-py3-none-any.whl.metadata (4.2 kB)
Collecting ftfy (from -r requirements.txt (line 11))
Downloading ftfy-6.3.1-py3-none-any.whl.metadata (7.3 kB)
Collecting dashscope (from -r requirements.txt (line 12))
Downloading dashscope-1.22.2-py3-none-any.whl.metadata (6.8 kB)
Requirement already satisfied: imageio-ffmpeg in /usr/local/lib/python3.12/dist-packages (from -r requirements.txt (line 13)) (0.6.0)
Collecting flash_attn (from -r requirements.txt (line 14))
Downloading flash_attn-2.7.4.post1.tar.gz (6.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 1.1 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
/tmp/pip-install-ly5kyotc/flash-attn_a10f07a52fe1407e8b9f7bc672ef64d3/setup.py:106: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-ly5kyotc/flash-attn_a10f07a52fe1407e8b9f7bc672ef64d3/setup.py", line 198, in <module>
CUDAExtension(
File "/usr/local/lib/python3.12/dist-packages/torch/utils/cpp_extension.py", line 1139, in CUDAExtension
library_dirs += library_paths(device_type="cuda")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/cpp_extension.py", line 1274, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/cpp_extension.py", line 2535, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

  torch.__version__  = 2.7.0.dev20250119+rocm6.3
  
  
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
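A note on the log above: flash_attn is listed in requirements.txt itself (line 14), and --no-deps only skips the dependencies of listed packages, not the listed packages, so pip still tries to build it and hits the nvcc/CUDA_HOME check. One possible workaround, sketched below, is to install from a filtered copy of the file. The helper names and the output filename requirements-rocm.txt are hypothetical, and whether the repo's code then runs without flash_attn is not established in this thread.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name so flash_attn and flash-attn compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str):
    """Return the normalized package name on a requirements line, or None."""
    stripped = line.split("#", 1)[0].strip()  # drop comments
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", stripped)
    return normalize(match.group(0)) if match else None


def drop_packages(lines, skip):
    """Keep every line except those that name a package in `skip`."""
    skip_set = {normalize(s) for s in skip}
    return [ln for ln in lines if requirement_name(ln) not in skip_set]


if __name__ == "__main__":
    import pathlib

    src = pathlib.Path("requirements.txt")
    if src.exists():
        kept = drop_packages(src.read_text().splitlines(), ["flash_attn"])
        # Hypothetical output name; install with: pip install -r requirements-rocm.txt
        pathlib.Path("requirements-rocm.txt").write_text("\n".join(kept) + "\n")
```

Comments, blank lines, and option lines such as "-r other.txt" pass through unchanged, since they do not parse as a package name.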

HappyLion2

Mar 14, 2025

When you talk about "non-trivial VAE decode" you mean ComfyUI?

Yep.

As for your error: my apologies, I actually installed and ran the Kijai workflows, which provide a different requirements.txt that does work on my AMD GPU: https://github.com/kijai/ComfyUI-WanVideoWrapper

Also, if it helps, I am using Python 3.10 and Ubuntu 24.04, with the latest version of ComfyUI.

HappyLion2

Mar 17, 2025

In case it is useful to you or others: I fixed my VAE issues by not pre-empting OOMs and changing the tile size to 256x256. A 720*480 video now decodes in 16s.
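To get a feel for what a 256x256 tile size means for a 720*480 frame, here is a rough sketch that counts tiles for a generic sliding-window tiling. It is not ComfyUI's actual implementation; the overlap parameter and the function names are assumptions for illustration, and the tile size is treated as being in pixels.

```python
import math


def tiles_along(dim: int, tile: int, overlap: int = 0) -> int:
    """Number of tiles needed to cover `dim` pixels along one axis."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if dim <= tile:
        return 1
    stride = tile - overlap  # how far each new tile advances
    return math.ceil((dim - tile) / stride) + 1


def tile_grid(width: int, height: int, tile: int, overlap: int = 0):
    """Return (columns, rows) of tiles covering a width x height frame."""
    return tiles_along(width, tile, overlap), tiles_along(height, tile, overlap)


if __name__ == "__main__":
    cols, rows = tile_grid(720, 480, 256)
    print(f"no overlap: {cols} x {rows} = {cols * rows} tiles per frame")
    cols, rows = tile_grid(720, 480, 256, overlap=64)
    print(f"64 px overlap: {cols} x {rows} = {cols * rows} tiles per frame")
```

Smaller tiles lower peak memory per decode call at the cost of more calls, which is the trade-off behind avoiding the OOM path.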

With TeaCache, SLG, and torch.compile(), I'm able to generate an 81-frame video from an input image in just under 20 minutes, with 30 steps, using the 480P 14B FP8_e5m2 model. This uses PyTorch's native Flash attention (via SDP) on PyTorch 2.6+rocm6.2.4.

I also tried Flash Attention 2, but it increased the sampling time by ~50%.
