Failing to import - missing DLLs

#1
by Matthew-o - opened

It looks like the wheel might be missing the required DLLs.

>>> import flash_attn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    import flash_attn
  File "C:\Program Files\Python313\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
    ...<7 lines>...
    )
  File "C:\Program Files\Python313\Lib\site-packages\flash_attn\flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: DLL load failed while importing flash_attn_2_cuda: The specified module could not be found.

Do you have the additional dll files required to use flash attention?

If the LLMs are to be believed, we should have these:

flash_attn_2_cuda.dll
flash_attn_2_cuda.lib
cute_kernels.dll

Owner

Make sure you have the full CUDA 13.0 toolkit installed (not just runtime). The toolkit includes the necessary DLLs in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\bin.
From cmd, run: where cudart64_*.dll
You should see something like: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\cudart64_12.dll
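The same `where`-style check can be sketched in pure Python, which is handy if you want to see every matching DLL and which one wins. This is a minimal sketch, not part of flash-attn; `find_on_path` is a hypothetical helper name:

```python
import os
from pathlib import Path

def find_on_path(pattern: str) -> list[Path]:
    """Return every file matching pattern in the directories on PATH,
    in PATH order -- the first hit is the one Windows would pick up."""
    hits: list[Path] = []
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry and os.path.isdir(entry):
            hits.extend(sorted(Path(entry).glob(pattern)))
    return hits

# List every cudart64 DLL visible on PATH (empty on a machine without CUDA).
for dll in find_on_path("cudart64_*.dll"):
    print(dll)
```

If two toolkit versions both appear, the earlier PATH entry is the one extensions will load against.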

Hmm, I wonder if I didn't have my PATH set up correctly. I'll give it another shot.

Hi, I just ran through this again. It looks like the flash_attn_2_cuda file is missing from the installation folder:

C:\Users\xxx\AppData\Roaming\Python\Python313\site-packages\flash_attn

The flash_attn_interface.py file is attempting to import it but can't find it. Is it in your site packages folder?

And just to add, I see flash_attn_triton.py, flash_attn_triton_og.py, flash_blocksparse_attention.py, flash_blocksparse_attn_interface.py, bert_padding.py, and flash_attn_interface.py, along with __init__.py and the subfolders. One folder is titled flash_attn_triton_amd.

Owner

Inside the flash_attn-2.8.3-cp313-cp313-win_amd64.whl you can find flash_attn_2_cuda.cp313-win_amd64.pyd with all the necessary compiled extensions.
I downloaded it and confirmed it is inside; you can rename the .whl to .zip and find it at the root.
My best guess is that your issue is coming from missing CUDA runtime/toolkit DLLs. The exact runtime I am using is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\cudart64_12.dll

My advice would be to use Dependencies (github.com/lucasg/Dependencies) on the .pyd file itself and see what it thinks is missing.
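As a quick first pass before reaching for the Dependencies GUI, you can try loading the extension directly with ctypes and surface the loader's error message. This is a sketch, assuming a hypothetical `try_load` helper; the path in the example is illustrative and should be adjusted to your install:

```python
import ctypes
from pathlib import Path

def try_load(path: str) -> tuple[bool, str]:
    """Attempt to load a DLL/.pyd with ctypes and return (ok, message).
    On failure the OSError text describes the load error, though not
    which specific dependency is missing -- Dependencies shows that."""
    try:
        ctypes.CDLL(str(Path(path)))
        return True, "loaded OK"
    except OSError as exc:
        return False, str(exc)

# Hypothetical path -- point this at your own site-packages location.
ok, msg = try_load(r"C:\Program Files\Python313\Lib\site-packages"
                   r"\flash_attn_2_cuda.cp313-win_amd64.pyd")
print(ok, msg)
```

A `False` result with "module could not be found" matches the import error above and means a dependency of the .pyd, not the .pyd itself, is missing.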

Gotcha, okay. I see the flash_attn_2_cuda.cp313-win_amd64.pyd file in the Python\Python313\site-packages folder. I copied it and renamed it to flash_attn_2_cuda.pyd just to be safe, and confirmed Python\Python313\site-packages is on my PATH.

Did you compile for 12.8 or 13.0?

But I've tried both: I confirmed cudart64_12.dll is in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\ and cudart64_13.dll is in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\bin\x64, and both were on PATH. I used nvcc.exe --version to confirm the correct CUDA version was active during testing, and updated the CUDA_HOME and CUDA_PATH variables to match that CUDA version.
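The environment checks above are easy to get subtly wrong by hand. A minimal sketch of an automated sanity check, assuming a hypothetical `check_cuda_env` helper (not part of flash-attn or CUDA):

```python
import os
from pathlib import Path

def check_cuda_env() -> list[str]:
    """Report common CUDA environment mistakes: CUDA_HOME/CUDA_PATH unset,
    pointing at a missing directory, disagreeing with each other, or the
    toolkit's bin directory not being on PATH."""
    problems = []
    home = os.environ.get("CUDA_HOME")
    cuda_path = os.environ.get("CUDA_PATH")
    for name, value in (("CUDA_HOME", home), ("CUDA_PATH", cuda_path)):
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points at a missing directory: {value}")
    if home and cuda_path and os.path.normcase(home) != os.path.normcase(cuda_path):
        problems.append("CUDA_HOME and CUDA_PATH disagree")
    if cuda_path and Path(cuda_path).is_dir():
        bin_dir = os.path.normcase(str(Path(cuda_path) / "bin"))
        on_path = [os.path.normcase(p) for p in
                   os.environ.get("PATH", "").split(os.pathsep)]
        if bin_dir not in on_path:
            problems.append(f"{bin_dir} is not on PATH")
    return problems

print(check_cuda_env() or "CUDA env looks consistent")
```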

I've removed all other instances of python in the environment variables.

Still seeing the below error.

>>> import flash_attn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    import flash_attn
  File "C:\Program Files\Python313\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
    ...<7 lines>...
    )
  File "C:\Program Files\Python313\Lib\site-packages\flash_attn\flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: DLL load failed while importing flash_attn_2_cuda: The specified module could not be found.

So then I tried the Dependencies application you recommended and found I'm missing the following DLLs:

c10.dll
torch_cpu.dll
torch_python.dll
c10_cuda.dll
torch_cuda.dll

My cudart64_12.dll is missing:
api-ms-win-core-libraryloader-l1-2-0.dll

So that was really helpful, thanks for the suggestion! Do you have a recommended way of obtaining the DLLs above?

I saw PyTorch is releasing nightly builds that support Blackwell. Will that help with any of the missing DLLs?

https://download.pytorch.org/whl/nightly/torch/

Although those are torch 2.11.0, not 2.8.3

https://download.pytorch.org/whl/nightly/cu128
https://download.pytorch.org/whl/nightly/cu130

Owner

Making good progress!

c10.dll
torch_cpu.dll
torch_python.dll
c10_cuda.dll
torch_cuda.dll
All of these are core cp313 PyTorch DLLs that live in Python 3.13's site-packages\torch\lib; they come with pip install torch --index-url https://download.pytorch.org/whl/cu128
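A quick way to confirm those PyTorch DLLs actually landed is to check torch's own lib directory. This is a minimal sketch; `missing_torch_dlls` is a hypothetical helper, and the DLL list is just the five names from this thread:

```python
from pathlib import Path

# The five DLLs Dependencies reported as missing in this thread.
TORCH_DLLS = ("c10.dll", "torch_cpu.dll", "torch_python.dll",
              "c10_cuda.dll", "torch_cuda.dll")

def missing_torch_dlls(lib_dir: str, names=TORCH_DLLS) -> list[str]:
    """Return which of the expected DLLs are absent from lib_dir
    (normally site-packages\\torch\\lib after installing torch)."""
    d = Path(lib_dir)
    return [n for n in names if not (d / n).is_file()]

# If torch is importable, check its own lib directory.
try:
    import torch
    lib = Path(torch.__file__).parent / "lib"
    print(missing_torch_dlls(str(lib)) or "all expected DLLs present")
except ImportError:
    print("torch is not installed")
```

If anything is listed as missing right after a fresh install, the wheel likely came from the wrong index (CPU-only builds don't ship torch_cuda.dll, for example).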

For api-ms-win-core-libraryloader-l1-2-0.dll: mine is Redist\10.0.19041.0\ucrt\DLLs\x64\api-ms-win-core-libraryloader-l1-1-0.dll (I used MSVC to build), but I'd just grab the latest Visual C++ Redistributable (vc_redist.x64.exe) and that should cover all your bases.

The nightly builds should stay compatible; I've tested several different working cu128 environments, from torch-2.7.0+cu128 to torch-2.11.0.dev20251217+cu128.

As for cudart64_12.dll, I don't know what to tell you. If you run "nvcc --version" in a command prompt, the reported release should match your torch wheel (so cu128 means 12.8); if it doesn't, the right toolkit's bin directory is most likely not on PATH.
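That nvcc-vs-wheel check can be scripted. A sketch, assuming hypothetical `parse_nvcc_release`/`nvcc_release` helper names; the version-string format matched is the standard "release X.Y" line nvcc prints:

```python
import re
import subprocess

def parse_nvcc_release(output: str):
    """Extract the toolkit release (e.g. '12.8') from `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", output)
    return m.group(1) if m else None

def nvcc_release():
    """Return the release reported by the nvcc on PATH, or None if absent."""
    try:
        proc = subprocess.run(["nvcc", "--version"],
                              capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(proc.stdout)

# A cu128 torch wheel expects release 12.8; a mismatch usually means the
# wrong toolkit's bin directory comes first on PATH.
print(nvcc_release() or "nvcc not found on PATH")
```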

Alright πŸ˜„ it finally imported!

Thanks for the help! I installed the latest Visual C++ Redistributable from:

https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170#latest-supported-redistributable-version
File: https://aka.ms/vc14/vc_redist.x64.exe

Updated CUDA_HOME and CUDA_PATH to point to 12.8:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8

Used Python 3.13.12

Installed torch using the stable (non-nightly) build, although I suspect nightly may have worked too; if I hit any issues I'll try it:
pip install --force-reinstall --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

However, I was still seeing the c10.dll dependency error. The DLL itself was in torch's lib folder, and opening c10.dll in the Dependencies GUI showed it was fine on its own. The real issue was that flash attention was installed in C:\Program Files\Python313\Lib\site-packages, but C:\Users\me\AppData\Roaming\Python\Python313\site-packages contained an empty flash_attn folder, and that, for some reason, was causing torch's c10.dll to fail to load fully. When I tried to delete the stale flash_attn folder I got an error that it was in use, so instead I copied flash_attn/flash_attn-2.8.3.dist-info/flash_attn_2_cuda.cp313-win_amd64.pyd into the C:\Users\Matthew\AppData\Roaming\Python\Python313\site-packages folder, and bam, c10.dll no longer showed a dependency issue.
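The stale roaming-profile folder above is a classic shadowing problem: the user site-packages directory comes before the system one on sys.path, so an empty leftover folder there wins over the real install. A minimal sketch for spotting duplicates, with `package_locations` as a hypothetical helper name:

```python
import sys
from pathlib import Path

def package_locations(pkg: str, search_paths=None) -> list[Path]:
    """List every directory named pkg across search_paths (default sys.path).
    More than one hit means a stale copy, like the empty roaming-profile
    flash_attn folder here, can shadow the real install, because the
    earliest sys.path entry wins at import time."""
    paths = sys.path if search_paths is None else search_paths
    hits = []
    for entry in paths:
        candidate = Path(entry or ".") / pkg
        if candidate.is_dir():
            hits.append(candidate)
    return hits

# The first location printed is the one `import flash_attn` would use.
for loc in package_locations("flash_attn"):
    print(loc)
```

Deleting (or renaming) every hit except the one you actually installed to is usually cleaner than copying the .pyd around.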

python
Python 3.13.12 (tags/v3.13.12:1cbe481, Feb 3 2026, 18:22:25) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import flash_attn
>>>
(no error)

Thanks a lot! Appreciate the help. Hope this helps anyone else fighting the import gremlins.

Yeah, you need CUDA 12.8; if you only have CUDA 13.0 it simply won't import. At least that's what worked for me when I tried it: I had to downgrade from 13.0 to 12.8.
