Instructions for using microsoft/Phi-4-multimodal-instruct with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use microsoft/Phi-4-multimodal-instruct with Transformers (a usage sketch follows the list below):

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="microsoft/Phi-4-multimodal-instruct",
    trust_remote_code=True,
)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-multimodal-instruct",
    trust_remote_code=True,
    dtype="auto",
)
```

- Notebooks
- Google Colab
- Kaggle
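As referenced above, a minimal usage sketch for the pipeline: it transcribes a local audio file. `sample.wav` is a hypothetical placeholder path, and `trust_remote_code=True` is required because the model ships custom code.

```python
# Minimal sketch: transcribe a local audio file with the ASR pipeline.
# "sample.wav" is a hypothetical placeholder; substitute your own file.
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="microsoft/Phi-4-multimodal-instruct",
    trust_remote_code=True,
)

result = pipe("sample.wav")
print(result["text"])  # the pipeline returns a dict with a "text" field
```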
Unable to run on V100
Hi, I'm trying to run on a V100 GPU and am following the recommendation of setting `attn_implementation='eager'`, but this still returns `RuntimeError: FlashAttention only supports Ampere GPUs or newer.`
Any idea what is going on here?
Thanks!
V100 uses the Volta architecture. Try it on A100, A6000, or A40 GPUs!
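A quick way to confirm this: FlashAttention 2 requires Ampere-class hardware (CUDA compute capability 8.0 or newer), and PyTorch can report what your GPU supports. A minimal sketch:

```python
import torch

# FlashAttention 2 requires compute capability >= 8.0 (Ampere or newer);
# a V100 (Volta) reports (7, 0) and therefore does not qualify.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
print("FlashAttention-capable:", (major, minor) >= (8, 0))
```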
I don't have access to those, but the model card says it should work on V100
Try installing flash-attn as a subprocess with the CUDA build skipped?

```python
import subprocess

# Install flash-attn without compiling its CUDA kernels
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={"FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
)
```
Flash attention will not work on this type of GPU. My question is why the model still tries to run flash attention when `attn_implementation='eager'` is set.
Because the model configuration overrides it: the repo's config.json sets `"_attn_implementation": "flash_attention_2"`.
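One possible workaround (a sketch only, not verified against this repo's remote code) is to load the config first, replace the pinned value, and pass the modified config to `from_pretrained`:

```python
# Sketch of a possible workaround (unverified): override the attention
# implementation in the config before loading, since the repo's config.json
# pins "_attn_implementation": "flash_attention_2".
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(
    "microsoft/Phi-4-multimodal-instruct", trust_remote_code=True
)
config._attn_implementation = "eager"  # replace the pinned flash_attention_2

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-multimodal-instruct",
    config=config,
    trust_remote_code=True,
    dtype="auto",
)
```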