How to bypass the Flash Attention 2 requirement on Apple Silicon?

#63

by MC-QQ - opened Jan 27, 2025

Jan 27, 2025

I have an M4 Mac mini and am trying to run this model.

I got the following error:

Library/Python/3.9/lib/python/site-packages/transformers/modeling_utils.py", line 1659, in _check_and_enable_flash_attn_2
raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

I tried the approach in https://huggingface.co/qnguyen3/nanoLLaVA-1.5/discussions/4 but it didn't work.
Any suggestions? Thanks a ton!

beaugunderson

Jan 28, 2025

Hopefully it's available in mlx-vlm at some point...

54BB

Feb 4, 2025

Exact same issue on my side. Some articles show successful runs on a Mac, but it doesn't work for me:
https://www.danielcorin.com/til/deekseek/janus-pro-local/
Could it be caused by a mismatch between module versions?
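
One thing that might be worth trying, going by the error message: the ImportError is raised because FlashAttention2 is toggled on while the flash_attn package is not installed, so asking transformers for a different attention implementation may avoid the check. Below is a minimal sketch, not a confirmed fix for this model. The helper name `pick_attn_implementation` and the `"MODEL_ID"` placeholder are made up for illustration. The `attn_implementation` keyword of `from_pretrained` does exist in recent transformers versions, but whether this model's remote code honors it is an assumption, since a config that hard-codes flash attention may still override it.

```python
import importlib.util


def pick_attn_implementation(package: str = "flash_attn", fallback: str = "eager") -> str:
    """Return "flash_attention_2" only if `package` is importable, else `fallback`.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    if importlib.util.find_spec(package) is not None:
        return "flash_attention_2"
    return fallback


attn = pick_attn_implementation()
print(attn)

# Hypothetical usage (not run here; "MODEL_ID" is a placeholder, and it is an
# assumption that the model's code respects this keyword):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "MODEL_ID", trust_remote_code=True, attn_implementation=attn
# )
```

On a Mac without flash_attn installed, the helper falls back to "eager". Passing `fallback="sdpa"` is another option to try if the installed PyTorch and the model support it.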
