AssertionError: Flash Attention is not available, but is needed for dense attention

#30
by tpadhi1 - opened
```
model = AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs)
  File "/home/ubuntu/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ubuntu/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 903, in __init__
    self.model = Phi3SmallModel(config)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 745, in __init__
    self.layers = nn.ModuleList([Phi3SmallDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 745, in <listcomp>
    self.layers = nn.ModuleList([Phi3SmallDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 651, in __init__
    self.self_attn = Phi3SmallSelfAttention(config, layer_idx)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 218, in __init__
    assert is_flash_attention_available, "Flash Attention is not available, but is needed for dense attention"
AssertionError: Flash Attention is not available, but is needed for dense attention
```

It is not mentioned in the README, but you need to install the Flash Attention package (https://pypi.org/project/flash-attn/):

pip install flash-attn

Even after installation, it throws the same error.

Any follow-up on this?

Flash Attention needs to run on a GPU, not a CPU, so this could be the cause of the error. Changing my Colab runtime to GPU fixed it for me.
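If it helps anyone else hitting this, here is a rough pre-flight check to run before loading the model. It is only a sketch: the helper name `flash_attention_preflight` is made up for illustration, and it assumes the model's remote code needs both a CUDA GPU visible to PyTorch and an importable `flash_attn` package. I have not checked exactly how `modeling_phi3_small.py` sets `is_flash_attention_available`, so passing these checks does not guarantee the assertion goes away.

```python
import importlib.util


def flash_attention_preflight():
    """Return a list of human-readable problems; an empty list means the basic checks passed.

    Hypothetical helper, not part of transformers or flash-attn.
    """
    problems = []

    # Check for PyTorch first, since the GPU check depends on it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch

        # Flash Attention kernels run on the GPU; a CPU-only runtime will not work.
        if not torch.cuda.is_available():
            problems.append("no CUDA GPU is visible to PyTorch")

    # Try a real import: the package can be installed yet fail to load,
    # for example when it was built against a different CUDA/PyTorch version.
    try:
        import flash_attn  # noqa: F401
    except Exception as exc:
        problems.append(f"flash_attn could not be imported ({type(exc).__name__}: {exc})")

    return problems


if __name__ == "__main__":
    issues = flash_attention_preflight()
    if issues:
        print("Flash Attention is unlikely to work in this environment:")
        for issue in issues:
            print(" -", issue)
    else:
        print("Basic checks passed: CUDA GPU visible and flash_attn importable.")
```

On a CPU-only Colab runtime this should report the missing GPU, which would match the case where `pip install flash-attn` alone does not make the error go away.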
