fix: import `flash_attn_varlen_func` from `flash_attn` instead of `transformers.modeling_flash_attention_utils`
When I load this model with `transformers`:
```python
from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM
model_path = "lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
```
The following error occurs:
```bash
ImportError: cannot import name 'flash_attn_varlen_func' from 'transformers.modeling_flash_attention_utils'
```
This happens because current `transformers` releases no longer expose `flash_attn_varlen_func` from the `transformers.modeling_flash_attention_utils` module. The failing import lives in this repo's remote modeling code (executed via `trust_remote_code=True`), so upgrading `transformers` alone does not help.
The fix is to import `flash_attn_varlen_func` directly from the `flash_attn` package.
This bug has already been fixed in the [LLaVA-OneVision-1.5 GitHub repo (fix for issue #31)](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5/pull/33), but the fix has not yet been synchronized to this Hugging Face repository.
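For user code that needs to run against both older and newer `transformers` versions, a guarded import is one option; this is only a sketch of that pattern, not part of the upstream fix:

```python
# Compatibility sketch (assumes the flash-attn package is installed).
# Older transformers versions re-exported flash_attn_varlen_func from
# modeling_flash_attention_utils; newer ones do not, so fall back to
# the flash_attn package itself.
try:
    from transformers.modeling_flash_attention_utils import flash_attn_varlen_func
except ImportError:
    from flash_attn import flash_attn_varlen_func
```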
```diff
@@ -46,7 +46,8 @@ from .configuration_llavaonevision1_5 import Llavaonevision1_5Config, LLaVAOneVi
 
 
 if is_flash_attn_available():
-    from transformers.modeling_flash_attention_utils import _flash_attention_forward, flash_attn_varlen_func
+    from transformers.modeling_flash_attention_utils import _flash_attention_forward
+    from flash_attn import flash_attn_varlen_func
 
 if is_torch_flex_attn_available():
     from torch.nn.attention.flex_attention import BlockMask
```
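Note that the corrected import still sits under the `is_flash_attn_available()` guard, so environments without the `flash_attn` package installed are unaffected; where flash attention is available, `flash_attn_varlen_func` now comes from the package that actually provides it.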