fix: import `flash_attn_varlen_func` from `flash_attn` instead of `transformers.modeling_flash_attention_utils`
When I load this model with `transformers`:
```python
from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM
model_path = "lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
```
The following error occurs:
```bash
ImportError: cannot import name 'flash_attn_varlen_func' from 'transformers.modeling_flash_attention_utils'
```
This happens because current `transformers` releases no longer expose `flash_attn_varlen_func` from the `transformers.modeling_flash_attention_utils` module. The failing import lives in this repo's remote modeling code (executed via `trust_remote_code=True`), so upgrading `transformers` alone does not help.
The fix is to import `flash_attn_varlen_func` directly from the `flash_attn` package.
This bug has already been fixed in the [LLaVA-OneVision-1.5 GitHub repo (fix for issue #31)](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5/pull/33), but the fix has not yet been synchronized to this Hugging Face repository.
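For user code that needs to run against both older and newer `transformers` versions, a guarded import is one option; this is only a sketch of that pattern, not part of the upstream fix:

```python
# Compatibility sketch (assumes the flash-attn package is installed).
# Older transformers versions re-exported flash_attn_varlen_func from
# modeling_flash_attention_utils; newer ones do not, so fall back to
# the flash_attn package itself.
try:
    from transformers.modeling_flash_attention_utils import flash_attn_varlen_func
except ImportError:
    from flash_attn import flash_attn_varlen_func
```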
```diff
@@ -46,7 +46,8 @@ from .configuration_llavaonevision1_5 import Llavaonevision1_5Config, LLaVAOneVi
 
 
 if is_flash_attn_available():
-    from transformers.modeling_flash_attention_utils import _flash_attention_forward, flash_attn_varlen_func
+    from transformers.modeling_flash_attention_utils import _flash_attention_forward
+    from flash_attn import flash_attn_varlen_func
 
 if is_torch_flex_attn_available():
     from torch.nn.attention.flex_attention import BlockMask
```
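Note that the corrected import still sits under the `is_flash_attn_available()` guard, so environments without the `flash_attn` package installed are unaffected; where flash attention is available, `flash_attn_varlen_func` now comes from the package that actually provides it.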