Use sdpa instead of flash_attention_2 (flash_attn not built; sdpa is torch-native) 7ad6a31 multimodalart committed 4 days ago
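
A minimal sketch of the idea behind this change, assuming the setting in question is the `attn_implementation` string that Hugging Face transformers accepts in `from_pretrained`. The commit itself hardcodes sdpa; the variant below instead falls back to sdpa only when the `flash_attn` package is not importable. The helper name `pick_attn_implementation` is hypothetical and not part of this repo.

```python
import importlib.util


def pick_attn_implementation(preferred: str = "flash_attention_2") -> str:
    """Return `preferred`, or "sdpa" when flash_attention_2 is requested
    but the flash_attn package is not installed.

    Hypothetical helper for illustration; not taken from the commit.
    """
    # flash_attention_2 depends on the separately built flash_attn package,
    # whereas sdpa ships with PyTorch itself.
    needs_flash_attn = preferred == "flash_attention_2"
    if needs_flash_attn and importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    return preferred


# The resulting string would be passed as attn_implementation=... when
# loading a model (assumption: the model is loaded through transformers).
attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

Hardcoding sdpa, as the commit does, is the simpler choice when the environment is known not to have flash_attn built; the fallback is only useful if the same code runs in both kinds of environment.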