To compile Flash Attention for Windows:
- Install the Visual Studio Build Tools.
- Open a new Windows Terminal tab: click "+" -> "Developer Command Prompt for VS".
- Set `CUDA_HOME` for your CUDA version, e.g. `set CUDA_HOME=%CUDA_PATH_V13_0%` (or whatever your version of CUDA is).
- Follow the instructions at https://huggingface.co/lldacing/flash-attention-windows-wheel
- Expect the compile to take many hours, so leave your computer on overnight. It may finish faster if you raise `MAX_JOBS=` in the build script.
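Put together, the steps above look something like the following in the Developer Command Prompt. This is a sketch: the `CUDA_PATH_V13_0` variable name depends on which CUDA version the installer set up, and `MAX_JOBS=4` is just a conservative starting value.

```bat
:: Run inside a "Developer Command Prompt for VS" session.
:: CUDA_PATH_V13_0 is created by the CUDA 13.0 installer; change to your version.
set CUDA_HOME=%CUDA_PATH_V13_0%
:: More parallel compile jobs finish sooner but need considerably more RAM.
set MAX_JOBS=4
:: Build flash-attn from source (this is the long, possibly overnight, step).
pip install flash-attn --no-build-isolation
```

If the build runs out of memory, lower `MAX_JOBS`; if you have plenty of RAM and cores, raising it shortens the overnight wait.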
After upgrading major components such as CUDA or Python, you will need to rebuild or upgrade just about everything that depends on them. If you upgrade to something released very recently, you may need to install dependent packages from their git repositories.
I have included SageAttention in case you have problems installing it yourself (see https://github.com/thu-ml/SageAttention/issues/242).
You may also need to reinstall xformers from the latest source:
pip uninstall xformers
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
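After reinstalling, a quick way to confirm which of these optional attention backends actually import is a small check script. The package names below (`flash_attn`, `sageattention`, `xformers`) are the usual import names; adjust them if your wheels differ.

```python
# Report which optional attention backends import cleanly in this environment.
import importlib


def check_backends(names=("flash_attn", "sageattention", "xformers")):
    """Try importing each package; return a name -> status mapping."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = "ok"
        except Exception as exc:
            # Catch broadly: broken native extensions can raise more than ImportError.
            status[name] = f"missing ({type(exc).__name__})"
    return status


if __name__ == "__main__":
    for pkg, state in check_backends().items():
        print(f"{pkg}: {state}")
```

A backend reported as `missing (ImportError)` usually means the wheel was built against a different CUDA, Python, or PyTorch version than the one currently installed.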