To compile Flash Attention for Windows:
- Install the Visual Studio Build Tools.
- Open a new Windows Terminal tab: click "+" -> "Developer Command Prompt for VS".
- Set `CUDA_HOME` for your CUDA version, e.g. `set CUDA_HOME=%CUDA_PATH_V13_0%` (or whatever your version of CUDA is).
- Follow the instructions at https://huggingface.co/lldacing/flash-attention-windows-wheel
- Expect the compile to take many hours, so leave your computer on overnight. It may finish faster if you raise `MAX_JOBS=` in the build script.
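Put together, the steps above look something like the following in the Developer Command Prompt. This is a sketch: the `CUDA_PATH_V13_0` variable name depends on which CUDA version the installer set up, and `MAX_JOBS=4` is just a conservative starting value.

```bat
:: Run inside a "Developer Command Prompt for VS" session.
:: CUDA_PATH_V13_0 is created by the CUDA 13.0 installer; change to your version.
set CUDA_HOME=%CUDA_PATH_V13_0%
:: More parallel compile jobs finish sooner but need considerably more RAM.
set MAX_JOBS=4
:: Build flash-attn from source (this is the long, possibly overnight, step).
pip install flash-attn --no-build-isolation
```

If the build runs out of memory, lower `MAX_JOBS`; if you have plenty of RAM and cores, raising it shortens the overnight wait.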
After upgrading major components such as CUDA or Python, you will need to rebuild or upgrade just about everything that depends on them. If you upgrade to something released very recently, you may need to install dependent packages from their git repositories.
I have included SageAttention in case you have problems installing it yourself (see https://github.com/thu-ml/SageAttention/issues/242).
You may also need to reinstall xformers from the latest source:
pip uninstall xformers
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
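After reinstalling, a quick way to confirm which of these optional attention backends actually import is a small check script. The package names below (`flash_attn`, `sageattention`, `xformers`) are the usual import names; adjust them if your wheels differ.

```python
# Report which optional attention backends import cleanly in this environment.
import importlib


def check_backends(names=("flash_attn", "sageattention", "xformers")):
    """Try importing each package; return a name -> status mapping."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = "ok"
        except Exception as exc:
            # Catch broadly: broken native extensions can raise more than ImportError.
            status[name] = f"missing ({type(exc).__name__})"
    return status


if __name__ == "__main__":
    for pkg, state in check_backends().items():
        print(f"{pkg}: {state}")
```

A backend reported as `missing (ImportError)` usually means the wheel was built against a different CUDA, Python, or PyTorch version than the one currently installed.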