
Flash Attention 3, compatible with torch.compile. See this PR by guilhermeleobas for details.

This repo contains builds for Torch 2.8, Torch 2.9, and the Torch 2.10 nightlies.

The recommended way to use these builds is through Hugging Face's kernels library. You can see this build in use in modded-nanogpt.
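Since each wheel targets a specific Torch release, the installed Torch version determines which build to grab. A minimal sketch of that mapping (the build labels here are illustrative, not actual wheel filenames from this repo):

```python
import re

def pick_build(torch_version: str) -> str:
    """Map an installed torch version string to the matching build label.

    The labels are hypothetical placeholders for the wheels in this repo.
    """
    major, minor = (int(x) for x in re.match(r"(\d+)\.(\d+)", torch_version).groups())
    if (major, minor) == (2, 8):
        return "torch-2.8 build"
    if (major, minor) == (2, 9):
        return "torch-2.9 build"
    if (major, minor) >= (2, 10):
        return "torch-2.10 nightly build"
    raise ValueError(f"no prebuilt wheel for torch {torch_version}")

print(pick_build("2.8.0"))                     # torch-2.8 build
print(pick_build("2.10.0.dev20250926+cu126"))  # torch-2.10 nightly build
```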

Reproduce:

Torch 2.8.0 Build

Compiled from https://github.com/varunneal/flash-attention on branch guilhermeleobas/fa3-compile.

Compilation commands:

pip install -U pip wheel setuptools ninja numpy packaging psutil
pip install torch==2.8.0

git clone https://github.com/varunneal/flash-attention
cd flash-attention/hopper
git switch fa3-compile

export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE        # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE       # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE       # leave BF16, FP8

# Optional, for faster compilation time
export FLASH_ATTENTION_DISABLE_HDIM64=TRUE
export FLASH_ATTENTION_DISABLE_HDIM96=TRUE
export FLASH_ATTENTION_DISABLE_HDIM192=TRUE
export FLASH_ATTENTION_DISABLE_HDIM256=TRUE

python setup.py bdist_wheel
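My reading of the optional flags above (an assumption, not a documented contract) is that each FLASH_ATTENTION_DISABLE_HDIM&lt;N&gt;=TRUE entry drops that head dimension from the compiled kernel set, so with all four set, only head dim 128 is built. A sketch of that effect:

```python
# Head dimensions FA3 can compile kernels for (assumed list, for illustration).
ALL_HDIMS = [64, 96, 128, 192, 256]

def enabled_hdims(env: dict) -> list:
    """Return the head dims left enabled after applying the DISABLE flags."""
    return [d for d in ALL_HDIMS if env.get(f"FLASH_ATTENTION_DISABLE_HDIM{d}") != "TRUE"]

# The four optional exports from the build recipe above:
env = {f"FLASH_ATTENTION_DISABLE_HDIM{d}": "TRUE" for d in (64, 96, 192, 256)}
print(enabled_hdims(env))  # [128]
```

Fewer head dims means fewer kernel instantiations, which is why these flags speed up compilation; only set them if your models use head dim 128.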

Torch Nightlies Build

Compiled from https://github.com/varunneal/flash-attention on branch stable.

This is a custom fork that combines ABI compatibility with torch.compile compatibility. This build should be consistent with Torch Nightlies from 08/30 onward.
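To check whether an installed nightly satisfies the 08/30 cutoff, you can parse the date out of the `.dev` tag in torch's version string. A hypothetical helper (the cutoff assumes the 08/30 date above refers to 2025):

```python
import re

def nightly_ok(version: str, cutoff: int = 20250830) -> bool:
    """True if `version` is a nightly dated on or after the cutoff (YYYYMMDD)."""
    m = re.search(r"\.dev(\d{8})", version)
    return bool(m) and int(m.group(1)) >= cutoff

print(nightly_ok("2.10.0.dev20250926+cu126"))  # True
print(nightly_ok("2.10.0.dev20250815+cu126"))  # False
```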

Compilation commands:

pip install -U pip wheel setuptools ninja numpy packaging psutil
# Any Torch Nightly after 08/30 should be alright
pip install --pre "torch==2.10.0.dev20250926+cu126" --index-url https://download.pytorch.org/whl/nightly/cu126

git clone https://github.com/varunneal/flash-attention
cd flash-attention/hopper
git switch stable

export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE        # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE       # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE       # leave BF16, FP8


python setup.py bdist_wheel

Tips for ARM builds

On an aarch64/ARM64 system, such as a GH200 server, building requires a bit of finesse. Try:

export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export MAX_JOBS=4

Please contact me if you would like wheels for any other version of Python or Torch.
