
Compiled Flash Attention 3 from the official repo. Built [DATE] for Torch 2.9+, CUDA 12.6, 12.8, and 13.0, on x86 and ARM, for Hopper machines (H100, H200, GH200, etc.).
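
Before grabbing a wheel, it is worth confirming that the GPU really is Hopper (compute capability 9.0) and that the local Torch/CUDA pair matches one of the builds above. A minimal sketch, assuming only that torch is already installed:

# Sketch: verify the environment matches one of the prebuilt wheels.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: sm{major}{minor}")             # Hopper is sm90
print(f"torch: {torch.__version__}")                       # wheels target 2.9+
print(f"torch built against CUDA: {torch.version.cuda}")   # 12.6 / 12.8 / 13.0
assert (major, minor) == (9, 0), "these wheels are built for Hopper (sm90) only"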

Reproduce:

pip install -U pip wheel setuptools ninja numpy packaging psutil
# Any Torch Nightly after 08/30 should be alright
pip install --pre torch==2.10.0.dev20251210+cu130 --index-url https://download.pytorch.org/whl/nightly/cu130

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper

export MAX_JOBS=16
export FLASH_ATTENTION_FORCE_BUILD=TRUE        # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE       # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE       # keep only the BF16 and FP8 kernels

python setup.py bdist_wheel
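
The wheel lands in dist/ and can be installed with pip install dist/flash_attn_3-*.whl (the exact filename varies by build). A minimal smoke test, assuming the Hopper build exposes flash_attn_func via the flash_attn_interface module as in the upstream repo; note that some versions return a (out, lse) tuple rather than just the output:

# Sketch: smoke-test the freshly built wheel on a Hopper GPU.
import torch
from flash_attn_interface import flash_attn_func

# (batch, seqlen, heads, head_dim); BF16 since FP16 kernels were disabled above
q, k, v = (torch.randn(2, 1024, 8, 128, dtype=torch.bfloat16, device="cuda")
           for _ in range(3))

out = flash_attn_func(q, k, v, causal=True)
if isinstance(out, tuple):    # some versions also return the log-sum-exp
    out = out[0]
print(out.shape, out.dtype)   # expect torch.Size([2, 1024, 8, 128]) torch.bfloat16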

Tips for ARM builds

On an aarch64/ARM64 system, such as a GH200 server, building requires a bit of finesse. Try:

export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export MAX_JOBS=4
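
The low MAX_JOBS is about RAM rather than cores: each nvcc job for the Hopper kernels can use a lot of memory, and too many parallel jobs can exhaust it. A rough sketch for picking a value from available memory, using the psutil installed in the first step; the ~8 GB-per-job figure is an assumption, not a measured number:

# Sketch: derive MAX_JOBS from available RAM instead of core count.
import os
import psutil

avail_gb = psutil.virtual_memory().available / 2**30
jobs = max(1, min(os.cpu_count() or 1, int(avail_gb // 8)))  # ~8 GB/job assumed
print(f"export MAX_JOBS={jobs}")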

Please contact me if you would like wheels built for any other version of Python or Torch. I may get around to compiling this for Blackwell.
