Compiled Flash Attention 3 from the official repo. Built [DATE] for Torch 2.9+ and CUDA 12.6, 12.8, and 13.0, on x86 and ARM, targeting Hopper machines (H100, H200, GH200, etc.).
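These wheels target Hopper only, so it is worth confirming your GPU reports compute capability 9.x before installing. A minimal sketch (assuming PyTorch is already installed; the capability check itself is pure Python):

```python
def is_hopper(capability: tuple) -> bool:
    # Hopper GPUs (H100, H200, GH200) report compute capability 9.x.
    return capability[0] == 9

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability()
            print("Hopper GPU:", is_hopper(cap))
        else:
            print("CUDA not available on this machine")
    except ImportError:
        print("PyTorch not installed")
```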
Reproduce:

```shell
pip install -U pip wheel setuptools ninja numpy packaging psutil
# Any Torch nightly after 08/30 should work
pip install --pre torch==2.10.0.dev20251210+cu130 --index-url https://download.pytorch.org/whl/nightly/cu130
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
export MAX_JOBS=16
export FLASH_ATTENTION_FORCE_BUILD=TRUE   # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE  # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE  # keep BF16 and FP8
python setup.py bdist_wheel
```
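After `pip install`-ing the wheel from `dist/`, a quick sanity check that the build is importable. The module name `flash_attn_interface` is an assumption based on what the hopper build of flash-attention typically installs; verify against your own wheel:

```python
def fa3_available() -> bool:
    # Assumption: the hopper FA3 wheel installs the flash_attn_interface
    # module. Returns False rather than raising if it is missing.
    try:
        import flash_attn_interface  # noqa: F401
        return True
    except ImportError:
        return False

if __name__ == "__main__":
    print("FA3 import OK" if fa3_available() else "FA3 not importable")
```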
Tips for ARM builds

On an aarch64/ARM64 system, such as a GH200 server, building requires a bit of finesse. Try:

```shell
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export MAX_JOBS=4
```
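The low `MAX_JOBS` matters because each nvcc job compiling FA3 kernels can peak at several gigabytes of RAM, and ARM boxes often have fewer GB per core. A rough heuristic for picking a safe value, capping parallelism by memory as well as cores (the ~4 GB/job figure is an assumption, not an official number):

```python
import os

def safe_max_jobs(ram_gb: float, cpu_count: int, gb_per_job: float = 4.0) -> int:
    # Cap parallel compile jobs by available memory (assumed ~4 GB per
    # nvcc job) as well as by core count; always allow at least one job.
    by_mem = int(ram_gb // gb_per_job)
    return max(1, min(cpu_count, by_mem))

if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    if hasattr(os, "sysconf"):  # Linux: read physical RAM via sysconf
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
        print("suggested MAX_JOBS:", safe_max_jobs(ram_gb, cpus))
```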
Please contact me if you would like wheels for any other version of Python or Torch. I may get around to compiling this for Blackwell.