Spaces: ACE-Step/Ace-Step-v1.5 (Running on Zero)


Suggestions for stable HuggingFace ZeroGPU deployment (Fixing CUDA flash-attn error and batch size timeout)

#18

by ezmarynoori - opened 10 days ago


Hi,

Thanks for this amazing project! I deployed the Space on HuggingFace ZeroGPU and encountered a couple of issues that might be helpful to address for stable deployment:

1. Flash Attention CUDA error: On ZeroGPU, the GPU allocated to a request can change between runs, so the pre-compiled flash-attn wheel often crashes with "no kernel image is available for execution" when the kernels it was built for do not match the allocated GPU's architecture. Forcing a fallback to PyTorch's native Scaled Dot-Product Attention (SDPA) resolves this, since SDPA does not depend on the flash-attn wheel, and it has been stable on every GPU model I was allocated.
2. Batch size timeout: The default batch size of 2 often exceeds the 120-second ZeroGPU execution limit, so tasks get aborted. Making batch_size_input interactive (and defaulting it to 1 on Space deployments) lets the pipeline finish within a single session and lets the Save & Resume mechanism work properly when needed.
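For the first point, here is a rough sketch of what I mean by a fallback. It is not the project's actual handler code: `choose_attn_implementation` and `_probe_flash_attn` are hypothetical names, and I am assuming the model accepts the `"flash_attention_2"` / `"sdpa"` strings the way transformers' `attn_implementation` argument does. The idea is to run one tiny flash-attn call and fall back to SDPA if anything goes wrong.

```python
import importlib.util
from typing import Callable, Optional


def _probe_flash_attn() -> None:
    """Run one tiny flash-attn kernel; raises if the wheel does not match the GPU."""
    import torch
    from flash_attn import flash_attn_func

    # Shape is (batch, seqlen, nheads, headdim); flash-attn needs fp16/bf16 on CUDA.
    q = torch.randn(1, 8, 2, 64, device="cuda", dtype=torch.float16)
    flash_attn_func(q, q, q)
    # CUDA errors can surface asynchronously, so force them to appear here.
    torch.cuda.synchronize()


def choose_attn_implementation(
    installed: Optional[bool] = None,
    probe: Optional[Callable[[], None]] = None,
) -> str:
    """Return "flash_attention_2" only if flash-attn is installed and the probe passes.

    `installed` and `probe` are overridable so the logic can be checked without a GPU.
    """
    if installed is None:
        installed = importlib.util.find_spec("flash_attn") is not None
    if not installed:
        return "sdpa"
    try:
        (probe or _probe_flash_attn)()
    except Exception:
        # Covers "no kernel image is available", import errors, and missing CUDA.
        return "sdpa"
    return "flash_attention_2"
```

On ZeroGPU the probe would have to run inside the GPU-decorated function, since as far as I understand CUDA is only attached there. Simply hard-coding `"sdpa"` on Spaces would be the simpler option if the probe overhead is not worth it.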

Perhaps adding a graceful fallback in the handler or environment checks would make HF Space deployments much more robust.
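As an illustration of the environment check, something along these lines might be enough. It assumes the `SPACE_ID` environment variable that Hugging Face sets inside Spaces is a good enough signal; `on_hf_space` and `default_batch_size` are hypothetical helper names, and the defaults of 1 and 2 are just the values mentioned above.

```python
import os
from typing import Mapping, Optional


def on_hf_space(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when running inside a Hugging Face Space (assumes SPACE_ID is set there)."""
    env = os.environ if env is None else env
    return bool(env.get("SPACE_ID"))


def default_batch_size(
    env: Optional[Mapping[str, str]] = None,
    space_default: int = 1,
    local_default: int = 2,
) -> int:
    """Use the smaller batch size on Spaces so one run fits in the GPU time limit."""
    return space_default if on_hf_space(env) else local_default


# Possible wiring in the Gradio UI (sketch only, the real component may differ):
#   batch_size_input = gr.Number(
#       value=default_batch_size(), precision=0, minimum=1, interactive=True
#   )
```

Keeping the field interactive still lets users raise the batch size again when they run the app locally or on dedicated hardware.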

Best regards!
