ACE-Step/acestep-v15-xl-sft-diffusers

AceStepPipeline


ChuxiJ committed 1 day ago

Commit 4bf7b60 · verified · 1 parent: feb3161

Document 50-step default for XL SFT

Files changed (1)

README.md +2 -2

@@ -46,7 +46,7 @@ output = pipe(
     prompt="An upbeat synthwave track with driving drums and a catchy lead",
     lyrics="[Verse]\nNeon lights are calling me\n[Chorus]\nRide the wave tonight",
     audio_duration=30.0,
-    num_inference_steps=8,
+    num_inference_steps=50,
     guidance_scale=7.0,
     shift=3.0,
     generator=torch.Generator(device="cuda").manual_seed(42),
@@ -56,7 +56,7 @@ audio = output.audios[0]  # (channels, samples), 48 kHz
 sf.write("acestep-xl-sft.wav", audio.T.cpu().float().numpy(), pipe.sample_rate)
 ```
-Unlike the turbo checkpoint, XL SFT is not guidance-distilled. The pipeline uses ACE-Step's APG guidance path when `guidance_scale > 1.0`; `guidance_scale=7.0` and `shift=3.0` are the recommended defaults. You can increase `num_inference_steps` for slower, higher-quality sampling.
+Unlike the turbo checkpoint, XL SFT is not guidance-distilled. The pipeline uses ACE-Step's APG guidance path when `guidance_scale > 1.0`; `num_inference_steps=50`, `guidance_scale=7.0`, and `shift=3.0` are the recommended defaults. Pass `num_inference_steps=50` explicitly so generation does not use the lower-step turbo setting.
 For batched prompts with padding and FlashAttention, use the variable-length backend:
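The point of this commit is that XL SFT calls should always pass the 50-step setting explicitly. A minimal sketch of one way to enforce that is to collect the recommended values in one place and build the call arguments from them. The names `XL_SFT_DEFAULTS`, `xl_sft_call_kwargs`, and `uses_apg_guidance` are hypothetical and not part of ACE-Step or diffusers. The keyword names are the ones used in the README's `pipe(...)` call, and the `guidance_scale > 1.0` check only mirrors the condition the README states for the APG path. It does not reproduce the pipeline's internal logic.

```python
from typing import Any, Dict

# Hypothetical constants: values copied from the updated README text
# (num_inference_steps=50, guidance_scale=7.0, shift=3.0).
XL_SFT_DEFAULTS: Dict[str, Any] = {
    "num_inference_steps": 50,
    "guidance_scale": 7.0,
    "shift": 3.0,
}


def xl_sft_call_kwargs(
    prompt: str, lyrics: str, audio_duration: float, **overrides: Any
) -> Dict[str, Any]:
    """Build keyword arguments for the README's pipe(...) call (hypothetical helper).

    The step count is always set explicitly, so the pipeline's own lower-step
    default is never relied on. Overrides (e.g. more steps, a generator) are
    applied last and take precedence.
    """
    kwargs: Dict[str, Any] = {
        "prompt": prompt,
        "lyrics": lyrics,
        "audio_duration": audio_duration,
        **XL_SFT_DEFAULTS,  # copied into a new dict, so the constants stay unchanged
    }
    kwargs.update(overrides)
    return kwargs


def uses_apg_guidance(guidance_scale: float) -> bool:
    """Mirror the README's stated condition: APG guidance applies when guidance_scale > 1.0."""
    return guidance_scale > 1.0


if __name__ == "__main__":
    kw = xl_sft_call_kwargs(
        prompt="An upbeat synthwave track with driving drums and a catchy lead",
        lyrics="[Verse]\nNeon lights are calling me\n[Chorus]\nRide the wave tonight",
        audio_duration=30.0,
    )
    print(kw["num_inference_steps"], uses_apg_guidance(kw["guidance_scale"]))  # 50 True
```

With a loaded pipeline, the call would then look like `pipe(**xl_sft_call_kwargs(...), generator=torch.Generator(device="cuda").manual_seed(42))`. This assumes `pipe` accepts exactly the keyword arguments shown in the README snippet.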