Text-to-Audio
Diffusers
Safetensors
ACE-Step
AceStepPipeline
audio
music
text-to-music
flow-matching
Document 50-step default for XL SFT
README.md (changed)

````diff
@@ -46,7 +46,7 @@ output = pipe(
     prompt="An upbeat synthwave track with driving drums and a catchy lead",
     lyrics="[Verse]\nNeon lights are calling me\n[Chorus]\nRide the wave tonight",
     audio_duration=30.0,
-    num_inference_steps=
+    num_inference_steps=50,
     guidance_scale=7.0,
     shift=3.0,
     generator=torch.Generator(device="cuda").manual_seed(42),
@@ -56,7 +56,7 @@ audio = output.audios[0] # (channels, samples), 48 kHz
 sf.write("acestep-xl-sft.wav", audio.T.cpu().float().numpy(), pipe.sample_rate)
 ```
 
-Unlike the turbo checkpoint, XL SFT is not guidance-distilled. The pipeline uses ACE-Step's APG guidance path when `guidance_scale > 1.0`; `guidance_scale=7.0` and `shift=3.0` are the recommended defaults.
+Unlike the turbo checkpoint, XL SFT is not guidance-distilled. The pipeline uses ACE-Step's APG guidance path when `guidance_scale > 1.0`; `num_inference_steps=50`, `guidance_scale=7.0`, and `shift=3.0` are the recommended defaults. Pass `num_inference_steps=50` explicitly so generation does not use the lower-step turbo setting.
 
 For batched prompts with padding and FlashAttention, use the variable-length backend:
 
````
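The updated README passes `num_inference_steps=50`, `guidance_scale=7.0`, and `shift=3.0` explicitly, and writes `audio.T` because the pipeline returns audio shaped `(channels, samples)` while `soundfile.write` expects `(frames, channels)`. The following is a minimal, hypothetical sketch for keeping those settings in one place so they are never silently left to pipeline defaults. `XL_SFT_DEFAULTS`, `xl_sft_kwargs`, `uses_guidance`, and `to_soundfile_layout` are illustrative names, not part of diffusers or ACE-Step, and the sketch assumes the audio has already been converted to a NumPy array (for example with `.cpu().float().numpy()`, as in the README).

```python
import numpy as np

# Hypothetical helper, not a diffusers or ACE-Step API: the recommended
# XL SFT sampling settings taken from the README.
XL_SFT_DEFAULTS = {
    "num_inference_steps": 50,  # full-step sampling; XL SFT is not guidance-distilled
    "guidance_scale": 7.0,      # per the README, > 1.0 enables the APG guidance path
    "shift": 3.0,
}


def xl_sft_kwargs(**overrides):
    """Return pipeline keyword arguments with the XL SFT defaults applied.

    Explicit overrides take precedence; XL_SFT_DEFAULTS itself is not mutated.
    """
    kwargs = dict(XL_SFT_DEFAULTS)
    kwargs.update(overrides)
    return kwargs


def uses_guidance(kwargs):
    """Mirror the README's rule: guidance is applied only when guidance_scale > 1.0."""
    return kwargs["guidance_scale"] > 1.0


def to_soundfile_layout(audio):
    """Convert a (channels, samples) array to the (frames, channels) layout
    that soundfile.write expects, equivalent to `audio.T` in the README."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 2:
        raise ValueError(f"expected (channels, samples), got shape {audio.shape}")
    return audio.T


if __name__ == "__main__":
    kwargs = xl_sft_kwargs(audio_duration=30.0)
    print(kwargs, "guidance:", uses_guidance(kwargs))
    stereo = np.zeros((2, 48_000), dtype=np.float32)  # 1 s of stereo silence at 48 kHz
    print(to_soundfile_layout(stereo).shape)  # (48000, 2)
```

Under these assumptions the returned dict would be unpacked into the call shown in the README, e.g. `pipe(prompt=..., lyrics=..., generator=..., **xl_sft_kwargs(audio_duration=30.0))`. Keeping the defaults in a single mapping makes it harder to accidentally fall back to a lower step count intended for the turbo checkpoint.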