ACE-Step v1.5 1D VAE

Stable Audio Tools Format

GitHub | Project | Hugging Face | Space Demo | Discord | Tech Report

Model Details

This is the 1D Variational Autoencoder (VAE) used in ACE-Step v1.5 for music generation. The weights are provided in stable-audio-tools compatible format, making it easy to load, fine-tune, and integrate into your own training pipelines.

  • Developed by: ACE-STEP
  • Model type: Audio VAE (Oobleck Autoencoder)
  • License: MIT
| Parameter | Value |
| --- | --- |
| Architecture | Oobleck Autoencoder (VAE) |
| Audio Channels | 2 (stereo) |
| Sampling Rate | 48,000 Hz |
| Latent Dim | 64 |
| Encoder Latent Dim | 128 |
| Downsampling Ratio | 1,920 |
| Encoder/Decoder Channels | 128 |
| Channel Multipliers | [1, 2, 4, 8, 16] |
| Strides | [2, 4, 4, 6, 10] |
| Activation | Snake |
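As a quick sanity check (our own arithmetic, not part of the released code), the per-stage strides multiply out to the overall downsampling ratio, which in turn fixes the latent frame rate at 48 kHz:

```python
from math import prod

strides = [2, 4, 4, 6, 10]
downsample = prod(strides)        # overall encoder stride
print(downsample)                 # 1920

latent_rate = 48_000 / downsample # latent frames per second of audio
print(latent_rate)                # 25.0
```

So each second of 48 kHz stereo audio maps to 25 latent frames of dimension 64.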

πŸ—οΈ Architecture

The VAE is a core component of the ACE-Step v1.5 pipeline. It compresses raw 48 kHz stereo audio into a compact latent representation with a 1,920x downsampling ratio and a 64-dimensional latent space; the diffusion transformer (DiT) then operates in this latent space to generate music.
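The Snake activation listed in the table is the periodic activation x + sin²(αx)/α (Ziyin et al., 2020), commonly used in neural audio codecs for its ability to model periodic signals. A minimal NumPy sketch of the function itself:

```python
import numpy as np

def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha."""
    return x + np.sin(alpha * x) ** 2 / alpha

# The identity component dominates for large |x|; the sine term adds
# a periodic ripple that helps model oscillatory (audio) signals.
print(snake(0.0))          # 0.0
```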

Quick Start

Installation

pip install stable-audio-tools torchaudio

Load and Use

from stable_audio_vae import StableAudioVAE

# Load model
vae = StableAudioVAE(
    config_path="config.json",
    checkpoint_path="checkpoint.ckpt",
)
vae = vae.cuda().eval()

# Encode audio
wav = vae.load_wav("input.wav")
wav = wav.cuda()
latent = vae.encode(wav)
print(f"Latent shape: {latent.shape}")  # [batch, 64, time/1920]

# Decode back to audio
output = vae.decode(latent)
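The encoder strides divide evenly only when the input length is a multiple of 1,920 samples; whether the wrapper pads internally is not documented here. A hedged helper (the name `pad_to_multiple` is our own, shown with NumPy; the same logic applies to torch tensors) for right-padding before encoding:

```python
import numpy as np

DOWNSAMPLE = 1920  # overall stride of the Oobleck encoder

def pad_to_multiple(wav: np.ndarray, multiple: int = DOWNSAMPLE) -> np.ndarray:
    """Right-pad the last (time) axis with zeros up to a multiple of `multiple`."""
    remainder = wav.shape[-1] % multiple
    if remainder == 0:
        return wav
    pad = multiple - remainder
    return np.pad(wav, [(0, 0)] * (wav.ndim - 1) + [(0, pad)])

wav = np.zeros((2, 47_999))            # stereo clip, one sample short of 1 s
print(pad_to_multiple(wav).shape)      # (2, 48000)
```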

Command Line

python stable_audio_vae.py -i input.wav -o output.wav

# For long audio, use chunked processing
python stable_audio_vae.py -i input.wav -o output.wav --chunked
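The exact strategy behind the `--chunked` flag is not documented here; as a rough illustration of one common approach, long audio can be split into fixed-size chunks aligned to the 1,920-sample downsampling ratio so each chunk maps to a whole number of latent frames (the helper name and chunk size below are our own):

```python
DOWNSAMPLE = 1920

def chunk_bounds(n_samples: int, chunk_frames: int = 500):
    """Yield (start, end) sample indices; each chunk covers chunk_frames latents."""
    chunk = chunk_frames * DOWNSAMPLE   # 960,000 samples = 20 s at 48 kHz
    for start in range(0, n_samples, chunk):
        yield start, min(start + chunk, n_samples)

bounds = list(chunk_bounds(48_000 * 60))  # one minute of 48 kHz audio
print(len(bounds))                        # 3
```

A production implementation would typically also overlap chunks and cross-fade at the boundaries to avoid seams in the decoded audio.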

Fine-Tuning

This checkpoint is compatible with stable-audio-tools training pipelines. The config.json includes full training configuration (optimizer, loss, discriminator settings) that you can use as a starting point for fine-tuning.
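The loss configuration in config.json is authoritative; purely as a refresher on the VAE objective (this is the generic diagonal-Gaussian KL term, not code from stable-audio-tools), the regularizer on the posterior looks like:

```python
import numpy as np

def kl_diag_gaussian(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over dims, averaged over batch."""
    kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return float(kl.sum(axis=-1).mean())

# A posterior equal to the standard-normal prior has zero KL.
print(kl_diag_gaussian(np.zeros((4, 64)), np.zeros((4, 64))))  # 0.0
```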

File Structure

.
β”œβ”€β”€ config.json            # Model architecture and training config
β”œβ”€β”€ checkpoint.ckpt        # Model weights (PyTorch checkpoint)
β”œβ”€β”€ stable_audio_vae.py    # Inference script with StableAudioVAE wrapper
└── README.md

🦁 Related Models

| Model | Description | Hugging Face |
| --- | --- | --- |
| acestep-v15-base | DiT base model (CFG, 50 steps) | Link |
| acestep-v15-sft | DiT SFT model (CFG, 50 steps) | Link |
| acestep-v15-turbo | DiT turbo model (8 steps) | Link |
| acestep-v15-xl-base | XL DiT base (4B, CFG, 50 steps) | Link |
| acestep-v15-xl-sft | XL DiT SFT (4B, CFG, 50 steps) | Link |
| acestep-v15-xl-turbo | XL DiT turbo (4B, 8 steps) | Link |

πŸ™ Acknowledgements

This project is co-led by ACE Studio and StepFun.

πŸ“– Citation

If you find this project useful for your research, please consider citing:

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}