metadata
title: ACE-Step v1.5
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Music Generation Foundation Model v1.5

ACE-Step 1.5

Pushing the Boundaries of Open-Source Music Generation

Project | Hugging Face | ModelScope | Space Demo | Discord | Technical Report


📝 Abstract

We present ACE-Step v1.5, a highly efficient foundation model that democratizes commercial-grade music production on consumer hardware. Optimized for local deployment (<4GB VRAM), the model accelerates generation by over 100× compared to traditional pure LM architectures and produces high-fidelity audio in seconds, with coherent semantics and exceptional melodies. At its core lies a novel hybrid architecture in which the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this alignment is achieved through intrinsic reinforcement learning that relies solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities (such as cover generation, repainting, and vocal-to-BGM conversion) while maintaining strict adherence to prompts across 50+ languages.

✨ Features

[Figure: ACE-Step Framework]

⚡ Performance

  • ✅ Ultra-Fast Generation: 0.5s to 10s generation time on A100 (depending on think mode and diffusion steps)
  • ✅ Flexible Duration: supports audio generation from 10 seconds to 10 minutes (600s)
  • ✅ Batch Generation: generate up to 8 songs simultaneously
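The limits listed above (10 to 600 seconds per clip, up to 8 songs per batch) can be checked before a request is submitted. The following is a minimal sketch only: `check_request` and the constant names are hypothetical and are not part of the ACE-Step API.

```python
# Limits taken from the Performance list; the names are illustrative.
MIN_DURATION_S = 10    # shortest supported clip
MAX_DURATION_S = 600   # 10 minutes
MAX_BATCH_SIZE = 8     # songs generated simultaneously


def check_request(duration_s: float, batch_size: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the request is within limits."""
    problems = []
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        problems.append(f"batch size {batch_size} is outside 1-{MAX_BATCH_SIZE}")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a UI report every problem at once.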

🎡 Generation Quality

  • ✅ Commercial-Grade Output: quality between Suno v4.5 and Suno v5
  • ✅ Rich Style Support: 1000+ instruments and styles with fine-grained timbre description
  • ✅ Multi-Language Lyrics: supports 50+ languages, with lyrics prompts for structure and style control

🎛️ Versatility & Control

| Feature | Description |
|---|---|
| ✅ Reference Audio Input | Use reference audio to guide generation style |
| ✅ Cover Generation | Create covers from existing audio |
| ✅ Repaint & Edit | Selective local audio editing and regeneration |
| ✅ Track Separation | Separate audio into individual stems |
| ✅ Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| ✅ Metadata Control | Control duration, BPM, key/scale, time signature |
| ✅ Simple Mode | Generate full songs from simple descriptions |
| ✅ Query Rewriting | Automatic LM expansion of tags and lyrics |
| ✅ Audio Understanding | Extract BPM, key/scale, time signature, and caption from audio |
| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |
| ✅ LoRA Training | One-click annotation and training in Gradio. 8 songs, 1 hour on 3090 (12GB VRAM) |
| ✅ Quality Scoring | Automatic quality assessment for generated audio |

📦 Installation

Requirements: Python 3.11, CUDA GPU recommended (works on CPU/MPS but slower)

1. Install uv (Package Manager)

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Clone & Install

git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync

3. Launch

🖥️ Gradio Web UI (Recommended)

uv run acestep

Open http://localhost:7860 in your browser. Models will be downloaded automatically on first run.

🌐 REST API Server

uv run acestep-api

API runs at http://localhost:8001. See API Documentation for endpoints.
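Because models are downloaded on first run, either server may take some time before it accepts connections. The sketch below waits until a TCP port is open. `wait_for_port` is a hypothetical helper, not something ACE-Step ships, and it only tests the socket; it makes no assumption about any API endpoint.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example (ports from the text above):
#   wait_for_port("localhost", 7860)   # Gradio Web UI
#   wait_for_port("localhost", 8001)   # REST API server
```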

Command Line Options

Gradio UI (acestep):

| Option | Default | Description |
|---|---|---|
| --port | 7860 | Server port |
| --server-name | 127.0.0.1 | Server address (use 0.0.0.0 for network access) |
| --share | false | Create public Gradio link |
| --language | en | UI language: en, zh, ja |
| --init_service | false | Auto-initialize models on startup |
| --config_path | auto | DiT model (e.g., acestep-v15-turbo, acestep-v15-turbo-shift3) |
| --lm_model_path | auto | LM model (e.g., acestep-5Hz-lm-0.6B, acestep-5Hz-lm-1.7B) |
| --offload_to_cpu | auto | CPU offload (auto-enabled if VRAM < 16GB) |

Examples:

# Public access with Chinese UI
uv run acestep --server-name 0.0.0.0 --share --language zh

# Pre-initialize models on startup
uv run acestep --init_service true --config_path acestep-v15-turbo
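When the UI is launched from a script, the command line can be assembled from a dictionary of options. This is a sketch under assumptions: `build_command` is a hypothetical helper, it only knows the flags from the table above, and it treats `--share` as a bare switch and `--init_service` as taking a value because that is how the two examples use them.

```python
import shlex

# Flags from the options table above.
KNOWN_FLAGS = {
    "--port", "--server-name", "--share", "--language",
    "--init_service", "--config_path", "--lm_model_path", "--offload_to_cpu",
}
# Assumption: --share is passed without a value, as in the first example.
SWITCH_FLAGS = {"--share"}


def build_command(options: dict) -> list[str]:
    """Turn {flag: value} into an argument list for `uv run acestep`."""
    cmd = ["uv", "run", "acestep"]
    for flag, value in options.items():
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown option: {flag}")
        if flag in SWITCH_FLAGS:
            if value:
                cmd.append(flag)
        else:
            if isinstance(value, bool):
                value = "true" if value else "false"
            cmd += [flag, str(value)]
    return cmd


# Reproduces the "Pre-initialize models on startup" example.
print(shlex.join(build_command({"--init_service": True,
                                "--config_path": "acestep-v15-turbo"})))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.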

Development

# Add dependencies
uv add package-name
uv add --dev package-name

# Update all dependencies
uv sync --upgrade

🚀 Usage

We provide multiple ways to use ACE-Step:

| Method | Description | Documentation |
|---|---|---|
| 🖥️ Gradio Web UI | Interactive web interface for music generation | Gradio Guide |
| 🐍 Python API | Programmatic access for integration | Inference API |
| 🌐 REST API | HTTP-based async API for services | REST API |

📚 Documentation available in: English | 中文 | 日本語

🔨 Train

See the LoRA Training tab in the Gradio UI for one-click training, or check Gradio Guide - LoRA Training for details.

🏗️ Architecture

[Figure: ACE-Step Framework]

🦁 Model Zoo

[Figure: Model Zoo]

DiT Models

| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acestep-v15-base | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | Link |
| acestep-v15-sft | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | Link |
| acestep-v15-turbo | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | Link |
| acestep-v15-turbo-rl | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
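The DiT table can be read as a capability matrix: every model handles reference audio, Text2Music, Cover, and Repaint, while Extract, Lego, and Complete are marked only for the base model. A minimal sketch of encoding it for model selection follows; the `DitModel` class, the task identifiers, and `models_for` are illustrative names, not part of ACE-Step.

```python
from dataclasses import dataclass

# Tasks marked for every DiT model in the table.
COMMON_TASKS = frozenset({"refer_audio", "text2music", "cover", "repaint"})
# Tasks marked only for the base model.
BASE_ONLY_TASKS = frozenset({"extract", "lego", "complete"})


@dataclass(frozen=True)
class DitModel:
    name: str
    steps: int          # diffusion steps from the "Step" column
    uses_cfg: bool      # "CFG" column
    tasks: frozenset
    released: bool = True


DIT_MODELS = [
    DitModel("acestep-v15-base", 50, True, COMMON_TASKS | BASE_ONLY_TASKS),
    DitModel("acestep-v15-sft", 50, True, COMMON_TASKS),
    DitModel("acestep-v15-turbo", 8, False, COMMON_TASKS),
    DitModel("acestep-v15-turbo-rl", 8, False, COMMON_TASKS, released=False),
]


def models_for(task: str, include_unreleased: bool = False) -> list[str]:
    """Names of the models whose row marks the given task as supported."""
    return [
        m.name for m in DIT_MODELS
        if task in m.tasks and (m.released or include_unreleased)
    ]
```

For example, `models_for("lego")` yields only the base model, which matches the table: the SFT and turbo variants trade the extra tasks for higher quality or fewer steps.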

LM Models

| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
|---|---|---|---|---|---|---|---|---|---|---|
| acestep-5Hz-lm-0.6B | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
| acestep-5Hz-lm-1.7B | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
| acestep-5Hz-lm-4B | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | To be released |
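The LM rows differ mainly in size and in the Weak/Medium/Strong ratings, so a common question is which released model is the smallest one that meets a given rating. The sketch below answers that for the Copy Melody column. The tuple layout and `smallest_lm` are hypothetical, and the sizes are read from the model names.

```python
# Ordering of the ratings used in the table.
LEVEL = {"Weak": 0, "Medium": 1, "Strong": 2}

# (name, size in billions of parameters, Copy Melody rating, released)
LM_MODELS = [
    ("acestep-5Hz-lm-0.6B", 0.6, "Weak", True),
    ("acestep-5Hz-lm-1.7B", 1.7, "Medium", True),
    ("acestep-5Hz-lm-4B", 4.0, "Strong", False),  # listed as "To be released"
]


def smallest_lm(min_copy_melody: str):
    """Smallest released LM whose Copy Melody rating meets the minimum, else None."""
    for name, _size, rating, released in sorted(LM_MODELS, key=lambda m: m[1]):
        if released and LEVEL[rating] >= LEVEL[min_copy_melody]:
            return name
    return None
```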

📜 License & Disclaimer

This project is licensed under the MIT License.

ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.

🔔 Important Notice
The only official website for the ACE-Step project is our GitHub Pages site.
We do not operate any other websites.
🚫 Fake domains include but are not limited to: ac**p.com, a**p.org, a***c.org
⚠️ Please be cautious. Do not visit, trust, or make payments on any of those sites.

🙏 Acknowledgements

This project is co-led by ACE Studio and StepFun.

📖 Citation

If you find this project useful for your research, please consider citing:

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Song Yulin and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}