ACE-Step 1.5
Pushing the Boundaries of Open-Source Music Generation
Project | Hugging Face | ModelScope | Space Demo | Discord | Tech Report
Model Details
🚀 ACE-Step v1.5 is a highly efficient open-source music foundation model designed to bring commercial-grade music generation to consumer hardware.
Key Features
- 💰 Commercial-Ready: Unlike many models trained on datasets of unclear provenance, ACE-Step v1.5 is designed for creators: the music it generates can be used for commercial purposes.
- 📚 Safe & Robust Training Data: The model is trained on a massive, legally compliant dataset consisting of:
  - Licensed Data: Professionally licensed music tracks.
  - Royalty-Free / No-Copyright Data: A vast collection of public-domain and royalty-free music.
  - Synthetic Data: High-quality audio generated via advanced MIDI-to-audio conversion.
- ⚡ Extreme Speed: Generates a full song in under 2 seconds on an A100 and under 10 seconds on an RTX 3090.
- 🖥️ Consumer Hardware Friendly: Runs locally with less than 4GB of VRAM.
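To put the speed figures in perspective, they can be converted into a real-time factor (song length divided by generation time). The card does not say how long a "full song" is, so the 4-minute length below is a hypothetical value chosen for illustration, and `realtime_factor` is an illustrative helper rather than part of ACE-Step.

```python
def realtime_factor(song_seconds: float, generation_seconds: float) -> float:
    """How many times faster than playback a song is generated."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return song_seconds / generation_seconds


# Hypothetical 4-minute (240 s) song; timings are the upper bounds quoted above,
# so each result is a lower bound on the real-time factor.
SONG_SECONDS = 240.0
for gpu, seconds in {"A100": 2.0, "RTX 3090": 10.0}.items():
    print(f"{gpu}: at least {realtime_factor(SONG_SECONDS, seconds):.0f}x real time")
# → A100: at least 120x real time
# → RTX 3090: at least 24x real time
```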
Technical Capabilities
🌉 At its core lies a novel hybrid architecture in which the Language Model (LM) acts as an omni-capable planner. It transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, and uses Chain-of-Thought reasoning to synthesize the metadata, lyrics, and captions that guide the Diffusion Transformer (DiT). ⚡ Uniquely, the model is aligned through intrinsic reinforcement learning that relies solely on its internal mechanisms, thereby avoiding the biases inherent in external reward models or human preferences. 🎚️
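The division of labor between the planner and the DiT can be pictured as a two-stage data flow. The sketch below is a toy illustration only: `SongBlueprint`, `plan`, and `render` are hypothetical names, not ACE-Step's API; the planner returns fixed placeholders instead of running an LM; and the 25 Hz latent frame rate is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SongBlueprint:
    """Hypothetical container for what the LM planner hands to the DiT."""
    duration_s: float
    metadata: dict = field(default_factory=dict)
    lyrics: str = ""
    caption: str = ""


def plan(query: str, duration_s: float = 30.0) -> SongBlueprint:
    """Stand-in for the LM planner: expand a short query into a full blueprint.

    A real planner would generate these fields with Chain-of-Thought decoding;
    here they are fixed placeholders.
    """
    return SongBlueprint(
        duration_s=duration_s,
        metadata={"bpm": 120, "key": "C major"},
        lyrics="[verse]\n...",
        caption=f"{query}, full arrangement",
    )


def render(blueprint: SongBlueprint, frame_rate_hz: int = 25) -> list[float]:
    """Stand-in for the DiT: one (silent) latent value per frame of the planned duration."""
    return [0.0] * int(blueprint.duration_s * frame_rate_hz)


bp = plan("upbeat synth-pop about summer", duration_s=8.0)
latents = render(bp)
print(bp.caption)    # → upbeat synth-pop about summer, full arrangement
print(len(latents))  # → 200
```

The point of the split is that the DiT never sees the raw query, only the expanded blueprint, so the planner's output is what conditions generation.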
🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities (such as cover generation, repainting, and vocal-to-BGM conversion) while maintaining strict adherence to prompts across 50+ languages. This paves the way for tools that integrate seamlessly into the creative workflows of music artists, producers, and content creators. 🎸
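Repainting (regenerating one time span while keeping the rest of a track) can be illustrated with a generic masked-inpainting sketch. This is not ACE-Step's documented method: the helper names, the toy resolution of 1 frame per second, and the single blend at the end are assumptions for illustration, and diffusion-based inpainting typically reapplies such a mask at every denoising step instead.

```python
def repaint_mask(num_frames: int, frame_rate_hz: float, start_s: float, end_s: float) -> list[bool]:
    """Mark frames in [start_s, end_s) as True (regenerate) and the rest as False (keep)."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    first = int(start_s * frame_rate_hz)
    last = min(num_frames, int(end_s * frame_rate_hz))
    return [first <= i < last for i in range(num_frames)]


def blend(original: list[float], regenerated: list[float], mask: list[bool]) -> list[float]:
    """Take regenerated frames where the mask is True and original frames elsewhere."""
    return [new if m else old for old, new, m in zip(original, regenerated, mask)]


# Toy example: a 10-frame clip at 1 frame per second, repainting seconds 3 to 6.
original = [0.0] * 10
regenerated = [1.0] * 10
mask = repaint_mask(len(original), 1.0, 3.0, 6.0)
print(blend(original, regenerated, mask))
# → [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```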
- Developed by: ACE-Step
- Model type: Text2Music
- Language(s): 50+ languages
- License: MIT
Evaluation
🏗️ Architecture
🦁 Model Zoo
DiT Models
| DiT Model | Pre-Training | SFT | RL | CFG | Steps | Reference Audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acestep-v15-base | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | Link |
| acestep-v15-sft | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | Link |
| acestep-v15-turbo | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | Link |
| acestep-v15-turbo-rl | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
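The Steps and CFG columns largely explain the speed gap between the DiT variants. The sketch below counts DiT forward passes per generation, assuming classifier-free guidance is computed the standard way, with one conditional and one unconditional evaluation per step (implementations that batch the two still do roughly twice the work). Step counts and CFG flags come from the table; the helper names are illustrative.

```python
def guided_prediction(cond: float, uncond: float, scale: float) -> float:
    """Standard classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return uncond + scale * (cond - uncond)


def forward_passes(steps: int, uses_cfg: bool) -> int:
    """DiT evaluations per generation, assuming CFG adds one unconditional pass per step."""
    return steps * (2 if uses_cfg else 1)


# (steps, uses CFG) as listed in the DiT table above.
MODELS = {
    "acestep-v15-base": (50, True),
    "acestep-v15-sft": (50, True),
    "acestep-v15-turbo": (8, False),
    "acestep-v15-turbo-rl": (8, False),
}
for name, (steps, cfg) in MODELS.items():
    print(f"{name}: {forward_passes(steps, cfg)} forward passes")

print(guided_prediction(cond=1.0, uncond=0.5, scale=3.0))  # → 2.0
```

Under this assumption the turbo variants need 8 forward passes versus 100 for base and SFT, a 12.5x reduction in DiT compute per song.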
LM Models
| LM Model | Pretrained From | Pre-Training | SFT | RL | CoT Metadata | Query Rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
|---|---|---|---|---|---|---|---|---|---|---|
| acestep-5Hz-lm-0.6B | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
| acestep-5Hz-lm-1.7B | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
| acestep-5Hz-lm-4B | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | ✅ |
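As a rough guide to the LM sizes, the sketch below estimates the memory needed for the weights alone at 16-bit precision (2 bytes per parameter). The parameter counts are the nominal values in the model names, which is an assumption since real counts differ slightly; activations, the KV cache, and the DiT all add memory on top of these figures.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Weights-only memory in decimal GB: billions of parameters times bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[dtype]


# Nominal parameter counts read from the model names (assumption: real counts differ slightly).
LM_SIZES_B = {
    "acestep-5Hz-lm-0.6B": 0.6,
    "acestep-5Hz-lm-1.7B": 1.7,
    "acestep-5Hz-lm-4B": 4.0,
}
for name, size in LM_SIZES_B.items():
    print(f"{name}: ~{weight_memory_gb(size):.1f} GB of weights in bf16")
# → acestep-5Hz-lm-0.6B: ~1.2 GB of weights in bf16
# → acestep-5Hz-lm-1.7B: ~3.4 GB of weights in bf16
# → acestep-5Hz-lm-4B: ~8.0 GB of weights in bf16
```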
🙏 Acknowledgements
This project is co-led by ACE Studio and StepFun.
📖 Citation
If you find this project useful for your research, please consider citing:
@misc{gong2026acestep,
title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
year={2026},
note={GitHub repository}
}