ACE-Step 1.5

Pushing the Boundaries of Open-Source Music Generation

Project | Hugging Face | ModelScope | Space Demo | Discord | Tech Report

Model Details

🚀 ACE-Step v1.5 is a highly efficient open-source music foundation model designed to bring commercial-grade music generation to consumer hardware.

Key Features

  • 💰 Commercial-Ready: Unlike many models trained on ambiguous datasets, ACE-Step v1.5 is designed for creators. You can use the generated music for commercial purposes.
  • 📚 Safe & Robust Training Data: The model is trained on a massive, legally compliant dataset consisting of:
    • Licensed Data: Professionally licensed music tracks.
    • Royalty-Free / No-Copyright Data: A vast collection of public domain and royalty-free music.
    • Synthetic Data: High-quality audio generated via advanced MIDI-to-Audio conversion.
  • ⚡ Extreme Speed: Generates a full song in under 2 seconds on an A100 and under 10 seconds on an RTX 3090.
  • 🖥️ Consumer Hardware Friendly: Runs locally with less than 4GB of VRAM.

Technical Capabilities

🌉 At its core lies a novel hybrid architecture in which the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, and synthesizes metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). ⚡ Uniquely, the model is aligned through intrinsic reinforcement learning that relies solely on its internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸
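
As a rough illustration of the planner-then-renderer split described above, the sketch below shows the kind of data that might flow from the LM to the DiT. It is not the project's API: `SongBlueprint`, `plan_song`, and `render_audio` are hypothetical names, and both stages are stubs that load no model.

```python
from dataclasses import dataclass, field


@dataclass
class SongBlueprint:
    """Hypothetical container for what the LM planner hands to the DiT."""
    caption: str                                   # expanded style description
    lyrics: str                                    # generated or user-supplied
    metadata: dict = field(default_factory=dict)   # e.g. duration, language


def plan_song(query: str, duration_s: float = 30.0) -> SongBlueprint:
    """Stand-in for the LM planner: expands a short query into a blueprint.

    A real planner would run the language model; this stub only shows the
    shape of the data passed to the next stage.
    """
    # The card mentions compositions of up to 10 minutes (600 seconds).
    if not 0 < duration_s <= 600:
        raise ValueError("duration_s must be in (0, 600]")
    return SongBlueprint(
        caption=f"expanded description of: {query}",
        lyrics="[instrumental]",
        metadata={"duration_s": duration_s},
    )


def render_audio(blueprint: SongBlueprint, steps: int = 8) -> dict:
    """Stand-in for the DiT, which would denoise for `steps` iterations.

    Returns a summary dict instead of audio, since no model is loaded here.
    """
    return {
        "steps": steps,
        "duration_s": blueprint.metadata["duration_s"],
        "conditioned_on": blueprint.caption,
    }


blueprint = plan_song("lo-fi beat for studying", duration_s=45)
result = render_audio(blueprint, steps=8)
print(result)
```

The point of the split is that the user only supplies the short query; everything the renderer conditions on is produced by the planning stage.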

  • Developed by: [ACE-STEP]
  • Model type: [Text2Music]
  • Language(s): [50+ languages]
  • License: [MIT]

Evaluation

[Figure: evaluation results]

🏗️ Architecture

[Figure: model architecture]

🦁 Model Zoo

DiT Models

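| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acestep-v15-base | | | | | 50 | | | | | | | | Medium | High | Easy | Link |
| acestep-v15-sft | | | | | 50 | | | | | | | | High | Medium | Easy | Link |
| acestep-v15-turbo | | | | | 8 | | | | | | | | Very High | Medium | Medium | Link |
| acestep-v15-turbo-rl | | | | | 8 | | | | | | | | Very High | Medium | Medium | To be released |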
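
One way to use the DiT table when choosing a checkpoint is to encode its rows as data and filter on them. The sketch below is only an illustration: it assumes the numeric column is the step count, and `DiTVariant`, `QUALITY_RANK`, and `pick_variant` are hypothetical helpers, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTVariant:
    name: str
    steps: int             # assumed to be the "Step" column
    quality: str
    diversity: str
    fine_tunability: str
    released: bool         # False for rows marked "To be released"


# Values copied from the DiT table above.
VARIANTS = [
    DiTVariant("acestep-v15-base", 50, "Medium", "High", "Easy", True),
    DiTVariant("acestep-v15-sft", 50, "High", "Medium", "Easy", True),
    DiTVariant("acestep-v15-turbo", 8, "Very High", "Medium", "Medium", True),
    DiTVariant("acestep-v15-turbo-rl", 8, "Very High", "Medium", "Medium", False),
]

# Ordering of the qualitative labels used in the Quality column.
QUALITY_RANK = {"Medium": 0, "High": 1, "Very High": 2}


def pick_variant(max_steps=50, fine_tunability=None, released_only=True):
    """Return the highest-quality variant that satisfies the constraints.

    Returns None when nothing matches. Ties keep table order because
    max() returns the first maximal element.
    """
    candidates = [
        v for v in VARIANTS
        if v.steps <= max_steps
        and (fine_tunability is None or v.fine_tunability == fine_tunability)
        and (v.released or not released_only)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: QUALITY_RANK[v.quality])


print(pick_variant().name)                        # acestep-v15-turbo
print(pick_variant(fine_tunability="Easy").name)  # acestep-v15-sft
```

Read this way, the table suggests the turbo checkpoint for fast inference and the base or SFT checkpoints as starting points for fine-tuning.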

LM Models

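| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
|---|---|---|---|---|---|---|---|---|---|---|
| acestep-5Hz-lm-0.6B | Qwen3-0.6B | | | | | | Medium | Medium | Weak | |
| acestep-5Hz-lm-1.7B | Qwen3-1.7B | | | | | | Medium | Medium | Medium | |
| acestep-5Hz-lm-4B | Qwen3-4B | | | | | | Strong | Strong | Strong | |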
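
For fetching one of the LM checkpoints locally, a minimal sketch using `snapshot_download` from the `huggingface_hub` package might look like the following. The `ACE-Step/<model name>` repository pattern is an assumption based on the Hub listing for the 4B model and may not hold for the other sizes; `lm_repo_id` and `download_lm` are hypothetical helpers.

```python
# Base models from the "Pretrain from" column of the LM table above.
LM_BASES = {
    "acestep-5Hz-lm-0.6B": "Qwen3-0.6B",
    "acestep-5Hz-lm-1.7B": "Qwen3-1.7B",
    "acestep-5Hz-lm-4B": "Qwen3-4B",
}


def lm_repo_id(model_name: str, org: str = "ACE-Step") -> str:
    """Build a Hub repo id; the `org/model` pattern is an assumption."""
    if model_name not in LM_BASES:
        raise KeyError(f"unknown LM model: {model_name}")
    return f"{org}/{model_name}"


def download_lm(model_name: str) -> str:
    """Download all files of the repo and return the local directory path."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=lm_repo_id(model_name))


print(lm_repo_id("acestep-5Hz-lm-4B"))
```

Calling `download_lm("acestep-5Hz-lm-4B")` requires network access and enough disk space for the BF16 weights.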

🙏 Acknowledgements

This project is co-led by ACE Studio and StepFun.

📖 Citation

If you find this project useful for your research, please consider citing:

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}