---
title: ACE-Step 1.5 Custom Edition
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
python_version: "3.11"
hardware: zero-gpu-medium
---

# ACE-Step 1.5 Custom Edition

A fully-featured implementation of ACE-Step 1.5 with a custom GUI and workflow capabilities, for both local use and HuggingFace Space deployment.

## Features

### 🎵 Three Main Interfaces

1. **Standard ACE-Step GUI**: Full-featured standard ACE-Step 1.5 interface with all original capabilities
2. **Custom Timeline Workflow**: Advanced timeline-based generation with:
   - 32-second clip generation (2s lead-in + 28s main + 2s lead-out)
   - Seamless clip blending for continuous music
   - Context Length slider (0-120 seconds) for style guidance
   - Master timeline with extend, inpaint, and remix capabilities
3. **LoRA Training Studio**: Complete LoRA training interface with:
   - Audio file upload and preprocessing
   - Custom training configuration
   - Model download/upload for continued training

## Architecture

- **Base Model**: ACE-Step v1.5 Turbo
- **Framework**: Gradio 5.9.1, PyTorch
- **Deployment**: Local execution + HuggingFace Spaces
- **Audio Processing**: DiT + VAE + 5Hz Language Model

## Installation

### Local Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/ace-step-custom.git
cd ace-step-custom

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download ACE-Step model
python scripts/download_model.py

# Run the application
python app.py
```

### HuggingFace Space Deployment

1. Create a new Space on HuggingFace
2. Upload all files to the Space
3. Set the Space to use a GPU (recommended: H200 or A100)
4. The app will automatically download the models and start

## Usage

### Standard Mode

Use the first tab for standard ACE-Step generation with all original features.

### Timeline Mode

1. Enter your prompt/lyrics
2. Adjust Context Length (how far back to reference previous clips)
3. Click "Generate" to create 32-second clips
4. Clips are automatically blended and added to the timeline
5. Use "Extend" to continue the song, or the other options for variations

### LoRA Training

1. Upload audio files for training
2. Configure training parameters
3. Train custom LoRA models
4. Download and reuse the models for continued training

## System Requirements

### Minimum

- GPU: 8GB VRAM (with optimizations)
- RAM: 16GB
- Storage: 20GB

### Recommended

- GPU: 16GB+ VRAM (A100, H200, or high-end consumer GPUs)
- RAM: 32GB
- Storage: 50GB

## Technical Details

- **Audio Format**: 48kHz, stereo
- **Generation Speed**: ~8 inference steps (turbo model)
- **Context Window**: Up to 120 seconds for style guidance
- **Blend Regions**: 2-second crossfade between clips

## Credits

Based on ACE-Step 1.5 by ACE Studio

- GitHub: https://github.com/ace-step/ACE-Step-1.5
- Original Demo: https://huggingface.co/spaces/ACE-Step/ACE-Step

## License

MIT License (see LICENSE file)
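
## Appendix: Crossfade Blending Sketch

The "seamless clip blending" described above overlaps each clip's 2-second lead-out with the next clip's 2-second lead-in. The following is a minimal, illustrative sketch of that idea, not the repository's actual code: the `crossfade_append` function, its signature, and the choice of an equal-power fade are all assumptions for demonstration.

```python
import numpy as np

SR = 48_000      # sample rate used by the app (48 kHz)
BLEND_SEC = 2    # 2-second blend region between clips

def crossfade_append(timeline: np.ndarray, clip: np.ndarray,
                     sr: int = SR, blend_sec: float = BLEND_SEC) -> np.ndarray:
    """Append `clip` to `timeline`, crossfading the overlap region.

    Both arrays are float32 of shape (samples, channels). The last
    `blend_sec` seconds of `timeline` (the previous clip's lead-out) are
    mixed with the first `blend_sec` seconds of `clip` (its lead-in).
    """
    if len(timeline) == 0:
        return clip
    n = int(sr * blend_sec)
    # Equal-power fade curves (cos/sin) keep perceived loudness roughly
    # constant across the overlap.
    t = np.linspace(0.0, np.pi / 2, n, dtype=np.float32)[:, None]
    fade_out = np.cos(t)
    fade_in = np.sin(t)
    blended = timeline[-n:] * fade_out + clip[:n] * fade_in
    return np.concatenate([timeline[:-n], blended, clip[n:]])

# Two 32-second stereo clips -> one 62-second timeline (2 s absorbed in overlap).
a = np.ones((32 * SR, 2), dtype=np.float32)
b = np.ones((32 * SR, 2), dtype=np.float32)
out = crossfade_append(a, b)
print(out.shape[0] / SR)  # → 62.0
```

Note how the timeline grows by 30 seconds per appended clip rather than 32: the 2-second blend region is shared between neighbours, which is what keeps the result continuous.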
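
## Appendix: Context Window Sketch

The Context Length slider (0-120 seconds) caps how much of the existing timeline is fed back to the model as style guidance. A minimal sketch of that trimming step, assuming the timeline is a float32 `(samples, channels)` array; `context_slice` is an illustrative name, not the repository's API:

```python
import numpy as np

SR = 48_000  # 48 kHz, matching the app's audio format

def context_slice(timeline: np.ndarray, context_sec: float, sr: int = SR) -> np.ndarray:
    """Return the trailing `context_sec` seconds of `timeline`.

    A context of 0 disables style guidance; values beyond the timeline's
    current length simply return the whole timeline.
    """
    context_sec = max(0.0, min(context_sec, 120.0))  # clamp to the slider's range
    n = int(sr * context_sec)
    if n == 0:
        return timeline[:0]  # empty slice: no conditioning audio
    return timeline[-n:]

# A 90-second timeline: a 30 s context trims it, a 120 s request is capped
# by the audio actually available.
timeline = np.zeros((90 * SR, 2), dtype=np.float32)
print(context_slice(timeline, 30).shape[0] / SR)   # → 30.0
print(context_slice(timeline, 120).shape[0] / SR)  # → 90.0
```

Taking the most recent audio (rather than the start of the song) is the natural choice here, since "Extend" continues from the end of the timeline.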