---
title: ACE-Step 1.5 Custom Edition
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
python_version: "3.11"
hardware: zero-gpu-medium
---

# ACE-Step 1.5 Custom Edition

A full-featured implementation of ACE-Step 1.5 with a custom GUI and workflow tools, for local use and HuggingFace Space deployment.

## Features

### 🎵 Three Main Interfaces

1. **Standard ACE-Step GUI**: Full-featured standard ACE-Step 1.5 interface with all original capabilities
2. **Custom Timeline Workflow**: Advanced timeline-based generation with:
   - 32-second clip generation (2s lead-in + 28s main + 2s lead-out)
   - Seamless clip blending for continuous music
   - Context Length slider (0-120 seconds) for style guidance
   - Master timeline with extend, inpaint, and remix capabilities
3. **LoRA Training Studio**: Complete LoRA training interface with:
   - Audio file upload and preprocessing
   - Custom training configuration
   - Model download/upload for continued training

## Architecture

- **Base Model**: ACE-Step v1.5 Turbo
- **Framework**: Gradio 5.9.1, PyTorch
- **Deployment**: Local execution + HuggingFace Spaces
- **Audio Processing**: DiT + VAE + 5Hz Language Model

## Installation

### Local Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/ace-step-custom.git
cd ace-step-custom

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download ACE-Step model
python scripts/download_model.py

# Run the application
python app.py
```

### HuggingFace Space Deployment

1. Create a new Space on HuggingFace
2. Upload all files to the Space
3. Set the Space hardware to a GPU (recommended: H200 or A100)
4. The app then downloads the models automatically and starts

## Usage

### Standard Mode
Use the first tab for standard ACE-Step generation with all original features.

### Timeline Mode
1. Enter your prompt/lyrics
2. Adjust Context Length (how far back to reference previous clips)
3. Click "Generate" to create 32-second clips
4. Clips are blended automatically and added to the timeline
5. Use "Extend" to continue the song, or the other timeline options (inpaint, remix) for variations
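
The Context Length setting in step 2 can be pictured as a trailing window over the existing timeline. The sketch below is an illustration only, not the app's actual code: `context_window` is a hypothetical helper, and the 48 kHz sample rate is taken from the Technical Details section. It assumes the slider value is clamped to the 0-120 second range and that the context is simply the most recent audio on the timeline.

```python
SAMPLE_RATE = 48_000        # 48 kHz, as listed under Technical Details
MAX_CONTEXT_SECONDS = 120   # upper bound of the Context Length slider


def context_window(timeline_samples, context_seconds, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices of the trailing context region.

    Hypothetical helper: assumes the context is the last `context_seconds`
    of the timeline, clamped to 0..MAX_CONTEXT_SECONDS.
    """
    seconds = max(0.0, min(float(context_seconds), MAX_CONTEXT_SECONDS))
    wanted = int(round(seconds * sample_rate))
    start = max(0, timeline_samples - wanted)
    return start, timeline_samples
```

With a 60-second timeline and a 30-second context, the window covers the second half of the timeline. A context longer than the timeline falls back to the whole timeline, and a value of 0 yields an empty window.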

### LoRA Training
1. Upload audio files for training
2. Configure training parameters
3. Train a custom LoRA model
4. Download the trained model and upload it again later to continue training

## System Requirements

### Minimum
- GPU: 8GB VRAM (with optimizations)
- RAM: 16GB
- Storage: 20GB

### Recommended
- GPU: 16GB+ VRAM (e.g. A100, H200, or a consumer GPU with at least 16GB)
- RAM: 32GB
- Storage: 50GB
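
The two tiers above can be turned into a quick pre-flight check. This is a hedged sketch, not part of the repository: `classify`, `free_disk_gb`, and `detect_vram_gb` are hypothetical names, the storage figure is assumed to mean free disk space, and sizes are compared in GB (10^9 bytes).

```python
import shutil

# Thresholds copied from the Minimum and Recommended lists in this README.
MINIMUM = {"vram_gb": 8, "ram_gb": 16, "disk_gb": 20}
RECOMMENDED = {"vram_gb": 16, "ram_gb": 32, "disk_gb": 50}


def classify(vram_gb, ram_gb, disk_gb):
    """Return 'recommended', 'minimum', or 'insufficient' for the given specs."""
    have = {"vram_gb": vram_gb, "ram_gb": ram_gb, "disk_gb": disk_gb}
    if all(have[key] >= RECOMMENDED[key] for key in have):
        return "recommended"
    if all(have[key] >= MINIMUM[key] for key in have):
        return "minimum"
    return "insufficient"


def free_disk_gb(path="."):
    """Free space on the volume holding `path`, in GB (assumed meaning of 'Storage')."""
    return shutil.disk_usage(path).free / 1e9


def detect_vram_gb():
    """Total memory of GPU 0 in GB, or 0.0 when PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1e9
```

For example, `classify(detect_vram_gb(), 32, free_disk_gb())` reports which tier a machine with 32GB of RAM falls into. System RAM is passed in by hand here because the standard library has no portable way to query it.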

## Technical Details

- **Audio Format**: 48kHz, stereo
- **Inference Steps**: ~8 (turbo model)
- **Context Window**: Up to 120 seconds for style guidance
- **Blend Regions**: 2-second crossfade between clips
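
As a rough illustration of the blend region, the sketch below appends clips to a timeline with a 2-second overlap. It is not taken from the app's source: the function names are hypothetical, the linear fade curve is an assumption (the app may use a different curve or blend in latent space), and it works on mono sample lists in plain Python for readability, where real code would more likely use NumPy or PyTorch on stereo arrays. It also assumes that the lead-out of one clip overlaps the lead-in of the next, so each additional 32-second clip extends the timeline by 30 seconds.

```python
SAMPLE_RATE = 48_000   # 48 kHz audio
CLIP_SECONDS = 32      # 2 s lead-in + 28 s main + 2 s lead-out
BLEND_SECONDS = 2      # crossfade length between neighbouring clips


def timeline_seconds(num_clips, clip_s=CLIP_SECONDS, blend_s=BLEND_SECONDS):
    """Total length when each pair of neighbouring clips overlaps by blend_s."""
    if num_clips <= 0:
        return 0
    return num_clips * clip_s - (num_clips - 1) * blend_s


def crossfade(tail, head):
    """Linearly blend two equal-length mono sample sequences."""
    if len(tail) != len(head):
        raise ValueError("overlap regions must have the same length")
    n = len(tail)
    blended = []
    for i in range(n):
        t = (i + 0.5) / n  # position inside the overlap, between 0 and 1
        blended.append(tail[i] * (1.0 - t) + head[i] * t)
    return blended


def append_clip(timeline, clip, overlap):
    """Append `clip` to `timeline`, crossfading the last `overlap` samples."""
    if not timeline:
        return list(clip)
    overlap = min(overlap, len(timeline), len(clip))
    if overlap == 0:
        return list(timeline) + list(clip)
    mixed = crossfade(timeline[-overlap:], clip[:overlap])
    return list(timeline[:-overlap]) + mixed + list(clip[overlap:])
```

Under these assumptions the overlap is `BLEND_SECONDS * SAMPLE_RATE` = 96,000 samples per channel, and three clips give a 92-second timeline (3 × 32 - 2 × 2).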

## Credits

Based on ACE-Step 1.5 by ACE Studio
- GitHub: https://github.com/ace-step/ACE-Step-1.5
- Original Demo: https://huggingface.co/spaces/ACE-Step/ACE-Step

## License

MIT License (see LICENSE file)