---
title: ACE-Step 1.5 Custom Edition
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
python_version: "3.11"
hardware: zero-gpu-medium
---

# ACE-Step 1.5 Custom Edition

A full-featured implementation of ACE-Step 1.5 with a custom GUI and workflow capabilities, for both local use and HuggingFace Space deployment.

## Features

### 🎵 Three Main Interfaces

1. **Standard ACE-Step GUI**: The full ACE-Step 1.5 interface with all original capabilities
2. **Custom Timeline Workflow**: Advanced timeline-based generation with:
   - 32-second clip generation (2s lead-in + 28s main + 2s lead-out)
   - Seamless clip blending for continuous music
   - Context Length slider (0-120 seconds) for style guidance
   - Master timeline with extend, inpaint, and remix capabilities
3. **LoRA Training Studio**: Complete LoRA training interface with:
   - Audio file upload and preprocessing
   - Custom training configuration
   - Model download/upload for continued training
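
The 2-second lead-in/lead-out regions are what make seamless blending possible: each new clip is crossfaded onto the tail of the timeline. A minimal pure-Python sketch of an equal-power crossfade is shown below; the function names (`crossfade`, `append_clip`) and the list-of-samples representation are illustrative assumptions, not the actual implementation in `app.py`.

```python
import math

SR = 48_000        # sample rate from the Technical Details section
FADE = 2 * SR      # 2-second blend region between clips

def crossfade(tail, head):
    """Equal-power crossfade of two same-length lists of float samples."""
    n = len(tail)
    out = []
    for i in range(n):
        t = i / n
        g_out = math.cos(t * math.pi / 2)  # fade-out gain for the old clip
        g_in = math.sin(t * math.pi / 2)   # fade-in gain for the new clip
        out.append(tail[i] * g_out + head[i] * g_in)
    return out

def append_clip(timeline, clip, fade=FADE):
    """Blend a new clip onto the timeline over the last `fade` samples."""
    if not timeline:
        return list(clip)
    blended = crossfade(timeline[-fade:], clip[:fade])
    return timeline[:-fade] + blended + list(clip[fade:])
```

The equal-power gains satisfy `g_out**2 + g_in**2 == 1`, which keeps perceived loudness roughly constant through the blend region.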

## Architecture

- **Base Model**: ACE-Step v1.5 Turbo
- **Framework**: Gradio 5.9.1, PyTorch
- **Deployment**: Local execution + HuggingFace Spaces
- **Audio Processing**: DiT + VAE + 5Hz Language Model

## Installation

### Local Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/ace-step-custom.git
cd ace-step-custom

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download ACE-Step model
python scripts/download_model.py

# Run the application
python app.py
```

### HuggingFace Space Deployment

1. Create a new Space on HuggingFace
2. Upload all files to the Space
3. Set the Space hardware to a GPU tier (the bundled config requests `zero-gpu-medium`; A100 or H200 also work)
4. The app will automatically download models and start

## Usage

### Standard Mode
Use the first tab for standard ACE-Step generation with all original features.

### Timeline Mode
1. Enter your prompt/lyrics
2. Adjust Context Length (how many seconds of previous audio to use as a style reference)
3. Click "Generate" to create 32-second clips
4. Clips are automatically blended and appended to the timeline
5. Use "Extend" to continue the song or other options for variations
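
Conceptually, the Context Length slider just controls how much trailing audio from the timeline is fed back to the model as a style reference. A hedged sketch of that slicing, assuming a flat list of samples at 48 kHz (the function name is an assumption, not the app's API):

```python
SR = 48_000  # 48 kHz audio, per the Technical Details section

def context_slice(timeline, context_seconds):
    """Return up to the last `context_seconds` of audio for style guidance.

    A value of 0 disables context conditioning, matching the low end of
    the 0-120 second slider.
    """
    if context_seconds <= 0:
        return []
    n = min(len(timeline), int(context_seconds * SR))
    return timeline[-n:]
```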

### LoRA Training
1. Upload audio files for training
2. Configure training parameters
3. Train custom LoRA models
4. Download and reuse for continued training
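
Training parameters are configured in the UI; as a rough illustration, a configuration might cover values like these. All names and defaults below are hypothetical placeholders, not the actual ACE-Step training schema.

```python
# Hypothetical LoRA training configuration -- illustrative only,
# not the real parameter names used by the training studio.
lora_config = {
    "rank": 16,              # LoRA rank (capacity of the adapter)
    "alpha": 32,             # scaling factor applied to the adapter output
    "learning_rate": 1e-4,
    "epochs": 50,
    "batch_size": 2,
    "sample_rate": 48_000,   # matches the 48 kHz stereo pipeline
}
```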

## System Requirements

### Minimum
- GPU: 8GB VRAM (with optimizations)
- RAM: 16GB
- Storage: 20GB

### Recommended
- GPU: 16GB+ VRAM (A100, H200, or consumer GPUs)
- RAM: 32GB
- Storage: 50GB

## Technical Details

- **Audio Format**: 48kHz, stereo
- **Inference Steps**: ~8 (turbo model), for fast generation
- **Context Window**: Up to 120 seconds for style guidance
- **Blend Regions**: 2-second crossfade between clips
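
These numbers pin down how the timeline grows: with 32-second clips and a 2-second crossfade, each clip after the first contributes 30 seconds of new audio. A quick check:

```python
def timeline_seconds(num_clips, clip_len=32, fade=2):
    """Total length in seconds of a timeline built from crossfaded clips."""
    if num_clips == 0:
        return 0
    # The first clip contributes its full length; every later clip
    # overlaps the previous one by `fade` seconds.
    return clip_len + (num_clips - 1) * (clip_len - fade)
```

So four clips yield 32 + 3 × 30 = 122 seconds of continuous music.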

## Credits

Based on ACE-Step 1.5 by ACE Studio
- GitHub: https://github.com/ace-step/ACE-Step-1.5
- Original Demo: https://huggingface.co/spaces/ACE-Step/ACE-Step

## License

MIT License (see LICENSE file)