---
title: ACE-Step 1.5 XL Music Generation (CPU)
emoji: 🎵
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: mit
tags:
  - music-generation
  - ace-step
  - gguf
  - lora
  - training
  - cpu
  - mcp-server
short_description: ACE-Step 1.5 XL - CPU music generation + LoRA training
models:
  - ACE-Step/Ace-Step1.5
startup_duration_timeout: 2h
---

# ACE-Step 1.5 XL Music Generation (CPU)

**GGUF inference + LoRA training** on free CPU Spaces. Powered by [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp).

## Features

- **Music Generation** - Text/lyrics to stereo 48 kHz MP3 via GGUF quantized models
- **LoRA Training** - Fine-tune on your own audio (Side-Step engine, Adafactor optimizer)
- **Multiple LM Sizes** - 0.6B / 1.7B / 4B language models (on-demand download)
- **CPU Only** - Runs on free Hugging Face Spaces (2 vCPU, 18 GB RAM)

## Music Generation

1. Enter a music description (e.g. "upbeat electronic dance music")
2. Enter lyrics or check **Instrumental**
3. Adjust BPM, duration, steps, and seed
4. Select the LM (1.7B is the default and the fastest on CPU)
5. Select a LoRA adapter if you have trained one
6. Click **Generate Music**

**Timing:** ~270s to generate 10s of audio with the 1.7B LM at 8 steps.

## LoRA Training

1. Go to the **Train LoRA** tab
2. Upload audio files (WAV/MP3, max 240s each)
3. Set the LoRA name, epochs (1-10), and rank (default 16)
4. Click **Train** - ace-server stops during training and restarts afterwards
5. Use **Cancel** to stop early (a checkpoint is saved)
6. The trained adapter appears in the LoRA dropdown for inference

**Timing:** ~170s preprocessing + ~10s/epoch on CPU.

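
The timing figures above can be turned into a rough budget before starting a run. This is a minimal sketch that only restates the quoted numbers; `estimate_training_seconds` is a hypothetical helper, and real timings will likely vary with dataset size and Space load.

```python
def estimate_training_seconds(epochs: int,
                              preprocess_s: float = 170.0,
                              per_epoch_s: float = 10.0) -> float:
    """Rough wall-clock estimate from the CPU timings quoted in this README."""
    # The UI accepts 1-10 epochs, so reject anything outside that range.
    if not 1 <= epochs <= 10:
        raise ValueError("epochs must be between 1 and 10")
    return preprocess_s + per_epoch_s * epochs


print(estimate_training_seconds(3))  # 170 + 3 * 10 = 200.0
```
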
## Models

| Component | GGUF | Size |
|-----------|------|------|
| DiT (music) | acestep-v15-xl-turbo-Q4_K_M | 2.8 GB |
| LM (captions) | acestep-5Hz-lm-1.7B-Q8_0 | 1.7 GB |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0 | 0.75 GB |
| VAE | vae-BF16 | 0.32 GB |

LM alternatives (on-demand download): 0.6B Q8_0 (slow), 4B Q5_K_M (best quality, ~515s).

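
For a sense of the default download footprint, the sizes in the table can simply be summed. The sketch below uses only the table's figures; note that file size is not the same as peak RAM use, which this README does not state.

```python
# Default model set and file sizes (GB), copied from the table above.
DEFAULT_MODELS_GB = {
    "acestep-v15-xl-turbo-Q4_K_M": 2.8,
    "acestep-5Hz-lm-1.7B-Q8_0": 1.7,
    "Qwen3-Embedding-0.6B-Q8_0": 0.75,
    "vae-BF16": 0.32,
}

total_gb = round(sum(DEFAULT_MODELS_GB.values()), 2)
print(total_gb)  # 5.57
```
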

---

## API

### Python Client - Generate Music

```python
from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    caption="upbeat electronic dance music",
    lyrics="[Instrumental]",
    instrumental=True,
    bpm=120,
    duration=10,
    seed=-1,  # -1 = random
    steps=8,  # 1-32, fewer = faster
    lora_select="None (no LoRA)",  # or trained adapter name
    lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf",
    api_name="/generate",
)
print(result)  # (audio_path, status_message)
```

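
Because a single generation takes minutes on CPU, it can help to validate arguments locally before calling the Space. The helper below is a hypothetical sketch, not part of the Space: `build_generate_kwargs` is an invented name, the `steps` range comes from the comment in the example above, and replacing lyrics with `"[Instrumental]"` simply mirrors that example rather than any documented server behaviour.

```python
def build_generate_kwargs(caption, lyrics="", instrumental=False, bpm=120,
                          duration=10, seed=-1, steps=8,
                          lora_select="None (no LoRA)",
                          lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf"):
    """Validate inputs and return keyword arguments for client.predict()."""
    if not caption.strip():
        raise ValueError("caption must not be empty")
    if not 1 <= steps <= 32:
        raise ValueError("steps must be in the range 1-32")
    if instrumental:
        # Assumption: mirror the README example, which sends this placeholder.
        lyrics = "[Instrumental]"
    return {
        "caption": caption,
        "lyrics": lyrics,
        "instrumental": instrumental,
        "bpm": bpm,
        "duration": duration,
        "seed": seed,
        "steps": steps,
        "lora_select": lora_select,
        "lm_model_select": lm_model_select,
        "api_name": "/generate",
    }


kwargs = build_generate_kwargs("upbeat electronic dance music", instrumental=True)
print(kwargs["lyrics"])  # [Instrumental]
```

The returned dictionary can then be passed as `client.predict(**kwargs)`.
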
### Python Client - Train LoRA

```python
from gradio_client import Client, handle_file

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    audio_files=[handle_file("song.mp3")],
    lora_name="my-style",
    epochs=3,
    lr=0.0001,
    rank=16,
    api_name="/train_lora",
)
print(result)  # (log_text, train_btn, cancel_btn)
```

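
Uploads are limited to WAV/MP3 files of at most 240s each, so a local pre-flight check can save a failed run. This is a sketch under stated assumptions: `check_training_file` is a hypothetical helper, it uses only the standard-library `wave` module, and it therefore measures duration for WAV files only (MP3 files pass the extension check but are not measured).

```python
import wave
from pathlib import Path

MAX_SECONDS = 240                    # per-file limit quoted in this README
ALLOWED_SUFFIXES = {".wav", ".mp3"}  # formats listed in the training steps


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_training_file(path):
    """Return None if the file looks acceptable, otherwise a reason string."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return f"{path.name}: unsupported extension"
    if suffix == ".wav":
        seconds = wav_duration_seconds(path)
        if seconds > MAX_SECONDS:
            return f"{path.name}: {seconds:.0f}s exceeds {MAX_SECONDS}s"
    return None  # MP3 duration is not checked here
```
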
### Python Client - Server Status

```python
from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(api_name="/server_status")
print(result)  # JSON with model info
```

### MCP (Model Context Protocol)

This Space supports MCP for AI assistants (Claude Desktop, Cursor, VS Code).

**MCP Config:**

```json
{
  "mcpServers": {
    "ace-step": {"url": "https://werecooking-ace-step-cpu.hf.space/gradio_api/mcp/"}
  }
}
```

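
If you duplicate the Space under another account, the URL in the config changes with the Space id. The sketch below regenerates the config, assuming the subdomain is the lowercased `owner/name` id with `/` replaced by `-` (which matches the URL above); `space_mcp_url` is a hypothetical helper and ids containing other special characters are not handled.

```python
import json


def space_mcp_url(space_id: str) -> str:
    """Derive the MCP endpoint URL from an "owner/name" Space id (assumed rule)."""
    subdomain = space_id.lower().replace("/", "-")
    return f"https://{subdomain}.hf.space/gradio_api/mcp/"


config = {
    "mcpServers": {
        "ace-step": {"url": space_mcp_url("WeReCooking/ACE-Step-CPU")}
    }
}
print(json.dumps(config, indent=2))
```
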

---

## CLI Usage

```bash
# Generate music
python app.py "upbeat electronic dance music" --duration 10 --steps 8 --format mp3

# With lyrics
python app.py "pop ballad" --lyrics "Hello world\nThis is a test" -d 30

# With LoRA adapter
python app.py "jazz piano" --adapter my-style --seed 42

# Custom server URL
python app.py "ambient" --server http://localhost:8085
```

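
To drive the CLI from a script, the same command lines can be assembled as argument lists. This is a minimal sketch: `build_cli_command` is a hypothetical helper that uses only the flags shown above and does not know about any other options `app.py` may accept.

```python
import sys


def build_cli_command(caption, *, duration=None, steps=None, fmt=None,
                      lyrics=None, adapter=None, seed=None, server=None,
                      script="app.py"):
    """Build an argv list for app.py using only the flags shown in this README."""
    cmd = [sys.executable, script, caption]
    options = {
        "--duration": duration,
        "--steps": steps,
        "--format": fmt,
        "--lyrics": lyrics,
        "--adapter": adapter,
        "--seed": seed,
        "--server": server,
    }
    for flag, value in options.items():
        if value is not None:  # skip options the caller did not set
            cmd += [flag, str(value)]
    return cmd


cmd = build_cli_command("jazz piano", adapter="my-style", seed=42)
print(cmd[1:])  # ['app.py', 'jazz piano', '--adapter', 'my-style', '--seed', '42']
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `app.py`.
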

---

## Architecture

```
ace-server (C++ GGUF)       Gradio UI (Python)
  /lm     -> LM generate      app.py
  /synth  -> DiT + VAE        train_engine.py (Side-Step)
  /health                       |
  /props                        +-- preprocess_audio()
  /job                          +-- train_lora_generator()
```

- **Inference:** GGUF via [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp) HTTP API
- **Training:** PyTorch via ported [Side-Step](https://github.com/koda-dernet/Side-Step) engine
- Training stops ace-server to free RAM, then restarts it afterwards so the new adapters are available

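
Since ace-server is stopped and restarted around training, a client may need to wait until it is reachable again. The sketch below polls the `/health` endpoint from the diagram using only the standard library. It assumes the server answers HTTP 200 when ready and listens on `http://localhost:8085` (the URL from the CLI example); `wait_for_health` is a hypothetical helper, not part of the Space.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url="http://localhost:8085", timeout=60.0, interval=1.0):
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet: connection refused, timeout, or an error status
        time.sleep(interval)
    return False
```
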
## Credits

- [ACE-Step 1.5](https://github.com/ace-step/ACE-Step-1.5) - Model architecture
- [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp) - GGUF inference engine
- [Side-Step](https://github.com/koda-dernet/Side-Step) - Training engine (ported)
- [Serveurperso/ACE-Step-1.5-GGUF](https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF) - Quantized models