title: ACE-Step 1.5 XL Music Generation (CPU)
emoji: 🎵
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: mit
tags:
  - music-generation
  - ace-step
  - gguf
  - lora
  - training
  - cpu
  - mcp-server
short_description: ACE-Step 1.5 XL - CPU music generation + LoRA training
models:
  - ACE-Step/Ace-Step1.5
startup_duration_timeout: 2h

ACE-Step 1.5 XL Music Generation (CPU)

GGUF inference + LoRA training on free CPU Spaces. Powered by acestep.cpp.

Features

  • Music Generation - Text/lyrics to stereo 48kHz MP3 via GGUF quantized models
  • LoRA Training - Fine-tune on your own audio (Side-Step engine, Adafactor optimizer)
  • Multiple LM Sizes - 0.6B / 1.7B / 4B language models (on-demand download)
  • CPU Only - Runs on free HuggingFace Spaces (2 vCPU, 18GB RAM)

Music Generation

  1. Enter a music description (e.g. "upbeat electronic dance music")
  2. Enter lyrics or check Instrumental
  3. Adjust BPM, duration, steps, seed
  4. Select LM model (1.7B default, fastest on CPU)
  5. Select LoRA adapter if trained
  6. Click Generate Music

Timing: about 270 s to generate 10 s of audio with the 1.7B LM at 8 steps.

LoRA Training

  1. Go to Train LoRA tab
  2. Upload audio files (WAV/MP3, max 240s each)
  3. Set LoRA name, epochs (1-10), rank (default 16)
  4. Click Train (ace-server stops during training and restarts afterwards)
  5. Use Cancel to stop early (saves checkpoint)
  6. Trained adapter appears in the LoRA dropdown for inference

Timing: ~170s preprocessing + ~10s/epoch on CPU.
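
As a rough worked example of the timing above, total training time can be estimated as a fixed preprocessing cost plus a per-epoch cost. This sketch assumes the cost grows linearly with the epoch count and ignores dataset size, which the figures above do not cover. The function name is illustrative and not part of the Space.

```python
PREPROCESS_SECONDS = 170  # approximate preprocessing time quoted above
SECONDS_PER_EPOCH = 10    # approximate per-epoch time quoted above


def estimate_training_seconds(epochs: int) -> int:
    """Rough estimate; assumes linear scaling in the number of epochs."""
    if not 1 <= epochs <= 10:
        raise ValueError("epochs must be between 1 and 10")
    return PREPROCESS_SECONDS + SECONDS_PER_EPOCH * epochs


print(estimate_training_seconds(3))  # 170 + 3 * 10 = 200
```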

Models

Component      GGUF                          Size
DiT (music)    acestep-v15-xl-turbo-Q4_K_M   2.8 GB
LM (captions)  acestep-5Hz-lm-1.7B-Q8_0      1.7 GB
Text Encoder   Qwen3-Embedding-0.6B-Q8_0     0.75 GB
VAE            vae-BF16                      0.32 GB

LM alternatives (on-demand download): 0.6B Q8_0 (slow), 4B Q5_K_M (best quality, ~515s).
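
For a sense of scale, the default model set adds up to roughly 5.6 GB of GGUF files. The sketch below only sums the file sizes listed above; actual memory use at runtime may differ, and the dictionary name is illustrative.

```python
# Sizes in GB, copied from the Models table (default configuration).
DEFAULT_MODEL_SIZES_GB = {
    "acestep-v15-xl-turbo-Q4_K_M": 2.8,
    "acestep-5Hz-lm-1.7B-Q8_0": 1.7,
    "Qwen3-Embedding-0.6B-Q8_0": 0.75,
    "vae-BF16": 0.32,
}

total_gb = sum(DEFAULT_MODEL_SIZES_GB.values())
print(f"Default model set: {total_gb:.2f} GB")
```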


API

Python Client - Generate Music

from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")

result = client.predict(
    caption="upbeat electronic dance music",
    lyrics="[Instrumental]",
    instrumental=True,
    bpm=120,
    duration=10,
    seed=-1,                          # -1 = random
    steps=8,                          # 1-32, fewer = faster
    lora_select="None (no LoRA)",     # or trained adapter name
    lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf",
    api_name="/generate"
)
print(result)  # (audio_path, status_message)
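
When calling `/generate` repeatedly, it may help to assemble the arguments in one place. The helper below is a hypothetical convenience wrapper, not part of the Space: it reuses the parameter names from the call above, enforces the 1-32 range noted for `steps`, and treats a missing `lyrics` value as an instrumental request.

```python
from typing import Optional

DEFAULT_LM = "acestep-5Hz-lm-1.7B-Q8_0.gguf"


def build_generate_kwargs(
    caption: str,
    lyrics: Optional[str] = None,
    bpm: int = 120,
    duration: int = 10,
    steps: int = 8,
    seed: int = -1,
    lora: str = "None (no LoRA)",
    lm_model: str = DEFAULT_LM,
) -> dict:
    """Hypothetical helper: assemble keyword arguments for the /generate call."""
    if not 1 <= steps <= 32:
        raise ValueError("steps must be between 1 and 32")
    instrumental = lyrics is None
    return {
        "caption": caption,
        "lyrics": "[Instrumental]" if instrumental else lyrics,
        "instrumental": instrumental,
        "bpm": bpm,
        "duration": duration,
        "seed": seed,
        "steps": steps,
        "lora_select": lora,
        "lm_model_select": lm_model,
        "api_name": "/generate",
    }


kwargs = build_generate_kwargs("pop ballad", lyrics="Hello world\nThis is a test", duration=30)
# With a connected client: audio_path, status = client.predict(**kwargs)
```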

Python Client - Train LoRA

from gradio_client import Client, handle_file

client = Client("WeReCooking/ACE-Step-CPU")

result = client.predict(
    audio_files=[handle_file("song.mp3")],
    lora_name="my-style",
    epochs=3,
    lr=0.0001,
    rank=16,
    api_name="/train_lora"
)
print(result)  # (log_text, train_btn, cancel_btn)
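
Before uploading, files can be checked locally against the limits listed in the training steps (WAV/MP3, max 240s each). This is a minimal sketch with illustrative function names. It measures WAV duration with the standard library only; MP3 duration would need a third-party decoder and is not checked here. How the Space itself reacts to files outside these limits is not covered by this README.

```python
import wave
from pathlib import Path

MAX_SECONDS = 240                    # per-file limit stated in the training steps
ALLOWED_SUFFIXES = {".wav", ".mp3"}  # formats stated in the training steps


def wav_duration_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, read from its header with the stdlib."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_training_file(path: Path) -> None:
    """Raise ValueError if the file does not match the documented limits."""
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"{path.name}: expected WAV or MP3")
    # Only WAV length is measured; MP3 files pass on extension alone.
    if suffix == ".wav" and wav_duration_seconds(path) > MAX_SECONDS:
        raise ValueError(f"{path.name}: longer than {MAX_SECONDS}s")
```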

Python Client - Server Status

# Reuses the `client` created in the examples above.
result = client.predict(api_name="/server_status")
print(result)  # JSON with model info

MCP (Model Context Protocol)

This Space supports MCP for AI assistants (Claude Desktop, Cursor, VS Code).

MCP Config:

{
  "mcpServers": {
    "ace-step": {"url": "https://werecooking-ace-step-cpu.hf.space/gradio_api/mcp/"}
  }
}
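
The same config can be generated for a duplicated Space. This sketch assumes the Space host is the lowercased `owner-name` form of the Space ID, which matches the URL shown above; IDs containing other special characters may map differently. `mcp_config` is an illustrative name.

```python
import json


def mcp_config(space_id: str, server_name: str = "ace-step") -> dict:
    """Build an MCP client config entry for a Gradio Space.

    Assumption: the host is the lowercased Space ID with "/" replaced by "-".
    """
    host = space_id.lower().replace("/", "-")
    url = f"https://{host}.hf.space/gradio_api/mcp/"
    return {"mcpServers": {server_name: {"url": url}}}


print(json.dumps(mcp_config("WeReCooking/ACE-Step-CPU"), indent=2))
```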

CLI Usage

# Generate music
python app.py "upbeat electronic dance music" --duration 10 --steps 8 --format mp3

# With lyrics
python app.py "pop ballad" --lyrics "Hello world\nThis is a test" -d 30

# With LoRA adapter
python app.py "jazz piano" --adapter my-style --seed 42

# Custom server URL
python app.py "ambient" --server http://localhost:8085
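
To drive the CLI from a script, the commands above can be assembled as an argument list. This is a hedged sketch: it uses only the long-form flags shown above, the helper name is illustrative, and it assumes `app.py` is in the current working directory.

```python
import sys
from typing import List, Optional


def build_cli_command(
    caption: str,
    duration: Optional[int] = None,
    steps: Optional[int] = None,
    seed: Optional[int] = None,
    lyrics: Optional[str] = None,
    adapter: Optional[str] = None,
    server: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list using only the flags shown above."""
    cmd = [sys.executable, "app.py", caption]
    options = {
        "--duration": duration,
        "--steps": steps,
        "--seed": seed,
        "--lyrics": lyrics,
        "--adapter": adapter,
        "--server": server,
    }
    for flag, value in options.items():
        if value is not None:  # skip options the caller did not set
            cmd += [flag, str(value)]
    return cmd


cmd = build_cli_command("jazz piano", adapter="my-style", seed=42)
# Run it with: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting issues with captions and lyrics that contain spaces.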

Architecture

ace-server (C++ GGUF)     Gradio UI (Python)
  /lm    -> LM generate     app.py
  /synth -> DiT + VAE       train_engine.py (Side-Step)
  /health                    |
  /props                     +-- preprocess_audio()
  /job                       +-- train_lora_generator()

  • Inference: GGUF via acestep.cpp HTTP API
  • Training: PyTorch via ported Side-Step engine
  • Training stops ace-server to free RAM, then restarts it afterwards so new adapters are picked up
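
Because ace-server is stopped and restarted around training, a client talking to it directly may want to wait until it is reachable again. The sketch below polls the `/health` endpoint listed above. It assumes the endpoint answers a plain GET with HTTP 200 once the server is ready, which this README does not state, and the function names are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET to url answers with HTTP 200 (assumed 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    base_url: str,
    timeout: float = 120.0,
    interval: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll <base_url>/health until it succeeds or the timeout elapses."""
    health_url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        if probe(health_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For a local server, `wait_until_healthy("http://localhost:8085")` would use the address from the CLI example; the `probe` parameter exists so the loop can be tested without a running server.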

Credits