# ACE-Step 1.5 Deployment Context (Hugging Face Endpoint) — Handoff Notes

## Objective

Set up ACE-Step/Ace-Step1.5 on Hugging Face so we can:

  1. Serve music generations through a private endpoint (token-protected),
  2. Run on GPU (A100 preferred),
  3. Control costs aggressively (scale-to-zero / pause when idle),
  4. Call the endpoint from local scripts/app backend,
  5. Transition from a sine-wave smoke test to real ACE-Step generations.

## Current State (What’s Already Done)

  • Hugging Face auth is working in the terminal (`hf auth` is usable).
  • A private dedicated endpoint exists:
    • https://xr81s77sis7hoggq.us-east-1.aws.endpoints.huggingface.cloud
  • Endpoint is connected to a custom repo containing handler.py.
  • Smoke-test handler.py was deployed and tested successfully:
    • Returns base64-encoded WAV generated as sine wave/noise.
  • Local .bat + .ps1 testing flow works to hit endpoint and save .wav.

## Key Constraint Discovered

`ACE-Step/Ace-Step1.5` is not a one-click “Model Catalog verified” endpoint deployment. The HF warning indicates:

  • no verified config,
  • a missing `handler.py` when trying to deploy the model repo directly.

### Implication

Use a custom endpoint repo (our own repo) with:

  • handler.py
  • requirements.txt

Then load `ACE-Step/Ace-Step1.5` from code at runtime.

## Important Product/Infra Notes

### ZeroGPU vs Dedicated Endpoints

  • ZeroGPU applies to Spaces (good for demos/prototypes).
  • For production-like API serving, use Dedicated Inference Endpoints.

### Idle Scaling

  • In the Dedicated Endpoints UI, the minimum idle scale-to-zero window observed is 15 minutes.
  • A shorter window is not available in the current UI settings.
  • To stop all billing immediately, use Pause endpoint.
  • Scale-to-zero (min replicas 0) gives auto-wake behavior at the cost of cold starts.

## Files in Custom Endpoint Repo (Expected)

  • `handler.py` -> custom inference logic
  • `requirements.txt` -> runtime dependencies
  • `README.md` -> optional docs/config context

## Smoke-Test Handler Behavior (Current)

The current handler:

  • Does NOT load the ACE-Step model.
  • Generates synthetic audio via numpy sine + noise.
  • Returns:
    • audio_base64_wav
    • sample_rate
    • duration_sec

This validated endpoint wiring, auth, request/response format, and client decode pipeline.
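The smoke-test behavior above can be sketched as a minimal custom handler. The deployed file uses numpy + soundfile; this sketch uses only the standard library so it is self-contained. The `EndpointHandler` class with `__init__(self, path)` and `__call__(self, data)` is the contract HF custom handlers expect; the response fields match the list above:

```python
import base64
import io
import math
import random
import struct
import wave
from typing import Any, Dict


class EndpointHandler:
    """Smoke-test handler: synthetic audio only, no model load."""

    def __init__(self, path: str = "") -> None:
        # Nothing to load for the smoke test; the real handler will
        # load the ACE-Step pipeline here instead.
        pass

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data.get("inputs", {}) or {}
        duration_sec = float(inputs.get("duration_sec", 2.0))
        sample_rate = int(inputs.get("sample_rate", 44100))

        # 440 Hz sine plus light noise, 16-bit mono PCM.
        n_samples = int(duration_sec * sample_rate)
        rng = random.Random(int(inputs.get("seed", 0)))
        frames = bytearray()
        for i in range(n_samples):
            s = 0.8 * math.sin(2 * math.pi * 440.0 * i / sample_rate)
            s += 0.05 * (rng.random() * 2 - 1)
            frames += struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))

        # Wrap the PCM frames in a WAV container, then base64-encode.
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(sample_rate)
            wf.writeframes(bytes(frames))

        return {
            "audio_base64_wav": base64.b64encode(buf.getvalue()).decode("ascii"),
            "sample_rate": sample_rate,
            "duration_sec": duration_sec,
        }
```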


## What Needs to Happen Next (Critical Path)

### 1) Replace fallback generation with real ACE-Step inference

In `handler.py`:

  • `__init__`:
    • Load the ACE-Step pipeline/model once at container startup.
    • Use model source `ACE-Step/Ace-Step1.5`.
    • Move the model to CUDA when available.
  • `__call__`:
    • Parse request inputs:
      • prompt
      • lyrics
      • duration_sec
      • sample_rate
      • seed
      • optional: guidance_scale, steps, use_lm
    • Execute ACE-Step generation.
    • Convert output waveform to WAV bytes.
    • Return base64 WAV in JSON.
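A hedged sketch of that structure. The ACE-Step pipeline import path, class name, and call signature below are assumptions that must be checked against the ACE-Step repo before use; the input parsing and WAV encoding follow the contract described in these notes and work as written:

```python
import base64
import io
import struct
import wave
from typing import Any, Dict, List

# Defaults for the target request contract described in these notes.
INPUT_DEFAULTS = {
    "prompt": "",
    "lyrics": "",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": True,
}


def parse_inputs(data: Dict[str, Any]) -> Dict[str, Any]:
    """Merge request inputs over the defaults, ignoring unknown keys."""
    inputs = data.get("inputs", {}) or {}
    return {k: inputs.get(k, default) for k, default in INPUT_DEFAULTS.items()}


def waveform_to_base64_wav(samples: List[float], sample_rate: int) -> str:
    """Encode a mono float waveform in [-1, 1] as base64 16-bit WAV."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return base64.b64encode(buf.getvalue()).decode("ascii")


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # Hypothetical pipeline load: the actual module path, class name,
        # and constructor must be taken from the ACE-Step repo. torch is
        # imported lazily so the helpers above stay importable anywhere.
        import torch
        from acestep.pipeline import AceStepPipeline  # name is an assumption

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = AceStepPipeline.from_pretrained("ACE-Step/Ace-Step1.5")
        self.pipeline.to(device)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        params = parse_inputs(data)
        # Hypothetical call signature; align with the real ACE-Step API.
        samples = self.pipeline(
            prompt=params["prompt"],
            lyrics=params["lyrics"],
            duration=params["duration_sec"],
            seed=params["seed"],
            guidance_scale=params["guidance_scale"],
            steps=params["steps"],
            use_lm=params["use_lm"],
        )
        return {
            "audio_base64_wav": waveform_to_base64_wav(samples, params["sample_rate"]),
            "sample_rate": params["sample_rate"],
            "duration_sec": params["duration_sec"],
        }
```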

### 2) Ensure dependencies are correct in `requirements.txt`

At minimum for current scaffold:

  • numpy
  • soundfile

Likely needed for ACE runtime:

  • torch
  • torchaudio
  • transformers
  • accelerate
  • huggingface_hub
  • any ACE-Step-specific package requirements from ACE docs/repo.
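A starting point for `requirements.txt`, combining the scaffold deps with the likely ACE runtime deps above (left unpinned on purpose; merge in the ACE-Step repo's own requirements and pin versions once verified):

```text
numpy
soundfile
torch
torchaudio
transformers
accelerate
huggingface_hub
```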

### 3) Push repo changes and redeploy/rebuild endpoint

```sh
git add .
git commit -m "..."
git push
```

Then wait for the endpoint to rebuild and reach a healthy status.

### 4) Run a generation call with a real payload

Use the same client script, now including prompt and lyrics.


## Request Payload Contract (Target)

```json
{
  "inputs": {
    "prompt": "upbeat pop rap with emotional guitar",
    "lyrics": "[Verse] city lights and midnight rain",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": true
  }
}
```