from pathlib import Path
content = """# ACE-Step 1.5 Deployment Context (Hugging Face Endpoint) — Handoff Notes
## Objective
Set up ACE-Step/Ace-Step1.5 on Hugging Face so we can:
- Serve music generations through a private endpoint (token-protected),
- Run on GPU (A100 preferred),
- Control costs aggressively (scale-to-zero / pause when idle),
- Call the endpoint from local scripts/app backend,
- Transition from a sine-wave smoke test to real ACE-Step generations.
## Current State (What’s Already Done)
- Hugging Face auth is working in terminal (`hf auth` usable).
- A private dedicated endpoint exists:
  https://xr81s77sis7hoggq.us-east-1.aws.endpoints.huggingface.cloud
- Endpoint is connected to a custom repo containing `handler.py`.
- Smoke-test `handler.py` was deployed and tested successfully:
  - Returns base64-encoded WAV generated as sine wave/noise.
- Local `.bat` + `.ps1` testing flow works to hit the endpoint and save a `.wav`.
## Key Constraint Discovered
ACE-Step/Ace-Step1.5 is not a one-click “Model Catalog verified” endpoint deployment.
HF warning indicates:
- no verified config,
- missing `handler.py` if trying to deploy the model repo directly.
### Implication
Use a custom endpoint repo (our own repo) with:
- `handler.py`
- `requirements.txt`

Then load `ACE-Step/Ace-Step1.5` from code at runtime.
## Important Product/Infra Notes
### ZeroGPU vs Dedicated Endpoints
- ZeroGPU applies to Spaces (good for demos/prototypes).
- For production-like API serving, use Dedicated Inference Endpoints.
### Idle Scaling
- In the Dedicated Endpoints UI, the shortest scale-to-zero idle window observed is 15 minutes.
- A window shorter than 15 minutes is not available in the current UI settings.
- To stop all billing immediately, use Pause endpoint.
- Scale-to-zero (min replicas 0) is good for auto-wake behavior with cold starts.
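The pause and scale-to-zero options above can also be scripted rather than clicked. The following is a minimal sketch, assuming the `huggingface_hub` Python client is installed and a token with access to the endpoint is configured. The helper names (`load_endpoint`, `stop_billing`, `start_again`) and the endpoint name in the usage comment are placeholders, not part of the current setup.

```python
# Sketch: toggle the billing state of a Dedicated Endpoint from a script.
# Assumes huggingface_hub is installed and a valid token is configured.


def load_endpoint(name, namespace=None):
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import get_inference_endpoint

    return get_inference_endpoint(name, namespace=namespace)


def stop_billing(endpoint, hard=True):
    # hard=True  -> pause: no billing, and requests will not wake the endpoint.
    # hard=False -> scale to zero: the next request wakes it (cold start).
    if hard:
        endpoint.pause()
        return "paused"
    endpoint.scale_to_zero()
    return "scaled_to_zero"


def start_again(endpoint):
    # Resume a paused endpoint; it still needs time to become healthy.
    endpoint.resume()
    return "resuming"


# Usage (placeholder endpoint name):
#   ep = load_endpoint("my-ace-step-endpoint")
#   stop_billing(ep, hard=True)
```

A paused endpoint has to be resumed explicitly before it serves requests again, so pausing suits overnight or multi-day idle periods, while scale-to-zero suits gaps during active development.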
## Files in Custom Endpoint Repo (Expected)
- `handler.py` -> custom inference logic
- `requirements.txt` -> runtime dependencies
- `README.md` -> optional docs/config context
## Smoke-Test Handler Behavior (Current)
Current handler:
- Does NOT load the ACE-Step model.
- Generates synthetic audio via numpy sine + noise.
- Returns:
  - `audio_base64_wav`
  - `sample_rate`
  - `duration_sec`

This validated endpoint wiring, auth, request/response format, and client decode pipeline.
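For reference, a handler with this behavior can be sketched as below. This is not the deployed file: it is an illustrative reconstruction that swaps numpy and soundfile for the Python standard library (`math`, `random`, `wave`) so it runs without extra dependencies. It assumes the usual custom-handler shape for Inference Endpoints (an `EndpointHandler` class with `__init__(path)` and `__call__(data)`) and that `inputs` arrives as a JSON object. The 440 Hz tone, noise level, and default values are arbitrary choices.

```python
# Sketch of a smoke-test handler: no model, just a sine tone plus noise.
import base64
import io
import math
import random
import struct
import wave


class EndpointHandler:
    def __init__(self, path=""):
        # Nothing to load for the smoke test; path is the repo directory.
        self.default_sample_rate = 44100

    def __call__(self, data):
        inputs = data.get("inputs") or {}
        sample_rate = int(inputs.get("sample_rate", self.default_sample_rate))
        duration_sec = float(inputs.get("duration_sec", 2.0))
        rng = random.Random(inputs.get("seed", 0))

        # Build 16-bit mono PCM: 440 Hz sine plus low-level uniform noise.
        frames = bytearray()
        for i in range(int(sample_rate * duration_sec)):
            tone = 0.5 * math.sin(2.0 * math.pi * 440.0 * i / sample_rate)
            noise = 0.02 * rng.uniform(-1.0, 1.0)
            sample = max(-1.0, min(1.0, tone + noise))
            frames += struct.pack("<h", int(sample * 32767))

        # Wrap the PCM data in a WAV container held in memory.
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(sample_rate)
            wav.writeframes(bytes(frames))

        return {
            "audio_base64_wav": base64.b64encode(buffer.getvalue()).decode("ascii"),
            "sample_rate": sample_rate,
            "duration_sec": duration_sec,
        }
```

Because the response keys match the real contract, the same client decode path can be reused unchanged once the model-backed handler replaces this one.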
## What Needs to Happen Next (Critical Path)
### 1) Replace fallback generation with real ACE-Step inference
In `handler.py`:
- `__init__`:
  - Load ACE-Step pipeline/model once at container startup.
  - Use model source `ACE-Step/Ace-Step1.5`.
  - Move model to CUDA when available.
- `__call__`:
  - Parse request inputs:
    - `prompt`
    - `lyrics`
    - `duration_sec`
    - `sample_rate`
    - `seed`
    - optional: `guidance_scale`, `steps`, `use_lm`
  - Execute ACE-Step generation.
  - Convert output waveform to WAV bytes.
  - Return base64 WAV in JSON.
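The input-parsing part of `__call__` can be sketched independently of the model. The example below is a minimal sketch, not the final handler: `GenerationRequest` and `parse_request` are hypothetical names, and the default values are borrowed from the example payload in these notes rather than from ACE-Step itself. The actual ACE-Step generation call is deliberately left out because its API still has to be confirmed against the ACE-Step repo.

```python
# Sketch: request parsing for the real handler. The model call is omitted.
from dataclasses import dataclass


@dataclass
class GenerationRequest:
    prompt: str
    lyrics: str = ""
    duration_sec: float = 12.0
    sample_rate: int = 44100
    seed: int = 42
    guidance_scale: float = 7.0  # optional
    steps: int = 50              # optional
    use_lm: bool = True          # optional


def parse_request(data):
    # Accepts the endpoint payload {"inputs": {...}} and applies defaults.
    inputs = data.get("inputs")
    if not isinstance(inputs, dict):
        raise ValueError("'inputs' must be a JSON object")

    prompt = str(inputs.get("prompt", "")).strip()
    if not prompt:
        raise ValueError("'prompt' is required")

    req = GenerationRequest(
        prompt=prompt,
        lyrics=str(inputs.get("lyrics", "")),
        duration_sec=float(inputs.get("duration_sec", 12.0)),
        sample_rate=int(inputs.get("sample_rate", 44100)),
        seed=int(inputs.get("seed", 42)),
        guidance_scale=float(inputs.get("guidance_scale", 7.0)),
        steps=int(inputs.get("steps", 50)),
        use_lm=bool(inputs.get("use_lm", True)),
    )
    if req.duration_sec <= 0 or req.sample_rate <= 0 or req.steps <= 0:
        raise ValueError("duration_sec, sample_rate and steps must be positive")
    return req
```

Rejecting bad input before the model runs keeps malformed requests from consuming GPU time.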
### 2) Ensure dependencies are correct in `requirements.txt`
At minimum for current scaffold:
- `numpy`
- `soundfile`

Likely needed for ACE runtime:
- `torch`
- `torchaudio`
- `transformers`
- `accelerate`
- `huggingface_hub`
- any ACE-Step-specific package requirements from ACE docs/repo.
### 3) Push repo changes and redeploy/rebuild endpoint
- `git add .`
- `git commit -m "..."`
- `git push`
- Wait for the endpoint rebuild to report a healthy status.
### 4) Run generation call with real payload
Use the same client script and include `prompt` and `lyrics`.
## Request Payload Contract (Target)

    {
      "inputs": {
        "prompt": "upbeat pop rap with emotional guitar",
        "lyrics": "[Verse] city lights and midnight rain",
        "duration_sec": 12,
        "sample_rate": 44100,
        "seed": 42,
        "guidance_scale": 7.0,
        "steps": 50,
        "use_lm": true
      }
    }
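
A Python client for this contract could look like the sketch below, as an alternative to the existing `.bat` + `.ps1` flow. It uses only the standard library. The function names, the `HF_TOKEN` environment variable, and the 600-second timeout are assumptions, and it presumes the handler returns the `audio_base64_wav` field at the top level of the JSON response.

```python
# Sketch: POST the payload to the private endpoint and save the returned WAV.
import base64
import json
import os
import urllib.request

ENDPOINT_URL = "https://xr81s77sis7hoggq.us-east-1.aws.endpoints.huggingface.cloud"


def build_request(payload, token, url=ENDPOINT_URL):
    # Private endpoints expect the token as a Bearer Authorization header.
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_wav(response_json, out_path):
    # Decode the base64 WAV and write it to disk; returns bytes written.
    audio = base64.b64decode(response_json["audio_base64_wav"])
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


def generate(payload, out_path, token=None, timeout=600):
    # A long timeout leaves room for a cold start after scale-to-zero.
    token = token or os.environ["HF_TOKEN"]
    request = build_request(payload, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return save_wav(json.load(resp), out_path)
```

Calling `generate(payload, "out.wav")` with the payload shown above would write the decoded audio to `out.wav`. A request that arrives while the endpoint is still waking from zero replicas may fail or time out, so the caller may need to retry.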