content = """# ACE-Step 1.5 Deployment Context (Hugging Face Endpoint) — Handoff Notes
## Objective
Set up `ACE-Step/Ace-Step1.5` on Hugging Face so we can:
1. Serve music generations through a private endpoint (token-protected),
2. Run on GPU (A100 preferred),
3. Control costs aggressively (scale-to-zero / pause when idle),
4. Call the endpoint from local scripts/app backend,
5. Transition from a sine-wave smoke test to real ACE-Step generations.
---
## Current State (What’s Already Done)
- Hugging Face auth is working in terminal (`hf auth` usable).
- A private dedicated endpoint exists:
- `https://xr81s77sis7hoggq.us-east-1.aws.endpoints.huggingface.cloud`
- Endpoint is connected to a custom repo containing `handler.py`.
- Smoke-test `handler.py` was deployed and tested successfully:
  - Returns a base64-encoded WAV generated from a sine wave plus noise.
- Local `.bat` + `.ps1` testing flow works to hit endpoint and save `.wav`.
---
## Key Constraint Discovered
`ACE-Step/Ace-Step1.5` is **not** a one-click, "Model Catalog verified" endpoint deployment.
The HF deploy warning indicates:
- no verified endpoint configuration, and
- a missing `handler.py` when deploying the model repo directly.
### Implication
Use a **custom endpoint repo** (our own repo) with:
- `handler.py`
- `requirements.txt`
Then load `ACE-Step/Ace-Step1.5` from code at runtime.
---
## Important Product/Infra Notes
### ZeroGPU vs Dedicated Endpoints
- **ZeroGPU** applies to **Spaces** (good for demos/prototypes).
- For production-like API serving, use **Dedicated Inference Endpoints**.
### Idle Scaling
- In the Dedicated Endpoints UI, the minimum scale-to-zero idle window observed is **15 minutes**.
- A shorter idle window is not available in the current UI.
- To stop all billing immediately, **Pause** the endpoint manually.
- Scale-to-zero (min replicas = 0) gives auto-wake behavior at the cost of cold starts.
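The Pause/Resume actions above can also be driven from code. A minimal sketch using `huggingface_hub`'s `HfApi` (which picks up the token cached by terminal auth); the endpoint name `"ace-step15-endpoint"` is an assumption, not confirmed by these notes:

```python
# Hedged sketch: pause/resume the dedicated endpoint programmatically.
# Assumes huggingface_hub is installed and an HF token is configured;
# the endpoint name below is a placeholder, not confirmed in these notes.
from huggingface_hub import HfApi

api = HfApi()  # uses the cached token from terminal auth

def pause(name: str = "ace-step15-endpoint") -> None:
    """Stop all billing immediately; requires a manual/API resume."""
    api.pause_inference_endpoint(name)

def resume(name: str = "ace-step15-endpoint") -> None:
    """Bring a paused endpoint back up."""
    api.resume_inference_endpoint(name)
```

Unlike scale-to-zero, a paused endpoint never auto-wakes on traffic, so this is the right lever when no calls are expected for a while.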
---
## Files in Custom Endpoint Repo (Expected)
- `handler.py` -> custom inference logic
- `requirements.txt` -> runtime dependencies
- `README.md` -> optional docs/config context
---
## Smoke-Test Handler Behavior (Current)
Current handler:
- Does NOT load ACE-Step model.
- Generates synthetic audio via numpy sine + noise.
- Returns:
- `audio_base64_wav`
- `sample_rate`
- `duration_sec`
This validated endpoint wiring, auth, request/response format, and client decode pipeline.
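The smoke-test path above can be sketched as follows. The response field names (`audio_base64_wav`, `sample_rate`, `duration_sec`) match the notes; the tone frequency, noise level, and use of the stdlib `wave` module (instead of `soundfile`) are illustrative choices:

```python
# Illustrative smoke-test generation: sine tone + light noise,
# encoded as 16-bit PCM WAV and returned base64-encoded.
import base64
import io
import wave

import numpy as np

def smoke_generate(duration_sec: float = 1.0, sample_rate: int = 44100) -> dict:
    # Synthesize a 440 Hz sine with a little noise on top.
    t = np.linspace(0, duration_sec, int(sample_rate * duration_sec), endpoint=False)
    waveform = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.02 * np.random.randn(t.size)
    pcm16 = (np.clip(waveform, -1.0, 1.0) * 32767).astype(np.int16)

    # Encode as mono 16-bit WAV in memory.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm16.tobytes())

    return {
        "audio_base64_wav": base64.b64encode(buf.getvalue()).decode("ascii"),
        "sample_rate": sample_rate,
        "duration_sec": duration_sec,
    }
```

The client side only needs to base64-decode `audio_base64_wav` and write the bytes to a `.wav` file, which is exactly what the `.bat`/`.ps1` flow validated.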
---
## What Needs to Happen Next (Critical Path)
### 1) Replace fallback generation with real ACE-Step inference
In `handler.py`:
- `__init__`:
- Load ACE-Step pipeline/model once at container startup.
- Use model source `ACE-Step/Ace-Step1.5`.
- Move model to CUDA when available.
- `__call__`:
- Parse request inputs:
- `prompt`
- `lyrics`
- `duration_sec`
- `sample_rate`
- `seed`
- optional: `guidance_scale`, `steps`, `use_lm`
- Execute ACE-Step generation.
- Convert output waveform to WAV bytes.
- Return base64 WAV in JSON.
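The input-parsing half of `__call__` can be sketched against the payload contract below. The defaults here (10 s, 44.1 kHz, `guidance_scale` 7.0, 50 steps) are illustrative assumptions, not values from the ACE-Step docs:

```python
# Hedged sketch: normalize the request payload into typed kwargs
# for the generation call. Defaults are assumptions.
def parse_inputs(data: dict) -> dict:
    inputs = data.get("inputs", {})
    return {
        "prompt": inputs.get("prompt", ""),
        "lyrics": inputs.get("lyrics", ""),
        "duration_sec": float(inputs.get("duration_sec", 10)),
        "sample_rate": int(inputs.get("sample_rate", 44100)),
        "seed": inputs.get("seed"),  # None means "don't fix the seed"
        "guidance_scale": float(inputs.get("guidance_scale", 7.0)),
        "steps": int(inputs.get("steps", 50)),
        "use_lm": bool(inputs.get("use_lm", False)),
    }
```

Keeping parsing separate from generation makes the contract easy to unit-test locally before the model is wired in.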
### 2) Ensure dependencies are correct in `requirements.txt`
At minimum for current scaffold:
- `numpy`
- `soundfile`
Likely needed for ACE runtime:
- `torch`
- `torchaudio`
- `transformers`
- `accelerate`
- `huggingface_hub`
- any ACE-Step-specific package requirements from ACE docs/repo.
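Putting the two lists together, a starting `requirements.txt` might look like the sketch below; versions are deliberately unpinned until the ACE-Step repo confirms its supported `torch`/`transformers` range:

```text
# requirements.txt — illustrative starting point, versions unpinned
numpy
soundfile
torch
torchaudio
transformers
accelerate
huggingface_hub
```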
### 3) Push repo changes and redeploy/rebuild endpoint
- `git add .`
- `git commit -m "..."`
- `git push`
- wait for the endpoint to rebuild and report a healthy status.
### 4) Run generation call with real payload
Use the same client script, now including `prompt` and `lyrics` in the payload.
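A stdlib-only Python client mirroring the existing `.bat`/`.ps1` flow is sketched below. The endpoint URL is the one listed above; the `HF_TOKEN` environment variable and `out.wav` path are assumptions, and the network call only runs under `__main__`:

```python
# Illustrative client: POST the payload with a bearer token,
# then decode the base64 WAV from the response.
import base64
import json
import os
import urllib.request

ENDPOINT = "https://xr81s77sis7hoggq.us-east-1.aws.endpoints.huggingface.cloud"

def build_request(payload: dict, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),  # data present => POST
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def save_wav(audio_base64_wav: str, path: str) -> None:
    with open(path, "wb") as f:
        f.write(base64.b64decode(audio_base64_wav))

if __name__ == "__main__":
    payload = {"inputs": {
        "prompt": "upbeat pop rap with emotional guitar",
        "lyrics": "[Verse] city lights and midnight rain",
        "duration_sec": 12,
    }}
    req = build_request(payload, os.environ["HF_TOKEN"])
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    save_wav(out["audio_base64_wav"], "out.wav")
```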
---
## Request Payload Contract (Target)
```json
{
  "inputs": {
    "prompt": "upbeat pop rap with emotional guitar",
    "lyrics": "[Verse] city lights and midnight rain",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": true
  }
}
```