Deploy Inference To Your Own HF Dedicated Endpoint

This guide walks through deploying the custom handler.py inference runtime to a Hugging Face Dedicated Inference Endpoint.

Prerequisites

  • Hugging Face account
  • HF_TOKEN with repo write access
  • Dedicated Endpoint access on your HF plan

1) Create/Update Your Endpoint Repo

python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO

This uploads:

  • handler.py
  • acestep/
  • requirements.txt
  • packages.txt
  • endpoint-specific README template

2) Create Endpoint In HF UI

  1. Go to Inference Endpoints -> New endpoint.
  2. Select your custom model repo: YOUR_USERNAME/YOUR_ENDPOINT_REPO.
  3. Choose GPU hardware.
  4. Deploy.
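The UI steps above can also be scripted with huggingface_hub's `create_inference_endpoint` helper. This is a hedged sketch, not the guide's official flow: the hardware values (vendor, region, instance type/size) and the `create_ace_endpoint` wrapper name are placeholders — substitute whatever your HF plan actually offers.

```python
def create_ace_endpoint(name, repo_id, token=None):
    """Create a GPU endpoint for the custom handler repo.

    All hardware values below are placeholders; adjust to your HF plan.
    """
    # Deferred import so this sketch loads even without huggingface_hub installed.
    from huggingface_hub import create_inference_endpoint

    return create_inference_endpoint(
        name,
        repository=repo_id,           # e.g. "YOUR_USERNAME/YOUR_ENDPOINT_REPO"
        framework="pytorch",
        task="custom",                # custom handler.py runtime
        accelerator="gpu",
        vendor="aws",                 # placeholder
        region="us-east-1",           # placeholder
        instance_size="x1",           # placeholder
        instance_type="nvidia-a10g",  # placeholder
        type="protected",             # endpoint requires an HF token to call
        token=token,
    )
```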

3) Recommended Endpoint Environment Variables

  • ACE_CONFIG_PATH (default: acestep-v15-sft)
  • ACE_LM_MODEL_PATH (default: acestep-5Hz-lm-4B)
  • ACE_LM_BACKEND (default: pt)
  • ACE_DOWNLOAD_SOURCE (huggingface or modelscope)
  • ACE_ENABLE_FALLBACK (recommended: false, so failures surface as errors instead of being masked by silent fallbacks)
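A handler typically resolves these variables with the defaults listed above. This is a sketch of that pattern — the `load_ace_config` helper name is hypothetical, not an actual function in handler.py:

```python
import os

# Defaults mirror the recommended endpoint variables above.
DEFAULTS = {
    "ACE_CONFIG_PATH": "acestep-v15-sft",
    "ACE_LM_MODEL_PATH": "acestep-5Hz-lm-4B",
    "ACE_LM_BACKEND": "pt",
    "ACE_DOWNLOAD_SOURCE": "huggingface",
    "ACE_ENABLE_FALLBACK": "false",
}

def load_ace_config(env=os.environ):
    """Resolve each ACE_* variable, falling back to its default."""
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Normalise the boolean flag so downstream code sees a real bool.
    cfg["ACE_ENABLE_FALLBACK"] = cfg["ACE_ENABLE_FALLBACK"].lower() in ("1", "true", "yes")
    return cfg
```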

4) Test The Endpoint

Set credentials:

# Linux/macOS
export HF_TOKEN=hf_xxx
export HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud

# Windows PowerShell
$env:HF_TOKEN="hf_xxx"
$env:HF_ENDPOINT_URL="https://your-endpoint-url.endpoints.huggingface.cloud"

Test with:

  • python scripts/endpoint/generate_interactive.py
  • scripts/endpoint/test.ps1

Request Contract

{
  "inputs": {
    "prompt": "upbeat pop rap with emotional guitar",
    "lyrics": "[Verse] city lights and midnight rain",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": true
  }
}
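The contract above can also be exercised without the helper scripts. A minimal stdlib-only client sketch, reusing the HF_TOKEN / HF_ENDPOINT_URL credentials set earlier; the raw-bytes response handling is an assumption — the actual handler may wrap audio in JSON instead:

```python
import json
import os
import urllib.request

def build_payload(prompt, lyrics, duration_sec=12, sample_rate=44100,
                  seed=42, guidance_scale=7.0, steps=50, use_lm=True):
    """Assemble a request body matching the contract above."""
    return {
        "inputs": {
            "prompt": prompt,
            "lyrics": lyrics,
            "duration_sec": duration_sec,
            "sample_rate": sample_rate,
            "seed": seed,
            "guidance_scale": guidance_scale,
            "steps": steps,
            "use_lm": use_lm,
        }
    }

def post_payload(payload, timeout=300):
    """POST the payload using HF_TOKEN / HF_ENDPOINT_URL from the environment."""
    req = urllib.request.Request(
        os.environ["HF_ENDPOINT_URL"],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

# Example (requires a live endpoint and credentials):
# audio_bytes = post_payload(build_payload(
#     "upbeat pop rap with emotional guitar",
#     "[Verse] city lights and midnight rain"))
```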

Cost Control

  • Use scale-to-zero for idle periods.
  • Pause the endpoint to stop spend immediately.
  • Expect cold starts when scaled to zero.
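These controls are also exposed programmatically. A sketch assuming huggingface_hub's endpoint-management helpers (`pause_inference_endpoint`, `scale_to_zero_inference_endpoint`, `resume_inference_endpoint`); the wrapper names are hypothetical:

```python
def stop_spend_now(name, token=None):
    """Pause the endpoint: billing stops immediately; manual resume required."""
    from huggingface_hub import pause_inference_endpoint  # deferred import
    pause_inference_endpoint(name, token=token)

def allow_scale_to_zero(name, token=None):
    """Scale to zero replicas; the endpoint wakes on the next request (cold start)."""
    from huggingface_hub import scale_to_zero_inference_endpoint  # deferred import
    scale_to_zero_inference_endpoint(name, token=token)

def resume(name, token=None):
    """Bring a paused endpoint back online."""
    from huggingface_hub import resume_inference_endpoint  # deferred import
    resume_inference_endpoint(name, token=token)
```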