| # Deploy Inference To Your Own HF Dedicated Endpoint |
|
|
| This guide deploys the custom `handler.py` inference runtime to a Hugging Face Dedicated Inference Endpoint. |
|
|
| ## Prerequisites |
|
|
| - Hugging Face account |
| - `HF_TOKEN` with repo write access |
| - Dedicated Endpoint access on your HF plan |
|
|
| ## 1) Create/Update Your Endpoint Repo |
|
|
| ```bash |
| python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO |
| ``` |
|
|
| This uploads: |
|
|
| - `handler.py` |
| - `acestep/` |
| - `requirements.txt` |
| - `packages.txt` |
| - endpoint-specific README template |
|
|
| ## 2) Create Endpoint In HF UI |
|
|
| 1. Go to **Inference Endpoints** -> **New endpoint**. |
| 2. Select your custom model repo: `YOUR_USERNAME/YOUR_ENDPOINT_REPO`. |
| 3. Choose GPU hardware. |
| 4. Deploy. |
|
|
| ## 3) Recommended Endpoint Environment Variables |
|
|
| - `ACE_CONFIG_PATH` (default: `acestep-v15-sft`) |
| - `ACE_LM_MODEL_PATH` (default: `acestep-5Hz-lm-4B`) |
| - `ACE_LM_BACKEND` (default: `pt`) |
| - `ACE_DOWNLOAD_SOURCE` (`huggingface` or `modelscope`) |
| - `ACE_ENABLE_FALLBACK` (`false` recommended for strict failure visibility) |
|
|
| ## 4) Test The Endpoint |
|
|
| Set credentials: |
|
|
| ```bash |
| # Linux/macOS |
| export HF_TOKEN=hf_xxx |
| export HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud |
| |
| # Windows PowerShell |
| $env:HF_TOKEN="hf_xxx" |
| $env:HF_ENDPOINT_URL="https://your-endpoint-url.endpoints.huggingface.cloud" |
| ``` |
|
|
| Test with: |
|
|
| - `python scripts/endpoint/generate_interactive.py` |
| - `scripts/endpoint/test.ps1` |
|
|
| ## Request Contract |
|
|
| ```json |
| { |
| "inputs": { |
| "prompt": "upbeat pop rap with emotional guitar", |
| "lyrics": "[Verse] city lights and midnight rain", |
| "duration_sec": 12, |
| "sample_rate": 44100, |
| "seed": 42, |
| "guidance_scale": 7.0, |
| "steps": 50, |
| "use_lm": true |
| } |
| } |
| ``` |
|
|
| ## Cost Control |
|
|
| - Use scale-to-zero for idle periods. |
| - Pause endpoint for immediate spend stop. |
| - Expect cold starts when scaled to zero. |
|
|