# Deploy Inference To Your Own HF Dedicated Endpoint

This guide deploys the custom `handler.py` inference runtime to a Hugging Face Dedicated Inference Endpoint.

## Prerequisites

- Hugging Face account
- `HF_TOKEN` with repo write access
- Dedicated Endpoint access on your HF plan

## 1) Create/Update Your Endpoint Repo

```bash
python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```

This uploads:

- `handler.py`
- `acestep/`
- `requirements.txt`
- `packages.txt`
- endpoint-specific README template

## 2) Create Endpoint In HF UI

1. Go to **Inference Endpoints** -> **New endpoint**.
2. Select your custom model repo: `YOUR_USERNAME/YOUR_ENDPOINT_REPO`.
3. Choose GPU hardware.
4. Deploy.

## 3) Recommended Endpoint Environment Variables

- `ACE_CONFIG_PATH` (default: `acestep-v15-sft`)
- `ACE_LM_MODEL_PATH` (default: `acestep-5Hz-lm-4B`)
- `ACE_LM_BACKEND` (default: `pt`)
- `ACE_DOWNLOAD_SOURCE` (`huggingface` or `modelscope`)
- `ACE_ENABLE_FALLBACK` (`false` recommended for strict failure visibility)
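
As a rough illustration of how the variables above could be resolved at startup, here is a minimal sketch. It is not the actual `handler.py` logic: the helper name `load_ace_settings`, the default chosen for `ACE_DOWNLOAD_SOURCE`, and the set of strings treated as "true" are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# The first three defaults come from the list above.
# The last two are assumptions for this sketch only.
DEFAULTS = {
    "ACE_CONFIG_PATH": "acestep-v15-sft",
    "ACE_LM_MODEL_PATH": "acestep-5Hz-lm-4B",
    "ACE_LM_BACKEND": "pt",
    "ACE_DOWNLOAD_SOURCE": "huggingface",  # assumed default
    "ACE_ENABLE_FALLBACK": "false",        # recommended value, assumed default
}

ALLOWED_SOURCES = ("huggingface", "modelscope")


def load_ace_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the ACE_* variables, falling back to DEFAULTS when unset."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}

    # Reject anything other than the two documented download sources.
    if settings["ACE_DOWNLOAD_SOURCE"] not in ALLOWED_SOURCES:
        raise ValueError(
            "ACE_DOWNLOAD_SOURCE must be one of " + ", ".join(ALLOWED_SOURCES)
        )

    # Convert the fallback flag to a real boolean (hypothetical parsing rule).
    flag = settings["ACE_ENABLE_FALLBACK"].strip().lower()
    settings["ACE_ENABLE_FALLBACK"] = flag in ("1", "true", "yes")
    return settings
```

Calling `load_ace_settings()` with no arguments reads `os.environ`; passing a plain dict makes the resolution easy to check locally before setting the variables on the endpoint.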

## 4) Test The Endpoint

Set credentials:

```bash
# Linux/macOS
export HF_TOKEN=hf_xxx
export HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud
# Windows PowerShell
$env:HF_TOKEN="hf_xxx"
$env:HF_ENDPOINT_URL="https://your-endpoint-url.endpoints.huggingface.cloud"
```

Test with:

- `python scripts/endpoint/generate_interactive.py`
- `scripts/endpoint/test.ps1`
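
Before running either script, it can help to confirm that both variables are actually visible to the process. The following is a small preflight sketch, not part of the repository's scripts; `read_credentials` is a hypothetical helper and the `https://` check is an assumption based on the example URL above.

```python
import os
from typing import Mapping, Optional, Tuple


def read_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (endpoint_url, token) or raise if either variable is unusable."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    url = env.get("HF_ENDPOINT_URL", "").strip().rstrip("/")

    # Report every missing variable at once rather than one per run.
    missing = [
        name
        for name, value in (("HF_TOKEN", token), ("HF_ENDPOINT_URL", url))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Assumption: endpoint URLs are always served over HTTPS.
    if not url.startswith("https://"):
        raise RuntimeError("HF_ENDPOINT_URL should be an https:// URL")
    return url, token
```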

## Request Contract

```json
{
  "inputs": {
    "prompt": "upbeat pop rap with emotional guitar",
    "lyrics": "[Verse] city lights and midnight rain",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": true
  }
}
```
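
A request following this contract could be sent from Python roughly as sketched below, using only the standard library. The helper names `build_request` and `send_request` are hypothetical, the bearer-token header is the usual way to authenticate against a protected endpoint, and the response is returned as raw bytes because this guide does not describe the response format.

```python
import json
import urllib.request


def build_request(endpoint_url: str, token: str, inputs: dict) -> urllib.request.Request:
    """Wrap `inputs` in the {"inputs": ...} envelope and prepare a POST request."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send_request(request: urllib.request.Request, timeout: float = 300.0) -> bytes:
    """Send the request and return the raw response body.

    The timeout is an arbitrary, generous value chosen for this sketch.
    Decoding the body is left to the caller.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

To use it, pass the values of `HF_ENDPOINT_URL` and `HF_TOKEN` from step 4 together with a dictionary holding the fields shown in the JSON example, then hand the result to `send_request`.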

## Cost Control

- Use scale-to-zero to avoid paying for idle periods.
- Pause the endpoint to stop spend immediately.
- Expect cold starts on the first request after the endpoint has scaled to zero.
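
Client code usually needs to tolerate those cold starts. The sketch below retries with exponential backoff, assuming the endpoint answers with HTTP 503 while a replica is starting; verify the status code your endpoint actually returns. `call_with_cold_start_retry` and its `send` callback are hypothetical names, and the attempt count and delays are arbitrary starting values.

```python
import time
from typing import Callable, Tuple


def call_with_cold_start_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 6,
    initial_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a status other than 503 or attempts run out.

    `send` performs one request and returns (status_code, body).
    The delay doubles after every 503 response.
    """
    delay = initial_delay
    status, body = 503, b""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503:
            return status, body
        # Only wait if another attempt will follow.
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2
    # Still warming up after the last attempt: return the final 503 as-is.
    return status, body
```

Injecting `sleep` keeps the helper easy to exercise locally with a fake `send` function, without waiting for real delays.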