# Deploy Inference To Your Own HF Dedicated Endpoint

This guide deploys the custom `handler.py` inference runtime to a Hugging Face Dedicated Inference Endpoint.

## Prerequisites

- Hugging Face account
- `HF_TOKEN` with repo write access
- Dedicated Endpoint access on your HF plan

## 1) Create/Update Your Endpoint Repo

```bash
python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```

This uploads:

- `handler.py`
- `acestep/`
- `requirements.txt`
- `packages.txt`
- endpoint-specific README template

## 2) Create Endpoint In HF UI

1. Go to **Inference Endpoints** -> **New endpoint**.
2. Select your custom model repo: `YOUR_USERNAME/YOUR_ENDPOINT_REPO`.
3. Choose GPU hardware.
4. Deploy.

## 3) Recommended Endpoint Environment Variables

- `ACE_CONFIG_PATH` (default: `acestep-v15-sft`)
- `ACE_LM_MODEL_PATH` (default: `acestep-5Hz-lm-4B`)
- `ACE_LM_BACKEND` (default: `pt`)
- `ACE_DOWNLOAD_SOURCE` (`huggingface` or `modelscope`)
- `ACE_ENABLE_FALLBACK` (`false` recommended for strict failure visibility)

## 4) Test The Endpoint

Set credentials:

```bash
# Linux/macOS
export HF_TOKEN=hf_xxx
export HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud

# Windows PowerShell
$env:HF_TOKEN="hf_xxx"
$env:HF_ENDPOINT_URL="https://your-endpoint-url.endpoints.huggingface.cloud"
```

Test with:

- `python scripts/endpoint/generate_interactive.py`
- `scripts/endpoint/test.ps1`

## Request Contract

```json
{
  "inputs": {
    "prompt": "upbeat pop rap with emotional guitar",
    "lyrics": "[Verse] city lights and midnight rain",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": true
  }
}
```

## Cost Control

- Use scale-to-zero for idle periods.
- Pause endpoint for immediate spend stop.
- Expect cold starts when scaled to zero.
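
## Example: Minimal Python Client

For a quick scripted check of the request contract shown earlier, a standard-library client along these lines may be enough. This is a sketch, not the repo's own test script: the helper names `build_payload`, `build_request`, and `main` are hypothetical, the 600-second timeout is an arbitrary allowance for cold starts, and the response format is not specified in this guide, so the script only reports the status, content type, and body size.

```python
import json
import os
import urllib.request


def build_payload(prompt, lyrics, duration_sec=12, sample_rate=44100,
                  seed=42, guidance_scale=7.0, steps=50, use_lm=True):
    """Return a request body shaped like the Request Contract."""
    return {
        "inputs": {
            "prompt": prompt,
            "lyrics": lyrics,
            "duration_sec": duration_sec,
            "sample_rate": sample_rate,
            "seed": seed,
            "guidance_scale": guidance_scale,
            "steps": steps,
            "use_lm": use_lm,
        }
    }


def build_request(url, token, payload):
    """Build a POST request with bearer-token auth and a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def main():
    url = os.environ.get("HF_ENDPOINT_URL")
    token = os.environ.get("HF_TOKEN")
    if not url or not token:
        print("Set HF_ENDPOINT_URL and HF_TOKEN first.")
        return

    payload = build_payload(
        prompt="upbeat pop rap with emotional guitar",
        lyrics="[Verse] city lights and midnight rain",
    )
    request = build_request(url, token, payload)

    # Assumed timeout: generous, because a scaled-to-zero endpoint cold-starts.
    with urllib.request.urlopen(request, timeout=600) as response:
        status = response.status
        content_type = response.headers.get("Content-Type")
        body = response.read()

    # The response schema is not documented here, so only summarize it.
    print(f"status={status} content_type={content_type} bytes={len(body)}")


if __name__ == "__main__":
    main()
```

Keeping payload construction separate from the network call makes it easy to check the request shape locally before spending GPU time on a live endpoint.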