# Deploy Inference To Your Own HF Dedicated Endpoint
This guide shows how to deploy the custom `handler.py` inference runtime to a Hugging Face Dedicated Inference Endpoint.
## Prerequisites
- Hugging Face account
- `HF_TOKEN` with repo write access
- Dedicated Endpoint access on your HF plan
## 1) Create/Update Your Endpoint Repo
```bash
python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```
This uploads:
- `handler.py`
- `acestep/`
- `requirements.txt`
- `packages.txt`
- endpoint-specific README template
## 2) Create Endpoint In HF UI
1. Go to **Inference Endpoints** -> **New endpoint**.
2. Select your custom model repo: `YOUR_USERNAME/YOUR_ENDPOINT_REPO`.
3. Choose GPU hardware.
4. Deploy.
## 3) Recommended Endpoint Environment Variables
- `ACE_CONFIG_PATH` (default: `acestep-v15-sft`)
- `ACE_LM_MODEL_PATH` (default: `acestep-5Hz-lm-4B`)
- `ACE_LM_BACKEND` (default: `pt`)
- `ACE_DOWNLOAD_SOURCE` (`huggingface` or `modelscope`)
- `ACE_ENABLE_FALLBACK` (`false` recommended, so failures are visible rather than hidden behind a fallback)
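As a rough illustration of how these variables and their defaults fit together, the sketch below resolves them from the environment. It is not the actual `handler.py` logic: the `load_ace_settings` helper is hypothetical, and the fallback values for `ACE_DOWNLOAD_SOURCE` and `ACE_ENABLE_FALLBACK` are assumptions, since this guide does not state their defaults.

```python
import os

# Defaults documented in this guide.
DOCUMENTED_DEFAULTS = {
    "ACE_CONFIG_PATH": "acestep-v15-sft",
    "ACE_LM_MODEL_PATH": "acestep-5Hz-lm-4B",
    "ACE_LM_BACKEND": "pt",
}

# Values this guide lists for ACE_DOWNLOAD_SOURCE.
ALLOWED_SOURCES = ("huggingface", "modelscope")


def load_ace_settings(env=None):
    """Hypothetical helper: resolve the ACE_* variables into a dict."""
    env = os.environ if env is None else env
    settings = {
        name: env.get(name, default)
        for name, default in DOCUMENTED_DEFAULTS.items()
    }

    # Assumption: treat an unset source as "huggingface".
    source = env.get("ACE_DOWNLOAD_SOURCE", "huggingface").strip().lower()
    if source not in ALLOWED_SOURCES:
        raise ValueError(
            f"ACE_DOWNLOAD_SOURCE must be one of {ALLOWED_SOURCES}, got {source!r}"
        )
    settings["ACE_DOWNLOAD_SOURCE"] = source

    # Assumption: an unset flag means fallback is disabled.
    raw_flag = env.get("ACE_ENABLE_FALLBACK", "false").strip().lower()
    settings["ACE_ENABLE_FALLBACK"] = raw_flag in ("1", "true", "yes")
    return settings
```

Calling `load_ace_settings({})` returns the documented defaults, which makes it easy to check locally which values an endpoint would see before you set anything in the HF UI.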
## 4) Test The Endpoint
Set credentials:
```bash
# Linux/macOS
export HF_TOKEN=hf_xxx
export HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud
# Windows PowerShell
$env:HF_TOKEN="hf_xxx"
$env:HF_ENDPOINT_URL="https://your-endpoint-url.endpoints.huggingface.cloud"
```
Test with:
- `python scripts/endpoint/generate_interactive.py`
- `scripts/endpoint/test.ps1`
## Request Contract
```json
{
  "inputs": {
    "prompt": "upbeat pop rap with emotional guitar",
    "lyrics": "[Verse] city lights and midnight rain",
    "duration_sec": 12,
    "sample_rate": 44100,
    "seed": 42,
    "guidance_scale": 7.0,
    "steps": 50,
    "use_lm": true
  }
}
```
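A minimal sketch of sending this request from Python with only the standard library. The `build_payload` and `build_request` helpers are hypothetical names, not part of the repo's scripts, and the parameter defaults simply mirror the example above. The response format is not documented in this guide, so the sketch only reports the HTTP status and body size.

```python
import json
import os
import urllib.request


def build_payload(prompt, lyrics, duration_sec=12, sample_rate=44100,
                  seed=42, guidance_scale=7.0, steps=50, use_lm=True):
    """Hypothetical helper: wrap the fields in the "inputs" envelope."""
    return {
        "inputs": {
            "prompt": prompt,
            "lyrics": lyrics,
            "duration_sec": duration_sec,
            "sample_rate": sample_rate,
            "seed": seed,
            "guidance_scale": guidance_scale,
            "steps": steps,
            "use_lm": use_lm,
        }
    }


def build_request(url, token, payload):
    """Build an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def main():
    url = os.environ.get("HF_ENDPOINT_URL")
    token = os.environ.get("HF_TOKEN")
    if not (url and token):
        print("Set HF_ENDPOINT_URL and HF_TOKEN first.")
        return
    payload = build_payload(
        "upbeat pop rap with emotional guitar",
        "[Verse] city lights and midnight rain",
    )
    # Generous timeout (an arbitrary choice): generation can take a while.
    with urllib.request.urlopen(build_request(url, token, payload), timeout=600) as resp:
        body = resp.read()
        print(f"HTTP {resp.status}, {len(body)} bytes received")


if __name__ == "__main__":
    main()
```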
## Cost Control
- Use scale-to-zero so the endpoint stops billing during idle periods.
- Pause the endpoint to stop spend immediately.
- Expect a cold start on the first request after the endpoint has scaled to zero.
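Because the first request after scale-to-zero may fail while the endpoint warms up, a client can retry with a growing delay. The sketch below shows one way to do that. The `call_with_cold_start_retry` helper is hypothetical, and the set of retryable status codes (502 and 503) is an assumption, so check what your endpoint actually returns during a cold start.

```python
import time

# Assumption: statuses a warming endpoint might return.
RETRYABLE_STATUS = {502, 503}


def call_with_cold_start_retry(send, max_attempts=6, base_delay=5.0,
                               max_delay=60.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    send is any zero-argument callable returning (status, body).
    The delay doubles after each retry, capped at max_delay.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    # All attempts exhausted: return the last response as-is.
    return status, body
```

Passing `send` and `sleep` as arguments keeps the retry logic independent of any HTTP library and lets you exercise it without a live endpoint.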