# Deploy Audio Flamingo 3 Caption Endpoint (Dedicated Endpoint)
> Note: this guide covers the HF-converted `audio-flamingo-3-hf` runtime path.
> For NVIDIA Space stack parity (llava + stage35 think adapter), see
> `docs/deploy/AF3_NVIDIA_ENDPOINT.md`.
## 1) Create the endpoint runtime repo

```bash
python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO
```

This pushes:

- `handler.py`
- `requirements.txt`
- `README.md`

from `templates/hf-af3-caption-endpoint/`.
## 2) Create the endpoint from that model repo

In Hugging Face Endpoints:

- Create an endpoint from `YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO`.
- Choose a GPU instance.
- Set the task to `custom`.
- Set env vars:
  - `AF3_MODEL_ID=nvidia/audio-flamingo-3-hf`
  - `AF3_BOOTSTRAP_RUNTIME=1`
  - `AF3_TRANSFORMERS_SPEC=transformers==5.1.0`
## 3) Validate startup

If the logs contain:

```text
No custom pipeline found at /repository/handler.py
```

then `handler.py` is not in the repo root. Re-upload the runtime template files.

If the logs contain:

```text
Failed to load AF3 processor classes after runtime bootstrap
```

keep the endpoint task as `custom`, then check that startup could install the runtime deps (network + disk). The first cold start can take several minutes.
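While waiting out a cold start, it can help to poll the endpoint URL and map the HTTP status to a state. A minimal sketch, assuming typical Inference Endpoints behavior (503 while provisioning, 200 when ready); the probe itself and its status interpretation are illustrative, not part of this repo:

```python
import os
import urllib.error
import urllib.request


def interpret_status(code: int) -> str:
    """Map an HTTP status from the endpoint to a readable state.

    These mappings reflect common HF Inference Endpoints behavior
    and are an assumption, not a documented contract.
    """
    if code == 200:
        return "ready"
    if code == 503:
        return "initializing (cold start still running)"
    if code in (401, 403):
        return "auth problem: check HF_TOKEN"
    return f"unexpected status {code}"


def probe(url: str, token: str) -> str:
    """Send one authenticated request and report the endpoint state."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return interpret_status(resp.status)
    except urllib.error.HTTPError as err:
        return interpret_status(err.code)


if __name__ == "__main__" and os.environ.get("HF_AF3_ENDPOINT_URL"):
    print(probe(os.environ["HF_AF3_ENDPOINT_URL"], os.environ.get("HF_TOKEN", "")))
```

Re-run the probe every minute or so; the first cold start can legitimately sit in the 503 state for several minutes while deps install.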
## 4) Connect from the local pipeline

Set:

- `HF_AF3_ENDPOINT_URL`
- `HF_TOKEN`
- `OPENAI_API_KEY`

Recommended local `.env`:

```env
HF_AF3_ENDPOINT_URL=https://bc3r76slij67lskb.us-east-1.aws.endpoints.huggingface.cloud
HF_TOKEN=hf_xxx
OPENAI_API_KEY=sk-...
```

`.env` is git-ignored in this repo. Do not commit real credentials.
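A quick preflight check avoids confusing mid-pipeline failures from a missing credential. This is a minimal sketch of a `.env` loader and validator; the `load_env`/`missing_vars` helpers are illustrative and not part of the repo's scripts:

```python
import os

# The three variables the local pipeline needs (from the list above).
REQUIRED = ("HF_AF3_ENDPOINT_URL", "HF_TOKEN", "OPENAI_API_KEY")


def load_env(path: str = ".env") -> dict:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments ignored."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def missing_vars(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__" and os.path.exists(".env"):
    missing = missing_vars(load_env())
    if missing:
        raise SystemExit(f"Missing in .env: {', '.join(missing)}")
```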
Then run:

```bash
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --audio ./train-dataset/track.mp3 \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"
```
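If you want to hit the endpoint without the pipeline script, the request body is whatever `handler.py` in your endpoint repo accepts. A sketch of building a JSON payload with base64-encoded audio; the key names (`inputs`, `audio_b64`, `prompt`) are assumptions for illustration, not the handler's guaranteed contract:

```python
import base64
import json


def build_payload(audio_path: str, prompt: str = "Describe this audio.") -> str:
    """Read an audio file and wrap it in a JSON request body.

    The schema here is hypothetical; check handler.py in your
    endpoint repo for the actual expected keys.
    """
    with open(audio_path, "rb") as fh:
        audio_b64 = base64.b64encode(fh.read()).decode("ascii")
    return json.dumps({"inputs": {"audio_b64": audio_b64, "prompt": prompt}})
```

POST the result to `$HF_AF3_ENDPOINT_URL` with `Authorization: Bearer $HF_TOKEN` and `Content-Type: application/json`.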
Or launch the full GUI stack:

```bash
python af3_gui_app.py
```