ace-step15-endpoint / docs /deploy /AF3_ENDPOINT.md
Andrew
Consolidate AF3/Qwen pipelines, endpoint templates, and setup docs
8bdd018

# Deploy Audio Flamingo 3 Caption Endpoint (Dedicated Endpoint)

Note: this guide is for the HF-converted audio-flamingo-3-hf runtime path. For NVIDIA Space stack parity (llava + stage35 think adapter), use: docs/deploy/AF3_NVIDIA_ENDPOINT.md.

1) Create endpoint runtime repo

python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO

This pushes:

  • handler.py
  • requirements.txt
  • README.md

from templates/hf-af3-caption-endpoint/.

2) Create endpoint from that model repo

In Hugging Face Endpoints:

  1. Create endpoint from YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO.
  2. Choose a GPU instance.
  3. Set task to custom.
  4. Set env vars:
    • AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
    • AF3_BOOTSTRAP_RUNTIME=1
    • AF3_TRANSFORMERS_SPEC=transformers==5.1.0

3) Validate startup

If logs contain:

  • No custom pipeline found at /repository/handler.py

then handler.py is not in repo root. Re-upload the runtime template files.

If logs contain:

  • Failed to load AF3 processor classes after runtime bootstrap

keep endpoint task as custom, then check that startup could install runtime deps (network + disk). The first cold start can take several minutes.

4) Connect from local pipeline

Set:

  • HF_AF3_ENDPOINT_URL
  • HF_TOKEN
  • OPENAI_API_KEY

Recommended local .env:

HF_AF3_ENDPOINT_URL=https://bc3r76slij67lskb.us-east-1.aws.endpoints.huggingface.cloud
HF_TOKEN=hf_xxx
OPENAI_API_KEY=sk-...

.env is git-ignored in this repo. Do not commit real credentials.

Then run:

python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --audio ./train-dataset/track.mp3 \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"

Or launch full GUI stack:

python af3_gui_app.py