File size: 1,370 Bytes
8b64cd2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
badd580
 
 
48e3736
badd580
48e3736
8b64cd2
48e3736
8b64cd2
48e3736
8b64cd2
48e3736
 
 
 
 
 
8b64cd2
 
 
 
48e3736
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
# Audio Flamingo 3 Caption Endpoint Template

Use this as a custom `handler.py` runtime for a Hugging Face Dedicated Endpoint.

## Request contract

```json
{
  "inputs": {
    "prompt": "Analyze this full song and summarize arrangement changes.",
    "audio_base64": "<base64-encoded WAV bytes>",
    "max_new_tokens": 1200,
    "temperature": 0.1
  }
}
```

## Response contract

```json
{
  "generated_text": "..."
}
```

## Setup

Fastest path from this repo:

```bash
python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO
```

Then deploy a Dedicated Endpoint from that model repo.

Important: make sure your endpoint repo contains top-level:
- `handler.py`
- `requirements.txt`
- `README.md`

Use endpoint task `custom` so the runtime loads `handler.py` instead of a default Transformers pipeline.

## Endpoint env vars

Required:
- `AF3_MODEL_ID=nvidia/audio-flamingo-3-hf`

Optional runtime bootstrap (defaults shown):
- `AF3_BOOTSTRAP_RUNTIME=1`
- `AF3_TRANSFORMERS_SPEC=transformers==5.1.0`
- `AF3_RUNTIME_DIR=/tmp/af3_runtime`
- `AF3_STUB_TORCHVISION=1`

## Notes

- Audio Flamingo 3 is large; use a GPU endpoint.
- First boot can take longer because the handler installs AF3-compatible runtime dependencies.
- This handler returns raw prose analysis. Use the local AF3+ChatGPT pipeline to normalize to LoRA sidecar JSON.