File size: 7,458 Bytes
549efd4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
fb11c61
 
 
 
 
549efd4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
fb11c61
 
 
 
 
 
549efd4
 
 
fb11c61
549efd4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5952553
549efd4
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
# SignBridge β€” Demo Video Script

> Target length: **2:30 (≀ 3 min)**. Format: 1080p MP4, MP3 audio. Aspect ratio 16:9.
> Tools: QuickTime Player (Mac) for screen + camera capture, iMovie or CapCut for editing.

---

## Story arc (3 acts)

| Time | Act | Beat |
|---|---|---|
| 0:00–0:20 | **Hook** | Open with the human problem; viewer must feel the gap. |
| 0:20–1:30 | **Demo** | Live SignBridge in action β€” both fingerspelling AND a motion sign. |
| 1:30–2:30 | **Why AMD + close** | Architecture diagram + concrete MI300X comparison + open-source ethics + URL. |

Hard rule: **no slide-by-slide voice-over reading**. The demo should *play live*; voice-over should narrate what we're seeing, not summarise text on screen.

---

## Shot list

### Act 1 β€” Hook (0:00 β†’ 0:20)

**Visual A (5 s):** Plain background, bold text card fades in:
> 70 million deaf people. Interpreters cost $50–200 / hour. They're scarce.

**Visual B (5 s):** Text card β†’ "What if your phone could just translate?"

**Visual C (10 s):** Camera shot of you (Lucas) in a quiet room, signing HELLO at the camera silently. No voice-over yet. Hold the silence β€” let the viewer feel that the sign means nothing to them.

**Voice-over:** *(starts at 0:15)*
> "Most of us can't read this. SignBridge can."

---

### Act 2 β€” Live demo (0:20 β†’ 1:30)

**Setup (0:20 β†’ 0:25):** 5-second screen-recording of the live HF Space loading at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. URL bar visible. Tabs visible: "Snapshot" and "Record sign". This proves it's a live deployed product, not a slide deck.

**Beat 2A β€” Fingerspelling (0:25 β†’ 0:55):**

**Visual (split screen recommended):** Left = your face/hand on webcam, right = the Gradio app receiving frames.
- Sign **L** clearly. Click the **πŸ“· camera button** in the preview. App shows "βœ“ added L (98%)".
- Sign **U**. Click πŸ“· again.
- Sign **C**. πŸ“·.
- Sign **A**. πŸ“·.
- Sign **S**. πŸ“·.
- Click **πŸ”Š Speak**. App composes β†’ speaks: **"Lucas."**

**Voice-over during this beat:**
> "First, fingerspelling. I sign each letter, the app captures it, andβ€”" *(pause for the speak)* β€” *"composed in natural English."*

**Beat 2B β€” Motion sign (0:55 β†’ 1:25):**

**Visual:** Switch tabs to **Record sign**. Hit Record, sign **HELLO** (the wave-from-forehead motion), stop, click Submit.
- Detected: **hello (85%)**. Click Speak.
- App says: **"Hello."**

Repeat one more sign for variety: **THANK_YOU**.

**Voice-over:**
> "But fingerspelling alone isn't real ASL β€” most signs are *motion*. Hold-to-record captures the whole gesture, not just one frame. The system detects the motion across frames and..." *(pause for the speak)*

**Beat 2C β€” Two-person scene (1:25 β†’ 1:30):** *(optional but high-impact)*

**Visual:** You sign something to a hearing person; they hear the AI say it; they react. Hold the human reaction for 2 seconds.

**No voice-over** during this beat β€” let the moment land.

---

### Act 3 β€” Architecture + AMD pitch (1:30 β†’ 2:30)

**Beat 3A β€” Architecture diagram (1:30 β†’ 1:55):**

**Visual:** Static slide showing the pipeline:
```
Webcam recording β†’ ffmpeg β†’ fine-tuned Qwen3-VL-8B (native video_url)
                                      ↓
                              Qwen3-8B (composer)
                                      ↓
                                gTTS (speech)
                  Both LLMs concurrent on a single AMD Instinct MI300X
```

**Voice-over:**
> "Under the hood: our fine-tuned Qwen3-VL-8B receives the recorded clip natively via vLLM's video_url block, Qwen3-8B composes the sentence, gTTS speaks it β€” both Qwen models running concurrently on a single AMD Instinct MI300X. Vision and reasoning on one GPU."

**Beat 3B β€” The MI300X comparison (1:55 β†’ 2:15):**

**Visual:** The comparison table from the walkthrough:

| | MI300X 1Γ— | H100 80 GB |
|---|---|---|
| V1 pipeline (~34 GB) | βœ… comfortable | ⚠ tight |
| V2 with Llama-3.1-70B FP8 (~70 GB extra) | βœ… still fits | ❌ doesn't fit |

**Voice-over:**
> "192 GB of HBM3. Same workload on NVIDIA H100 needs three GPUs. Practical accessibility tools running globally need the cost-and-availability profile that AMD enables."

**Beat 3C β€” Substrate + close (2:15 β†’ 2:30):**

**Visual:** Final slide:
- "Open source, MIT β€” github.com/seekerPrice/signbridge"
- "Hugging Face Space β€” huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge"
- "ASL V1. Deaf-led teams own the rest."
- 🀟 SignBridge

**Voice-over:**
> "SignBridge is open source under MIT. It's a substrate β€” Deaf-led organisations deploy it for their own languages. The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. Thanks for watching."

---

## Voice-over recording tips

- Record voice **separately** from screen capture (better audio quality). Use QuickTime "New Audio Recording" with a mic 6–12 inches away.
- One take, then cut. Don't try to dub multiple takes line-by-line.
- Cadence: ~140 words/min. Pause for 0.5 s after each section.
- If you have a good pop filter / lavalier, use it. AirPods Pro built-in mic is workable but compresses dynamics.

---

## Editing notes

- **Captions/subtitles required.** Burn in the spoken English text below the speaker's face throughout β€” both for accessibility and so judges can follow with sound off.
- **Highlight the recognized token visually.** When the app shows "detected: hello (85%)", zoom in or add a brief highlight box on that text β€” judges' eyes need to find it fast.
- **Music: skip.** The demo is loud enough on its own; background music distracts from the speech-output beats.
- **Smooth transitions only** β€” don't use fancy wipes; cut on action.
- **Final cut export:** 1080p, H.264, MP4, ≀100 MB if possible (lablab uploader has size limits).

---

## Prep before recording

- [ ] AMD Dev Cloud credit landed (so the live demo uses MI300X β€” *this is the hackathon talk-track*); fall back to HF Inference if not.
- [ ] Lighting: front-facing soft light. No back-window glare.
- [ ] Plain background (white wall ideal).
- [ ] Wear a contrasting solid colour (not patterns) β€” VLM accuracy improves.
- [ ] Webcam height: at eye level. Hands need to be in frame for signs.
- [ ] Test the live HF Space URL once before recording. If it errors, fix before pressing record.
- [ ] One dry run end-to-end with a stopwatch. Trim if over 2:45.

---

## Recording order (don't shoot in story order)

1. **Live demo screen recording first** β€” 3 takes of the full demo flow, pick the cleanest.
2. **Voice-over second** β€” record continuous narration over the picked demo take.
3. **B-roll of you signing alone** (Act 1 silent shot, Act 2C two-person reaction) β€” last, since they're easier to re-shoot.
4. Edit it together in iMovie / CapCut.
5. Export.
6. Upload to YouTube as **Unlisted**, copy URL.
7. Paste URL into lablab.ai submission form's "Video Presentation" field.

---

## Export checklist

- [ ] Length 2:00–3:00
- [ ] Captions visible throughout
- [ ] AMD Dev Cloud / MI300X mentioned by name β‰₯3 times
- [ ] Qwen3-VL mentioned by name β‰₯2 times (Qwen Special Reward eligibility)
- [ ] HF Space URL shown on screen at least once
- [ ] GitHub URL shown on screen at least once
- [ ] No copyrighted music / footage
- [ ] Speaker face visible (judges remember faces)
- [ ] Final shot: SignBridge logo + URLs