File size: 7,458 Bytes
549efd4 fb11c61 549efd4 fb11c61 549efd4 fb11c61 549efd4 5952553 549efd4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 | # SignBridge β Demo Video Script
> Target length: **2:30 (β€ 3 min)**. Format: 1080p MP4, MP3 audio. Aspect ratio 16:9.
> Tools: QuickTime Player (Mac) for screen + camera capture, iMovie or CapCut for editing.
---
## Story arc (3 acts)
| Time | Act | Beat |
|---|---|---|
| 0:00β0:20 | **Hook** | Open with the human problem; viewer must feel the gap. |
| 0:20β1:30 | **Demo** | Live SignBridge in action β both fingerspelling AND a motion sign. |
| 1:30β2:30 | **Why AMD + close** | Architecture diagram + concrete MI300X comparison + open-source ethics + URL. |
Hard rule: **no slide-by-slide voice-over reading**. The demo should *play live*; voice-over should narrate what we're seeing, not summarise text on screen.
---
## Shot list
### Act 1 β Hook (0:00 β 0:20)
**Visual A (5 s):** Plain background, bold text card fades in:
> 70 million deaf people. Interpreters cost $50β200 / hour. They're scarce.
**Visual B (5 s):** Text card β "What if your phone could just translate?"
**Visual C (10 s):** Camera shot of you (Lucas) in a quiet room, signing HELLO at the camera silently. No voice-over yet. Hold the silence β let the viewer feel that the sign means nothing to them.
**Voice-over:** *(starts at 0:15)*
> "Most of us can't read this. SignBridge can."
---
### Act 2 β Live demo (0:20 β 1:30)
**Setup (0:20 β 0:25):** 5-second screen-recording of the live HF Space loading at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. URL bar visible. Tabs visible: "Snapshot" and "Record sign". This proves it's a live deployed product, not a slide deck.
**Beat 2A β Fingerspelling (0:25 β 0:55):**
**Visual (split screen recommended):** Left = your face/hand on webcam, right = the Gradio app receiving frames.
- Sign **L** clearly. Click the **π· camera button** in the preview. App shows "β added L (98%)".
- Sign **U**. Click π· again.
- Sign **C**. π·.
- Sign **A**. π·.
- Sign **S**. π·.
- Click **π Speak**. App composes β speaks: **"Lucas."**
**Voice-over during this beat:**
> "First, fingerspelling. I sign each letter, the app captures it, andβ" *(pause for the speak)* β *"composed in natural English."*
**Beat 2B β Motion sign (0:55 β 1:25):**
**Visual:** Switch tabs to **Record sign**. Hit Record, sign **HELLO** (the wave-from-forehead motion), stop, click Submit.
- Detected: **hello (85%)**. Click Speak.
- App says: **"Hello."**
Repeat one more sign for variety: **THANK_YOU**.
**Voice-over:**
> "But fingerspelling alone isn't real ASL β most signs are *motion*. Hold-to-record captures the whole gesture, not just one frame. The system detects the motion across frames and..." *(pause for the speak)*
**Beat 2C β Two-person scene (1:25 β 1:30):** *(optional but high-impact)*
**Visual:** You sign something to a hearing person; they hear the AI say it; they react. Hold the human reaction for 2 seconds.
**No voice-over** during this beat β let the moment land.
---
### Act 3 β Architecture + AMD pitch (1:30 β 2:30)
**Beat 3A β Architecture diagram (1:30 β 1:55):**
**Visual:** Static slide showing the pipeline:
```
Webcam recording β ffmpeg β fine-tuned Qwen3-VL-8B (native video_url)
β
Qwen3-8B (composer)
β
gTTS (speech)
Both LLMs concurrent on a single AMD Instinct MI300X
```
**Voice-over:**
> "Under the hood: our fine-tuned Qwen3-VL-8B receives the recorded clip natively via vLLM's video_url block, Qwen3-8B composes the sentence, gTTS speaks it β both Qwen models running concurrently on a single AMD Instinct MI300X. Vision and reasoning on one GPU."
**Beat 3B β The MI300X comparison (1:55 β 2:15):**
**Visual:** The comparison table from the walkthrough:
| | MI300X 1Γ | H100 80 GB |
|---|---|---|
| V1 pipeline (~34 GB) | β
comfortable | β tight |
| V2 with Llama-3.1-70B FP8 (~70 GB extra) | β
still fits | β doesn't fit |
**Voice-over:**
> "192 GB of HBM3. Same workload on NVIDIA H100 needs three GPUs. Practical accessibility tools running globally need the cost-and-availability profile that AMD enables."
**Beat 3C β Substrate + close (2:15 β 2:30):**
**Visual:** Final slide:
- "Open source, MIT β github.com/seekerPrice/signbridge"
- "Hugging Face Space β huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge"
- "ASL V1. Deaf-led teams own the rest."
- π€ SignBridge
**Voice-over:**
> "SignBridge is open source under MIT. It's a substrate β Deaf-led organisations deploy it for their own languages. The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. Thanks for watching."
---
## Voice-over recording tips
- Record voice **separately** from screen capture (better audio quality). Use QuickTime "New Audio Recording" with a mic 6β12 inches away.
- One take, then cut. Don't try to dub multiple takes line-by-line.
- Cadence: ~140 words/min. Pause for 0.5 s after each section.
- If you have a good pop filter / lavalier, use it. AirPods Pro built-in mic is workable but compresses dynamics.
---
## Editing notes
- **Captions/subtitles required.** Burn in the spoken English text below the speaker's face throughout β both for accessibility and so judges can follow with sound off.
- **Highlight the recognized token visually.** When the app shows "detected: hello (85%)", zoom in or add a brief highlight box on that text β judges' eyes need to find it fast.
- **Music: skip.** The demo is loud enough on its own; background music distracts from the speech-output beats.
- **Smooth transitions only** β don't use fancy wipes; cut on action.
- **Final cut export:** 1080p, H.264, MP4, β€100 MB if possible (lablab uploader has size limits).
---
## Prep before recording
- [ ] AMD Dev Cloud credit landed (so the live demo uses MI300X β *this is the hackathon talk-track*); fall back to HF Inference if not.
- [ ] Lighting: front-facing soft light. No back-window glare.
- [ ] Plain background (white wall ideal).
- [ ] Wear a contrasting solid colour (not patterns) β VLM accuracy improves.
- [ ] Webcam height: at eye level. Hands need to be in frame for signs.
- [ ] Test the live HF Space URL once before recording. If it errors, fix before pressing record.
- [ ] One dry run end-to-end with a stopwatch. Trim if over 2:45.
---
## Recording order (don't shoot in story order)
1. **Live demo screen recording first** β 3 takes of the full demo flow, pick the cleanest.
2. **Voice-over second** β record continuous narration over the picked demo take.
3. **B-roll of you signing alone** (Act 1 silent shot, Act 2C two-person reaction) β last, since they're easier to re-shoot.
4. Edit it together in iMovie / CapCut.
5. Export.
6. Upload to YouTube as **Unlisted**, copy URL.
7. Paste URL into lablab.ai submission form's "Video Presentation" field.
---
## Export checklist
- [ ] Length 2:00β3:00
- [ ] Captions visible throughout
- [ ] AMD Dev Cloud / MI300X mentioned by name β₯3 times
- [ ] Qwen3-VL mentioned by name β₯2 times (Qwen Special Reward eligibility)
- [ ] HF Space URL shown on screen at least once
- [ ] GitHub URL shown on screen at least once
- [ ] No copyrighted music / footage
- [ ] Speaker face visible (judges remember faces)
- [ ] Final shot: SignBridge logo + URLs
|