signbridge / docs /demo-video-script.md
LucasLooTan's picture
docs+pptx: refresh all submission deliverables to match shipping pipeline
fb11c61
# SignBridge β€” Demo Video Script
> Target length: **2:30 (≀ 3 min)**. Format: 1080p MP4, MP3 audio. Aspect ratio 16:9.
> Tools: QuickTime Player (Mac) for screen + camera capture, iMovie or CapCut for editing.
---
## Story arc (3 acts)
| Time | Act | Beat |
|---|---|---|
| 0:00–0:20 | **Hook** | Open with the human problem; viewer must feel the gap. |
| 0:20–1:30 | **Demo** | Live SignBridge in action β€” both fingerspelling AND a motion sign. |
| 1:30–2:30 | **Why AMD + close** | Architecture diagram + concrete MI300X comparison + open-source ethics + URL. |
Hard rule: **no slide-by-slide voice-over reading**. The demo should *play live*; voice-over should narrate what we're seeing, not summarise text on screen.
---
## Shot list
### Act 1 β€” Hook (0:00 β†’ 0:20)
**Visual A (5 s):** Plain background, bold text card fades in:
> 70 million deaf people. Interpreters cost $50–200 / hour. They're scarce.
**Visual B (5 s):** Text card β†’ "What if your phone could just translate?"
**Visual C (10 s):** Camera shot of you (Lucas) in a quiet room, signing HELLO at the camera silently. No voice-over yet. Hold the silence β€” let the viewer feel that the sign means nothing to them.
**Voice-over:** *(starts at 0:15)*
> "Most of us can't read this. SignBridge can."
---
### Act 2 β€” Live demo (0:20 β†’ 1:30)
**Setup (0:20 β†’ 0:25):** 5-second screen-recording of the live HF Space loading at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. URL bar visible. Tabs visible: "Snapshot" and "Record sign". This proves it's a live deployed product, not a slide deck.
**Beat 2A β€” Fingerspelling (0:25 β†’ 0:55):**
**Visual (split screen recommended):** Left = your face/hand on webcam, right = the Gradio app receiving frames.
- Sign **L** clearly. Click the **πŸ“· camera button** in the preview. App shows "βœ“ added L (98%)".
- Sign **U**. Click πŸ“· again.
- Sign **C**. πŸ“·.
- Sign **A**. πŸ“·.
- Sign **S**. πŸ“·.
- Click **πŸ”Š Speak**. App composes β†’ speaks: **"Lucas."**
**Voice-over during this beat:**
> "First, fingerspelling. I sign each letter, the app captures it, andβ€”" *(pause for the speak)* β€” *"composed in natural English."*
**Beat 2B β€” Motion sign (0:55 β†’ 1:25):**
**Visual:** Switch tabs to **Record sign**. Hit Record, sign **HELLO** (the wave-from-forehead motion), stop, click Submit.
- Detected: **hello (85%)**. Click Speak.
- App says: **"Hello."**
Repeat one more sign for variety: **THANK_YOU**.
**Voice-over:**
> "But fingerspelling alone isn't real ASL β€” most signs are *motion*. Hold-to-record captures the whole gesture, not just one frame. The system detects the motion across frames and..." *(pause for the speak)*
**Beat 2C β€” Two-person scene (1:25 β†’ 1:30):** *(optional but high-impact)*
**Visual:** You sign something to a hearing person; they hear the AI say it; they react. Hold the human reaction for 2 seconds.
**No voice-over** during this beat β€” let the moment land.
---
### Act 3 β€” Architecture + AMD pitch (1:30 β†’ 2:30)
**Beat 3A β€” Architecture diagram (1:30 β†’ 1:55):**
**Visual:** Static slide showing the pipeline:
```
Webcam recording β†’ ffmpeg β†’ fine-tuned Qwen3-VL-8B (native video_url)
↓
Qwen3-8B (composer)
↓
gTTS (speech)
Both LLMs concurrent on a single AMD Instinct MI300X
```
**Voice-over:**
> "Under the hood: our fine-tuned Qwen3-VL-8B receives the recorded clip natively via vLLM's video_url block, Qwen3-8B composes the sentence, gTTS speaks it β€” both Qwen models running concurrently on a single AMD Instinct MI300X. Vision and reasoning on one GPU."
**Beat 3B β€” The MI300X comparison (1:55 β†’ 2:15):**
**Visual:** The comparison table from the walkthrough:
| | MI300X 1Γ— | H100 80 GB |
|---|---|---|
| V1 pipeline (~34 GB) | βœ… comfortable | ⚠ tight |
| V2 with Llama-3.1-70B FP8 (~70 GB extra) | βœ… still fits | ❌ doesn't fit |
**Voice-over:**
> "192 GB of HBM3. Same workload on NVIDIA H100 needs three GPUs. Practical accessibility tools running globally need the cost-and-availability profile that AMD enables."
**Beat 3C β€” Substrate + close (2:15 β†’ 2:30):**
**Visual:** Final slide:
- "Open source, MIT β€” github.com/seekerPrice/signbridge"
- "Hugging Face Space β€” huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge"
- "ASL V1. Deaf-led teams own the rest."
- 🀟 SignBridge
**Voice-over:**
> "SignBridge is open source under MIT. It's a substrate β€” Deaf-led organisations deploy it for their own languages. The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. Thanks for watching."
---
## Voice-over recording tips
- Record voice **separately** from screen capture (better audio quality). Use QuickTime "New Audio Recording" with a mic 6–12 inches away.
- One take, then cut. Don't try to dub multiple takes line-by-line.
- Cadence: ~140 words/min. Pause for 0.5 s after each section.
- If you have a good pop filter / lavalier, use it. AirPods Pro built-in mic is workable but compresses dynamics.
---
## Editing notes
- **Captions/subtitles required.** Burn in the spoken English text below the speaker's face throughout β€” both for accessibility and so judges can follow with sound off.
- **Highlight the recognized token visually.** When the app shows "detected: hello (85%)", zoom in or add a brief highlight box on that text β€” judges' eyes need to find it fast.
- **Music: skip.** The demo is loud enough on its own; background music distracts from the speech-output beats.
- **Smooth transitions only** β€” don't use fancy wipes; cut on action.
- **Final cut export:** 1080p, H.264, MP4, ≀100 MB if possible (lablab uploader has size limits).
---
## Prep before recording
- [ ] AMD Dev Cloud credit landed (so the live demo uses MI300X β€” *this is the hackathon talk-track*); fall back to HF Inference if not.
- [ ] Lighting: front-facing soft light. No back-window glare.
- [ ] Plain background (white wall ideal).
- [ ] Wear a contrasting solid colour (not patterns) β€” VLM accuracy improves.
- [ ] Webcam height: at eye level. Hands need to be in frame for signs.
- [ ] Test the live HF Space URL once before recording. If it errors, fix before pressing record.
- [ ] One dry run end-to-end with a stopwatch. Trim if over 2:45.
---
## Recording order (don't shoot in story order)
1. **Live demo screen recording first** β€” 3 takes of the full demo flow, pick the cleanest.
2. **Voice-over second** β€” record continuous narration over the picked demo take.
3. **B-roll of you signing alone** (Act 1 silent shot, Act 2C two-person reaction) β€” last, since they're easier to re-shoot.
4. Edit it together in iMovie / CapCut.
5. Export.
6. Upload to YouTube as **Unlisted**, copy URL.
7. Paste URL into lablab.ai submission form's "Video Presentation" field.
---
## Export checklist
- [ ] Length 2:00–3:00
- [ ] Captions visible throughout
- [ ] AMD Dev Cloud / MI300X mentioned by name β‰₯3 times
- [ ] Qwen3-VL mentioned by name β‰₯2 times (Qwen Special Reward eligibility)
- [ ] HF Space URL shown on screen at least once
- [ ] GitHub URL shown on screen at least once
- [ ] No copyrighted music / footage
- [ ] Speaker face visible (judges remember faces)
- [ ] Final shot: SignBridge logo + URLs