| # SignBridge β Demo Video Script |
|
|
| > Target length: **2:30 (β€ 3 min)**. Format: 1080p MP4, MP3 audio. Aspect ratio 16:9. |
| > Tools: QuickTime Player (Mac) for screen + camera capture, iMovie or CapCut for editing. |
|
|
| --- |
|
|
| ## Story arc (3 acts) |
|
|
| | Time | Act | Beat | |
| |---|---|---| |
| | 0:00β0:20 | **Hook** | Open with the human problem; viewer must feel the gap. | |
| | 0:20β1:30 | **Demo** | Live SignBridge in action β both fingerspelling AND a motion sign. | |
| | 1:30β2:30 | **Why AMD + close** | Architecture diagram + concrete MI300X comparison + open-source ethics + URL. | |
|
|
| Hard rule: **no slide-by-slide voice-over reading**. The demo should *play live*; voice-over should narrate what we're seeing, not summarise text on screen. |
|
|
| --- |
|
|
| ## Shot list |
|
|
| ### Act 1 β Hook (0:00 β 0:20) |
|
|
| **Visual A (5 s):** Plain background, bold text card fades in: |
| > 70 million deaf people. Interpreters cost $50β200 / hour. They're scarce. |
|
|
| **Visual B (5 s):** Text card β "What if your phone could just translate?" |
|
|
| **Visual C (10 s):** Camera shot of you (Lucas) in a quiet room, signing HELLO at the camera silently. No voice-over yet. Hold the silence β let the viewer feel that the sign means nothing to them. |
|
|
| **Voice-over:** *(starts at 0:15)* |
| > "Most of us can't read this. SignBridge can." |
|
|
| --- |
|
|
| ### Act 2 β Live demo (0:20 β 1:30) |
|
|
| **Setup (0:20 β 0:25):** 5-second screen-recording of the live HF Space loading at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. URL bar visible. Tabs visible: "Snapshot" and "Record sign". This proves it's a live deployed product, not a slide deck. |
|
|
| **Beat 2A β Fingerspelling (0:25 β 0:55):** |
|
|
| **Visual (split screen recommended):** Left = your face/hand on webcam, right = the Gradio app receiving frames. |
| - Sign **L** clearly. Click the **π· camera button** in the preview. App shows "β added L (98%)". |
| - Sign **U**. Click π· again. |
| - Sign **C**. π·. |
| - Sign **A**. π·. |
| - Sign **S**. π·. |
| - Click **π Speak**. App composes β speaks: **"Lucas."** |
|
|
| **Voice-over during this beat:** |
| > "First, fingerspelling. I sign each letter, the app captures it, andβ" *(pause for the speak)* β *"composed in natural English."* |
|
|
| **Beat 2B β Motion sign (0:55 β 1:25):** |
|
|
| **Visual:** Switch tabs to **Record sign**. Hit Record, sign **HELLO** (the wave-from-forehead motion), stop, click Submit. |
| - Detected: **hello (85%)**. Click Speak. |
| - App says: **"Hello."** |
|
|
| Repeat one more sign for variety: **THANK_YOU**. |
| |
| **Voice-over:** |
| > "But fingerspelling alone isn't real ASL β most signs are *motion*. Hold-to-record captures the whole gesture, not just one frame. The system detects the motion across frames and..." *(pause for the speak)* |
| |
| **Beat 2C β Two-person scene (1:25 β 1:30):** *(optional but high-impact)* |
| |
| **Visual:** You sign something to a hearing person; they hear the AI say it; they react. Hold the human reaction for 2 seconds. |
| |
| **No voice-over** during this beat β let the moment land. |
| |
| --- |
| |
| ### Act 3 β Architecture + AMD pitch (1:30 β 2:30) |
| |
| **Beat 3A β Architecture diagram (1:30 β 1:55):** |
| |
| **Visual:** Static slide showing the pipeline: |
| ``` |
| Webcam recording β ffmpeg β fine-tuned Qwen3-VL-8B (native video_url) |
| β |
| Qwen3-8B (composer) |
| β |
| gTTS (speech) |
| Both LLMs concurrent on a single AMD Instinct MI300X |
| ``` |
| |
| **Voice-over:** |
| > "Under the hood: our fine-tuned Qwen3-VL-8B receives the recorded clip natively via vLLM's video_url block, Qwen3-8B composes the sentence, gTTS speaks it β both Qwen models running concurrently on a single AMD Instinct MI300X. Vision and reasoning on one GPU." |
| |
| **Beat 3B β The MI300X comparison (1:55 β 2:15):** |
| |
| **Visual:** The comparison table from the walkthrough: |
| |
| | | MI300X 1Γ | H100 80 GB | |
| |---|---|---| |
| | V1 pipeline (~34 GB) | β
comfortable | β tight | |
| | V2 with Llama-3.1-70B FP8 (~70 GB extra) | β
still fits | β doesn't fit | |
| |
| **Voice-over:** |
| > "192 GB of HBM3. Same workload on NVIDIA H100 needs three GPUs. Practical accessibility tools running globally need the cost-and-availability profile that AMD enables." |
| |
| **Beat 3C β Substrate + close (2:15 β 2:30):** |
| |
| **Visual:** Final slide: |
| - "Open source, MIT β github.com/seekerPrice/signbridge" |
| - "Hugging Face Space β huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge" |
| - "ASL V1. Deaf-led teams own the rest." |
| - π€ SignBridge |
| |
| **Voice-over:** |
| > "SignBridge is open source under MIT. It's a substrate β Deaf-led organisations deploy it for their own languages. The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. Thanks for watching." |
| |
| --- |
| |
| ## Voice-over recording tips |
| |
| - Record voice **separately** from screen capture (better audio quality). Use QuickTime "New Audio Recording" with a mic 6β12 inches away. |
| - One take, then cut. Don't try to dub multiple takes line-by-line. |
| - Cadence: ~140 words/min. Pause for 0.5 s after each section. |
| - If you have a good pop filter / lavalier, use it. AirPods Pro built-in mic is workable but compresses dynamics. |
| |
| --- |
| |
| ## Editing notes |
| |
| - **Captions/subtitles required.** Burn in the spoken English text below the speaker's face throughout β both for accessibility and so judges can follow with sound off. |
| - **Highlight the recognized token visually.** When the app shows "detected: hello (85%)", zoom in or add a brief highlight box on that text β judges' eyes need to find it fast. |
| - **Music: skip.** The demo is loud enough on its own; background music distracts from the speech-output beats. |
| - **Smooth transitions only** β don't use fancy wipes; cut on action. |
| - **Final cut export:** 1080p, H.264, MP4, β€100 MB if possible (lablab uploader has size limits). |
| |
| --- |
| |
| ## Prep before recording |
| |
| - [ ] AMD Dev Cloud credit landed (so the live demo uses MI300X β *this is the hackathon talk-track*); fall back to HF Inference if not. |
| - [ ] Lighting: front-facing soft light. No back-window glare. |
| - [ ] Plain background (white wall ideal). |
| - [ ] Wear a contrasting solid colour (not patterns) β VLM accuracy improves. |
| - [ ] Webcam height: at eye level. Hands need to be in frame for signs. |
| - [ ] Test the live HF Space URL once before recording. If it errors, fix before pressing record. |
| - [ ] One dry run end-to-end with a stopwatch. Trim if over 2:45. |
| |
| --- |
| |
| ## Recording order (don't shoot in story order) |
| |
| 1. **Live demo screen recording first** β 3 takes of the full demo flow, pick the cleanest. |
| 2. **Voice-over second** β record continuous narration over the picked demo take. |
| 3. **B-roll of you signing alone** (Act 1 silent shot, Act 2C two-person reaction) β last, since they're easier to re-shoot. |
| 4. Edit it together in iMovie / CapCut. |
| 5. Export. |
| 6. Upload to YouTube as **Unlisted**, copy URL. |
| 7. Paste URL into lablab.ai submission form's "Video Presentation" field. |
| |
| --- |
| |
| ## Export checklist |
| |
| - [ ] Length 2:00β3:00 |
| - [ ] Captions visible throughout |
| - [ ] AMD Dev Cloud / MI300X mentioned by name β₯3 times |
| - [ ] Qwen3-VL mentioned by name β₯2 times (Qwen Special Reward eligibility) |
| - [ ] HF Space URL shown on screen at least once |
| - [ ] GitHub URL shown on screen at least once |
| - [ ] No copyrighted music / footage |
| - [ ] Speaker face visible (judges remember faces) |
| - [ ] Final shot: SignBridge logo + URLs |
| |