Spaces:

lablab-ai-amd-developer-hackathon
/

signbridge

Running

App Files Files Community

signbridge / docs /demo-video-script.md

LucasLooTan

docs+pptx: refresh all submission deliverables to match shipping pipeline

fb11c61 19 days ago

preview code

raw

history blame contribute delete

7.46 kB

	# SignBridge — Demo Video Script

	> Target length: 2:30 (≤ 3 min). Format: 1080p MP4, MP3 audio. Aspect ratio 16:9.
	> Tools: QuickTime Player (Mac) for screen + camera capture, iMovie or CapCut for editing.

	---

	## Story arc (3 acts)

	\| Time \| Act \| Beat \|
	\|---\|---\|---\|
	\| 0:00–0:20 \| Hook \| Open with the human problem; viewer must feel the gap. \|
	\| 0:20–1:30 \| Demo \| Live SignBridge in action — both fingerspelling AND a motion sign. \|
	\| 1:30–2:30 \| Why AMD + close \| Architecture diagram + concrete MI300X comparison + open-source ethics + URL. \|

	Hard rule: no slide-by-slide voice-over reading. The demo should play live; voice-over should narrate what we're seeing, not summarise text on screen.

	---

	## Shot list

	### Act 1 — Hook (0:00 → 0:20)

	Visual A (5 s): Plain background, bold text card fades in:
	> 70 million deaf people. Interpreters cost $50–200 / hour. They're scarce.

	Visual B (5 s): Text card → "What if your phone could just translate?"

	Visual C (10 s): Camera shot of you (Lucas) in a quiet room, signing HELLO at the camera silently. No voice-over yet. Hold the silence — let the viewer feel that the sign means nothing to them.

	Voice-over: (starts at 0:15)
	> "Most of us can't read this. SignBridge can."

	---

	### Act 2 — Live demo (0:20 → 1:30)

	Setup (0:20 → 0:25): 5-second screen-recording of the live HF Space loading at `huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge`. URL bar visible. Tabs visible: "Snapshot" and "Record sign". This proves it's a live deployed product, not a slide deck.

	Beat 2A — Fingerspelling (0:25 → 0:55):

	Visual (split screen recommended): Left = your face/hand on webcam, right = the Gradio app receiving frames.
	- Sign L clearly. Click the 📷 camera button in the preview. App shows "✓ added L (98%)".
	- Sign U. Click 📷 again.
	- Sign C. 📷.
	- Sign A. 📷.
	- Sign S. 📷.
	- Click 🔊 Speak. App composes → speaks: "Lucas."

	Voice-over during this beat:
	> "First, fingerspelling. I sign each letter, the app captures it, and—" (pause for the speak) — "composed in natural English."

	Beat 2B — Motion sign (0:55 → 1:25):

	Visual: Switch tabs to Record sign. Hit Record, sign HELLO (the wave-from-forehead motion), stop, click Submit.
	- Detected: hello (85%). Click Speak.
	- App says: "Hello."

	Repeat one more sign for variety: THANK_YOU.

	Voice-over:
	> "But fingerspelling alone isn't real ASL — most signs are motion. Hold-to-record captures the whole gesture, not just one frame. The system detects the motion across frames and..." (pause for the speak)

	Beat 2C — Two-person scene (1:25 → 1:30): (optional but high-impact)

	Visual: You sign something to a hearing person; they hear the AI say it; they react. Hold the human reaction for 2 seconds.

	No voice-over during this beat — let the moment land.

	---

	### Act 3 — Architecture + AMD pitch (1:30 → 2:30)

	Beat 3A — Architecture diagram (1:30 → 1:55):

	Visual: Static slide showing the pipeline:
	```
	Webcam recording → ffmpeg → fine-tuned Qwen3-VL-8B (native video_url)
	↓
	Qwen3-8B (composer)
	↓
	gTTS (speech)
	Both LLMs concurrent on a single AMD Instinct MI300X
	```

	Voice-over:
	> "Under the hood: our fine-tuned Qwen3-VL-8B receives the recorded clip natively via vLLM's video_url block, Qwen3-8B composes the sentence, gTTS speaks it — both Qwen models running concurrently on a single AMD Instinct MI300X. Vision and reasoning on one GPU."

	Beat 3B — The MI300X comparison (1:55 → 2:15):

	Visual: The comparison table from the walkthrough:

	\| \| MI300X 1× \| H100 80 GB \|
	\|---\|---\|---\|
	\| V1 pipeline (~34 GB) \| ✅ comfortable \| ⚠ tight \|
	\| V2 with Llama-3.1-70B FP8 (~70 GB extra) \| ✅ still fits \| ❌ doesn't fit \|

	Voice-over:
	> "192 GB of HBM3. Same workload on NVIDIA H100 needs three GPUs. Practical accessibility tools running globally need the cost-and-availability profile that AMD enables."

	Beat 3C — Substrate + close (2:15 → 2:30):

	Visual: Final slide:
	- "Open source, MIT — github.com/seekerPrice/signbridge"
	- "Hugging Face Space — huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge"
	- "ASL V1. Deaf-led teams own the rest."
	- 🤟 SignBridge

	Voice-over:
	> "SignBridge is open source under MIT. It's a substrate — Deaf-led organisations deploy it for their own languages. The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. Thanks for watching."

	---

	## Voice-over recording tips

	- Record voice separately from screen capture (better audio quality). Use QuickTime "New Audio Recording" with a mic 6–12 inches away.
	- One take, then cut. Don't try to dub multiple takes line-by-line.
	- Cadence: ~140 words/min. Pause for 0.5 s after each section.
	- If you have a good pop filter / lavalier, use it. AirPods Pro built-in mic is workable but compresses dynamics.

	---

	## Editing notes

	- Captions/subtitles required. Burn in the spoken English text below the speaker's face throughout — both for accessibility and so judges can follow with sound off.
	- Highlight the recognized token visually. When the app shows "detected: hello (85%)", zoom in or add a brief highlight box on that text — judges' eyes need to find it fast.
	- Music: skip. The demo is loud enough on its own; background music distracts from the speech-output beats.
	- Smooth transitions only — don't use fancy wipes; cut on action.
	- Final cut export: 1080p, H.264, MP4, ≤100 MB if possible (lablab uploader has size limits).

	---

	## Prep before recording

	- [ ] AMD Dev Cloud credit landed (so the live demo uses MI300X — this is the hackathon talk-track); fall back to HF Inference if not.
	- [ ] Lighting: front-facing soft light. No back-window glare.
	- [ ] Plain background (white wall ideal).
	- [ ] Wear a contrasting solid colour (not patterns) — VLM accuracy improves.
	- [ ] Webcam height: at eye level. Hands need to be in frame for signs.
	- [ ] Test the live HF Space URL once before recording. If it errors, fix before pressing record.
	- [ ] One dry run end-to-end with a stopwatch. Trim if over 2:45.

	---

	## Recording order (don't shoot in story order)

	1. Live demo screen recording first — 3 takes of the full demo flow, pick the cleanest.
	2. Voice-over second — record continuous narration over the picked demo take.
	3. B-roll of you signing alone (Act 1 silent shot, Act 2C two-person reaction) — last, since they're easier to re-shoot.
	4. Edit it together in iMovie / CapCut.
	5. Export.
	6. Upload to YouTube as Unlisted, copy URL.
	7. Paste URL into lablab.ai submission form's "Video Presentation" field.

	---

	## Export checklist

	- [ ] Length 2:00–3:00
	- [ ] Captions visible throughout
	- [ ] AMD Dev Cloud / MI300X mentioned by name ≥3 times
	- [ ] Qwen3-VL mentioned by name ≥2 times (Qwen Special Reward eligibility)
	- [ ] HF Space URL shown on screen at least once
	- [ ] GitHub URL shown on screen at least once
	- [ ] No copyrighted music / footage
	- [ ] Speaker face visible (judges remember faces)
	- [ ] Final shot: SignBridge logo + URLs