| # SignBridge β Pitch Deck (8 slides) |
|
|
| > Open a Google Slides deck (or Pitch). Paste each slide's content into the matching blank slide. Visuals are described in italics β replace with actual screenshots / diagrams / table renders. |
| > Aspect ratio: 16:9. Theme: indigoβpink gradient (matches HF Space card). |
|
|
| --- |
|
|
| ## Slide 1 β Title |
|
|
| **Title (huge):** |
| SignBridge |
|
|
| **Subtitle:** |
| Real-time ASL β English speech, on a single AMD Instinct MI300X. |
|
|
| **Footer (small):** |
| Track 3 Β· Vision & Multimodal AI Β· AMD Developer Hackathon 2026 Β· Lucas Loo Tan Yu Heng |
|
|
| *Visual: the cover.png we already shipped (1280Γ640 indigoβpink gradient with π€ + project name).* |
|
|
| --- |
|
|
| ## Slide 2 β The problem |
|
|
| **Headline:** |
| 70 million deaf people. Sign-language interpreters cost $50β200 per hour. They're scarce. |
|
|
| **Body bullets:** |
| - Courts, hospitals, schools, public services **must by law** provide interpretation (ADA Title II/III in the US; European Accessibility Act 2025 in the EU). |
| - **Sorenson VRS**, the dominant sign-language relay-services provider, books **$4B+ in annual revenue** filling this gap β proof the demand is enormous and budgeted-for. |
| - Existing AI alternatives (Be My Eyes, Microsoft Seeing AI) are turn-based, photo-only, English-default, and closed-source. Real ASL is *motion* β they fundamentally can't translate "HELLO" or "THANK YOU". |
|
|
| *Visual: a row of three context icons β courthouse / hospital / classroom β labeled with the mandates.* |
|
|
| --- |
|
|
| ## Slide 3 β The solution |
|
|
| **Headline:** |
| Hold to record. Sign. Speak. |
|
|
| **Body (3-step arc):** |
| 1. **Hold-to-record button** captures 1.5 seconds of your sign. |
| 2. A multi-stage pipeline (vision β reasoning β speech) translates it. |
| 3. The other person hears natural English. |
|
|
| **Tag line under the arc:** |
| Two people who couldn't communicate, now can. |
|
|
| *Visual: 3 screenshots of the live Gradio Space β (a) user signing into webcam; (b) "detected: HELLO (85%)"; (c) audio waveform playing "Hello.".* |
| *If single screenshot: just the Gradio "Record sign" tab mid-demo.* |
|
|
| --- |
|
|
| ## Slide 4 β Architecture (the AMD pitch) |
|
|
| **Headline:** |
| We fine-tuned Qwen3-VL-8B on a single MI300X β 54 minutes, 92% accuracy. |
|
|
| **Diagram (build in Slides; described as bullets):** |
| ``` |
| [ Webcam frame ] |
| β |
| βββΊ MediaPipe Hand β trained MLP classifier |
| β (90% on ASL fingerspelling, 50ms CPU) |
| β ββ falls through to β when no hand detected |
| β |
| βββΊ Fine-tuned Qwen3-VL-8B (LoRA on MI300X) |
| ββ webcam clip β ffmpeg β vLLM video_url block |
| ββ Qwen3-VL native temporal encoder (no manual frame sampling) |
| β |
| βΌ |
| [ Qwen3-8B composer ββ sign tokens β English ] |
| β |
| βΌ |
| [ gTTS ββ free, fast speech synthesis ] |
| β |
| βΌ |
| [ Audio out ] |
| ``` |
|
|
| **Comparison table (small print under diagram):** |
|
|
| | Component | Weights (FP16) | MI300X 1Γ (192 GB) | H100 80 GB | |
| |---|---|---|---| |
| | Fine-tuned Qwen3-VL-8B | ~16 GB | β
fits | β
| |
| | Qwen3-8B composer | ~16 GB | β
fits | β
| |
| | Whisper (V2 stretch) | ~3 GB | β
fits | β tight | |
| | (V2) **Llama-3.1-70B FP8 reasoner** | ~70 GB | **β
still fits** | **β doesn't fit at all** | |
|
|
| (gTTS runs as a small Python call from the Space; no GPU memory.) |
|
|
| **The MI300X did three jobs in this project:** (1) ran the LoRA fine-tune in 54 min, (2) hosts the merged 8B model for inference, (3) hosts the 8B composer in parallel β all on one GPU. That's the AMD pitch. |
|
|
| *Visual: the diagram + table as a single composite slide. Use a brand colour for the AMD column to highlight.* |
|
|
| --- |
|
|
| ## Slide 5 β Live demo |
|
|
| **Headline:** |
| *(blank β this slide is the live demo)* |
|
|
| **Speaker note:** |
| Switch to the live HF Space at huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge. 30 seconds: |
| 1. **Snapshot tab** β fingerspell L-U-C-A-S β click Speak β AI says "Lucas." |
| 2. **Record sign tab** β record HELLO β click Submit β "hello" detected β click Speak β AI says "Hello." |
|
|
| If demo fails / network down β fall back to the pre-recorded 2-min video on slide 6. |
|
|
| *Visual: leave the slide blank or use a single QR code linking to the Space URL for the audience to scan and try themselves.* |
|
|
| --- |
|
|
| ## Slide 6 β Demo video (fallback) |
|
|
| **Headline:** |
| *(blank β this slide embeds the demo video)* |
|
|
| **Embed:** |
| The 2β3 minute demo video, looping, autoplay-on-slide-show. |
|
|
| *Visual: video player.* |
|
|
| --- |
|
|
| ## Slide 6.5 β Qwen3-VL is the brain |
|
|
| **Headline:** |
| LoRA-fine-tuned Qwen3-VL-8B β the visual intelligence behind every sign. |
|
|
| **Body bullets:** |
| - The recognizer is **our LoRA-fine-tuned Qwen3-VL-8B** (`huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl`), trained in 54 minutes on a single AMD Instinct MI300X. Lifts ASL accuracy from **19% zero-shot β 92%**. |
| - For motion signs (HELLO, THANK_YOU, PLEASE, EAT) we send the **whole recorded clip natively** to Qwen3-VL via vLLM's `video_url` content block β Qwen3-VL's own temporal encoder handles the motion. No manual frame sampling. |
| - **Closed-vocabulary forcing** + domain priming keep Qwen on-rails for the 87-token sign vocab. |
| - **Qwen3-8B** then composes Qwen-VL's tokens into grammatical English (also on the MI300X via vLLM, separate port); **gTTS** synthesises the spoken sentence. |
|
|
| **Closer:** |
| Qwen3-VL is the only thing in the pipeline making the visual judgement. The rest is plumbing. |
|
|
| *Visual: a single screenshot of `signbridge/recognizer/vlm.py` showing the video_url Qwen call, alongside an arrow into a "detected: HELLO (85%)" overlay.* |
|
|
| --- |
|
|
| ## Slide 7 β Why this is the right submission for Track 3 |
|
|
| **Headline:** |
| Four judging criteria, four deliberate choices. |
|
|
| **Two-column layout:** |
|
|
| | Judging criterion | Our choice | |
| |---|---| |
| | **Application of Technology** | Multi-modal pipeline (vision + reasoning + voice) running concurrently on a single MI300X β exactly what Track 3's "massive memory bandwidth of AMD GPUs" was for. | |
| | **Presentation** | Demo is *experienced*: judge holds phone, signs HELLO, hears "Hello." 30 seconds, no explanation needed. | |
| | **Business Value** | $4B+ existing market (Sorenson VRS comparable), legally-mandated interpretation budgets, open-source so any Deaf-led NGO / ministry / school can self-host on their own AMD compute. | |
| | **Originality** | Streaming continuous multi-frame VLM agent for sign language β no peer-reviewed benchmark exists for this approach yet (we checked the literature). Real ASL motion-words, not just fingerspelling. | |
|
|
| *Visual: 2Γ2 grid of icons, one per criterion.* |
|
|
| --- |
|
|
| ## Slide 8 β Substrate, not product Β· Open Β· Deaf-led future |
|
|
| **Headline:** |
| SignBridge is a substrate. Deaf-led teams are the deployers. |
|
|
| **Body:** |
| - **MIT-licensed**, code at github.com/seekerPrice/signbridge β anyone can self-host. |
| - **ASL only V1 is a scope decision.** BSL, MSL, CSL, ISL, +200 sign languages each deserve their own teams, training data, and Deaf community leadership. (Citing Bragg et al., *"Systemic Biases in Sign Language AI Research"*, arXiv 2403.02563.) |
| - **Privacy by default** β frames and audio are processed in-memory and not persisted server-side beyond the request lifetime. |
|
|
| **Closing line (large):** |
| The hardest part of accessibility isn't building. It's deploying. AMD makes the deploying possible. |
|
|
| *Visual: world map outline with sign-language regional dots; or just the SignBridge logo with the closing tagline.* |
|
|
| --- |
|
|
| ## Speaker-note tips (read these before recording) |
|
|
| 1. **Lead with the human problem (Slide 2), not the architecture.** Architecture is for criterion 1; emotion is what closes criteria 2β4. |
| 2. **Time the live demo** β 30 seconds max. If it fails, switch to fallback video without comment. |
| 3. **Always say "AMD MI300X" by name** at least 3 times in the talk track. Sponsors notice. |
| 4. **End on the substrate framing** β pre-empts the "savior tech" critique that Deaf-AI judges look out for. |
|
|
| --- |
|
|
| ## Export |
|
|
| Once filled in: File β Download β PDF document β upload to lablab.ai submission form's "Slide Presentation" field. |
|
|