# SignBridge – paste-ready lablab.ai submission

> Submission deadline: **2026-05-11 03:00 Malaysia Time** (= Sunday May 10 12:00 PM Pacific Time).
> Open https://lablab.ai/ai-hackathons/amd-developer → bottom of page → **Submit Project**.
> Each block below maps 1:1 to a form field. Paste verbatim.

---

## Project Title (form max: 50 chars, min 5)

```
SignBridge – fine-tuned Qwen3-VL on AMD MI300X
```

(46 characters; leads with Qwen + AMD for both the Qwen Special Reward and Track 3 narratives.)

---

## Short Description (form max: 255 chars, min 50)

```
Two people who couldn't communicate, now can. Real-time ASL → English speech, powered by Qwen3-VL we fine-tuned on AMD MI300X.
```

(126 characters – fits comfortably.)

---

## Long Description (form max: 2000 chars, min 600)

```
SignBridge is a real-time American Sign Language → English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI). We fine-tuned Qwen3-VL-8B on a single AMD Instinct MI300X and serve it natively through vLLM's video understanding API.

The user signs at the webcam – fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) – and SignBridge replies in spoken English. Two people who couldn't communicate, now can.

Architecture: (1) MediaPipe Hands → trained MLP classifier handles static fingerspelling at 90% accuracy, ~50 ms on CPU. (2) For motion words the webcam clip is transcoded with ffmpeg and sent natively to a LoRA-fine-tuned Qwen3-VL-8B via vLLM's video_url block – Qwen3-VL processes the clip with its own temporal encoder, no manual frame sampling. The 54-minute LoRA fine-tune on a single MI300X lifts ASL accuracy from 19% zero-shot to 92% in our transformers eval. (3) Qwen3-8B composes recognised tokens into English; gTTS speaks it. Both LLMs run concurrently on the same MI300X via vLLM 0.17.1 on ROCm 7.2.

One MI300X does three jobs: it ran the LoRA fine-tune (54 min) and now hosts both the merged Qwen3-VL-8B and the Qwen3-8B composer in parallel. 192 GB of HBM3 means no swapping or sharding. The same workload on 80 GB H100s needs a 3-GPU cluster.

Fine-tune artefacts (judge-verifiable): merged Qwen3-VL-8B-ASL at huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl; MediaPipe-MLP classifier at huggingface.co/LucasLooTan/signbridge-asl-classifier. Both are pulled at runtime via hf_hub_download.

Why it matters: ASL interpreters cost $50–200/hr and are scarce. Sorenson's VRS business books $4B+/yr filling this gap. SignBridge is MIT-licensed open source – any Deaf-led NGO, school, or ministry can self-host it on their own AMD compute. V1 is ASL-only by design; sign languages aren't interchangeable.

Built solo by Lucas Loo Tan Yu Heng, May 5–11, 2026.
```

(~1980 chars – fits the 2000 max with ~20 char buffer.)
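
For judges who want to poke at the motion path described above, here is a minimal end-to-end sketch in Python. It assumes a vLLM server for the Qwen3-VL fine-tune on localhost:8000 and one for the Qwen3-8B composer on localhost:8001; the file names, ports, prompts, and composer model ID are illustrative assumptions, not the shipped code (see the repo for the real implementation).

```python
import base64
import subprocess

from gtts import gTTS          # speech synthesis, MP3 output
from openai import OpenAI      # vLLM exposes an OpenAI-compatible API

# 1. Transcode the raw webcam clip: 480p, 8 fps, <=4 s, H.264.
subprocess.run(
    ["ffmpeg", "-y", "-i", "webcam_raw.webm", "-t", "4",
     "-vf", "scale=-2:480,fps=8", "-c:v", "libx264", "sign_clip.mp4"],
    check=True,
)

# 2. Send the clip natively via a video_url content block; Qwen3-VL's own
#    temporal encoder handles the frames, no manual sampling on our side.
vl = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
with open("sign_clip.mp4", "rb") as f:
    clip_b64 = base64.b64encode(f.read()).decode()
sign = vl.chat.completions.create(
    model="LucasLooTan/signbridge-qwen3vl-8b-asl",
    messages=[{"role": "user", "content": [
        {"type": "video_url",
         "video_url": {"url": f"data:video/mp4;base64,{clip_b64}"}},
        {"type": "text",
         "text": "Name the ASL sign in this clip in one English word."},
    ]}],
    max_tokens=8,
).choices[0].message.content.strip()

# 3. Compose the recognised token(s) into a fluent English sentence.
composer = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
sentence = composer.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content":
               f"Turn these ASL glosses into one natural English sentence: {sign}"}],
    max_tokens=64,
).choices[0].message.content.strip()

# 4. Speak it.
gTTS(sentence).save("reply.mp3")
```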

---

## Technology & Category Tags

Pick from the lablab dropdown:

**Primary (must select):**
- `Qwen` and/or `Qwen3-VL`
- `AMD Developer Cloud`
- `AMD ROCm`
- `HuggingFace Spaces`

**Secondary (relevant):**
- `LLaMA` (skip – we replaced it with Qwen3-8B)
- `Gradio`
- `FastAPI`
- `Vision`
- `Multimodal`
- `Accessibility`
- `Open Source`
- `vLLM`

**Track:** **Track 3 – Vision & Multimodal AI** (also satisfies the Track 2 fine-tuning narrative if dual-track entries are allowed)

---

## Pipeline at a glance (May 10 – current shipping)

Paste this block anywhere a one-screen architecture summary is needed (lablab form, slide notes, README):

```
- Static fingerspelling: MediaPipe Hands → trained MLP classifier (90% accuracy, ~50 ms on CPU)
- Motion signs: webcam recording → ffmpeg (480p, 8 fps, ≤4 s, H.264) → vLLM /v1/chat/completions
  with a video_url block → fine-tuned Qwen3-VL-8B on AMD MI300X
- Sentence composer: Qwen3-8B on the same MI300X (vLLM, separate port)
- Speech synthesis: gTTS (Google's free TTS, fast, MP3 output)
- Live demo: HF Space (Gradio Docker SDK) – both tabs, end-to-end
```
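
The static-fingerspelling path is just as easy to reproduce. A minimal sketch, assuming the MLP is a scikit-learn classifier pulled at runtime from the HF Hub; the file name inside the model repo and the exact feature layout are assumptions, not the repo's actual code:

```python
import cv2
import joblib
import mediapipe as mp
import numpy as np
from huggingface_hub import hf_hub_download

# Pull the classifier at runtime, as described above; the file name is assumed.
mlp = joblib.load(hf_hub_download("LucasLooTan/signbridge-asl-classifier",
                                  "classifier.joblib"))

# MediaPipe Hands returns 21 (x, y, z) landmarks per detected hand.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def classify_letter(frame_bgr):
    """Predict the fingerspelled letter in one webcam frame, or None if no hand."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    features = np.array([[p.x, p.y, p.z] for p in lm],
                        dtype=np.float32).reshape(1, -1)  # 21 x 3 = 63 features
    return mlp.predict(features)[0]
```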

---

## Cover Image

Upload `assets/cover.png` from the repo (1280×640 PNG, indigo–pink gradient with 🤟 + project name).

---
|
|
| ## Video Presentation |
|
|
| Paste the **YouTube Unlisted URL** of your demo video. |
|
|
| Reference shot list: `docs/demo-video-script.md`. |
|
|
| --- |
|
|
| ## Slide Presentation |
|
|
| Upload the **deck PDF**. |
|
|
| Build from `docs/pitch-deck.md`: |
| 1. Open Google Slides β blank deck |
| 2. Paste each slide's content into a blank slide |
| 3. File β Download β PDF |
| 4. Upload here |
|
|
| --- |
|
|
| ## Public GitHub Repository |
|
|
| ``` |
| https://github.com/seekerPrice/signbridge |
| ``` |
|
|
| --- |
|
|
| ## Demo Application Platform |
|
|
| ``` |
| Hugging Face Space |
| ``` |
|
|
| --- |
|
|
| ## Application URL |
|
|
| ``` |
| https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge |
| ``` |
|
|
| --- |
|
|
| ## Final pre-submit checklist |
|
|
| Before clicking Submit: |
|
|
| - [ ] Title pasted (70 chars) |
| - [ ] Short description pasted (132 chars) |
| - [ ] Long description pasted (~350 words) |
| - [ ] Tags selected (at minimum: Qwen, AMD Developer Cloud, AMD ROCm, HuggingFace Spaces) |
| - [ ] Cover image uploaded (`assets/cover.png`) |
| - [ ] Video URL pasted (YouTube unlisted) |
| - [ ] Pitch deck PDF uploaded |
| - [ ] GitHub URL pasted |
| - [ ] HF Space URL pasted |
| - [ ] **Track selection: Track 3 β Vision & Multimodal AI** |
| - [ ] Open Space in incognito β confirm it loads |
| - [ ] GitHub repo public + has clean README |
| - [ ] LICENSE file is MIT |
|
|
| When all boxes ticked β click Submit β wait for confirmation email β done. |
|
|
| **Aim to submit by 2026-05-11 02:00 MYT** (1-hour buffer before the 03:00 cutoff). |
|
|