File size: 5,463 Bytes
fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 f928d83 fb11c61 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 | # SignBridge β paste-ready lablab.ai submission
> Submission deadline: **2026-05-11 03:00 Malaysia Time** (= Sunday May 10 12:00 PM Pacific Time).
> Open https://lablab.ai/ai-hackathons/amd-developer β bottom of page β **Submit Project**.
> Each block below maps 1:1 to a form field. Paste verbatim.
---
## Project Title (form max: 50 chars, min 5)
```
SignBridge β fine-tuned Qwen3-VL on AMD MI300X
```
(47 characters; leads with Qwen + AMD for both the Qwen Special Reward and Track 3 narratives.)
---
## Short Description (form max: 255 chars, min 50)
```
Two people who couldn't communicate, now can. Real-time ASL β English speech, powered by Qwen3-VL we fine-tuned on AMD MI300X.
```
(126 characters β fits comfortably.)
---
## Long Description (form max: 2000 chars, min 600)
```
SignBridge is a real-time American Sign Language β English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI). We fine-tuned Qwen3-VL-8B on a single AMD Instinct MI300X and serve it natively through vLLM's video understanding API.
The user signs at the webcam β fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) β and SignBridge replies in spoken English. Two people who couldn't communicate, now can.
Architecture: (1) MediaPipe Hand β trained MLP classifier handles static fingerspelling at 90% accuracy, ~50 ms on CPU. (2) For motion words the webcam clip is transcoded with ffmpeg and sent natively to a LoRA-fine-tuned Qwen3-VL-8B via vLLM's video_url block β Qwen3-VL processes the clip with its own temporal encoder, no manual frame sampling. The 54-minute LoRA on a single MI300X lifts ASL accuracy from 19% zero-shot to 92% in transformers eval. (3) Qwen3-8B composes recognised tokens into English; gTTS speaks it. Both LLMs run concurrently on the same MI300X via vLLM 0.17.1 on ROCm 7.2.
One MI300X did three jobs on one GPU: ran the LoRA fine-tune (54 min), hosts the merged Qwen3-VL-8B for inference, and hosts the 8B composer in parallel. 192 GB HBM3 means no swapping or sharding. The same workload on H100 (80 GB) needs a 3-GPU cluster.
Fine-tune artefacts (judge-verifiable): merged Qwen3-VL-8B-ASL at huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl; MediaPipe-MLP classifier at huggingface.co/LucasLooTan/signbridge-asl-classifier. Both pulled at runtime via hf_hub_download.
Why it matters: ASL interpreters cost $50β200/hr and are scarce. Sorenson VRS books $4B+/yr filling this gap. SignBridge is MIT-licensed open source β any Deaf-led NGO, school, ministry can self-host on their own AMD compute. V1 is ASL-only by design; sign languages aren't interchangeable.
Built solo by Lucas Loo Tan Yu Heng, May 5β11, 2026.
```
(~1980 chars β fits the 2000 max with ~20 char buffer.)
---
## Technology & Category Tags
Pick from lablab dropdown:
**Primary (must select):**
- `Qwen` and/or `Qwen3-VL`
- `AMD Developer Cloud`
- `AMD ROCm`
- `HuggingFace Spaces`
**Secondary (relevant):**
- `LLaMA` (no β we replaced this with Qwen3-8B; skip)
- `Gradio`
- `FastAPI`
- `Vision`
- `Multimodal`
- `Accessibility`
- `Open Source`
- `vLLM`
**Track:** **Track 3 β Vision & Multimodal AI** (also satisfies Track 2 fine-tuning narrative if dual-track allowed)
---
## Pipeline at a glance (May 10 β current shipping)
Paste this block anywhere a one-screen architecture summary is needed (lablab form, slide notes, README):
```
- Static fingerspelling: MediaPipe Hand β trained MLP classifier (90% accuracy, ~50 ms on CPU)
- Motion signs: webcam recording β ffmpeg (480p, 8 fps, β€4 s, H.264) β vLLM /v1/chat/completions
with a video_url block β fine-tuned Qwen3-VL-8B on AMD MI300X
- Sentence composer: Qwen3-8B on the same MI300X (vLLM, separate port)
- Speech synthesis: gTTS (Google's free TTS, fast, MP3 output)
- Live demo: HF Space (Gradio Docker SDK) β both tabs, end-to-end
```
---
## Cover Image
Upload `assets/cover.png` from the repo (1280Γ640 PNG, indigoβpink gradient with π€ + project name).
---
## Video Presentation
Paste the **YouTube Unlisted URL** of your demo video.
Reference shot list: `docs/demo-video-script.md`.
---
## Slide Presentation
Upload the **deck PDF**.
Build from `docs/pitch-deck.md`:
1. Open Google Slides β blank deck
2. Paste each slide's content into a blank slide
3. File β Download β PDF
4. Upload here
---
## Public GitHub Repository
```
https://github.com/seekerPrice/signbridge
```
---
## Demo Application Platform
```
Hugging Face Space
```
---
## Application URL
```
https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
```
---
## Final pre-submit checklist
Before clicking Submit:
- [ ] Title pasted (70 chars)
- [ ] Short description pasted (132 chars)
- [ ] Long description pasted (~350 words)
- [ ] Tags selected (at minimum: Qwen, AMD Developer Cloud, AMD ROCm, HuggingFace Spaces)
- [ ] Cover image uploaded (`assets/cover.png`)
- [ ] Video URL pasted (YouTube unlisted)
- [ ] Pitch deck PDF uploaded
- [ ] GitHub URL pasted
- [ ] HF Space URL pasted
- [ ] **Track selection: Track 3 β Vision & Multimodal AI**
- [ ] Open Space in incognito β confirm it loads
- [ ] GitHub repo public + has clean README
- [ ] LICENSE file is MIT
When all boxes ticked β click Submit β wait for confirmation email β done.
**Aim to submit by 2026-05-11 02:00 MYT** (1-hour buffer before the 03:00 cutoff).
|