# SignBridge – paste-ready lablab.ai submission
> Submission deadline: **2026-05-11 03:00 Malaysia Time** (= Sunday, May 10, 12:00 PM Pacific Time).
> Open https://lablab.ai/ai-hackathons/amd-developer → bottom of page → **Submit Project**.
> Each block below maps 1:1 to a form field. Paste verbatim.
---
## Project Title (form max: 50 chars, min 5)
```
SignBridge – fine-tuned Qwen3-VL on AMD MI300X
```
(46 characters; leads with Qwen + AMD for both the Qwen Special Reward and Track 3 narratives.)
---
## Short Description (form max: 255 chars, min 50)
```
Two people who couldn't communicate, now can. Real-time ASL → English speech, powered by Qwen3-VL we fine-tuned on AMD MI300X.
```
(126 characters – fits comfortably.)
---
## Long Description (form max: 2000 chars, min 600)
```
SignBridge is a real-time American Sign Language → English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI). We fine-tuned Qwen3-VL-8B on a single AMD Instinct MI300X and serve it natively through vLLM's video understanding API.
The user signs at the webcam – fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) – and SignBridge replies in spoken English. Two people who couldn't communicate, now can.
Architecture: (1) MediaPipe Hand → trained MLP classifier handles static fingerspelling at 90% accuracy, ~50 ms on CPU. (2) For motion words the webcam clip is transcoded with ffmpeg and sent natively to a LoRA-fine-tuned Qwen3-VL-8B via vLLM's video_url block – Qwen3-VL processes the clip with its own temporal encoder, no manual frame sampling. The 54-minute LoRA on a single MI300X lifts ASL accuracy from 19% zero-shot to 92% in transformers eval. (3) Qwen3-8B composes recognised tokens into English; gTTS speaks it. Both LLMs run concurrently on the same MI300X via vLLM 0.17.1 on ROCm 7.2.
One MI300X does three jobs: it ran the LoRA fine-tune (54 min), hosts the merged Qwen3-VL-8B for inference, and hosts the 8B composer in parallel. 192 GB HBM3 means no swapping or sharding. The same workload on H100 (80 GB) needs a 3-GPU cluster.
Fine-tune artefacts (judge-verifiable): merged Qwen3-VL-8B-ASL at huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl; MediaPipe-MLP classifier at huggingface.co/LucasLooTan/signbridge-asl-classifier. Both pulled at runtime via hf_hub_download.
Why it matters: ASL interpreters cost $50–200/hr and are scarce. Sorenson VRS books $4B+/yr filling this gap. SignBridge is MIT-licensed open source – any Deaf-led NGO, school, or ministry can self-host on their own AMD compute. V1 is ASL-only by design; sign languages aren't interchangeable.
Built solo by Lucas Loo Tan Yu Heng, May 5–11, 2026.
```
(~1980 chars – fits the 2000 max with ~20-char buffer.)
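Not a form field – just a minimal sketch of step (1) above for anyone reading this doc (MediaPipe Hand → MLP, pulled at runtime via `hf_hub_download`). The `model.joblib` filename, the scikit-learn format, and the 63-dim landmark feature layout are assumptions for illustration, not confirmed repo details:

```python
# Hedged sketch of the Snapshot-tab path: one webcam frame -> MediaPipe
# hand landmarks -> MLP letter prediction. Assumes a pickled scikit-learn
# classifier named "model.joblib" trained on 21 landmarks x (x, y, z).
import cv2
import joblib
import mediapipe as mp
import numpy as np
from huggingface_hub import hf_hub_download

# Pull the classifier at runtime, as the long description states.
clf_path = hf_hub_download(
    repo_id="LucasLooTan/signbridge-asl-classifier",
    filename="model.joblib",  # assumed filename
)
clf = joblib.load(clf_path)

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def classify_snapshot(bgr_frame: np.ndarray) -> str | None:
    """Return the predicted letter for one frame, or None if no hand found."""
    result = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark  # 21 normalised landmarks
    feats = np.array([[p.x, p.y, p.z] for p in lm]).reshape(1, -1)
    return str(clf.predict(feats)[0])
```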
---
## Technology & Category Tags
Pick from lablab dropdown:
**Primary (must select):**
- `Qwen` and/or `Qwen3-VL`
- `AMD Developer Cloud`
- `AMD ROCm`
- `HuggingFace Spaces`
**Secondary (relevant):**
- `LLaMA`: skip (we replaced it with Qwen3-8B)
- `Gradio`
- `FastAPI`
- `Vision`
- `Multimodal`
- `Accessibility`
- `Open Source`
- `vLLM`
**Track:** **Track 3 – Vision & Multimodal AI** (also satisfies the Track 2 fine-tuning narrative if dual-track entry is allowed)
---
## Pipeline at a glance (May 10 – currently shipping)
Paste this block anywhere a one-screen architecture summary is needed (lablab form, slide notes, README):
```
- Static fingerspelling: MediaPipe Hand → trained MLP classifier (90% accuracy, ~50 ms on CPU)
- Motion signs: webcam recording → ffmpeg (480p, 8 fps, ≤4 s, H.264) → vLLM /v1/chat/completions
  with a video_url block → fine-tuned Qwen3-VL-8B on AMD MI300X
- Sentence composer: Qwen3-8B on the same MI300X (vLLM, separate port)
- Speech synthesis: gTTS (Google's free TTS, fast, MP3 output)
- Live demo: HF Space (Gradio Docker SDK) – both tabs, end-to-end
```
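Not a form field either – a hedged end-to-end sketch of the motion path above. Ports 8000/8001, the prompt, and all filenames are illustrative assumptions (the two vLLM servers would be started with something like `vllm serve <model> --port <port>`); the video_url content block is vLLM's OpenAI-compatible video input format:

```python
# Hedged sketch of the Record-sign path: webcam clip -> ffmpeg transcode ->
# vLLM video_url request (Qwen3-VL-8B) -> Qwen3-8B composer -> gTTS speech.
import base64
import subprocess
from gtts import gTTS
from openai import OpenAI

def transcode(src: str, dst: str = "clip.mp4") -> str:
    """Re-encode to 480p, 8 fps, <=4 s, H.264 (the settings listed above)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", "scale=-2:480",
         "-r", "8", "-t", "4", "-c:v", "libx264", dst],
        check=True,
    )
    return dst

vl = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")        # Qwen3-VL (assumed port)
composer = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")  # Qwen3-8B (assumed port)

clip = transcode("webcam_recording.webm")  # hypothetical input file
with open(clip, "rb") as f:
    video_b64 = base64.b64encode(f.read()).decode()

# vLLM accepts video input as a video_url content block; a base64 data URI
# keeps the example self-contained (an http URL works too).
sign = vl.chat.completions.create(
    model="LucasLooTan/signbridge-qwen3vl-8b-asl",
    messages=[{"role": "user", "content": [
        {"type": "video_url",
         "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}},
        {"type": "text", "text": "What ASL sign is shown? Answer with one word."},
    ]}],
).choices[0].message.content

sentence = composer.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user",
               "content": f"Compose a short English sentence from these ASL tokens: {sign}"}],
).choices[0].message.content

gTTS(sentence).save("reply.mp3")  # spoken English out
```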
---
## Cover Image
Upload `assets/cover.png` from the repo (1280×640 PNG, indigo→pink gradient with 🤟 + project name).
---
## Video Presentation
Paste the **YouTube Unlisted URL** of your demo video.
Reference shot list: `docs/demo-video-script.md`.
---
## Slide Presentation
Upload the **deck PDF**.
Build from `docs/pitch-deck.md`:
1. Open Google Slides β†’ blank deck
2. Paste each slide's content into a blank slide
3. File β†’ Download β†’ PDF
4. Upload here
---
## Public GitHub Repository
```
https://github.com/seekerPrice/signbridge
```
---
## Demo Application Platform
```
Hugging Face Space
```
---
## Application URL
```
https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
```
---
## Final pre-submit checklist
Before clicking Submit:
- [ ] Title pasted (46 chars)
- [ ] Short description pasted (126 chars)
- [ ] Long description pasted (~1980 chars)
- [ ] Tags selected (at minimum: Qwen, AMD Developer Cloud, AMD ROCm, HuggingFace Spaces)
- [ ] Cover image uploaded (`assets/cover.png`)
- [ ] Video URL pasted (YouTube unlisted)
- [ ] Pitch deck PDF uploaded
- [ ] GitHub URL pasted
- [ ] HF Space URL pasted
- [ ] **Track selection: Track 3 β€” Vision & Multimodal AI**
- [ ] Open Space in incognito β†’ confirm it loads
- [ ] GitHub repo public + has clean README
- [ ] LICENSE file is MIT
When all boxes are ticked → click Submit → wait for the confirmation email → done.
**Aim to submit by 2026-05-11 02:00 MYT** (1-hour buffer before the 03:00 cutoff).