File size: 5,463 Bytes
fb11c61
 
 
 
 
 
 
 
f928d83
fb11c61
 
f928d83
fb11c61
 
f928d83
fb11c61
 
 
f928d83
fb11c61
 
 
 
 
f928d83
fb11c61
 
 
f928d83
fb11c61
 
 
 
 
 
f928d83
fb11c61
f928d83
fb11c61
f928d83
fb11c61
f928d83
fb11c61
f928d83
fb11c61
 
f928d83
 
fb11c61
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
# SignBridge β€” paste-ready lablab.ai submission

> Submission deadline: **2026-05-11 03:00 Malaysia Time** (= Sunday May 10 12:00 PM Pacific Time).
> Open https://lablab.ai/ai-hackathons/amd-developer β†’ bottom of page β†’ **Submit Project**.
> Each block below maps 1:1 to a form field. Paste verbatim.

---

## Project Title (form max: 50 chars, min 5)

```
SignBridge β€” fine-tuned Qwen3-VL on AMD MI300X
```

(47 characters; leads with Qwen + AMD for both the Qwen Special Reward and Track 3 narratives.)

---

## Short Description (form max: 255 chars, min 50)

```
Two people who couldn't communicate, now can. Real-time ASL β†’ English speech, powered by Qwen3-VL we fine-tuned on AMD MI300X.
```

(126 characters β€” fits comfortably.)

---

## Long Description (form max: 2000 chars, min 600)

```
SignBridge is a real-time American Sign Language β†’ English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI). We fine-tuned Qwen3-VL-8B on a single AMD Instinct MI300X and serve it natively through vLLM's video understanding API.

The user signs at the webcam β€” fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) β€” and SignBridge replies in spoken English. Two people who couldn't communicate, now can.

Architecture: (1) MediaPipe Hand β†’ trained MLP classifier handles static fingerspelling at 90% accuracy, ~50 ms on CPU. (2) For motion words the webcam clip is transcoded with ffmpeg and sent natively to a LoRA-fine-tuned Qwen3-VL-8B via vLLM's video_url block β€” Qwen3-VL processes the clip with its own temporal encoder, no manual frame sampling. The 54-minute LoRA on a single MI300X lifts ASL accuracy from 19% zero-shot to 92% in transformers eval. (3) Qwen3-8B composes recognised tokens into English; gTTS speaks it. Both LLMs run concurrently on the same MI300X via vLLM 0.17.1 on ROCm 7.2.

One MI300X did three jobs on one GPU: ran the LoRA fine-tune (54 min), hosts the merged Qwen3-VL-8B for inference, and hosts the 8B composer in parallel. 192 GB HBM3 means no swapping or sharding. The same workload on H100 (80 GB) needs a 3-GPU cluster.

Fine-tune artefacts (judge-verifiable): merged Qwen3-VL-8B-ASL at huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl; MediaPipe-MLP classifier at huggingface.co/LucasLooTan/signbridge-asl-classifier. Both pulled at runtime via hf_hub_download.

Why it matters: ASL interpreters cost $50–200/hr and are scarce. Sorenson VRS books $4B+/yr filling this gap. SignBridge is MIT-licensed open source β€” any Deaf-led NGO, school, ministry can self-host on their own AMD compute. V1 is ASL-only by design; sign languages aren't interchangeable.

Built solo by Lucas Loo Tan Yu Heng, May 5–11, 2026.
```

(~1980 chars β€” fits the 2000 max with ~20 char buffer.)

---

## Technology & Category Tags

Pick from lablab dropdown:

**Primary (must select):**
- `Qwen` and/or `Qwen3-VL`
- `AMD Developer Cloud`
- `AMD ROCm`
- `HuggingFace Spaces`

**Secondary (relevant):**
- `LLaMA` (no β€” we replaced this with Qwen3-8B; skip)
- `Gradio`
- `FastAPI`
- `Vision`
- `Multimodal`
- `Accessibility`
- `Open Source`
- `vLLM`

**Track:** **Track 3 β€” Vision & Multimodal AI** (also satisfies Track 2 fine-tuning narrative if dual-track allowed)

---

## Pipeline at a glance (May 10 β€” current shipping)

Paste this block anywhere a one-screen architecture summary is needed (lablab form, slide notes, README):

```
- Static fingerspelling: MediaPipe Hand β†’ trained MLP classifier (90% accuracy, ~50 ms on CPU)
- Motion signs: webcam recording β†’ ffmpeg (480p, 8 fps, ≀4 s, H.264) β†’ vLLM /v1/chat/completions
                 with a video_url block β†’ fine-tuned Qwen3-VL-8B on AMD MI300X
- Sentence composer: Qwen3-8B on the same MI300X (vLLM, separate port)
- Speech synthesis: gTTS (Google's free TTS, fast, MP3 output)
- Live demo: HF Space (Gradio Docker SDK) β€” both tabs, end-to-end
```

---

## Cover Image

Upload `assets/cover.png` from the repo (1280Γ—640 PNG, indigoβ†’pink gradient with 🀟 + project name).

---

## Video Presentation

Paste the **YouTube Unlisted URL** of your demo video.

Reference shot list: `docs/demo-video-script.md`.

---

## Slide Presentation

Upload the **deck PDF**.

Build from `docs/pitch-deck.md`:
1. Open Google Slides β†’ blank deck
2. Paste each slide's content into a blank slide
3. File β†’ Download β†’ PDF
4. Upload here

---

## Public GitHub Repository

```
https://github.com/seekerPrice/signbridge
```

---

## Demo Application Platform

```
Hugging Face Space
```

---

## Application URL

```
https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/signbridge
```

---

## Final pre-submit checklist

Before clicking Submit:

- [ ] Title pasted (70 chars)
- [ ] Short description pasted (132 chars)
- [ ] Long description pasted (~350 words)
- [ ] Tags selected (at minimum: Qwen, AMD Developer Cloud, AMD ROCm, HuggingFace Spaces)
- [ ] Cover image uploaded (`assets/cover.png`)
- [ ] Video URL pasted (YouTube unlisted)
- [ ] Pitch deck PDF uploaded
- [ ] GitHub URL pasted
- [ ] HF Space URL pasted
- [ ] **Track selection: Track 3 β€” Vision & Multimodal AI**
- [ ] Open Space in incognito β†’ confirm it loads
- [ ] GitHub repo public + has clean README
- [ ] LICENSE file is MIT

When all boxes ticked β†’ click Submit β†’ wait for confirmation email β†’ done.

**Aim to submit by 2026-05-11 02:00 MYT** (1-hour buffer before the 03:00 cutoff).