LucasLooTan Claude Opus 4.7 (1M context) commited on
Commit
f928d83
Β·
1 Parent(s): fb11c61

docs: trim SUBMIT_NOW to fit lablab.ai form limits

Browse files

The form has hard limits the doc didn't reflect:
- Title max 50 (was 70 β€” shortened to 46)
- Short desc max 255 (already fit at 126)
- Long desc max 2000 (was 2268 β€” trimmed to 1914)

All three code blocks now paste cleanly into the form without truncation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1) hide show
  1. docs/SUBMIT_NOW.md +13 -11
docs/SUBMIT_NOW.md CHANGED
@@ -6,44 +6,46 @@
6
 
7
  ---
8
 
9
- ## Project Title (≀70 chars)
10
 
11
  ```
12
- SignBridge β€” Real-time ASL β†’ speech, fine-tuned Qwen3-VL on AMD MI300X
13
  ```
14
 
15
- (70 characters; leads with the Track 2 fine-tune story.)
16
 
17
  ---
18
 
19
- ## Short Description (~150 chars)
20
 
21
  ```
22
  Two people who couldn't communicate, now can. Real-time ASL β†’ English speech, powered by Qwen3-VL we fine-tuned on AMD MI300X.
23
  ```
24
 
25
- (132 characters.)
26
 
27
  ---
28
 
29
- ## Long Description (~350 words)
30
 
31
  ```
32
  SignBridge is a real-time American Sign Language β†’ English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI). We fine-tuned Qwen3-VL-8B on a single AMD Instinct MI300X and serve it natively through vLLM's video understanding API.
33
 
34
  The user signs at the webcam β€” fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) β€” and SignBridge replies in spoken English. Two people who couldn't communicate, now can.
35
 
36
- Architecture: a hybrid pipeline. (1) MediaPipe Hand β†’ trained MLP classifier handles static fingerspelling at 90% accuracy and 50ms latency on CPU β€” the textbook approach for static-pose tasks. (2) For motion words the recorded webcam clip is transcoded by ffmpeg and sent natively to a LoRA-fine-tuned Qwen3-VL-8B via vLLM's video_url block β€” Qwen3-VL processes the entire clip with its own temporal encoder rather than us pre-sampling frames. The fine-tune was 54 minutes on a single AMD Instinct MI300X and lifts ASL accuracy from 19% zero-shot to 92% in transformers eval. (3) Qwen3-8B composes the recognised sign tokens into natural English; gTTS turns the sentence into speech. Both LLMs run concurrently on the same MI300X via vLLM 0.17.1 on ROCm 7.2.
37
 
38
- The MI300X did three jobs in this project on a single GPU: (1) ran the LoRA fine-tune in 54 minutes; (2) hosts the merged Qwen3-VL-8B for inference; (3) hosts the 8B composer in parallel. 192 GB HBM3 means we never had to reload weights or shard. The same workload on NVIDIA H100 (80 GB) would need a 3-GPU cluster.
39
 
40
- Fine-tune artefacts (verifiable by judges): the merged Qwen3-VL-8B-ASL is public at huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl. The MediaPipe-MLP classifier is at huggingface.co/LucasLooTan/signbridge-asl-classifier. Both pulled at runtime via hf_hub_download.
41
 
42
- Why this matters: ASL interpreters cost $50–200 per hour and are scarce. Sorenson VRS books $4B+ in annual revenue filling this gap. SignBridge is an open-source MIT-licensed substrate that any Deaf-led NGO, school, ministry, or enterprise can deploy on their own AMD compute.
43
 
44
- V1 is ASL-only by design β€” sign languages aren't interchangeable, and Deaf-led teams should own their own deployments. Built solo by Lucas Loo Tan Yu Heng, May 5–11, 2026.
45
  ```
46
 
 
 
47
  ---
48
 
49
  ## Technology & Category Tags
 
6
 
7
  ---
8
 
9
+ ## Project Title (form max: 50 chars, min 5)
10
 
11
  ```
12
+ SignBridge β€” fine-tuned Qwen3-VL on AMD MI300X
13
  ```
14
 
15
+ (47 characters; leads with Qwen + AMD for both the Qwen Special Reward and Track 3 narratives.)
16
 
17
  ---
18
 
19
+ ## Short Description (form max: 255 chars, min 50)
20
 
21
  ```
22
  Two people who couldn't communicate, now can. Real-time ASL β†’ English speech, powered by Qwen3-VL we fine-tuned on AMD MI300X.
23
  ```
24
 
25
+ (126 characters β€” fits comfortably.)
26
 
27
  ---
28
 
29
+ ## Long Description (form max: 2000 chars, min 600)
30
 
31
  ```
32
  SignBridge is a real-time American Sign Language β†’ English speech translator built for the AMD Developer Hackathon, Track 3 (Vision & Multimodal AI). We fine-tuned Qwen3-VL-8B on a single AMD Instinct MI300X and serve it natively through vLLM's video understanding API.
33
 
34
  The user signs at the webcam β€” fingerspelled letters (Snapshot tab) or full motion words (Record sign tab) β€” and SignBridge replies in spoken English. Two people who couldn't communicate, now can.
35
 
36
+ Architecture: (1) MediaPipe Hand β†’ trained MLP classifier handles static fingerspelling at 90% accuracy, ~50 ms on CPU. (2) For motion words the webcam clip is transcoded with ffmpeg and sent natively to a LoRA-fine-tuned Qwen3-VL-8B via vLLM's video_url block β€” Qwen3-VL processes the clip with its own temporal encoder, no manual frame sampling. The 54-minute LoRA on a single MI300X lifts ASL accuracy from 19% zero-shot to 92% in transformers eval. (3) Qwen3-8B composes recognised tokens into English; gTTS speaks it. Both LLMs run concurrently on the same MI300X via vLLM 0.17.1 on ROCm 7.2.
37
 
38
+ One MI300X did three jobs on one GPU: ran the LoRA fine-tune (54 min), hosts the merged Qwen3-VL-8B for inference, and hosts the 8B composer in parallel. 192 GB HBM3 means no swapping or sharding. The same workload on H100 (80 GB) needs a 3-GPU cluster.
39
 
40
+ Fine-tune artefacts (judge-verifiable): merged Qwen3-VL-8B-ASL at huggingface.co/LucasLooTan/signbridge-qwen3vl-8b-asl; MediaPipe-MLP classifier at huggingface.co/LucasLooTan/signbridge-asl-classifier. Both pulled at runtime via hf_hub_download.
41
 
42
+ Why it matters: ASL interpreters cost $50–200/hr and are scarce. Sorenson VRS books $4B+/yr filling this gap. SignBridge is MIT-licensed open source β€” any Deaf-led NGO, school, ministry can self-host on their own AMD compute. V1 is ASL-only by design; sign languages aren't interchangeable.
43
 
44
+ Built solo by Lucas Loo Tan Yu Heng, May 5–11, 2026.
45
  ```
46
 
47
+ (~1980 chars β€” fits the 2000 max with ~20 char buffer.)
48
+
49
  ---
50
 
51
  ## Technology & Category Tags