---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# LipNet — Silent Speech Recognition

A deep learning model that reads lips from video and predicts spoken text — no audio required.

## Model Architecture

- **3× Conv3D** layers for spatiotemporal feature extraction
- **2× Bidirectional LSTM** layers for sequence modelling
- **CTC loss** for sequence-to-sequence alignment
- Input: 75 frames of the mouth region (46×140 px, grayscale)

## How to Use

1. Upload a short `.mpg` or `.mp4` video showing a frontal face
2. Click **READ LIPS**
3. The predicted sentence appears on the right

## Dataset

Trained on the [GRID Corpus](https://spandh.dcs.shef.ac.uk/gridcorpus/) — Speaker S1. Vocabulary: `a-z`, digits `1-9`, punctuation `'?!`, and space (39 characters; with the CTC blank token, 40 output classes).

## Files

```
app.py                        ← Gradio app + inference
requirements.txt              ← Dependencies
models/checkpoint.weights.h5  ← Model weights (upload manually)
```
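The 75-frame input described under Model Architecture implies a padding/truncation step before inference. The following is a minimal sketch of that preprocessing, assuming zero-padding and per-clip mean/std normalization (a common LipNet preprocessing choice; the actual logic lives in `app.py` and may differ):

```python
import numpy as np

TARGET_FRAMES, H, W = 75, 46, 140  # input shape expected by the model

def prepare_clip(frames):
    """Pad or truncate a stack of grayscale mouth crops to 75 frames
    and normalize, returning shape (75, 46, 140, 1) for the Conv3D stack.

    frames: array-like of shape (T, 46, 140), dtype uint8 or float.
    """
    frames = np.asarray(frames, dtype=np.float32)[:TARGET_FRAMES]
    if frames.shape[0] < TARGET_FRAMES:
        # zero-pad short clips at the end (an assumption, not necessarily app.py's choice)
        pad = np.zeros((TARGET_FRAMES - frames.shape[0], H, W), np.float32)
        frames = np.concatenate([frames, pad], axis=0)
    # per-clip standardization
    frames = (frames - frames.mean()) / (frames.std() + 1e-8)
    return frames[..., None]  # add trailing channel axis
```

A clip of any length comes out as a fixed `(75, 46, 140, 1)` tensor, which is why uploads of slightly different durations all work.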
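The CTC alignment mentioned above is resolved at inference time by best-path (greedy) decoding: take the argmax class per frame, collapse consecutive repeats, then drop the blank token. A minimal sketch, assuming blank index 0 (the function name and blank convention are illustrative, not taken from `app.py`):

```python
def ctc_greedy_decode(frame_indices, blank=0):
    """Standard CTC best-path decoding.

    frame_indices: per-frame argmax of the model's softmax output.
    Collapses consecutive repeats, then removes the blank symbol.
    """
    out = []
    prev = None
    for idx in frame_indices:
        if idx != prev and idx != blank:  # new non-blank symbol
            out.append(idx)
        prev = idx
    return out
```

Note that a repeated character in the transcript (e.g. "ll") is only recoverable if the model emits a blank between the two occurrences — that is the role of the blank token in CTC.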
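The character set from the Dataset section can be sketched as a simple lookup table. The ordering below (letters, then punctuation, digits, space) follows common LipNet implementations and is an assumption — `app.py` may order the vocabulary differently; index 0 is reserved for the CTC blank, giving 40 output classes:

```python
# Hypothetical vocabulary ordering; verify against app.py before relying on it.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")  # 39 characters

# Characters start at index 1; index 0 is the CTC blank (40 classes total).
char_to_num = {c: i + 1 for i, c in enumerate(VOCAB)}
num_to_char = {i + 1: c for i, c in enumerate(VOCAB)}

def encode(text):
    """Map a sentence to integer labels for CTC training."""
    return [char_to_num[c] for c in text]

def decode(labels):
    """Map integer labels back to text, skipping unknown indices."""
    return "".join(num_to_char.get(i, "") for i in labels)
```

GRID sentences such as "bin blue at f two now" round-trip through `encode`/`decode` unchanged.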