---
title: LipNet Silent Speech Recognition
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# LipNet: Silent Speech Recognition

A deep learning model that reads lips from video and predicts the spoken text, with no audio required.

## Model Architecture

- **3× Conv3D** layers for spatiotemporal feature extraction
- **2× Bidirectional LSTM** layers for sequence modelling
- **CTC loss** for sequence-to-sequence alignment
- Input: 75 frames of the mouth region (46×140 px, grayscale)

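
A network trained with CTC loss emits one label per frame, including a special blank label, so its output has to be collapsed into text. The sketch below shows greedy CTC decoding on a toy alphabet. The function name `greedy_ctc_decode`, the toy ids, and the choice of `0` as the blank are illustrative assumptions, not the decoding code used in `app.py`.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_char, blank_id):
    """Collapse runs of identical per-frame ids, then drop CTC blanks."""
    # Step 1: merge consecutive repeats (e.g. 1, 1, 1 -> 1).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: remove the blank label and map the remaining ids to characters.
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Hypothetical toy alphabet: id 0 is the blank, ids 1-3 are letters.
toy_alphabet = {1: "b", 2: "i", 3: "n"}

# Per-frame argmax ids, as they might come out of the final softmax.
frames = [0, 1, 1, 0, 2, 2, 2, 3, 0, 0]

decoded = greedy_ctc_decode(frames, toy_alphabet, blank_id=0)
print(decoded)  # bin
```

A blank between two identical ids keeps them separate, so `[1, 0, 1]` decodes to `bb` rather than `b`. This is how CTC represents doubled letters.
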
## How to Use

1. Upload a short `.mpg` or `.mp4` video showing a frontal face
2. Click **READ LIPS**
3. The predicted sentence appears on the right

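
The model input is fixed at 75 frames, while an uploaded clip can be shorter or longer. This README does not say how `app.py` handles that. One common approach, assumed here purely for illustration, is to truncate long clips and repeat the last frame of short ones. The helper name `fit_to_length` is hypothetical.

```python
def fit_to_length(frames, target=75):
    """Return exactly `target` frames: truncate long clips, pad short ones.

    Padding repeats the last available frame (an assumption; zero-padding
    is another common choice).
    """
    if not frames:
        raise ValueError("clip contains no frames")
    clipped = list(frames[:target])
    missing = target - len(clipped)
    return clipped + [clipped[-1]] * missing


# Frames are stood in for by integers here; in practice each item would be
# a 46x140 grayscale image.
short_clip = list(range(60))
long_clip = list(range(100))

print(len(fit_to_length(short_clip)), len(fit_to_length(long_clip)))  # 75 75
```
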
## Dataset

Trained on the [GRID Corpus](https://spandh.dcs.shef.ac.uk/gridcorpus/), speaker S1.

Vocabulary: `a-z`, digits `1-9`, punctuation `'?!` and space (39 characters, or 40 entries counting one reserved token).

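
The vocabulary above can be turned into a character-to-index mapping as sketched below. Counting the listed characters gives 39. The sketch assumes the 40th entry is a reserved index `0`, and the ordering of the characters is also an assumption. Neither is confirmed by this README, and both must match whatever mapping was used at training time.

```python
import string

# Characters in the order the README lists them (ordering is an assumption).
VOCAB = (
    list(string.ascii_lowercase)  # a-z
    + list("123456789")           # digits 1-9
    + list("'?!")                 # punctuation
    + [" "]                       # space
)

# Index 0 is kept free as a reserved token (assumed), so characters start at 1.
char_to_id = {ch: i + 1 for i, ch in enumerate(VOCAB)}
id_to_char = {i: ch for ch, i in char_to_id.items()}


def encode(text):
    """Map a sentence to a list of integer ids."""
    return [char_to_id[ch] for ch in text.lower()]


def decode(ids):
    """Map ids back to text, skipping the reserved id 0."""
    return "".join(id_to_char[i] for i in ids if i != 0)


print(len(VOCAB), len(VOCAB) + 1)  # 39 40
```

A GRID-style sentence round-trips through the mapping: `decode(encode("bin blue at f2 now"))` returns the original string.
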
## Files

```
app.py                        - Gradio app + inference
requirements.txt              - Dependencies
models/checkpoint.weights.h5  - Model weights (upload manually)
```
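
Because the weights file has to be uploaded manually, a missing checkpoint is an easy failure to hit. The sketch below shows one way an app could check for the file before building the model. The helper name `require_weights` is hypothetical and is not taken from `app.py`.

```python
from pathlib import Path

# Relative location of the checkpoint, as listed in the file tree above.
WEIGHTS_PATH = Path("models") / "checkpoint.weights.h5"


def require_weights(root="."):
    """Return the checkpoint path, or fail with a readable error message."""
    path = Path(root) / WEIGHTS_PATH
    if not path.is_file():
        raise FileNotFoundError(
            f"Model weights not found at {path}; "
            "upload them before starting the app."
        )
    return path
```

Calling `require_weights()` at the top of the app surfaces the problem with a clear message instead of a later error from the weight-loading call.
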