---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
# LipNet: Silent Speech Recognition
A deep learning model that reads lips from video and predicts spoken text, with no audio required.
## Model Architecture
- **3× Conv3D** layers for spatiotemporal feature extraction
- **2× Bidirectional LSTM** layers for sequence modelling
- **CTC Loss** for sequence-to-sequence alignment
- Input: 75 frames of the mouth region (46×140 px, grayscale)
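
A model trained with CTC emits one label per frame, including a special blank label, so the per-frame output has to be collapsed into text. The sketch below shows the standard best-path (greedy) rule: merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode` and the `_` blank symbol are illustrative assumptions, and `app.py` may decode differently (for example with beam search).

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank):
    """Collapse consecutive repeated labels, then remove blank labels."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Hypothetical per-frame argmax output; "_" stands for the CTC blank.
frames = list("__bb_iii_nn__")
print("".join(ctc_greedy_decode(frames, blank="_")))  # bin

# A blank between two identical labels keeps both of them.
print("".join(ctc_greedy_decode(list("bl_uu_ee"), blank="_")))  # blue
print("".join(ctc_greedy_decode(list("l_l"), blank="_")))  # ll
```

The blank label is what lets CTC represent doubled letters: without a blank between them, two identical neighbouring labels merge into one.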
## How to Use
1. Upload a short `.mpg` or `.mp4` video showing a frontal face
2. Click **READ LIPS**
3. The predicted sentence appears on the right
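
Uploaded clips will not always contain exactly the 75 frames the model expects. One way to handle this, sketched below, is to truncate longer clips and pad shorter ones with a blank frame. This is an assumption about preprocessing, not a description of what `app.py` does; `fit_frame_count` and `blank_frame` are hypothetical names, and frames are treated as opaque objects here.

```python
NUM_FRAMES = 75  # clip length stated under Model Architecture


def fit_frame_count(frames, blank_frame, target=NUM_FRAMES):
    """Return exactly `target` frames: truncate long clips, pad short ones."""
    fitted = list(frames[:target])
    # Multiplying a list by a non-positive number yields [], so no padding
    # is added when the clip is already long enough.
    fitted.extend([blank_frame] * (target - len(fitted)))
    return fitted


short_clip = [f"frame{i}" for i in range(60)]
long_clip = [f"frame{i}" for i in range(90)]

print(len(fit_frame_count(short_clip, blank_frame="blank")))  # 75
print(len(fit_frame_count(long_clip, blank_frame="blank")))   # 75
```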
## Dataset
Trained on the [GRID Corpus](https://spandh.dcs.shef.ac.uk/gridcorpus/), speaker S1.
Vocabulary: `a-z`, digits `1-9`, punctuation `'?!`, and space. That is 39 characters; the total reaches 40 only if one reserved token (such as an out-of-vocabulary entry) is counted.
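
A minimal sketch of how this vocabulary could be mapped to integer labels and back. Reserving index 0 for unknown characters is an assumption, as are the names `char_to_num`, `num_to_char`, `encode`, and `decode`; the actual app may number the characters differently. The example sentence is only illustrative.

```python
import string

# Characters listed in this README: a-z, the punctuation ' ? !, digits 1-9, space.
VOCAB = list(string.ascii_lowercase + "'?!" + "123456789" + " ")

# Assumption: index 0 is reserved for unknown characters, so real
# characters are numbered from 1.
char_to_num = {ch: i for i, ch in enumerate(VOCAB, start=1)}
num_to_char = {i: ch for ch, i in char_to_num.items()}


def encode(text):
    """Map each character to its index; unknown characters become 0."""
    return [char_to_num.get(ch, 0) for ch in text.lower()]


def decode(indices):
    """Map indices back to characters, skipping the reserved index 0."""
    return "".join(num_to_char.get(i, "") for i in indices)


print(len(VOCAB))  # 39
print(decode(encode("bin blue at f two now")))  # bin blue at f two now
```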
## Files
```
app.py                        ← Gradio app + inference
requirements.txt              ← Dependencies
models/checkpoint.weights.h5  ← Model weights (upload manually)
```