---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# LipNet — Silent Speech Recognition

A deep learning model that reads lips from video and predicts spoken text — no audio required.

## Model Architecture

- **3× Conv3D** layers for spatiotemporal feature extraction
- **2× Bidirectional LSTM** layers for sequence modelling
- **CTC loss** for sequence-to-sequence alignment
- Input: 75 frames of the mouth region (46×140 px, grayscale)

## How to Use

1. Upload a short `.mpg` or `.mp4` video showing a frontal face
2. Click **READ LIPS**
3. The predicted sentence appears on the right

## Dataset

Trained on the [GRID Corpus](https://spandh.dcs.shef.ac.uk/gridcorpus/) — Speaker S1. Vocabulary: `a-z`, digits `1-9`, punctuation `'?!`, and space (39 characters; with the CTC blank token, 40 output classes).

## Files

```
app.py                        ← Gradio app + inference
requirements.txt              ← Dependencies
models/checkpoint.weights.h5  ← Model weights (upload manually)
```
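The 75-frame input described under Model Architecture implies a padding/truncation step before inference. The following is a minimal sketch of that preprocessing, assuming zero-padding and per-clip mean/std normalization (a common LipNet preprocessing choice; the actual logic lives in `app.py` and may differ):

```python
import numpy as np

TARGET_FRAMES, H, W = 75, 46, 140  # input shape expected by the model

def prepare_clip(frames):
    """Pad or truncate a stack of grayscale mouth crops to 75 frames
    and normalize, returning shape (75, 46, 140, 1) for the Conv3D stack.

    frames: array-like of shape (T, 46, 140), dtype uint8 or float.
    """
    frames = np.asarray(frames, dtype=np.float32)[:TARGET_FRAMES]
    if frames.shape[0] < TARGET_FRAMES:
        # zero-pad short clips at the end (an assumption, not necessarily app.py's choice)
        pad = np.zeros((TARGET_FRAMES - frames.shape[0], H, W), np.float32)
        frames = np.concatenate([frames, pad], axis=0)
    # per-clip standardization
    frames = (frames - frames.mean()) / (frames.std() + 1e-8)
    return frames[..., None]  # add trailing channel axis
```

A clip of any length comes out as a fixed `(75, 46, 140, 1)` tensor, which is why uploads of slightly different durations all work.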
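The CTC alignment mentioned above is resolved at inference time by best-path (greedy) decoding: take the argmax class per frame, collapse consecutive repeats, then drop the blank token. A minimal sketch, assuming blank index 0 (the function name and blank convention are illustrative, not taken from `app.py`):

```python
def ctc_greedy_decode(frame_indices, blank=0):
    """Standard CTC best-path decoding.

    frame_indices: per-frame argmax of the model's softmax output.
    Collapses consecutive repeats, then removes the blank symbol.
    """
    out = []
    prev = None
    for idx in frame_indices:
        if idx != prev and idx != blank:  # new non-blank symbol
            out.append(idx)
        prev = idx
    return out
```

Note that a repeated character in the transcript (e.g. "ll") is only recoverable if the model emits a blank between the two occurrences — that is the role of the blank token in CTC.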
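The character set from the Dataset section can be sketched as a simple lookup table. The ordering below (letters, then punctuation, digits, space) follows common LipNet implementations and is an assumption — `app.py` may order the vocabulary differently; index 0 is reserved for the CTC blank, giving 40 output classes:

```python
# Hypothetical vocabulary ordering; verify against app.py before relying on it.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")  # 39 characters

# Characters start at index 1; index 0 is the CTC blank (40 classes total).
char_to_num = {c: i + 1 for i, c in enumerate(VOCAB)}
num_to_char = {i + 1: c for i, c in enumerate(VOCAB)}

def encode(text):
    """Map a sentence to integer labels for CTC training."""
    return [char_to_num[c] for c in text]

def decode(labels):
    """Map integer labels back to text, skipping unknown indices."""
    return "".join(num_to_char.get(i, "") for i in labels)
```

GRID sentences such as "bin blue at f two now" round-trip through `encode`/`decode` unchanged.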