---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# LipNet: Silent Speech Recognition

A deep learning model that reads lips from video and predicts the spoken text, with no audio required.

## Model Architecture
- **3× Conv3D** layers for spatiotemporal feature extraction
- **2× Bidirectional LSTM** layers for sequence modelling
- **CTC loss** (Connectionist Temporal Classification) to align per-frame predictions with the target character sequence without frame-level labels
- Input: 75 frames of the mouth region (46×140 px, grayscale)
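
The layer list above can be sketched in Keras as follows. Only the layer types and the 75 × 46 × 140 × 1 input come from this README; the filter counts, 3×3×3 kernels, spatial-only max-pooling, 128 LSTM units, dropout rate and the 41-class output (40 vocabulary entries plus a CTC blank) are assumptions, and `build_lipnet_sketch` is a hypothetical name, not a function in `app.py`.

```python
# Minimal sketch of the layer stack; sizes are assumptions, not app.py's values.
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH = 75, 46, 140  # input shape from this README
NUM_CLASSES = 41  # assumed: 40 vocabulary entries + 1 CTC blank


def build_lipnet_sketch():
    inputs = layers.Input(shape=(NUM_FRAMES, HEIGHT, WIDTH, 1))
    x = inputs
    # Three Conv3D blocks extract spatiotemporal features; pooling only the
    # spatial axes with (1, 2, 2) keeps all 75 time steps for the CTC loss.
    for filters in (128, 256, 75):
        x = layers.Conv3D(filters, kernel_size=3, padding="same", activation="relu")(x)
        x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    # Flatten each frame's feature map into one vector per time step.
    x = layers.TimeDistributed(layers.Flatten())(x)
    # Two bidirectional LSTMs read the frame sequence in both directions.
    for _ in range(2):
        x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
        x = layers.Dropout(0.5)(x)
    # Per-frame probability distribution over characters, consumed by CTC.
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(inputs, outputs, name="lipnet_sketch")
```

Since saved weights are matched to the model's layer structure, `model.load_weights("models/checkpoint.weights.h5")` would only succeed if this sketch matched the trained architecture exactly.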

## How to Use
1. Upload a short `.mpg` or `.mp4` video showing a frontal face
2. Click **READ LIPS**
3. The predicted sentence appears on the right
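
Before prediction, an uploaded clip has to become the 75-frame, 46×140 grayscale input the model expects. Below is a minimal NumPy sketch, assuming the frames are already decoded into RGB arrays and the mouth-box position is known. The helper `prepare_clip`, its `top`/`left` corner, the BT.601 grayscale weights, per-clip standardization and zero-padding are all assumptions, not taken from `app.py`.

```python
import numpy as np

NUM_FRAMES = 75             # frames per clip, from this README
MOUTH_H, MOUTH_W = 46, 140  # mouth crop size, from this README
# Assumed BT.601 luma weights for RGB -> grayscale.
LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def prepare_clip(frames, top, left):
    """Turn RGB frames of shape (T, H, W, 3) into a (75, 46, 140, 1) array.

    `top` and `left` are a hypothetical, pre-computed mouth-box corner;
    how the real app locates the mouth is not covered here.
    """
    clip = np.asarray(frames, dtype=np.float32)
    gray = clip @ LUMA  # contract the RGB axis -> (T, H, W)
    # Keep at most 75 frames and crop the mouth box.
    mouth = gray[:NUM_FRAMES, top:top + MOUTH_H, left:left + MOUTH_W]
    if mouth.shape[1:] != (MOUTH_H, MOUTH_W):
        raise ValueError("mouth box falls outside the frame")
    # Assumed normalization: zero mean, unit variance over the whole clip.
    mouth = (mouth - mouth.mean()) / (mouth.std() + 1e-6)
    # Zero-pad short clips at the end so there are exactly 75 frames.
    missing = NUM_FRAMES - mouth.shape[0]
    if missing > 0:
        pad = np.zeros((missing, MOUTH_H, MOUTH_W), dtype=np.float32)
        mouth = np.concatenate([mouth, pad])
    return mouth[..., np.newaxis]  # add the single grayscale channel
```

Padding after standardization means the padded frames sit at the clip mean (zero) rather than skewing the statistics of the real frames.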

## Dataset
Trained on the [GRID Corpus](https://spandh.dcs.shef.ac.uk/gridcorpus/), speaker S1.  
Vocabulary: `a-z`, digits `1-9`, punctuation `'?!` and space: 39 characters, or 40 vocabulary entries with one reserved token.
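
The characters listed above (26 letters, 9 digits, 3 punctuation marks and the space) map to integer class indices, and the model's per-frame CTC output has to be decoded back into text. This is a minimal sketch of greedy (best-path) decoding, assuming index 0 is a reserved empty entry and the CTC blank is the final class, the convention of `tf.keras.backend.ctc_decode`. The actual index layout in `app.py` may differ, and `greedy_ctc_decode` is a hypothetical helper name.

```python
# Character set listed in this README: a-z, 1-9, ' ? ! and space.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")

# Assumed index layout: 0 is a reserved empty entry, characters follow,
# and the CTC blank is the last class.
IDX_TO_CHAR = [""] + VOCAB
CHAR_TO_IDX = {ch: i for i, ch in enumerate(IDX_TO_CHAR)}
BLANK = len(IDX_TO_CHAR)  # the model would emit len(IDX_TO_CHAR) + 1 classes


def greedy_ctc_decode(frame_probs):
    """Best-path CTC decoding of a per-frame list of class probabilities."""
    text = []
    prev = None
    for row in frame_probs:
        idx = max(range(len(row)), key=row.__getitem__)  # most likely class
        # Merge repeated classes and drop blanks.
        if idx != prev and idx != BLANK:
            text.append(IDX_TO_CHAR[idx])
        prev = idx
    return "".join(text)
```

For example, the per-frame path `b b <blank> i n <blank> n` decodes to `binn`: repeats merge unless a blank separates them. Beam-search decoding can be more accurate than this single greedy pass.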

## Files
```
app.py                        ← Gradio app + inference
requirements.txt              ← Dependencies
models/checkpoint.weights.h5  ← Model weights (upload manually)
```