---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---

# LipNet: Silent Speech Recognition

Reads lips from video and predicts the spoken text; no audio required.

## File Structure
```
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5      ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg              ← sample videos from GRID corpus
        └── alignments/
            └── s1/
                └── *.align        ← alignment files
```
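
The layout above suggests that each sample video under `app/data/s1/` has an alignment file under `app/data/alignments/s1/`. A minimal sketch of pairing them follows. It assumes the two files share a file stem, which the tree does not state explicitly, and the helper names `alignment_path_for` and `list_pairs` are hypothetical rather than taken from `utils.py`.

```python
from pathlib import Path


def alignment_path_for(video_path, data_root):
    """Map <data_root>/<speaker>/<name>.mpg to <data_root>/alignments/<speaker>/<name>.align.

    Assumption: a video and its alignment file share the same stem.
    """
    video_path = Path(video_path)
    speaker = video_path.parent.name  # e.g. "s1"
    return Path(data_root) / "alignments" / speaker / (video_path.stem + ".align")


def list_pairs(data_root, speaker="s1"):
    """Return (video, alignment) pairs for every .mpg that has a matching .align file."""
    data_root = Path(data_root)
    pairs = []
    for video in sorted((data_root / speaker).glob("*.mpg")):
        align = alignment_path_for(video, data_root)
        if align.exists():
            pairs.append((video, align))
    return pairs
```

Calling `list_pairs("app/data")` from the repository root would then return only the videos that have an alignment file, so a missing `.align` file is skipped rather than raising an error.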

## Model
- **Input**: 75 frames, mouth crop 46×140 px, grayscale, z-score normalized
- **Architecture**: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC
- **Dataset**: GRID Corpus Speaker S1
- **Vocab**: a–z, 1–9, `'`, `?`, `!`, space (40 chars + CTC blank = 41)
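
To show how the 41 output classes could map back to text, here is a minimal pure-Python sketch of greedy CTC decoding. It is not the code in `modelutil.py` or `utils.py`. The characters listed in the Vocab bullet add up to 39, so the sketch assumes the 40th entry is a reserved out-of-vocabulary slot at index 0, and that the CTC blank takes the last index (40). Both are assumptions about the index layout. `one_hot` is a hypothetical helper used only to build demo input.

```python
from itertools import groupby

# Characters listed in the Vocab bullet: a-z, 1-9, apostrophe, ?, !, space.
CHARS = list("abcdefghijklmnopqrstuvwxyz") + list("123456789") + ["'", "?", "!", " "]

# Assumption: index 0 is a reserved out-of-vocabulary slot (which would account
# for the stated count of 40) and the CTC blank takes the last index.
OOV_INDEX = 0
INDEX_TO_CHAR = {i + 1: c for i, c in enumerate(CHARS)}
BLANK_INDEX = len(CHARS) + 1   # 40
NUM_CLASSES = BLANK_INDEX + 1  # 41, matching Dense(41)


def greedy_ctc_decode(frame_scores):
    """Decode per-frame class scores (one sequence of NUM_CLASSES floats per frame)."""
    # 1. Pick the highest-scoring class in every frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse runs of consecutive identical indices into one.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop the blank and the out-of-vocabulary slot, then map to characters.
    return "".join(INDEX_TO_CHAR[index] for index in collapsed if index in INDEX_TO_CHAR)


def one_hot(index):
    """Hypothetical helper: a score vector with all mass on one class."""
    return [1.0 if k == index else 0.0 for k in range(NUM_CLASSES)]


if __name__ == "__main__":
    # Frames read "b", "b", blank, "i", "n", "n": repeats merge and the blank is dropped.
    b, i, n = (CHARS.index(c) + 1 for c in "bin")
    frames = [one_hot(k) for k in (b, b, BLANK_INDEX, i, n, n)]
    print(len(CHARS), NUM_CLASSES, greedy_ctc_decode(frames))  # 39 41 bin
```

Greedy decoding keeps the sketch short. A blank between two identical characters is what lets doubled letters survive the collapse step, so "l", blank, "l" decodes to "ll" rather than "l".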