---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---
# LipNet: Silent Speech Recognition
Reads lips from video and predicts the spoken text; no audio required.
## File Structure
```
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5   ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg           ← sample videos from the GRID corpus
        └── alignments/
            └── s1/
                └── *.align     ← alignment files
```
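The `.align` files follow the GRID corpus convention: one `start end word` triple per line, with `sil` entries marking silence. A minimal parser sketch (the function name and the silence-stripping behavior are illustrative assumptions, not taken from `utils.py`):

```python
def load_alignment_words(path):
    """Parse a GRID-style .align file into the list of spoken words.

    Each line holds "start end word"; "sil" entries mark silence and
    are dropped. (Assumed format -- see utils.py for the project's
    actual loader.)
    """
    words = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:  # skip blank or malformed lines
                continue
            _, _, word = parts
            if word != "sil":
                words.append(word)
    return words
```

Joining the returned words with spaces yields the target transcript used as the CTC label for the matching video.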
## Model
- **Input**: 75 frames, mouth crop 46×140 px, grayscale, z-score normalized
- **Architecture**: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC loss
- **Dataset**: GRID Corpus Speaker S1
- **Vocab**: a–z, 1–9, `'`, `?`, `!`, space (39 characters; with the OOV token and the CTC blank, the output layer has 41 units)
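The input bullet above can be sketched numerically. This is a hedged NumPy illustration of the z-score step only; the shape follows the bullet list, but the actual mouth cropping and casting live in `utils.py`:

```python
import numpy as np

def normalize_clip(frames):
    """Z-score normalize a stacked mouth-crop clip.

    frames: array of shape (75, 46, 140, 1) -- 75 grayscale mouth
    crops. Normalization uses the clip's own mean and standard
    deviation, matching the "z-score normalized" bullet above.
    """
    frames = frames.astype(np.float32)
    mean = frames.mean()
    std = frames.std()
    return (frames - mean) / (std + 1e-8)  # epsilon guards an all-constant clip

# Example: a random stand-in for one real 75-frame clip
clip = np.random.rand(75, 46, 140, 1)
norm = normalize_clip(clip)
```

After this step the clip has roughly zero mean and unit variance, which is what the Conv3D stack expects at training and inference time.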