metadata
title: LipNet Silent Speech Recognition
emoji: π
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
LipNet: Silent Speech Recognition
Reads lips from video and predicts the spoken text. No audio required.
File Structure
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5   ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg           ← sample videos from GRID corpus
        └── alignments/
            └── s1/
                └── *.align     ← alignment files
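As a quick sanity check on the layout above, the sketch below pairs each sample video with its alignment file. It assumes that a video and its alignment share the same base name (for example `x.mpg` and `x.align`), which this README does not state. The function name `pair_videos_with_alignments` is hypothetical and is not part of `app.py` or `utils.py`.

```python
from pathlib import Path


def pair_videos_with_alignments(data_dir):
    """Return (video, alignment) path pairs that share a file stem.

    Assumes the layout <data_dir>/s1/*.mpg and
    <data_dir>/alignments/s1/*.align shown in the tree above.
    """
    data_dir = Path(data_dir)
    video_dir = data_dir / "s1"
    align_dir = data_dir / "alignments" / "s1"

    pairs = []
    for video in sorted(video_dir.glob("*.mpg")):
        # Assumption: the alignment file reuses the video's base name.
        align = align_dir / (video.stem + ".align")
        if align.is_file():
            pairs.append((video, align))
    return pairs
```

Calling `pair_videos_with_alignments("app/data")` from the repository root would list the usable samples. Videos without a matching alignment file are skipped silently.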
Model
- Input: 75 frames, mouth crop 46×140 px, grayscale, z-score normalized
- Architecture: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC
- Dataset: GRID Corpus Speaker S1
- Vocab: a–z, 1–9, ', ?, !, space (40 chars + CTC blank = 41)
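The vocabulary and CTC output described above can be sketched in plain Python. The listed characters add up to 39 (26 letters, 9 digits, 3 punctuation marks, 1 space), so this sketch assumes the 40th entry is a reserved out-of-vocabulary slot at index 0 and that the CTC blank takes the last index. The character ordering and the names `VOCAB`, `BLANK_INDEX`, and `greedy_ctc_decode` are illustrative assumptions, not taken from `modelutil.py` or `utils.py`.

```python
import string

# Characters listed in the Vocab line: a-z, ', ?, !, 1-9, and space (39 total).
CHARS = list(string.ascii_lowercase) + list("'?!") + list("123456789") + [" "]

# Assumption: index 0 is an out-of-vocabulary placeholder, giving 40 entries.
VOCAB = [""] + CHARS

# Assumption: the CTC blank is the last class, so Dense(41) has indices 0..40.
BLANK_INDEX = len(VOCAB)
NUM_CLASSES = len(VOCAB) + 1


def greedy_ctc_decode(frame_class_ids):
    """Collapse consecutive repeats, then drop blanks (greedy CTC decoding).

    `frame_class_ids` holds one predicted class index per video frame,
    e.g. the per-frame argmax over the 41 output classes.
    """
    decoded = []
    previous = None
    for idx in frame_class_ids:
        if idx != previous and idx != BLANK_INDEX:
            decoded.append(VOCAB[idx])
        previous = idx
    return "".join(decoded)
```

With this index layout, the per-frame sequence `[2, 2, 40, 9, 9, 14, 40]` decodes to `"bin"`. A doubled letter survives only when a blank separates its two runs, which is why CTC needs the extra class.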