---
title: LipNet Silent Speech Recognition
emoji: π
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---
# LipNet – Silent Speech Recognition

Reads lips from video and predicts the spoken text – no audio required.
## File Structure

```
.
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5   ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg           ← sample videos from the GRID corpus
        └── alignments/
            └── s1/
                └── *.align     ← alignment files
```
## Model

- **Input**: 75 frames per clip, mouth crop of 46×140 px, grayscale, z-score normalized
- **Architecture**: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC
- **Dataset**: GRID corpus, speaker s1
- **Vocab**: a–z, 1–9, `'`, `?`, `!`, space (39 characters + OOV token + CTC blank = 41 outputs)
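The architecture bullet above can be sketched in Keras. This is a minimal reconstruction based on the widely used LipNet recreation for the GRID corpus, not this repo's actual `modelutil.py`: the filter counts (128/256/75), the dropout rates, and the use of `TimeDistributed(Flatten())` for the "Reshape" step are assumptions; only the input shape (75×46×140×1) and the 41-way output are taken from the section above.

```python
# Hypothetical sketch of the LipNet-style model described above.
# Layer sizes are assumptions from the common Keras recreation,
# not read from this repository's code.
import tensorflow as tf
from tensorflow.keras import layers, models


def build_lipnet(frames=75, height=46, width=140, vocab_size=41):
    """Conv3D x 3 -> flatten per frame -> BiLSTM x 2 -> Dense(vocab_size)."""
    return models.Sequential([
        layers.Input(shape=(frames, height, width, 1)),  # grayscale mouth crops
        layers.Conv3D(128, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),                     # pool spatial dims, keep time
        layers.Conv3D(256, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        layers.Conv3D(75, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        layers.TimeDistributed(layers.Flatten()),        # the "Reshape" step
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Dense(vocab_size, activation="softmax"),  # 41 = chars + OOV + CTC blank
    ])
```

A CTC loss is applied over the per-frame softmax during training; at inference the checkpoint would be restored with `model.load_weights("models/checkpoint.weights.h5")` and decoded with `tf.keras.backend.ctc_decode`.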