---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---

# LipNet — Silent Speech Recognition

Reads lips from video and predicts spoken text — no audio required.

## File Structure

```
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5   ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg           ← sample videos from GRID corpus
        └── alignments/
            └── s1/
                └── *.align     ← alignment files
```

## Model

- **Input**: 75 frames, mouth crop 46×140 px, grayscale, z-score normalized
- **Architecture**: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC
- **Dataset**: GRID Corpus, Speaker S1
- **Vocab**: a–z, 1–9, `'`, `?`, `!`, space (39 characters; with the OOV token and CTC blank, the output layer has 41 classes)
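
The Dense(41) layer emits one score per class per frame, and the predicted index sequence is collapsed by greedy CTC decoding: merge consecutive repeats, then drop blanks. A minimal stdlib-only sketch — the vocabulary string, blank index, and function name are assumptions for illustration, not taken from the repo's `utils.py`:

```python
# Hypothetical vocabulary; the actual app may build this differently
# (e.g. with a Keras StringLookup layer).
VOCAB = "abcdefghijklmnopqrstuvwxyz'?!123456789 "
BLANK = len(VOCAB)  # CTC blank assumed to be the last class index


def greedy_ctc_decode(frame_indices):
    """Greedy CTC decode: collapse repeated indices, then drop blanks."""
    chars = []
    prev = None
    for idx in frame_indices:
        # Only emit when the index changes and is not the blank token.
        if idx != prev and idx != BLANK:
            chars.append(VOCAB[idx])
        prev = idx
    return "".join(chars)
```

Note that a blank between two identical indices separates them, so `[0, BLANK, 0]` decodes to `"aa"` while `[0, 0]` decodes to `"a"` — that is the behavior CTC relies on to represent doubled letters.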
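
Per the input spec above, each clip of mouth crops is z-score normalized (subtract the mean intensity, divide by the standard deviation) before it reaches the network. A stdlib-only sketch over a flat list of pixel values — the function name is illustrative, and the real pipeline likely operates on tensors instead of lists:

```python
from statistics import mean, pstdev


def zscore(pixels):
    """Z-score normalize pixel intensities: (x - mean) / std."""
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # guard against an all-constant frame
    return [(x - mu) / sigma for x in pixels]
```

After normalization the values have zero mean and unit variance, which keeps the Conv3D activations in a consistent range regardless of the source video's brightness.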