---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---

# LipNet — Silent Speech Recognition

Reads lips from video and predicts spoken text — no audio required.

## File Structure

```
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5   ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg           ← sample videos from GRID corpus
        └── alignments/
            └── s1/
                └── *.align     ← alignment files
```

## Model

- **Input**: 75 frames, mouth crop 46×140 px, grayscale, z-score normalized
- **Architecture**: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC
- **Dataset**: GRID Corpus, Speaker S1
- **Vocab**: a–z, 1–9, `'`, `?`, `!`, space (39 characters; with the OOV token and CTC blank, the output layer has 41 classes)
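
The Dense(41) layer emits one score per class per frame, and the predicted index sequence is collapsed by greedy CTC decoding: merge consecutive repeats, then drop blanks. A minimal stdlib-only sketch — the vocabulary string, blank index, and function name are assumptions for illustration, not taken from the repo's `utils.py`:

```python
# Hypothetical vocabulary; the actual app may build this differently
# (e.g. with a Keras StringLookup layer).
VOCAB = "abcdefghijklmnopqrstuvwxyz'?!123456789 "
BLANK = len(VOCAB)  # CTC blank assumed to be the last class index


def greedy_ctc_decode(frame_indices):
    """Greedy CTC decode: collapse repeated indices, then drop blanks."""
    chars = []
    prev = None
    for idx in frame_indices:
        # Only emit when the index changes and is not the blank token.
        if idx != prev and idx != BLANK:
            chars.append(VOCAB[idx])
        prev = idx
    return "".join(chars)
```

Note that a blank between two identical indices separates them, so `[0, BLANK, 0]` decodes to `"aa"` while `[0, 0]` decodes to `"a"` — that is the behavior CTC relies on to represent doubled letters.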
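
Per the input spec above, each clip of mouth crops is z-score normalized (subtract the mean intensity, divide by the standard deviation) before it reaches the network. A stdlib-only sketch over a flat list of pixel values — the function name is illustrative, and the real pipeline likely operates on tensors instead of lists:

```python
from statistics import mean, pstdev


def zscore(pixels):
    """Z-score normalize pixel intensities: (x - mean) / std."""
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # guard against an all-constant frame
    return [(x - mu) / sigma for x in pixels]
```

After normalization the values have zero mean and unit variance, which keeps the Conv3D activations in a consistent range regardless of the source video's brightness.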