---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---

# LipNet — Silent Speech Recognition

Reads lips from video and predicts the spoken text — no audio required.

## File Structure

```
├── Dockerfile
├── requirements.txt
├── README.md
├── models/
│   └── checkpoint.weights.h5      ← upload your weights here
└── app/
    ├── app.py
    ├── modelutil.py
    ├── utils.py
    └── data/
        ├── s1/
        │   └── *.mpg              ← sample videos from GRID corpus
        └── alignments/
            └── s1/
                └── *.align        ← alignment files
```
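The `.align` files pair each video with its transcript. A minimal sketch of turning one into a sentence, assuming the standard GRID format of one `start end token` line per word, with `sil` marking silence (the helper name `parse_alignment` is illustrative, not from this repo):

```python
def parse_alignment(text: str) -> str:
    """Convert GRID-style .align contents ("start end token" per line)
    into the spoken sentence, dropping the "sil" silence markers."""
    words = []
    for line in text.strip().splitlines():
        _start, _end, token = line.split()
        if token != "sil":
            words.append(token)
    return " ".join(words)

# Example .align contents (format assumed from the GRID corpus conventions):
sample = """0 23750 sil
23750 29500 bin
29500 34000 blue
34000 35500 at
35500 41000 f
41000 47250 two
47250 53000 now
53000 74500 sil"""

sentence = parse_alignment(sample)
```

The frame indices are kept only to skip silence here; a training pipeline would also use them to align words with video frames.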

## Model

- Input: 75 frames, 46×140 px grayscale mouth crop, z-score normalized
- Architecture: Conv3D × 3 → Reshape → BiLSTM × 2 → Dense(41) → CTC
- Dataset: GRID Corpus, Speaker S1
- Vocab: a–z, 1–9, `'`, `?`, `!`, space (39 characters; the OOV token and the CTC blank bring the output layer to 41 units)
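The input normalization and the vocabulary arithmetic can be sketched as follows. Shapes and the character set come from the bullets above; `normalize_clip` is an illustrative helper, not this repo's code:

```python
import numpy as np

# Shapes from the model spec: 75 frames of a 46x140 grayscale mouth crop.
T, H, W = 75, 46, 140

def normalize_clip(frames: np.ndarray) -> np.ndarray:
    """Z-score normalize a clip so pixel values have zero mean, unit variance."""
    frames = frames.astype(np.float32)
    return (frames - frames.mean()) / frames.std()

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(T, H, W)).astype(np.float32)
norm = normalize_clip(clip)

# Character set: 26 letters + 9 digits + ' ? ! + space = 39 characters.
# An OOV token plus the CTC blank account for the Dense(41) output layer.
vocab = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")
```

CTC lets the model emit a per-frame distribution over these 41 classes and collapse repeats and blanks into the final transcript, so no frame-level character labels are needed.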