---
title: LipNet Silent Speech Recognition
emoji: 👄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# LipNet: Silent Speech Recognition

A deep learning model that reads lips from video and predicts the spoken text; no audio required.

## Model Architecture

- 3× Conv3D layers for spatiotemporal feature extraction
- 2× Bidirectional LSTM layers for sequence modelling
- CTC loss for sequence-to-sequence alignment
- Input: 75 frames of the mouth region (46×140 px, grayscale)

## How to Use

1. Upload a short `.mpg` or `.mp4` video showing a frontal face.
2. Click **READ LIPS**.
3. The predicted sentence appears on the right.
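Whatever clip you upload, the app has to normalize it to the model's fixed input shape of 75 grayscale frames of 46×140 px. A hedged NumPy sketch of one way to do that; the function name and zero-padding strategy are assumptions for illustration, not necessarily what `app.py` does:

```python
import numpy as np

TARGET_FRAMES = 75       # fixed temporal length expected by the model
FRAME_SHAPE = (46, 140)  # mouth-region crop, grayscale

def fit_to_length(frames: np.ndarray) -> np.ndarray:
    """Pad with zero frames or truncate a (T, 46, 140) clip to exactly 75 frames."""
    t = frames.shape[0]
    if t >= TARGET_FRAMES:
        return frames[:TARGET_FRAMES]
    pad = np.zeros((TARGET_FRAMES - t, *FRAME_SHAPE), dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)

clip = np.random.rand(60, 46, 140).astype("float32")  # a 60-frame clip
print(fit_to_length(clip).shape)  # (75, 46, 140)
```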

## Dataset

Trained on the GRID Corpus (speaker S1).
Vocabulary: a-z, digits 1-9, the punctuation marks `'?!`, and space (40 characters total).
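For CTC training and decoding, each character maps to an integer index. A pure-Python sketch of the lookup tables: note the listed characters number 39; reserving index 0 for an extra out-of-vocabulary/blank slot (the convention of Keras's `StringLookup`, and my assumption for where the stated total of 40 comes from) fills out the vocabulary:

```python
# Characters listed above: a-z (26), '?! (3), digits 1-9 (9), space (1) = 39.
vocab = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")

# Index 0 is reserved here (e.g. for an OOV/blank token), so real characters
# start at 1 -- an assumption mirroring Keras's StringLookup convention.
char_to_num = {c: i + 1 for i, c in enumerate(vocab)}
num_to_char = {i: c for c, i in char_to_num.items()}

print(len(vocab))                       # 39 listed characters
print([char_to_num[c] for c in "bin"])  # [2, 9, 14]
```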

## Files

```
app.py                        ← Gradio app + inference
requirements.txt              ← Dependencies
models/checkpoint.weights.h5  ← Model weights (upload manually)
```