metadata
title: Lip Reading
emoji: 👄
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false

Lip Reading

Lip-reading demo using TensorFlow, MediaPipe, and Gradio. Upload a short clip or record with your webcam to get a transcription generated from mouth movements.

Features

  • Gradio UI with upload + webcam tabs
  • TensorFlow model loaded once and reused
  • MediaPipe lip cropping and normalization with frame caps for stability
  • Configurable ports, sharing, model path, and preprocessing thresholds via environment variables
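
The "loaded once and reused" behaviour can be sketched with a small cache around whatever function loads the model. This is a minimal illustration, not the code in lipnet/model.py: make_cached_loader and get_model are hypothetical names, and the loader is passed in so the sketch runs without TensorFlow installed.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(load_fn: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap load_fn so each model path is loaded at most once per process.

    Hypothetical helper: the real app may cache its model differently.
    """

    @lru_cache(maxsize=None)
    def get_model(path: str) -> Any:
        # The expensive load runs only on the first call for a given path;
        # later calls return the cached object.
        return load_fn(path)

    return get_model
```

In the app, the wrapped function would presumably be tf.keras.models.load_model, for example get_model = make_cached_loader(tf.keras.models.load_model), so every Gradio request shares one model instance.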

Quickstart (local)

  1. Create a virtual environment
    • Windows (PowerShell): python -m venv .venv; .\.venv\Scripts\Activate.ps1
    • macOS/Linux: python -m venv .venv && source .venv/bin/activate
  2. Install dependencies
    pip install -r requirements.txt
  3. Run the app
    python app.py
  4. Open the URL printed to the console (default http://127.0.0.1:7860). Set GRADIO_SHARE=true if you need a public link.

Environment variables

  • PORT (default 7860): Port for Gradio.
  • GRADIO_SHARE (true/false, default false): Whether to expose a public link.
  • MAX_VIDEO_SIZE_MB (default 1000): Reject uploads larger than this.
  • LIPNET_MODEL_PATH (default best_model_1_WER.keras): Path to the saved model.
  • LIPNET_TARGET_SIZE (default 85): Target square size for lip crops.
  • LIPNET_MAX_FRAMES (default 160): Max frames processed per video to bound memory/time.
  • LIPNET_DETECTION_CONFIDENCE (default 0.5): MediaPipe detection confidence.
  • LIPNET_TRACKING_CONFIDENCE (default 0.5): MediaPipe tracking confidence.
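
As a rough illustration of how these variables could be read, the sketch below maps each one to a typed field with the default listed above. The Settings class, load_settings function, and the extra truthy spellings accepted for GRADIO_SHARE ("1", "yes", "on") are assumptions made for this example; lipnet/config.py may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumption: accept a few common truthy spellings besides "true".
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; defaults mirror the list above."""
    port: int = 7860
    share: bool = False
    max_video_size_mb: int = 1000
    model_path: str = "best_model_1_WER.keras"
    target_size: int = 85
    max_frames: int = 160
    detection_confidence: float = 0.5
    tracking_confidence: float = 0.5


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from env (os.environ when omitted)."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        port=int(env.get("PORT", d.port)),
        share=_as_bool(env.get("GRADIO_SHARE", str(d.share))),
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", d.max_video_size_mb)),
        model_path=env.get("LIPNET_MODEL_PATH", d.model_path),
        target_size=int(env.get("LIPNET_TARGET_SIZE", d.target_size)),
        max_frames=int(env.get("LIPNET_MAX_FRAMES", d.max_frames)),
        detection_confidence=float(
            env.get("LIPNET_DETECTION_CONFIDENCE", d.detection_confidence)
        ),
        tracking_confidence=float(
            env.get("LIPNET_TRACKING_CONFIDENCE", d.tracking_confidence)
        ),
    )
```

Passing a plain dict instead of os.environ makes the parsing easy to check in isolation, e.g. load_settings({"PORT": "8080", "LIPNET_MAX_FRAMES": "75"}).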

Project structure

app.py                # Entry point
best_model_1_WER.keras  # Trained model weights
lipnet/
  __init__.py
  config.py           # Runtime configuration
  model.py            # Model loading, inference, decoding
  preprocessing.py    # Lip detection, cropping, normalization
  ui.py               # Gradio components and handlers
requirements.txt

Usage tips

  • Keep videos short and ensure the mouth is well-lit and centered.
  • Supported inputs: MP4/AVI/MOV/MPG.
  • A GPU speeds up inference; CPU-only runs work but are slower.
  • If no face is detected, check lighting, camera angle, and framing.
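
A pre-flight check along the following lines can catch unsupported or oversized files before any frames are decoded. It is a sketch under assumptions: validate_video and check_video_file are hypothetical names, the extension set is taken from the supported inputs above, and the size limit corresponds to MAX_VIDEO_SIZE_MB (assumed here to mean mebibytes). The app's own handler may validate uploads differently.

```python
from pathlib import Path
from typing import Optional

# Extensions listed under "Supported inputs" above.
ALLOWED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mpg"}


def validate_video(name: str, size_bytes: int, max_size_mb: float = 1000) -> Optional[str]:
    """Return an error message, or None if the file looks acceptable."""
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type: {suffix or '(none)'}"
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    if size_bytes > max_size_mb * 1024 * 1024:
        return f"File exceeds the {max_size_mb} MB limit"
    return None


def check_video_file(path: str, max_size_mb: float = 1000) -> Optional[str]:
    """Convenience wrapper that reads the size from disk."""
    p = Path(path)
    return validate_video(p.name, p.stat().st_size, max_size_mb)
```

Returning a message rather than raising keeps the helper easy to surface in a UI handler, which can show the text to the user directly.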

Troubleshooting

  • Model file missing: Set LIPNET_MODEL_PATH to the correct .keras file.
  • High memory use/OOM: Lower LIPNET_MAX_FRAMES or reduce input resolution.
  • Webcam not working: Ensure browser permissions are granted for camera access.
  • MediaPipe import error: Reinstall with pip install --force-reinstall mediapipe (version >= 0.10). On Apple Silicon/ARM or CPU-only Windows machines, prefer the latest 0.10.x wheel.
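
Before reinstalling, it can help to confirm which MediaPipe version is actually installed. The sketch below uses the standard-library importlib.metadata to read the installed version and compare it with the 0.10 minimum mentioned above. The helper names (installed_version, parse_major_minor, meets_minimum) are hypothetical and not part of this project.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '0.10.14'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(version: Optional[str], minimum: Tuple[int, int] = (0, 10)) -> bool:
    """True if version is present and at least the given (major, minor)."""
    return version is not None and parse_major_minor(version) >= minimum


if __name__ == "__main__":
    found = installed_version("mediapipe")
    print("mediapipe:", found, "| meets >= 0.10:", meets_minimum(found))
```

If this reports that the package is missing or older than 0.10, the force-reinstall command above is the next step.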