
metadata

title: Lip Reading
emoji: 👄
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false

Lip Reading

Lip-reading demo using TensorFlow, MediaPipe, and Gradio. Upload a short clip or record with your webcam to get a transcription generated from mouth movements.

Features

Gradio UI with upload + webcam tabs
TensorFlow model loaded once and reused
MediaPipe lip cropping and normalization with frame caps for stability
Configurable ports, sharing, model path, and preprocessing thresholds via environment variables
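
The "loaded once and reused" behaviour can be sketched with a cached loader. This is a minimal illustration, not the code in lipnet/model.py: `make_cached_loader`, `get_model`, and `_load_keras_model` are hypothetical names, and the only real library call is `tf.keras.models.load_model`.

```python
from functools import lru_cache
from typing import Any, Callable


def _load_keras_model(path: str) -> Any:
    # TensorFlow is imported lazily so the module can be imported
    # (and the caching tested) without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.models.load_model(path)


def make_cached_loader(
    loader: Callable[[str], Any] = _load_keras_model,
) -> Callable[[str], Any]:
    """Return a function that caches the most recently loaded model."""

    @lru_cache(maxsize=1)
    def get_model(path: str) -> Any:
        # Runs only on a cache miss; repeated calls with the same
        # path return the already-loaded object.
        return loader(path)

    return get_model
```

A request handler would call `get_model(path)` on every request and pay the load cost only the first time. With `maxsize=1`, switching to a different path evicts the previous model, which keeps memory bounded.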

Quickstart (local)

Create a virtual environment
- Windows (PowerShell): python -m venv .venv; .\.venv\Scripts\Activate.ps1
- macOS/Linux: python -m venv .venv && source .venv/bin/activate
Install dependencies
pip install -r requirements.txt
Run the app
python app.py
Open the URL printed to the console (default http://127.0.0.1:7860). Set GRADIO_SHARE=true if you need a public link.

Environment variables

PORT (default 7860): Port for Gradio.
GRADIO_SHARE (true/false, default false): Whether to expose a public link.
MAX_VIDEO_SIZE_MB (default 1000): Reject uploads larger than this.
LIPNET_MODEL_PATH (default best_model_1_WER.keras): Path to the saved model.
LIPNET_TARGET_SIZE (default 85): Target square size for lip crops.
LIPNET_MAX_FRAMES (default 160): Max frames processed per video to bound memory/time.
LIPNET_DETECTION_CONFIDENCE (default 0.5): MediaPipe detection confidence.
LIPNET_TRACKING_CONFIDENCE (default 0.5): MediaPipe tracking confidence.
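
One way such variables could be read is sketched below, using the names and defaults listed above. The `Settings` class, `load_settings`, and `_as_bool` are hypothetical and may not match lipnet/config.py. Accepting `1`, `yes`, and `on` as true values is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumption: a few common truthy spellings are accepted.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    port: int = 7860
    share: bool = False
    max_video_size_mb: int = 1000
    model_path: str = "best_model_1_WER.keras"
    target_size: int = 85
    max_frames: int = 160
    detection_confidence: float = 0.5
    tracking_confidence: float = 0.5


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        port=int(env.get("PORT", d.port)),
        share=_as_bool(env.get("GRADIO_SHARE", str(d.share))),
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", d.max_video_size_mb)),
        model_path=env.get("LIPNET_MODEL_PATH", d.model_path),
        target_size=int(env.get("LIPNET_TARGET_SIZE", d.target_size)),
        max_frames=int(env.get("LIPNET_MAX_FRAMES", d.max_frames)),
        detection_confidence=float(
            env.get("LIPNET_DETECTION_CONFIDENCE", d.detection_confidence)
        ),
        tracking_confidence=float(
            env.get("LIPNET_TRACKING_CONFIDENCE", d.tracking_confidence)
        ),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check, for example `load_settings({"PORT": "8080", "LIPNET_MAX_FRAMES": "80"})`.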

Project structure

app.py                # Entry point
best_model_1_WER.keras # Trained model weights
lipnet/
  __init__.py
  config.py           # Runtime configuration
  model.py            # Model loading, inference, decoding
  preprocessing.py    # Lip detection, cropping, normalization
  ui.py               # Gradio components and handlers
requirements.txt

Usage tips

Keep videos short and ensure the mouth is well-lit and centered.
Supported inputs: MP4/AVI/MOV/MPG.
A GPU speeds up inference; CPU-only also works but is slower.
If no face is detected, check lighting, camera angle, and framing.
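
The supported formats and the MAX_VIDEO_SIZE_MB limit suggest a pre-check along the following lines. This is a sketch only: `validate_upload` and `SUPPORTED_EXTENSIONS` are hypothetical names, the check looks at the file extension rather than the container contents, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Extensions taken from the "Supported inputs" tip above.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mpg"}


def validate_upload(path: str, max_size_mb: int = 1000) -> None:
    """Raise ValueError if the file type is unsupported or the file is too large."""
    file = Path(path)
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {file.suffix or '(none)'}")

    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    size_mb = file.stat().st_size / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValueError(f"Video is {size_mb:.1f} MB; limit is {max_size_mb} MB")
```

Running a check like this before decoding avoids spending time on files that would be rejected anyway.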

Troubleshooting

Model file missing: Set LIPNET_MODEL_PATH to the correct .keras file.
High memory use/OOM: Lower LIPNET_MAX_FRAMES or reduce input resolution.
Webcam not working: Ensure browser permissions are granted for camera access.
MediaPipe import error: Reinstall with pip install --force-reinstall "mediapipe>=0.10". On Apple/ARM or CPU-only Windows machines, prefer the latest 0.10.x wheel.
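
To see why lowering LIPNET_MAX_FRAMES helps with memory, the size of the preprocessed model input can be estimated. The sketch below assumes float32 values and single-channel (grayscale) crops, neither of which is stated in this README. `cap_frames` and `input_bytes` are hypothetical helpers, and keeping only the first frames is just one possible capping strategy.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cap_frames(frames: Sequence[T], max_frames: int = 160) -> List[T]:
    """Keep at most max_frames frames (assumption: the earliest ones)."""
    return list(frames[:max_frames])


def input_bytes(
    num_frames: int,
    target_size: int = 85,
    channels: int = 1,
    bytes_per_value: int = 4,
) -> int:
    """Bytes needed for one clip of square crops at the given settings."""
    return num_frames * target_size * target_size * channels * bytes_per_value


print(input_bytes(160))  # 4624000
print(input_bytes(80))   # 2312000
```

Under these assumptions the input tensor shrinks in direct proportion to the frame cap. Decoded full-resolution frames and model activations are not counted here.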