---
title: Lip Reading
emoji: 👄
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
---
# Lip Reading
Lip-reading demo using TensorFlow, MediaPipe, and Gradio. Upload a short clip or record with your webcam to get a transcription generated from mouth movements.
## Features
- Gradio UI with upload + webcam tabs
- TensorFlow model loaded once and reused
- MediaPipe lip cropping and normalization with frame caps for stability
- Configurable ports, sharing, model path, and preprocessing thresholds via environment variables
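The "loaded once and reused" behaviour can be illustrated with a small caching wrapper. This is a minimal sketch, not the code in `lipnet/model.py`: the names `make_cached_loader`, `_load_keras_model`, and `get_model` are hypothetical, and caching by path with `functools.lru_cache` is an assumption about how the reuse might be done.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(load_fn: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap load_fn so that each distinct path is loaded at most once per process."""
    return lru_cache(maxsize=None)(load_fn)


def _load_keras_model(path: str) -> Any:
    # Imported lazily so importing this module does not pull in TensorFlow.
    import tensorflow as tf

    return tf.keras.models.load_model(path)


# Hypothetical accessor: the first call reads the file, later calls reuse the object.
get_model = make_cached_loader(_load_keras_model)
```

With a wrapper like this, every request handler can call `get_model(path)` without paying the load cost again after the first request.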
## Quickstart (local)
1. Create a virtual environment
   - Windows (PowerShell): `python -m venv .venv; .\.venv\Scripts\Activate.ps1`
   - macOS/Linux: `python -m venv .venv && source .venv/bin/activate`
2. Install dependencies
   `pip install -r requirements.txt`
3. Run the app
   `python app.py`
4. Open the URL printed to the console (default http://127.0.0.1:7860). Set `GRADIO_SHARE=true` if you need a public link.
## Environment variables
- `PORT` (default `7860`): Port for Gradio.
- `GRADIO_SHARE` (`true`/`false`, default `false`): Whether to expose a public link.
- `MAX_VIDEO_SIZE_MB` (default `1000`): Reject uploads larger than this many megabytes.
- `LIPNET_MODEL_PATH` (default `best_model_1_WER.keras`): Path to the saved model.
- `LIPNET_TARGET_SIZE` (default `85`): Target square size for lip crops.
- `LIPNET_MAX_FRAMES` (default `160`): Max frames processed per video to bound memory/time.
- `LIPNET_DETECTION_CONFIDENCE` (default `0.5`): MediaPipe detection confidence.
- `LIPNET_TRACKING_CONFIDENCE` (default `0.5`): MediaPipe tracking confidence.
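As an illustration of how these variables and their defaults could be read at startup, here is a minimal sketch. It is not the contents of `lipnet/config.py`: the `Settings` class and `load_settings` function are hypothetical names, and treating only the literal `true` (any case) as enabling sharing is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    share: bool
    max_video_size_mb: int
    model_path: str
    target_size: int
    max_frames: int
    detection_confidence: float
    tracking_confidence: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "7860")),
        # Assumption: only the literal "true" (any case) enables sharing.
        share=env.get("GRADIO_SHARE", "false").strip().lower() == "true",
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", "1000")),
        model_path=env.get("LIPNET_MODEL_PATH", "best_model_1_WER.keras"),
        target_size=int(env.get("LIPNET_TARGET_SIZE", "85")),
        max_frames=int(env.get("LIPNET_MAX_FRAMES", "160")),
        detection_confidence=float(env.get("LIPNET_DETECTION_CONFIDENCE", "0.5")),
        tracking_confidence=float(env.get("LIPNET_TRACKING_CONFIDENCE", "0.5")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"PORT": "8080"})`.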
## Project structure
```
app.py                  # Entry point
best_model_1_WER.keras  # Trained model weights
lipnet/
  __init__.py
  config.py             # Runtime configuration
  model.py              # Model loading, inference, decoding
  preprocessing.py      # Lip detection, cropping, normalization
  ui.py                 # Gradio components and handlers
requirements.txt
```
## Usage tips
- Keep videos short and ensure the mouth is well-lit and centered.
- Supported inputs: MP4/AVI/MOV/MPG.
- A GPU speeds up inference; CPU-only runs work but are slower.
- If no face is detected, check lighting, camera angle, and framing.
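A quick pre-flight check on a clip can catch an unsupported format or an oversized file before it reaches the model. The sketch below is one possible way to do that and is not taken from the app: `check_upload` and `SUPPORTED_EXTENSIONS` are hypothetical names, and matching formats by file suffix is an assumption.

```python
import os

# Extensions from the supported-inputs tip; matching by suffix is an assumption.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mpg"}


def check_upload(path: str, max_size_mb: float = 1000) -> None:
    """Raise ValueError if the file has an unsupported extension or is too large."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {ext or '(none)'}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValueError(f"File is {size_mb:.1f} MB; limit is {max_size_mb} MB")
```

The default limit mirrors the documented `MAX_VIDEO_SIZE_MB` default of `1000`.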
## Troubleshooting
- **Model file missing**: Set `LIPNET_MODEL_PATH` to the correct `.keras` file.
- **High memory use/OOM**: Lower `LIPNET_MAX_FRAMES` or reduce input resolution.
- **Webcam not working**: Ensure browser permissions are granted for camera access.
- **MediaPipe import error**: Reinstall with `pip install --force-reinstall mediapipe` (version 0.10 or later). On Apple Silicon/ARM or CPU-only Windows machines, prefer the latest 0.10.x wheel.
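Before reinstalling, it can help to confirm which MediaPipe version is actually installed. The following is a small diagnostic sketch using the standard library's `importlib.metadata`; the helper names `installed_version` and `meets_minimum` are hypothetical, and the comparison assumes a plain numeric `major.minor` prefix in the version string. It only inspects package metadata, so it will not reveal import failures caused by missing native libraries.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version: str, minimum: Tuple[int, int] = (0, 10)) -> bool:
    """Compare the leading major.minor components against a minimum."""
    # Assumption: the version starts with two numeric components, e.g. "0.10.14".
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum


if __name__ == "__main__":
    found = installed_version("mediapipe")
    if found is None:
        print("mediapipe is not installed")
    elif not meets_minimum(found):
        print(f"mediapipe {found} is older than 0.10")
    else:
        print(f"mediapipe {found} looks fine")
```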