---
title: Lip Reading
emoji: ๐
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
---

# Lip Reading

Lip-reading demo using TensorFlow, MediaPipe, and Gradio. Upload a short clip or record with your webcam to get a transcription generated from mouth movements.

## Features

- Gradio UI with upload + webcam tabs
- TensorFlow model loaded once and reused
- MediaPipe lip cropping and normalization with frame caps for stability
- Configurable ports, sharing, model path, and preprocessing thresholds via environment variables

## Quickstart (local)

1. Create a virtual environment
   - Windows (PowerShell): `python -m venv .venv; .\.venv\Scripts\Activate.ps1`
   - macOS/Linux: `python -m venv .venv && source .venv/bin/activate`
2. Install dependencies
   `pip install -r requirements.txt`
3. Run the app
   `python app.py`
4. Open the URL printed to the console (default http://127.0.0.1:7860). Set `GRADIO_SHARE=true` if you need a public link.

## Environment variables

- `PORT` (default `7860`): Port for Gradio.
- `GRADIO_SHARE` (`true`/`false`, default `false`): Whether to expose a public link.
- `MAX_VIDEO_SIZE_MB` (default `1000`): Reject uploads larger than this.
- `LIPNET_MODEL_PATH` (default `best_model_1_WER.keras`): Path to the saved model.
- `LIPNET_TARGET_SIZE` (default `85`): Target square size for lip crops.
- `LIPNET_MAX_FRAMES` (default `160`): Max frames processed per video to bound memory/time.
- `LIPNET_DETECTION_CONFIDENCE` (default `0.5`): MediaPipe detection confidence.
- `LIPNET_TRACKING_CONFIDENCE` (default `0.5`): MediaPipe tracking confidence.

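
For orientation, here is one way the variables above could be read into a single settings object. It is a sketch under assumptions: the `Settings` class, `load_settings`, and `_env_bool` are illustrative names (the real `lipnet/config.py` may be organized differently), and treating `1`/`true`/`yes` as truthy is a guess at the parsing rule.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Interpret "1", "true", or "yes" (any case) as True; assumed rule."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    port: int
    share: bool
    max_video_size_mb: int
    model_path: str
    target_size: int
    max_frames: int
    detection_confidence: float
    tracking_confidence: float


def load_settings() -> Settings:
    """Read each variable, falling back to the documented default."""
    env = os.environ
    return Settings(
        port=int(env.get("PORT", "7860")),
        share=_env_bool("GRADIO_SHARE", False),
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", "1000")),
        model_path=env.get("LIPNET_MODEL_PATH", "best_model_1_WER.keras"),
        target_size=int(env.get("LIPNET_TARGET_SIZE", "85")),
        max_frames=int(env.get("LIPNET_MAX_FRAMES", "160")),
        detection_confidence=float(env.get("LIPNET_DETECTION_CONFIDENCE", "0.5")),
        tracking_confidence=float(env.get("LIPNET_TRACKING_CONFIDENCE", "0.5")),
    )
```

On macOS/Linux, overriding a value for a single run looks like `PORT=8080 LIPNET_MAX_FRAMES=80 python app.py`.
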
## Project structure

```
app.py                   # Entry point
best_model_1_WER.keras   # Trained model weights
lipnet/
  __init__.py
  config.py              # Runtime configuration
  model.py               # Model loading, inference, decoding
  preprocessing.py       # Lip detection, cropping, normalization
  ui.py                  # Gradio components and handlers
requirements.txt
```

## Usage tips

- Keep videos short and ensure the mouth is well-lit and centered.
- Supported inputs: MP4/AVI/MOV/MPG.
- A GPU speeds up inference; CPU-only runs work but are slower.
- If no face is detected, check lighting, camera angle, and framing.

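
The format and size limits mentioned in this README could be enforced before any frames are decoded. The following is a hedged sketch, not the app's actual handler: `check_upload` and `SUPPORTED_EXTENSIONS` are hypothetical names, and counting one MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported-inputs tip: MP4/AVI/MOV/MPG.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mpg"}


def check_upload(path: str, max_size_mb: int = 1000) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    file = Path(path)
    # Compare case-insensitively so "clip.MP4" is accepted too.
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {file.suffix or '(none)'}")
    size_mb = file.stat().st_size / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValueError(f"File is {size_mb:.1f} MB; limit is {max_size_mb} MB")
```

Checking the extension first keeps the rejection cheap, since no file access is needed for an unsupported type.
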
## Troubleshooting

- **Model file missing**: Set `LIPNET_MODEL_PATH` to the correct `.keras` file.
- **High memory use/OOM**: Lower `LIPNET_MAX_FRAMES` or reduce input resolution.
- **Webcam not working**: Ensure browser permissions are granted for camera access.
- **MediaPipe import error**: Reinstall with `pip install --force-reinstall mediapipe` (version >= 0.10). On Apple/ARM or CPU-only Windows machines, prefer the latest 0.10.x wheel.

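
To see why lowering `LIPNET_MAX_FRAMES` helps with memory, consider a frame cap that stops reading the video once the limit is reached. This is only a sketch: `cap_frames` is a hypothetical helper, and simple truncation is an assumption (the app might instead sample frames evenly across the clip).

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def cap_frames(frames: Iterable[Frame], max_frames: int = 160) -> Iterator[Frame]:
    """Return an iterator over at most max_frames items from the source."""
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    # islice stops pulling from the source once the cap is reached,
    # so later frames are never decoded or held in memory.
    return islice(frames, max_frames)
```

Because the cap is applied lazily, a long upload costs no more memory than a clip of exactly `max_frames` frames.
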