---
title: Lip Reading
emoji: 👄
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
---

# Lip Reading

Lip-reading demo using TensorFlow, MediaPipe, and Gradio. Upload a short clip or record with your webcam to get a transcription generated from mouth movements.

## Features
- Gradio UI with upload + webcam tabs
- TensorFlow model loaded once and reused
- MediaPipe lip cropping and normalization with frame caps for stability
- Configurable ports, sharing, model path, and preprocessing thresholds via environment variables
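
The "loaded once and reused" behavior can be sketched as a small lazy holder. This is only an illustration under assumptions: `LazyModel` and its `loader` argument are hypothetical names, not taken from `lipnet/model.py`, and the actual code may cache the model differently.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Hypothetical helper: load a model on first use, then reuse it."""

    def __init__(self, loader: Callable[[str], Any], path: str) -> None:
        self._loader = loader          # e.g. tf.keras.models.load_model
        self._path = path
        self._model: Optional[Any] = None
        self._lock = threading.Lock()  # handlers may run concurrently

    def get(self) -> Any:
        # Double-checked locking: only the first caller pays the load cost.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader(self._path)
        return self._model
```

With TensorFlow installed, this could be used as `LazyModel(tf.keras.models.load_model, "best_model_1_WER.keras").get()`, so repeated requests share one in-memory model instead of reloading it from disk.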

## Quickstart (local)
1. Create a virtual environment  
   - Windows (PowerShell): `python -m venv .venv; .\.venv\Scripts\Activate.ps1`  
   - macOS/Linux: `python -m venv .venv && source .venv/bin/activate`
2. Install dependencies  
   `pip install -r requirements.txt`
3. Run the app  
   `python app.py`
4. Open the URL printed to the console (default http://127.0.0.1:7860). Set `GRADIO_SHARE=true` if you need a public link.

## Environment variables
- `PORT` (default `7860`): Port for Gradio.
- `GRADIO_SHARE` (`true`/`false`, default `false`): Whether to expose a public link.
- `MAX_VIDEO_SIZE_MB` (default `1000`): Maximum upload size in MB; larger files are rejected.
- `LIPNET_MODEL_PATH` (default `best_model_1_WER.keras`): Path to the saved model.
- `LIPNET_TARGET_SIZE` (default `85`): Target square size for lip crops.
- `LIPNET_MAX_FRAMES` (default `160`): Max frames processed per video to bound memory/time.
- `LIPNET_DETECTION_CONFIDENCE` (default `0.5`): MediaPipe detection confidence.
- `LIPNET_TRACKING_CONFIDENCE` (default `0.5`): MediaPipe tracking confidence.
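
As a rough sketch of how these variables could be read, the snippet below parses them into a frozen dataclass using the defaults listed above. `Config`, `load_config`, and the field names are hypothetical and may not match what `lipnet/config.py` actually defines; treating only the literal string `true` as enabling sharing is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the documented settings."""
    port: int
    share: bool
    max_video_size_mb: float
    model_path: str
    target_size: int
    max_frames: int
    detection_confidence: float
    tracking_confidence: float


def _as_bool(raw: str) -> bool:
    # Assumption: only "true" (case-insensitive) turns the flag on.
    return raw.strip().lower() == "true"


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read settings from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Config(
        port=int(env.get("PORT", "7860")),
        share=_as_bool(env.get("GRADIO_SHARE", "false")),
        max_video_size_mb=float(env.get("MAX_VIDEO_SIZE_MB", "1000")),
        model_path=env.get("LIPNET_MODEL_PATH", "best_model_1_WER.keras"),
        target_size=int(env.get("LIPNET_TARGET_SIZE", "85")),
        max_frames=int(env.get("LIPNET_MAX_FRAMES", "160")),
        detection_confidence=float(env.get("LIPNET_DETECTION_CONFIDENCE", "0.5")),
        tracking_confidence=float(env.get("LIPNET_TRACKING_CONFIDENCE", "0.5")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment, for example `load_config({"LIPNET_MAX_FRAMES": "80"})`.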

## Project structure
```
app.py                # Entry point
best_model_1_WER.keras  # Trained model (Keras format)
lipnet/
  __init__.py
  config.py           # Runtime configuration
  model.py            # Model loading, inference, decoding
  preprocessing.py    # Lip detection, cropping, normalization
  ui.py               # Gradio components and handlers
requirements.txt
```

## Usage tips
- Keep videos short and ensure the mouth is well-lit and centered.
- Supported inputs: MP4/AVI/MOV/MPG.
- A GPU speeds up inference; CPU-only runs work but are slower.
- If no face is detected, check lighting, camera angle, and framing.
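
The input constraints above (supported containers plus the `MAX_VIDEO_SIZE_MB` limit) could be checked before any frames are decoded. The sketch below is one possible way to do that; `validate_upload` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the app's own handler in `lipnet/ui.py` may validate differently.

```python
import os

# Assumption: formats are recognized by file extension only.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mpg"}


def validate_upload(path: str, max_size_mb: float = 1000) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        allowed = ", ".join(sorted(SUPPORTED_EXTENSIONS))
        raise ValueError(f"Unsupported file type {ext!r}; expected one of {allowed}")

    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValueError(
            f"File is {size_mb:.1f} MB, above the {max_size_mb} MB limit"
        )
```

Rejecting a file this early avoids spending time on face detection for input the model cannot use anyway.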

## Troubleshooting
- **Model file missing**: Set `LIPNET_MODEL_PATH` to the correct `.keras` file.
- **High memory use/OOM**: Lower `LIPNET_MAX_FRAMES` or reduce input resolution.
- **Webcam not working**: Ensure browser permissions are granted for camera access.
- **MediaPipe import error**: Reinstall with `pip install --force-reinstall mediapipe` (version 0.10 or later). On Apple silicon/ARM or CPU-only Windows machines, prefer the latest 0.10.x wheel.