---
title: Lip Reading
emoji: 👄
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
---

# Lip Reading

Lip-reading demo built with TensorFlow, MediaPipe, and Gradio. Upload a short clip or record one with your webcam to get a transcription generated from the speaker's mouth movements.

## Features
- Gradio UI with upload and webcam tabs
- TensorFlow model loaded once and reused across requests
- MediaPipe lip cropping and normalization, with a per-video frame cap that bounds memory use and runtime
- Configurable port, public sharing, model path, and preprocessing thresholds via environment variables
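The "loaded once and reused" behaviour can be sketched with `functools.lru_cache`. This is a minimal, hypothetical illustration rather than the code in `lipnet/model.py`: `get_model` and `_load_from_disk` are invented names, and the stand-in loader takes the place of something like `tf.keras.models.load_model` so the sketch runs without TensorFlow.

```python
from functools import lru_cache

LOAD_CALLS: list = []  # records every real load so the caching is observable


def _load_from_disk(path: str) -> dict:
    """Hypothetical stand-in for an expensive loader such as tf.keras.models.load_model(path)."""
    LOAD_CALLS.append(path)
    return {"path": path}  # placeholder for a Keras model object


@lru_cache(maxsize=None)
def get_model(path: str) -> dict:
    """Load the model on the first request for `path`; later calls return the cached object."""
    return _load_from_disk(path)


if __name__ == "__main__":
    first = get_model("best_model_1_WER.keras")
    second = get_model("best_model_1_WER.keras")
    print(first is second, len(LOAD_CALLS))  # True 1
```

One caveat of this design: `lru_cache` keeps its cache consistent across threads, but if a second request arrives before the first load finishes, the loader can run twice. Loading the model eagerly when `app.py` starts avoids that window.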

## Quickstart (local)
1. Create and activate a virtual environment:
   - Windows (PowerShell): `python -m venv .venv`, then `.\.venv\Scripts\Activate.ps1`
   - macOS/Linux: `python -m venv .venv && source .venv/bin/activate`
2. Install dependencies:
   `pip install -r requirements.txt`
3. Run the app:
   `python app.py`
4. Open the URL printed to the console (by default http://127.0.0.1:7860). Set `GRADIO_SHARE=true` if you need a public link.

## Environment variables
- `PORT` (default `7860`): Port the Gradio server listens on.
- `GRADIO_SHARE` (`true`/`false`, default `false`): Whether to create a public Gradio share link.
- `MAX_VIDEO_SIZE_MB` (default `1000`): Uploads larger than this many megabytes are rejected.
- `LIPNET_MODEL_PATH` (default `best_model_1_WER.keras`): Path to the saved Keras model.
- `LIPNET_TARGET_SIZE` (default `85`): Side length, in pixels, of the square lip crops.
- `LIPNET_MAX_FRAMES` (default `160`): Maximum number of frames processed per video, to bound memory use and processing time.
- `LIPNET_DETECTION_CONFIDENCE` (default `0.5`): Minimum detection confidence for MediaPipe.
- `LIPNET_TRACKING_CONFIDENCE` (default `0.5`): Minimum tracking confidence for MediaPipe.
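The variables above map naturally onto a small settings object. The sketch below is hypothetical (this README does not show `lipnet/config.py`): `Settings`, `load_settings`, and the set of truthy strings accepted for `GRADIO_SHARE` are assumptions; only the variable names and defaults come from the list.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Treat "1", "true", "yes", or "on" (any case) as True; assumed convention."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    port: int
    share: bool
    max_video_size_mb: float
    model_path: str
    target_size: int
    max_frames: int
    detection_confidence: float
    tracking_confidence: float


def load_settings() -> Settings:
    """Read the documented variables, falling back to the README defaults."""
    env = os.environ.get
    return Settings(
        port=int(env("PORT", "7860")),
        share=_env_bool("GRADIO_SHARE", False),
        max_video_size_mb=float(env("MAX_VIDEO_SIZE_MB", "1000")),
        model_path=env("LIPNET_MODEL_PATH", "best_model_1_WER.keras"),
        target_size=int(env("LIPNET_TARGET_SIZE", "85")),
        max_frames=int(env("LIPNET_MAX_FRAMES", "160")),
        detection_confidence=float(env("LIPNET_DETECTION_CONFIDENCE", "0.5")),
        tracking_confidence=float(env("LIPNET_TRACKING_CONFIDENCE", "0.5")),
    )
```

For example, on macOS/Linux `PORT=8080 GRADIO_SHARE=true python app.py` would yield `port=8080` and `share=True` under this sketch. Parsing everything in one place means a malformed value (say, `PORT=abc`) fails at startup with a `ValueError` instead of deep inside a request.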

## Project structure
```
app.py                   # Entry point
best_model_1_WER.keras   # Trained model weights
lipnet/
    __init__.py
    config.py            # Runtime configuration
    model.py             # Model loading, inference, decoding
    preprocessing.py     # Lip detection, cropping, normalization
    ui.py                # Gradio components and handlers
requirements.txt
```
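`model.py` is described as handling decoding. Assuming the network is trained with CTC, as the original LipNet architecture is, greedy decoding takes the most likely class in each frame, merges consecutive repeats, and then removes the blank symbol. The sketch below is hypothetical: `VOCAB`, `BLANK`, and `greedy_ctc_decode` are illustrative names, the character set is invented, and putting the blank at the last index is an assumption that matches TensorFlow's default convention.

```python
from typing import Sequence

# Hypothetical character set; the app's real vocabulary may differ.
VOCAB = "abcdefghijklmnopqrstuvwxyz "
BLANK = len(VOCAB)  # assumed: CTC blank is the last class index


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]]) -> str:
    """Greedy CTC: argmax per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    previous = None
    for idx in best:
        # A repeated index is merged unless a blank (or other class) separated it.
        if idx != previous and idx != BLANK:
            chars.append(VOCAB[idx])
        previous = idx
    return "".join(chars)
```

For instance, per-frame argmaxes of `a a <blank> a b b` decode to `"aab"`: the blank keeps the two `a`s distinct, while the repeated `b` collapses. In a TensorFlow app this step is usually delegated to a built-in such as `tf.nn.ctc_greedy_decoder`, or to a beam-search decoder, which can improve accuracy; the pure-Python version only makes the merge-then-drop-blanks rule explicit.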

## Usage tips
- Keep videos short, and make sure the mouth is well lit and centered in the frame.
- Supported inputs: MP4, AVI, MOV, and MPG.
- A GPU speeds up inference; CPU-only runs also work, but more slowly.
- If no face is detected, check the lighting, camera angle, and framing.
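A pre-flight check for the supported formats and the `MAX_VIDEO_SIZE_MB` limit could look like the following. It is a hypothetical helper (`validate_video` is not part of the documented code), and it assumes that one megabyte means 1024 × 1024 bytes and that the file extension reflects the container format.

```python
import os

SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mpg"}


def validate_video(path: str, max_size_mb: float = 1000.0) -> None:
    """Raise ValueError if the file type is unsupported or the file is too large."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format {ext or '(none)'}; use MP4, AVI, MOV, or MPG.")
    # Assumption: MB is interpreted as 1024 * 1024 bytes.
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValueError(f"Video is {size_mb:.1f} MB; the limit is {max_size_mb:g} MB.")
```

Checking the extension first means an obviously wrong file is rejected without touching the disk; the size check runs before any decoding, so oversized uploads never reach MediaPipe or the model.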

## Troubleshooting
- Model file missing: set `LIPNET_MODEL_PATH` to the correct `.keras` file.
- High memory use or out-of-memory (OOM) errors: lower `LIPNET_MAX_FRAMES` or reduce the input resolution.
- Webcam not working: make sure the browser has been granted permission to access the camera.
- MediaPipe import error: reinstall with `pip install --force-reinstall "mediapipe>=0.10"`. On Apple silicon and other ARM machines, or on CPU-only Windows setups, prefer the latest 0.10.x wheel.
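Lowering `LIPNET_MAX_FRAMES` helps because fewer frames are decoded, cropped, and held in memory at once. This README does not say how the cap is enforced, so the two functions below are hypothetical sketches of common strategies: truncating to the first frames, or sampling evenly spaced frames across the whole clip.

```python
from itertools import islice
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def truncate_frames(frames: Iterable[T], max_frames: int = 160) -> List[T]:
    """Keep only the first max_frames frames; a lazy source stops being read early."""
    return list(islice(frames, max_frames))


def subsample_frames(frames: Sequence[T], max_frames: int = 160) -> List[T]:
    """Keep max_frames evenly spaced frames so the whole clip is still covered."""
    n = len(frames)
    if n <= max_frames:
        return list(frames)
    # i * n // max_frames is strictly increasing and always < n when n > max_frames.
    return [frames[i * n // max_frames] for i in range(max_frames)]
```

Truncation can stop reading the video early, which saves both memory and decode time, but it drops the end of long clips. Subsampling keeps the whole utterance but needs every frame index up front and changes the effective frame rate, which a model trained at a fixed rate may not expect.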