---
title: YT Video
emoji: 😻
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
license: mit
---

# Video → ZIP Caption Prep
|
|
Input: a video file.
Output: a `.zip` containing:
- `frames/`: sampled JPG frames
- `transcription.txt`: the ASR transcript
- `explanations.json`: per-frame captions
- `manifest.json`: a run summary
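The packaging step can be sketched with the standard library alone. This is a minimal sketch, not the app's actual code: the manifest fields and the `package_outputs` helper name are illustrative assumptions; only the file names come from this README.

```python
import json
import zipfile
from pathlib import Path


def package_outputs(workdir: Path, zip_path: Path) -> None:
    """Bundle frames/, transcription.txt and explanations.json into a ZIP,
    adding a small manifest.json summary (fields here are hypothetical)."""
    manifest = {
        "frames": sorted(p.name for p in (workdir / "frames").glob("*.jpg")),
        "has_transcript": (workdir / "transcription.txt").exists(),
    }
    (workdir / "manifest.json").write_text(json.dumps(manifest, indent=2))

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in workdir.rglob("*"):
            if p.is_file():
                # Store paths relative to the work dir so the ZIP root
                # contains frames/, transcription.txt, etc. directly.
                zf.write(p, p.relative_to(workdir))
```

`ZIP_DEFLATED` keeps the archive small; JPG frames compress little, but the transcript and JSON files do.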
|
|
## Models
- ASR: `distil-whisper/distil-large-v3`
- Vision captions: `Salesforce/blip-image-captioning-base`
|
|
Both models are open source and hosted on Hugging Face. A GPU is recommended; CPU works, but inference is noticeably slower.
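One plausible way to load both pipelines with a CPU fallback is sketched below. This is an assumption-laden sketch, not the Space's actual loading code: `pick_device` and `load_pipelines` are made-up names, and only the model IDs and pipeline tasks come from the models listed above.

```python
def pick_device() -> int:
    """Return a transformers device index: 0 for the first GPU, -1 for CPU."""
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        # torch missing entirely -> fall back to CPU
        return -1


def load_pipelines(device: int):
    """Build the ASR and captioning pipelines (downloads weights on first run)."""
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-large-v3",
        device=device,
    )
    captioner = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
        device=device,
    )
    return asr, captioner
```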
|
|
## How it works
1) Extract audio with FFmpeg.
2) Transcribe via the Whisper pipeline.
3) Sample frames every *N* seconds with OpenCV.
4) Caption each frame with BLIP.
5) Package outputs into a ZIP for downstream use.
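Step 3 reduces to simple index arithmetic. The sketch below assumes one frame per `interval` seconds, capped at `max_frames`; `sample_indices` is a hypothetical helper, and the app's exact logic may differ.

```python
def sample_indices(total_frames: int, fps: float,
                   interval: float, max_frames: int) -> list[int]:
    """Frame indices spaced roughly `interval` seconds apart, capped at
    `max_frames`. At 30 fps with interval=2.0, that is every 60th frame."""
    step = max(1, round(fps * interval))          # frames between samples
    indices = list(range(0, total_frames, step))  # 0, step, 2*step, ...
    return indices[:max_frames]
```

With OpenCV, each index can then be fetched via `cap.set(cv2.CAP_PROP_POS_FRAMES, i)` followed by `cap.read()` before writing the JPG.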
|
|
## Space usage
1. Click **Upload video**.
2. Adjust **Frame interval** or **Max frames** if needed.
3. Press **Process**.
4. Download the ZIP. The preview shows a few frames, a transcript snippet, and the first captions.
|
|
## Local dev
```bash
pip install -r requirements.txt
python app.py
# or CLI:
python runner.py --video path/to/video.mp4 --interval 2.0 --max_frames 150
```