---
title: YT Video
emoji: 😻
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
license: mit
---
# Video → ZIP Caption Prep
Input: a video file.
Output: a `.zip` containing:
- `frames/` sampled JPG frames
- `transcription.txt` from ASR
- `explanations.json` with per-frame captions
- `manifest.json` summary
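A minimal sketch of how these outputs might be bundled, using only the standard library. The `package_outputs` helper and the manifest fields shown are illustrative assumptions, not the app's exact schema.

```python
import json
import zipfile
from pathlib import Path

def package_outputs(workdir: Path, zip_path: Path) -> None:
    """Bundle frames, transcript, captions, and a manifest into one ZIP.

    Assumes `workdir` already contains frames/, transcription.txt,
    and explanations.json. Manifest fields here are illustrative.
    """
    frames = sorted((workdir / "frames").glob("*.jpg"))
    manifest = {
        "num_frames": len(frames),
        "frames": [f.name for f in frames],
        "transcription": "transcription.txt",
        "captions": "explanations.json",
    }
    (workdir / "manifest.json").write_text(json.dumps(manifest, indent=2))

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in frames:
            zf.write(f, f"frames/{f.name}")
        for name in ("transcription.txt", "explanations.json", "manifest.json"):
            zf.write(workdir / name, name)
```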
## Models
- ASR: `distil-whisper/distil-large-v3`
- Vision captions: `Salesforce/blip-image-captioning-base`
Both models are open source on Hugging Face. A GPU is recommended; CPU works but is slower.
## How it works
1) Extract audio with FFmpeg.
2) Transcribe via Whisper pipeline.
3) Sample frames every *N* seconds with OpenCV.
4) Caption each frame with BLIP.
5) Package outputs into a ZIP for downstream use.
## Space usage
1. Click **Upload video**.
2. Adjust **Frame interval** or **Max frames** if needed.
3. Press **Process**.
4. Download the ZIP. Preview shows a few frames, transcript snippet, and first captions.
## Local dev
```bash
pip install -r requirements.txt
python app.py
# or CLI:
python runner.py --video path/to/video.mp4 --interval 2.0 --max_frames 150
```