---
title: YT Video
emoji: 😻
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
license: mit
---

# Video → ZIP Caption Prep

Input: a video file.
Output: a `.zip` containing:

- `frames/`: sampled JPG frames
- `transcription.txt`: transcript from ASR
- `explanations.json`: per-frame captions
- `manifest.json`: run summary
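The ZIP layout above can be sketched with the standard library alone. The file names match the README; the helper name and the sample contents are illustrative, not the app's actual code.

```python
import json
import zipfile
from pathlib import Path

def package_outputs(workdir: Path, zip_path: Path) -> list[str]:
    """Bundle frames/, transcription.txt, explanations.json and
    manifest.json from workdir into one ZIP; return the archive names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(workdir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(workdir).as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()

# Build a tiny fake working directory to demonstrate the layout.
work = Path("demo_work")
(work / "frames").mkdir(parents=True, exist_ok=True)
(work / "frames" / "frame_0000.jpg").write_bytes(b"\xff\xd8\xff")  # JPEG magic bytes
(work / "transcription.txt").write_text("hello world")
(work / "explanations.json").write_text(json.dumps({"frame_0000.jpg": "a caption"}))
(work / "manifest.json").write_text(json.dumps({"frames": 1, "interval_s": 2.0}))

names = package_outputs(work, Path("captions_prep.zip"))
print(names)
```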

## Models

- ASR: `distil-whisper/distil-large-v3`
- Vision captions: `Salesforce/blip-image-captioning-base`

Both models are open source on Hugging Face. A GPU is recommended; CPU works but is slower.
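The model IDs can live in a small config, with the device picked once at startup. A minimal sketch, assuming this structure (the dictionary and helper are illustrative; only the model IDs come from the README, and the CUDA check is passed in as a flag so the snippet stays dependency-free):

```python
# Model IDs from the README; in the app these would feed the
# corresponding ASR and captioning pipeline constructors.
MODELS = {
    "asr": "distil-whisper/distil-large-v3",
    "caption": "Salesforce/blip-image-captioning-base",
}

def pick_device(cuda_available: bool) -> str:
    """GPU recommended; fall back to CPU when CUDA is absent."""
    return "cuda:0" if cuda_available else "cpu"

print(pick_device(False))  # -> cpu
```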

## How it works

  1. Extract audio with FFmpeg.
  2. Transcribe via Whisper pipeline.
  3. Sample frames every N seconds with OpenCV.
  4. Caption each frame with BLIP.
  5. Package outputs into a ZIP for downstream use.
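Step 3 boils down to a little index arithmetic: take one frame every `interval_s` seconds, capped at `max_frames`. A minimal sketch of that calculation (the function name and defaults are illustrative, not the app's actual API):

```python
def sample_frame_indices(fps: float, total_frames: int,
                         interval_s: float = 2.0,
                         max_frames: int = 150) -> list[int]:
    """Return frame indices spaced interval_s seconds apart,
    capped at max_frames."""
    step = max(1, round(fps * interval_s))        # frames between samples
    indices = list(range(0, total_frames, step))  # 0, step, 2*step, ...
    return indices[:max_frames]

# A 30 fps, 10-second clip sampled every 2 s -> frames 0, 60, 120, 180, 240.
print(sample_frame_indices(fps=30.0, total_frames=300, interval_s=2.0))
```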

## Space usage

  1. Click Upload video.
  2. Adjust Frame interval or Max frames if needed.
  3. Press Process.
  4. Download the ZIP. The preview shows a few frames, a transcript snippet, and the first captions.

## Local dev

```shell
pip install -r requirements.txt
python app.py
# or run the CLI directly:
python runner.py --video path/to/video.mp4 --interval 2.0 --max_frames 150
```
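The CLI flags above map naturally onto `argparse`. A sketch of how `runner.py` might declare them, assuming this wiring (only the flag names and values come from the README; the parser structure is an assumption):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the README's CLI example; defaults are illustrative.
    parser = argparse.ArgumentParser(description="Video -> ZIP caption prep")
    parser.add_argument("--video", required=True, help="path to input video")
    parser.add_argument("--interval", type=float, default=2.0,
                        help="seconds between sampled frames")
    parser.add_argument("--max_frames", type=int, default=150,
                        help="cap on the number of sampled frames")
    return parser

args = build_parser().parse_args(
    ["--video", "clip.mp4", "--interval", "2.0", "--max_frames", "150"]
)
print(args.video, args.interval, args.max_frames)
```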