Space: Shami96/YT_Video

Shami96 committed on Sep 2, 2025

Commit 07bbcb5 (verified)

1 Parent(s): 1f91fde

Update README.md

Files changed (1)

README.md CHANGED

@@ -10,4 +10,38 @@ pinned: false
 license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Video → ZIP Caption Prep
+Input: a video file.
+Output: a `.zip` containing:
+- `frames/`: sampled JPG frames
+- `transcription.txt`: the ASR transcript
+- `explanations.json`: per-frame captions
+- `manifest.json`: a summary
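
The archive layout listed above can be sketched with the standard library alone. This is a minimal illustration, not the Space's actual packaging code: the function name `build_zip`, the frame file names, and the manifest fields are all hypothetical, since the README does not specify them.

```python
import io
import json
import zipfile


def build_zip(frames, transcript, captions):
    """Pack frames, transcript, and captions into an in-memory ZIP.

    frames:   dict mapping file name -> JPEG bytes
    captions: dict mapping file name -> caption text
    """
    # Hypothetical manifest fields, chosen only for illustration.
    manifest = {
        "num_frames": len(frames),
        "files": ["transcription.txt", "explanations.json", "manifest.json"],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Frames go under frames/ in sorted order so the listing is stable.
        for name, data in sorted(frames.items()):
            zf.writestr(f"frames/{name}", data)
        zf.writestr("transcription.txt", transcript)
        zf.writestr("explanations.json", json.dumps(captions, indent=2))
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder bytes stand in for a real JPEG frame.
    fake_frames = {"frame_0000.jpg": b"\xff\xd8\xff\xd9"}
    blob = build_zip(fake_frames, "hello world", {"frame_0000.jpg": "a placeholder caption"})
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        print(zf.namelist())
```

Building the archive in memory keeps the sketch free of temporary files; a real run with many frames would more likely write to disk.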
+## Models
+- ASR: `distil-whisper/distil-large-v3`
+- Vision captions: `Salesforce/blip-image-captioning-base`
+Both models are open-source and hosted on Hugging Face. A GPU is recommended; CPU works but is slower.
+## How it works
+1) Extract audio with FFmpeg.
+2) Transcribe the audio with the Distil-Whisper ASR pipeline.
+3) Sample frames every *N* seconds with OpenCV.
+4) Caption each frame with BLIP.
+5) Package outputs into a ZIP for downstream use.
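
Steps 1 and 3 above can be sketched without loading any model. The helper names `ffmpeg_audio_cmd` and `sample_frame_indices` are hypothetical. Two things are assumed because the README does not state them: that audio is extracted as 16 kHz mono (the sample rate Whisper models expect), and that the frame cap simply truncates the sample list.

```python
def ffmpeg_audio_cmd(video_path, wav_path):
    """Build an FFmpeg argument list that extracts mono 16 kHz audio.

    Assumption: 16 kHz mono is used because Whisper models expect it;
    the Space's actual FFmpeg settings are not documented.
    """
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", video_path,    # input video
        "-vn",               # drop the video stream
        "-ac", "1",          # one audio channel (mono)
        "-ar", "16000",      # 16 kHz sample rate
        wav_path,
    ]


def sample_frame_indices(total_frames, fps, interval_s, max_frames):
    """Return indices of frames spaced roughly interval_s seconds apart.

    Assumption: the cap keeps the first max_frames samples.
    """
    if fps <= 0 or interval_s <= 0 or max_frames <= 0:
        return []
    step = max(1, round(fps * interval_s))  # frames between two samples
    return list(range(0, total_frames, step))[:max_frames]


if __name__ == "__main__":
    print(" ".join(ffmpeg_audio_cmd("video.mp4", "audio.wav")))
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(300, 30.0, 2.0, 150))
```

The command list could be passed to `subprocess.run(cmd, check=True)`, and each index could be passed to OpenCV by setting `cv2.CAP_PROP_POS_FRAMES` before reading a frame.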
+## Space usage
+1. Click **Upload video**.
+2. Adjust **Frame interval** or **Max frames** if needed.
+3. Press **Process**.
+4. Download the ZIP. The preview shows a few frames, a transcript snippet, and the first captions.
+## Local dev
+```bash
+pip install -r requirements.txt
+python app.py
+# or CLI:
+python runner.py --video path/to/video.mp4 --interval 2.0 --max_frames 150