Shami96 committed · Commit 07bbcb5 · verified · 1 Parent(s): 1f91fde

Update README.md

Files changed (1): README.md +35 -1
README.md CHANGED
@@ -10,4 +10,38 @@ pinned: false
  license: mit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Video ZIP Caption Prep
+
+ Input: a video file.
+ Output: a `.zip` containing:
+ - `frames/`: sampled JPG frames
+ - `transcription.txt`: the ASR transcript
+ - `explanations.json`: per-frame captions
+ - `manifest.json`: a summary
+
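As a rough illustration of the archive layout listed above, the following sketch assembles such a ZIP in memory using only the Python standard library. It is not taken from the Space's code: `build_zip` is a hypothetical helper, and the JSON shapes used for `explanations.json` (a frame-name to caption mapping) and `manifest.json` (frame count plus file list) are assumptions, since the README does not specify their schemas.

```python
from __future__ import annotations

import io
import json
import zipfile


def build_zip(frames: dict[str, bytes], transcript: str, captions: dict[str, str]) -> bytes:
    """Pack frames, transcript, and captions into a ZIP and return its bytes.

    Hypothetical helper: the JSON schemas below are assumed, not the Space's own.
    """
    # Assumed manifest shape: a frame count and the list of packaged frame paths.
    manifest = {
        "num_frames": len(frames),
        "frames": [f"frames/{name}" for name in sorted(frames)],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(frames):
            # JPG bytes are stored under the frames/ prefix.
            zf.writestr(f"frames/{name}", frames[name])
        zf.writestr("transcription.txt", transcript)
        zf.writestr("explanations.json", json.dumps(captions, indent=2))
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder bytes stand in for a real JPG frame.
    archive = build_zip(
        {"frame_0000.jpg": b"\xff\xd8\xff\xd9"},
        "hello world",
        {"frame_0000.jpg": "a placeholder caption"},
    )
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        print(sorted(zf.namelist()))
```

Building the archive in memory keeps the sketch free of temporary files; writing straight to a path with `zipfile.ZipFile(path, "w")` would work the same way.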
+ ## Models
+ - ASR: `distil-whisper/distil-large-v3`
+ - Vision captions: `Salesforce/blip-image-captioning-base`
+
+ Both models are open source and available on Hugging Face. A GPU is recommended; CPU works but is slower.
+
+ ## How it works
+ 1. Extract the audio track with FFmpeg.
+ 2. Transcribe it with the Whisper pipeline.
+ 3. Sample one frame every *N* seconds with OpenCV.
+ 4. Caption each sampled frame with BLIP.
+ 5. Package the outputs into a ZIP for downstream use.
+
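To make the frame-sampling step above concrete, here is a minimal sketch of how "one frame every *N* seconds, capped at a maximum" might translate into frame indices. The function name `sample_frame_indices` is hypothetical, and two behaviours are assumptions rather than facts from the README: the interval is rounded to a whole number of frames, and the cap simply keeps the earliest `max_frames` samples. With OpenCV, indices like these could be used to seek via `cv2.VideoCapture.set(cv2.CAP_PROP_POS_FRAMES, index)` before reading each frame.

```python
from __future__ import annotations


def sample_frame_indices(fps: float, total_frames: int, interval_s: float, max_frames: int) -> list[int]:
    """Return the indices of frames to keep, one every `interval_s` seconds.

    Hypothetical helper: rounding and truncation rules are assumptions.
    """
    if fps <= 0 or interval_s <= 0 or max_frames <= 0:
        return []
    # Number of frames between two samples; never less than one frame.
    step = max(1, round(fps * interval_s))
    # Keep only the first `max_frames` samples (assumed capping rule).
    return list(range(0, total_frames, step))[:max_frames]


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(fps=30.0, total_frames=300, interval_s=2.0, max_frames=150))
    # prints [0, 60, 120, 180, 240]
```

Computing indices up front keeps the sampling rule separate from video decoding, so it can be checked without a video file.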
+ ## Space usage
+ 1. Click **Upload video**.
+ 2. Adjust **Frame interval** or **Max frames** if needed.
+ 3. Press **Process**.
+ 4. Download the ZIP. The preview shows a few frames, a transcript snippet, and the first captions.
+
+ ## Local dev
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+ # Start the app
+ python app.py
+ # or CLI:
+ python runner.py --video path/to/video.mp4 --interval 2.0 --max_frames 150
+ ```