visual-narrator-llm / cli /README.md
Ytgetahun's picture
feat(VN-008): Educational Video Describer — education-specific GPT-4o prompt with NO_VISUAL_AID filtering
5179c8a
# Visual Narrator CLI
`visual-narrator` installs the `vn` command for generating audio description text from local video files, image frames, or YouTube URLs using the live Visual Narrator Frame Description API.
## Install
```bash
pip install visual-narrator
```
For local development from this repository:
```bash
cd cli
pip install -e .
```
The CLI uses `ffmpeg` for video frame extraction. Install it separately if it is not already available:
```bash
brew install ffmpeg
```
## Authentication
Pass an API key with `--api-key` or set `VN_API_KEY`:
```bash
export VN_API_KEY=vn_live_your_key
```
Create a free-tier key:
```bash
vn keys create dev@example.com
```
For education-focused gap narration, `vn edu` calls GPT-4o directly and does not use `VN_API_KEY`. Set:
```bash
export DEEPGRAM_API_KEY=dg_your_key
export OPENAI_API_KEY=sk-your-key
```
## Describe A Video
Local video:
```bash
vn describe ./demo.mp4 --api-key "$VN_API_KEY" --format json
```
YouTube URL:
```bash
vn describe "https://youtube.com/watch?v=VIDEO_ID" --api-key "$VN_API_KEY" --format srt
```
Single image frame:
```bash
vn describe /tmp/vn-test.jpg --api-key "$VN_API_KEY" --format text
```
Sampling defaults to one frame every three seconds. Configure it with `--fps`:
```bash
vn describe ./demo.mp4 --fps 1 --format json
```
Override the API base URL:
```bash
vn describe ./demo.mp4 --api-url http://localhost:3000 --api-key "$VN_API_KEY"
```
## Detect Narration Gaps
Find silence windows and dialogue breaks where narration can fit:
```bash
vn gaps ./demo.mp4 --format json
```
YouTube URLs use the same download path as `vn describe`:
```bash
vn gaps "https://youtube.com/watch?v=VIDEO_ID" --format text --min-gap 2.5
```
Choose the Whisper model with `--whisper-model` when you want a faster or more accurate pass:
```bash
vn gaps ./demo.mp4 --whisper-model base --format srt
```
Output formats:
```bash
vn gaps ./demo.mp4 --format json
vn gaps ./demo.mp4 --format text
vn gaps ./demo.mp4 --format srt
```
JSON output returns objects with `start_sec`, `end_sec`, `duration_sec`, and `gap_type`.
## Compliance Reports
Score a video against WCAG/CVAA audio-description checks using the same gap detector:
```bash
vn compliance ./demo.mp4 --format json
```
Text output is available for quick terminal review:
```bash
vn compliance ./demo.mp4 --format text --min-gap 3.0
```
Choose the Whisper model the same way as `vn gaps`:
```bash
vn compliance ./demo.mp4 --whisper-model base --format json
```
JSON output returns:
```json
{
"score": 67,
"wcag_level": "A",
"criteria": {
"wcag_1_2_3": {"passed": true},
"wcag_1_2_5": {"passed": false},
"cvaa_audio_description": {"passed": true}
},
"gaps": [],
"recommendations": []
}
```
Coverage is calculated from silence and music-only gap duration divided by total video duration. Recommendations are capped at 10 narration opportunities.
## Festival Film Accessibility Kit
Generate a complete gap-targeted narration package in one command:
```bash
vn kit ./demo.mp4 --api-key "$VN_API_KEY" --format json
```
The `kit` command:
- detects narration gaps with Deepgram + ffmpeg silence detection
- extracts one frame at each gap midpoint
- sends those frames to the live Visual Narrator API
- returns narration text, SRT timing, compliance scoring, and cost totals
Output formats:
```bash
vn kit ./demo.mp4 --format json
vn kit ./demo.mp4 --format srt
vn kit ./demo.mp4 --format text
```
Tune gap sensitivity with `--min-gap`:
```bash
vn kit ./demo.mp4 --min-gap 3.0 --format text
```
YouTube URLs use the same download path as `vn describe`:
```bash
vn kit "https://youtube.com/watch?v=VIDEO_ID" --format srt --api-key "$VN_API_KEY"
```
JSON output includes:
```json
{
"source": "./demo.mp4",
"duration_seconds": 5421.4,
"gaps_found": 6,
"narrations": [
{
"start_sec": 12.4,
"end_sec": 16.1,
"gap_duration_sec": 3.7,
"gap_type": "silence",
"frame_timestamp_sec": 14.25,
"description": "A wide shot shows the ship drifting past Saturn.",
"cost_estimate": 0.0012,
"srt_index": 1
}
],
"compliance": {
"score": 67,
"wcag_level": "A"
},
"model_version": "visual-narrator-gpt4o-v1",
"cost_estimate": 0.0072
}
```
## Educational Video Describer
Generate accessibility narration only for frames that add educational value such as slides, diagrams, equations, code, and charts:
```bash
vn edu ./lecture.mp4 --format json
```
The `edu` command:
- detects narration gaps with Deepgram Nova-3 plus ffmpeg silence detection
- extracts one frame at each gap midpoint
- sends each frame directly to GPT-4o with an education-specific prompt
- filters out talking-head frames when the model returns `NO_VISUAL_AID`
- keeps WCAG/CVAA compliance scoring based on the full detected gap list
Output formats:
```bash
vn edu ./lecture.mp4 --format json
vn edu ./lecture.mp4 --format srt
vn edu ./lecture.mp4 --format text
```
Tune gap sensitivity with `--min-gap`:
```bash
vn edu ./lecture.mp4 --min-gap 3.0 --format text
```
YouTube URLs are supported:
```bash
vn edu "https://youtube.com/watch?v=VIDEO_ID" --format srt
```
JSON output includes:
```json
{
"source": "./lecture.mp4",
"duration_seconds": 1800.0,
"gaps_analyzed": 12,
"visual_moments": 7,
"skipped_talking_head": 5,
"narrations": [
{
"start_sec": 42.0,
"end_sec": 46.5,
"gap_duration_sec": 4.5,
"gap_type": "music_only",
"frame_timestamp_sec": 44.25,
"description": "A slide compares precision and recall with a 2x2 confusion matrix and labels true positives, false positives, false negatives, and true negatives.",
"cost_estimate": 0.0013,
"srt_index": 1,
"visual_moment": true
}
],
"compliance": {
"score": 100,
"wcag_level": "AA"
},
"model_version": "gpt-4o",
"cost_estimate": 0.0091
}
```
## Output Formats
JSON:
```bash
vn describe ./demo.mp4 --format json
```
Returns:
```json
[
{
"timecode": "00:00:00.000",
"description": "A person walks through a city street.",
"objects_detected": [{"label": "Person", "confidence": 98.7}],
"latency_ms": 241
}
]
```
SRT:
```bash
vn describe ./demo.mp4 --format srt
```
Text:
```bash
vn describe ./demo.mp4 --format text
```
## Benchmark
Run a single-frame benchmark comparing Visual Narrator, GPT-4o, and Gemini:
```bash
vn benchmark /tmp/vn-test.jpg
```
Use a custom API deployment:
```bash
vn benchmark /tmp/vn-test.jpg --api-url http://localhost:3000
```
## Commands
```bash
vn --help
vn describe --help
vn kit --help
vn edu --help
vn gaps --help
vn compliance --help
vn benchmark --help
vn keys create --help
```