# Visual Narrator CLI `visual-narrator` installs the `vn` command for generating audio description text from local video files, image frames, or YouTube URLs using the live Visual Narrator Frame Description API. ## Install ```bash pip install visual-narrator ``` For local development from this repository: ```bash cd cli pip install -e . ``` The CLI uses `ffmpeg` for video frame extraction. Install it separately if it is not already available: ```bash brew install ffmpeg ``` ## Authentication Pass an API key with `--api-key` or set `VN_API_KEY`: ```bash export VN_API_KEY=vn_live_your_key ``` Create a free-tier key: ```bash vn keys create dev@example.com ``` For education-focused gap narration, `vn edu` calls GPT-4o directly and does not use `VN_API_KEY`. Set: ```bash export DEEPGRAM_API_KEY=dg_your_key export OPENAI_API_KEY=sk-your-key ``` ## Describe A Video Local video: ```bash vn describe ./demo.mp4 --api-key "$VN_API_KEY" --format json ``` YouTube URL: ```bash vn describe "https://youtube.com/watch?v=VIDEO_ID" --api-key "$VN_API_KEY" --format srt ``` Single image frame: ```bash vn describe /tmp/vn-test.jpg --api-key "$VN_API_KEY" --format text ``` Sampling defaults to one frame every three seconds. Configure it with `--fps`: ```bash vn describe ./demo.mp4 --fps 1 --format json ``` Override the API base URL: ```bash vn describe ./demo.mp4 --api-url http://localhost:3000 --api-key "$VN_API_KEY" ``` ## Detect Narration Gaps Find silence windows and dialogue breaks where narration can fit: ```bash vn gaps ./demo.mp4 --format json ``` YouTube URLs use the same download path as `vn describe`: ```bash vn gaps "https://youtube.com/watch?v=VIDEO_ID" --format text --min-gap 2.5 ``` Choose the Whisper model with `--whisper-model` when you want a faster or more accurate pass: ```bash vn gaps ./demo.mp4 --whisper-model base --format srt ``` Output formats: ```bash vn gaps ./demo.mp4 --format json vn gaps ./demo.mp4 --format text vn gaps ./demo.mp4 --format srt ``` JSON output returns objects with `start_sec`, `end_sec`, `duration_sec`, and `gap_type`. ## Compliance Reports Score a video against WCAG/CVAA audio-description checks using the same gap detector: ```bash vn compliance ./demo.mp4 --format json ``` Text output is available for quick terminal review: ```bash vn compliance ./demo.mp4 --format text --min-gap 3.0 ``` Choose the Whisper model the same way as `vn gaps`: ```bash vn compliance ./demo.mp4 --whisper-model base --format json ``` JSON output returns: ```json { "score": 67, "wcag_level": "A", "criteria": { "wcag_1_2_3": {"passed": true}, "wcag_1_2_5": {"passed": false}, "cvaa_audio_description": {"passed": true} }, "gaps": [], "recommendations": [] } ``` Coverage is calculated from silence and music-only gap duration divided by total video duration. Recommendations are capped at 10 narration opportunities. ## Festival Film Accessibility Kit Generate a complete gap-targeted narration package in one command: ```bash vn kit ./demo.mp4 --api-key "$VN_API_KEY" --format json ``` The `kit` command: - detects narration gaps with Deepgram + ffmpeg silence detection - extracts one frame at each gap midpoint - sends those frames to the live Visual Narrator API - returns narration text, SRT timing, compliance scoring, and cost totals Output formats: ```bash vn kit ./demo.mp4 --format json vn kit ./demo.mp4 --format srt vn kit ./demo.mp4 --format text ``` Tune gap sensitivity with `--min-gap`: ```bash vn kit ./demo.mp4 --min-gap 3.0 --format text ``` YouTube URLs use the same download path as `vn describe`: ```bash vn kit "https://youtube.com/watch?v=VIDEO_ID" --format srt --api-key "$VN_API_KEY" ``` JSON output includes: ```json { "source": "./demo.mp4", "duration_seconds": 5421.4, "gaps_found": 6, "narrations": [ { "start_sec": 12.4, "end_sec": 16.1, "gap_duration_sec": 3.7, "gap_type": "silence", "frame_timestamp_sec": 14.25, "description": "A wide shot shows the ship drifting past Saturn.", "cost_estimate": 0.0012, "srt_index": 1 } ], "compliance": { "score": 67, "wcag_level": "A" }, "model_version": "visual-narrator-gpt4o-v1", "cost_estimate": 0.0072 } ``` ## Educational Video Describer Generate accessibility narration only for frames that add educational value such as slides, diagrams, equations, code, and charts: ```bash vn edu ./lecture.mp4 --format json ``` The `edu` command: - detects narration gaps with Deepgram Nova-3 plus ffmpeg silence detection - extracts one frame at each gap midpoint - sends each frame directly to GPT-4o with an education-specific prompt - filters out talking-head frames when the model returns `NO_VISUAL_AID` - keeps WCAG/CVAA compliance scoring based on the full detected gap list Output formats: ```bash vn edu ./lecture.mp4 --format json vn edu ./lecture.mp4 --format srt vn edu ./lecture.mp4 --format text ``` Tune gap sensitivity with `--min-gap`: ```bash vn edu ./lecture.mp4 --min-gap 3.0 --format text ``` YouTube URLs are supported: ```bash vn edu "https://youtube.com/watch?v=VIDEO_ID" --format srt ``` JSON output includes: ```json { "source": "./lecture.mp4", "duration_seconds": 1800.0, "gaps_analyzed": 12, "visual_moments": 7, "skipped_talking_head": 5, "narrations": [ { "start_sec": 42.0, "end_sec": 46.5, "gap_duration_sec": 4.5, "gap_type": "music_only", "frame_timestamp_sec": 44.25, "description": "A slide compares precision and recall with a 2x2 confusion matrix and labels true positives, false positives, false negatives, and true negatives.", "cost_estimate": 0.0013, "srt_index": 1, "visual_moment": true } ], "compliance": { "score": 100, "wcag_level": "AA" }, "model_version": "gpt-4o", "cost_estimate": 0.0091 } ``` ## Output Formats JSON: ```bash vn describe ./demo.mp4 --format json ``` Returns: ```json [ { "timecode": "00:00:00.000", "description": "A person walks through a city street.", "objects_detected": [{"label": "Person", "confidence": 98.7}], "latency_ms": 241 } ] ``` SRT: ```bash vn describe ./demo.mp4 --format srt ``` Text: ```bash vn describe ./demo.mp4 --format text ``` ## Benchmark Run a single-frame benchmark comparing Visual Narrator, GPT-4o, and Gemini: ```bash vn benchmark /tmp/vn-test.jpg ``` Use a custom API deployment: ```bash vn benchmark /tmp/vn-test.jpg --api-url http://localhost:3000 ``` ## Commands ```bash vn --help vn describe --help vn kit --help vn edu --help vn gaps --help vn compliance --help vn benchmark --help vn keys create --help ```