Visual Narrator CLI

The visual-narrator package installs the vn command, which generates audio description text from local video files, image frames, or YouTube URLs using the live Visual Narrator Frame Description API.

Install

pip install visual-narrator

For local development from this repository:

cd cli
pip install -e .

The CLI uses ffmpeg for video frame extraction. Install it separately if it is not already available, for example with Homebrew:

brew install ffmpeg

Authentication

Pass an API key with --api-key or set VN_API_KEY:

export VN_API_KEY=vn_live_your_key

Create a free-tier key:

vn keys create dev@example.com

For education-focused gap narration, vn edu calls GPT-4o directly and does not use VN_API_KEY. Instead, set a Deepgram key (used for gap detection) and an OpenAI key (used for GPT-4o):

export DEEPGRAM_API_KEY=dg_your_key
export OPENAI_API_KEY=sk-your-key
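
If you script the CLI, it can help to check for missing keys before starting a long run. The sketch below is only an illustration: the REQUIRED_ENV mapping and the missing_env helper are hypothetical and not part of the package. Only the variable names come from this README. The mapping also ignores that vn describe accepts --api-key instead of VN_API_KEY.

```python
import os

# Hypothetical mapping for illustration. Only the variable names are taken
# from this README; the CLI itself may check keys differently.
REQUIRED_ENV = {
    "describe": ["VN_API_KEY"],
    "edu": ["DEEPGRAM_API_KEY", "OPENAI_API_KEY"],
}


def missing_env(command, env=None):
    """Return the required variables that are unset or empty for a command."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(command, []) if not env.get(name)]


if __name__ == "__main__":
    for command in REQUIRED_ENV:
        missing = missing_env(command)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"vn {command}: {status}")
```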

Describe A Video

Local video:

vn describe ./demo.mp4 --api-key "$VN_API_KEY" --format json

YouTube URL:

vn describe "https://youtube.com/watch?v=VIDEO_ID" --api-key "$VN_API_KEY" --format srt

Single image frame:

vn describe /tmp/vn-test.jpg --api-key "$VN_API_KEY" --format text

By default, the CLI samples one frame every three seconds. Change the rate with --fps, for example to sample one frame per second:

vn describe ./demo.mp4 --fps 1 --format json
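
To estimate how many frames a run will send to the API, the sampling rate can be turned into a list of timestamps. This is a rough sketch that assumes sampling starts at 0 seconds and stops before the end of the clip. The sample_timestamps helper is hypothetical, and the CLI may pick its frames differently.

```python
def sample_timestamps(duration_sec, fps=1 / 3):
    """List frame timestamps in seconds for a clip sampled at `fps` frames per second.

    The default of 1/3 mirrors the documented "one frame every three seconds".
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    timestamps = []
    index = 0
    while True:
        # Round to milliseconds so float noise does not leak into the output.
        t = round(index / fps, 3)
        if t >= duration_sec:
            break
        timestamps.append(t)
        index += 1
    return timestamps


if __name__ == "__main__":
    print(len(sample_timestamps(60)), "frames at the default rate for a 60 s clip")
    print(len(sample_timestamps(60, fps=1)), "frames at --fps 1")
```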

Override the API base URL:

vn describe ./demo.mp4 --api-url http://localhost:3000 --api-key "$VN_API_KEY"

Detect Narration Gaps

Find silence windows and dialogue breaks where narration can fit:

vn gaps ./demo.mp4 --format json

YouTube URLs use the same download path as vn describe:

vn gaps "https://youtube.com/watch?v=VIDEO_ID" --format text --min-gap 2.5

Choose the Whisper model with --whisper-model when you want a faster or more accurate pass:

vn gaps ./demo.mp4 --whisper-model base --format srt

Output formats:

vn gaps ./demo.mp4 --format json
vn gaps ./demo.mp4 --format text
vn gaps ./demo.mp4 --format srt

JSON output returns objects with start_sec, end_sec, duration_sec, and gap_type.
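
A short sketch of consuming that output from Python follows. It assumes the JSON is an array of gap objects with the four fields listed above. The summarize_gaps helper and the sample values are made up for illustration.

```python
import json


def summarize_gaps(gaps_json, min_duration_sec=0.0):
    """Parse `vn gaps --format json` output and total the gap time per gap_type.

    Assumes the output is a JSON array of objects with start_sec, end_sec,
    duration_sec, and gap_type.
    """
    gaps = json.loads(gaps_json)
    kept = [g for g in gaps if g["duration_sec"] >= min_duration_sec]
    seconds_by_type = {}
    for gap in kept:
        gap_type = gap["gap_type"]
        seconds_by_type[gap_type] = seconds_by_type.get(gap_type, 0.0) + gap["duration_sec"]
    return kept, seconds_by_type


if __name__ == "__main__":
    # Illustrative data only, not real CLI output.
    sample = json.dumps([
        {"start_sec": 12.0, "end_sec": 16.0, "duration_sec": 4.0, "gap_type": "silence"},
        {"start_sec": 40.0, "end_sec": 41.5, "duration_sec": 1.5, "gap_type": "music_only"},
    ])
    kept, totals = summarize_gaps(sample, min_duration_sec=2.5)
    print(len(kept), totals)
```

In practice the input string would come from running vn gaps with --format json and capturing its standard output.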

Compliance Reports

Score a video against WCAG/CVAA audio-description checks using the same gap detector:

vn compliance ./demo.mp4 --format json

Text output is available for quick terminal review:

vn compliance ./demo.mp4 --format text --min-gap 3.0

Choose the Whisper model the same way as vn gaps:

vn compliance ./demo.mp4 --whisper-model base --format json

JSON output returns:

{
  "score": 67,
  "wcag_level": "A",
  "criteria": {
    "wcag_1_2_3": {"passed": true},
    "wcag_1_2_5": {"passed": false},
    "cvaa_audio_description": {"passed": true}
  },
  "gaps": [],
  "recommendations": []
}

Coverage is the combined duration of silence and music-only gaps divided by the total video duration. Recommendations are capped at 10 narration opportunities.
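
The coverage ratio can be reproduced from gap data with a few lines of Python. This is a minimal sketch, not the CLI's implementation. The coverage helper is hypothetical, and "dialogue_break" is an assumed gap_type value used only to show a gap that is not counted.

```python
def coverage(gaps, total_duration_sec, counted_types=("silence", "music_only")):
    """Return the share of the video covered by silence and music-only gaps.

    `gaps` is a list of dicts with duration_sec and gap_type, as in the
    JSON output of `vn gaps`.
    """
    if total_duration_sec <= 0:
        raise ValueError("total_duration_sec must be positive")
    covered = sum(g["duration_sec"] for g in gaps if g["gap_type"] in counted_types)
    return covered / total_duration_sec


if __name__ == "__main__":
    # Illustrative data only.
    gaps = [
        {"duration_sec": 30.0, "gap_type": "silence"},
        {"duration_sec": 15.0, "gap_type": "music_only"},
        {"duration_sec": 5.0, "gap_type": "dialogue_break"},  # assumed value, not counted
    ]
    print(f"coverage: {coverage(gaps, 180.0):.0%}")
```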

Festival Film Accessibility Kit

Generate a complete gap-targeted narration package in one command:

vn kit ./demo.mp4 --api-key "$VN_API_KEY" --format json

The kit command:

  • detects narration gaps with Deepgram + ffmpeg silence detection
  • extracts one frame at each gap midpoint
  • sends those frames to the live Visual Narrator API
  • returns narration text, SRT timing, compliance scoring, and cost totals
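
The midpoint frame extraction step can be approximated outside the CLI. The sketch below computes a gap midpoint and builds an ffmpeg command that grabs a single frame at that time. The helper names and the output path are hypothetical, and the actual ffmpeg invocation used by vn kit may differ.

```python
import subprocess


def midpoint_sec(start_sec, end_sec):
    """Return the midpoint of a gap, rounded to milliseconds."""
    return round((start_sec + end_sec) / 2, 3)


def frame_extract_command(video_path, timestamp_sec, out_path):
    """Build an ffmpeg argv that writes one frame at `timestamp_sec` to `out_path`."""
    return [
        "ffmpeg",
        "-y",                           # overwrite the output file if it exists
        "-ss", f"{timestamp_sec:.3f}",  # seek before decoding (placed ahead of -i)
        "-i", video_path,
        "-frames:v", "1",               # stop after a single video frame
        out_path,
    ]


if __name__ == "__main__":
    t = midpoint_sec(12.4, 16.1)
    cmd = frame_extract_command("./demo.mp4", t, "/tmp/gap-0001.jpg")
    print(" ".join(cmd))
    # Uncomment to actually run ffmpeg:
    # subprocess.run(cmd, check=True)
```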

Output formats:

vn kit ./demo.mp4 --format json
vn kit ./demo.mp4 --format srt
vn kit ./demo.mp4 --format text

Tune gap sensitivity with --min-gap:

vn kit ./demo.mp4 --min-gap 3.0 --format text

YouTube URLs use the same download path as vn describe:

vn kit "https://youtube.com/watch?v=VIDEO_ID" --format srt --api-key "$VN_API_KEY"

JSON output includes:

{
  "source": "./demo.mp4",
  "duration_seconds": 5421.4,
  "gaps_found": 6,
  "narrations": [
    {
      "start_sec": 12.4,
      "end_sec": 16.1,
      "gap_duration_sec": 3.7,
      "gap_type": "silence",
      "frame_timestamp_sec": 14.25,
      "description": "A wide shot shows the ship drifting past Saturn.",
      "cost_estimate": 0.0012,
      "srt_index": 1
    }
  ],
  "compliance": {
    "score": 67,
    "wcag_level": "A"
  },
  "model_version": "visual-narrator-gpt4o-v1",
  "cost_estimate": 0.0072
}
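
If you keep the JSON output and want subtitles later, the narrations array holds enough to rebuild SRT cues. The sketch below shows one way to do that. The helper names are hypothetical, and the result is not guaranteed to match what --format srt emits byte for byte.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def narrations_to_srt(narrations):
    """Turn the `narrations` array from `vn kit --format json` into SRT text."""
    cues = []
    for item in narrations:
        cues.append(
            f"{item['srt_index']}\n"
            f"{srt_timestamp(item['start_sec'])} --> {srt_timestamp(item['end_sec'])}\n"
            f"{item['description']}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    narrations = [{
        "start_sec": 12.4,
        "end_sec": 16.1,
        "description": "A wide shot shows the ship drifting past Saturn.",
        "srt_index": 1,
    }]
    print(narrations_to_srt(narrations))
```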

Educational Video Describer

Generate accessibility narration only for frames that add educational value, such as slides, diagrams, equations, code, and charts:

vn edu ./lecture.mp4 --format json

The edu command:

  • detects narration gaps with Deepgram Nova-3 plus ffmpeg silence detection
  • extracts one frame at each gap midpoint
  • sends each frame directly to GPT-4o with an education-specific prompt
  • filters out talking-head frames when the model returns NO_VISUAL_AID
  • keeps WCAG/CVAA compliance scoring based on the full detected gap list
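
The NO_VISUAL_AID filtering step can be pictured as a simple partition of model responses. The sketch below assumes the model returns exactly the string NO_VISUAL_AID (ignoring surrounding whitespace) for talking-head frames. The split_visual_moments helper is hypothetical, and the real matching logic in vn edu may be looser or stricter.

```python
NO_VISUAL_AID = "NO_VISUAL_AID"


def split_visual_moments(candidates):
    """Separate frames worth narrating from talking-head frames.

    `candidates` is a list of dicts that each carry the model's `description`.
    Returns the kept narrations (tagged visual_moment=True) and the skip count.
    """
    kept = []
    skipped = 0
    for candidate in candidates:
        if candidate["description"].strip() == NO_VISUAL_AID:
            skipped += 1
        else:
            kept.append({**candidate, "visual_moment": True})
    return kept, skipped


if __name__ == "__main__":
    # Illustrative responses only.
    responses = [
        {"start_sec": 42.0, "description": "A slide shows a 2x2 confusion matrix."},
        {"start_sec": 90.0, "description": "NO_VISUAL_AID"},
    ]
    kept, skipped = split_visual_moments(responses)
    print(len(kept), "visual moments,", skipped, "skipped")
```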

Output formats:

vn edu ./lecture.mp4 --format json
vn edu ./lecture.mp4 --format srt
vn edu ./lecture.mp4 --format text

Tune gap sensitivity with --min-gap:

vn edu ./lecture.mp4 --min-gap 3.0 --format text

YouTube URLs are supported:

vn edu "https://youtube.com/watch?v=VIDEO_ID" --format srt

JSON output includes:

{
  "source": "./lecture.mp4",
  "duration_seconds": 1800.0,
  "gaps_analyzed": 12,
  "visual_moments": 7,
  "skipped_talking_head": 5,
  "narrations": [
    {
      "start_sec": 42.0,
      "end_sec": 46.5,
      "gap_duration_sec": 4.5,
      "gap_type": "music_only",
      "frame_timestamp_sec": 44.25,
      "description": "A slide compares precision and recall with a 2x2 confusion matrix and labels true positives, false positives, false negatives, and true negatives.",
      "cost_estimate": 0.0013,
      "srt_index": 1,
      "visual_moment": true
    }
  ],
  "compliance": {
    "score": 100,
    "wcag_level": "AA"
  },
  "model_version": "gpt-4o",
  "cost_estimate": 0.0091
}

Output Formats

JSON:

vn describe ./demo.mp4 --format json

Returns:

[
  {
    "timecode": "00:00:00.000",
    "description": "A person walks through a city street.",
    "objects_detected": [{"label": "Person", "confidence": 98.7}],
    "latency_ms": 241
  }
]
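
When post-processing this output, the timecode string usually needs converting to seconds. The sketch below does that and also collects labels above a confidence threshold. Both helpers are hypothetical, and the threshold assumes confidence is a percentage, as the 98.7 in the example suggests.

```python
def timecode_to_seconds(timecode):
    """Convert an HH:MM:SS.mmm timecode into seconds."""
    hours, minutes, seconds = timecode.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def confident_labels(frames, min_confidence=90.0):
    """Map each frame's time in seconds to its labels at or above `min_confidence`."""
    result = {}
    for frame in frames:
        labels = [
            obj["label"]
            for obj in frame.get("objects_detected", [])
            if obj["confidence"] >= min_confidence
        ]
        result[timecode_to_seconds(frame["timecode"])] = labels
    return result


if __name__ == "__main__":
    frames = [{
        "timecode": "00:00:00.000",
        "description": "A person walks through a city street.",
        "objects_detected": [{"label": "Person", "confidence": 98.7}],
        "latency_ms": 241,
    }]
    print(confident_labels(frames))
```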

SRT:

vn describe ./demo.mp4 --format srt

Text:

vn describe ./demo.mp4 --format text

Benchmark

Run a single-frame benchmark comparing Visual Narrator, GPT-4o, and Gemini:

vn benchmark /tmp/vn-test.jpg

Use a custom API deployment:

vn benchmark /tmp/vn-test.jpg --api-url http://localhost:3000

Commands

vn --help
vn describe --help
vn kit --help
vn edu --help
vn gaps --help
vn compliance --help
vn benchmark --help
vn keys create --help