# Audio-to-Image Video Generator - Conversion Plan

## Overview
Convert the Jupyter notebook into a proper Python CLI tool with an optional Gradio UI.

## Architecture

```
audio_video_generator/
├── pyproject.toml
├── requirements.txt
├── README.md
├── src/
│   └── audio_video_generator/
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── config.py
│       ├── core/
│       │   ├── __init__.py
│       │   ├── audio.py          # Audio loading, Whisper transcription
│       │   ├── alignment.py      # CSV-to-audio alignment logic
│       │   ├── images.py         # Image processing, ZIP handling
│       │   ├── video.py          # Video composition, animations
│       │   ├── text_overlay.py   # Text overlay rendering
│       │   └── pipeline.py       # End-to-end orchestration (Task 10)
│       ├── utils/
│       │   ├── __init__.py
│       │   ├── files.py          # File utilities, checkpoints
│       │   └── text.py           # Text normalization, tokenization
│       └── web/
│           ├── __init__.py
│           └── gradio_ui.py      # Gradio interface
└── tests/
    └── ...
```

## Tasks

### Task 1: Project Structure and Packaging
Create the package structure with pyproject.toml, requirements.txt, and basic module setup.

**Files to create:**
- `pyproject.toml` - Package metadata, dependencies, entry points
- `requirements.txt` - Runtime dependencies
- `src/audio_video_generator/__init__.py` - Version info
- `src/audio_video_generator/config.py` - Configuration constants (RESOLUTION_MAP, ANIMATION_OPTIONS, etc.)

**Key specs:**
- Package name: `audio-video-generator`
- CLI entry point: `avg` command
- Version: 1.0.0
- Include all dependencies: openai-whisper (imported as `whisper`), moviepy, gradio, torch, pillow, pandas, numpy, click
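A possible shape for `config.py`, as a sketch: the animation, transition, text-animation and pattern names are copied from Tasks 6 and 7, while the `RESOLUTION_MAP` labels and sizes and the `resolve_resolution()` helper are hypothetical placeholders for whatever the notebook actually defines.

```python
"""Hypothetical sketch of src/audio_video_generator/config.py."""
from typing import Dict, Tuple

# Hypothetical labels and sizes; replace with the notebook's RESOLUTION_MAP.
RESOLUTION_MAP: Dict[str, Tuple[int, int]] = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "vertical_1080p": (1080, 1920),
}

# Option names taken from the plan (Tasks 6 and 7).
ANIMATION_OPTIONS = ("none", "zoom_in", "zoom_out", "fade_in", "blink", "pulse", "fade_zoom_in")
TRANSITION_OPTIONS = ("none", "fade", "crossfade", "slide_left", "slide_right", "dip_to_black", "flash")
TEXT_ANIMATION_OPTIONS = ("fade_in", "pop_in", "pulse", "slide_up", "glow_pop", "typewriter", "zoom_in")
TEXT_PATTERN_OPTIONS = ("solid", "boxed", "highlighted")


def resolve_resolution(label: str) -> Tuple[int, int]:
    """Map a resolution label to (width, height), with a readable error."""
    try:
        return RESOLUTION_MAP[label]
    except KeyError:
        valid = ", ".join(sorted(RESOLUTION_MAP))
        raise ValueError(f"Unknown resolution {label!r}; expected one of: {valid}") from None
```

Keeping the option lists in one module lets the CLI (`click.Choice`) and the Gradio dropdowns share a single source of truth.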

### Task 2: Utility Modules
Extract utility functions from the notebook into reusable modules.

**Files to create:**
- `src/audio_video_generator/utils/text.py` - `normalize_text()`, `tokenize_text()`, `extract_number()`, `get_fuzzy_threshold()`, `clamp01()`, `safe_int()`, `apply_case_style()`
- `src/audio_video_generator/utils/files.py` - `ensure_dir()`, `make_run_dir()`, `safe_output_name()`, `write_json_file()`, `write_text_file()`, `extract_zip()`, `collect_images_recursive()`, `sort_images()`

**Key specs:**
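- No global state: `utils/text.py` helpers should be pure functions; `utils/files.py` helpers necessarily perform file I/O but should take every path as an argument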
- Add type hints
- Add docstrings
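A minimal sketch of a few `utils/text.py` helpers, assuming normalization means lowercasing, unifying curly apostrophes, dropping other punctuation, and collapsing whitespace. The exact rules, the case-style names accepted by `apply_case_style()`, and the "first run of digits" behaviour of `extract_number()` are assumptions to check against the notebook.

```python
"""Hypothetical sketch of a few utils/text.py helpers."""
import re
import unicodedata
from typing import Any, Callable, Dict, List, Optional

_PUNCT_RE = re.compile(r"[^\w\s']")
_SPACE_RE = re.compile(r"\s+")
_NUMBER_RE = re.compile(r"\d+")


def normalize_text(text: str) -> str:
    """Lowercase, unify apostrophes, drop other punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).replace("\u2019", "'").lower()
    text = _PUNCT_RE.sub(" ", text)
    return _SPACE_RE.sub(" ", text).strip()


def tokenize_text(text: str) -> List[str]:
    """Split normalized text into word tokens."""
    return normalize_text(text).split()


def extract_number(text: str) -> Optional[int]:
    """Return the first run of digits as an int (e.g. 12 for 'img_012.png'), or None."""
    match = _NUMBER_RE.search(text)
    return int(match.group()) if match else None


def clamp01(value: float) -> float:
    """Clamp a number into the closed interval [0, 1]."""
    return min(1.0, max(0.0, float(value)))


def safe_int(value: Any, default: int = 0) -> int:
    """Convert to int, returning `default` for None, garbage, NaN or infinity."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


# Hypothetical style names for text overlays.
_CASE_STYLES: Dict[str, Callable[[str], str]] = {
    "none": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
    "sentence": lambda s: s[:1].upper() + s[1:].lower(),
}


def apply_case_style(text: str, style: str = "none") -> str:
    """Apply one of the case styles above, rejecting unknown names."""
    func = _CASE_STYLES.get(style)
    if func is None:
        raise ValueError(f"Unknown case style: {style!r}")
    return func(text)
```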

### Task 3: Audio and Transcription Module
Extract Whisper-related functionality.

**Files to create:**
- `src/audio_video_generator/core/audio.py` - `transcribe_with_words()`, `extract_word_timeline()`, `get_whisper_model()`, `get_device()`

**Key specs:**
- Use the singleton pattern for the Whisper model (loaded lazily on first use)
- Support CPU and CUDA
- Cache loaded models per model name and device so repeated runs do not reload the weights
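A sketch of the lazy, per-device model cache, assuming the `openai-whisper` package (`whisper.load_model()` and `model.transcribe(..., word_timestamps=True)`). The `loader` parameter is a hypothetical hook so the cache can be exercised without downloading weights; the cache key and the `"base"` default are also assumptions.

```python
"""Hypothetical sketch of core/audio.py."""
import threading
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

_MODELS: Dict[Tuple[str, str], Any] = {}  # (model name, device) -> loaded model
_LOCK = threading.Lock()


def get_device() -> str:
    """Prefer CUDA when torch can see a GPU, else fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def _load_whisper(name: str, device: str) -> Any:
    import whisper  # installed by the openai-whisper package
    return whisper.load_model(name, device=device)


def get_whisper_model(name: str = "base", device: Optional[str] = None,
                      loader: Callable[[str, str], Any] = _load_whisper) -> Any:
    """Load each (model, device) pair once and reuse it on later calls."""
    device = device or get_device()
    key = (name, device)
    with _LOCK:
        if key not in _MODELS:
            _MODELS[key] = loader(name, device)
        return _MODELS[key]


def transcribe_with_words(audio_path: Union[str, Path], model_name: str = "base") -> dict:
    """Transcribe with per-word timestamps."""
    device = get_device()
    model = get_whisper_model(model_name, device)
    # fp16 only helps on GPU; passing False on CPU avoids Whisper's FP32 fallback warning.
    return model.transcribe(str(audio_path), word_timestamps=True, fp16=(device == "cuda"))


def extract_word_timeline(result: dict) -> List[dict]:
    """Flatten Whisper segments into [{"word", "start", "end"}, ...], dropping blanks."""
    timeline = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            word = w["word"].strip()
            if word:
                timeline.append({"word": word, "start": float(w["start"]), "end": float(w["end"])})
    return timeline
```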

### Task 4: Image Processing Module
Extract image handling functionality.

**Files to create:**
- `src/audio_video_generator/core/images.py` - `prepare_image_inputs()`, `verify_and_filter_images()`, `build_image_indexes()`, `image_preflight_report()`, `resolve_image_reference()`, `resize_with_padding()`

**Key specs:**
- Support both ZIP and manual image inputs
- Cache decoded images so repeated references to the same file are not re-read
- Detect corrupt or unreadable images up front and report them instead of failing mid-render
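For the ZIP input path, `prepare_image_inputs()` could delegate to the Task 2 helpers in `utils/files.py`. The stdlib-only sketch below assumes an extension whitelist, skips macOS `__MACOSX` metadata and hidden files, and sorts in natural order so `img2` precedes `img10`; these are illustrative choices, not the notebook's confirmed behaviour. `Path.is_relative_to()` needs Python 3.9+.

```python
"""Hypothetical stdlib-only sketch of the ZIP/image helpers in utils/files.py."""
import re
import zipfile
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}  # assumed whitelist


def extract_zip(zip_path: PathLike, dest: PathLike) -> Path:
    """Extract a ZIP into dest, refusing members that would escape it (zip-slip)."""
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if not (dest / member).resolve().is_relative_to(dest):  # Python 3.9+
                raise ValueError(f"Unsafe path in ZIP archive: {member}")
        zf.extractall(dest)
    return dest


def collect_images_recursive(root: PathLike) -> List[Path]:
    """Find image files under root, ignoring macOS metadata and hidden files."""
    return [
        p for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTENSIONS
        and "__MACOSX" not in p.parts
        and not p.name.startswith(".")
    ]


def _natural_key(path: Path) -> list:
    # re.split with a capture group alternates text and digit runs:
    # "img10.png" -> ["img", "10", ".png"]; odd positions are the digit runs.
    parts = re.split(r"(\d+)", path.name)
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(parts)]


def sort_images(paths: Iterable[Path]) -> List[Path]:
    """Sort image paths by file name in natural (human) order."""
    return sorted(paths, key=_natural_key)
```

The explicit escape check fails loudly on a malicious or malformed archive; `zipfile` would otherwise silently strip `..` components from member names during extraction.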

### Task 5: CSV Alignment Module
Extract CSV loading and alignment logic.

**Files to create:**
- `src/audio_video_generator/core/alignment.py` - `load_csv()`, `preprocess_csv()`, `mapping_preflight_report()`, `find_sentence_match()`, `build_timeline()`

**Key specs:**
- CSV must have exactly 2 columns: text, image
- Fuzzy matching with configurable thresholds
- Duplicate row handling
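One way to implement `find_sentence_match()`, sketched with `difflib.SequenceMatcher` standing in for whatever fuzzy matcher the notebook uses. The signature, the `slack` window (how many words shorter or longer than the CSV sentence a candidate span may be) and the 0.8 default threshold are assumptions; `start` lets the caller search only after the previous row's match, keeping the timeline monotonic.

```python
"""Hypothetical difflib-based matcher for core/alignment.py."""
import re
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple


def _norm_tokens(text: str) -> List[str]:
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def find_sentence_match(words: Sequence[str], sentence: str, start: int = 0,
                        threshold: float = 0.8,
                        slack: int = 2) -> Optional[Tuple[int, int, float]]:
    """Return (i, j, score) for the best word window words[i:j] at or after `start`."""
    target = " ".join(_norm_tokens(sentence))
    n = len(target.split())
    if n == 0:
        return None
    spoken = [" ".join(_norm_tokens(w)) for w in words]
    best: Optional[Tuple[int, int, float]] = None
    for i in range(start, len(spoken)):
        for size in range(max(1, n - slack), n + slack + 1):
            j = i + size
            if j > len(spoken):
                break
            candidate = " ".join(s for s in spoken[i:j] if s)
            score = SequenceMatcher(None, target, candidate).ratio()
            if best is None or score > best[2]:  # keep the earliest best window
                best = (i, j, score)
    return best if best is not None and best[2] >= threshold else None
```

Each call computes about `2 * slack + 1` similarity ratios per transcript position, which is manageable for narration-length audio.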

### Task 6: Video and Animation Module
Extract video composition and animation logic.

**Files to create:**
- `src/audio_video_generator/core/video.py` - `resolve_effect_sequence()`, `get_transition_duration()`, `apply_animation_to_clip()`, `apply_slide_position()`, `build_transition_overlay()`, `build_render_clips()`

**Key specs:**
- Support all animation types: none, zoom_in, zoom_out, fade_in, blink, pulse, fade_zoom_in
- Support all transitions: none, fade, crossfade, slide_left, slide_right, dip_to_black, flash
- Use moviepy for video processing; pin the major version, since the moviepy 1.x and 2.x APIs differ
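The animation types can be expressed as pure functions of time that return a scale factor and an opacity, which keeps them testable without rendering. The curves, constants and the `animation_state()` name below are hypothetical; `get_transition_duration()` is named in the plan, but its parameters and the half-clip clamping rule are assumptions.

```python
"""Hypothetical per-frame animation curves for core/video.py."""
import math
from typing import Tuple

ZOOM_AMOUNT = 0.10   # assumption: zooms scale by up to 10%
FADE_SECONDS = 0.5   # assumption: fade-in length
BLINK_HZ = 2.0       # assumption: blink cycles per second
PULSE_HZ = 1.0       # assumption: pulse cycles per second
PULSE_DEPTH = 0.03   # assumption: pulse scale amplitude


def animation_state(name: str, t: float, duration: float) -> Tuple[float, float]:
    """Return (scale, opacity) for animation `name` at time t within a clip."""
    progress = min(max(t / duration, 0.0), 1.0) if duration > 0 else 1.0
    fade = min(max(t / FADE_SECONDS, 0.0), 1.0)
    if name == "none":
        return 1.0, 1.0
    if name == "zoom_in":
        return 1.0 + ZOOM_AMOUNT * progress, 1.0
    if name == "zoom_out":
        return 1.0 + ZOOM_AMOUNT * (1.0 - progress), 1.0
    if name == "fade_in":
        return 1.0, fade
    if name == "blink":
        # Square wave: visible for the first half of each cycle, hidden for the second.
        visible = (t * BLINK_HZ) % 1.0 < 0.5
        return 1.0, 1.0 if visible else 0.0
    if name == "pulse":
        return 1.0 + PULSE_DEPTH * math.sin(2 * math.pi * PULSE_HZ * t), 1.0
    if name == "fade_zoom_in":
        return 1.0 + ZOOM_AMOUNT * progress, fade
    raise ValueError(f"Unknown animation: {name!r}")


def get_transition_duration(name: str, requested: float,
                            prev_len: float, next_len: float) -> float:
    """Clamp a transition so it never exceeds half of either neighbouring clip."""
    if name == "none":
        return 0.0
    return max(0.0, min(requested, prev_len / 2, next_len / 2))
```

In moviepy 1.x, `clip.resize()` accepts a function of `t`, so a scale curve can be passed as `lambda t: animation_state(name, t, clip.duration)[0]`; moviepy 2.x renamed many clip methods (for example `resize` to `resized`), so check this against the pinned version.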

### Task 7: Text Overlay Module
Extract text overlay functionality.

**Files to create:**
- `src/audio_video_generator/core/text_overlay.py` - `load_overlay_txt()`, `find_all_phrase_matches()`, `build_text_overlay_events()`, `render_text_rgba()`, `resolve_overlay_position()`, `build_text_overlay_clips()`, `get_available_font_map()`, `get_pil_font()`, `make_text_style_config()`

**Key specs:**
- Support font selection, colors, patterns (solid, boxed, highlighted)
- Support text animations: fade_in, pop_in, pulse, slide_up, glow_pop, typewriter, zoom_in
- Typewriter effect builds character-by-character clips
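Two text-overlay building blocks, sketched as pure functions over a word timeline of `{"word", "start", "end"}` dicts (an assumed shape, matching what a Whisper word-timestamp extractor would produce). `typewriter_schedule()` is a hypothetical helper that yields one `(visible_text, t0, t1)` state per character, each of which would become one short text clip; its `reveal_fraction` parameter (the share of the event spent typing) is an assumption.

```python
"""Hypothetical helpers for core/text_overlay.py."""
import re
from typing import Dict, List, Sequence, Tuple


def _norm(token: str) -> str:
    return re.sub(r"[^\w']", "", token.lower())


def find_all_phrase_matches(words: Sequence[Dict], phrase: str) -> List[Tuple[float, float]]:
    """Return (start, end) times of every exact, case- and punctuation-insensitive match."""
    target = [t for t in (_norm(p) for p in phrase.split()) if t]
    spoken = [_norm(str(w["word"])) for w in words]
    n = len(target)
    hits: List[Tuple[float, float]] = []
    if n == 0:
        return hits
    for i in range(len(spoken) - n + 1):
        if spoken[i:i + n] == target:
            hits.append((float(words[i]["start"]), float(words[i + n - 1]["end"])))
    return hits


def typewriter_schedule(text: str, start: float, end: float,
                        reveal_fraction: float = 0.5) -> List[Tuple[str, float, float]]:
    """Split [start, end] into (visible_prefix, t0, t1) states, one per character."""
    if not text or end <= start:
        return []
    step = (end - start) * reveal_fraction / len(text)
    states = []
    for i in range(1, len(text) + 1):
        t0 = start + (i - 1) * step
        t1 = start + i * step if i < len(text) else end  # full text holds until `end`
        states.append((text[:i], t0, t1))
    return states
```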

### Task 8: CLI Interface
Create a command-line interface using Click.

**Files to create:**
- `src/audio_video_generator/cli.py` - Main CLI with commands and options
- `src/audio_video_generator/__main__.py` - Entry point for `python -m`

**Key specs:**
- Command: `avg generate`, or just `avg` (with `generate` as the default subcommand)
- Options for all major settings: audio, csv, images, resolution, output, animations, transitions
- Progress reporting
- Checkpoint saving
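- Proper error handling with exit codes: 0 on success, non-zero on failure (Click exits with 2 for usage errors and 1 for `ClickException`)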
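A minimal Click sketch of the `avg` entry point. Click has no built-in default subcommand, so the hypothetical `DefaultGroup` prepends `generate` when the first argument is not a known command. The option names mirror the spec; the resolution labels and the placeholder body are assumptions until the Task 10 pipeline exists.

```python
"""Hypothetical sketch of cli.py: `avg` falls back to the `generate` subcommand."""
from pathlib import Path
from typing import List

import click

# Animation/transition names from the plan; resolution labels are hypothetical.
ANIMATIONS = ["none", "zoom_in", "zoom_out", "fade_in", "blink", "pulse", "fade_zoom_in"]
TRANSITIONS = ["none", "fade", "crossfade", "slide_left", "slide_right", "dip_to_black", "flash"]
RESOLUTIONS = ["720p", "1080p"]


class DefaultGroup(click.Group):
    """Click group that routes to `generate` when no known subcommand is given."""

    def parse_args(self, ctx: click.Context, args: List[str]) -> List[str]:
        if not args or (args[0] not in self.commands and args[0] != "--help"):
            args = ["generate", *args]
        return super().parse_args(ctx, args)


@click.group(cls=DefaultGroup)
def main() -> None:
    """Audio-to-image video generator."""


@main.command()
@click.option("--audio", required=True, type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option("--csv", "csv_path", required=True, type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option("--images", required=True, type=click.Path(exists=True, path_type=Path), help="ZIP file or image folder.")
@click.option("--resolution", type=click.Choice(RESOLUTIONS), default="1080p", show_default=True)
@click.option("--output", type=click.Path(dir_okay=False, path_type=Path), default="output.mp4", show_default=True)
@click.option("--animation", type=click.Choice(ANIMATIONS), default="none", show_default=True)
@click.option("--transition", type=click.Choice(TRANSITIONS), default="none", show_default=True)
def generate(audio, csv_path, images, resolution, output, animation, transition) -> None:
    """Render a video from an audio file, a CSV mapping, and images."""
    try:
        # Placeholder: the Task 10 create_video_pipeline() call would go here.
        click.echo(f"Rendering {output} ({resolution}, {animation}/{transition})")
    except (OSError, ValueError) as exc:
        # ClickException prints "Error: ..." and exits with status 1;
        # invalid or missing options already exit with status 2 (UsageError).
        raise click.ClickException(str(exc)) from exc


if __name__ == "__main__":
    main()
```

`pyproject.toml` would then point the `avg` script at `audio_video_generator.cli:main`, and `__main__.py` can simply call `main()`.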

### Task 9: Gradio Web UI
Extract and clean up the Gradio interface.

**Files to create:**
- `src/audio_video_generator/web/gradio_ui.py` - Full Gradio UI implementation
- `src/audio_video_generator/web/__init__.py`

**Key specs:**
- Command: `avg web` to launch UI
- Include all features from notebook: file uploads, path inputs, live preview, text overlay editor
- Make Google Drive integration optional: Colab-specific code runs only when executing inside Colab
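A sketch of keeping Colab-specific code conditional: Drive is mounted only when `google.colab` is importable, and Gradio is imported lazily inside the launch function. `gr.Blocks`, `gr.Markdown`, `demo.launch(server_name=..., server_port=..., share=...)` and `drive.mount()` are real APIs; the function names and the minimal layout are hypothetical.

```python
"""Hypothetical sketch of web/gradio_ui.py helpers."""
from pathlib import Path
from typing import Optional


def in_colab() -> bool:
    """Return True when running inside Google Colab."""
    try:
        import google.colab  # noqa: F401  (only importable inside Colab)
    except ImportError:
        return False
    return True


def maybe_mount_drive(mount_point: str = "/content/drive") -> Optional[Path]:
    """Mount Google Drive at Colab's usual mount point; return None everywhere else."""
    if not in_colab():
        return None
    from google.colab import drive
    drive.mount(mount_point)
    return Path(mount_point)


def launch_web_ui(host: str = "127.0.0.1", port: int = 7860, share: bool = False) -> None:
    """Build and launch the Gradio app (hypothetical minimal layout)."""
    import gradio as gr  # lazy import: CLI-only use never loads Gradio

    with gr.Blocks(title="Audio-to-Image Video Generator") as demo:
        gr.Markdown("# Audio-to-Image Video Generator")
        # The notebook's uploads, path inputs, live preview and overlay editor go here.
    # share=True asks Gradio for a temporary public link.
    demo.launch(server_name=host, server_port=port, share=share)
```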

### Task 10: Main Pipeline Integration
Create the main orchestration pipeline.

**Files to create:**
- `src/audio_video_generator/core/pipeline.py` - `create_video_pipeline()` function that orchestrates all components

**Key specs:**
- Error handling with cleanup
- Progress callbacks
- Memory management (`gc.collect()`, `torch.cuda.empty_cache()` when CUDA is in use)
- Report generation
- Checkpoint saving
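The cleanup and progress requirements can be sketched as a context manager that closes registered resources (moviepy clips expose `close()`) even when a stage raises, then runs `gc.collect()` and, when available, `torch.cuda.empty_cache()`. The stage-list structure, `run_stages()`, and the `(fraction, message)` callback signature are hypothetical.

```python
"""Hypothetical cleanup/progress scaffolding for core/pipeline.py."""
import gc
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Optional, Protocol, Sequence, Tuple

ProgressCallback = Callable[[float, str], None]  # (fraction done, stage name)


class Closable(Protocol):
    def close(self) -> None: ...


def release_memory() -> None:
    """Run the garbage collector and free cached CUDA memory if torch is present."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


@contextmanager
def managed_resources() -> Iterator[List[Closable]]:
    """Yield a list to register clips in; close them all on exit, even after errors."""
    opened: List[Closable] = []
    try:
        yield opened
    finally:
        for resource in reversed(opened):
            try:
                resource.close()
            except Exception:
                pass  # a failing close() must not mask the original error
        release_memory()


def run_stages(stages: Sequence[Tuple[str, Callable[[], Any]]],
               progress: Optional[ProgressCallback] = None) -> Dict[str, List[str]]:
    """Run named stages in order, reporting progress before each one."""
    report: Dict[str, List[str]] = {"completed": []}
    for index, (name, stage) in enumerate(stages):
        if progress:
            progress(index / len(stages), name)
        stage()
        report["completed"].append(name)
    if progress:
        progress(1.0, "done")
    return report
```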

### Task 11: Documentation
Create the README and usage documentation.

**Files to create:**
- `README.md` - Installation, usage examples, CSV format, CLI reference

**Key specs:**
- Installation instructions
- CSV format specification
- CLI examples
- Web UI usage

## Execution Notes

- All code must work outside Colab (no hardcoded `/content` paths)
- Drive integration should be optional/conditional
- Keep checkpoint functionality for debugging
- Preserve all animation and transition options
- Ensure proper resource cleanup (moviepy clips, torch CUDA)