imageaudiosync / plan.md

Nanny7

Initial commit: Audio Video Generator v1.0.0

929f41f 2 months ago

	# Audio-to-Image Video Generator - Conversion Plan

	## Overview
	Convert the Jupyter notebook into a proper Python CLI tool with optional Gradio UI.

	## Architecture

	```
	audio_video_generator/
	├── pyproject.toml
	├── requirements.txt
	├── README.md
	├── src/
	│   └── audio_video_generator/
	│       ├── __init__.py
	│       ├── __main__.py
	│       ├── cli.py
	│       ├── config.py
	│       ├── core/
	│       │   ├── __init__.py
	│       │   ├── audio.py          # Audio loading, Whisper transcription
	│       │   ├── alignment.py      # CSV-to-audio alignment logic
	│       │   ├── images.py         # Image processing, ZIP handling
	│       │   ├── video.py          # Video composition, animations
	│       │   ├── text_overlay.py   # Text overlay rendering
	│       │   └── pipeline.py       # Pipeline orchestration (see Task 10)
	│       ├── utils/
	│       │   ├── __init__.py
	│       │   ├── files.py          # File utilities, checkpoints
	│       │   └── text.py           # Text normalization, tokenization
	│       └── web/
	│           ├── __init__.py
	│           └── gradio_ui.py      # Gradio interface
	└── tests/
	    └── ...
	```

	## Tasks

	### Task 1: Project Structure and Packaging
	Create the package structure with pyproject.toml, requirements.txt, and basic module setup.

	Files to create:
	- `pyproject.toml` - Package metadata, dependencies, entry points
	- `requirements.txt` - Runtime dependencies
	- `src/audio_video_generator/__init__.py` - Version info
	- `src/audio_video_generator/config.py` - Configuration constants (RESOLUTION_MAP, ANIMATION_OPTIONS, etc.)

	Key specs:
	- Package name: `audio-video-generator`
	- CLI entry point: `avg` command
	- Version: 1.0.0
	- Include all dependencies: openai-whisper (imported as `whisper`), moviepy, gradio, torch, pillow, pandas, numpy, click
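
As a rough illustration of what `config.py` might hold, here is a minimal sketch. The animation and transition names are taken from Task 6. The contents of `RESOLUTION_MAP`, the `TRANSITION_OPTIONS` name, and the `resolve_resolution()` helper are assumptions, since the plan names the map but not its entries.

```python
"""Sketch of config.py: constants shared by the CLI and the web UI."""

# Hypothetical presets; the plan does not list the actual entries.
RESOLUTION_MAP: dict[str, tuple[int, int]] = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}

# Names copied from the Task 6 key specs.
ANIMATION_OPTIONS: tuple[str, ...] = (
    "none", "zoom_in", "zoom_out", "fade_in", "blink", "pulse", "fade_zoom_in",
)
TRANSITION_OPTIONS: tuple[str, ...] = (
    "none", "fade", "crossfade", "slide_left", "slide_right",
    "dip_to_black", "flash",
)


def resolve_resolution(name: str) -> tuple[int, int]:
    """Look up a (width, height) preset, failing with a readable message."""
    try:
        return RESOLUTION_MAP[name]
    except KeyError:
        valid = ", ".join(sorted(RESOLUTION_MAP))
        raise ValueError(
            f"Unknown resolution {name!r}; expected one of: {valid}"
        ) from None
```

Keeping the option tuples in one module lets the CLI and the Gradio UI validate against the same lists.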

	### Task 2: Utility Modules
	Extract utility functions from the notebook into reusable modules.

	Files to create:
	- `src/audio_video_generator/utils/text.py` - `normalize_text()`, `tokenize_text()`, `extract_number()`, `get_fuzzy_threshold()`, `clamp01()`, `safe_int()`, `apply_case_style()`
	- `src/audio_video_generator/utils/files.py` - `ensure_dir()`, `make_run_dir()`, `safe_output_name()`, `write_json_file()`, `write_text_file()`, `extract_zip()`, `collect_images_recursive()`, `sort_images()`

	Key specs:
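	- Functions must not read or mutate module-level global state; text helpers should be pure, and file helpers should touch only the paths passed to them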
	- Add type hints
	- Add docstrings
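
A few of the text helpers could look like the sketch below. The function names come from the list above, but the notebook's exact behavior is not shown in this plan, so the normalization rules here (accent stripping, lowercase, punctuation removal) are assumptions.

```python
import re
import unicodedata

# Assumed token shape: runs of ASCII letters, digits, and apostrophes.
_WORD_RE = re.compile(r"[a-z0-9']+")


def normalize_text(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(_WORD_RE.findall(ascii_only.lower()))


def tokenize_text(text: str) -> list[str]:
    """Split normalized text into word tokens."""
    return normalize_text(text).split()


def clamp01(value: float) -> float:
    """Clamp a number into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, float(value)))


def safe_int(value: object, default: int = 0) -> int:
    """Convert to int, returning `default` when conversion fails."""
    try:
        return int(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return default
```

None of these touch global state beyond a compiled regex constant, which keeps them easy to unit test.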

	### Task 3: Audio and Transcription Module
	Extract Whisper-related functionality.

	Files to create:
	- `src/audio_video_generator/core/audio.py` - `transcribe_with_words()`, `extract_word_timeline()`, `get_whisper_model()`, `get_device()`

	Key specs:
	- Use singleton pattern for Whisper model (lazy loading)
	- Support CPU and CUDA
	- Cache the loaded model so repeated calls reuse it instead of reloading the weights
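
One way to get lazy, single-instance loading is `functools.lru_cache`. The sketch below is an assumption about how `get_device()` and `get_whisper_model()` could be written, not the notebook's code. The parameter names and the `"base"` default are illustrative; `whisper.load_model(name, device=...)` is the loader from the `openai-whisper` package.

```python
from functools import lru_cache
from typing import Optional


def get_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@lru_cache(maxsize=1)
def get_whisper_model(name: str = "base", device: Optional[str] = None):
    """Load a Whisper model on first use and reuse it on later calls.

    The import is deferred so that importing this module stays cheap and
    does not require Whisper until transcription is actually requested.
    """
    import whisper

    return whisper.load_model(name, device=device or get_device())
```

With `maxsize=1`, only the most recently requested `(name, device)` combination stays in memory, so switching model sizes evicts the previous one. Note that `lru_cache` keys on the arguments as passed, so `get_whisper_model()` and `get_whisper_model("base")` count as different entries.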

	### Task 4: Image Processing Module
	Extract image handling functionality.

	Files to create:
	- `src/audio_video_generator/core/images.py` - `prepare_image_inputs()`, `verify_and_filter_images()`, `build_image_indexes()`, `image_preflight_report()`, `resolve_image_reference()`, `resize_with_padding()`

	Key specs:
	- Support both ZIP and manual image inputs
	- Image caching for performance
	- Proper error handling for corrupt images
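
The geometry behind `resize_with_padding()` can be sketched without any imaging library. `fit_with_padding()` below is a hypothetical helper, not a function from the notebook. It computes the scaled size and the paste offset needed to letterbox an image into a target frame while preserving aspect ratio.

```python
def fit_with_padding(
    src_size: tuple[int, int], dst_size: tuple[int, int]
) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((new_w, new_h), (offset_x, offset_y)) for a centered fit."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("All dimensions must be positive")

    # Use the smaller ratio so the image fits inside the frame on both axes.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))

    # Center the scaled image; the remaining area becomes padding.
    offset = ((dst_w - new_w) // 2, (dst_h - new_h) // 2)
    return (new_w, new_h), offset
```

With Pillow, the result would typically be applied by resizing the source with `Image.resize((new_w, new_h))` and pasting it at the offset onto a canvas created with `Image.new`.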

	### Task 5: CSV Alignment Module
	Extract CSV loading and alignment logic.

	Files to create:
	- `src/audio_video_generator/core/alignment.py` - `load_csv()`, `preprocess_csv()`, `mapping_preflight_report()`, `find_sentence_match()`, `build_timeline()`

	Key specs:
	- CSV must have exactly 2 columns: text, image
	- Fuzzy matching with configurable thresholds
	- Duplicate row handling
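
To make the fuzzy matching concrete, here is a minimal sketch of a sliding-window matcher built on the standard library's `difflib.SequenceMatcher`. The signature, the `(token, start, end)` word format, and the 0.8 default threshold are assumptions. The notebook's `find_sentence_match()` may use a different algorithm or scoring function.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence

Word = tuple[str, float, float]  # (token, start_seconds, end_seconds)


def find_sentence_match(
    sentence_tokens: Sequence[str],
    words: Sequence[Word],
    start_index: int = 0,
    threshold: float = 0.8,
) -> Optional[tuple[float, float, int, float]]:
    """Find the best window of transcribed words matching a CSV sentence.

    Returns (start_time, end_time, next_index, score), or None when no
    window reaches `threshold`.
    """
    n = len(sentence_tokens)
    if n == 0:
        return None
    target = " ".join(sentence_tokens)
    best = None
    # Slide a window of n words across the timeline and score each one.
    for i in range(start_index, len(words) - n + 1):
        window = words[i : i + n]
        candidate = " ".join(token for token, _, _ in window)
        score = SequenceMatcher(None, target, candidate).ratio()
        if score >= threshold and (best is None or score > best[3]):
            best = (window[0][1], window[-1][2], i + n, score)
    return best
```

Passing the returned `next_index` as `start_index` for the following row keeps matches in transcript order, which is one possible way to disambiguate duplicate rows.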

	### Task 6: Video and Animation Module
	Extract video composition and animation logic.

	Files to create:
	- `src/audio_video_generator/core/video.py` - `resolve_effect_sequence()`, `get_transition_duration()`, `apply_animation_to_clip()`, `apply_slide_position()`, `build_transition_overlay()`, `build_render_clips()`

	Key specs:
	- Support all animation types: none, zoom_in, zoom_out, fade_in, blink, pulse, fade_zoom_in
	- Support all transitions: none, fade, crossfade, slide_left, slide_right, dip_to_black, flash
	- Use moviepy for video processing
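
The zoom animations and the transition timing reduce to small time-based functions. The sketch below is illustrative only: `zoom_scale()` is a hypothetical helper, and the signature shown for `get_transition_duration()` along with the 15% zoom amount and the half-clip cap are assumptions.

```python
def zoom_scale(
    t: float, duration: float, mode: str = "zoom_in", amount: float = 0.15
) -> float:
    """Return the scale factor at time `t` for a clip of length `duration`."""
    if mode == "none" or duration <= 0:
        return 1.0
    progress = max(0.0, min(1.0, t / duration))
    if mode == "zoom_in":
        return 1.0 + amount * progress  # grows from 1.0 to 1.0 + amount
    if mode == "zoom_out":
        return 1.0 + amount * (1.0 - progress)  # shrinks back to 1.0
    raise ValueError(f"Unsupported zoom mode: {mode!r}")


def get_transition_duration(
    requested: float, clip_a: float, clip_b: float, max_fraction: float = 0.5
) -> float:
    """Cap a transition so it never exceeds a fraction of either clip."""
    limit = max_fraction * min(clip_a, clip_b)
    return max(0.0, min(requested, limit))
```

A function like `zoom_scale` could then be handed to moviepy's resize effect as a function of time; the exact call differs between moviepy versions.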

	### Task 7: Text Overlay Module
	Extract text overlay functionality.

	Files to create:
	- `src/audio_video_generator/core/text_overlay.py` - `load_overlay_txt()`, `find_all_phrase_matches()`, `build_text_overlay_events()`, `render_text_rgba()`, `resolve_overlay_position()`, `build_text_overlay_clips()`, `get_available_font_map()`, `get_pil_font()`, `make_text_style_config()`

	Key specs:
	- Support font selection, colors, patterns (solid, boxed, highlighted)
	- Support text animations: fade_in, pop_in, pulse, slide_up, glow_pop, typewriter, zoom_in
	- Typewriter effect builds character-by-character clips
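
The character-by-character typewriter effect can be planned as a list of timed text prefixes before any rendering happens. `typewriter_segments()` below is a hypothetical helper that assumes evenly spaced characters. Each segment would presumably become one clip rendered via `render_text_rgba()`.

```python
def typewriter_segments(
    text: str, start: float, duration: float
) -> list[tuple[str, float, float]]:
    """Split `text` into growing prefixes, each with its own time span.

    Returns a list of (visible_text, segment_start, segment_end) tuples
    covering [start, start + duration] without gaps.
    """
    if not text or duration <= 0:
        return []
    step = duration / len(text)
    segments = []
    for i in range(1, len(text) + 1):
        seg_start = start + (i - 1) * step
        seg_end = start + i * step
        segments.append((text[:i], seg_start, seg_end))
    return segments
```

For long overlays this produces one clip per character, so a real implementation might cap the segment count or reveal whole words instead.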

	### Task 8: CLI Interface
	Create command-line interface using Click.

	Files to create:
	- `src/audio_video_generator/cli.py` - Main CLI with commands and options
	- `src/audio_video_generator/__main__.py` - Entry point for `python -m`

	Key specs:
	- Command: `avg generate` or just `avg`
	- Options for all major settings: audio, csv, images, resolution, output, animations, transitions
	- Progress reporting
	- Checkpoint saving
	- Proper error handling with exit codes
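
A minimal Click layout for the `generate` command might look like the following. The option names, defaults, and the `_render()` placeholder are assumptions made for illustration. The sketch shows only the `avg generate` form and does not implement the bare `avg` shortcut.

```python
import sys

import click


def _render(audio: str, csv_path: str, images: str,
            resolution: str, output: str) -> str:
    """Hypothetical stand-in for the real pipeline call."""
    if not output.endswith(".mp4"):
        raise ValueError("this sketch only supports .mp4 output")
    return output


@click.group()
def main() -> None:
    """Audio-to-image video generator."""


@main.command()
@click.option("--audio", required=True,
              type=click.Path(exists=True, dir_okay=False))
@click.option("--csv", "csv_path", required=True,
              type=click.Path(exists=True, dir_okay=False))
@click.option("--images", required=True, type=click.Path(exists=True))
@click.option("--resolution", default="1080p", show_default=True)
@click.option("--output", default="output.mp4", show_default=True)
def generate(audio: str, csv_path: str, images: str,
             resolution: str, output: str) -> None:
    """Render a video from audio, a CSV mapping, and images."""
    try:
        result = _render(audio, csv_path, images, resolution, output)
    except Exception as exc:  # report the failure and exit non-zero
        click.echo(f"error: {exc}", err=True)
        sys.exit(1)
    click.echo(f"Wrote {result}")
```

Click itself exits with code 2 for usage errors such as a missing option or a nonexistent input path, so the explicit `sys.exit(1)` separates runtime failures from bad invocations.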

	### Task 9: Gradio Web UI
	Extract and clean up the Gradio interface.

	Files to create:
	- `src/audio_video_generator/web/gradio_ui.py` - Full Gradio UI implementation
	- `src/audio_video_generator/web/__init__.py`

	Key specs:
	- Command: `avg web` to launch UI
	- Include all features from notebook: file uploads, path inputs, live preview, text overlay editor
	- Google Drive integration is optional (Colab-specific code runs only when Colab is detected)
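
Making the Colab-specific code conditional can be as simple as probing for the `google.colab` module before touching Drive. The helper names below are hypothetical, and the mount point is passed in by the caller rather than hardcoded.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent "google" package is missing or malformed.
        return False


def mount_drive_if_available(mount_point: str) -> Optional[Path]:
    """Mount Google Drive inside Colab; do nothing everywhere else."""
    if not in_colab():
        return None
    from google.colab import drive  # only importable inside Colab

    drive.mount(mount_point)
    return Path(mount_point)
```

Outside Colab, the UI can check for a `None` return and hide the Drive controls.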

	### Task 10: Main Pipeline Integration
	Create the main orchestration pipeline.

	Files to create:
	- `src/audio_video_generator/core/pipeline.py` - `create_video_pipeline()` function that orchestrates all components

	Key specs:
	- Error handling with cleanup
	- Progress callbacks
	- Memory management (`gc.collect()`, `torch.cuda.empty_cache()`)
	- Report generation
	- Checkpoint saving
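
The orchestration requirements (progress callbacks, cleanup on failure, memory release) fit a small step-runner pattern. The sketch below is one possible shape, not the planned `create_video_pipeline()` itself. `run_pipeline()`, `release_memory()`, and the step format are hypothetical.

```python
import gc
from typing import Callable, Optional, Sequence

ProgressFn = Callable[[float, str], None]
Step = tuple[str, Callable[[dict], None]]


def release_memory() -> None:
    """Run the garbage collector and clear the CUDA cache if present."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_pipeline(
    steps: Sequence[Step],
    progress: Optional[ProgressFn] = None,
    cleanup: Callable[[], None] = release_memory,
) -> dict:
    """Run named steps in order, sharing state through a context dict.

    `cleanup` runs whether the steps succeed or raise, so resources are
    released even when a step fails partway through.
    """
    context: dict = {}
    total = len(steps)
    try:
        for index, (label, step) in enumerate(steps, start=1):
            step(context)
            if progress is not None:
                progress(index / total, label)
        return context
    finally:
        cleanup()
```

Each step could also write a checkpoint into the run directory after it completes, which would cover the checkpoint requirement without changing the runner.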

	### Task 11: Documentation
	Create README and usage documentation.

	Files to create:
	- `README.md` - Installation, usage examples, CSV format, CLI reference

	Key specs:
	- Installation instructions
	- CSV format specification
	- CLI examples
	- Web UI usage

	## Execution Notes

	- All code must work outside Colab (no hardcoded `/content` paths)
	- Drive integration should be optional/conditional
	- Keep checkpoint functionality for debugging
	- Preserve all animation and transition options
	- Ensure proper resource cleanup (moviepy clips, torch CUDA)