# file_utils.py

## Purpose

File I/O operations for Nano Banana Streamlit. Centralized handling of image saving/loading, metadata management, filename generation, and directory operations.

## Responsibilities

- Generate safe, unique filenames with timestamps
- Save/load images to/from disk
- Save/load metadata as JSON
- Create standardized metadata dictionaries
- Compute image hashes for change detection
- Manage output directory structure
- List recent generations

## Dependencies

### Imports

- `json` - JSON serialization
- `hashlib` - Image hashing (SHA-256)
- `re` - Filename sanitization (regex)
- `datetime` - Timestamps
- `pathlib.Path` - Path operations
- `PIL.Image` - Image handling
- `config.settings.Settings` - Directory paths
- `utils.logging_utils.get_logger` - Logging

### Used By

- All services - Save generation results
- All pages - Load/display images
- Backend clients - Save API responses
- `models/generation_result.py` - Metadata creation

## Public Interface

### Filename Utilities

#### `sanitize_filename(name: str) -> str`

Remove unsafe characters from filename.

**Rules:**
- Removes: `< > : " / \ | ? *`
- Replaces with underscore
- Strips leading/trailing spaces and dots
- Limits to 100 characters
- Falls back to "generated" if empty

**Example:**
```python
safe = sanitize_filename("My Character: v2.0")
# Returns: "My_Character__v2_0"
```

#### `generate_timestamp_filename(base_name: str, extension: str = "png") -> str`

Generate filename with timestamp.

**Format:** `{base_name}_{YYYYMMDD_HHMMSS}.{extension}`

**Example:**
```python
filename = generate_timestamp_filename("character", "png")
# Returns: "character_20251023_143052.png"
```

#### `get_unique_filename(directory: Path, base_name: str, extension: str = "png") -> Path`

Generate unique filename that doesn't exist in directory.

If file exists, appends counter: `_1`, `_2`, etc.
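The collision handling described for `get_unique_filename` can be sketched roughly as follows. This is a minimal illustration and not the module's actual code: the function name `unique_path_sketch` and the `now` parameter (added so the timestamp can be pinned) are hypothetical, and the real implementation may differ in details.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def unique_path_sketch(
    directory: Path,
    base_name: str,
    extension: str = "png",
    now: Optional[datetime] = None,  # hypothetical hook, not part of the documented API
) -> Path:
    """Return a timestamped path in `directory` that does not exist yet."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    stem = f"{base_name}_{stamp}"

    # First try the plain timestamped name.
    candidate = directory / f"{stem}.{extension}"

    # On collision, append _1, _2, ... until a free name is found.
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}.{extension}"
        counter += 1
    return candidate
```

Note that a check-then-create approach like this is not atomic: two processes asking for a name in the same second could receive the same path. For a single-user Streamlit app that is usually acceptable.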
**Example:**
```python
path = get_unique_filename(Settings.CHARACTER_SHEETS_DIR, "hero", "png")
# Returns: Path("outputs/character_sheets/hero_20251023_143052.png")
# If exists: Path("outputs/character_sheets/hero_20251023_143052_1.png")
```

### Image Operations

#### `save_image(image: Image, directory: Path, base_name: str, metadata: dict = None) -> Tuple[Path, Path]`

Save image and optional metadata.

**Parameters:**
- `image`: PIL Image to save
- `directory`: Target directory (created if it doesn't exist)
- `base_name`: Base filename (a timestamp is appended)
- `metadata`: Optional metadata dict (saved as JSON)

**Returns:** `(image_path, metadata_path)` tuple

**Example:**
```python
metadata = {"prompt": "sunset", "backend": "Gemini"}
img_path, meta_path = save_image(
    image=generated_image,
    directory=Settings.CHARACTER_SHEETS_DIR,
    base_name="hero",
    metadata=metadata
)
# Saves:
#   outputs/character_sheets/hero_20251023_143052.png
#   outputs/character_sheets/hero_20251023_143052.json
```

#### `load_image(file_path: Path) -> Image`

Load image from disk.

**Raises:**
- `FileNotFoundError`: If file doesn't exist
- `IOError`: If file can't be read as image

**Example:**
```python
image = load_image(Path("outputs/character_sheets/hero_20251023_143052.png"))
```

### Metadata Operations

#### `save_metadata(file_path: Path, metadata: dict)`

Save metadata dictionary as JSON.

**Format:** Indented JSON with UTF-8 encoding

**Raises:** `IOError` if write fails

#### `load_metadata(file_path: Path) -> dict`

Load metadata from JSON file.

**Raises:**
- `FileNotFoundError`: If file doesn't exist
- `json.JSONDecodeError`: If invalid JSON

**Example:**
```python
meta = load_metadata(Path("outputs/character_sheets/hero_20251023_143052.json"))
prompt = meta["prompt"]
```

#### `create_generation_metadata(...) -> dict`

Create standardized metadata dictionary.
**Parameters:**
- `prompt`: Generation prompt (required)
- `backend`: Backend used (required)
- `aspect_ratio`: Aspect ratio (required)
- `temperature`: Temperature value (required)
- `input_images`: List of input image paths (optional)
- `generation_time`: Time taken in seconds (optional)
- `**kwargs`: Additional custom fields

**Returns:** Metadata dictionary with standard fields

**Standard Fields:**
- `timestamp`: ISO format timestamp
- `prompt`: Generation prompt
- `backend`: Backend name
- `aspect_ratio`: Aspect ratio string
- `temperature`: Temperature value
- `version`: Application version ("2.0.0-streamlit")
- `input_images`: List of input paths (if provided)
- `generation_time_seconds`: Time taken (if provided)

**Example:**
```python
metadata = create_generation_metadata(
    prompt="sunset over mountains",
    backend="Gemini API (Cloud)",
    aspect_ratio="16:9",
    temperature=0.4,
    generation_time=3.5,
    character_name="Hero",  # Custom field
    stage="front_portrait"  # Custom field
)
```

### Image Hashing

#### `compute_image_hash(image: Image) -> str`

Compute SHA-256 hash of image data.

Useful for detecting if input images have changed.

**Returns:** Hex string (64 characters)

**Example:**
```python
hash1 = compute_image_hash(image1)
hash2 = compute_image_hash(image2)
if hash1 == hash2:
    print("Images are identical")
```

### Directory Operations

#### `ensure_output_directories()`

Ensure all output directories exist.

Creates all directories defined in Settings if they don't exist. Called on startup.

#### `get_output_directory_for_type(generation_type: str) -> Path`

Get appropriate output directory for generation type.
**Types:**
- `"character_sheet"` → `Settings.CHARACTER_SHEETS_DIR`
- `"wardrobe"` → `Settings.WARDROBE_CHANGES_DIR`
- `"composition"` → `Settings.COMPOSITIONS_DIR`
- `"standard"` → `Settings.STANDARD_DIR`

**Raises:** `ValueError` if unknown type

**Example:**
```python
output_dir = get_output_directory_for_type("character_sheet")
# Returns: Path("outputs/character_sheets")
```

#### `list_recent_generations(generation_type: str, count: int = 10) -> list`

List recent generation files in a directory.

**Returns:** List of `(image_path, metadata_path)` tuples, newest first.

Metadata path is `None` if the JSON file doesn't exist.

**Example:**
```python
recent = list_recent_generations("character_sheet", count=5)
for img_path, meta_path in recent:
    image = load_image(img_path)
    if meta_path:
        metadata = load_metadata(meta_path)
```

## Usage Examples

### Service Saving Output

```python
from utils.file_utils import save_image, create_generation_metadata, get_output_directory_for_type

class CharacterForgeService:
    def generate(self, prompt, backend, ...):
        # ... generation code ...

        # Create metadata
        metadata = create_generation_metadata(
            prompt=prompt,
            backend=backend,
            aspect_ratio="3:4",
            temperature=0.35,
            generation_time=elapsed_time,
            character_name=character_name,
            stage="front_portrait"
        )

        # Save image and metadata
        output_dir = get_output_directory_for_type("character_sheet")
        img_path, meta_path = save_image(
            image=generated_image,
            directory=output_dir,
            base_name=character_name,
            metadata=metadata
        )

        return img_path
```

### Page Displaying Recent Generations

```python
import streamlit as st
from utils.file_utils import list_recent_generations, load_image

st.subheader("Recent Character Sheets")

recent = list_recent_generations("character_sheet", count=4)

cols = st.columns(4)
for idx, (img_path, meta_path) in enumerate(recent):
    with cols[idx]:
        image = load_image(img_path)
        st.image(image, caption=img_path.stem, use_container_width=True)
```

### Loading Previous Generation

```python
from pathlib import Path

import streamlit as st

from utils.file_utils import load_image, load_metadata

# User selects a previous generation
image_path = st.selectbox("Load previous", [...])

if image_path:
    # Load image
    image = load_image(Path(image_path))
    st.image(image)

    # Load metadata (if exists)
    meta_path = Path(image_path).with_suffix(".json")
    if meta_path.exists():
        metadata = load_metadata(meta_path)
        st.json(metadata)

        # Restore settings
        st.session_state.prompt = metadata["prompt"]
        st.session_state.backend = metadata["backend"]
```

## Error Handling

### File Operations

All functions raise appropriate exceptions:
- `FileNotFoundError`: File doesn't exist
- `IOError`: Read/write error
- `json.JSONDecodeError`: Invalid JSON
- `ValueError`: Invalid parameters

Errors are logged before raising.
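The log-then-raise behavior described above might look like the following for a metadata loader. This is a sketch under assumptions: `load_metadata_sketch` is a hypothetical stand-in, and the standard `logging` module is used here in place of the project's `utils.logging_utils.get_logger`, whose exact signature is not shown in this document.

```python
import json
import logging
from pathlib import Path

# Stand-in logger; the real module is documented as using get_logger instead.
logger = logging.getLogger("file_utils_sketch")


def load_metadata_sketch(file_path: Path) -> dict:
    """Load JSON metadata, logging any failure before re-raising it."""
    try:
        with open(file_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        logger.error("Metadata file not found: %s", file_path)
        raise  # re-raise unchanged so callers can handle it
    except json.JSONDecodeError as exc:
        logger.error("Invalid JSON in %s: %s", file_path, exc)
        raise
```

Re-raising with a bare `raise` keeps the original exception type and traceback, so a page can still catch `FileNotFoundError` or `json.JSONDecodeError` specifically while the log retains a record of the failure.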
### Automatic Recovery

- Directories created automatically if they don't exist
- Filename conflicts resolved with counter suffix
- Missing metadata handled gracefully (returns None)

## Known Limitations

- Filename length limit: 100 characters (base name)
- No image format conversion (saves as PNG only)
- No image compression options
- No batch operations
- No cloud storage integration
- Hash only detects exact pixel matches (not perceptual similarity)

## Future Improvements

- Support multiple image formats (JPEG, WEBP)
- Add image compression/quality options
- Add batch save/load operations
- Add cloud storage backends (S3, GCS)
- Add perceptual image hashing (pHash)
- Add image metadata embedding (EXIF)
- Add file cleanup/archiving utilities
- Add generation statistics tracking

## Testing

- Test sanitize_filename() with various unsafe characters
- Test generate_timestamp_filename() format
- Test get_unique_filename() collision handling
- Test save_image() creates files correctly
- Test load_image() with valid/invalid files
- Test save/load_metadata() round-trip
- Test create_generation_metadata() includes all fields
- Test compute_image_hash() consistency
- Test list_recent_generations() sorting

## Related Files

- `config/settings.py` - Directory path constants
- `utils/logging_utils.py` - Logging functions
- All services - Save generation results
- All pages - Load and display files
- `models/generation_result.py` - Uses metadata creation

## Security Considerations

- Filename sanitization prevents directory traversal
- No arbitrary file paths allowed (always in Settings directories)
- JSON encoding ensures no code injection
- File permissions inherited from parent directory

## Performance Considerations

- Image hashing loads full image into memory
- Large images may be slow to hash
- list_recent_generations() sorts by modification time (fast)
- JSON serialization is fast for typical metadata size

## Change History

- 2025-10-23: Initial creation for Streamlit migration
  - Centralized all file I/O operations
  - Added comprehensive filename handling
  - Added metadata standardization
  - Added directory management
  - Added recent generations listing
  - Integrated with Settings and logging