file_utils.py

Purpose

File I/O operations for Nano Banana Streamlit. Centralized handling of image saving/loading, metadata management, filename generation, and directory operations.

Responsibilities

  • Generate safe, unique filenames with timestamps
  • Save/load images to/from disk
  • Save/load metadata as JSON
  • Create standardized metadata dictionaries
  • Compute image hashes for change detection
  • Manage output directory structure
  • List recent generations

Dependencies

Imports

  • json - JSON serialization
  • hashlib - Image hashing (SHA-256)
  • re - Filename sanitization (regex)
  • datetime - Timestamps
  • pathlib.Path - Path operations
  • PIL.Image - Image handling
  • config.settings.Settings - Directory paths
  • utils.logging_utils.get_logger - Logging

Used By

  • All services - Save generation results
  • All pages - Load/display images
  • Backend clients - Save API responses
  • models/generation_result.py - Metadata creation

Public Interface

Filename Utilities

sanitize_filename(name: str) -> str

Remove unsafe characters from filename.

Rules:

  • Targets the characters < > : " / \ | ? *
  • Replaces each of them with an underscore
  • Strips leading/trailing spaces and dots
  • Limits to 100 characters
  • Falls back to "generated" if empty

Example:

safe = sanitize_filename("My Character: v2.0")
# Returns: "My Character_ v2.0" (only the colon is in the unsafe set listed above)
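
The rules can be read as a small pipeline applied in order. The following is a minimal sketch that follows only the bulleted rules, so spaces and interior dots pass through unchanged; the actual function may replace more characters. The name sanitize_filename_sketch and the constants are illustrative, not taken from file_utils.py.

```python
import re

# Characters the rules above list as unsafe in filenames.
UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*]')
MAX_LENGTH = 100          # documented length limit
FALLBACK = "generated"    # documented fallback for empty results


def sanitize_filename_sketch(name: str) -> str:
    """Apply the documented rules in order (hypothetical re-implementation)."""
    cleaned = UNSAFE_CHARS.sub("_", name)  # replace each unsafe character
    cleaned = cleaned.strip(" .")          # drop leading/trailing spaces and dots
    cleaned = cleaned[:MAX_LENGTH]         # cap the length
    return cleaned or FALLBACK             # never return an empty name
```

Stripping dots before the emptiness check means an input such as ".." collapses to the fallback name instead of surviving as a relative path component.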

generate_timestamp_filename(base_name: str, extension: str = "png") -> str

Generate filename with timestamp.

Format: {base_name}_{YYYYMMDD_HHMMSS}.{extension}

Example:

filename = generate_timestamp_filename("character", "png")
# Returns: "character_20251023_143052.png"
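
The format string maps directly onto datetime.strftime. A minimal sketch, assuming local time is used; the now parameter is hypothetical and exists only so the output can be reproduced:

```python
from datetime import datetime
from typing import Optional


def timestamp_filename_sketch(base_name: str, extension: str = "png",
                              now: Optional[datetime] = None) -> str:
    """Build {base_name}_{YYYYMMDD_HHMMSS}.{extension}."""
    # `now` is a hypothetical parameter, not part of the documented signature.
    moment = now or datetime.now()
    stamp = moment.strftime("%Y%m%d_%H%M%S")
    return f"{base_name}_{stamp}.{extension}"


print(timestamp_filename_sketch("character", now=datetime(2025, 10, 23, 14, 30, 52)))
# prints: character_20251023_143052.png
```

The timestamp has one-second resolution, which is why a separate uniqueness check is still needed for rapid successive saves.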

get_unique_filename(directory: Path, base_name: str, extension: str = "png") -> Path

Generate unique filename that doesn't exist in directory.

If file exists, appends counter: _1, _2, etc.

Example:

path = get_unique_filename(Settings.CHARACTER_SHEETS_DIR, "hero", "png")
# Returns: Path("outputs/character_sheets/hero_20251023_143052.png")
# If exists: Path("outputs/character_sheets/hero_20251023_143052_1.png")
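
The counter behaviour can be sketched as a loop that probes the directory until a free name is found. This is an illustration under the assumption that the timestamped stem has already been built; unique_path_sketch is a hypothetical name.

```python
from pathlib import Path


def unique_path_sketch(directory: Path, stem: str, extension: str = "png") -> Path:
    """Return directory/stem.extension, adding _1, _2, ... until the path is free."""
    candidate = directory / f"{stem}.{extension}"
    counter = 1
    while candidate.exists():
        # Name is taken: try the next numbered variant.
        candidate = directory / f"{stem}_{counter}.{extension}"
        counter += 1
    return candidate
```

A check followed by a later write is not atomic, so two processes saving at the same moment could still pick the same name; for a single-user Streamlit app this is usually acceptable.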

Image Operations

save_image(image: Image, directory: Path, base_name: str, metadata: dict = None) -> Tuple[Path, Path]

Save image and optional metadata.

Parameters:

  • image: PIL Image to save
  • directory: Target directory (created if it doesn't exist)
  • base_name: Base filename (a timestamp is appended)
  • metadata: Optional metadata dict (saved as JSON)

Returns: (image_path, metadata_path) tuple

Example:

metadata = {"prompt": "sunset", "backend": "Gemini"}
img_path, meta_path = save_image(
    image=generated_image,
    directory=Settings.CHARACTER_SHEETS_DIR,
    base_name="hero",
    metadata=metadata
)
# Saves:
#   outputs/character_sheets/hero_20251023_143052.png
#   outputs/character_sheets/hero_20251023_143052.json
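
The image and its metadata share a stem, so the JSON sidecar can be derived from the image path. A minimal sketch, assuming the image object exposes save(path) as PIL.Image.Image does; sanitization and collision handling are left out, and returning None for the metadata path when no metadata is given is an assumption rather than documented behaviour.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Optional, Tuple


def save_image_sketch(image: Any, directory: Path, base_name: str,
                      metadata: Optional[dict] = None) -> Tuple[Path, Optional[Path]]:
    """Write the image and, if given, a JSON sidecar that shares its stem."""
    directory.mkdir(parents=True, exist_ok=True)   # create the target directory
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    image_path = directory / f"{base_name}_{stamp}.png"
    image.save(image_path)                         # PIL images accept a path here

    metadata_path = None                           # assumption: None when no metadata
    if metadata is not None:
        metadata_path = image_path.with_suffix(".json")
        metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return image_path, metadata_path
```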

load_image(file_path: Path) -> Image

Load image from disk.

Raises:

  • FileNotFoundError: If file doesn't exist
  • IOError: If file can't be read as image

Example:

image = load_image(Path("outputs/character_sheets/hero_20251023_143052.png"))

Metadata Operations

save_metadata(file_path: Path, metadata: dict)

Save metadata dictionary as JSON.

Format: Indented JSON with UTF-8 encoding

Raises: IOError if write fails

load_metadata(file_path: Path) -> dict

Load metadata from JSON file.

Raises:

  • FileNotFoundError: If file doesn't exist
  • json.JSONDecodeError: If invalid JSON

Example:

meta = load_metadata(Path("outputs/character_sheets/hero_20251023_143052.json"))
prompt = meta["prompt"]
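
The save/load pair amounts to a JSON round trip. A minimal sketch using only the standard library; the function names and the two-space indent are assumptions, while the UTF-8 encoding and the exception types match the description above.

```python
import json
from pathlib import Path


def save_metadata_sketch(file_path: Path, metadata: dict) -> None:
    """Write metadata as indented UTF-8 JSON."""
    with open(file_path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII prompt text readable in the file.
        json.dump(metadata, handle, indent=2, ensure_ascii=False)


def load_metadata_sketch(file_path: Path) -> dict:
    """Read metadata back from disk.

    open() raises FileNotFoundError for a missing file and
    json.load() raises json.JSONDecodeError for malformed content.
    """
    with open(file_path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```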

create_generation_metadata(...) -> dict

Create standardized metadata dictionary.

Parameters:

  • prompt: Generation prompt (required)
  • backend: Backend used (required)
  • aspect_ratio: Aspect ratio (required)
  • temperature: Temperature value (required)
  • input_images: List of input image paths (optional)
  • generation_time: Time taken in seconds (optional)
  • **kwargs: Additional custom fields

Returns: Metadata dictionary with standard fields

Standard Fields:

  • timestamp: ISO format timestamp
  • prompt: Generation prompt
  • backend: Backend name
  • aspect_ratio: Aspect ratio string
  • temperature: Temperature value
  • version: Application version ("2.0.0-streamlit")
  • input_images: List of input paths (if provided)
  • generation_time_seconds: Time taken (if provided)

Example:

metadata = create_generation_metadata(
    prompt="sunset over mountains",
    backend="Gemini API (Cloud)",
    aspect_ratio="16:9",
    temperature=0.4,
    generation_time=3.5,
    character_name="Hero",  # Custom field
    stage="front_portrait"   # Custom field
)
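
The field list above suggests a dictionary built in three steps: required fields, optional fields when present, then custom fields. The sketch below illustrates that shape; create_metadata_sketch is a hypothetical name, and letting **kwargs overwrite standard keys is a choice made here, not confirmed behaviour.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

APP_VERSION = "2.0.0-streamlit"  # version string quoted in the field list


def create_metadata_sketch(prompt: str, backend: str, aspect_ratio: str,
                           temperature: float,
                           input_images: Optional[List[str]] = None,
                           generation_time: Optional[float] = None,
                           **kwargs: Any) -> Dict[str, Any]:
    """Build a metadata dict with the documented standard fields."""
    metadata: Dict[str, Any] = {
        "timestamp": datetime.now().isoformat(),
        "prompt": prompt,
        "backend": backend,
        "aspect_ratio": aspect_ratio,
        "temperature": temperature,
        "version": APP_VERSION,
    }
    # Optional fields appear only when a value was supplied.
    if input_images is not None:
        metadata["input_images"] = list(input_images)
    if generation_time is not None:
        metadata["generation_time_seconds"] = generation_time
    # Custom fields go last; in this sketch they can overwrite standard keys.
    metadata.update(kwargs)
    return metadata
```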

Image Hashing

compute_image_hash(image: Image) -> str

Compute SHA-256 hash of image data.

Useful for detecting if input images have changed.

Returns: Hex string (64 characters)

Example:

hash1 = compute_image_hash(image1)
hash2 = compute_image_hash(image2)
if hash1 == hash2:
    print("Images are identical")
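
One way to hash pixel data is to feed the raw bytes to hashlib.sha256. The sketch below takes plain bytes so it runs without Pillow; with a PIL image the inputs would be image.tobytes(), image.mode and image.size. Mixing the mode and size into the digest is a choice made for this illustration, not something the text above states.

```python
import hashlib
from typing import Tuple


def hash_pixels_sketch(pixel_bytes: bytes, mode: str, size: Tuple[int, int]) -> str:
    """Return a 64-character SHA-256 hex digest of raw pixel data."""
    hasher = hashlib.sha256()
    # Include mode and dimensions so the same byte stream laid out as a
    # different shape does not produce the same digest.
    hasher.update(mode.encode("ascii"))
    hasher.update(f"{size[0]}x{size[1]}".encode("ascii"))
    hasher.update(pixel_bytes)
    return hasher.hexdigest()
```

Because the digest covers decoded pixels, two files with identical pixels but different compression settings hash the same, while a single changed pixel produces a different digest.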

Directory Operations

ensure_output_directories()

Ensure all output directories exist.

Creates all directories defined in Settings if they don't exist. Called on startup.

get_output_directory_for_type(generation_type: str) -> Path

Get appropriate output directory for generation type.

Types:

  • "character_sheet" -> Settings.CHARACTER_SHEETS_DIR
  • "wardrobe" -> Settings.WARDROBE_CHANGES_DIR
  • "composition" -> Settings.COMPOSITIONS_DIR
  • "standard" -> Settings.STANDARD_DIR

Raises: ValueError if unknown type

Example:

output_dir = get_output_directory_for_type("character_sheet")
# Returns: Path("outputs/character_sheets")
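
A type-to-directory lookup of this kind is commonly written as a dictionary with an explicit error for unknown keys. In the sketch below the paths are placeholders: only outputs/character_sheets appears in this document, and the other three are assumed for illustration in place of the Settings constants.

```python
from pathlib import Path

# Placeholder paths; the real values come from config.settings.Settings.
# Only "outputs/character_sheets" is taken from this document.
OUTPUT_DIRS = {
    "character_sheet": Path("outputs/character_sheets"),
    "wardrobe": Path("outputs/wardrobe_changes"),
    "composition": Path("outputs/compositions"),
    "standard": Path("outputs/standard"),
}


def output_directory_sketch(generation_type: str) -> Path:
    """Map a generation type to its directory, or raise ValueError."""
    try:
        return OUTPUT_DIRS[generation_type]
    except KeyError:
        valid = ", ".join(sorted(OUTPUT_DIRS))
        raise ValueError(
            f"Unknown generation type {generation_type!r}; expected one of: {valid}"
        ) from None
```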

list_recent_generations(generation_type: str, count: int = 10) -> list

List recent generation files in a directory.

Returns: List of (image_path, metadata_path) tuples, newest first

Metadata path is None if JSON file doesn't exist.

Example:

recent = list_recent_generations("character_sheet", count=5)
for img_path, meta_path in recent:
    image = load_image(img_path)
    if meta_path:
        metadata = load_metadata(meta_path)
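
The listing can be sketched as a glob, a sort on modification time, and a sidecar lookup. This version takes the directory directly instead of a generation type and assumes PNG files only; list_recent_sketch is a hypothetical name.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def list_recent_sketch(directory: Path,
                       count: int = 10) -> List[Tuple[Path, Optional[Path]]]:
    """Return up to `count` (image_path, metadata_path) pairs, newest first."""
    images = sorted(directory.glob("*.png"),
                    key=lambda path: path.stat().st_mtime,
                    reverse=True)
    results = []
    for image_path in images[:count]:
        meta_path = image_path.with_suffix(".json")
        # The metadata slot is None when no JSON sidecar exists.
        results.append((image_path, meta_path if meta_path.exists() else None))
    return results
```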

Usage Examples

Service Saving Output

from utils.file_utils import save_image, create_generation_metadata, get_output_directory_for_type

class CharacterForgeService:
    def generate(self, prompt, backend, character_name, **kwargs):
        # ... generation code (produces generated_image and elapsed_time) ...

        # Create metadata
        metadata = create_generation_metadata(
            prompt=prompt,
            backend=backend,
            aspect_ratio="3:4",
            temperature=0.35,
            generation_time=elapsed_time,
            character_name=character_name,
            stage="front_portrait"
        )

        # Save image and metadata
        output_dir = get_output_directory_for_type("character_sheet")
        img_path, meta_path = save_image(
            image=generated_image,
            directory=output_dir,
            base_name=character_name,
            metadata=metadata
        )

        return img_path

Page Displaying Recent Generations

import streamlit as st
from utils.file_utils import list_recent_generations, load_image

st.subheader("Recent Character Sheets")

recent = list_recent_generations("character_sheet", count=4)

cols = st.columns(4)
for idx, (img_path, meta_path) in enumerate(recent):
    with cols[idx]:
        image = load_image(img_path)
        st.image(image, caption=img_path.stem, use_container_width=True)

Loading Previous Generation

from pathlib import Path

import streamlit as st
from utils.file_utils import load_image, load_metadata

# User selects a previous generation
image_path = st.selectbox("Load previous", [...])

if image_path:
    # Load image
    image = load_image(Path(image_path))
    st.image(image)

    # Load metadata (if exists)
    meta_path = Path(image_path).with_suffix(".json")
    if meta_path.exists():
        metadata = load_metadata(meta_path)
        st.json(metadata)

        # Restore settings
        st.session_state.prompt = metadata["prompt"]
        st.session_state.backend = metadata["backend"]

Error Handling

File Operations

Functions signal failures with standard Python exceptions:

  • FileNotFoundError: File doesn't exist
  • IOError: Read/write error
  • json.JSONDecodeError: Invalid JSON
  • ValueError: Invalid parameters

Errors are logged before raising.
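
The log-then-raise pattern can be sketched as follows. The standard logging module stands in for utils.logging_utils.get_logger, and read_bytes_logged is a hypothetical helper, not a function from file_utils.py.

```python
import logging
from pathlib import Path

# Stand-in for the project's get_logger helper.
logger = logging.getLogger("file_utils_sketch")


def read_bytes_logged(file_path: Path) -> bytes:
    """Read a file, logging any failure before the exception propagates."""
    if not file_path.exists():
        logger.error("File not found: %s", file_path)
        raise FileNotFoundError(f"File not found: {file_path}")
    try:
        return file_path.read_bytes()
    except OSError as error:  # IOError is an alias of OSError in Python 3
        logger.error("Failed to read %s: %s", file_path, error)
        raise  # re-raise the original exception with its traceback
```

Logging at the point of failure records the path involved, while re-raising leaves the decision about recovery to the caller.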

Automatic Recovery

  • Directories created automatically if they don't exist
  • Filename conflicts resolved with counter suffix
  • Missing metadata handled gracefully (list_recent_generations() returns None for the metadata path)

Known Limitations

  • Filename length limit: 100 characters (base name)
  • No image format conversion (saves as PNG only)
  • No image compression options
  • No batch operations
  • No cloud storage integration
  • Hash only detects exact pixel matches (not perceptual similarity)

Future Improvements

  • Support multiple image formats (JPEG, WEBP)
  • Add image compression/quality options
  • Add batch save/load operations
  • Add cloud storage backends (S3, GCS)
  • Add perceptual image hashing (pHash)
  • Add image metadata embedding (EXIF)
  • Add file cleanup/archiving utilities
  • Add generation statistics tracking

Testing

  • Test sanitize_filename() with various unsafe characters
  • Test generate_timestamp_filename() format
  • Test get_unique_filename() collision handling
  • Test save_image() creates files correctly
  • Test load_image() with valid/invalid files
  • Test save/load_metadata() round-trip
  • Test create_generation_metadata() includes all fields
  • Test compute_image_hash() consistency
  • Test list_recent_generations() sorting

Related Files

  • config/settings.py - Directory path constants
  • utils/logging_utils.py - Logging functions
  • All services - Save generation results
  • All pages - Load and display files
  • models/generation_result.py - Uses metadata creation

Security Considerations

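  • Filename sanitization replaces path separators (/ and \) and strips leading dots, so a base name cannot point outside the target directory
  • Output paths are built from Settings directories; the functions accept any Path, so callers should not pass user-supplied directories
  • Metadata is written and read as JSON, which is parsed as data and never executed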
  • File permissions inherited from parent directory
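
Where a path does originate from user input, a containment check is one way to confirm it stays inside an output directory. The helper below is a hypothetical sketch built on pathlib only; nothing in this document indicates that file_utils.py performs such a check.

```python
from pathlib import Path


def is_inside(base_dir: Path, candidate: Path) -> bool:
    """Return True if candidate resolves to base_dir or a location beneath it."""
    base = base_dir.resolve()
    target = candidate.resolve()  # collapses ".." segments and follows symlinks
    return target == base or base in target.parents
```

A caller could reject any path for which is_inside(Settings.CHARACTER_SHEETS_DIR, path) is False before passing it to load_image.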

Performance Considerations

  • Image hashing loads full image into memory
  • Large images may be slow to hash
  • list_recent_generations() sorts by modification time (fast)
  • JSON serialization is fast for typical metadata size

Change History

  • 2025-10-23: Initial creation for Streamlit migration
    • Centralized all file I/O operations
    • Added comprehensive filename handling
    • Added metadata standardization
    • Added directory management
    • Added recent generations listing
    • Integrated with Settings and logging