RPGPortrait Design Document

Project Overview

RPGPortrait is a Gradio-based web application that helps users build detailed prompts and generate character portraits, using Gemini models in the cloud or local backends (Ollama, ComfyUI).

UI Design

The UI is built using gradio.Blocks to allow for a structured, multi-column layout.

Layout Structure

  • Global Control: Save/Load character (JSON), Randomize features, Character Naming.
  • Input Sections: Organized into columns/rows of dropdown menus.
    • Identity: Race, Class (with 🎲 Auto-Name), Gender, Age.
    • Appearance: Hair Style/Color, Eye Color, Build, Skin Tone, Distinguishing Features.
    • Equipment: Armor/Clothing, Weapons, Accessories (up to 2), Materials.
    • Environment: Background, Lighting, Atmosphere.
    • Artistic Style: Art Style, Mood, Special Effects (VFX).
  • Generation Suite:
    • Large technical prompt textbox (read-only).
    • 🧠 Refine with Gemini: Calls Gemini 3 Pro to create a vivid prompt.
    • 🖼️ Generate Image: Routes to selected backend (Gemini or ComfyUI).
  • Backend Selector:
    • Radio buttons to toggle between Gemini (Cloud) and ComfyUI (Local).
  • Output Area:
    • Image display for the portrait.
    • 📥 Download button (PNG) for the generated image.
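The routing behind the Generate Image button can be pictured as a small dispatch table keyed by the backend selector. This is a minimal sketch rather than the app's actual code: the function names (generate_with_gemini, generate_with_comfyui, route_generation) and the exact label strings are assumptions, and both backends are stubbed so the example runs offline.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the real backend wrappers. They only return
# a tagged string so the routing logic can be exercised without a network.
def generate_with_gemini(prompt: str) -> str:
    return f"gemini:{prompt}"

def generate_with_comfyui(prompt: str) -> str:
    return f"comfyui:{prompt}"

# Keys mirror the radio labels described above (assumed exact strings).
BACKENDS: Dict[str, Callable[[str], str]] = {
    "Gemini (Cloud)": generate_with_gemini,
    "ComfyUI (Local)": generate_with_comfyui,
}

def route_generation(backend_label: str, prompt: str) -> str:
    """Send the prompt to whichever backend the selector names."""
    try:
        handler = BACKENDS[backend_label]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend_label!r}") from None
    return handler(prompt)
```

Keeping the mapping in one dictionary means a further backend would only need a new entry, not another branch in the button handler.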

Prompt Assembly Logic

The app uses a 3-stage prompt pipeline:

  1. Technical Segmenting: Combines the dropdown values and the extra info text into a base technical prompt, using the template stored in features.yaml.
  2. AI Refinement: The technical prompt is sent to an LLM, which rewrites it as a more vivid, artistic description. The system instructions for this step are stored in prompts.yaml.
    • Cloud: Uses gemini-3-pro-preview.
    • Local: Uses Ollama (e.g., llama3).
  3. Image Synthesis:
    • Cloud: Sent to imagen-4.0-generate-001.
    • Local: Sent to a ComfyUI endpoint via POST /prompt, utilizing a custom workflow (comfy_rpg_char_gen.json).
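Stage 1 of the pipeline above can be sketched as plain template substitution. The template text, the placeholder names, and the build_technical_prompt helper are all hypothetical here; the real template is stored in features.yaml and may be structured differently.

```python
from typing import Dict

# Hypothetical template. The real one is stored in features.yaml.
TEMPLATE = (
    "Portrait of a {age} {gender} {race} {char_class}, "
    "{hair_color} {hair_style} hair, wearing {armor}. "
    "Background: {background}. Style: {art_style}."
)

def build_technical_prompt(selections: Dict[str, str], extra_info: str = "") -> str:
    """Fill the template with dropdown values and append free-form notes."""
    # str.format raises KeyError if a placeholder has no matching selection.
    prompt = TEMPLATE.format(**selections)
    extra = extra_info.strip()
    if extra:
        prompt = f"{prompt} Additional details: {extra}"
    return prompt
```

The resulting string is what stage 2 would hand to the refinement model; a missing dropdown value surfaces as a KeyError instead of silently producing a prompt with holes.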

Naming & Metadata

  • Local Naming: Uses a procedural generator based on the fictional-names library for instant, thematic results.
  • Metadata Embedding: Generation prompts and character names are embedded directly into the PNG tEXt chunks (Comment and CharacterName keys).
  • Filenames: Character names are sanitized and used for both JSON and PNG output.
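To illustrate what the metadata embedding involves at the file level, here is a standard-library sketch that writes tEXt chunks into an existing PNG and reads them back, plus a simple filename sanitizer. It is not the app's implementation (which may well rely on an imaging library instead); the helper names and the sanitizing rule are assumptions.

```python
import re
import struct
import zlib
from typing import Dict, Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def iter_chunks(png_bytes: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (offset, type, data) for every chunk after the signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        yield pos, chunk_type, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC

def add_text_chunks(png_bytes: bytes, fields: Dict[str, str]) -> bytes:
    """Insert one tEXt chunk per key/value pair just before IEND."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # tEXt is Latin-1 only: keyword, a NUL separator, then the text.
    new_chunks = b"".join(
        make_chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
        for key, value in fields.items()
    )
    for pos, chunk_type, _ in iter_chunks(png_bytes):
        if chunk_type == b"IEND":
            return png_bytes[:pos] + new_chunks + png_bytes[pos:]
    raise ValueError("PNG has no IEND chunk")

def read_text_chunks(png_bytes: bytes) -> Dict[str, str]:
    """Collect keyword -> text from all tEXt chunks."""
    result = {}
    for _, chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"tEXt":
            key, _sep, value = data.partition(b"\x00")
            result[key.decode("latin-1")] = value.decode("latin-1")
    return result

def sanitize_filename(name: str) -> str:
    """Assumed rule: keep ASCII letters, digits, '_' and '-'; collapse the rest."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip()).strip("_")
    return cleaned or "character"
```

Because tEXt is limited to Latin-1, a prompt or name containing other characters would fail to encode in this sketch; the PNG format provides the iTXt chunk for UTF-8 text.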

Data & Persistence

  • YAML Configuration: features.yaml stores all possible dropdown values, their descriptive labels, and the final prompt template.
  • JSON Serialization: "Save/Load" functionality allows users to export and import their full character state as a JSON file.
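The Save/Load round trip can be sketched with the standard json module. The function names and the flat string-to-string shape of the character state are assumptions made for illustration; the real state may carry more structure.

```python
import json
from typing import Dict

def character_to_json(state: Dict[str, str]) -> str:
    """Serialize the full character state to pretty-printed JSON text."""
    return json.dumps(state, indent=2, ensure_ascii=False, sort_keys=True)

def character_from_json(text: str) -> Dict[str, str]:
    """Parse saved JSON text, rejecting anything that is not an object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("character file must contain a JSON object")
    return data
```

Writing the returned text to disk and reading it back (for example with pathlib.Path.write_text and read_text) completes the export and import path.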

Technical Stack

  • Backend: Python 3
  • SDK: google-genai
  • UI Framework: Gradio
  • AI Models:
    • Text (Cloud): gemini-3-pro-preview
    • Text (Local): Ollama
    • Image (Cloud): imagen-4.0-generate-001
    • Image (Local): ComfyUI (local server)
  • Configuration:
    • .env for API keys and connection hosts/ports (ComfyUI, Ollama).
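As a rough sketch of how such settings might be read once the .env values are present in the process environment: the variable names below (GEMINI_API_KEY, COMFY_HOST, COMFY_PORT, OLLAMA_HOST, OLLAMA_PORT) and the Settings class are hypothetical, not taken from the project. The port defaults are the ones ComfyUI (8188) and Ollama (11434) use out of the box.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    gemini_api_key: Optional[str]
    comfy_host: str
    comfy_port: int
    ollama_host: str
    ollama_port: int

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read connection settings, falling back to local defaults."""
    # All variable names here are illustrative assumptions.
    return Settings(
        gemini_api_key=env.get("GEMINI_API_KEY"),
        comfy_host=env.get("COMFY_HOST", "127.0.0.1"),
        comfy_port=int(env.get("COMFY_PORT", "8188")),
        ollama_host=env.get("OLLAMA_HOST", "127.0.0.1"),
        ollama_port=int(env.get("OLLAMA_PORT", "11434")),
    )
```

Passing the environment in as a mapping keeps the function easy to test: a plain dict can stand in for os.environ.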

Maintenance & Development Lessons

1. Server Lifecycle

  • Mandatory Restarts: Any modification to the modules/ folder or configuration files (features.yaml, prompts.yaml) requires a manual restart of the Gradio server.
  • Process Management (CRITICAL): Always ensure the previously running process is fully terminated before starting a new one.
  • Safe Command: Use this PowerShell command to kill only the process listening on the target port (7860):
    Stop-Process -Id (Get-NetTCPConnection -LocalPort 7860 -ErrorAction SilentlyContinue).OwningProcess -Force -ErrorAction SilentlyContinue
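Before restarting, it can help to confirm that the old process has actually released the port. The following is a small cross-platform sketch of such a check; the port_in_use helper is hypothetical and is not part of the project.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(7860) after the Stop-Process command should return False once the previous Gradio server is gone.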
    

2. Project Architecture (Modular)

  • app.py: Entry point that launches the Gradio demo.
  • modules/config.py: Global constants and environment variables.
  • modules/integrations.py: Wrappers for AI backends (Gemini, Ollama, ComfyUI).
  • modules/core_logic.py: Character state management and prompt assembly.
  • modules/ui_layout.py: The full Gradio UI definition (build_ui).
  • modules/name_generator.py: Local procedural name generator using the fictional-names library.
  • comfy/: Dedicated folder for ComfyUI-specific JSON workflows and utility scripts.
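To make the ComfyUI side of this layout more concrete, here is a sketch of how a wrapper might inject the refined prompt into a workflow loaded from comfy/ and prepare the POST /prompt request. The helper names, the node id "6", and the assumption that the workflow is an API-format JSON graph whose text node exposes inputs.text are all illustrative; the real comfy_rpg_char_gen.json may differ. The request is only built, not sent.

```python
import copy
import json
import urllib.request
from typing import Any, Dict

def inject_prompt(workflow: Dict[str, Any], node_id: str, text: str) -> Dict[str, Any]:
    """Return a copy of the workflow with the prompt text set on one node."""
    updated = copy.deepcopy(workflow)  # leave the loaded template untouched
    updated[node_id]["inputs"]["text"] = text
    return updated

def build_prompt_request(
    workflow: Dict[str, Any], host: str = "127.0.0.1", port: int = 8188
) -> urllib.request.Request:
    """Build (but do not send) the POST /prompt request for ComfyUI."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would be a matter of passing it to urllib.request.urlopen; separating construction from sending keeps the payload logic testable without a running ComfyUI server.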

3. Deployment (Hugging Face Spaces)

  • Containerization: The app is containerized using the provided Dockerfile and .dockerignore.
  • User Permissions: The Dockerfile uses useradd -m -u 1000 user to comply with Hugging Face's security requirements for non-root users.
  • Port Mapping: Hugging Face Spaces expects the app on port 7860. The GRADIO_SERVER_NAME="0.0.0.0" and GRADIO_SERVER_PORT=7860 environment variables ensure the app is bound correctly for external routing.
  • Local Testing: Run docker-compose up --build to verify the deployment state locally.