# CompI Phase 1: Text-to-Image Generation Usage Guide

This guide covers the Phase 1 implementation of CompI's text-to-image generation capabilities using Stable Diffusion.

## Quick Start

### Basic Usage

```bash
# Simple generation with interactive prompt
python run_basic_generation.py

# Generate from the command line
python run_basic_generation.py "A magical forest, digital art, highly detailed"

# Or run directly from src/generators/
python src/generators/compi_phase1_text2image.py "A magical forest"
```

### Advanced Usage

```bash
# Advanced script with more options
python run_advanced_generation.py "cyberpunk city at sunset" --negative "blurry, low quality" --steps 50 --batch 3

# Interactive mode for experimentation
python run_advanced_generation.py --interactive

# Or run directly from src/generators/
python src/generators/compi_phase1_advanced.py --interactive
```
## Available Scripts

### 1. `compi_phase1_text2image.py` - Basic Implementation

**Features:**

- Simple, standalone text-to-image generation
- Automatic GPU/CPU detection
- Prompt from the command line or interactively
- Automatic output saving with descriptive filenames
- Comprehensive logging

**Usage:**

```bash
python src/generators/compi_phase1_text2image.py [prompt]
```

### 2. `compi_phase1_advanced.py` - Enhanced Implementation

**Features:**

- Batch generation (multiple images per run)
- Negative prompts (what to avoid)
- Customizable parameters (steps, guidance, dimensions)
- Interactive mode for experimentation
- Metadata saving (JSON files with generation parameters)
- Support for multiple models (selected with `--model`)

**Command Line Options:**

```bash
python src/generators/compi_phase1_advanced.py [OPTIONS] [PROMPT]

Options:
  --negative, -n TEXT   Negative prompt (what to avoid)
  --steps, -s INTEGER   Number of inference steps (default: 30)
  --guidance, -g FLOAT  Guidance scale (default: 7.5)
  --seed INTEGER        Random seed for reproducibility
  --batch, -b INTEGER   Number of images to generate
  --width, -w INTEGER   Image width (default: 512)
  --height INTEGER      Image height (default: 512)
  --model, -m TEXT      Model to use (default: runwayml/stable-diffusion-v1-5)
  --output, -o TEXT     Output directory (default: outputs)
  --interactive, -i     Interactive mode
```
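The option table maps naturally onto Python's `argparse`. The following is a reconstruction for readers who want to wrap or script the advanced generator, not the script's actual source: option names and documented defaults come from the table, while the `--batch` default of 1 and the `None` defaults for `--negative` and `--seed` are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Reconstructed from the documented options; not the script's real source."""
    p = argparse.ArgumentParser(description="CompI Phase 1 advanced text-to-image generation")
    p.add_argument("prompt", nargs="?", help="Text prompt (omit to be asked interactively)")
    p.add_argument("--negative", "-n", default=None, help="Negative prompt (what to avoid)")
    p.add_argument("--steps", "-s", type=int, default=30, help="Number of inference steps")
    p.add_argument("--guidance", "-g", type=float, default=7.5, help="Guidance scale")
    p.add_argument("--seed", type=int, default=None, help="Random seed for reproducibility")
    p.add_argument("--batch", "-b", type=int, default=1, help="Number of images (default assumed)")
    p.add_argument("--width", "-w", type=int, default=512, help="Image width")
    # No short alias: argparse already uses -h for --help.
    p.add_argument("--height", type=int, default=512, help="Image height")
    p.add_argument("--model", "-m", default="runwayml/stable-diffusion-v1-5", help="Model to use")
    p.add_argument("--output", "-o", default="outputs", help="Output directory")
    p.add_argument("--interactive", "-i", action="store_true", help="Interactive mode")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args(
        ["cyberpunk city at sunset", "-n", "blurry, low quality", "--steps", "50", "-b", "3"]
    )
    print(args.prompt, args.steps, args.batch, args.guidance)
```

The missing short form for `--height` fits this picture: `argparse` registers `-h` for `--help` by default, so a `-h` alias for height would conflict.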
## Example Commands

### Basic Examples

```bash
# Simple landscape
python run_basic_generation.py "serene mountain lake, golden hour, photorealistic"

# Digital art style
python run_basic_generation.py "futuristic robot, neon lights, cyberpunk style, digital art"
```

### Advanced Examples

```bash
# High-quality generation with negative prompts
python run_advanced_generation.py "beautiful portrait of a woman, oil painting style" \
  --negative "blurry, distorted, low quality, bad anatomy" \
  --steps 50 --guidance 8.0

# Batch generation with fixed seed
python run_advanced_generation.py "abstract geometric patterns, colorful" \
  --batch 5 --seed 12345 --steps 40

# Custom dimensions for landscape
python run_advanced_generation.py "panoramic view of alien landscape" \
  --width 768 --height 512 --steps 35

# Interactive experimentation
python run_advanced_generation.py --interactive
```
## Output Structure

Generated images are saved in the `outputs/` directory with descriptive filenames built from prompt keywords, a timestamp, and the seed:

```
outputs/
├── magical_forest_digital_art_20241225_143022_seed42.png
├── magical_forest_digital_art_20241225_143022_seed42_metadata.json
├── cyberpunk_city_sunset_20241225_143156_seed1337.png
└── cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
```
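The example filenames suggest a pattern: a short slug of the prompt, a `YYYYMMDD_HHMMSS` timestamp, and the seed. The sketch below reproduces the two names shown in the listing; the stopword list and the four-word cap are assumptions inferred from those examples, not rules taken from the scripts.

```python
import re
from datetime import datetime

# Assumed filler words dropped from the slug (inferred from the example filenames).
STOPWORDS = {"a", "an", "the", "at", "of", "in", "on", "and", "with"}

def prompt_slug(prompt: str, max_words: int = 4) -> str:
    """Lowercase the prompt, keep alphanumeric words, drop fillers, cap the length."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    kept = [w for w in words if w not in STOPWORDS]
    return "_".join(kept[:max_words])

def output_filename(prompt: str, seed: int, when: datetime) -> str:
    """Build '<slug>_<YYYYMMDD_HHMMSS>_seed<seed>.png'."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{prompt_slug(prompt)}_{stamp}_seed{seed}.png"

if __name__ == "__main__":
    when = datetime(2024, 12, 25, 14, 30, 22)
    print(output_filename("A magical forest, digital art, highly detailed", 42, when))
    # → magical_forest_digital_art_20241225_143022_seed42.png
```

Including the seed in the name means any image can be regenerated later by passing the same prompt and `--seed` value.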
### Metadata Files

In advanced mode, each generated image is saved alongside a JSON metadata file containing:

- The original prompt and negative prompt
- Generation parameters (steps, guidance, seed)
- Image dimensions and the model used
- Timestamp and batch information
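A minimal sketch of writing such a sidecar file, assuming the `<image stem>_metadata.json` naming shown in the output listing. The JSON field names are hypothetical stand-ins for the items listed above, not the scripts' actual schema.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def write_metadata(image_path: Path, params: dict) -> Path:
    """Write '<image stem>_metadata.json' next to the image and return its path."""
    # Field names in `params` are hypothetical; the real scripts' schema may differ.
    meta = dict(params, timestamp=datetime.now().isoformat(timespec="seconds"))
    meta_path = image_path.with_name(image_path.stem + "_metadata.json")
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta_path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "cyberpunk_city_sunset_20241225_143156_seed1337.png"
        path = write_metadata(image, {
            "prompt": "cyberpunk city at sunset",
            "negative_prompt": "blurry, low quality",
            "steps": 50, "guidance_scale": 7.5, "seed": 1337,
            "width": 512, "height": 512,
            "model": "runwayml/stable-diffusion-v1-5",
            "batch_index": 0, "batch_size": 3,
        })
        print(path.name)
        # → cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
```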
## Configuration Tips

### For Best Quality

- Use 30-50 inference steps
- Use a guidance scale of 7.5-12.0
- Include style descriptors ("digital art", "oil painting", "photorealistic")
- Use negative prompts to exclude unwanted elements

### For Speed

- Use 20-25 inference steps
- Stick to 512x512 resolution (the native resolution of Stable Diffusion v1.5)
- A moderate guidance scale (6.0-7.5) works well; guidance mainly controls how closely the image follows the prompt and has little effect on generation time

### For Experimentation

- Use interactive mode
- Try different seeds with the same prompt
- Experiment with guidance scale values
- Use batch generation to explore variations
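To try several seeds with the same prompt without retyping commands, a small helper can build one `run_advanced_generation.py` invocation per seed. It uses only the documented `--seed`, `--steps`, and `--guidance` flags; the helper itself and its default values are illustrative. Commands are built as argument lists so they can be passed to `subprocess.run` without shell-quoting problems.

```python
import shlex
import sys
from typing import List

def seed_sweep(prompt: str, seeds: List[int], steps: int = 30,
               guidance: float = 7.5) -> List[List[str]]:
    """Return one run_advanced_generation.py command per seed (same prompt and settings)."""
    return [
        [sys.executable, "run_advanced_generation.py", prompt,
         "--seed", str(seed), "--steps", str(steps), "--guidance", str(guidance)]
        for seed in seeds
    ]

if __name__ == "__main__":
    for cmd in seed_sweep("serene mountain lake, golden hour", [1, 2, 3], steps=25):
        # Print a copy-pasteable shell line; subprocess.run(cmd, check=True) would execute it.
        print(shlex.join(cmd))
```

Keeping every parameter except the seed fixed isolates the effect of the random starting noise, which makes it easier to judge a prompt on its own merits.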
## Troubleshooting

### Common Issues

1. **CUDA out of memory**: Reduce the batch size or image dimensions
2. **Slow generation**: Check that a CUDA GPU is detected; without one, the scripts fall back to the CPU, which is much slower
3. **Poor quality**: Increase the number of steps, adjust the guidance scale, or refine the prompt
4. **Model download fails**: Check your internet connection and try again

### Performance Optimization

- The scripts automatically enable attention slicing to reduce memory use
- GPU/CPU detection is automatic
- Models are cached locally after the first download, so later runs do not download them again
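Batch size and image dimensions matter so much for memory because Stable Diffusion v1.x denoises a latent that is 8 times smaller than the image in each dimension (with 4 channels), and its highest-resolution self-attention layers compare every latent position with every other. The arithmetic below is an approximation that ignores model weights and other activations; treat the ratios as rough guides, not exact memory figures.

```python
from typing import Tuple

def latent_shape(width: int, height: int, batch: int = 1) -> Tuple[int, int, int, int]:
    """Latent tensor shape for Stable Diffusion v1.x: (batch, 4, height/8, width/8)."""
    if width % 8 or height % 8:
        raise ValueError("width and height must be multiples of 8")
    return (batch, 4, height // 8, width // 8)

def attention_scores(width: int, height: int, batch: int = 1) -> int:
    """Entries in one full-resolution self-attention score matrix, per attention head.

    Every latent position attends to every other, so each image contributes
    (height/8 * width/8) ** 2 entries. Classifier-free guidance (any guidance
    scale above 1) runs a conditional and an unconditional pass together,
    doubling the effective batch.
    """
    tokens = (height // 8) * (width // 8)
    return 2 * batch * tokens * tokens

if __name__ == "__main__":
    base = attention_scores(512, 512)
    for w, h, b in [(512, 512, 1), (768, 512, 1), (512, 512, 3)]:
        ratio = attention_scores(w, h, b) / base
        print(f"{w}x{h}, batch {b}: latent {latent_shape(w, h, b)}, attention x{ratio:.2f}")
```

Going from 512x512 to 768x512 makes each of these attention matrices 2.25 times larger, while a batch of 3 triples them. Attention slicing computes attention in several smaller steps instead of all at once, trading some speed for lower peak memory.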
## Phase 1.B: Style Conditioning & Prompt Engineering

### 3. `compi_phase1b_styled_generation.py` - Style Conditioning

**Features:**

- Interactive style and mood selection from curated lists
- Prompt engineering that combines the base prompt with the selected style and mood
- Multiple variations, each with its own seed
- Detailed logging and organized filenames

**Usage:**

```bash
python run_styled_generation.py [prompt]
# Or directly: python src/generators/compi_phase1b_styled_generation.py [prompt]
```
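The exact prompt template used by the styled generator is not documented here. A minimal sketch of one common approach, assuming the style and mood are appended to the base prompt as comma-separated descriptors and each variation gets its own consecutive seed; both the template wording and the seed scheme are hypothetical.

```python
import random
from typing import List, Optional

def build_prompt(base: str, style: Optional[str] = None, mood: Optional[str] = None) -> str:
    """Append optional style and mood descriptors to the base prompt (hypothetical template)."""
    parts = [base.strip().rstrip(", ")]
    if style:
        parts.append(f"{style} style")
    if mood:
        parts.append(f"{mood} mood")
    return ", ".join(parts)

def variation_seeds(n: int, base_seed: Optional[int] = None) -> List[int]:
    """Return n distinct seeds, one per variation; draw a random base if none is given."""
    if base_seed is None:
        base_seed = random.randint(0, 2**32 - n)
    return [base_seed + i for i in range(n)]

if __name__ == "__main__":
    prompt = build_prompt("mountain landscape", style="cyberpunk", mood="dramatic")
    for seed in variation_seeds(3, base_seed=1000):
        print(seed, prompt)
```

Recording the base seed is enough to reproduce every variation in the set.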
### 4. `compi_phase1b_advanced_styling.py` - Advanced Style Control

**Features:**

- 13 predefined art styles, each with style-specific prompt text and negative prompts
- 9 mood categories with atmospheric conditioning
- Quality presets (draft/standard/high)
- Command line and interactive modes
- Metadata saving for every generated image

**Command Line Options:**

```bash
python run_advanced_styling.py [OPTIONS] [PROMPT]
# Or directly: python src/generators/compi_phase1b_advanced_styling.py [OPTIONS] [PROMPT]

Options:
  --style, -s TEXT       Art style (name or number from the list)
  --mood, -m TEXT        Mood/atmosphere (name or number from the list)
  --variations, -v INT   Number of variations (default: 1)
  --quality, -q CHOICE   Quality preset [draft/standard/high]
  --negative, -n TEXT    Negative prompt
  --interactive, -i      Interactive mode
  --list-styles          List available styles and exit
  --list-moods           List available moods and exit
```

Note that the short flags differ from `compi_phase1_advanced.py`: here `-s` selects a style and `-m` a mood, whereas in the Phase 1 advanced script they set the number of steps and the model.
### Style Conditioning Examples

**Basic Style Selection:**

```bash
# Interactive mode with guided selection
python run_styled_generation.py

# Command line with style selection
python run_advanced_styling.py "mountain landscape" --style cyberpunk --mood dramatic
```

**Advanced Style Control:**

```bash
# High quality with multiple variations
python run_advanced_styling.py "portrait of a wizard" \
  --style "oil painting" --mood "mysterious" \
  --quality high --variations 3 \
  --negative "blurry, distorted, amateur"

# List available options
python run_advanced_styling.py --list-styles
python run_advanced_styling.py --list-moods
```

**Available Styles:**

- digital art, oil painting, watercolor, cyberpunk
- impressionist, concept art, anime, photorealistic
- minimalist, surrealism, pixel art, steampunk, 3d render

**Available Moods:**

- dreamy, dark, peaceful, vibrant, melancholic
- mysterious, whimsical, dramatic, retro
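`--style` and `--mood` accept either a name or a number from these lists. A sketch of how such an argument could be resolved, assuming 1-based numbering in the order shown above and case-insensitive name matching; the real script's ordering and matching rules may differ.

```python
from typing import Sequence

# Lists copied from this guide; numbering is assumed to follow this order, starting at 1.
STYLES = ["digital art", "oil painting", "watercolor", "cyberpunk",
          "impressionist", "concept art", "anime", "photorealistic",
          "minimalist", "surrealism", "pixel art", "steampunk", "3d render"]
MOODS = ["dreamy", "dark", "peaceful", "vibrant", "melancholic",
         "mysterious", "whimsical", "dramatic", "retro"]

def resolve_choice(value: str, options: Sequence[str]) -> str:
    """Map a 1-based number or a case-insensitive name to an entry in `options`."""
    value = value.strip()
    if value.isdigit():
        index = int(value)
        if not 1 <= index <= len(options):
            raise ValueError(f"choose a number between 1 and {len(options)}")
        return options[index - 1]
    for option in options:
        if option.lower() == value.lower():
            return option
    raise ValueError(f"unknown choice {value!r}; expected one of: {', '.join(options)}")

if __name__ == "__main__":
    print(resolve_choice("4", STYLES), "/", resolve_choice("Dramatic", MOODS))
```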
## Phase 1.C: Interactive Web UI

### 5. `compi_phase1c_streamlit_ui.py` - Streamlit Web Interface

**Features:**

- Full web-based interface for text-to-image generation
- Interactive style and mood selection, including custom values
- Advanced settings (steps, guidance, dimensions, negative prompts)
- Real-time image generation and display
- Progress tracking and generation logs
- Automatic saving of images with full metadata

**Usage:**

```bash
python run_ui.py
# Or directly: streamlit run src/ui/compi_phase1c_streamlit_ui.py
```

### 6. `compi_phase1c_gradio_ui.py` - Gradio Web Interface

**Features:**

- Alternative web interface built with the Gradio framework
- Gallery view for multiple image variations
- Collapsible advanced settings
- Real-time generation logs
- Mobile-friendly responsive design

**Usage:**

```bash
python run_gradio_ui.py
# Or directly: python src/ui/compi_phase1c_gradio_ui.py
```
## Phase 1.D: Quality Evaluation Tools

### 7. `compi_phase1d_evaluate_quality.py` - Comprehensive Evaluation Interface

**Features:**

- Systematic image quality assessment using a five-criterion scoring system
- Interactive Streamlit web interface for detailed evaluation
- Objective metrics (perceptual hashes, dimensions, file size)
- Batch evaluation of multiple images
- Evaluation logging with CSV export for trend analysis
- Summary analytics with performance insights and recommendations

**Usage:**

```bash
python run_evaluation.py
# Or directly: streamlit run src/generators/compi_phase1d_evaluate_quality.py
```
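The evaluator lists perceptual hashes among its objective metrics. As an illustration of the idea (not the tool's implementation, which may use a library and a different hash variant), an average hash reduces an image to a small grayscale grid, sets one bit per cell depending on whether it is brighter than the grid's mean, and compares images by Hamming distance. The sketch starts from an already-downscaled 8x8 grayscale grid so it needs no imaging library.

```python
from typing import List

Grid = List[List[int]]  # grayscale values 0-255, already downscaled (e.g. to 8x8)

def average_hash(grid: Grid) -> int:
    """Return an integer whose bits mark cells brighter than the grid mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits; 0 means identical at this coarse resolution."""
    return bin(h1 ^ h2).count("1")

if __name__ == "__main__":
    gradient = [[r * 8 + c for c in range(8)] for r in range(8)]
    brighter = [[p + 100 for p in row] for row in gradient]   # same structure, shifted brightness
    flipped = gradient[::-1]                                    # same pixels, upside down
    print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # → 0
    print(hamming_distance(average_hash(gradient), average_hash(flipped)))   # → 64
```

Because the hash only records brightness relative to the mean, a uniform brightness change leaves it unchanged, while a structural change such as flipping the image alters many bits. That property makes such hashes useful for spotting near-duplicate generations.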
### 8. `compi_phase1d_cli_evaluation.py` - Command-Line Evaluation Tools

**Features:**

- Batch evaluation and analysis from the command line
- Statistical summaries and performance reports
- Filtering by style, mood, and evaluation status
- Automated scoring for large image sets
- Detailed report generation with recommendations

**Command Line Options:**

```bash
python src/generators/compi_phase1d_cli_evaluation.py [OPTIONS]

Options:
  --analyze                Display evaluation summary and statistics
  --report                 Generate a detailed evaluation report
  --batch-score P S M Q A  Batch-score images (1-5 for each criterion)
  --list-all               List all images with evaluation status
  --list-evaluated         List only evaluated images
  --list-unevaluated       List only unevaluated images
  --style TEXT             Filter by style
  --mood TEXT              Filter by mood
  --notes TEXT             Notes for batch evaluation
  --output FILE            Output file for reports
```
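The `--analyze` summary combined with the `--style`/`--mood` filters amounts to grouping the evaluation log and averaging the five scores. A sketch, assuming a CSV log with one row per image, `style` and `mood` columns, and one column per criterion named after the `--batch-score` letters (`P`, `S`, `M`, `Q`, `A`); the actual log file name and column names are not specified in this guide, and the sample rows are made up.

```python
import csv
import io
from statistics import mean
from typing import Dict, Iterable, Optional

CRITERIA = ["P", "S", "M", "Q", "A"]  # --batch-score order; column names are assumed

def summarize(rows: Iterable[dict], style: Optional[str] = None,
              mood: Optional[str] = None) -> Dict[str, float]:
    """Average each criterion (1-5) over rows matching the optional style/mood filters."""
    selected = [r for r in rows
                if (style is None or r["style"] == style)
                and (mood is None or r["mood"] == mood)]
    if not selected:
        return {}
    return {c: round(mean(int(r[c]) for r in selected), 2) for c in CRITERIA}

# Made-up sample rows, for illustration only.
SAMPLE_LOG = """image,style,mood,P,S,M,Q,A
a.png,cyberpunk,dramatic,4,5,3,4,4
b.png,cyberpunk,dark,3,4,4,3,3
c.png,watercolor,dreamy,5,5,5,4,5
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_LOG)))
    print(summarize(rows, style="cyberpunk"))
```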
## Phase 1.E: Personal Style Fine-tuning (LoRA)

### 9. `compi_phase1e_dataset_prep.py` - Dataset Preparation for LoRA Training

**Features:**

- Organize and validate personal style images for training
- Generate training captions that include the style's trigger word
- Resize and format images for LoRA training
- Create train/validation splits with metadata tracking
- Support for multiple image formats, with quality validation

**Usage:**

```bash
python src/generators/compi_phase1e_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
# Or via wrapper: python run_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
```
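Two of the preparation steps, trigger-word captions and the train/validation split, are easy to picture in code. A sketch under assumptions: the caption template, the 10% validation fraction, and the fixed shuffle seed are illustrative choices, not the script's documented defaults, and resizing is omitted because it needs an imaging library.

```python
import random
from typing import List, Tuple

def make_caption(trigger: str, subject: str = "an artwork") -> str:
    """Caption that ties a subject to the style's trigger word (hypothetical template)."""
    # Mirrors the "a cat in my_style" prompt pattern used at generation time.
    return f"{subject} in {trigger}"

def split_dataset(files: List[str], val_fraction: float = 0.1,
                  seed: int = 42) -> Tuple[List[str], List[str]]:
    """Deterministically shuffle and split into (train, validation).

    At least one image is held out so validation monitoring always has data.
    """
    shuffled = sorted(files)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

if __name__ == "__main__":
    images = [f"my_artwork/piece_{i:02d}.png" for i in range(20)]
    train, val = split_dataset(images)
    print(len(train), len(val))          # → 18 2
    print(make_caption("my_art_style"))  # → an artwork in my_art_style
```

A fixed seed keeps the split stable across runs, so validation results from different training runs remain comparable.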
### 10. `compi_phase1e_lora_training.py` - LoRA Fine-tuning Engine

**Features:**

- Full LoRA (Low-Rank Adaptation) fine-tuning pipeline
- Memory-efficient training with gradient checkpointing
- Configurable LoRA parameters (rank, alpha, learning rate)
- Automatic checkpoint saving and validation monitoring
- Built on the Hugging Face PEFT library

**Command Line Options:**

```bash
python run_lora_training.py [OPTIONS] --dataset-dir DATASET_DIR

Options:
  --dataset-dir DIR          Prepared dataset directory (required)
  --epochs INT               Number of training epochs (default: 100)
  --learning-rate FLOAT      Learning rate (default: 1e-4)
  --lora-rank INT            LoRA rank (default: 4)
  --lora-alpha INT           LoRA alpha (default: 32)
  --batch-size INT           Training batch size (default: 1)
  --save-steps INT           Save a checkpoint every N steps
  --gradient-checkpointing   Enable gradient checkpointing for memory efficiency
  --mixed-precision          Use mixed-precision training
```
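The `--lora-rank` and `--lora-alpha` options control the size and strength of the low-rank update. In standard LoRA, which PEFT implements, a frozen weight matrix W of shape d_out x d_in is augmented with trainable matrices B (d_out x r) and A (r x d_in), and the layer uses W + (alpha / r) * B A; B starts at zero, so training begins from the unchanged base model. The sketch below works through the documented defaults (r = 4, alpha = 32). The 320 x 320 layer is an example size (the query projections in the first U-Net block of Stable Diffusion v1.5), and the step-count helper assumes one optimizer step per batch with no gradient accumulation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one d_out x d_in weight."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B: d_out x r plus A: r x d_in
    return full, lora

def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to B @ A in standard LoRA."""
    return alpha / rank

def total_steps(num_images: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming one step per batch and no gradient accumulation."""
    return math.ceil(num_images / batch_size) * epochs

def merged_weight(W: Matrix, B: Matrix, A: Matrix, alpha: float, rank: int) -> Matrix:
    """W + (alpha / r) * B @ A, with nested lists to keep the sketch dependency-free."""
    s = lora_scaling(alpha, rank)
    return [[W[i][j] + s * sum(B[i][k] * A[k][j] for k in range(rank))
             for j in range(len(W[0]))] for i in range(len(W))]

if __name__ == "__main__":
    full, lora = lora_param_counts(320, 320, rank=4)
    print(f"{lora} trainable vs {full} full ({lora / full:.1%})")  # → 2560 trainable vs 102400 full (2.5%)
    print("scaling:", lora_scaling(32, 4))                          # → scaling: 8.0
    print("steps for 20 images, batch 1, 100 epochs:", total_steps(20, 1, 100))
```

With 20 images, the default batch size of 1, and 100 epochs, training runs 2,000 steps, so `--save-steps 500` would produce checkpoints at steps 500, 1000, 1500, and 2000.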
### 11. `compi_phase1e_style_generation.py` - Personal Style Generation

**Features:**

- Generate images using trained personal-style LoRA weights
- Adjustable style strength and generation parameters
- Interactive and batch generation modes
- Integration with the existing CompI pipeline and metadata format
- Support for multiple LoRA styles and switching between models

**Usage:**

```bash
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "a cat in my_style"
# Or directly: python src/generators/compi_phase1e_style_generation.py --lora-path PATH PROMPT
```
### 12. `compi_phase1e_style_manager.py` - LoRA Style Management

**Features:**

- Manage multiple trained LoRA styles and their checkpoints
- Clean up old checkpoints and organize model storage
- Export style information and training analytics
- Style database with automatic scanning and metadata
- Batch operations for style maintenance and organization

**Command Line Options:**

```bash
python src/generators/compi_phase1e_style_manager.py [OPTIONS]

Options:
  --list                  List all available LoRA styles
  --info STYLE_NAME       Show detailed information about a style
  --refresh               Refresh the styles database
  --cleanup STYLE_NAME    Clean up old checkpoints for a style
  --export OUTPUT_FILE    Export styles information to CSV
  --delete STYLE_NAME     Delete a LoRA style (requires --confirm)
```
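The `--lora-path` example in the previous section points at a `checkpoint-1000` folder, and the style manager can clean up old checkpoints. Assuming checkpoints are stored as `lora_models/<style>/checkpoint-<step>` directories, as that example suggests, finding them in step order and choosing which older ones to remove could look like this. The `keep` parameter is hypothetical, and the function only reports candidates rather than deleting anything.

```python
import re
from pathlib import Path
from typing import List, Tuple

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def list_checkpoints(style_dir: Path) -> List[Tuple[int, Path]]:
    """Return (step, path) pairs for checkpoint-<step> subdirectories, oldest first."""
    found = []
    for child in style_dir.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    return sorted(found)  # numeric sort: checkpoint-1000 comes after checkpoint-500

def cleanup_plan(style_dir: Path, keep: int = 2) -> Tuple[List[Path], List[Path]]:
    """Split checkpoints into (kept, removable), keeping the `keep` most recent."""
    ckpts = [path for _, path in list_checkpoints(style_dir)]
    if keep <= 0:
        return [], ckpts
    return ckpts[-keep:], ckpts[:-keep]

if __name__ == "__main__":
    style = Path("lora_models/my_style")  # hypothetical location
    if style.is_dir():
        kept, removable = cleanup_plan(style)
        print("latest:", kept[-1] if kept else None)
        print("could remove:", [p.name for p in removable])
```

Sorting on the parsed step number matters: a plain string sort would place `checkpoint-1000` before `checkpoint-500`.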
### Web UI Examples

**Streamlit Interface:**

- Open http://localhost:8501 in a browser after launching `run_ui.py`
- Full-featured interface with sidebar settings
- Progress bars and status updates
- Expandable sections for details

**Gradio Interface:**

- Open http://localhost:7860 in a browser after launching `run_gradio_ui.py`
- Gallery-style image display
- Compact, mobile-friendly design
- Real-time generation feedback
## Next Steps

Phase 1 establishes the foundation for CompI's text-to-image capabilities. Future phases will add:

- Audio input processing
- Emotion and style conditioning
- Real-time data integration
- Multimodal fusion
- Advanced user interfaces
## Resources

- [Hugging Face Diffusers Documentation](https://huggingface.co/docs/diffusers)
- [Prompt Engineering Guide](https://prompthero.com/stable-diffusion-prompt-guide)
- [CompI Development Plan](development.md)