HPSv3 / generate /README.md
sdsdgwe's picture
update
9b57ce7

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

Image Generation Module

This module is designed for generating images from text prompts using various pretrained diffusion models. It supports parallel generation across multiple GPUs and can be extended to include new models easily.

File Structure

  • gen_images_from_prompt.py: The main script for running the image generation process. It reads prompts from a JSON file and handles command-line arguments.
  • generator.py: Contains the core Generator class, which manages the model pipelines and distributes the generation tasks across different devices.
  • utils/pipelines.py: Defines the configurations for all supported pretrained models. This is where you can add or modify model parameters.
  • utils/utils.py: Contains helper functions for initializing diffusers pipelines and interacting with model APIs.

How to Use

To generate images, run the main script with the required arguments.

Basic Command

python gen_images_from_prompt.py \
    --json_path /path/to/your/prompts.json \
    --out_dir /path/to/your/output_directory \
    --pipeline_name sd_xl_pipe flux_schnell_pipe

Command-Line Arguments

  • --json_path (required): Path to a JSON file containing a list of prompts. Each item in the list should be an object with a "caption" key.

    • For generating images according to real images, you should specify "image_file" which is the original image path, and "aspect_ratio" of this image. The specific height and width will be adjusted according to model's best practice resolution.

    • For generating images from prompt only, you should specify "save_name", "height" and "width" Example prompts.json format:

    [
      {
        "image_file": "1.jpg",
        "caption": "A beautiful landscape painting of a mountain range at sunset.",
        "aspect_ratio": 0.5,
      },
      {
        "image_file": "2.jpg",
        "caption": "A close-up photo of a red rose with water droplets.",
        "aspect_ratio": 1.0,
      },
      {
        "image_file": "3.jpg",
        "caption": "An astronaut riding a horse on Mars, digital art.",
        "aspect_ratio": 1.77,
      }
    ]
    
  • --out_dir (required): The root directory where generated images will be saved. A subdirectory will be created for each pipeline.

  • --pipeline_name (required): One or more pipeline configuration names to use for generation. These names must correspond to the PipelineParam variable names defined in utils/pipelines.py.

  • --num_devices: The number of GPU devices to use for generation. Defaults to 8.

  • --batch_size: The batch size per device. Defaults to 1.

  • --num_machine: The total number of machines used in a distributed setup. Defaults to 1.

  • --machine_id: The ID of the current machine in a distributed setup. Defaults to 0.

  • --enable_availabel_check: If set, the script will first run a quick check on a small batch to ensure each pipeline can be loaded and run without errors.

  • --reverse: If set, the order of the specified pipelines will be reversed.

How to Add a New Model

You can easily add a new text-to-image model by configuring it in the utils/pipelines.py file.

  1. Open utils/pipelines.py.

  2. Import PipelineParam if it's not already imported.

  3. Create a new PipelineParam instance for your model. Define the following parameters:

    • pipeline_name: The model's path on the Hugging Face Hub or a local directory.
    • generation_path: The name of the subdirectory where the output images will be saved.
    • pipeline_type: The type of pipeline, e.g., 't2i' (text-to-image) or 't2v' (text-to-video). Defaults to 't2i'.
    • pipe_init_kwargs: A dictionary of arguments required for initializing the model pipeline (e.g., {"torch_dtype": torch.float16}).
    • generation_kwargs: A dictionary of arguments for the generation process (e.g., {"guidance_scale": 7.0, "num_inference_steps": 28}).
    • base_resolution: The base resolution the model was trained on (e.g., 1024).
    • force_aspect_ratio: Optionally force a specific aspect ratio (e.g., 1 for square images).

    Example:

    from pydantic import BaseModel, Field
    import torch
    
    class PipelineParam(BaseModel):
        # ... (class definition)
    
    # Add your new model configuration
    my_new_model_pipe = PipelineParam(
            pipeline_name='organization/my-cool-model',
            generation_path=f'generation/my_cool_model',
            pipe_init_kwargs={
                "torch_dtype": torch.float16,
            },
            base_resolution=1024,
            generation_kwargs={
                "guidance_scale": 5.0,
                "num_inference_steps": 30,
            }
        )
    
  4. Run the generation script using the name of your new PipelineParam variable in the --pipeline_name argument.

python gen_images_from_prompt.py --pipeline_name my_new_model_pipe ...