Image Generation Module
This module is designed for generating images from text prompts using various pretrained diffusion models. It supports parallel generation across multiple GPUs and can be extended to include new models easily.
File Structure
- gen_images_from_prompt.py: The main script for running the image generation process. It reads prompts from a JSON file and handles command-line arguments.
- generator.py: Contains the core Generator class, which manages the model pipelines and distributes the generation tasks across different devices.
- utils/pipelines.py: Defines the configurations for all supported pretrained models. This is where you can add or modify model parameters.
- utils/utils.py: Contains helper functions for initializing diffusers pipelines and interacting with model APIs.
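To make the idea of distributing generation tasks across machines and devices concrete, here is a minimal sketch of a round-robin split. It is not the Generator class's actual logic, which this README does not show; the helper names shard and split_for_devices are hypothetical and exist only for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(items: Sequence[T], num_shards: int, shard_id: int) -> List[T]:
    """Return every num_shards-th item, starting at index shard_id."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return list(items[shard_id::num_shards])


def split_for_devices(
    prompts: Sequence[T], num_machines: int, machine_id: int, num_devices: int
) -> List[List[T]]:
    """Hypothetical two-level split: first by machine, then by device.

    Assumption: the real Generator may partition work differently.
    """
    machine_prompts = shard(prompts, num_machines, machine_id)
    return [shard(machine_prompts, num_devices, d) for d in range(num_devices)]


prompts = [f"prompt {i}" for i in range(10)]
per_device = split_for_devices(prompts, num_machines=2, machine_id=0, num_devices=2)
for device_id, device_prompts in enumerate(per_device):
    print(device_id, device_prompts)
```

A strided split like this keeps the per-device workloads within one item of each other without any coordination between processes.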
How to Use
To generate images, run the main script with the required arguments.
Basic Command
python gen_images_from_prompt.py \
--json_path /path/to/your/prompts.json \
--out_dir /path/to/your/output_directory \
--pipeline_name sd_xl_pipe flux_schnell_pipe
Command-Line Arguments
- --json_path (required): Path to a JSON file containing a list of prompts. Each item in the list should be an object with a "caption" key.
  - To generate images that correspond to real images, also specify "image_file", the path of the original image, and "aspect_ratio", the aspect ratio of that image. The exact height and width are then adjusted to the model's recommended resolution.
  - To generate images from prompts only, also specify "save_name", "height", and "width".

  Example prompts.json format:

      [
        {
          "image_file": "1.jpg",
          "caption": "A beautiful landscape painting of a mountain range at sunset.",
          "aspect_ratio": 0.5
        },
        {
          "image_file": "2.jpg",
          "caption": "A close-up photo of a red rose with water droplets.",
          "aspect_ratio": 1.0
        },
        {
          "image_file": "3.jpg",
          "caption": "An astronaut riding a horse on Mars, digital art.",
          "aspect_ratio": 1.77
        }
      ]

- --out_dir (required): The root directory where generated images will be saved. A subdirectory will be created for each pipeline.
- --pipeline_name (required): One or more pipeline configuration names to use for generation. These names must correspond to the PipelineParam variable names defined in utils/pipelines.py.
- --num_devices: The number of GPU devices to use for generation. Defaults to 8.
- --batch_size: The batch size per device. Defaults to 1.
- --num_machine: The total number of machines used in a distributed setup. Defaults to 1.
- --machine_id: The ID of the current machine in a distributed setup. Defaults to 0.
- --enable_availabel_check: If set, the script first runs a quick check on a small batch to ensure each pipeline can be loaded and run without errors.
- --reverse: If set, the order of the specified pipelines is reversed.
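Because strict JSON parsers reject trailing commas and missing keys are easy to overlook, it can help to build the prompt file programmatically. The sketch below writes a file with one entry of each kind described for --json_path. The helper validate_prompt_entry and the value of "save_name" are hypothetical; the script's own validation, if it has any, may differ.

```python
import json
import tempfile
from pathlib import Path


def validate_prompt_entry(entry: dict) -> str:
    """Return 'image' or 'prompt' depending on which key set the entry has."""
    if "caption" not in entry:
        raise ValueError("every entry needs a 'caption' key")
    if {"image_file", "aspect_ratio"} <= entry.keys():
        return "image"   # generation that follows a real image
    if {"save_name", "height", "width"} <= entry.keys():
        return "prompt"  # generation from the prompt only
    raise ValueError(f"entry has neither key set: {sorted(entry)}")


entries = [
    {
        "image_file": "1.jpg",
        "caption": "A beautiful landscape painting of a mountain range at sunset.",
        "aspect_ratio": 0.5,
    },
    {
        "save_name": "rose",  # hypothetical value; the expected format is not documented here
        "caption": "A close-up photo of a red rose with water droplets.",
        "height": 1024,
        "width": 1024,
    },
]
modes = [validate_prompt_entry(e) for e in entries]

# Write to a temporary directory; json.dumps never emits trailing commas.
out_path = Path(tempfile.mkdtemp()) / "prompts.json"
out_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
loaded = json.loads(out_path.read_text(encoding="utf-8"))
print(modes, out_path)
```

The resulting path can then be passed to --json_path.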
How to Add a New Model
You can easily add a new text-to-image model by configuring it in the utils/pipelines.py file.
1. Open utils/pipelines.py.
2. Import PipelineParam if it's not already imported.
3. Create a new PipelineParam instance for your model. Define the following parameters:
   - pipeline_name: The model's path on the Hugging Face Hub or a local directory.
   - generation_path: The name of the subdirectory where the output images will be saved.
   - pipeline_type: The type of pipeline, e.g., 't2i' (text-to-image) or 't2v' (text-to-video). Defaults to 't2i'.
   - pipe_init_kwargs: A dictionary of arguments required for initializing the model pipeline (e.g., {"torch_dtype": torch.float16}).
   - generation_kwargs: A dictionary of arguments for the generation process (e.g., {"guidance_scale": 7.0, "num_inference_steps": 28}).
   - base_resolution: The base resolution the model was trained on (e.g., 1024).
   - force_aspect_ratio: Optionally force a specific aspect ratio (e.g., 1 for square images).

   Example:

       from pydantic import BaseModel, Field
       import torch

       class PipelineParam(BaseModel):
           ...  # (class definition)

       # Add your new model configuration
       my_new_model_pipe = PipelineParam(
           pipeline_name='organization/my-cool-model',
           generation_path='generation/my_cool_model',
           pipe_init_kwargs={
               "torch_dtype": torch.float16,
           },
           base_resolution=1024,
           generation_kwargs={
               "guidance_scale": 5.0,
               "num_inference_steps": 30,
           }
       )

4. Run the generation script, passing the name of your new PipelineParam variable to the --pipeline_name argument.
python gen_images_from_prompt.py --pipeline_name my_new_model_pipe ...
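When choosing base_resolution, it may help to see how it could combine with a prompt's "aspect_ratio" to give concrete dimensions. The sketch below shows one common approach, not necessarily what this module does: keep the pixel count close to base_resolution squared and round each side to a multiple of 64. The function name resolution_for, the width/height reading of aspect_ratio, and the multiple of 64 are all assumptions.

```python
import math
from typing import Tuple


def resolution_for(
    aspect_ratio: float, base_resolution: int = 1024, multiple: int = 64
) -> Tuple[int, int]:
    """Return (height, width) with about base_resolution**2 pixels.

    Assumption: aspect_ratio means width / height.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    scale = math.sqrt(aspect_ratio)
    height = base_resolution / scale
    width = base_resolution * scale

    def snap(value: float) -> int:
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(height), snap(width)


for ratio in (0.5, 1.0, 1.77):
    print(ratio, resolution_for(ratio))
```

Under these assumptions a square prompt stays at the base resolution, while wider or taller prompts trade height for width at a roughly constant pixel budget.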