PixArt-Σ LoRA Fine-tuned for Lego Image Generation

This model is a LoRA fine-tune of PixArt-alpha/PixArt-Sigma-XL-2-1024-MS, trained on the Norod78/lego-blip-captions-512 dataset to generate Lego-style images.

Model Details

Base Model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
Training Method: LoRA (Low-Rank Adaptation)
Domain: Lego
Dataset: Norod78/lego-blip-captions-512
LoRA Rank: 16
LoRA Alpha: 32
Task: Text-to-Image Generation
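
To make the rank and alpha values above concrete, here is a small arithmetic sketch. It assumes the common LoRA convention in which the low-rank update is scaled by alpha / rank. The 1152 x 1152 layer shape is a hypothetical example chosen for illustration; this card does not state which layers were adapted or their sizes.

```python
# Values taken from the model details above.
rank = 16
alpha = 32

# Assumed convention: the low-rank update B @ A is multiplied by alpha / rank.
scaling = alpha / rank

# Hypothetical layer shape, used only to illustrate the parameter savings.
d_in, d_out = 1152, 1152

full_params = d_in * d_out           # parameters in the frozen weight matrix
lora_params = rank * (d_in + d_out)  # A is (rank x d_in), B is (d_out x rank)
ratio = lora_params / full_params

print(f"scaling factor: {scaling}")
print(f"full: {full_params}, LoRA: {lora_params}, ratio: {ratio:.4f}")
```

Under these assumptions the adapter trains about 1/36 of the parameters of each adapted layer, and its update is applied with a factor of 2.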

Training Details

Epochs: 50
Batch Size: 1
Gradient Accumulation Steps: 4
Learning Rate: 1e-4
Training Steps: 1500
Mixed Precision: FP16
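
The batch size and gradient accumulation settings above combine into an effective batch size. The sketch below works through that arithmetic, assuming single-device training and that "Training Steps" counts optimizer updates; neither assumption is confirmed by this card.

```python
# Values taken from the training details above.
batch_size = 1
grad_accum_steps = 4
training_steps = 1500

# Gradients from 4 micro-batches are accumulated before each optimizer update
# (assumes a single device).
effective_batch_size = batch_size * grad_accum_steps

# Total samples seen, assuming each training step is one optimizer update.
samples_processed = effective_batch_size * training_steps

print(f"effective batch size: {effective_batch_size}")
print(f"samples processed: {samples_processed}")
```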

Usage

from diffusers import PixArtSigmaPipeline
import torch

# Load pipeline
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("matthew816/pixart-lora-lego")

# Generate image from text
prompt = "lego style, a cat sitting on a chair"
image = pipe(
    prompt=prompt,
    num_inference_steps=20,
    guidance_scale=4.5
).images[0]
image.save("generated_lego_image.png")

Examples

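This model generates Lego-style images from text descriptions. Each example prompt below begins with the phrase "lego style,", as does the prompt in the usage example.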

Example prompts:

"lego style, a dragon flying over mountains"
"lego style, a robot playing guitar"
"lego style, a sunset over the ocean"
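
The example prompts share a "lego style, " prefix, so a small helper can build them and generate one image per subject. This is a minimal sketch: lego_prompt, slugify, and generate_all are hypothetical helper names, not part of diffusers, and the pipe argument is assumed to be the pipeline object created in the Usage section.

```python
import re
from pathlib import Path

STYLE_PREFIX = "lego style, "  # prefix shared by the example prompts above


def lego_prompt(subject: str) -> str:
    """Prepend the style prefix unless the subject already starts with it."""
    subject = subject.strip()
    if subject.lower().startswith(STYLE_PREFIX.strip()):
        return subject
    return STYLE_PREFIX + subject


def slugify(text: str) -> str:
    """Turn a prompt into a filesystem-friendly file name stem."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def generate_all(pipe, subjects, out_dir="outputs"):
    """Generate and save one image per subject; returns the saved paths.

    `pipe` is assumed to behave like the pipeline from the Usage section:
    calling it returns an object whose `.images` list holds the result.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for subject in subjects:
        image = pipe(
            prompt=lego_prompt(subject),
            num_inference_steps=20,  # same settings as the usage example
            guidance_scale=4.5,
        ).images[0]
        path = out / f"{slugify(subject)}.png"
        image.save(path)
        paths.append(path)
    return paths
```

With the pipeline loaded as in the Usage section, generate_all(pipe, ["a dragon flying over mountains", "a robot playing guitar"]) would write one PNG per subject into the outputs directory.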

Citation

If you use this model, please cite the original PixArt-Σ model and the dataset.

@article{chen2024pixart,
  title={PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation},
  author={Chen, Junsong and others},
  journal={arXiv preprint arXiv:2403.04692},
  year={2024}
}
