metadata
language: en
library_name: pytorch-image-translation-models
pipeline_tag: image-to-image
tags:
  - image-to-image
  - diffusion
  - image-translation
  - DiffuseIT
  - text-guided
  - style-transfer

DiffuseIT Checkpoints

Diffusion-based Image Translation using Disentangled Style and Content Representation (Kwon & Ye, ICLR 2023).

Converted from cyclomon/DiffuseIT for use with pytorch-image-translation-models.

Model Variants

| Subfolder | Dataset | Resolution | Description |
| --- | --- | --- | --- |
| imagenet256-uncond | ImageNet | 256×256 | Unconditional diffusion model for general image translation |
| ffhq-256 | FFHQ | 256×256 | Face-focused model with identity preservation (self-contained: unet + id_model) |
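
As a minimal sketch, the subfolder names above can be combined with the repository ID to build the string passed to the loader. The short keys `imagenet` and `ffhq` and the helper `checkpoint_path` are hypothetical names introduced for illustration; they are not part of pytorch-image-translation-models.

```python
REPO_ID = "BiliSakura/DiffuseIT-ckpt"

# Subfolder names as listed under Model Variants.
# The dictionary keys are hypothetical shorthand, not library identifiers.
VARIANTS = {
    "imagenet": "imagenet256-uncond",
    "ffhq": "ffhq-256",
}


def checkpoint_path(variant):
    """Return '<repo id>/<subfolder>' for a known variant key."""
    if variant not in VARIANTS:
        known = ", ".join(sorted(VARIANTS))
        raise ValueError(f"Unknown variant {variant!r}; expected one of: {known}")
    return f"{REPO_ID}/{VARIANTS[variant]}"
```

A local directory holding the same subfolder layout could be substituted for `REPO_ID` when working offline.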

Installation

pip install pytorch-image-translation-models

Clone the DiffuseIT repository (required for the CLIP and ViT losses):

git clone https://github.com/cyclomon/DiffuseIT.git projects/DiffuseIT
cd projects/DiffuseIT
pip install ftfy regex lpips kornia opencv-python color-matcher
pip install git+https://github.com/openai/CLIP.git
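
Several of the packages installed above are imported under a different name than the one used with pip, so a quick check can catch a missing dependency before loading a checkpoint. The following is a sketch only; the helper `find_missing` is a hypothetical name, and the module list simply mirrors the pip commands above.

```python
import importlib.util

# Import names for the packages installed above. Some differ from their pip
# names: opencv-python -> cv2, color-matcher -> color_matcher, and the
# OpenAI CLIP repository -> clip.
REQUIRED_MODULES = ["ftfy", "regex", "lpips", "kornia", "cv2", "color_matcher", "clip"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed DiffuseIT dependencies are importable.")
```

The check only confirms that each module can be found; it does not verify versions or GPU support.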

Usage

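from PIL import Image

from examples.community.diffuseit import load_diffuseit_community_pipeline

# Placeholder paths: substitute your own images.
# Assumes the pipeline accepts PIL images as input.
img = Image.open("source.jpg").convert("RGB")
style_ref = Image.open("style_ref.jpg").convert("RGB")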

# ImageNet 256
pipe = load_diffuseit_community_pipeline(
    "BiliSakura/DiffuseIT-ckpt/imagenet256-uncond",  # or local path
    diffuseit_src_path="projects/DiffuseIT",
)
pipe.to("cuda")

# Text-guided: `source` describes the input image, `prompt` the desired result
out = pipe(
    source_image=img,
    prompt="Black Leopard",
    source="Lion",
    use_range_restart=True,
    use_noise_aug_all=True,
    output_type="pil",
)

# Image-guided: `target_image` supplies the style reference
out = pipe(
    source_image=img,
    target_image=style_ref,
    use_colormatch=True,
    output_type="pil",
)
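
Both checkpoints are trained at 256×256, so it may help to center-crop and resize inputs before calling the pipeline. This is a sketch under the assumption that the pipeline does not already resize inputs itself; `center_crop_box` and `prepare_image` are hypothetical helpers, not part of pytorch-image-translation-models.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def prepare_image(path, size=256):
    """Load an image, center-crop it to a square, and resize to size x size."""
    # Imported lazily so center_crop_box can be used without Pillow installed.
    from PIL import Image

    image = Image.open(path).convert("RGB")
    image = image.crop(center_crop_box(*image.size))
    return image.resize((size, size), Image.BICUBIC)
```

For example, `img = prepare_image("source.jpg")` would yield a 256×256 RGB image to pass as `source_image`; the file name is a placeholder.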

Citation

@inproceedings{kwon2023diffuseit,
  title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
  author={Kwon, Gihyun and Ye, Jong Chul},
  booktitle={ICLR},
  year={2023},
  url={https://arxiv.org/abs/2209.15264}
}