Add files using upload-large-folder tool
- README.md +83 -0
- ffhq-256/README.md +59 -0
- ffhq-256/id_model/README.md +3 -0
- ffhq-256/id_model/config.json +4 -0
- ffhq-256/id_model/model_ir_se50.safetensors +3 -0
- ffhq-256/unet/config.json +20 -0
- ffhq-256/unet/diffusion_pytorch_model.safetensors +3 -0
- imagenet256-uncond/README.md +58 -0
- imagenet256-uncond/unet/config.json +22 -0
- imagenet256-uncond/unet/diffusion_pytorch_model.safetensors +3 -0
README.md
ADDED

---
language: en
library_name: pytorch-image-translation-models
pipeline_tag: image-to-image
tags:
- image-to-image
- diffusion
- image-translation
- DiffuseIT
- text-guided
- style-transfer
---

# DiffuseIT Checkpoints

Diffusion-based Image Translation using Disentangled Style and Content Representation ([Kwon & Ye, ICLR 2023](https://arxiv.org/abs/2209.15264)).

Converted from [cyclomon/DiffuseIT](https://github.com/cyclomon/DiffuseIT) for use with `pytorch-image-translation-models`.

## Model Variants

| Subfolder | Dataset | Resolution | Description |
|-----------|---------|------------|-------------|
| [imagenet256-uncond](imagenet256-uncond/) | ImageNet | 256×256 | Unconditional diffusion model for general image translation |
| [ffhq-256](ffhq-256/) | FFHQ | 256×256 | Face-focused model with identity preservation (self-contained: unet + id_model) |

## Installation

```bash
pip install pytorch-image-translation-models
```

Clone the DiffuseIT repository (required for the CLIP and ViT losses):

```bash
git clone https://github.com/cyclomon/DiffuseIT.git projects/DiffuseIT
cd projects/DiffuseIT
pip install ftfy regex lpips kornia opencv-python color-matcher
pip install git+https://github.com/openai/CLIP.git
```

## Usage

```python
from examples.community.diffuseit import load_diffuseit_community_pipeline

# ImageNet 256
pipe = load_diffuseit_community_pipeline(
    "BiliSakura/DiffuseIT-ckpt/imagenet256-uncond",  # or local path
    diffuseit_src_path="projects/DiffuseIT",
)
pipe.to("cuda")

# Text-guided
out = pipe(
    source_image=img,
    prompt="Black Leopard",
    source="Lion",
    use_range_restart=True,
    use_noise_aug_all=True,
    output_type="pil",
)

# Image-guided
out = pipe(
    source_image=img,
    target_image=style_ref,
    use_colormatch=True,
    output_type="pil",
)
```

## Citation

```bibtex
@inproceedings{kwon2023diffuseit,
  title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
  author={Kwon, Gihyun and Ye, Jong Chul},
  booktitle={ICLR},
  year={2023},
  url={https://arxiv.org/abs/2209.15264}
}
```
ffhq-256/README.md
ADDED

---
language: en
library_name: pytorch-image-translation-models
pipeline_tag: image-to-image
tags:
- image-to-image
- diffusion
- DiffuseIT
- FFHQ
- face
- identity-preservation
- text-guided
---

# DiffuseIT: FFHQ 256

Face-focused diffusion model with identity preservation. Pre-trained on FFHQ 256×256.

**Source:** [cyclomon/DiffuseIT](https://github.com/cyclomon/DiffuseIT) — converted from `ffhq_10m.pt`

## Model Description

- **Architecture**: Guided diffusion (OpenAI-style UNet, face-optimized)
- **Resolution**: 256×256
- **Task**: Face image translation with identity preservation (use `use_ffhq=True`)
- **Self-contained**: Includes `id_model/` (ArcFace IR-SE50) for identity loss

## Usage

```python
from examples.community.diffuseit import load_diffuseit_community_pipeline

pipe = load_diffuseit_community_pipeline(
    "BiliSakura/DiffuseIT-ckpt/ffhq-256",
    use_ffhq=True,
    diffuseit_src_path="projects/DiffuseIT",
)
pipe.to("cuda")

out = pipe(
    source_image=face_img,
    prompt="Target description",
    source="Source description",
    use_range_restart=True,
    output_type="pil",
)
```

## Citation

```bibtex
@inproceedings{kwon2023diffuseit,
  title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
  author={Kwon, Gihyun and Ye, Jong Chul},
  booktitle={ICLR},
  year={2023},
  url={https://arxiv.org/abs/2209.15264}
}
```
ffhq-256/id_model/README.md
ADDED

# ArcFace IR-SE50

ArcFace ResNet-50 IR-SE model for face identity preservation. Used by DiffuseIT when `use_ffhq=True`.
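The identity loss compares ArcFace embeddings of the source face and the generated face. A minimal sketch of the cosine-distance form such a loss typically takes (plain Python standing in for the real IR-SE50 network and its embedding vectors; function name is hypothetical, not the repo's exact code):

```python
import math

def cosine_identity_loss(emb_src, emb_out):
    """Identity loss as 1 - cosine similarity between two face embeddings.

    emb_src / emb_out stand in for ArcFace IR-SE50 embedding vectors;
    in DiffuseIT these would come from id_model applied to the source
    and generated faces. Illustrative sketch only.
    """
    dot = sum(a * b for a, b in zip(emb_src, emb_out))
    norm_src = math.sqrt(sum(a * a for a in emb_src))
    norm_out = math.sqrt(sum(b * b for b in emb_out))
    return 1.0 - dot / (norm_src * norm_out)

# Identical embeddings give zero loss; orthogonal embeddings give 1.0.
print(cosine_identity_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(cosine_identity_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Minimizing this term during sampling keeps the translated face's embedding close to the source's, which is what "identity preservation" means here.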
ffhq-256/id_model/config.json
ADDED

{
  "_class_name": "ArcFaceIR_SE50",
  "_converted_from": "model_ir_se50.pth"
}
ffhq-256/id_model/model_ir_se50.safetensors
ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:c8b97cc250617df1074cf5defa4059d6b5c6187d3bbec7944800c200bbae9dfb
size 175329792
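The `.safetensors` entries in this commit are Git LFS pointer files, not the weights themselves: three `key value` lines where `oid` carries `sha256:<hex digest>` and `size` is the payload size in bytes. A small sketch of parsing that format:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key-value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, value = line.split(" ", 1)
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c8b97cc250617df1074cf5defa4059d6b5c6187d3bbec7944800c200bbae9dfb
size 175329792
"""
info = parse_lfs_pointer(pointer)
print(info["size"])                  # 175329792
print(info["oid"].split(":", 1)[0])  # sha256
```

Cloning without LFS installed leaves these ~130-byte pointers in place of the multi-hundred-megabyte weights, which is a common source of "corrupt checkpoint" errors.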
ffhq-256/unet/config.json
ADDED

{
  "image_size": 256,
  "num_channels": 128,
  "num_res_blocks": 1,
  "channel_mult": [1, 1, 2, 2, 4, 4],
  "attention_resolutions": [16],
  "out_channels": 6,
  "learn_sigma": true,
  "_class_name": "DiffuseITGuidedDiffusionUNet",
  "_converted_from": "ffhq_10m.pt"
}
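In OpenAI guided-diffusion configs like this one, each UNet level's width is `num_channels` scaled by the matching `channel_mult` entry, `attention_resolutions` lists the feature-map sizes (after downsampling from `image_size`) that get self-attention, and `out_channels: 6` is 3 RGB channels doubled because `learn_sigma` makes the model predict a variance alongside the mean. A quick sketch of that arithmetic (the dict mirrors the config above; field interpretation assumes the standard guided-diffusion convention):

```python
config = {
    "image_size": 256,
    "num_channels": 128,
    "channel_mult": [1, 1, 2, 2, 4, 4],
    "attention_resolutions": [16],
}

# Width of each UNet level: base channels times the per-level multiplier.
widths = [config["num_channels"] * m for m in config["channel_mult"]]
print(widths)  # [128, 128, 256, 256, 512, 512]

# guided-diffusion converts attention resolutions to downsample factors:
# resolution 16 on a 256px model means attention at the 16x16 feature map.
ds_factors = [config["image_size"] // r for r in config["attention_resolutions"]]
print(ds_factors)  # [16]
```

The imagenet256-uncond config below differs mainly in width (`num_channels: 256`), depth (`num_res_blocks: 2`), and attention at three scales instead of one.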
ffhq-256/unet/diffusion_pytorch_model.safetensors
ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:ccf128ed09090f855832fed124ad12b44079822451f190b31921a6507f36d459
size 374293968
imagenet256-uncond/README.md
ADDED

---
language: en
library_name: pytorch-image-translation-models
pipeline_tag: image-to-image
tags:
- image-to-image
- diffusion
- DiffuseIT
- ImageNet
- text-guided
- style-transfer
---

# DiffuseIT: ImageNet 256 Unconditional

Unconditional diffusion model for general image translation. Pre-trained on ImageNet 256×256.

**Source:** [cyclomon/DiffuseIT](https://github.com/cyclomon/DiffuseIT) — converted from `256x256_diffusion_uncond.pt`

## Model Description

- **Architecture**: Guided diffusion (OpenAI-style UNet)
- **Resolution**: 256×256
- **Task**: Text-guided and image-guided image translation

## Usage

```python
from examples.community.diffuseit import load_diffuseit_community_pipeline

pipe = load_diffuseit_community_pipeline(
    "BiliSakura/DiffuseIT-ckpt/imagenet256-uncond",
    diffuseit_src_path="projects/DiffuseIT",
)
pipe.to("cuda")

# Text-guided
out = pipe(
    source_image=img,
    prompt="Black Leopard",
    source="Lion",
    use_range_restart=True,
    use_noise_aug_all=True,
    output_type="pil",
)
```

## Citation

```bibtex
@inproceedings{kwon2023diffuseit,
  title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
  author={Kwon, Gihyun and Ye, Jong Chul},
  booktitle={ICLR},
  year={2023},
  url={https://arxiv.org/abs/2209.15264}
}
```
imagenet256-uncond/unet/config.json
ADDED

{
  "image_size": 256,
  "num_channels": 256,
  "num_res_blocks": 2,
  "channel_mult": [1, 1, 2, 2, 4, 4],
  "attention_resolutions": [8, 16, 32],
  "out_channels": 6,
  "learn_sigma": true,
  "_class_name": "DiffuseITGuidedDiffusionUNet",
  "_converted_from": "256x256_diffusion_uncond.pt"
}
imagenet256-uncond/unet/diffusion_pytorch_model.safetensors
ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:da7e1e247a9d1fd8e676f6471fc265f83c46e2926050e9a19e56593370d632fa
size 2211317416