Commit 85a3dd9 (verified) · Author: BiliSakura · 1 parent: 1a54a2c

Add files using upload-large-folder tool

README.md ADDED
@@ -0,0 +1,83 @@
+ ---
+ language: en
+ library_name: pytorch-image-translation-models
+ pipeline_tag: image-to-image
+ tags:
+ - image-to-image
+ - diffusion
+ - image-translation
+ - DiffuseIT
+ - text-guided
+ - style-transfer
+ ---
+
+ # DiffuseIT Checkpoints
+
+ Checkpoints for DiffuseIT, from *Diffusion-based Image Translation using Disentangled Style and Content Representation* ([Kwon & Ye, ICLR 2023](https://arxiv.org/abs/2209.15264)).
+
+ Converted from [cyclomon/DiffuseIT](https://github.com/cyclomon/DiffuseIT) for use with `pytorch-image-translation-models`.
+
+ ## Model Variants
+
+ | Subfolder | Dataset | Resolution | Description |
+ |-----------|---------|------------|-------------|
+ | [imagenet256-uncond](imagenet256-uncond/) | ImageNet | 256×256 | Unconditional diffusion model for general image translation |
+ | [ffhq-256](ffhq-256/) | FFHQ | 256×256 | Face-focused model with identity preservation (self-contained: unet + id_model) |
+
+ ## Installation
+
+ ```bash
+ pip install pytorch-image-translation-models
+ ```
+
+ Clone the DiffuseIT repository (required for the CLIP and ViT losses):
+
+ ```bash
+ git clone https://github.com/cyclomon/DiffuseIT.git projects/DiffuseIT
+ cd projects/DiffuseIT
+ pip install ftfy regex lpips kornia opencv-python color-matcher
+ pip install git+https://github.com/openai/CLIP.git
+ ```
+
+ ## Usage
+
+ ```python
+ from examples.community.diffuseit import load_diffuseit_community_pipeline
+
+ # ImageNet 256
+ pipe = load_diffuseit_community_pipeline(
+     "BiliSakura/DiffuseIT-ckpt/imagenet256-uncond",  # or local path
+     diffuseit_src_path="projects/DiffuseIT",
+ )
+ pipe.to("cuda")
+
+ # Text-guided
+ out = pipe(
+     source_image=img,
+     prompt="Black Leopard",
+     source="Lion",
+     use_range_restart=True,
+     use_noise_aug_all=True,
+     output_type="pil",
+ )
+
+ # Image-guided
+ out = pipe(
+     source_image=img,
+     target_image=style_ref,
+     use_colormatch=True,
+     output_type="pil",
+ )
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{kwon2023diffuseit,
+   title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
+   author={Kwon, Gihyun and Ye, Jong Chul},
+   booktitle={ICLR},
+   year={2023},
+   url={https://arxiv.org/abs/2209.15264}
+ }
+ ```
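Reviewer note: the Usage snippet in `README.md` passes `img` and `style_ref` without defining them. The sketch below shows one way such inputs might be prepared for a 256×256 checkpoint. It assumes the pipeline accepts a PIL image, which the README does not state; `center_square_box` and `load_square_rgb` are hypothetical helpers, not part of `pytorch-image-translation-models`.

```python
from __future__ import annotations

from pathlib import Path


def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def load_square_rgb(path: str | Path, size: int = 256):
    """Hypothetical helper: open an image, center-crop it, resize to size x size."""
    # Pillow is imported lazily so the crop arithmetic above has no dependencies.
    from PIL import Image

    with Image.open(path) as raw:
        rgb = raw.convert("RGB")
    cropped = rgb.crop(center_square_box(*rgb.size))
    return cropped.resize((size, size), Image.LANCZOS)
```

With helpers like these, `img = load_square_rgb("lion.jpg")` and `style_ref = load_square_rgb("style.jpg")` (placeholder file names) would give two 256×256 RGB images to pass as `source_image` and `target_image`.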
ffhq-256/README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ language: en
+ library_name: pytorch-image-translation-models
+ pipeline_tag: image-to-image
+ tags:
+ - image-to-image
+ - diffusion
+ - DiffuseIT
+ - FFHQ
+ - face
+ - identity-preservation
+ - text-guided
+ ---
+
+ # DiffuseIT: FFHQ 256
+
+ Face-focused diffusion model with identity preservation. Pre-trained on FFHQ 256×256.
+
+ **Source:** [cyclomon/DiffuseIT](https://github.com/cyclomon/DiffuseIT), converted from `ffhq_10m.pt`
+
+ ## Model Description
+
+ - **Architecture**: Guided diffusion (OpenAI-style UNet, face-optimized)
+ - **Resolution**: 256×256
+ - **Task**: Face image translation with identity preservation (use `use_ffhq=True`)
+ - **Self-contained**: Includes `id_model/` (ArcFace IR-SE50) for identity loss
+
+ ## Usage
+
+ ```python
+ from examples.community.diffuseit import load_diffuseit_community_pipeline
+
+ pipe = load_diffuseit_community_pipeline(
+     "BiliSakura/DiffuseIT-ckpt/ffhq-256",
+     use_ffhq=True,
+     diffuseit_src_path="projects/DiffuseIT",
+ )
+ pipe.to("cuda")
+
+ out = pipe(
+     source_image=face_img,
+     prompt="Target description",
+     source="Source description",
+     use_range_restart=True,
+     output_type="pil",
+ )
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{kwon2023diffuseit,
+   title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
+   author={Kwon, Gihyun and Ye, Jong Chul},
+   booktitle={ICLR},
+   year={2023},
+   url={https://arxiv.org/abs/2209.15264}
+ }
+ ```
ffhq-256/id_model/README.md ADDED
@@ -0,0 +1,3 @@
+ # ArcFace IR-SE50
+
+ ArcFace IR-SE50 face-recognition backbone (an improved ResNet-50 with squeeze-and-excitation blocks), used for the identity-preservation loss in DiffuseIT when `use_ffhq=True`.
ffhq-256/id_model/config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "_class_name": "ArcFaceIR_SE50",
+   "_converted_from": "model_ir_se50.pth"
+ }
ffhq-256/id_model/model_ir_se50.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8b97cc250617df1074cf5defa4059d6b5c6187d3bbec7944800c200bbae9dfb
+ size 175329792
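Reviewer note: the `.safetensors` entries in this commit are Git LFS pointer files, which record only the SHA-256 digest and byte size of the real weights. A minimal sketch of checking a downloaded file against such a pointer follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names; the pointer text is the one shown above for `model_ir_se50.safetensors`.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Pointer content copied from ffhq-256/id_model/model_ir_se50.safetensors above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c8b97cc250617df1074cf5defa4059d6b5c6187d3bbec7944800c200bbae9dfb
size 175329792
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            fields[key] = value
    return fields


def matches_pointer(path: str | Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(path)
    # Cheap size check first, so a truncated download fails without hashing.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, `matches_pointer("ffhq-256/id_model/model_ir_se50.safetensors", POINTER)` would confirm that a local copy is the 175,329,792-byte file this commit refers to.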
ffhq-256/unet/config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "image_size": 256,
+   "num_channels": 128,
+   "num_res_blocks": 1,
+   "channel_mult": [
+     1,
+     1,
+     2,
+     2,
+     4,
+     4
+   ],
+   "attention_resolutions": [
+     16
+   ],
+   "out_channels": 6,
+   "learn_sigma": true,
+   "_class_name": "DiffuseITGuidedDiffusionUNet",
+   "_converted_from": "ffhq_10m.pt"
+ }
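Reviewer note: to make the config above easier to read, the sketch below expands it into a per-level plan. It assumes the conventions of OpenAI's guided-diffusion UNet: the channel width at each level is `num_channels * channel_mult[level]`, and the feature map halves between consecutive levels. How `DiffuseITGuidedDiffusionUNet` actually interprets these fields is not shown in this commit, and `level_plan` is a hypothetical helper.

```python
from __future__ import annotations

import json

# Values copied from ffhq-256/unet/config.json above.
FFHQ_UNET_CONFIG = json.loads("""
{
  "image_size": 256,
  "num_channels": 128,
  "num_res_blocks": 1,
  "channel_mult": [1, 1, 2, 2, 4, 4],
  "attention_resolutions": [16],
  "out_channels": 6,
  "learn_sigma": true
}
""")


def level_plan(config: dict) -> list[dict]:
    """Expand a guided-diffusion-style config into per-level widths and resolutions."""
    plan = []
    resolution = config["image_size"]
    for level, mult in enumerate(config["channel_mult"]):
        plan.append(
            {
                "level": level,
                "resolution": resolution,
                "channels": config["num_channels"] * mult,
                # Assumption: attention_resolutions lists feature-map sizes.
                "attention": resolution in config["attention_resolutions"],
            }
        )
        resolution //= 2  # assumed: one 2x downsample between consecutive levels
    return plan


for row in level_plan(FFHQ_UNET_CONFIG):
    print(row)
```

Under these assumptions the FFHQ model uses widths 128, 128, 256, 256, 512, 512 and applies attention only at the 16×16 level. Whether `attention_resolutions` holds feature-map sizes or downsample rates does not change the result here, since for a 256×256 input the value 16 names the same level either way.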
ffhq-256/unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ccf128ed09090f855832fed124ad12b44079822451f190b31921a6507f36d459
+ size 374293968
imagenet256-uncond/README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ language: en
+ library_name: pytorch-image-translation-models
+ pipeline_tag: image-to-image
+ tags:
+ - image-to-image
+ - diffusion
+ - DiffuseIT
+ - ImageNet
+ - text-guided
+ - style-transfer
+ ---
+
+ # DiffuseIT: ImageNet 256 Unconditional
+
+ Unconditional diffusion model for general image translation. Pre-trained on ImageNet 256×256.
+
+ **Source:** [cyclomon/DiffuseIT](https://github.com/cyclomon/DiffuseIT), converted from `256x256_diffusion_uncond.pt`
+
+ ## Model Description
+
+ - **Architecture**: Guided diffusion (OpenAI-style UNet)
+ - **Resolution**: 256×256
+ - **Task**: Text-guided and image-guided image translation
+
+ ## Usage
+
+ ```python
+ from examples.community.diffuseit import load_diffuseit_community_pipeline
+
+ pipe = load_diffuseit_community_pipeline(
+     "BiliSakura/DiffuseIT-ckpt/imagenet256-uncond",
+     diffuseit_src_path="projects/DiffuseIT",
+ )
+ pipe.to("cuda")
+
+ # Text-guided
+ out = pipe(
+     source_image=img,
+     prompt="Black Leopard",
+     source="Lion",
+     use_range_restart=True,
+     use_noise_aug_all=True,
+     output_type="pil",
+ )
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{kwon2023diffuseit,
+   title={Diffusion-based Image Translation using Disentangled Style and Content Representation},
+   author={Kwon, Gihyun and Ye, Jong Chul},
+   booktitle={ICLR},
+   year={2023},
+   url={https://arxiv.org/abs/2209.15264}
+ }
+ ```
imagenet256-uncond/unet/config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "image_size": 256,
+   "num_channels": 256,
+   "num_res_blocks": 2,
+   "channel_mult": [
+     1,
+     1,
+     2,
+     2,
+     4,
+     4
+   ],
+   "attention_resolutions": [
+     8,
+     16,
+     32
+   ],
+   "out_channels": 6,
+   "learn_sigma": true,
+   "_class_name": "DiffuseITGuidedDiffusionUNet",
+   "_converted_from": "256x256_diffusion_uncond.pt"
+ }
imagenet256-uncond/unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da7e1e247a9d1fd8e676f6471fc265f83c46e2926050e9a19e56593370d632fa
+ size 2211317416