Commit 158d0b3 by nvvaulin (verified) · 1 parent: c98e46b

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -42,3 +42,13 @@ assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_14B.jpg filter=lfs diff=lfs merge=l
  assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_5B.jpg filter=lfs diff=lfs merge=lfs -text
  assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_A14B.jpg filter=lfs diff=lfs merge=lfs -text
  assets/vbench.png filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/4.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/4_edit.png filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/5.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/5_edit.png filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/6.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/6_edit.png filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/7.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/generation_examples/images/7_edit.png filter=lfs diff=lfs merge=lfs -text
+ assets/sbs_edit.png filter=lfs diff=lfs merge=lfs -text
+ assets/sbs_image.png filter=lfs diff=lfs merge=lfs -text
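
Each added `.gitattributes` line routes one asset through Git LFS (`filter=lfs diff=lfs merge=lfs -text`). As a rough sketch of how such rules could be checked, the Python snippet below parses attribute lines and reports whether a path is LFS-tracked. The helper names `parse_gitattributes` and `is_lfs_tracked` are hypothetical, and `fnmatch` only approximates git's pattern matching (it is sufficient for the literal paths used in this commit).

```python
from fnmatch import fnmatch


def parse_gitattributes(text):
    """Return (pattern, attributes) pairs, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        rules.append((pattern, attrs))
    return rules


def is_lfs_tracked(path, rules):
    """True if the last matching rule that sets `filter` sets it to lfs."""
    tracked = False
    for pattern, attrs in rules:
        if not fnmatch(path, pattern):
            continue
        for attr in attrs:
            if attr == "filter=lfs":
                tracked = True
            elif attr in ("-filter", "!filter") or attr.startswith("filter="):
                tracked = False
    return tracked


# Three of the rules from this diff, used as sample input.
SAMPLE = """\
assets/vbench.png filter=lfs diff=lfs merge=lfs -text
assets/sbs_edit.png filter=lfs diff=lfs merge=lfs -text
assets/sbs_image.png filter=lfs diff=lfs merge=lfs -text
"""

rules = parse_gitattributes(SAMPLE)
print(is_lfs_tracked("assets/sbs_edit.png", rules))
print(is_lfs_tracked("README.md", rules))
```

For authoritative answers on a real checkout, `git check-attr filter -- <path>` is the better tool; the sketch only illustrates the rule format.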
README.md CHANGED
@@ -1,3 +1,149 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ <div align="center">
+ <picture>
+ <img src="assets/KANDINSKY_LOGO_1_BLACK.png">
+ </picture>
+ </div>
+
+ <div align="center">
+ <a href="https://habr.com/ru/companies/sberbank/articles/951800/">Habr</a> |
+ <a href="https://ai-forever.github.io/Kandinsky-5/">Project Page</a> |
+ <a href="https://github.com/kandinskylab/kandinsky-5/blob/main/paper.pdf">Technical Report</a> |
+ <a href="https://github.com/ai-forever/Kandinsky-5">Original GitHub</a> |
+ <a href="https://huggingface.co/collections/kandinskylab/kandinsky-50-image-lite-diffusers">🤗 Diffusers</a>
+ </div>
+
+ -----
+
+ <h1>Kandinsky 5.0 I2I Lite SFT – Diffusers</h1>
+
+ Kandinsky 5.0 is a family of diffusion models for video and image generation.
+
+ Kandinsky 5.0 Image Lite is a lightweight family of image generation and editing models with 6B parameters; this repository hosts the image-to-image (I2I) editing model.
+
+ The model introduces several key innovations:
+ - **Latent diffusion pipeline** with **Flow Matching** for improved training stability
+ - **Diffusion Transformer (DiT)** as the main generative backbone with cross-attention to text embeddings
+ - Dual text encoding using **Qwen2.5-VL** and **CLIP** for comprehensive text understanding
+ - **Flux VAE** for efficient image encoding and decoding
+
+ The original codebase can be found at [kandinskylab/Kandinsky-5](https://github.com/kandinskylab/Kandinsky-5).
+
+
+ ## Available Models
+
+ Kandinsky 5.0 Image Lite:
+ | model_id | Description | Use Cases |
+ |------------|-------------|-----------|
+ | **<a href="https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers">kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers</a>** | 6B supervised fine-tuned text-to-image model | Highest generation quality |
+ | **<a href="https://huggingface.co/kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers">kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers</a>** | 6B supervised fine-tuned image-to-image editing model | Highest generation quality |
+ | **<a href="https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-pretrain-Diffusers">kandinskylab/Kandinsky-5.0-T2I-Lite-pretrain-Diffusers</a>** | 6B base pretrained text-to-image model | Research and fine-tuning |
+ | **<a href="https://huggingface.co/kandinskylab/Kandinsky-5.0-I2I-Lite-pretrain-Diffusers">kandinskylab/Kandinsky-5.0-I2I-Lite-pretrain-Diffusers</a>** | 6B base pretrained image-to-image editing model | Research and fine-tuning |
+
+
+ ## Examples
+
+ <table border="0" style="width: 90%; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="width:45%; vertical-align:top;">
+ <img src="assets/generation_examples/images/7.jpg" style="width:48%; display:inline-block; vertical-align:top;">
+ <img src="assets/generation_examples/images/7_edit.png" style="width:48%; display:inline-block; vertical-align:top;">
+ <div style="margin-top:8px;">
+ Change the image style to realistic, draw the horse in the same pose, with a black mane and a giant snowball on its nose
+ </div>
+ </td>
+ <td style="width:10%;"></td>
+ <td style="width:45%; vertical-align:top;">
+ <img src="assets/generation_examples/images/6.jpg" style="width:48%; display:inline-block; vertical-align:top;">
+ <img src="assets/generation_examples/images/6_edit.png" style="width:48%; display:inline-block; vertical-align:top;">
+ <div style="margin-top:8px;">
+ Draw this elephant in the same hat inside a green tractor, driving across a yellow field.
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 90%; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="width:45%; vertical-align:top;">
+ <img src="assets/generation_examples/images/4.jpg" style="width:48%; display:inline-block; vertical-align:top;">
+ <img src="assets/generation_examples/images/4_edit.png" style="width:48%; display:inline-block; vertical-align:top;">
+ <div style="margin-top:8px;">
+ Replace Cheburashka with Crocodile Gena from the cartoon; keep the pose and facial expression the same
+ </div>
+ </td>
+ <td style="width:10%;"></td>
+ <td style="width:45%; vertical-align:top;">
+ <img src="assets/generation_examples/images/5.jpg" style="width:48%; display:inline-block; vertical-align:top;">
+ <img src="assets/generation_examples/images/5_edit.png" style="width:48%; display:inline-block; vertical-align:top;">
+ <div style="margin-top:8px;">
+ Change the image style to an oil painting; don't change the pose of the animals, but add pronounced oil painting elements - paint strokes
+ </div>
+ </td>
+ </tr>
+ </table>
+
+
+ ## Kandinsky5I2IPipeline Usage Example
+
+ ```python
+ import torch
+ from diffusers import Kandinsky5I2IPipeline
+ from diffusers.utils import load_image
+ # Load the pipeline
+ model_id = "kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers"
+ pipe = Kandinsky5I2IPipeline.from_pretrained(model_id)
+
+ _ = pipe.to(device='cuda', dtype=torch.bfloat16)
+ pipe.enable_model_cpu_offload() # <--- Enable CPU offloading for single GPU inference
+
+ # Edit the input image
+ image = load_image(
+     "https://huggingface.co/kandinsky-community/kandinsky-3/resolve/main/assets/title.jpg?download=true"
+ )
+
+ prompt = "Change the background from a winter night scene to a bright summer day. Place the character on a sandy beach with clear blue sky, soft sunlight, and gentle waves in the distance. Replace the winter clothing with a light short-sleeved T-shirt (in soft pastel colors) and casual shorts. Ensure the character’s fur reflects warm daylight instead of cold winter tones. Add small beach details such as seashells, footprints in the sand, and a few scattered beach toys nearby. Keep the oranges in the scene, but place them naturally on the sand."
+ negative_prompt = ""
+
+ output = pipe(
+     image=image,
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     guidance_scale=3.5,
+ ).image[0]
+ ```
+
+ ## Results
+
+ <table style="width:100%; text-align:center; margin-top:20px;">
+ <tr>
+ <td>
+ <img src="assets/sbs_image.png" width="100%">
+ </td>
+ <td>
+ <img src="assets/sbs_edit.png" width="100%">
+ </td>
+ </tr>
+ <tr>
+ <td style="font-size: 1.1em; font-weight: 500; padding-top: 6px;">
+ Side-by-side evaluation of T2I SFT on PartiPrompts with extended prompts
+ </td>
+ <td style="font-size: 1.1em; font-weight: 500; padding-top: 6px;">
+ Side-by-side evaluation of I2I SFT on the Flux Kontext benchmark with extended prompts
+ </td>
+ </tr>
+ </table>
+
+
+ ## Citation
+ ```bibtex
+ @misc{kandinsky2025,
+ author = {Alexander Belykh and Alexander Varlamov and Alexey Letunovskiy and Anastasia Aliaskina and Anastasia Maltseva and Anastasiia Kargapoltseva and Andrey Shutkin and Anna Averchenkova and Anna Dmitrienko and Bulat Akhmatov and Denis Dimitrov and Denis Koposov and Denis Parkhomenko and Dmitrii and Ilya Vasiliev and Ivan Kirillov and Julia Agafonova and Kirill Chernyshev and Kormilitsyn Semen and Lev Novitskiy and Maria Kovaleva and Mikhail Mamaev and Mikhailov and Nikita Kiselev and Nikita Osterov and Nikolai Gerasimenko and Nikolai Vaulin and Olga Kim and Olga Vdovchenko and Polina Gavrilova and Polina Mikhailova and Tatiana Nikulina and Viacheslav Vasilev and Vladimir Arkhipkin and Vladimir Korviakov and Vladimir Polovnikov and Yury Kolabushin},
+ title = {Kandinsky 5.0: A family of diffusion models for Video \& Image generation},
+ howpublished = {\url{https://github.com/kandinskylab/Kandinsky-5}},
+ year = 2025
+ }
+ ```
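
The usage example in this README passes `guidance_scale=3.5` together with a `negative_prompt`. The README does not state how the pipeline combines the two predictions; assuming it follows the common classifier-free guidance rule, the per-step combination could be sketched as below. `cfg_combine` is a hypothetical helper that works on plain Python lists and is not part of diffusers.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy predictions: `cond` stands for the prompt-conditioned output,
# `uncond` for the negative-prompt (unconditional) output.
cond = [1.0, 2.0]
uncond = [0.0, 2.0]

# A scale of 1.0 reproduces the conditional prediction; larger values push
# the result further away from the unconditional one.
print(cfg_combine(cond, uncond, 1.0))
print(cfg_combine(cond, uncond, 3.5))
```

Under this assumption, raising `guidance_scale` makes the edit follow the prompt more strongly, usually at some cost to naturalness.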
assets/KANDINSKY_LOGO_1_BLACK.png ADDED
assets/generation_examples/images/4.jpg ADDED

Git LFS Details

  • SHA256: 64a0bce53cf94b53310afeb83100dd5d2567aa115bf4ae177e100cb58f88b779
  • Pointer size: 131 Bytes
  • Size of remote file: 165 kB
assets/generation_examples/images/4_edit.png ADDED

Git LFS Details

  • SHA256: 83ed836abaeb2582f296d74bc3a61fc9127a7b02d0e5a1a33adcdf3cd7d4f5eb
  • Pointer size: 132 Bytes
  • Size of remote file: 1.34 MB
assets/generation_examples/images/5.jpg ADDED

Git LFS Details

  • SHA256: 9c7fec66fd00e0d4debd16007cad9b5d9486bc550895096be4d6f27432aa5d46
  • Pointer size: 131 Bytes
  • Size of remote file: 219 kB
assets/generation_examples/images/5_edit.png ADDED

Git LFS Details

  • SHA256: 2aed55dd2050f99ae45644185ead07f30667ddf06751d16ac55db910867b94e6
  • Pointer size: 132 Bytes
  • Size of remote file: 2.26 MB
assets/generation_examples/images/6.jpg ADDED

Git LFS Details

  • SHA256: 583ec4fd596327dec28cf81539dd7e156e36e509051dd4d2d32acd85c0a6e566
  • Pointer size: 131 Bytes
  • Size of remote file: 116 kB
assets/generation_examples/images/6_edit.png ADDED

Git LFS Details

  • SHA256: ac8efce01e067f6038d2328209be3e5209599765c9b635d925e01ec20f7648ac
  • Pointer size: 132 Bytes
  • Size of remote file: 1.13 MB
assets/generation_examples/images/7.jpg ADDED

Git LFS Details

  • SHA256: c2a19bcbdea09028b91c33704f87a040011d92741acbf18aaa665d1d359b516b
  • Pointer size: 131 Bytes
  • Size of remote file: 111 kB
assets/generation_examples/images/7_edit.png ADDED

Git LFS Details

  • SHA256: 05795dedbc3268ead9e665106d24c3f6c4d4469858247dd5de798a9cbedb044a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.42 MB
assets/sbs_edit.png ADDED

Git LFS Details

  • SHA256: 80098a22c2d8a33786efd780541eb6bb0624c54d58e10420445d5049d786e3d7
  • Pointer size: 131 Bytes
  • Size of remote file: 221 kB
assets/sbs_image.png ADDED

Git LFS Details

  • SHA256: ca5d57a61d6181768a4598fef7972406c5788025c34e1cb6a334d6a8e6a3c3d1
  • Pointer size: 131 Bytes
  • Size of remote file: 225 kB
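
The "Pointer size" values above are consistent with the Git LFS pointer format: a three-line text file holding the spec version, the SHA256 object id, and the byte size of the real file. The following is a minimal sketch assuming the standard v1 pointer layout; `lfs_pointer` and `sha256_of_file` are hypothetical helpers, and the byte sizes are illustrative stand-ins because the listing only gives rounded sizes.

```python
import hashlib


def lfs_pointer(sha256_hex, size_bytes):
    """Build the text of a Git LFS v1 pointer file."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA256 so a download can be compared with the listed hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hash listed for assets/sbs_image.png; the sizes below are illustrative only.
oid = "ca5d57a61d6181768a4598fef7972406c5788025c34e1cb6a334d6a8e6a3c3d1"
print(len(lfs_pointer(oid, 225_000)))    # pointer length for a six-digit size
print(len(lfs_pointer(oid, 1_340_000)))  # pointer length for a seven-digit size
```

With this layout a six-digit size yields a 131-byte pointer and a seven-digit size yields 132 bytes, which lines up with the kB-sized and MB-sized assets in the listing. After downloading an asset, `sha256_of_file(path)` can be compared against the SHA256 shown for it.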