Commit 1a071af (verified) by nielsr (HF Staff) · 1 parent: 240c327

Improve model card: update pipeline tag, add library, and detailed content

This PR significantly enhances the model card for `Kwai-Kolors/CoTyle`.

Key improvements include:
- **Updated Metadata**:
  - Corrected the `pipeline_tag` from `image-text-to-text` to `text-to-image`, accurately reflecting the model's functionality (generating images from text prompts and a style code).
  - Added `library_name: diffusers`, as the model is built upon and compatible with the Diffusers library, enabling better integration and discoverability on the Hub.
- **Comprehensive Content**:
  - Incorporated a detailed introduction, news, ToDo list, and the abstract directly from the paper's GitHub README.
  - Included prominent links and badges to the paper, project page, Hugging Face demo, Hugging Face model, and the GitHub repository.
  - Provided detailed "Quick Start" instructions, including environment setup, download steps, and command-line usage examples for both single and batch image generation, directly from the GitHub README.
  - Added information about the Gradio app for local interactive inference.
  - Included the full academic citation and acknowledgements.

This update makes the model card much more informative and user-friendly for anyone exploring the `CoTyle` model.
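
As a rough check of the metadata change described above, the sketch below parses a model card's front matter and prints the two fields this PR touches. The `CARD` string reproduces the new front matter from the diff; `read_front_matter` is a hypothetical, deliberately minimal parser (top-level scalars and simple `- item` lists only), not the Hub's own YAML loader.

```python
CARD = """---
base_model:
- Kwai-Kolors/Kolors-CoTyle
license: mit
pipeline_tag: text-to-image
tags:
- art
library_name: diffusers
---

# Model card body
"""


def read_front_matter(text):
    """Parse a minimal YAML front matter block into a dict (hypothetical helper)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key with an empty value.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value means a list follows on the next lines.
            meta[current_key] = value.strip() if value.strip() else []
    return meta


meta = read_front_matter(CARD)
print(meta["pipeline_tag"], meta["library_name"])
```

For real model cards, a full YAML parser is the safer choice; this sketch only covers the flat structure shown in the diff.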

Files changed (1)
  1. README.md +171 -3
README.md CHANGED
@@ -1,8 +1,176 @@
1
  ---
2
- license: mit
3
  base_model:
4
  - Kwai-Kolors/Kolors-CoTyle
5
- pipeline_tag: image-text-to-text
 
6
  tags:
7
  - art
8
- ---
1
  ---
 
2
  base_model:
3
  - Kwai-Kolors/Kolors-CoTyle
4
+ license: mit
5
+ pipeline_tag: text-to-image
6
  tags:
7
  - art
8
+ library_name: diffusers
9
+ ---
10
+
11
+ # 🎨 A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
12
+ <p align="center">
13
+ <a href="https://arxiv.org/abs/2511.10555"><img alt="Build" src="https://img.shields.io/badge/arXiv-Paper-da282a.svg"></a>
14
+ <a href="https://Kwai-Kolors.github.io/CoTyle/"><img alt="Build" src="https://img.shields.io/badge/Project%20Page-Homepage-yellow"></a>
15
+ <a href="https://github.com/Kwai-Kolors/CoTyle"><img alt="Build" src="https://img.shields.io/badge/GitHub-Code-f8f0f0.svg"></a>
16
+ <a href="https://huggingface.co/spaces/Kwai-Kolors/CoTyle"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-32CD32"></a>
17
+ <a href="https://huggingface.co/Kwai-Kolors/CoTyle"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-fd8b02"></a>
18
+ </p>
19
+
20
+ <p align="center">
21
+ <span style="color:#137cf3; font-family: Gill Sans">Huijie Liu</span><sup>1,2</sup>,
22
+ <span style="color:#137cf3; font-family: Gill Sans">Shuhao Cui</span><sup>2</sup>,
23
+ <span style="color:#137cf3; font-family: Gill Sans">Haoxiang Cao</span><sup>2,3</sup>,
24
+ <span style="color:#137cf3; font-family: Gill Sans">Shuai Ma</span><sup>1</sup>,
25
+ <span style="color:#137cf3; font-family: Gill Sans">Kai Wu</span><sup>2,†</sup>,
26
+ <span style="color:#137cf3; font-family: Gill Sans">Guoliang Kang</span><sup>1,†</sup>
27
+ <br>
28
+ <sup>1</sup><span style="font-size: 16px">Beihang University</span>,
29
+ <sup>2</sup><span style="font-size: 16px">Kolors Team, Kuaishou Technology</span>,
30
+ <sup>3</sup><span style="font-size: 16px">South China Normal University</span>
31
+ <br>
32
+ <sup>†</sup><span style="font-size: 16px">Co-Corresponding Author</span>
33
+ </p>
34
+
35
+ > This repository offers the official code of the paper *"A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space"*. We provide both an Open-Source Version (based on Qwen-Image) and a Commercial Version (based on Kolors). If you are a professional developer and interested in further developing the CoTyle Open-Source Version, please follow the tutorial below. The Commercial Version is coming soon.
36
+
37
+ <p align="center">
38
+ <img src="https://github.com/Kwai-Kolors/CoTyle/raw/main/assets/2-fig1_01.png" width=95% height=95%
39
+ class="center">
40
+ </p>
41
+
42
+
43
+ ## 🔥 News
44
+ - [11/18/2025] The [demo](https://huggingface.co/spaces/Kwai-Kolors/CoTyle) of CoTyle is released on Hugging Face.
45
+ - [11/18/2025] The [weights](https://huggingface.co/Kwai-Kolors/CoTyle) of CoTyle are released on Hugging Face.
46
+ - [11/18/2025] The [code](https://github.com/Kwai-Kolors/CoTyle) is released!
47
+ - [11/18/2025] The [homepage](https://Kwai-Kolors.github.io/CoTyle/) of CoTyle is released.
48
+ - [11/14/2025] The [paper](https://arxiv.org/abs/2511.10555) of CoTyle is released.
49
+
50
+ ## 📝 ToDo
51
+ - [x] Publish the paper on arXiv.
52
+ - [x] Release the homepage of CoTyle.
53
+ - [x] Launch a free demo on Hugging Face Spaces of CoTyle.
54
+ - [x] Open source the code and model weights of CoTyle.
55
+ - [ ] Release the commercial version of CoTyle.
56
+
57
+ ## 📖 Abstract
58
+ Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image synthesis, but often struggle with style consistency, limited novelty, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has only been primarily explored by the industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap,
59
+ we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings are used to condition a text-to-image diffusion model (T2I-DM) for style-consistent generation. Subsequently, we train an autoregressive transformer on the quantized style codes to model their distribution, allowing the synthesis of novel style codes. During inference, a numerical code maps to a unique style sequence,
60
+ which guides the diffusion process to produce images in the corresponding style. Unlike existing methods, our approach offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating a style is worth one code.
61
+
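
The abstract states that, at inference time, a numerical code maps to a unique style sequence. The toy sketch below illustrates only that reproducibility property. It rests on loud assumptions: the code is used as a random seed, uniform sampling stands in for the trained autoregressive transformer, and `CODEBOOK_SIZE`, `SEQUENCE_LENGTH`, and `style_sequence` are hypothetical names and values that do not come from the CoTyle code.

```python
import random

CODEBOOK_SIZE = 1024  # assumed, illustrative value
SEQUENCE_LENGTH = 8   # assumed, illustrative value


def style_sequence(style_code, length=SEQUENCE_LENGTH, vocab=CODEBOOK_SIZE):
    """Map an integer style code to a reproducible list of codebook indices.

    Uniform sampling is a stand-in for the trained autoregressive model.
    """
    rng = random.Random(style_code)  # the code is the only source of randomness
    return [rng.randrange(vocab) for _ in range(length)]


first = style_sequence(1234567)
again = style_sequence(1234567)
print(first == again)  # the same code yields the same sequence
```

In this toy setting, passing a different integer gives a different, equally reproducible sequence, which is the sense in which one code can stand for one style.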
62
+ ## ⚡️ Quick Start
63
+ ### 🔧 Requirements and Installation
64
+ Run the following commands to clone the repository and install the requirements.
65
+ ```bash
66
+ git clone https://github.com/Kwai-Kolors/CoTyle
67
+ cd CoTyle
68
+ conda create -n cotyle python=3.10
69
+ conda activate cotyle
70
+ pip install torch==2.6.0 torchvision==0.21.0
71
+ pip install -e git+https://github.com/Lakonik/piFlow.git@b1ef16e5e305251bccdfeac2a0e3d0ef339b974a#egg=lakonlab --no-build-isolation
72
+ pip install -r requirements.txt
73
+ # After this step, pip may report dependency conflicts (some installed versions do not satisfy lakonlab's requirements).
74
+ # This is normal and can be ignored.
75
+ ```
76
+
77
+ ### ⏬ Download
78
+ Please download the checkpoints and place them in the `./pretrained_models` directory.
79
+ You can download them from [Hugging Face](https://huggingface.co/Kwai-Kolors/CoTyle/tree/main).
80
+ ```bash
81
+ git lfs install
82
+ git clone https://huggingface.co/Kwai-Kolors/CoTyle
83
+ mv CoTyle pretrained_models
84
+ ```
85
+
86
+ ### 🚄 Code-to-Style Generation
87
+ For a quick walkthrough of the inference pipeline, we recommend generating a single image (see Single-Sample Generation).
88
+ To intuitively experience the powerful capabilities of CoTyle, we recommend generating a batch of images (see Batch-Sample Generation), which by default produces 42 images (7 style codes × 6 prompts).
89
+
90
+
91
+ #### Batch-Sample Generation
92
+ Run the following command to generate a batch of images. By default, 7 rows and 6 columns of images will be generated, where all images in each row are produced using the same style code, and all images in each column are generated using the same prompt.
93
+ You can adjust the `--style_code` and the content in `./test_prompts.txt` to obtain the desired outputs.
94
+
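
The row and column layout described above can be sketched as follows. The style codes are the ones used in the batch command of this README; the prompt strings are placeholders standing in for the 6 lines of `./test_prompts.txt`, and the variable names are illustrative rather than taken from `inference_batch.py`.

```python
style_codes = [1234567, 5201314, 13415926, 886, 20010627, 996007, 2333]
# Placeholder prompts: the real ones are read from ./test_prompts.txt.
prompts = [f"prompt {i}" for i in range(1, 7)]

# One row per style code, one column per prompt.
grid = [[(code, prompt) for prompt in prompts] for code in style_codes]

rows = len(grid)
cols = len(grid[0])
total = sum(len(row) for row in grid)
print(rows, cols, total)
```

Every cell in a row shares its style code and every cell in a column shares its prompt, so adding a style code adds a row and adding a prompt adds a column.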
95
+ This process may take considerable time.
96
+ Therefore, we provide an accelerated version based on [piFlow](https://github.com/Lakonik/piFlow), which requires only 4 denoising steps; however, this approach produces lower image quality.
97
+ Enable `--accelerate` to activate piFlow.
98
+ ```bash
99
+ python inference_batch.py --model_path ./pretrained_models \
100
+ --style_code 1234567 5201314 13415926 886 20010627 996007 2333 \
101
+ --prompt_file_path ./test_prompts.txt \
102
+ --output_path outputs \
103
+ --seed 1024 \
104
+ --accelerate
105
+ ```
106
+
107
+ If time permits, we strongly recommend running the command below without `--accelerate` for higher image quality.
108
+ ```bash
109
+ python inference_batch.py --model_path ./pretrained_models \
110
+ --style_code 1234567 5201314 13415926 886 20010627 996007 2333 \
111
+ --prompt_file_path ./test_prompts.txt \
112
+ --output_path outputs \
113
+ --seed 1024
114
+ ```
115
+
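
When scripting several runs, it can be easier to build the argument list in Python than to maintain shell line continuations. The sketch below only assembles and prints a command and does not run it. The flags are the ones shown in the commands above; `build_batch_command` and its defaults are hypothetical names chosen for this example.

```python
import shlex


def build_batch_command(style_codes, accelerate=False,
                        model_path="./pretrained_models",
                        prompt_file="./test_prompts.txt",
                        output_path="outputs", seed=1024):
    """Assemble the argv list for inference_batch.py (hypothetical helper)."""
    argv = ["python", "inference_batch.py", "--model_path", model_path]
    argv += ["--style_code"] + [str(code) for code in style_codes]
    argv += ["--prompt_file_path", prompt_file]
    argv += ["--output_path", output_path, "--seed", str(seed)]
    if accelerate:
        argv.append("--accelerate")  # piFlow-based fast path
    return argv


fast = build_batch_command([1234567, 886], accelerate=True)
full = build_batch_command([1234567, 886])
print(shlex.join(fast))
print(shlex.join(full))
```

From inside the cloned repository, either list could then be passed to `subprocess.run(argv, check=True)`.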
116
+ After successful execution, you will obtain the following results:
117
+ <p align="center">
118
+ <img src="https://github.com/Kwai-Kolors/CoTyle/raw/main/assets/batch_example.png" width=95% height=95%
119
+ class="center">
120
+ </p>
121
+
122
+
123
+ #### Single-Sample Generation
124
+ Run the following command for single-sample inference. Adjust `--style_code` and `--prompt` to change the style and the content of the generated image.
125
+
126
+ ```bash
127
+ python inference.py --model_path ./pretrained_models \
128
+ --style_code 1234567 \
129
+ --prompt "A lovely crystal snake spirit, slender and nimble, wears an exquisite crystal crown atop its head. Its scales are translucent, shimmering like crystal, its eyes are bright and round, and its expression is lively. Its body coils naturally, its tail gracefully curved, its overall posture harmonious and beautiful." \
130
+ --output_path outputs \
131
+ --seed 1024
132
+ ```
133
+ As with batch generation, you can add `--accelerate` to speed up inference.
134
+
135
+ ## 📲 Gradio Apps
136
+ We provide Gradio apps for interactive inference with CoTyle.
137
+
138
+ Official apps are available on [Hugging Face Spaces](https://huggingface.co/spaces/Kwai-Kolors/CoTyle).
139
+
140
+
141
+ If you want to run it locally, please execute:
142
+ ```bash
143
+ python app.py
144
+ ```
145
+
146
+ <strong>Note</strong>: The Gradio apps use an accelerated version, which may result in a slight reduction in image generation quality.
147
+
148
+ <strong>Tips</strong>:
149
+ - Adjust the <strong>Number of Prompts</strong> slider to add or remove input rows.
150
+ - Type your own prompts directly in the text boxes.
151
+ - You can click any template below to quickly load a preset style code and prompts.
152
+ <p align="center">
153
+ <img src="https://github.com/Kwai-Kolors/CoTyle/raw/main/assets/demo.png" width=95% height=95%
154
+ class="center">
155
+ </p>
156
+
157
+
158
+
159
+ ## 🌟 Citation
160
+ If you find CoTyle helpful, please ⭐ the repo.
161
+
162
+ If you find this project useful for your research, please consider citing our paper:
163
+ ```bibtex
164
+ @misc{liu2025styleworthcodeunlocking,
165
+ title={A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space},
166
+ author={Huijie Liu and Shuhao Cui and Haoxiang Cao and Shuai Ma and Kai Wu and Guoliang Kang},
167
+ year={2025},
168
+ eprint={2511.10555},
169
+ archivePrefix={arXiv},
170
+ primaryClass={cs.CV},
171
+ url={https://arxiv.org/abs/2511.10555},
172
+ }
173
+ ```
174
+
175
+ ## 💌 Acknowledgements
176
+ This code builds on [diffusers](https://huggingface.co/docs/diffusers/index), [Qwen-Image](https://github.com/QwenLM/Qwen-Image), [piFlow](https://github.com/Lakonik/piFlow) and [UniTok](https://github.com/FoundationVision/UniTok). Thanks for open-sourcing!