Improve model card: update pipeline tag, add library, and detailed content
This PR significantly enhances the model card for `Kwai-Kolors/CoTyle`.
Key improvements include:
- **Updated Metadata**:
- Corrected the `pipeline_tag` from `image-text-to-text` to `text-to-image`, accurately reflecting the model's functionality (generating images from text prompts and a style code).
- Added `library_name: diffusers`, as the model is built upon and compatible with the Diffusers library, enabling better integration and discoverability on the Hub.
- **Comprehensive Content**:
- Incorporated a detailed introduction, news, ToDo list, and the abstract directly from the paper's GitHub README.
- Included prominent links and badges to the paper, project page, Hugging Face demo, Hugging Face model, and the GitHub repository.
- Provided detailed "Quick Start" instructions, including environment setup, download steps, and command-line usage examples for both single and batch image generation, directly from the GitHub README.
- Added information about the Gradio app for local interactive inference.
- Included the full academic citation and acknowledgements.
This update makes the model card much more informative and user-friendly for anyone exploring the `CoTyle` model.
|
@@ -1,8 +1,176 @@
|
|
| 1 |
---
|
| 2 |
-
license: mit
|
| 3 |
base_model:
|
| 4 |
- Kwai-Kolors/Kolors-CoTyle
|
| 5 |
-
|
|
|
|
| 6 |
tags:
|
| 7 |
- art
|
| 8 |
-
|
| 1 |
---
|
|
|
|
| 2 |
base_model:
|
| 3 |
- Kwai-Kolors/Kolors-CoTyle
|
| 4 |
+
license: mit
|
| 5 |
+
pipeline_tag: text-to-image
|
| 6 |
tags:
|
| 7 |
- art
|
| 8 |
+
library_name: diffusers
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# 🎨 A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
|
| 12 |
+
<p align="center">
|
| 13 |
+
<a href="https://arxiv.org/abs/2511.10555"><img alt="Build" src="https://img.shields.io/badge/arXiv-Paper-da282a.svg"></a>
|
| 14 |
+
<a href="https://Kwai-Kolors.github.io/CoTyle/"><img alt="Build" src="https://img.shields.io/badge/Project%20Page-Homepage-yellow"></a>
|
| 15 |
+
<a href="https://github.com/Kwai-Kolors/CoTyle"><img alt="Build" src="https://img.shields.io/badge/GitHub-Code-f8f0f0.svg"></a>
|
| 16 |
+
<a href="https://huggingface.co/spaces/Kwai-Kolors/CoTyle"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-32CD32"></a>
|
| 17 |
+
<a href="https://huggingface.co/Kwai-Kolors/CoTyle"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-fd8b02"></a>
|
| 18 |
+
</p>
|
| 19 |
+
|
| 20 |
+
<p align="center">
|
| 21 |
+
<span style="color:#137cf3; font-family: Gill Sans">Huijie Liu</span><sup>1,2</sup>,
|
| 22 |
+
<span style="color:#137cf3; font-family: Gill Sans">Shuhao Cui</span><sup>2</sup>,
|
| 23 |
+
<span style="color:#137cf3; font-family: Gill Sans">Haoxiang Cao</span><sup>2,3</sup>,
|
| 24 |
+
<span style="color:#137cf3; font-family: Gill Sans">Shuai Ma</span><sup>1</sup>,
|
| 25 |
+
<span style="color:#137cf3; font-family: Gill Sans">Kai Wu</span><sup>2,†</sup>,
|
| 26 |
+
<span style="color:#137cf3; font-family: Gill Sans">Guoliang Kang</span><sup>1,†</sup>
|
| 27 |
+
<br>
|
| 28 |
+
<sup>1</sup><span style="font-size: 16px">Beihang University</span>,
|
| 29 |
+
<sup>2</sup><span style="font-size: 16px">Kolors Team, Kuaishou Technology</span>,
|
| 30 |
+
<sup>3</sup><span style="font-size: 16px">South China Normal University</span>
|
| 31 |
+
<br>
|
| 32 |
+
<sup>†</sup><span style="font-size: 16px">Co-Corresponding Author</span>
|
| 33 |
+
</p>
|
| 34 |
+
|
| 35 |
+
> This repository offers the official code of the paper *"A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space"*. We provide both an Open-Source Version (based on Qwen-Image) and a Commercial Version (based on Kolors). If you are a professional developer and interested in further developing the CoTyle Open-Source Version, please follow the tutorial below. The Commercial Version is coming soon.
|
| 36 |
+
|
| 37 |
+
<p align="center">
|
| 38 |
+
<img src="https://github.com/Kwai-Kolors/CoTyle/raw/main/assets/2-fig1_01.png" width=95% height=95%
|
| 39 |
+
class="center">
|
| 40 |
+
</p>
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
## 🔥 News
|
| 44 |
+
- [11/18/2025] The [demo](https://huggingface.co/spaces/Kwai-Kolors/CoTyle) of CoTyle is released on Hugging Face.
|
| 45 |
+
- [11/18/2025] The [weights](https://huggingface.co/Kwai-Kolors/CoTyle) of CoTyle are released on Hugging Face.
|
| 46 |
+
- [11/18/2025] The [code](https://github.com/Kwai-Kolors/CoTyle) is released!
|
| 47 |
+
- [11/18/2025] The [homepage](https://Kwai-Kolors.github.io/CoTyle/) of CoTyle is released.
|
| 48 |
+
- [11/14/2025] The [paper](https://arxiv.org/abs/2511.10555) of CoTyle is released.
|
| 49 |
+
|
| 50 |
+
## 📝 ToDo
|
| 51 |
+
- [x] Publish the paper on arXiv.
|
| 52 |
+
- [x] Release the homepage of CoTyle.
|
| 53 |
+
- [x] Launch a free demo on Hugging Face Spaces of CoTyle.
|
| 54 |
+
- [x] Open source the code and model weights of CoTyle.
|
| 55 |
+
- [ ] Release the commercial version of CoTyle.
|
| 56 |
+
|
| 57 |
+
## 📖 Abstract
|
| 58 |
+
Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image synthesis, but often struggle with style consistency, limited novelty, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has only been primarily explored by the industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap,
|
| 59 |
+
we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings are used to condition a text-to-image diffusion model (T2I-DM) for style-consistent generation. Subsequently, we train an autoregressive transformer on the quantized style codes to model their distribution, allowing the synthesis of novel style codes. During inference, a numerical code maps to a unique style sequence,
|
| 60 |
+
which guides the diffusion process to produce images in the corresponding style. Unlike existing methods, our approach offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating a style is worth one code.
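To make the inference-time mapping concrete, the toy sketch below treats the numerical style code as the seed of a pseudo-random generator and draws a fixed-length sequence of codebook indices from it. It only illustrates determinism: the same code always yields the same sequence. The seeding scheme, `CODEBOOK_SIZE`, `SEQUENCE_LENGTH`, and the uniform sampler standing in for the trained autoregressive transformer are all assumptions made for illustration, not the released implementation.

```python
import random

# Hypothetical sizes chosen for illustration; the real codebook size and
# sequence length are not stated in this model card.
CODEBOOK_SIZE = 1024
SEQUENCE_LENGTH = 8


def style_sequence(style_code: int) -> list[int]:
    """Map a numerical style code to a reproducible list of codebook indices.

    A uniform sampler stands in for the trained autoregressive transformer.
    """
    rng = random.Random(style_code)  # private generator seeded by the code
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(SEQUENCE_LENGTH)]


first = style_sequence(1234567)
again = style_sequence(1234567)
print(first == again)  # same code, same sequence
```

In the actual pipeline, the resulting sequence would be looked up in the learned style codebook and passed to the diffusion model as the style condition.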
|
| 61 |
+
|
| 62 |
+
## ⚡️ Quick Start
|
| 63 |
+
### 🔧 Requirements and Installation
|
| 64 |
+
Run the following commands to install the requirements.
|
| 65 |
+
```bash
|
| 66 |
+
git clone https://github.com/Kwai-Kolors/CoTyle
|
| 67 |
+
cd CoTyle
|
| 68 |
+
conda create -n cotyle python=3.10
|
| 69 |
+
conda activate cotyle
|
| 70 |
+
pip install torch==2.6.0 torchvision==0.21.0
|
| 71 |
+
pip install -e git+https://github.com/Lakonik/piFlow.git@b1ef16e5e305251bccdfeac2a0e3d0ef339b974a#egg=lakonlab --no-build-isolation
|
| 72 |
+
pip install -r requirements.txt
|
| 73 |
+
# After installation, pip may report dependency conflicts (packages that do not meet lakonlab's requirements).
|
| 74 |
+
# This is normal and can be ignored.
|
| 75 |
+
```
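Because pip is expected to report dependency conflicts at this point, it can help to confirm that the two pinned packages actually ended up at the intended versions. The following is a minimal sketch using only the standard library; `find_mismatches` and `PINNED` are hypothetical names introduced here and are not part of the CoTyle repository.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINNED = {"torch": "2.6.0", "torchvision": "0.21.0"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_or_None} for every package that misses its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Builds may carry a local suffix such as "+cu124"; compare the public part only.
        public = installed.split("+")[0] if installed else None
        if public != wanted:
            mismatches[name] = installed
    return mismatches


print(find_mismatches(PINNED) or "all pins satisfied")
```

Run inside the `cotyle` environment, an empty result means both pins are satisfied; a package that is not installed is reported with `None`.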
|
| 76 |
+
|
| 77 |
+
### ⏬ Download
|
| 78 |
+
Please download the checkpoints and place them in the `./pretrained_models` directory.
|
| 79 |
+
You can download them from [Hugging Face](https://huggingface.co/Kwai-Kolors/CoTyle/tree/main).
|
| 80 |
+
```bash
|
| 81 |
+
git lfs install
|
| 82 |
+
git clone https://huggingface.co/Kwai-Kolors/CoTyle
|
| 83 |
+
mv CoTyle pretrained_models
|
| 84 |
+
```
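As an alternative to `git lfs`, the checkpoints can probably also be fetched with the `huggingface_hub` Python package. The sketch below assumes that package is installed (it is not listed in the commands above) and uses its `snapshot_download` function; `download_checkpoints` and `has_checkpoints` are hypothetical helper names introduced for illustration.

```python
from pathlib import Path


def download_checkpoints(repo_id="Kwai-Kolors/CoTyle", local_dir="./pretrained_models"):
    """Fetch every file of the model repository into local_dir."""
    # Imported lazily so the helper below works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def has_checkpoints(local_dir="./pretrained_models"):
    """Return True if local_dir exists and contains at least one file."""
    root = Path(local_dir)
    return root.is_dir() and any(p.is_file() for p in root.rglob("*"))
```

Calling `download_checkpoints()` from the repository root and then `has_checkpoints()` gives a quick sanity check before inference. The check only confirms that some file is present; it does not validate individual checkpoint files.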
|
| 85 |
+
|
| 86 |
+
### 🚄 Code-to-Style Generation
|
| 87 |
+
For a quick walkthrough of the inference pipeline, we recommend generating a single image (see Single-Sample Generation).
|
| 88 |
+
To see the range of styles CoTyle can produce, we recommend generating a batch of images (see Batch-Sample Generation), which by default produces 42 images (7 style codes × 6 prompts).
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
#### Batch-Sample Generation
|
| 92 |
+
Run the following command to generate a batch of images. By default, 7 rows and 6 columns of images will be generated, where all images in each row are produced using the same style code, and all images in each column are generated using the same prompt.
|
| 93 |
+
You can adjust the `--style_code` and the content in `./test_prompts.txt` to obtain the desired outputs.
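The grid layout described above can be sketched as a Cartesian product of style codes and prompts. The style codes are the ones used in the default batch command; the prompts are placeholders, since the contents of `./test_prompts.txt` are not reproduced here, and `grid_cells` is a hypothetical helper, not a function from the repository.

```python
from itertools import product

# Style codes taken from the default batch command in this README.
STYLE_CODES = [1234567, 5201314, 13415926, 886, 20010627, 996007, 2333]
# Placeholder prompts; the real ones live in ./test_prompts.txt.
PROMPTS = [f"prompt {i}" for i in range(6)]


def grid_cells(style_codes, prompts):
    """List (row, column, style_code, prompt) for every image in the grid."""
    indexed = product(enumerate(style_codes), enumerate(prompts))
    return [(r, c, code, prompt) for (r, code), (c, prompt) in indexed]


cells = grid_cells(STYLE_CODES, PROMPTS)
print(len(cells))  # 42 = 7 rows x 6 columns
```

Every cell in row `r` shares `STYLE_CODES[r]` and every cell in column `c` shares `PROMPTS[c]`, so adding a style code adds a row and adding a prompt adds a column.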
|
| 94 |
+
|
| 95 |
+
This process may take considerable time.
|
| 96 |
+
To speed it up, we provide an accelerated version based on [piFlow](https://github.com/Lakonik/piFlow), which needs only 4 denoising steps at the cost of lower image quality.
|
| 97 |
+
Enable `--accelerate` to activate piFlow.
|
| 98 |
+
```bash
|
| 99 |
+
python inference_batch.py --model_path ./pretrained_models \
|
| 100 |
+
--style_code 1234567 5201314 13415926 886 20010627 996007 2333 \
|
| 101 |
+
--prompt_file_path ./test_prompts.txt \
|
| 102 |
+
--output_path outputs \
|
| 103 |
+
--seed 1024 \
|
| 104 |
+
--accelerate
|
| 105 |
+
```
|
| 106 |
+
|
| 107 |
+
If time permits, we strongly recommend running the full-quality command below, which omits `--accelerate`.
|
| 108 |
+
```bash
|
| 109 |
+
python inference_batch.py --model_path ./pretrained_models \
|
| 110 |
+
--style_code 1234567 5201314 13415926 886 20010627 996007 2333 \
|
| 111 |
+
--prompt_file_path ./test_prompts.txt \
|
| 112 |
+
--output_path outputs \
|
| 113 |
+
--seed 1024
|
| 114 |
+
```
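When scripting several runs, the two commands above differ only in the `--accelerate` flag, so they can be produced by one small helper. The sketch below only assembles the argument list from the flags shown in this README; `batch_command` and its parameter names are hypothetical and not part of the repository.

```python
import shlex
import sys


def batch_command(style_codes, accelerate=False, seed=1024,
                  model_path="./pretrained_models",
                  prompt_file="./test_prompts.txt", output_path="outputs"):
    """Build the argv list for inference_batch.py using the flags shown above."""
    argv = [sys.executable, "inference_batch.py", "--model_path", model_path,
            "--style_code", *map(str, style_codes),
            "--prompt_file_path", prompt_file,
            "--output_path", output_path,
            "--seed", str(seed)]
    if accelerate:
        argv.append("--accelerate")  # piFlow path: 4 denoising steps, lower quality
    return argv


# Print a copy-pasteable command line instead of launching the job.
print(shlex.join(batch_command([886, 2333], accelerate=True)))
```

To launch a run, the returned list can be passed to `subprocess.run(argv, check=True)` from the repository root.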
|
| 115 |
+
|
| 116 |
+
After successful execution, you will obtain the following results:
|
| 117 |
+
<p align="center">
|
| 118 |
+
<img src="https://github.com/Kwai-Kolors/CoTyle/raw/main/assets/batch_example.png" width=95% height=95%
|
| 119 |
+
class="center">
|
| 120 |
+
</p>
|
| 121 |
+
|
| 122 |
+
|
| 123 |
+
#### Single-Sample Generation
|
| 124 |
+
Run the following command for single-sample inference. Adjust `--style_code` and `--prompt` to obtain the desired result.
|
| 125 |
+
|
| 126 |
+
```bash
|
| 127 |
+
python inference.py --model_path ./pretrained_models \
|
| 128 |
+
--style_code 1234567 \
|
| 129 |
+
--prompt "A lovely crystal snake spirit, slender and nimble, wears an exquisite crystal crown atop its head. Its scales are translucent, shimmering like crystal, its eyes are bright and round, and its expression is lively. Its body coils naturally, its tail gracefully curved, its overall posture harmonious and beautiful." \
|
| 130 |
+
--output_path outputs \
|
| 131 |
+
--seed 1024
|
| 132 |
+
```
|
| 133 |
+
Similarly, you can pass `--accelerate` to speed up generation.
|
| 134 |
+
|
| 135 |
+
## 📲 Gradio Apps
|
| 136 |
+
We provide Gradio apps for interactive inference with CoTyle.
|
| 137 |
+
|
| 138 |
+
Official apps are available on [Hugging Face Spaces](https://huggingface.co/spaces/Kwai-Kolors/CoTyle).
|
| 139 |
+
|
| 140 |
+
|
| 141 |
+
If you want to run it locally, please execute:
|
| 142 |
+
```bash
|
| 143 |
+
python app.py
|
| 144 |
+
```
|
| 145 |
+
|
| 146 |
+
<strong>Note</strong>: The Gradio apps use an accelerated version, which may result in a slight reduction in image generation quality.
|
| 147 |
+
|
| 148 |
+
<strong>Tips</strong>:
|
| 149 |
+
- Adjust the <strong>Number of Prompts</strong> slider to add or remove input rows.
|
| 150 |
+
- Type your own prompts directly in the text boxes.
|
| 151 |
+
- Click any template below to quickly load a preset style code and prompts.
|
| 152 |
+
<p align="center">
|
| 153 |
+
<img src="https://github.com/Kwai-Kolors/CoTyle/raw/main/assets/demo.png" width=95% height=95%
|
| 154 |
+
class="center">
|
| 155 |
+
</p>
|
| 156 |
+
|
| 157 |
+
|
| 158 |
+
|
| 159 |
+
## 🌟 Citation
|
| 160 |
+
If you find CoTyle helpful, please ⭐ the repo.
|
| 161 |
+
|
| 162 |
+
If you find this project useful for your research, please consider citing our paper:
|
| 163 |
+
```bibtex
|
| 164 |
+
@misc{liu2025styleworthcodeunlocking,
|
| 165 |
+
title={A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space},
|
| 166 |
+
author={Huijie Liu and Shuhao Cui and Haoxiang Cao and Shuai Ma and Kai Wu and Guoliang Kang},
|
| 167 |
+
year={2025},
|
| 168 |
+
eprint={2511.10555},
|
| 169 |
+
archivePrefix={arXiv},
|
| 170 |
+
primaryClass={cs.CV},
|
| 171 |
+
url={https://arxiv.org/abs/2511.10555},
|
| 172 |
+
}
|
| 173 |
+
```
|
| 174 |
+
|
| 175 |
+
## 💌 Acknowledgements
|
| 176 |
+
This code builds on [diffusers](https://huggingface.co/docs/diffusers/index), [Qwen-Image](https://github.com/QwenLM/Qwen-Image), [piFlow](https://github.com/Lakonik/piFlow) and [UniTok](https://github.com/FoundationVision/UniTok). Thanks for open-sourcing!
|