---
library_name: diffusers
license: apache-2.0
---
<!-- <p align="center">
<img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.png?raw=true" style="width: 100%; max-width: 1100px;">
</p> -->
<p align="center">
<img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.svg?raw=true" style="width: 40%; max-width: 1100px;">
</p>
## 🚀 Update News
- **2026-03-05**: Official release of KORMo-Diffusion.
- **2026-03-02**: Official release of KORMo-VL.
- **2025-10-13**: Official release of KORMo-10B-sft.
---
## 💡 About KORMo-VL-Diffusion
**KORMo-VL** is a vision-language model developed **from scratch by the KAIST MLP Lab (https://sites.google.com/view/aailab)**, built on top of **KORMo-10B**.
The system consists of two components:

* **Vision-Language Model (VLM)**
* **Image Generation Model**

**KORMo-VL-Diffusion**, the image generation model, was **trained from scratch** using as high a proportion as possible of images from Korean environments, so that it reflects Korean daily life and culture.

<span style="color:red">Unfortunately, because we could not secure additional GPU resources during the research, **we are sharing the intermediate results of the model at this stage.**</span>

* **LLM:** KORMo-VL
* **Model Structure:** Redeveloped with reference to the Qwen-Image architecture (the roughly 20B diffusion component was modified and trained from scratch)
* **Languages:** Korean / English
* **Training Data:** Synthetic data + public datasets (e.g., AI Hub, details to be released)

If an environment becomes available in which the model can be trained sufficiently, **we aim to develop it into a complete model.**
Anyone who wants to carry out further tuning or research on top of these intermediate results **is welcome to use them freely.**

## 📈 T2I Performance
### English Prompt
| Prompt | Generated Image |
| :--- | :--- |
| **Prompt:** Dense forest | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> |
| **Prompt:** Black pattern mug | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/black%20pattern%20mug%20cpup.webp" width="400"> |
### Korean Prompt
| Prompt | Generated Image |
| :--- | :--- |
| **Prompt:** 울창한 숲 (Dense forest) | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> |
| **Prompt:** 검은 무늬의 머그컵 (Black pattern mug) | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/%EA%B2%80%EC%9D%80%20%EB%AC%B4%EB%8A%AC%EC%9D%98%20%EB%A8%B8%EA%B7%B8%EC%BB%B5.webp" width="400"> |
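
The example image URLs in the tables above percent-encode the file name, which makes the Korean one hard to read at a glance. The following is a minimal sketch, using only the Python standard library, of how such a URL can be built from a file name or decoded back. The helper names `example_image_url` and `filename_from_url` are hypothetical and not part of the KORMo repository.

```python
from urllib.parse import quote, unquote

# Base path copied from the example-image URLs in the tables above.
BASE = "https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/"


def example_image_url(filename: str) -> str:
    """Percent-encode a file name (UTF-8) and append it to the base path."""
    return BASE + quote(filename)


def filename_from_url(url: str) -> str:
    """Recover the human-readable file name from a percent-encoded URL."""
    return unquote(url.rsplit("/", 1)[-1])
```

For example, `filename_from_url` applied to the second Korean-prompt URL yields `검은 무늬의 머그컵.webp`, which matches the prompt in that row.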
## KORMo-VL-Diffusion Demo
`prompt: 아름다운 정원의 꽃들` (flowers in a beautiful garden)
<video width="640" height="360" controls>
<source src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/kormo_diffusion_assets/kormo_t2i.mp4" type="video/mp4">
</video>
## 📦 Installation
```bash
uv pip install transformers==4.57.1 pillow torchvision diffusers
```
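
Because the install command pins `transformers` to 4.57.1 while leaving the other packages unpinned, it can help to confirm what actually ended up in the environment. The sketch below is one way to do that with the standard library; `REQUIRED` and `check_requirements` are hypothetical names introduced here for illustration, not part of the KORMo tooling.

```python
from importlib import metadata

# Mirrors the install command above: only transformers is pinned (None = any version).
REQUIRED = {
    "transformers": "4.57.1",
    "pillow": None,
    "torchvision": None,
    "diffusers": None,
}


def check_requirements(required=REQUIRED, lookup=metadata.version):
    """Return a list of problems; an empty list means every package looks fine."""
    problems = []
    for name, pinned in required.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if pinned is not None and installed != pinned:
            problems.append(f"{name}: found {installed}, expected {pinned}")
    return problems
```

Calling `check_requirements()` after installation and printing the result lists any missing package or a mismatched `transformers` version. The `lookup` parameter exists only so the function can be exercised without the packages installed.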
---
## 🚀 Inference Example
```
Inference code will be provided through the GitHub repo (planned).
```
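
The official inference code is not included in this README. Until it is, the following is a rough sketch only, built on two unverified assumptions: that this checkpoint loads through the generic `DiffusionPipeline.from_pretrained` entry point of `diffusers` (suggested by `library_name: diffusers` in the metadata, but not confirmed), and that the resulting pipeline follows the common text-to-image call convention. The repo id is taken from the example-image URLs above; `output_name` and `generate` are hypothetical helpers.

```python
from pathlib import Path

# Repo id inferred from the example-image URLs in this README.
MODEL_ID = "KORMo-VL/KORMo-VL-Diffusion"


def output_name(prompt: str, suffix: str = ".png") -> str:
    """Turn a prompt into a simple file name (whitespace becomes underscores)."""
    return "_".join(prompt.split()) + suffix


def generate(prompt: str, steps: int = 30, out_dir: str = ".") -> Path:
    """Generate one image for `prompt` and save it; returns the output path."""
    # Imported lazily so the helper above works without the GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    # ASSUMPTION: the checkpoint is loadable via the generic diffusers loader.
    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # ASSUMPTION: the pipeline accepts `prompt` / `num_inference_steps`
    # and returns an object with an `.images` list, as most T2I pipelines do.
    image = pipe(prompt=prompt, num_inference_steps=steps).images[0]

    path = Path(out_dir) / output_name(prompt)
    image.save(path)
    return path
```

A call such as `generate("Dense forest")` would write `Dense_forest.png` to the current directory if both assumptions hold. If loading fails, the planned GitHub repo is the place to look for the supported path.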
---
## Contact
- KyungTae Lim, Professor at KAIST. `ktlim@kaist.ac.kr`
## Contributors (https://sites.google.com/view/aailab)
- Junghun Yuk
- Inho Won
- Hangyeol Yoo
- Junmyeong Lee
- KyungTae Lim
## Citation
```text
@misc{KORMo,
  author = {Minjun Kim and Hyeonseok Lim and Hangyeol Yoo and Inho Won and Seungwoo Song and Minkyung Cho and Junghun Yuk and Changsu Choi and Dongjae Shin and Huije Lee and Hoyun Song and Alice Oh and KyungTae Lim},
  title = {KORMo: Korean Open Reasoning Model for Everyone},
  year = {2025},
  publisher = {GitHub},
  journal = {Technical Report},
  paperLink = {\url{https://arxiv.org/abs/2510.09426}},
}
```