---
library_name: diffusers
license: apache-2.0
---
<!-- <p align="center">
  <img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.png?raw=true" style="width: 100%; max-width: 1100px;">
</p> -->

<p align="center">
  <img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.svg?raw=true" style="width: 40%; max-width: 1100px;">
</p>


## 🚀 Update News
- **2026-03-05**: Official release of KORMo-Diffusion.
- **2026-03-02**: Official release of KORMo-VL.
- **2025-10-13**: Official release of KORMo-10B-sft.
---
## 💡 About KORMo-VL-Diffusion

**KORMo-VL** is a vision-language model developed **from scratch** by the [KAIST MLP Lab](https://sites.google.com/view/aailab) and built on top of **KORMo-10B**.
The system consists of two components:

* **Vision-Language Model (VLM)**
* **Image Generation Model**

**KORMo-VL-Diffusion**, the image generation model, was **trained from scratch** with as high a proportion as possible of images of Korean environments, so that it reflects Korean daily life and culture.
<span style="color:red">Because we could not secure additional GPU resources during the research, **we are sharing the model's intermediate results at this stage.**</span>

* **LLM:** KORMo-VL
* **Model Structure:** Redeveloped with reference to the Qwen-Image architecture (the roughly 20B diffusion part was modified and trained from scratch)
* **Languages:** Korean / English
* **Training Data:** Synthetic data + public datasets (e.g., AI Hub, details to be released)

If an environment that allows this model to be trained sufficiently becomes available, **we aim to develop it into a complete model.**
Anyone who wants to do further tuning or research on top of the intermediate results is **welcome to use it freely.**



## 📈 T2I Performance
### English Prompt
| Prompt | Generated Image |
| :--- | :--- |
| **Prompt:** Dense forest | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> |
| **Prompt:** Black pattern mug | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/black%20pattern%20mug%20cpup.webp" width="400"> |

### Korean Prompt
| Prompt | Generated Image |
| :--- | :--- |
| **Prompt:** 울창한 숲 (Dense forest) | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> |
| **Prompt:** 검은 무늬의 머그컵 (Black pattern mug) | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/%EA%B2%80%EC%9D%80%20%EB%AC%B4%EB%8A%AC%EC%9D%98%20%EB%A8%B8%EA%B7%B8%EC%BB%B5.webp" width="400"> |



## KORMo-VL-Diffusion Demo

`prompt: 아름다운 정원의 꽃들` (flowers in a beautiful garden)

<video width="640" height="360" controls>
  <source src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/kormo_diffusion_assets/kormo_t2i.mp4" type="video/mp4">
</video>


## 📦 Installation

```bash
uv pip install transformers==4.57.1 pillow torchvision diffusers
```
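
After installing, it can help to confirm that the environment matches the command above. The following is a minimal sketch using only the Python standard library; the function name `check_environment` and the `EXPECTED` table are illustrative assumptions, and only `transformers` is pinned because that is the only version the install command specifies.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the install command; None means "any version is accepted".
EXPECTED = {
    "transformers": "4.57.1",
    "pillow": None,
    "torchvision": None,
    "diffusers": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, ok)} for each expected package."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        ok = installed is not None and (pinned is None or installed == pinned)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "check"
        print(f"{name}: {installed or 'not installed'} [{status}]")
```

Any package reported as `check` is either missing or, for `transformers`, at a different version than the pinned one.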

---
## 🚀 Inference Example
```
To be provided via the GitHub repo (planned)
```
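
The official inference code is not part of this card, but the repository files can still be fetched locally for inspection or further tuning. The sketch below assumes the weights live in the same Hub repository that hosts this card's example images (`KORMo-VL/KORMo-VL-Diffusion`, taken from the asset URLs above) and that `huggingface_hub` is available (it is installed as a dependency of `diffusers`). The names `download_checkpoint` and `downloader` are hypothetical and only for illustration; this is not the authors' inference pipeline.

```python
# Assumption: the checkpoint is stored in the repo that serves this card's assets.
REPO_ID = "KORMo-VL/KORMo-VL-Diffusion"


def download_checkpoint(local_dir="kormo_vl_diffusion", downloader=None):
    """Download all files of REPO_ID into local_dir and return the local path.

    `downloader` exists only so the function can be exercised without network
    access; by default it is huggingface_hub.snapshot_download.
    """
    if downloader is None:
        # Imported lazily so that defining this function needs no extra packages.
        from huggingface_hub import snapshot_download as downloader
    return downloader(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    path = download_checkpoint()
    print(f"Files downloaded to: {path}")
```

How the downloaded files are loaded for generation depends on the forthcoming GitHub code, so this sketch stops at the download step.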

---


## Contact
- KyungTae Lim, Professor at KAIST. `ktlim@kaist.ac.kr`

## Contributors (https://sites.google.com/view/aailab)
- Junghun Yuk
- Inho Won
- Hangyeol Yoo
- Junmyeong Lee
- KyungTae Lim

## Citation

```text
@misc{KORMo,
  author = {Minjun Kim and Hyeonseok Lim and Hangyeol Yoo and Inho Won and Seungwoo Song and Minkyung Cho and Junghun Yuk and Changsu Choi and Dongjae Shin and Huije Lee and Hoyun Song and Alice Oh and KyungTae Lim},
  title = {KORMo: Korean Open Reasoning Model for Everyone},
  year = {2025},
  publisher = {GitHub},
  journal = {Technical Report},
  url = {https://arxiv.org/abs/2510.09426},
}
```