---
library_name: diffusers
license: apache-2.0
---

<p align="center">
  <img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.svg?raw=true" style="width: 40%; max-width: 1100px;">
</p>


## 🚀 Update News
- **2026-03-05**: Official release of KORMo-Diffusion.
- **2026-03-02**: Official release of KORMo-VL.
- **2025-10-13**: Official release of KORMo-10B-sft.

---
## 💡 About KORMo-VL-Diffusion

**KORMo-VL** is a vision-language model developed **from scratch by the KAIST MLP Lab (https://sites.google.com/view/aailab)**, built on top of **KORMo-10B**.
The system consists of two components:

* **Vision-Language Model (VLM)**
* **Image Generation Model**

KORMo-VL-Diffusion, the image-generation component, was trained from scratch on data in which images of Korean everyday settings and culture make up as large a share as possible.
<span style="color:red">Because we were unable to secure additional GPU resources during development, this release is an intermediate checkpoint rather than a fully trained model.</span>


* **LLM:** KORMo-VL
* **Model Structure:** Redeveloped with reference to the Qwen-Image architecture (the roughly 20B diffusion component was modified and trained from scratch)
* **Languages:** Korean / English
* **Training Data:** Synthetic data + public datasets (e.g., AI Hub, details to be released)

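If we gain access to enough compute to train the model fully, **we aim to develop it into a complete model.**
Anyone who wants to run further tuning or research on top of this intermediate checkpoint is **welcome to use it freely.**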



## 📈 T2I Performance
### English Prompt
| Prompt | Generated Image |
| :--- | :--- |
| **Prompt:** Dense forest | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> |
| **Prompt:** Black pattern mug | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/black%20pattern%20mug%20cpup.webp" width="400"> |

### Korean Prompt
| Prompt | Generated Image |
| :--- | :--- |
| **Prompt:** 울창한 숲 ("dense forest") | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> |
| **Prompt:** 검은 무늬의 머그컵 ("mug with a black pattern") | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/%EA%B2%80%EC%9D%80%20%EB%AC%B4%EB%8A%AC%EC%9D%98%20%EB%A8%B8%EA%B7%B8%EC%BB%B5.webp" width="400"> |



## KORMo-VL-Diffusion Demo

`prompt: 아름다운 정원의 꽃들` ("flowers in a beautiful garden")

<video width="640" height="360" controls>
  <source src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/kormo_diffusion_assets/kormo_t2i.mp4" type="video/mp4">
</video>


## 📦 Installation

```bash
uv pip install transformers==4.57.1 pillow torchvision diffusers
```

---
## 🚀 Inference Example
Inference code will be provided through the GitHub repository.

---


## Contact
- KyungTae Lim, Professor at KAIST. `ktlim@kaist.ac.kr`

## Contributors (https://sites.google.com/view/aailab)
- Junghun Yuk
- Inho Won
- Hangyeol Yoo
- Junmyeong Lee
- KyungTae Lim

## Citation

```text
@misc{KORMo,
  author = {Minjun Kim and Hyeonseok Lim and Hangyeol Yoo and Inho Won and Seungwoo Song and Minkyung Cho and Junghun Yuk and Changsu Choi and Dongjae Shin and Huije Lee and Hoyun Song and Alice Oh and KyungTae Lim},
  title = {KORMo: Korean Open Reasoning Model for Everyone},
  year = {2025},
  publisher = {GitHub},
  journal = {Technical Report},
  paperLink = {\url{https://arxiv.org/abs/2510.09426}},
}
```