| | --- |
| | library_name: diffusers |
| | license: apache-2.0 |
| | --- |
| | <!-- <p align="center"> |
| | <img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.png?raw=true" style="width: 100%; max-width: 1100px;"> |
| | </p> --> |
| |
|
| | <p align="center"> |
| | <img src="https://github.com/MLP-Lab/KORMo-tutorial/blob/main/tutorial/attachment/kormo_logo.svg?raw=true" style="width: 40%; max-width: 1100px;"> |
| | </p> |
| |
|
| |
|
| | ## Update News |
| | - **2026-03-05**: Official release of KORMo-Diffusion. |
| | - **2026-03-02**: Official release of KORMo-VL. |
| | - **2025-10-13**: Official release of KORMo-10B-sft. |
| | --- |
| | ## About KORMo-VL-Diffusion |
| |
|
| | **KORMo-VL** is a vision-language model developed **from scratch by the KAIST MLP Lab (https://sites.google.com/view/aailab)**, built on top of **KORMo-10B**. |
| | The system consists of two components: |
| |
|
| | * **Vision-Language Model (VLM)** |
| | * **Image Generation Model** |
| |
|
| | The KORMo-VL-Diffusion model, designed for image generation, was trained from scratch with a high proportion of images reflecting Korean daily environments and culture. |
| | <span style="color:red">Unfortunately, due to limited GPU resources during the research process, we are sharing the intermediate results of the model at this stage.</span> |
| |
|
| | --- |
| |
|
| | KORMo-VL is a **vision-language model developed from scratch** by the KAIST MLP Lab. Built on KORMo-10B, it consists of (1) a vision-language model and (2) an image generation model. |
| |
|
| | Of these, the **KORMo-VL-Diffusion model for image generation** was **trained from scratch**, using as high a proportion of images of Korean environments as possible so that it reflects Korean daily life and culture. |
| | <span style="color:red">However, because we could not secure additional GPU resources during the research, **we are sharing an intermediate result for now.**</span> |
| |
|
| | * **LLM:** KORMo-VL |
| | * **Model Structure:** Re-implemented with reference to the Qwen-Image architecture (the roughly 20B diffusion component was modified and trained from scratch) |
| | * **Languages:** Korean / English |
| | * **Training Data:** Synthetic data + public datasets (e.g., AI Hub, details to be released) |
| |
|
| | If an environment in which this model can be trained sufficiently becomes available, **we aim to develop it into a complete model.** |
| | Anyone who wants to do further tuning or research on top of this intermediate result is **welcome to use it freely.** |
| |
|
| |
|
| |
|
| | ## T2I Performance |
| | ### English Prompt |
| | | Prompt | Generated Image | |
| | | :--- | :--- | |
| | | **Prompt:** Dense forest | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> | |
| | | **Prompt:** Black pattern mug | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/black%20pattern%20mug%20cpup.webp" width="400"> | |
| |
|
| | ### Korean Prompt |
| | | Prompt | Generated Image | |
| | | :--- | :--- | |
| | | **Prompt:** 울창한 숲 (Dense forest) | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/Dense%20forest.webp" width="400"> | |
| | | **Prompt:** 검은 무늬의 머그컵 (Black pattern mug) | <img src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/example_images/%EA%B2%80%EC%9D%80%20%EB%AC%B4%EB%8A%AC%EC%9D%98%20%EB%A8%B8%EA%B7%B8%EC%BB%B5.webp" width="400"> | |
| |
|
| |
|
| |
|
| | ## KORMo-VL-Diffusion Demo |
| |
|
| | `prompt: 아름다운 정원의 꽃들` (Flowers in a beautiful garden) |
| |
|
| | <video width="640" height="360" controls> |
| | <source src="https://huggingface.co/KORMo-VL/KORMo-VL-Diffusion/resolve/main/kormo_diffusion_assets/kormo_t2i.mp4" type="video/mp4"> |
| | </video> |
| |
|
| |
|
| | ## Installation |
| |
|
| | ```bash |
| | uv pip install transformers==4.57.1 pillow torchvision diffusers |
| | ``` |
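The install command pins only `transformers`, so it can help to confirm the environment before loading the model. The following is a minimal sketch that uses only the Python standard library. The helper names `installed_version` and `check_environment` are illustrative assumptions and are not part of any KORMo tooling.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned by the install command in this README.
PINNED = {"transformers": "4.57.1"}
# Packages the install command lists without a version constraint.
UNPINNED = ("pillow", "torchvision", "diffusers")


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, wanted in PINNED.items():
        found = installed_version(name)
        if found != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    for name in UNPINNED:
        if installed_version(name) is None:
            problems.append(f"{name}: not installed")
    return problems
```

Calling `check_environment()` returns an empty list when every package is present and `transformers` matches the pinned version.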
| |
|
| | --- |
| | ## Inference Example |
| | ``` |
| | To be provided via the GitHub repo (planned). |
| | ``` |
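Until the official example is available, the sketch below shows what a text-to-image call might look like. It rests on assumptions this card does not confirm: that the checkpoint loads through the generic `diffusers` entry point `DiffusionPipeline.from_pretrained`, that the pipeline accepts `prompt` and `num_inference_steps`, and that it returns an object with an `images` list, as standard `diffusers` pipelines do. The repository ID is taken from the asset URLs in this card. The function names `load_pipeline` and `generate` are illustrative only.

```python
# Repository ID as it appears in the example-image URLs of this README.
REPO_ID = "KORMo-VL/KORMo-VL-Diffusion"


def load_pipeline(repo_id=REPO_ID):
    """Load the checkpoint (assumption: the generic diffusers loader works here).

    Heavy imports stay inside the function so importing this module is cheap.
    """
    import torch
    from diffusers import DiffusionPipeline

    use_cuda = torch.cuda.is_available()
    pipe = DiffusionPipeline.from_pretrained(
        repo_id,
        # bfloat16 on GPU to reduce memory; float32 as a CPU fallback.
        torch_dtype=torch.bfloat16 if use_cuda else torch.float32,
    )
    return pipe.to("cuda" if use_cuda else "cpu")


def generate(pipe, prompt, steps=30):
    """Run one text-to-image call and return the first generated image.

    Assumes the pipeline follows the usual diffusers calling convention.
    """
    result = pipe(prompt=prompt, num_inference_steps=steps)
    return result.images[0]
```

Under these assumptions, `generate(load_pipeline(), "Dense forest").save("dense_forest.png")` would write one image to disk. The step count of 30 is an arbitrary placeholder, not a recommended setting.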
| |
|
| | --- |
| |
|
| |
|
| | ## Contact |
| | - KyungTae Lim, Professor at KAIST. `ktlim@kaist.ac.kr` |
| |
|
| | ## Contributors (https://sites.google.com/view/aailab) |
| | - Junghun Yuk |
| | - Inho Won |
| | - Hangyeol Yoo |
| | - Junmyeong Lee |
| | - KyungTae Lim |
| |
|
| | ## Citation |
| |
|
| | ```text |
| | @misc{KORMo, |
| | author = {Minjun Kim and Hyeonseok Lim and Hangyeol Yoo and Inho Won and Seungwoo Song and Minkyung Cho and Junghun Yuk and Changsu Choi and Dongjae Shin and Huije Lee and Hoyun Song and Alice Oh and KyungTae Lim}, |
| | title = {KORMo: Korean Open Reasoning Model for Everyone}, |
| | year = {2025}, |
| | publisher = {GitHub}, |
| | journal = {Technical Report}, |
| | paperLink = {\url{https://arxiv.org/abs/2510.09426}}, |
| | } |
| | ``` |