|
|
--- |
|
|
license: apache-2.0 |
|
|
private: false |
|
|
unlisted: true |
|
|
thumbnail: https://huggingface.co/mamadat/SHREK_ENM/resolve/main/SHREK_ENM.png |
|
|
tags: |
|
|
- diffusion |
|
|
- text-to-image |
|
|
--- |
|
|
 |
|
|
# SHREK_ENM Diffusion Model v0.1 |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **A diffusion model specialized in generating Shrek characters** |
|
|
- **Full weight retraining; the model architecture is Flux Krea** |
|
|
- **Developed by:** Jihun.Hong |
|
|
- **Datasets:** Seungwoo.Kim, Jiyeon Lee |
|
|
- **Model type:** Text-to-Image Diffusion Model |
|
|
- **Base Model architecture:** Flux.1_Krea_dev |
|
|
- **Training approach:** Full weight fine-tuning (Complete Retraining) |
|
|
- **Release date:** September 19, 2025 |
|
|
- **Version:** v0.1 |
|
|
|
|
|
### Model Sources |
|
|
- **Demo (coming soon):** End-to-end pipeline with ByteDance Waver 1.0; a GIF sample is shown below |
|
|
<div align="center"> |
|
|
<img src="./SHREK_ENM_Video.gif" alt="SHREK Animation"> |
|
|
</div> |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Results |
|
|
**[Comparison of three models]** From left to right, the images show how the output changes across three epochs (second-stage training of 4, 8, and 12 hours, respectively). As a test, training ran for only 30 epochs; production-level quality would require roughly 40 more hours of training. |
|
|
|
|
|
<div align="center"> |
|
|
<img src="./training_progress.png" alt="Training Progress and Epoch Comparison" width="100%"> |
|
|
<p><em>Model progress by epoch, sample outputs, and performance metrics</em></p> |
|
|
</div> |
|
|
|
|
|
### Training Data |
|
|
|
|
|
<div align="center"> |
|
|
<img src="./Dataset.png" alt="SHREK Dataset"> |
|
|
</div> |
|
|
|
|
|
- **Dataset:** Custom SHREK dataset |
|
|
- **Dataset size:** 2.4 GB including augmentation, 820 images at 1024×1024, segmented with SAM2 around Shrek's face and cropped with YOLO |
|
|
- **Data preprocessing:** Image augmentation, resizing to 1024×1024, and face-detection-based cropping (YOLO and SAM2) |
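
The face-based cropping step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a face bounding box has already been produced by a detector such as YOLO, and the function name `square_crop_box` and the `margin` parameter are hypothetical.

```python
def square_crop_box(face_box, image_size, margin=0.5):
    """Return a square (left, top, right, bottom) crop around a face box.

    face_box   -- (x0, y0, x1, y1) from a detector (assumed to be given)
    image_size -- (width, height) of the source image
    margin     -- hypothetical extra context around the face (0.5 = 50% larger)
    """
    x0, y0, x1, y1 = face_box
    width, height = image_size

    # Center of the detected face.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2

    # Square side: the larger face dimension plus margin, capped by the image.
    side = int(min(max(x1 - x0, y1 - y0) * (1 + margin), width, height))

    # Shift the square so it stays fully inside the image.
    left = int(min(max(cx - side / 2, 0), width - side))
    top = int(min(max(cy - side / 2, 0), height - side))
    return left, top, left + side, top + side


print(square_crop_box((400, 300, 600, 500), (1920, 1080)))  # (350, 250, 650, 550)
```

The resulting box can be passed to Pillow's `Image.crop`, followed by `resize((1024, 1024))`, to produce training images at the resolution listed above.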
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
<div align="center"> |
|
|
<img src="./Train.png" alt="Training Configuration"> |
|
|
</div> |
|
|
|
|
|
- **Hardware:** NVIDIA L40S GPU |
|
|
- **Training time:** PR: 30 h 02 min, SC: 12 h 11 min, Total: 42 h 13 min |
|
|
- **Batch size:** 7 |
|
|
- **Learning rate:** 2e-06, 4e-06, 6e-06 |
|
|
- **Training steps:** 256 × 40 / 7 = 1480 steps |
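
As a worked check of the step count: the plain quotient 256 × 40 / 7 is about 1462.9, while the reported 1480 is reproduced when the number of batches per epoch is rounded up before multiplying (the last partial batch counts as a step). The sketch below assumes that 256 is the number of samples per epoch, 40 the number of epochs, and 7 the batch size; the card does not state what 256 and 40 stand for, so the parameter names are illustrative.

```python
import math


def total_steps(samples_per_epoch: int, epochs: int, batch_size: int) -> int:
    """Total optimizer steps when the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)  # 256 / 7 -> 37
    return steps_per_epoch * epochs


print(total_steps(256, 40, 7))  # 1480
```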
|
|
|
|
|
## Usage |
|
|
|
|
|
### Compatibility with Various UI Applications |
|
|
This model works smoothly in AI UI applications such as **ComfyUI, SwarmUI, Forge, and Automatic1111**. |
|
|
|
|
|
**ComfyUI** |
|
|
<div align="center"> |
|
|
<img src="./ComfyUI_Workflow.png" alt="ComfyUI Workflow"> |
|
|
</div> |
|
|
|
|
|
**SwarmUI** |
|
|
|
|
|
<div align="center"> |
|
|
<img src="./SwarmUI.png" alt="SwarmUI"> |
|
|
</div> |
|
|
|
|
|
#### Installation Steps |
|
|
1. **Download the model files:** |
|
|
- `SHREK_ENM.safetensors` - Main model file |
|
|
- `ae.safetensors` - VAE model |
|
|
- `clip_l.safetensors` - CLIP text encoder |
|
|
- `t5xxl_enconly.safetensors` - T5 text encoder |
|
|
|
|
|
2. **Place the files in the correct directories** |
|
|
|
|
|
3. **Load in ComfyUI:** |
|
|
- Use the loader node appropriate for each component |
|
|
- Connect the nodes according to the workflow |
|
|
- Load `SHREK_ENM.safetensors` with the "Load Diffusion Model" node |
|
|
- Load the text encoders and the VAE with their corresponding loader nodes |
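
Steps 1 and 2 above can be sketched with the `huggingface_hub` client. This is a hedged example under several assumptions: the repository id `mamadat/SHREK_ENM` is inferred from the thumbnail URL, all four files are assumed to be hosted in that repository, and the target folders follow the common ComfyUI layout (`models/diffusion_models`, `models/vae`, `models/text_encoders`), which may differ in older or customized installations.

```python
from pathlib import Path

# Assumption: repository id inferred from the thumbnail URL in the card metadata.
REPO_ID = "mamadat/SHREK_ENM"

# Assumption: common ComfyUI model folders; adjust to your installation.
TARGET_DIRS = {
    "SHREK_ENM.safetensors": "models/diffusion_models",
    "ae.safetensors": "models/vae",
    "clip_l.safetensors": "models/text_encoders",
    "t5xxl_enconly.safetensors": "models/text_encoders",
}


def planned_paths(comfyui_root):
    """Map each file name to the path where it should end up."""
    root = Path(comfyui_root)
    return {name: root / sub / name for name, sub in TARGET_DIRS.items()}


def download_all(comfyui_root):
    """Download every file into its ComfyUI folder (requires network access)."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    for name, sub in TARGET_DIRS.items():
        hf_hub_download(
            repo_id=REPO_ID,
            filename=name,
            local_dir=Path(comfyui_root) / sub,
        )


if __name__ == "__main__":
    for name, path in planned_paths("ComfyUI").items():
        print(f"{name} -> {path}")
```

Calling `download_all("ComfyUI")` performs the actual download; `planned_paths` only reports where the files are expected, which is useful for checking the layout before fetching several gigabytes.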
|
|
|
|
|
#### Recommended Settings |
|
|
- **CFG Scale:** 1.0 (keeping this value is strongly recommended) |
|
|
- **Sampling Steps:** 35-45 |
|
|
- **Sampler:** iPNDM or Euler a |