# HyperCLOVA X SEED 8B Omni — Visual Generation (VG)

HyperCLOVAX-SEED-Omni-8B-VG is the Visual Generation variant of HyperCLOVA X SEED 8B Omni.

- Audio components removed from the Omni model; focused on image generation / editing
- Ready to use via a `diffusers` pipeline: single GPU (~40 GB bf16), no OmniServe required
## Results

### Text-to-Image

Prompt: 우드 톤 테이블 위에 놓인 동양의 도자기 컵, 부드러운 그림자, 균형 잡힌 구도, 선명한 디테일, 고해상도 이미지
(An East Asian ceramic cup on a wood-tone table, soft shadows, balanced composition, crisp detail, high-resolution image)
### Image Editing

| Input | 검은색 승용차 한 대를 추가해줘 (Add one black car) | 팝아트 스타일로 변환해줘 (Convert to pop-art style) |
|---|---|---|
| ![]() | ![]() | ![]() |
## Inference

### Requirements

```bash
pip install transformers==4.52.4 diffusers accelerate torch einops

# Optional
pip install bitsandbytes  # 4-bit / 8-bit quantisation
pip install flash-attn    # Flash Attention 2 (auto-detected if installed)
```
### Hardware Guide

Runs on a single A100 or V100 GPU.

| GPU | Recommended settings |
|---|---|
| A100 80GB | bf16 (default); installing flash-attn recommended |
| V100 32GB | `load_in_4bit=True`, `num_inference_steps=30` |

- A100 or newer: install Flash Attention 2 with `pip install flash-attn`; it is auto-detected and used by both the LLM and the Vision Decoder.
- V100: use `load_in_8bit` or `load_in_4bit` quantisation together with `num_inference_steps=30`.

Note: this pipeline is intended for quick testing and prototyping. If you need fast inference, use vLLM or see OmniServe in the official repository.
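The per-GPU recommendations above can be folded into a small helper that picks loading and call kwargs from the available VRAM. The function name and thresholds below are illustrative assumptions based on this guide, not part of the pipeline API:

```python
# Illustrative sketch: choose the recommended settings from available VRAM.
# Thresholds and the helper itself are assumptions based on the guide above,
# not part of the official pipeline API.

def recommended_settings(vram_gb: float) -> dict:
    """Return suggested from_pretrained kwargs and __call__ kwargs."""
    if vram_gb >= 40:
        # A100-class: full bf16, default 50 denoising steps
        return {"load_kwargs": {}, "call_kwargs": {"num_inference_steps": 50}}
    # V100 32GB class: 4-bit quantisation plus fewer denoising steps
    return {
        "load_kwargs": {"load_in_4bit": True},
        "call_kwargs": {"num_inference_steps": 30},
    }

print(recommended_settings(32))
# → {'load_kwargs': {'load_in_4bit': True}, 'call_kwargs': {'num_inference_steps': 30}}
```

The resulting dicts would then be splatted into `DiffusionPipeline.from_pretrained(..., **load_kwargs)` and `pipe(..., **call_kwargs)`.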
## Quick Start — Python API

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "moving-j/HyperCLOVAX-SEED-Omni-8B-VG",
    custom_pipeline="pipeline_hcx_omni",
    trust_remote_code=True,
    # load_in_4bit=True,            # 4-bit quantisation (~28 GB VRAM)
    # attn_implementation="eager",  # disable Flash Attention 2
)

# ── Text-to-Image ──────────────────────────────────────────────────────
image = pipe(
    # "Snowy mountain ridge bathed in a golden sunset, majestic clouds,
    #  realistic photographic style"
    "황금빛 노을이 물드는 설산 능선, 웅장한 구름, 사실적인 사진 스타일",
    height=768,              # output height (divisible by 16)
    width=768,               # output width (divisible by 16)
    # aspect_ratio="16:9",   # shorthand — overrides height/width
    num_inference_steps=50,  # diffusion denoising steps
    guidance_scale=1.75,     # autoguidance (T2I default: 1.75)
    generator=42,            # random seed
    temperature=0.9,         # LLM sampling temperature
    top_p=0.9,               # LLM nucleus sampling
    top_k=200,               # LLM top-k filtering
).images[0]
image.save("t2i.png")

# ── Image Editing ──────────────────────────────────────────────────────
from PIL import Image

input_img = Image.open("photo.jpg")
edited = pipe(
    "수채화 스타일로 변환해줘",  # editing instruction: "convert to watercolour style"
    image=input_img,             # passing an input image switches to editing mode
    guidance_scale=0.0,
).images[0]
edited.save("edit.png")
```
### Local Directory (git clone)

```python
# After: git clone https://huggingface.co/moving-j/HyperCLOVAX-SEED-Omni-8B-VG
from pipeline_hcx_omni import HCXOmniPipeline

pipe = HCXOmniPipeline.from_pretrained("./HyperCLOVAX-SEED-Omni-8B-VG")
# Same pipe(...) API as above
```
Aspect Ratio Shortcuts
aspect_ratio |
Width x Height |
|---|---|
"1:1" (default) |
768 x 768 |
"16:9" |
1024 x 576 |
"9:16" |
576 x 1024 |
"4:3" |
1024 x 768 |
"3:4" |
768 x 1024 |
"3:2" |
768 x 512 |
"2:3" |
512 x 768 |
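The shortcut table can be mirrored as a plain lookup, which also makes the documented precedence (`aspect_ratio` overrides `height`/`width`) and the divisible-by-16 constraint easy to check. This is a hedged sketch of the documented behaviour, not the pipeline's actual internals:

```python
# Sketch of the documented aspect-ratio shortcuts; values mirror the table
# above. The resolver is illustrative, not the pipeline's real implementation.
ASPECT_RATIOS = {
    "1:1":  (768, 768),
    "16:9": (1024, 576),
    "9:16": (576, 1024),
    "4:3":  (1024, 768),
    "3:4":  (768, 1024),
    "3:2":  (768, 512),
    "2:3":  (512, 768),
}

def resolve_size(aspect_ratio=None, height=768, width=768):
    """aspect_ratio, when given, overrides height/width (documented precedence)."""
    if aspect_ratio is not None:
        width, height = ASPECT_RATIOS[aspect_ratio]
    if height % 16 or width % 16:
        raise ValueError("height and width must be divisible by 16")
    return width, height

print(resolve_size(aspect_ratio="16:9"))    # → (1024, 576)
print(resolve_size(height=512, width=512))  # → (512, 512)
```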
CLI Examples
git clone https://huggingface.co/moving-j/HyperCLOVAX-SEED-Omni-8B-VG
cd HyperCLOVAX-SEED-Omni-8B-VG
# Text-to-Image
python examples/text_to_image.py \
--prompt "황금빛 노을이 물드는 설산 능선" \
--aspect-ratio 16:9
# Image Editing
python examples/image_editing.py \
--image examples/assets/input_image.jpg \
--instruction "수채화 스타일로 변환해줘"
## Full `__call__` Parameters

| Parameter | Default | Description |
|---|---|---|
| `prompt` | — | Text description or editing instruction |
| `image` | `None` | Input image for editing (`None` = text-to-image) |
| `height` / `width` | 768 / 768 | Output size (divisible by 16) |
| `aspect_ratio` | `None` | Shorthand — overrides height/width |
| `num_inference_steps` | 50 | Diffusion denoising steps |
| `guidance_scale` | `None` | Auto: T2I = 1.75, editing = 0.0 |
| `generator` | `42` | Random seed (int or `torch.Generator`) |
| `max_new_tokens` | 7000 | Max LLM tokens |
| `temperature` | 0.9 | LLM sampling temperature |
| `top_p` / `top_k` | 0.9 / 200 | LLM sampling params |
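Since `generator` accepts either an int seed or a ready-made `torch.Generator`, a pipeline typically normalises the argument along these lines. This is an assumed sketch; the custom pipeline's actual handling may differ:

```python
import torch

def as_generator(seed_or_gen, device="cpu") -> torch.Generator:
    """Pass a torch.Generator through unchanged; otherwise treat the value
    as an int seed. Illustrative sketch of the documented `generator` arg."""
    if isinstance(seed_or_gen, torch.Generator):
        return seed_or_gen
    return torch.Generator(device=device).manual_seed(int(seed_or_gen))

g = as_generator(42)
print(g.initial_seed())  # → 42
```

Passing an explicit `torch.Generator` is useful when you want to share one RNG state across several `pipe(...)` calls instead of reseeding each time.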
Citation
@misc{hyperclovax2025seed,
title = {HyperCLOVA X SEED 8B Omni},
author = {NAVER Cloud HyperCLOVA X Team},
year = {2025},
note = {Technical Report. Visual Generation variant: HyperCLOVAX-SEED-Omni-8B-VG}
}
## License

The model weights are licensed under the HyperCLOVA X SEED 8B Omni Model License Agreement.

The vision decoder pipeline code (`decoder/vision/pipeline.py`) is licensed under Apache-2.0, adapted from OmniServe (Copyright 2025 NAVER Cloud Corp.).