
HyperCLOVA X SEED 8B Omni — Visual Generation (VG)

HyperCLOVAX-SEED-Omni-8B-VG — Visual Generation variant of HyperCLOVA X SEED 8B Omni.

  • Audio components removed from the Omni model; focused on image generation / editing
  • Ready to use via diffusers pipeline
  • Single GPU (~40 GB bf16), no OmniServe required

Results

Text-to-Image

우드 톤 테이블 위에 놓인 동양의 도자기 컵, 부드러운 그림자, 균형 잡힌 구도, 선명한 디테일, 고해상도 이미지 (An oriental ceramic cup on a wood-tone table, soft shadows, balanced composition, sharp details, high-resolution image)

[Text-to-image result image]

Image Editing

Input image · "검은색 승용차 한 대를 추가해줘" (Add a black passenger car) · "팝아트 스타일로 변환해줘" (Convert to pop-art style)

Inference

Requirements

pip install transformers==4.52.4 diffusers accelerate torch einops
# Optional
pip install bitsandbytes   # 4-bit / 8-bit quantisation
pip install flash-attn     # Flash Attention 2 (auto-detected if installed)

Hardware Guide

Runs on a single A100 or V100 GPU.

GPU Recommended settings
A100 80GB bf16 (default); installing flash-attn is recommended
V100 32GB load_in_4bit=True, num_inference_steps=30
  • A100 or newer: install Flash Attention 2 with pip install flash-attn; it is auto-detected and used by both the LLM and the Vision Decoder.
  • V100: use load_in_8bit or load_in_4bit quantisation together with num_inference_steps=30.
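The guidance above can be sketched as a small helper that picks loading options from available VRAM. The helper name, thresholds, and return shape are illustrative, not part of the pipeline API:

```python
def pick_load_config(vram_gb: float) -> dict:
    """Illustrative mapping from GPU VRAM (GB) to pipeline options.

    ~80 GB (A100): full bf16 model, default 50 denoising steps.
    ~32 GB (V100): 4-bit quantisation, reduced to 30 denoising steps.
    """
    if vram_gb >= 40:  # enough headroom for the full bf16 model (~40 GB)
        return {"load_kwargs": {}, "num_inference_steps": 50}
    # smaller cards: quantise and reduce steps, per the V100 guidance above
    return {"load_kwargs": {"load_in_4bit": True}, "num_inference_steps": 30}
```

The 40 GB threshold follows from the ~40 GB bf16 footprint quoted above; adjust for your own headroom.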

Note: This pipeline is intended for quick testing and prototyping. If you need fast inference, use vLLM or refer to OmniServe in the official repository.

Quick Start — Python API

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "moving-j/HyperCLOVAX-SEED-Omni-8B-VG",
    custom_pipeline="pipeline_hcx_omni",
    trust_remote_code=True,
    # load_in_4bit=True,           # 4-bit quantisation (~28 GB VRAM)
    # attn_implementation="eager",  # disable Flash Attention 2
)

# ── Text-to-Image ──────────────────────────────────────────────────────
image = pipe(
    "황금빛 노을이 물드는 설산 능선, 웅장한 구름, 사실적인 사진 스타일",  # "Snow-capped ridge in a golden sunset, majestic clouds, photorealistic style"
    height=768,                   # output height (divisible by 16)
    width=768,                    # output width  (divisible by 16)
    # aspect_ratio="16:9",        # shorthand — overrides height/width
    num_inference_steps=50,       # diffusion denoising steps
    guidance_scale=1.75,          # autoguidance (T2I default: 1.75)
    generator=42,                 # random seed
    temperature=0.9,              # LLM sampling temperature
    top_p=0.9,                    # LLM nucleus sampling
    top_k=200,                    # LLM top-k filtering
).images[0]
image.save("t2i.png")

# ── Image Editing ──────────────────────────────────────────────────────
from PIL import Image

input_img = Image.open("photo.jpg")
edited = pipe(
    "수채화 스타일로 변환해줘",   # editing instruction: "Convert to watercolor style"
    image=input_img,              # input image → editing mode
    guidance_scale=0.0,
).images[0]
edited.save("edit.png")
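Output sizes must be divisible by 16; if you also want to pre-resize an input image onto that 16-pixel grid before editing, a minimal sketch (the helper is hypothetical — the pipeline may handle arbitrary input sizes on its own):

```python
from PIL import Image


def snap_to_grid(img: Image.Image, grid: int = 16) -> Image.Image:
    """Round both sides down to the nearest multiple of `grid` (at least one cell)."""
    w, h = img.size
    return img.resize((max(grid, w - w % grid), max(grid, h - h % grid)))
```

For example, a 100 x 77 photo becomes 96 x 64.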

Local Directory (git clone)

# After: git clone https://huggingface.co/moving-j/HyperCLOVAX-SEED-Omni-8B-VG
from pipeline_hcx_omni import HCXOmniPipeline

pipe = HCXOmniPipeline.from_pretrained("./HyperCLOVAX-SEED-Omni-8B-VG")
# Same pipe(...) API as above

Aspect Ratio Shortcuts

aspect_ratio Width x Height
"1:1" (default) 768 x 768
"16:9" 1024 x 576
"9:16" 576 x 1024
"4:3" 1024 x 768
"3:4" 768 x 1024
"3:2" 768 x 512
"2:3" 512 x 768
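The table above, copied into a plain lookup plus a resolver that mirrors the documented behaviour (aspect_ratio overrides height/width; sizes must be divisible by 16). The function name is hypothetical — the pipeline resolves this internally:

```python
ASPECT_SIZES = {
    "1:1": (768, 768),   "16:9": (1024, 576), "9:16": (576, 1024),
    "4:3": (1024, 768),  "3:4": (768, 1024),
    "3:2": (768, 512),   "2:3": (512, 768),
}


def resolve_size(aspect_ratio=None, width=768, height=768):
    """Return (width, height); aspect_ratio takes precedence over width/height."""
    if aspect_ratio is not None:
        return ASPECT_SIZES[aspect_ratio]
    if width % 16 or height % 16:
        raise ValueError("width and height must be divisible by 16")
    return (width, height)
```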

CLI Examples

git clone https://huggingface.co/moving-j/HyperCLOVAX-SEED-Omni-8B-VG
cd HyperCLOVAX-SEED-Omni-8B-VG

# Text-to-Image ("Snow-capped ridge in a golden sunset")
python examples/text_to_image.py \
    --prompt "황금빛 노을이 물드는 설산 능선" \
    --aspect-ratio 16:9

# Image Editing ("Convert to watercolor style")
python examples/image_editing.py \
    --image examples/assets/input_image.jpg \
    --instruction "수채화 스타일로 변환해줘"

Full __call__ Parameters

Parameter Default Description
prompt (required) Text description or editing instruction
image None Input image for editing (None = text-to-image)
height / width 768 / 768 Output size (divisible by 16)
aspect_ratio None Shorthand — overrides height/width
num_inference_steps 50 Diffusion denoising steps
guidance_scale None Auto: T2I=1.75, editing=0.0
generator 42 Random seed (int or torch.Generator)
max_new_tokens 7000 Max LLM tokens
temperature 0.9 LLM sampling temperature
top_p / top_k 0.9 / 200 LLM sampling params
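The guidance_scale default rule in the table (text-to-image = 1.75, editing = 0.0, explicit value wins) written out as an illustrative helper, not actual pipeline code:

```python
def effective_guidance_scale(image=None, guidance_scale=None) -> float:
    """Apply the documented defaults: editing (image given) -> 0.0, text-to-image -> 1.75."""
    if guidance_scale is not None:
        return guidance_scale  # an explicit value always wins
    return 0.0 if image is not None else 1.75
```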

Citation

@misc{hyperclovax2025seed,
  title  = {HyperCLOVA X SEED 8B Omni},
  author = {NAVER Cloud HyperCLOVA X Team},
  year   = {2025},
  note   = {Technical Report. Visual Generation variant: HyperCLOVAX-SEED-Omni-8B-VG}
}

License

The model weights are licensed under the HyperCLOVA X SEED 8B Omni Model License Agreement.

The vision decoder pipeline code (decoder/vision/pipeline.py) is licensed under Apache-2.0, adapted from OmniServe (Copyright 2025 NAVER Cloud Corp.).
