
HyperCLOVA X SEED 8B Omni — Visual Generation (VG)

HyperCLOVAX-SEED-Omni-8B-VG — Visual Generation variant of HyperCLOVA X SEED 8B Omni.

  • Audio components removed from the Omni model; focused on image generation / editing
  • Ready to use via diffusers pipeline
  • Single GPU (~40 GB bf16), no OmniServe required

Results

Text-to-Image

우드 톤 테이블 위에 놓인 동양의 도자기 컵, 부드러운 그림자, 균형 잡힌 구도, 선명한 디테일, 고해상도 이미지 (An oriental ceramic cup on a wood-tone table, soft shadows, balanced composition, sharp details, high-resolution image)

[Text-to-image result image]

Image Editing

Input image · "검은색 승용차 한 대를 추가해줘" (Add a black passenger car) · "팝아트 스타일로 변환해줘" (Convert to pop-art style)

Inference

Requirements

pip install transformers==4.52.4 diffusers accelerate torch einops
# Optional
pip install bitsandbytes   # 4-bit / 8-bit quantisation
pip install flash-attn     # Flash Attention 2 (auto-detected if installed)

Hardware Guide

Runs on a single A100 or V100 GPU.

GPU Recommended settings
A100 80GB bf16 (default); installing flash-attn is recommended
V100 32GB load_in_4bit=True, num_inference_steps=30
  • A100 or newer: install Flash Attention 2 with pip install flash-attn; it is auto-detected and used by both the LLM and the Vision Decoder.
  • V100: use load_in_8bit or load_in_4bit quantisation together with num_inference_steps=30.
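The guidance above can be sketched as a small helper that picks loading options from available VRAM. The helper name, thresholds, and return shape are illustrative, not part of the pipeline API:

```python
def pick_load_config(vram_gb: float) -> dict:
    """Illustrative mapping from GPU VRAM (GB) to pipeline options.

    ~80 GB (A100): full bf16 model, default 50 denoising steps.
    ~32 GB (V100): 4-bit quantisation, reduced to 30 denoising steps.
    """
    if vram_gb >= 40:  # enough headroom for the full bf16 model (~40 GB)
        return {"load_kwargs": {}, "num_inference_steps": 50}
    # smaller cards: quantise and reduce steps, per the V100 guidance above
    return {"load_kwargs": {"load_in_4bit": True}, "num_inference_steps": 30}
```

The 40 GB threshold follows from the ~40 GB bf16 footprint quoted above; adjust for your own headroom.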

Note: This pipeline is intended for quick testing and prototyping. If you need fast inference, use vLLM or refer to OmniServe in the official repository.

Quick Start — Python API

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "moving-j/HyperCLOVAX-SEED-Omni-8B-VG",
    custom_pipeline="pipeline_hcx_omni",
    trust_remote_code=True,
    # load_in_4bit=True,           # 4-bit quantisation (~28 GB VRAM)
    # attn_implementation="eager",  # disable Flash Attention 2
)

# ── Text-to-Image ──────────────────────────────────────────────────────
image = pipe(
    "황금빛 노을이 물드는 설산 능선, 웅장한 구름, 사실적인 사진 스타일",  # "Snow-capped ridge in a golden sunset, majestic clouds, photorealistic style"
    height=768,                   # output height (divisible by 16)
    width=768,                    # output width  (divisible by 16)
    # aspect_ratio="16:9",        # shorthand — overrides height/width
    num_inference_steps=50,       # diffusion denoising steps
    guidance_scale=1.75,          # autoguidance (T2I default: 1.75)
    generator=42,                 # random seed
    temperature=0.9,              # LLM sampling temperature
    top_p=0.9,                    # LLM nucleus sampling
    top_k=200,                    # LLM top-k filtering
).images[0]
image.save("t2i.png")

# ── Image Editing ──────────────────────────────────────────────────────
from PIL import Image

input_img = Image.open("photo.jpg")
edited = pipe(
    "수채화 스타일로 변환해줘",   # editing instruction: "Convert to watercolor style"
    image=input_img,              # input image → editing mode
    guidance_scale=0.0,
).images[0]
edited.save("edit.png")
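Output sizes must be divisible by 16; if you also want to pre-resize an input image onto that 16-pixel grid before editing, a minimal sketch (the helper is hypothetical — the pipeline may handle arbitrary input sizes on its own):

```python
from PIL import Image


def snap_to_grid(img: Image.Image, grid: int = 16) -> Image.Image:
    """Round both sides down to the nearest multiple of `grid` (at least one cell)."""
    w, h = img.size
    return img.resize((max(grid, w - w % grid), max(grid, h - h % grid)))
```

For example, a 100 x 77 photo becomes 96 x 64.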

Local Directory (git clone)

# After: git clone https://huggingface.co/moving-j/HyperCLOVAX-SEED-Omni-8B-VG
from pipeline_hcx_omni import HCXOmniPipeline

pipe = HCXOmniPipeline.from_pretrained("./HyperCLOVAX-SEED-Omni-8B-VG")
# Same pipe(...) API as above

Aspect Ratio Shortcuts

aspect_ratio Width x Height
"1:1" (default) 768 x 768
"16:9" 1024 x 576
"9:16" 576 x 1024
"4:3" 1024 x 768
"3:4" 768 x 1024
"3:2" 768 x 512
"2:3" 512 x 768
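The table above, copied into a plain lookup plus a resolver that mirrors the documented behaviour (aspect_ratio overrides height/width; sizes must be divisible by 16). The function name is hypothetical — the pipeline resolves this internally:

```python
ASPECT_SIZES = {
    "1:1": (768, 768),   "16:9": (1024, 576), "9:16": (576, 1024),
    "4:3": (1024, 768),  "3:4": (768, 1024),
    "3:2": (768, 512),   "2:3": (512, 768),
}


def resolve_size(aspect_ratio=None, width=768, height=768):
    """Return (width, height); aspect_ratio takes precedence over width/height."""
    if aspect_ratio is not None:
        return ASPECT_SIZES[aspect_ratio]
    if width % 16 or height % 16:
        raise ValueError("width and height must be divisible by 16")
    return (width, height)
```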

CLI Examples

git clone https://huggingface.co/moving-j/HyperCLOVAX-SEED-Omni-8B-VG
cd HyperCLOVAX-SEED-Omni-8B-VG

# Text-to-Image ("Snow-capped ridge in a golden sunset")
python examples/text_to_image.py \
    --prompt "황금빛 노을이 물드는 설산 능선" \
    --aspect-ratio 16:9

# Image Editing ("Convert to watercolor style")
python examples/image_editing.py \
    --image examples/assets/input_image.jpg \
    --instruction "수채화 스타일로 변환해줘"

Full __call__ Parameters

Parameter Default Description
prompt (required) Text description or editing instruction
image None Input image for editing (None = text-to-image)
height / width 768 / 768 Output size (divisible by 16)
aspect_ratio None Shorthand — overrides height/width
num_inference_steps 50 Diffusion denoising steps
guidance_scale None Auto: T2I=1.75, editing=0.0
generator 42 Random seed (int or torch.Generator)
max_new_tokens 7000 Max LLM tokens
temperature 0.9 LLM sampling temperature
top_p / top_k 0.9 / 200 LLM sampling params
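The guidance_scale default rule in the table (text-to-image = 1.75, editing = 0.0, explicit value wins) written out as an illustrative helper, not actual pipeline code:

```python
def effective_guidance_scale(image=None, guidance_scale=None) -> float:
    """Apply the documented defaults: editing (image given) -> 0.0, text-to-image -> 1.75."""
    if guidance_scale is not None:
        return guidance_scale  # an explicit value always wins
    return 0.0 if image is not None else 1.75
```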

Citation

@misc{hyperclovax2025seed,
  title  = {HyperCLOVA X SEED 8B Omni},
  author = {NAVER Cloud HyperCLOVA X Team},
  year   = {2025},
  note   = {Technical Report. Visual Generation variant: HyperCLOVAX-SEED-Omni-8B-VG}
}

License

The model weights are licensed under the HyperCLOVA X SEED 8B Omni Model License Agreement.

The vision decoder pipeline code (decoder/vision/pipeline.py) is licensed under Apache-2.0, adapted from OmniServe (Copyright 2025 NAVER Cloud Corp.).
