Instructions to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="naver-hyperclovax/HyperCLOVAX-SEED-Think-32B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
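
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the vLLM server started above is listening on `localhost:8000`. `build_chat_payload` and `post_chat` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B"
# Assumed to match the address used in the curl call above.
BASE_URL = "http://localhost:8000/v1"


def build_chat_payload(prompt, model=MODEL_ID):
    """Build the JSON body for an OpenAI-compatible chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def post_chat(payload, base_url=BASE_URL, timeout=60):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_chat_payload("What is the capital of France?")
# With the server running:
# reply = post_chat(payload)
# print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes the request body easy to inspect or log before it is sent.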

Use Docker

docker model run hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B

SGLang

How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with Docker Model Runner:
```
docker model run hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
```

HyperCLOVAX-SEED-Think-32B / README.md

PenPaperKeyCode

Update README.md

d6b3495 verified 5 months ago


	---
	license: other
	license_name: hyperclovax
	license_link: LICENSE
	library_name: transformers
	---

	![image](https://cdn-uploads.huggingface.co/production/uploads/64383d54c5a91b84ece18d62/2wkHd-bv3M9Zsma_ykIf8.png)

	# Overview
	HyperCLOVA X SEED 32B Think is an updated vision-language thinking model that advances the [SEED Think 14B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-14B) line beyond simple scaling, pairing a unified vision-language Transformer backbone with a reasoning-centric training recipe. SEED 32B Think processes text tokens and visual patches within a shared embedding space, supports long-context multimodal understanding up to 128K tokens, and provides an optional “thinking mode” for deep, controllable reasoning. Building on the earlier 14B model, SEED 32B Think further strengthens Korean-centric reasoning and agentic capabilities, improving practical reasoning quality and reliability in real-world use.

	---

	# Technical Report
	- [HyperCLOVAX-SEED-Think-32B Tech Report (PDF)](./HyperCLOVA_X_32B_Think.pdf)



	---

	# Basic Information

	- Architecture: Transformer-based vision-language model (VLM), dense model
	- Parameters: 32B
	- Input Format: Text/Image/Video
	- Output Format: Text
	- Context Length: 128K
	- Knowledge Cutoff: May 2025

	---

	# Benchmarks

	![테크니컬 리포트 04@2x](https://cdn-uploads.huggingface.co/production/uploads/646acf46086023e36edce4c4/qfIKiKlFVJWyCx3Dl1qN0.png)

	- General Knowledge (Korean Text): KoBalt, CLIcK, HAERAE Bench 1.0
	- Vision Understanding: ChartVQA, TextVQA, K-MMBench, K-DTCBench
	- Agentic Tasks: Tau^2-Airline, Tau^2-Retail, Tau^2-Telecom

	---

	# Examples
	- Solving a 2026 Korean CSAT math problem
	<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/LPU8kNbYQ8FN_piQ_p6Je.jpeg" style="width: 640px;">
	- Understanding text layout
	<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/Y8lHa7s1TmJcS6F82d41L.jpeg" style="width: 640px;">
	<!-- - Understanding Charts
	<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/zoH2Lh6CSkgdzvXz7JaHo.jpeg" style="width: 640px;"> -->

	---

	# Inference

	We provide [OmniServe](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe), a production-ready multimodal inference system with an OpenAI-compatible API.

	## Capabilities

	- Inputs: Text, Image
	- Outputs: Text

	## Requirements

	- 4x NVIDIA A100 80GB
	- Docker & Docker Compose
	- NVIDIA Driver 525+, CUDA 12.1+

	## Installation

	```bash
	# Clone OmniServe
	git clone https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe.git
	cd OmniServe

	# Install dependencies
	pip install huggingface_hub safetensors torch openai easydict

	# Download model (~60GB)
	huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
	  --local-dir ./models/HyperCLOVAX-SEED-Think-32B

	# Convert model to component format
	python convert_model.py \
	  --input ./models/HyperCLOVAX-SEED-Think-32B \
	  --output ./track_a \
	  --track a

	# Configure environment
	cp .env.example .env
	# Edit .env:
	# VLM_MODEL_PATH=./track_a/llm/HyperCLOVAX-SEED-Think-32B
	# VLM_ENCODER_VISION_MODEL_PATH=./track_a/ve/HyperCLOVAX-SEED-Think-32B

	# Build and run
	docker compose --profile track-a build
	docker compose --profile track-a up -d

	# Wait for model loading (~5 minutes)
	docker compose logs -f vlm
	```
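
Before editing `.env`, it can help to confirm that the conversion step produced the two directories those variables point to. This is a minimal sketch, assuming the `./track_a/llm/...` and `./track_a/ve/...` layout shown in the comments above. `missing_component_dirs` is a hypothetical helper, not part of OmniServe.

```python
from pathlib import Path

MODEL_NAME = "HyperCLOVAX-SEED-Think-32B"


def missing_component_dirs(output_root, model_name=MODEL_NAME):
    """Return the expected component directories that do not exist yet."""
    root = Path(output_root)
    expected = [
        root / "llm" / model_name,  # target of VLM_MODEL_PATH
        root / "ve" / model_name,   # target of VLM_ENCODER_VISION_MODEL_PATH
    ]
    return [path for path in expected if not path.is_dir()]


missing = missing_component_dirs("./track_a")
if missing:
    print("Missing:", ", ".join(str(p) for p in missing))
else:
    print("Both component directories are present.")
```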

	## Basic Usage

	```python
	from openai import OpenAI

	client = OpenAI(
	    base_url="http://localhost:8000/a/v1",
	    api_key="not-needed"
	)

	# Image understanding
	response = client.chat.completions.create(
	    model="track_a_model",
	    messages=[
	        {
	            "role": "user",
	            "content": [
	                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
	                {"type": "text", "text": "Describe this image."}
	            ]
	        }
	    ],
	    max_tokens=512,
	    extra_body={"chat_template_kwargs": {"thinking": False}}
	)

	print(response.choices[0].message.content)
	```

	## Reasoning Mode

	Enable chain-of-thought reasoning for complex tasks:

	```python
	response = client.chat.completions.create(
	    model="track_a_model",
	    messages=[
	        {"role": "user", "content": "Solve step by step: 3x + 7 = 22"}
	    ],
	    max_tokens=1024,
	    extra_body={
	        "thinking_token_budget": 500,
	        "chat_template_kwargs": {"thinking": True}
	    }
	)

	# Response includes <think>...</think> with reasoning process
	print(response.choices[0].message.content)
	```
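
Because the reasoning is returned inline, callers often want to separate it from the final answer. This is a minimal sketch, assuming the content holds at most one complete `<think>...</think>` block. `split_thinking` is a hypothetical helper, not part of OmniServe or the OpenAI client.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content):
    """Return (reasoning, answer); reasoning is None when no block is found."""
    match = THINK_PATTERN.search(content)
    if match is None:
        return None, content.strip()
    reasoning = match.group(1).strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


sample = "<think>3x = 22 - 7 = 15, so x = 5.</think>\nx = 5"
reasoning, answer = split_thinking(sample)
print(answer)  # x = 5
```

With a live response, `split_thinking(response.choices[0].message.content)` would be the call. A reasoning block that is cut off before its closing tag is not handled by this sketch.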

	## More Examples

	<details>
	<summary>Video Understanding</summary>

	```python
	response = client.chat.completions.create(
	    model="track_a_model",
	    messages=[
	        {
	            "role": "user",
	            "content": [
	                {"type": "image_url", "image_url": {"url": "https://example.com/video.mp4"}},
	                {"type": "text", "text": "Describe this video."}
	            ]
	        }
	    ],
	    max_tokens=512,
	    extra_body={"chat_template_kwargs": {"thinking": False}}
	)
	```

	</details>

	<details>
	<summary>Base64 Image Input</summary>

	```python
	import base64

	# Read the image and encode it as base64 text
	with open("image.png", "rb") as f:
	    image_b64 = base64.b64encode(f.read()).decode()

	response = client.chat.completions.create(
	    model="track_a_model",
	    messages=[
	        {
	            "role": "user",
	            "content": [
	                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
	                {"type": "text", "text": "What is in this image?"}
	            ]
	        }
	    ],
	    max_tokens=512,
	    extra_body={"chat_template_kwargs": {"thinking": False}}
	)
	```

	</details>
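
The Base64 example hard-codes `image/png`. The sketch below derives the MIME type from the file name instead, using only the standard library. `to_data_url` is a hypothetical helper, and the fallback type for unknown extensions is an assumption.

```python
import base64
import mimetypes


def to_data_url(filename, data):
    """Encode raw image bytes as a data URL for an `image_url` content part."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Assumed fallback for unknown extensions.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage with a file on disk:
# with open("photo.jpg", "rb") as f:
#     url = to_data_url("photo.jpg", f.read())
print(to_data_url("image.png", b"\x89PNG"))  # data:image/png;base64,iVBORw==
```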

	<details>
	<summary>Using curl</summary>

	```bash
	# chat_template_kwargs sits at the top level of the JSON body;
	# extra_body is an argument of the OpenAI Python client, not an API field.
	curl -X POST http://localhost:8000/a/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "track_a_model",
	    "messages": [
	      {
	        "role": "user",
	        "content": [
	          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
	          {"type": "text", "text": "Describe this image."}
	        ]
	      }
	    ],
	    "max_tokens": 512,
	    "chat_template_kwargs": {"thinking": false}
	  }'
	```

	</details>

	## Model Capabilities

	| Input | Output |
	|-------|--------|
	| Text | Text |
	| Image | Text |
	| Video | Text |
	| Image + Text | Text |
	| Video + Text | Text |

	Features:
	- Reasoning mode with `<think>...</think>` output
	- Multi-turn conversation support
	- Image/Video understanding
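
Multi-turn use amounts to resending the growing message list on every call. This is a sketch of that bookkeeping, assuming the server keeps no conversation state between requests, as is usual for OpenAI-compatible chat APIs. The `Conversation` class is hypothetical.

```python
class Conversation:
    """Accumulate chat messages so each request carries the full history."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


chat = Conversation()
chat.add_user("What is the capital of France?")
chat.add_assistant("Paris.")
chat.add_user("And its population?")
# chat.messages can be passed as `messages=` to client.chat.completions.create(...)
print([m["role"] for m in chat.messages])  # ['user', 'assistant', 'user']
```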

	## Architecture

	```
	                         User Request
	                      (Image/Video/Text)
	                               │
	                               ▼
	┌──────────────────────────────────────────────────────────────┐
	│                          OmniServe                           │
	│                 POST /a/v1/chat/completions                  │
	│                                                              │
	│  ┌────────────────────────────────────────────────────────┐  │
	│  │ [1] INPUT ENCODING                                     │  │
	│  │                                                        │  │
	│  │                  ┌─────────────────┐                   │  │
	│  │                  │ Vision Encoder  │                   │  │
	│  │                  └────────┬────────┘                   │  │
	│  │                           │ embeddings                 │  │
	│  └───────────────────────────┼────────────────────────────┘  │
	│                              ▼                               │
	│                       ┌──────────────┐                       │
	│                       │  LLM (32B)   │◀──── text             │
	│                       └──────┬───────┘                       │
	│                              │                               │
	│                              ▼                               │
	│                        Text Response                         │
	│                                                              │
	└──────────────────────────────────────────────────────────────┘
	                               │
	                               ▼
	                           Response
	                            (Text)
	```

	## Hardware Requirements

	| Component | GPU | VRAM |
	|-----------|-----|------|
	| Vision Encoder | 1x | ~8GB |
	| LLM (32B) | 2x | ~60GB |
	| Total | 3x | ~68GB |
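
The LLM figure is in line with a back-of-the-envelope estimate of weight memory. This is a sketch of that arithmetic, assuming 32 billion parameters stored as 16-bit values (2 bytes each). The precision is an assumption rather than something stated here, and the estimate ignores KV cache and activation memory.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Estimate the memory needed to hold model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


llm_gib = weight_memory_gib(32e9)
print(f"{llm_gib:.1f} GiB")  # 59.6 GiB
```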

	## Key Parameters

	| Parameter | Description | Default |
	|-----------|-------------|---------|
	| `chat_template_kwargs.thinking` | Enable reasoning | `false` |
	| `thinking_token_budget` | Max reasoning tokens | 500 |
	| `max_tokens` | Max output tokens | - |
	| `temperature` | Sampling temperature | 0.7 |
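
These parameters can be gathered in one place before calling the client. This is a sketch, assuming the defaults listed in the table and the `extra_body` layout used in the Python examples above. `completion_kwargs` is a hypothetical helper, and sending the budget only when reasoning is enabled is a design assumption, not documented behavior.

```python
def completion_kwargs(thinking=False, thinking_token_budget=500,
                      temperature=0.7, max_tokens=None):
    """Build keyword arguments for client.chat.completions.create()."""
    extra_body = {"chat_template_kwargs": {"thinking": thinking}}
    if thinking:
        # Assumption: the budget is only relevant when reasoning is on.
        extra_body["thinking_token_budget"] = thinking_token_budget
    kwargs = {"temperature": temperature, "extra_body": extra_body}
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs


print(completion_kwargs(thinking=True, max_tokens=1024))
# Usage with the client from Basic Usage:
# client.chat.completions.create(model="track_a_model", messages=messages,
#                                **completion_kwargs(thinking=True, max_tokens=1024))
```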

	For more details, see [OmniServe documentation](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe).

	---

	# Citation
	To be updated (Technical Report)

	---

	# Questions
	For any other questions, please feel free to contact us at dl_hcxopensource@navercorp.com.

	---

	# License
	The model is licensed under the [HyperCLOVA X SEED 32B Think Model License Agreement](./LICENSE).