Instructions to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="naver-hyperclovax/HyperCLOVAX-SEED-Think-32B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
```
- SGLang
How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use naver-hyperclovax/HyperCLOVAX-SEED-Think-32B with Docker Model Runner:
```bash
docker model run hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
```
---
license: other
license_name: hyperclovax
license_link: LICENSE
library_name: transformers
---
# Overview
HyperCLOVA X SEED 32B Think is an updated vision-language thinking model that advances the [SEED Think 14B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-14B) line beyond simple scaling, pairing a unified vision-language Transformer backbone with a reasoning-centric training recipe. SEED 32B Think processes text tokens and visual patches within a shared embedding space, supports long-context multimodal understanding up to 128K tokens, and provides an optional “thinking mode” for deep, controllable reasoning. Building on the earlier 14B model, SEED 32B Think further strengthens Korean-centric reasoning and agentic capabilities, improving practical reasoning quality and reliability in real-world use.

---

# Technical Report
- [HyperCLOVAX-SEED-Think-32B Tech Report (PDF)](./HyperCLOVA_X_32B_Think.pdf)

---

# Basic Information
- **Architecture**: Transformer-based vision-language model (VLM) architecture (Dense Model)
- **Parameters**: 32B
- **Input Format**: Text/Image/Video
- **Output Format**: Text
- **Context Length**: 128K
- **Knowledge Cutoff**: May 2025

---

# Benchmarks
- **General Knowledge (Korean Text)**: KoBalt, CLIcK, HAERAE Bench 1.0
- **Vision Understanding**: ChartVQA, TextVQA, K-MMBench, K-DTCBench
- **Agentic Tasks**: Tau^2-Airline, Tau^2-Retail, Tau^2-Telecom

---
# Examples
- Solving a 2026 Korean CSAT math problem

<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/LPU8kNbYQ8FN_piQ_p6Je.jpeg" style="width: 640px;">

- Understanding text layout

<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/Y8lHa7s1TmJcS6F82d41L.jpeg" style="width: 640px;">

<!-- - Understanding Charts
<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/zoH2Lh6CSkgdzvXz7JaHo.jpeg" style="width: 640px;"> -->

---
# Inference
We provide [OmniServe](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe), a production-ready multimodal inference system with an OpenAI-compatible API.

## Capabilities
- **Inputs**: Text, Image, Video
- **Outputs**: Text

## Requirements
- 4x NVIDIA A100 80GB
- Docker & Docker Compose
- NVIDIA Driver 525+, CUDA 12.1+

## Installation
```bash
# Clone OmniServe
git clone https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe.git
cd OmniServe

# Install dependencies
pip install huggingface_hub safetensors torch openai easydict

# Download model (~60GB)
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
    --local-dir ./models/HyperCLOVAX-SEED-Think-32B

# Convert model to component format
python convert_model.py \
    --input ./models/HyperCLOVAX-SEED-Think-32B \
    --output ./track_a \
    --track a

# Configure environment
cp .env.example .env
# Edit .env:
# VLM_MODEL_PATH=./track_a/llm/HyperCLOVAX-SEED-Think-32B
# VLM_ENCODER_VISION_MODEL_PATH=./track_a/ve/HyperCLOVAX-SEED-Think-32B

# Build and run
docker compose --profile track-a build
docker compose --profile track-a up -d

# Wait for model loading (~5 minutes)
docker compose logs -f vlm
```

## Basic Usage
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/a/v1",
    api_key="not-needed"
)

# Image understanding
response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ],
    max_tokens=512,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
print(response.choices[0].message.content)
```

## Reasoning Mode
Enable chain-of-thought reasoning for complex tasks:
```python
response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {"role": "user", "content": "Solve step by step: 3x + 7 = 22"}
    ],
    max_tokens=1024,
    extra_body={
        "thinking_token_budget": 500,
        "chat_template_kwargs": {"thinking": True}
    }
)
# Response includes <think>...</think> with reasoning process
print(response.choices[0].message.content)
```
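
Because the reasoning trace arrives inline in `message.content`, a client will often want to separate it from the final answer. The following is a minimal sketch, assuming the trace is wrapped in a single `<think>...</think>` pair as the comment above describes. `split_thinking` is a hypothetical helper, not part of OmniServe or the OpenAI client.

```python
import re

# Matches one <think>...</think> block; DOTALL lets the trace span several lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a response string.

    If no <think> block is present, reasoning is empty and the whole
    string is treated as the answer.
    """
    match = _THINK_RE.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is the user-facing answer.
    answer = (content[: match.start()] + content[match.end():]).strip()
    return reasoning, answer


# Hand-written sample string standing in for response.choices[0].message.content.
sample = "<think>3x = 22 - 7 = 15, so x = 5.</think>\nx = 5"
reasoning, answer = split_thinking(sample)
print(answer)
```

In practice the function would be called on `response.choices[0].message.content`. How the server formats output when the thinking budget is exhausted is not covered here, so unterminated blocks fall through to the "no block" branch.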

## More Examples
<details>
<summary>Video Understanding</summary>

```python
response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/video.mp4"}},
                {"type": "text", "text": "Describe this video."}
            ]
        }
    ],
    max_tokens=512,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
```

</details>
<details>
<summary>Base64 Image Input</summary>

```python
import base64

with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "What is in this image?"}
            ]
        }
    ],
    max_tokens=512,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
```

</details>
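
The Base64 example hardcodes `image/png` in the data URL. A small sketch of a helper that infers the MIME type from the file extension instead; `image_to_data_url` and `image_part` are hypothetical names introduced here for illustration, and the content-part shape simply mirrors the examples above.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(path: str) -> dict:
    """Build a content part in the same shape as the examples above."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The result of `image_part("image.png")` can be placed in the `content` list next to a `{"type": "text", ...}` part.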
<details>
<summary>Using curl</summary>

```bash
# Over raw HTTP, chat_template_kwargs is a top-level field of the JSON body;
# extra_body is an OpenAI Python client argument, not a request field.
curl -X POST http://localhost:8000/a/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "track_a_model",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                    {"type": "text", "text": "Describe this image."}
                ]
            }
        ],
        "max_tokens": 512,
        "chat_template_kwargs": {"thinking": false}
    }'
```

</details>
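
For clients that cannot use the OpenAI SDK, the same request can be assembled with the Python standard library. This is a sketch under two assumptions: the endpoint and model name are the ones used in the examples above, and server-specific options are sent as top-level JSON fields, which is where the OpenAI Python client places the contents of `extra_body`. `build_chat_body` and `post_chat` are hypothetical helpers.

```python
import json
import urllib.request

# Endpoint taken from the examples above.
BASE_URL = "http://localhost:8000/a/v1"


def build_chat_body(messages, max_tokens=512, thinking=False, **extra):
    """Assemble the JSON body for POST /chat/completions."""
    body = {
        "model": "track_a_model",
        "messages": messages,
        "max_tokens": max_tokens,
        # Top-level field, matching what the SDK sends for extra_body.
        "chat_template_kwargs": {"thinking": thinking},
    }
    # Any further server-specific options, e.g. thinking_token_budget.
    body.update(extra)
    return body


def post_chat(body, base_url=BASE_URL, timeout=120):
    """Send the body and return the first choice's text (needs a running server)."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


body = build_chat_body(
    [{"role": "user", "content": "Solve step by step: 3x + 7 = 22"}],
    max_tokens=1024,
    thinking=True,
    thinking_token_budget=500,
)
print(json.dumps(body, indent=2))
```

Calling `post_chat(body)` sends the request. Only `build_chat_body` runs without a server.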

## Model Capabilities
| Input | Output |
|-------|--------|
| Text | Text |
| Image | Text |
| Video | Text |
| Image + Text | Text |
| Video + Text | Text |

**Features:**
- Reasoning mode with `<think>...</think>` output
- Multi-turn conversation support
- Image/Video understanding
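
Multi-turn use with an OpenAI-compatible API means resending the full message history on every call. A minimal sketch, assuming the `client` object and `track_a_model` name from Basic Usage; the `Conversation` class is a hypothetical wrapper, not something OmniServe provides.

```python
class Conversation:
    """Keeps the running message list for a multi-turn chat."""

    def __init__(self, model="track_a_model"):
        self.model = model
        self.messages = []

    def ask(self, client, text, **kwargs):
        """Append a user turn, call the API, and record the assistant reply."""
        self.messages.append({"role": "user", "content": text})
        response = client.chat.completions.create(
            model=self.model, messages=self.messages, **kwargs
        )
        reply = response.choices[0].message.content
        # Stored unchanged; stripping <think> text from history is left to the caller.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Usage would look like `chat = Conversation()` followed by `chat.ask(client, "Hello", max_tokens=512)`. Whether reasoning text should be removed from earlier assistant turns is not specified in this document, so the sketch keeps replies as returned.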

## Architecture
```
                          User Request
                       (Image/Video/Text)
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                                OmniServe                                │
│                       POST /a/v1/chat/completions                       │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                        [1] INPUT ENCODING                        │   │
│  │                                                                  │   │
│  │                   ┌─────────────────┐                            │   │
│  │                   │ Vision Encoder  │                            │   │
│  │                   └────────┬────────┘                            │   │
│  │                            │ embeddings                          │   │
│  └────────────────────────────┼─────────────────────────────────────┘   │
│                               ▼                                         │
│                        ┌──────────────┐                                 │
│                        │  LLM (32B)   │◀──── text                       │
│                        └──────┬───────┘                                 │
│                               │                                         │
│                               ▼                                         │
│                         Text Response                                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
                            Response
                             (Text)
```

## Hardware Requirements
| Component | GPU | VRAM |
|-----------|-----|------|
| Vision Encoder | 1x | ~8GB |
| LLM (32B) | 2x | ~60GB |
| **Total** | **3x** | **~68GB** |
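
The ~60GB figure for the LLM is consistent with 16-bit weights. A back-of-the-envelope check, assuming 32 billion parameters stored at 2 bytes each (the precision is an assumption; the table does not state it) and ignoring KV cache and activation memory:

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the raw weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


llm_gib = weight_memory_gib(32e9)   # assumed: 32B parameters at 2 bytes each
per_gpu_gib = llm_gib / 2           # the table assigns the LLM to 2 GPUs

print(f"LLM weights: {llm_gib:.1f} GiB total, {per_gpu_gib:.1f} GiB per GPU")
# prints "LLM weights: 59.6 GiB total, 29.8 GiB per GPU"
```

The remaining headroom on each 80GB card is what is available for KV cache, which grows with context length and batch size.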

## Key Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| `chat_template_kwargs.thinking` | Enable reasoning | `false` |
| `thinking_token_budget` | Max reasoning tokens | 500 |
| `max_tokens` | Max output tokens | - |
| `temperature` | Sampling temperature | 0.7 |

For more details, see [OmniServe documentation](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe).
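
The parameters in the table map onto the OpenAI client call as follows. This is a sketch only: `chat_kwargs` is a hypothetical helper, the defaults are copied from the table, and sending `thinking_token_budget` only when thinking is enabled is a design choice made here rather than documented behavior.

```python
def chat_kwargs(thinking=False, thinking_token_budget=500,
                max_tokens=None, temperature=0.7):
    """Build keyword arguments for client.chat.completions.create(...)."""
    if thinking_token_budget < 0:
        raise ValueError("thinking_token_budget must be non-negative")
    # Server-specific options travel in extra_body, as in the examples above.
    extra_body = {"chat_template_kwargs": {"thinking": thinking}}
    if thinking:
        extra_body["thinking_token_budget"] = thinking_token_budget
    kwargs = {"temperature": temperature, "extra_body": extra_body}
    # max_tokens has no default in the table, so it is omitted unless given.
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs


print(chat_kwargs(thinking=True, max_tokens=1024))
```

The result can be splatted into a call such as `client.chat.completions.create(model="track_a_model", messages=messages, **chat_kwargs(thinking=True, max_tokens=1024))`.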

---

# Citation
TBU (Technical Report)

---

# Questions
For any other questions, please contact us at dl_hcxopensource@navercorp.com.

---

# License
The model is licensed under the [HyperCLOVA X SEED 32B Think Model License Agreement](./LICENSE).