---
license: other
license_name: hyperclovax
license_link: LICENSE
library_name: transformers
---

![image](https://cdn-uploads.huggingface.co/production/uploads/64383d54c5a91b84ece18d62/2wkHd-bv3M9Zsma_ykIf8.png)

# Overview

HyperCLOVA X SEED 32B Think is an updated vision-language thinking model that advances the [SEED Think 14B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-14B) line beyond simple scaling, pairing a unified vision-language Transformer backbone with a reasoning-centric training recipe. SEED 32B Think processes text tokens and visual patches within a shared embedding space, supports long-context multimodal understanding up to 128K tokens, and provides an optional "thinking mode" for deep, controllable reasoning. Building on the earlier 14B model, SEED 32B Think further strengthens Korean-centric reasoning and agentic capabilities, improving practical reasoning quality and reliability in real-world use.

---

# Basic Information

- **Architecture**: Transformer-based vision-language model (VLM) architecture (dense model)
- **Parameters**: 32B
- **Input Format**: Text/Image/Video
- **Output Format**: Text
- **Context Length**: 128K
- **Knowledge Cutoff**: May 2025

---

# Benchmarks

![Technical Report 04@2x](https://cdn-uploads.huggingface.co/production/uploads/646acf46086023e36edce4c4/qfIKiKlFVJWyCx3Dl1qN0.png)

- **General Knowledge (Korean Text)**: KoBalt, CLIcK, HAERAE Bench 1.0
- **Vision Understanding**: ChartVQA, TextVQA, K-MMBench, K-DTCBench
- **Agentic Tasks**: Tau^2-Airline, Tau^2-Retail, Tau^2-Telecom

---

# Examples

- Solving a 2026 Korean CSAT math problem
- Understanding text layout

---

# Inference

We provide [OmniServe](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe), a production-ready multimodal inference system with an OpenAI-compatible API.
## Capabilities

- **Inputs**: Text, Image, Video
- **Outputs**: Text

## Requirements

- 4x NVIDIA A100 80GB
- Docker & Docker Compose
- NVIDIA Driver 525+, CUDA 12.1+

## Installation

```bash
# Clone OmniServe
git clone https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe.git
cd OmniServe

# Install dependencies
pip install huggingface_hub safetensors torch openai easydict

# Download model (~60GB)
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
  --local-dir ./models/HyperCLOVAX-SEED-Think-32B

# Convert model to component format
python convert_model.py \
  --input ./models/HyperCLOVAX-SEED-Think-32B \
  --output ./track_a \
  --track a

# Configure environment
cp .env.example .env
# Edit .env:
#   VLM_MODEL_PATH=./track_a/llm/HyperCLOVAX-SEED-Think-32B
#   VLM_ENCODER_VISION_MODEL_PATH=./track_a/ve/HyperCLOVAX-SEED-Think-32B

# Build and run
docker compose --profile track-a build
docker compose --profile track-a up -d

# Wait for model loading (~5 minutes)
docker compose logs -f vlm
```

## Basic Usage

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/a/v1",
    api_key="not-needed"
)

# Image understanding
response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ],
    max_tokens=512,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
print(response.choices[0].message.content)
```

## Reasoning Mode

Enable chain-of-thought reasoning for complex tasks:

```python
response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {"role": "user", "content": "Solve step by step: 3x + 7 = 22"}
    ],
    max_tokens=1024,
    extra_body={
        "thinking_token_budget": 500,
        "chat_template_kwargs": {"thinking": True}
    }
)
# Response includes ... with reasoning process
print(response.choices[0].message.content)
```

## More Examples
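### Multi-Turn Conversation

The API is stateless between requests: to hold a multi-turn conversation, re-send the full message history each time. A minimal sketch of bookkeeping for that (the `extend_history` helper is our own, not part of OmniServe):

```python
# Hypothetical helper: maintain the message history that is re-sent on each request.
def extend_history(history, user_text, assistant_text=None):
    """Return a new message list with the latest user turn (and optional reply)."""
    messages = list(history) + [{"role": "user", "content": user_text}]
    if assistant_text is not None:
        messages.append({"role": "assistant", "content": assistant_text})
    return messages

# First exchange, recorded together with the model's reply
history = extend_history([], "What is 12 * 7?", "12 * 7 = 84.")

# Follow-up question sees the earlier exchange
messages = extend_history(history, "Now divide that by 4.")
# response = client.chat.completions.create(
#     model="track_a_model", messages=messages, max_tokens=256,
#     extra_body={"chat_template_kwargs": {"thinking": False}})
```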
### Video Understanding

```python
response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/video.mp4"}},
                {"type": "text", "text": "Describe this video."}
            ]
        }
    ],
    max_tokens=512,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
```
### Base64 Image Input

```python
import base64

with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="track_a_model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "What is in this image?"}
            ]
        }
    ],
    max_tokens=512,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
```
### Using curl

Note that `extra_body` is a convenience of the OpenAI Python SDK that merges its contents into the top level of the request body; when posting raw JSON, the fields go at the top level:

```bash
curl -X POST http://localhost:8000/a/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "track_a_model",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
          {"type": "text", "text": "Describe this image."}
        ]
      }
    ],
    "max_tokens": 512,
    "chat_template_kwargs": {"thinking": false}
  }'
```
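When building the raw HTTP body in code rather than with curl, the same top-level layout applies. A sketch of the payload construction (the `build_chat_payload` helper is our own, not part of OmniServe):

```python
import json

def build_chat_payload(model, messages, max_tokens=512,
                       thinking=False, thinking_token_budget=None):
    """Build the raw JSON body for POST /a/v1/chat/completions."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        # What the Python SDK sends via extra_body sits at the top level here.
        "chat_template_kwargs": {"thinking": thinking},
    }
    if thinking_token_budget is not None:
        payload["thinking_token_budget"] = thinking_token_budget
    return json.dumps(payload)

body = build_chat_payload(
    "track_a_model",
    [{"role": "user", "content": "Solve step by step: 3x + 7 = 22"}],
    max_tokens=1024, thinking=True, thinking_token_budget=500,
)
# POST `body` to http://localhost:8000/a/v1/chat/completions
# with header Content-Type: application/json
```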
## Model Capabilities

| Input | Output |
|-------|--------|
| Text | Text |
| Image | Text |
| Video | Text |
| Image + Text | Text |
| Video + Text | Text |

**Features:**
- Reasoning mode with `...` output
- Multi-turn conversation support
- Image/Video understanding

## Architecture

```
User Request (Image/Video/Text)
                          │
                          ▼
┌───────────────────────────────────────────────────┐
│                     OmniServe                     │
│            POST /a/v1/chat/completions            │
│                                                   │
│  ┌─────────────────────────────────────────────┐  │
│  │             [1] INPUT ENCODING              │  │
│  │                                             │  │
│  │             ┌─────────────────┐             │  │
│  │             │  Vision Encoder │             │  │
│  │             └────────┬────────┘             │  │
│  │                      │ embeddings           │  │
│  └──────────────────────┼──────────────────────┘  │
│                         ▼                         │
│                 ┌──────────────┐                  │
│                 │  LLM (32B)   │◀──── text        │
│                 └──────┬───────┘                  │
│                        │                          │
│                        ▼                          │
│                  Text Response                    │
└───────────────────────────────────────────────────┘
                          │
                          ▼
                  Response (Text)
```

## Hardware Requirements

| Component | GPU | VRAM |
|-----------|-----|------|
| Vision Encoder | 1x | ~8GB |
| LLM (32B) | 2x | ~60GB |
| **Total** | **3x** | **~68GB** |

## Key Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `chat_template_kwargs.thinking` | Enable reasoning | `false` |
| `thinking_token_budget` | Max reasoning tokens | 500 |
| `max_tokens` | Max output tokens | - |
| `temperature` | Sampling temperature | 0.7 |

For more details, see the [OmniServe documentation](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe).

---

# Citation

TBU (Technical Report)

---

# Questions

For any other questions, please feel free to contact us at dl_hcxopensource@navercorp.com.

---

# License

The model is licensed under the [HyperCLOVA X SEED 32B Think Model License Agreement](./LICENSE).