# Overview
HyperCLOVA X SEED 32B Think is an updated vision-language thinking model that advances the [SEED Think 14B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-14B) line beyond simple scaling, pairing a unified vision-language Transformer backbone with a reasoning-centric training recipe. SEED 32B Think processes text tokens and visual patches within a shared embedding space, supports long-context multimodal understanding up to 128K tokens, and provides an optional “thinking mode” for deep, controllable reasoning. Building on the earlier 14B model, SEED 32B Think further strengthens Korean-centric reasoning and agentic capabilities, improving practical reasoning quality and reliability in real-world use.
# Basic Information
- **Architecture**: Transformer-based dense vision-language model (VLM)
- **Output Format**: Text
- **Context Length**: 128K tokens
# Benchmarks

- **Vision Understanding**: ChartVQA, TextVQA, K-MMBench, K-DTCBench
- **Agentic Tasks**: Tau^2-Airline, Tau^2-Retail, Tau^2-Telecom
# Examples
- Solving 2026 Korean CSAT Math Problem
<!-- - Understanding Charts
<img src="https://cdn-uploads.huggingface.co/production/uploads/67ff242cee08737feaf18cb2/zoH2Lh6CSkgdzvXz7JaHo.jpeg" style="width: 640px;"> -->
# Inference
We provide [OmniServe](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe), a production-ready multimodal inference system with an OpenAI-compatible API.
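Because the server exposes an OpenAI-compatible API, requests follow the standard chat-completions schema, with images passed as `image_url` content parts alongside text. A minimal sketch of building such a request body in Python; the model name, image URL, and field values here are illustrative placeholders, not tested against OmniServe itself:

```python
import json

# Hypothetical multimodal request body in the standard OpenAI
# chat-completions format: one user turn mixing an image and a text prompt.
payload = {
    "model": "HyperCLOVAX-SEED-Think-32B",  # placeholder model identifier
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text", "text": "Summarize this chart."},
            ],
        }
    ],
    "max_tokens": 512,
}

# Serialize for an HTTP POST to the server's chat/completions endpoint.
body = json.dumps(payload)
```

Any OpenAI-compatible client library can send this same payload; only the base URL needs to point at the running OmniServe instance.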
For more details, see [OmniServe documentation](https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe).
# Citation
TBU (Technical Report)
# Questions
For any questions, please feel free to contact us at dl_hcxopensource@navercorp.com.
# License
The model is licensed under the [HyperCLOVA X SEED 32B Think Model License Agreement](./LICENSE).