---
license: other
license_name: hyperclovax
license_link: https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE
library_name: transformers
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
tags:
  - llama
  - text-generation
  - korean
  - reasoning
language:
  - ko
  - en
pipeline_tag: text-generation
---

# HyperCLOVAX-SEED-Text-Think-32B

**Extracted text-only LLM from [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B)**

This model contains only the language model component extracted from the original Vision-Language Model (VLM). The vision encoder and multimodal projector have been removed, making it a pure text-to-text model compatible with standard LLaMA inference pipelines.

## Model Details

| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM |
| Parameters | ~33B |
| Hidden Size | 5120 |
| Layers | 72 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 24192 |
| Context Length | 128K |
| Vocab Size | 128,256 |
| Precision | bfloat16 |
| RoPE Theta | 50,000,000 |
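
The figures in the table can be cross-checked with a rough parameter count. The sketch below assumes a standard LLaMA layout (separate gate/up/down MLP projections, untied input and output embeddings) and lands close to the stated ~33B:

```python
# Rough parameter count from the config values in the table above.
# Assumes a standard LLaMA layout with untied input/output embeddings.
hidden, layers, heads, kv_heads = 5120, 72, 40, 8
intermediate, vocab = 24192, 128256

head_dim = hidden // heads                   # 128
attn = 2 * hidden * hidden                   # q_proj + o_proj
attn += 2 * hidden * (kv_heads * head_dim)   # k_proj + v_proj (GQA)
mlp = 3 * hidden * intermediate              # gate, up, down projections
per_layer = attn + mlp
embeddings = 2 * vocab * hidden              # embed_tokens + lm_head

total = layers * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")      # ≈ 32.6B, i.e. the ~33B figure
```

Biases and norm weights are omitted; they contribute a negligible fraction at this scale.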

## What Was Extracted

The original VLM consists of:
- **Vision Encoder**: Qwen2.5-VL based (~600M params) - **removed**
- **MM Projector**: Multimodal projection layers - **removed**  
- **Language Model**: HyperCLOVAX LLM (~33B params) - **extracted** ✓

Only the `model.language_model.*` weights were extracted and remapped to standard LLaMA format.

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto"
)

messages = [{"role": "user", "content": "What is the capital of South Korea?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With vLLM

```bash
vllm serve minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf \
    --dtype bfloat16 \
    --tensor-parallel-size 2
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
response = client.chat.completions.create(
    model="minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf",
    messages=[{"role": "user", "content": "안녕하세요! 한국어로 대화할 수 있나요?"}]
)
print(response.choices[0].message.content)
```

## Thinking Mode

The model supports a "thinking mode" for complex reasoning tasks, in which it wraps extended reasoning in `<|thinking|>` blocks before producing its final answer:

```python
messages = [
    {"role": "user", "content": "Solve this step by step: If x + 2y = 10 and 3x - y = 5, find x and y."}
]
# The model may produce <|thinking|>...</|thinking|> blocks with its reasoning process
```
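
To separate the reasoning from the final answer, a small post-processing step suffices. This sketch assumes the literal `<|thinking|>...</|thinking|>` delimiters shown above appear in the decoded text; adjust the pattern if your tokenizer settings render the special tokens differently:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded model output into (thinking, answer).

    Assumes literal <|thinking|>...</|thinking|> delimiters, as described
    in the section above.
    """
    pattern = re.compile(r"<\|thinking\|>(.*?)</\|thinking\|>", re.DOTALL)
    thinking = "\n".join(m.strip() for m in pattern.findall(text))
    answer = pattern.sub("", text).strip()
    return thinking, answer

# Hypothetical output string for illustration only:
raw = "<|thinking|>Add the equations to eliminate y.</|thinking|>The answer is 42."
thoughts, answer = split_thinking(raw)
print(answer)  # -> The answer is 42.
```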

## Hardware Requirements

- **Minimum**: 2x NVIDIA A100 40GB (with tensor parallelism)
- **Recommended**: 2x NVIDIA A100 80GB or 4x NVIDIA A6000
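
These requirements follow from the weight footprint alone: ~33B parameters in bfloat16 occupy about 66 GB before activations and KV cache, so a single 40 GB or 80 GB card cannot hold the weights by itself. A back-of-the-envelope check:

```python
# Back-of-the-envelope weight memory estimate (weights only; KV cache and
# activations add more on top, which is why 40 GB cards are tight).
params = 33e9
bytes_per_param = 2                        # bfloat16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB")
for tp in (2, 4):
    print(f"  per GPU at tensor-parallel size {tp}: {weights_gb / tp:.0f} GB")
```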

## Limitations

- This is a **text-only** model. It cannot process images or videos.
- The model inherits any limitations from the original HyperCLOVAX-SEED-Think-32B.
- Optimized primarily for Korean and English.

## License

This model inherits the [HyperCLOVAX license](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE) from the original model.

## Citation

If you use this model, please cite the original:

```bibtex
@misc{hyperclovax-seed-think-32b,
  title={HyperCLOVA X SEED Think 32B},
  author={NAVER Cloud},
  year={2025},
  url={https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B}
}
```

## Reproduce This Extraction

Want to extract the LLM yourself? Use the included [`extract_llm.py`](extract_llm.py) script.

### Prerequisites

```bash
pip install safetensors torch tqdm huggingface_hub
```

### Step 1: Download Original VLM (~66GB)

```bash
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
    --local-dir ./HyperCLOVAX-SEED-Think-32B
```

### Step 2: Run Extraction Script

```bash
# Download the extraction script
wget https://huggingface.co/minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf/resolve/main/extract_llm.py

# Run extraction
python extract_llm.py \
    --input ./HyperCLOVAX-SEED-Think-32B \
    --output ./HyperCLOVAX-SEED-Text-Think-32B
```

### What the Script Does

1. **Extracts LLM weights**: Filters `model.language_model.*` tensors from the VLM
2. **Remaps keys**: Converts to standard LLaMA format
   - `model.language_model.model.*` → `model.*`
   - `model.language_model.lm_head.*` → `lm_head.*`
3. **Creates config**: Generates LLaMA-compatible `config.json` from VLM's `text_config`
4. **Copies tokenizer**: Preserves all tokenizer files unchanged

### Output Structure

```
HyperCLOVAX-SEED-Text-Think-32B/
├── config.json                      # LLaMA config
├── generation_config.json
├── model-00001-of-00013.safetensors # ~5GB shards
├── ...
├── model-00013-of-00013.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── added_tokens.json
├── vocab.json
├── merges.txt
└── chat_template.jinja
```

### Verify Extraction

```bash
# Quick test with vLLM
vllm serve ./HyperCLOVAX-SEED-Text-Think-32B \
    --dtype bfloat16 \
    --tensor-parallel-size 2

# In another terminal
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "./HyperCLOVAX-SEED-Text-Think-32B", "messages": [{"role": "user", "content": "Hello!"}]}'
```

## Acknowledgments

- Original model by [NAVER Cloud HyperCLOVA X](https://huggingface.co/naver-hyperclovax)
- Extraction performed to enable text-only inference without vision dependencies