---
license: other
license_name: hyperclovax
license_link: https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE
library_name: transformers
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
tags:
- llama
- text-generation
- korean
- reasoning
language:
- ko
- en
pipeline_tag: text-generation
---
# HyperCLOVAX-SEED-Text-Think-32B
**Extracted text-only LLM from [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B)**
This model contains only the language model component extracted from the original Vision-Language Model (VLM). The vision encoder and multimodal projector have been removed, making it a pure text-to-text model compatible with standard LLaMA inference pipelines.
## Model Details
| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM |
| Parameters | ~33B |
| Hidden Size | 5120 |
| Layers | 72 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 24192 |
| Context Length | 128K |
| Vocab Size | 128,256 |
| Precision | bfloat16 |
| RoPE Theta | 50,000,000 |
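The parameter count in the table can be sanity-checked from the other architecture fields. The sketch below assumes an untied `lm_head` (counted separately from the input embeddings) and ignores the small norm layers:

```python
# Back-of-the-envelope parameter count from the table above.
hidden, layers, inter = 5120, 72, 24192
heads, kv_heads, vocab = 40, 8, 128256
head_dim = hidden // heads                       # 128

attn = hidden * hidden * 2                       # q_proj + o_proj
attn += hidden * (kv_heads * head_dim) * 2       # k_proj + v_proj (GQA)
mlp = 3 * hidden * inter                         # gate, up, down projections
per_layer = attn + mlp

total = layers * per_layer + 2 * vocab * hidden  # + embeddings and lm_head
print(f"~{total / 1e9:.1f}B parameters")         # ~32.6B
```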
## What Was Extracted
The original VLM consists of:
- **Vision Encoder**: Qwen2.5-VL based (~600M params) - **removed**
- **MM Projector**: Multimodal projection layers - **removed**
- **Language Model**: HyperCLOVAX LLM (~33B params) - **extracted**
Only the `model.language_model.*` weights were extracted and remapped to standard LLaMA format.
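The remapping amounts to stripping the `model.language_model.` prefix and dropping everything else. A hypothetical sketch (the vision-tower key name is illustrative; the actual `extract_llm.py` may differ):

```python
from typing import Optional

PREFIX = "model.language_model."

def remap_key(vlm_key: str) -> Optional[str]:
    """Map a VLM checkpoint key to standard LLaMA naming; None means drop it."""
    if not vlm_key.startswith(PREFIX):
        return None  # vision encoder / mm projector weights are discarded
    # Stripping the prefix yields "model.*" or "lm_head.*" directly.
    return vlm_key[len(PREFIX):]

print(remap_key("model.language_model.model.layers.0.self_attn.q_proj.weight"))
# -> model.layers.0.self_attn.q_proj.weight
print(remap_key("model.vision_tower.blocks.0.attn.qkv.weight"))  # -> None
```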
## Usage
### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)
messages = [{"role": "user", "content": "What is the capital of South Korea?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With vLLM
```bash
vllm serve minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf \
--dtype bfloat16 \
--tensor-parallel-size 2
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
response = client.chat.completions.create(
    model="minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf",
    messages=[{"role": "user", "content": "안녕하세요! 한국어로 대화할 수 있나요?"}],
)
print(response.choices[0].message.content)
```
## Thinking Mode
The model supports a "thinking mode" for complex reasoning tasks: when triggered via the `<|thinking|>` token, it emits its extended reasoning inside `<|thinking|>...</|thinking|>` blocks before the final answer:
```python
messages = [
    {"role": "user", "content": "Solve this step by step: If x + 2y = 10 and 3x - y = 5, find x and y."}
]
# The model may produce <|thinking|>...</|thinking|> blocks with its reasoning process
```
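When post-processing output, it can be useful to separate the reasoning block from the final answer. A small helper, assuming the `<|thinking|>`/`</|thinking|>` delimiters described above (adjust the pattern if the deployed chat template uses different markers):

```python
import re

def split_thinking(text: str):
    """Return (reasoning, answer); reasoning is "" when no thinking block exists."""
    m = re.search(r"<\|thinking\|>(.*?)</\|thinking\|>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

raw = "<|thinking|>Substitute y = 3x - 5 into x + 2y = 10 ...</|thinking|>The solution is x = 20/7, y = 25/7."
reasoning, answer = split_thinking(raw)
print(answer)  # The solution is x = 20/7, y = 25/7.
```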
## Hardware Requirements
- **Minimum**: 2x NVIDIA A100 40GB (with tensor parallelism)
- **Recommended**: 2x NVIDIA A100 80GB or 4x NVIDIA A6000
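A rough back-of-the-envelope for why two 40 GB cards are the floor. This counts weights only; the KV cache, activations, and CUDA overhead come on top:

```python
# Weights-only memory estimate at bfloat16 (2 bytes per parameter).
params = 33e9
weight_gb = params * 2 / 1024**3
print(f"~{weight_gb:.0f} GB of weights")           # ~61 GB total
print(f"~{weight_gb / 2:.0f} GB per GPU at TP=2")  # ~31 GB -> fits 40 GB cards
```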
## Limitations
- This is a **text-only** model. It cannot process images or videos.
- The model inherits any limitations from the original HyperCLOVAX-SEED-Think-32B.
- Optimized primarily for Korean and English.
## License
This model inherits the [HyperCLOVAX license](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE) from the original model.
## Citation
If you use this model, please cite the original:
```bibtex
@misc{hyperclovax-seed-think-32b,
  title={HyperCLOVA X SEED Think 32B},
  author={NAVER Cloud},
  year={2025},
  url={https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B}
}
```
## Reproduce This Extraction
Want to extract the LLM yourself? Use the included [`extract_llm.py`](extract_llm.py) script.
### Prerequisites
```bash
pip install safetensors torch tqdm huggingface_hub
```
### Step 1: Download Original VLM (~66GB)
```bash
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
--local-dir ./HyperCLOVAX-SEED-Think-32B
```
### Step 2: Run Extraction Script
```bash
# Download the extraction script
wget https://huggingface.co/minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf/resolve/main/extract_llm.py
# Run extraction
python extract_llm.py \
--input ./HyperCLOVAX-SEED-Think-32B \
--output ./HyperCLOVAX-SEED-Text-Think-32B
```
### What the Script Does
1. **Extracts LLM weights**: Filters `model.language_model.*` tensors from the VLM
2. **Remaps keys**: Converts to standard LLaMA format
   - `model.language_model.model.*` → `model.*`
   - `model.language_model.lm_head.*` → `lm_head.*`
3. **Creates config**: Generates LLaMA-compatible `config.json` from VLM's `text_config`
4. **Copies tokenizer**: Preserves all tokenizer files unchanged
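Step 3 can be sketched as lifting the nested `text_config` into a standalone LLaMA config. The field names in the toy input are illustrative; the real script may carry over more keys:

```python
import json

def make_llm_config(vlm_config: dict) -> dict:
    """Build a standalone LLaMA config.json from the VLM's nested text_config."""
    cfg = dict(vlm_config["text_config"])
    cfg["architectures"] = ["LlamaForCausalLM"]
    cfg["model_type"] = "llama"
    return cfg

vlm = {"model_type": "hyperclovax_vlm",  # outer VLM config (illustrative)
       "text_config": {"hidden_size": 5120, "num_hidden_layers": 72}}
print(json.dumps(make_llm_config(vlm), indent=2))
```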
### Output Structure
```
HyperCLOVAX-SEED-Text-Think-32B/
├── config.json # LLaMA config
├── generation_config.json
├── model-00001-of-00013.safetensors # ~5GB shards
├── ...
├── model-00013-of-00013.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── added_tokens.json
├── vocab.json
├── merges.txt
└── chat_template.jinja
```
### Verify Extraction
```bash
# Quick test with vLLM
vllm serve ./HyperCLOVAX-SEED-Text-Think-32B \
--dtype bfloat16 \
--tensor-parallel-size 2
# In another terminal
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "./HyperCLOVAX-SEED-Text-Think-32B", "messages": [{"role": "user", "content": "Hello!"}]}'
```
## Acknowledgments
- Original model by [NAVER Cloud HyperCLOVA X](https://huggingface.co/naver-hyperclovax)
- Extraction performed to enable text-only inference without vision dependencies