|
|
--- |
|
|
license: other |
|
|
license_name: hyperclovax |
|
|
license_link: https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE |
|
|
library_name: transformers |
|
|
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B |
|
|
tags: |
|
|
- llama |
|
|
- text-generation |
|
|
- korean |
|
|
- reasoning |
|
|
language: |
|
|
- ko |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# HyperCLOVAX-SEED-Text-Think-32B |
|
|
|
|
|
**Extracted text-only LLM from [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B)** |
|
|
|
|
|
This model contains only the language model component extracted from the original Vision-Language Model (VLM). The vision encoder and multimodal projector have been removed, making it a pure text-to-text model compatible with standard LLaMA inference pipelines. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
| Property | Value | |
|
|
|----------|-------| |
|
|
| Architecture | LlamaForCausalLM | |
|
|
| Parameters | ~33B | |
|
|
| Hidden Size | 5120 | |
|
|
| Layers | 72 | |
|
|
| Attention Heads | 40 | |
|
|
| KV Heads | 8 (GQA) | |
|
|
| Intermediate Size | 24192 | |
|
|
| Context Length | 128K | |
|
|
| Vocab Size | 128,256 | |
|
|
| Precision | bfloat16 | |
|
|
| RoPE Theta | 50,000,000 | |
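
A quick way to sanity-check this table against the shipped `config.json` (using the repo id from the Usage section below; field names follow the standard `LlamaConfig`):

```python
from transformers import AutoConfig

# Loads only the config; no weights are downloaded
cfg = AutoConfig.from_pretrained("minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf")
assert cfg.hidden_size == 5120
assert cfg.num_hidden_layers == 72
assert cfg.num_attention_heads == 40
assert cfg.num_key_value_heads == 8  # GQA
assert cfg.intermediate_size == 24192
assert cfg.vocab_size == 128256
print(cfg.rope_theta, cfg.max_position_embeddings)
```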
|
|
|
|
|
## What Was Extracted |
|
|
|
|
|
The original VLM consists of: |
|
|
- **Vision Encoder**: Qwen2.5-VL-based (~600M params) - **removed**
|
|
- **MM Projector**: Multimodal projection layers - **removed** |
|
|
- **Language Model**: HyperCLOVAX LLM (~33B params) - **extracted** ✓ |
|
|
|
|
|
Only the `model.language_model.*` weights were extracted and remapped to standard LLaMA format. |
|
|
|
|
|
## Usage |
|
|
|
|
|
### With Transformers |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_id = "minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf" |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_id, |
|
|
torch_dtype="bfloat16", |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
messages = [{"role": "user", "content": "What is the capital of South Korea?"}] |
|
|
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True) |
|
|
outputs = model.generate(inputs.to(model.device), max_new_tokens=512) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
|
|
|
|
|
### With vLLM |
|
|
|
|
|
```bash |
|
|
vllm serve minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf \ |
|
|
--dtype bfloat16 \ |
|
|
--tensor-parallel-size 2 |
|
|
``` |
|
|
|
|
|
```python |
|
|
from openai import OpenAI |
|
|
|
|
|
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy") |
|
|
response = client.chat.completions.create( |
|
|
model="minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf", |
|
|
messages=[{"role": "user", "content": "안녕하세요! 한국어로 대화할 수 있나요?"}] |
|
|
) |
|
|
print(response.choices[0].message.content) |
|
|
``` |
|
|
|
|
|
## Thinking Mode |
|
|
|
|
|
The model supports a "thinking mode" for complex reasoning tasks. Use the `<|thinking|>` token to trigger extended reasoning: |
|
|
|
|
|
```python
# Reuses the tokenizer and model loaded in the Transformers example above
messages = [
    {"role": "user", "content": "Solve this step by step: If x + 2y = 10 and 3x - y = 5, find x and y."}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=1024)
# Decode with special tokens kept, so any <|thinking|>...</|thinking|> reasoning block stays visible
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
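
If only the final answer is needed, the reasoning span can be stripped after decoding. A minimal sketch, assuming the `<|thinking|>...</|thinking|>` delimiters shown above:

```python
import re

def strip_thinking(text: str) -> str:
    # Remove <|thinking|>...</|thinking|> spans, keeping only the final answer
    return re.sub(r"<\|thinking\|>.*?</\|thinking\|>", "", text, flags=re.DOTALL).strip()
```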
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
- **Minimum**: 2x NVIDIA A100 40GB (with tensor parallelism) |
|
|
- **Recommended**: 2x NVIDIA A100 80GB or 4x NVIDIA A6000 |
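
These figures follow from the weight footprint alone: ~33B bfloat16 parameters occupy roughly 66 GB before any KV cache or activation overhead, so the weights must be sharded across at least two 40 GB GPUs. A back-of-the-envelope check:

```python
# Weight memory = parameter count x bytes per parameter (bfloat16 = 2 bytes)
params = 33e9
print(f"~{params * 2 / 1e9:.0f} GB of weights")  # ~66 GB, before KV cache and activations
```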
|
|
|
|
|
## Limitations |
|
|
|
|
|
- This is a **text-only** model. It cannot process images or videos. |
|
|
- The model inherits any limitations from the original HyperCLOVAX-SEED-Think-32B. |
|
|
- Optimized primarily for Korean and English. |
|
|
|
|
|
## License |
|
|
|
|
|
This model inherits the [HyperCLOVAX license](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE) from the original model. |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite the original: |
|
|
|
|
|
```bibtex |
|
|
@misc{hyperclovax-seed-think-32b, |
|
|
title={HyperCLOVA X SEED Think 32B}, |
|
|
author={NAVER Cloud}, |
|
|
year={2025}, |
|
|
url={https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Reproduce This Extraction |
|
|
|
|
|
Want to extract the LLM yourself? Use the included [`extract_llm.py`](extract_llm.py) script. |
|
|
|
|
|
### Prerequisites |
|
|
|
|
|
```bash |
|
|
pip install safetensors torch tqdm huggingface_hub |
|
|
``` |
|
|
|
|
|
### Step 1: Download Original VLM (~66GB) |
|
|
|
|
|
```bash |
|
|
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \ |
|
|
--local-dir ./HyperCLOVAX-SEED-Think-32B |
|
|
``` |
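
The same download also works from Python via `huggingface_hub` (installed in the prerequisites):

```python
from huggingface_hub import snapshot_download

# Fetches all shards (~66GB) into the directory the extraction step expects
snapshot_download(
    repo_id="naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
    local_dir="./HyperCLOVAX-SEED-Think-32B",
)
```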
|
|
|
|
|
### Step 2: Run Extraction Script |
|
|
|
|
|
```bash |
|
|
# Download the extraction script |
|
|
wget https://huggingface.co/minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf/resolve/main/extract_llm.py |
|
|
|
|
|
# Run extraction |
|
|
python extract_llm.py \ |
|
|
--input ./HyperCLOVAX-SEED-Think-32B \ |
|
|
--output ./HyperCLOVAX-SEED-Text-Think-32B |
|
|
``` |
|
|
|
|
|
### What the Script Does |
|
|
|
|
|
1. **Extracts LLM weights**: Filters `model.language_model.*` tensors from the VLM |
|
|
2. **Remaps keys**: Converts to standard LLaMA format (a minimal sketch follows this list)
|
|
- `model.language_model.model.*` → `model.*` |
|
|
- `model.language_model.lm_head.*` → `lm_head.*` |
|
|
3. **Creates config**: Generates LLaMA-compatible `config.json` from VLM's `text_config` |
|
|
4. **Copies tokenizer**: Preserves all tokenizer files unchanged |
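
For reference, the filtering and remapping in step 2 reduces to a prefix rewrite per tensor key. A minimal sketch (the shard filename and helper are illustrative; `extract_llm.py` additionally handles re-sharding and the index file):

```python
from safetensors.torch import load_file, save_file

def remap_key(key: str) -> str | None:
    # Keep only language-model tensors; vision encoder and projector keys map to None
    if key.startswith("model.language_model.lm_head."):
        return key.replace("model.language_model.lm_head.", "lm_head.", 1)
    if key.startswith("model.language_model.model."):
        return key.replace("model.language_model.model.", "model.", 1)
    return None

tensors = load_file("model-00001-of-00028.safetensors")  # one VLM shard (illustrative name)
extracted = {new: t for k, t in tensors.items() if (new := remap_key(k)) is not None}
save_file(extracted, "extracted-00001.safetensors")
```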
|
|
|
|
|
### Output Structure |
|
|
|
|
|
``` |
|
|
HyperCLOVAX-SEED-Text-Think-32B/ |
|
|
├── config.json # LLaMA config |
|
|
├── generation_config.json |
|
|
├── model-00001-of-00013.safetensors # ~5GB shards |
|
|
├── ... |
|
|
├── model-00013-of-00013.safetensors |
|
|
├── model.safetensors.index.json |
|
|
├── tokenizer.json |
|
|
├── tokenizer_config.json |
|
|
├── special_tokens_map.json |
|
|
├── added_tokens.json |
|
|
├── vocab.json |
|
|
├── merges.txt |
|
|
└── chat_template.jinja |
|
|
``` |
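
A quick structural check on the result, assuming the layout above:

```python
import json

# Verify that no vision or nested language_model keys survived the extraction
with open("HyperCLOVAX-SEED-Text-Think-32B/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

assert not any("vision" in k or "language_model" in k for k in weight_map)
print(f"{len(weight_map)} tensors, all in standard LLaMA layout")
```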
|
|
|
|
|
### Verify Extraction |
|
|
|
|
|
```bash |
|
|
# Quick test with vLLM |
|
|
vllm serve ./HyperCLOVAX-SEED-Text-Think-32B \ |
|
|
--dtype bfloat16 \ |
|
|
--tensor-parallel-size 2 |
|
|
|
|
|
# In another terminal |
|
|
curl http://localhost:8000/v1/chat/completions \ |
|
|
-H "Content-Type: application/json" \ |
|
|
-d '{"model": "./HyperCLOVAX-SEED-Text-Think-32B", "messages": [{"role": "user", "content": "Hello!"}]}' |
|
|
``` |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
- Original model by [NAVER Cloud HyperCLOVA X](https://huggingface.co/naver-hyperclovax) |
|
|
- Extraction performed to enable text-only inference without vision dependencies |
|
|
|