---
license: other
license_name: hyperclovax
license_link: https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE
library_name: transformers
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
tags:
- llama
- text-generation
- korean
- reasoning
language:
- ko
- en
pipeline_tag: text-generation
---
# HyperCLOVAX-SEED-Text-Think-32B
**Extracted text-only LLM from [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B)**
This model contains only the language model component extracted from the original Vision-Language Model (VLM). The vision encoder and multimodal projector have been removed, making it a pure text-to-text model compatible with standard LLaMA inference pipelines.
## Model Details
| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM |
| Parameters | ~33B |
| Hidden Size | 5120 |
| Layers | 72 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 24192 |
| Context Length | 128K |
| Vocab Size | 128,256 |
| Precision | bfloat16 |
| RoPE Theta | 50,000,000 |
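The parameter count in the table can be sanity-checked from the other architecture fields. The sketch below assumes an untied `lm_head` (counted separately from the input embeddings) and ignores the small norm layers:

```python
# Back-of-the-envelope parameter count from the table above.
hidden, layers, inter = 5120, 72, 24192
heads, kv_heads, vocab = 40, 8, 128256
head_dim = hidden // heads                       # 128

attn = hidden * hidden * 2                       # q_proj + o_proj
attn += hidden * (kv_heads * head_dim) * 2       # k_proj + v_proj (GQA)
mlp = 3 * hidden * inter                         # gate, up, down projections
per_layer = attn + mlp

total = layers * per_layer + 2 * vocab * hidden  # + embeddings and lm_head
print(f"~{total / 1e9:.1f}B parameters")         # ~32.6B
```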
## What Was Extracted
The original VLM consists of:
- **Vision Encoder**: Qwen2.5-VL based (~600M params) - **removed**
- **MM Projector**: Multimodal projection layers - **removed**
- **Language Model**: HyperCLOVAX LLM (~33B params) - **extracted**
Only the `model.language_model.*` weights were extracted and remapped to standard LLaMA format.
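The remapping amounts to stripping the `model.language_model.` prefix and dropping everything else. A hypothetical sketch (the vision-tower key name is illustrative; the actual `extract_llm.py` may differ):

```python
from typing import Optional

PREFIX = "model.language_model."

def remap_key(vlm_key: str) -> Optional[str]:
    """Map a VLM checkpoint key to standard LLaMA naming; None means drop it."""
    if not vlm_key.startswith(PREFIX):
        return None  # vision encoder / mm projector weights are discarded
    # Stripping the prefix yields "model.*" or "lm_head.*" directly.
    return vlm_key[len(PREFIX):]

print(remap_key("model.language_model.model.layers.0.self_attn.q_proj.weight"))
# -> model.layers.0.self_attn.q_proj.weight
print(remap_key("model.vision_tower.blocks.0.attn.qkv.weight"))  # -> None
```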
## Usage
### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)
messages = [{"role": "user", "content": "What is the capital of South Korea?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With vLLM
```bash
vllm serve minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf \
--dtype bfloat16 \
--tensor-parallel-size 2
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
response = client.chat.completions.create(
    model="minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf",
    messages=[{"role": "user", "content": "안녕하세요! 한국어로 대화할 수 있나요?"}],
)
print(response.choices[0].message.content)
```
## Thinking Mode
The model supports a "thinking mode" for complex reasoning tasks: when triggered via the `<|thinking|>` token, it emits its extended reasoning inside `<|thinking|>...</|thinking|>` blocks before the final answer:
```python
messages = [
    {"role": "user", "content": "Solve this step by step: If x + 2y = 10 and 3x - y = 5, find x and y."}
]
# The model may produce <|thinking|>...</|thinking|> blocks with its reasoning process
```
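When post-processing output, it can be useful to separate the reasoning block from the final answer. A small helper, assuming the `<|thinking|>`/`</|thinking|>` delimiters described above (adjust the pattern if the deployed chat template uses different markers):

```python
import re

def split_thinking(text: str):
    """Return (reasoning, answer); reasoning is "" when no thinking block exists."""
    m = re.search(r"<\|thinking\|>(.*?)</\|thinking\|>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

raw = "<|thinking|>Substitute y = 3x - 5 into x + 2y = 10 ...</|thinking|>The solution is x = 20/7, y = 25/7."
reasoning, answer = split_thinking(raw)
print(answer)  # The solution is x = 20/7, y = 25/7.
```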
## Hardware Requirements
- **Minimum**: 2x NVIDIA A100 40GB (with tensor parallelism)
- **Recommended**: 2x NVIDIA A100 80GB or 4x NVIDIA A6000
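A rough back-of-the-envelope for why two 40 GB cards are the floor. This counts weights only; the KV cache, activations, and CUDA overhead come on top:

```python
# Weights-only memory estimate at bfloat16 (2 bytes per parameter).
params = 33e9
weight_gb = params * 2 / 1024**3
print(f"~{weight_gb:.0f} GB of weights")           # ~61 GB total
print(f"~{weight_gb / 2:.0f} GB per GPU at TP=2")  # ~31 GB -> fits 40 GB cards
```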
## Limitations
- This is a **text-only** model. It cannot process images or videos.
- The model inherits any limitations from the original HyperCLOVAX-SEED-Think-32B.
- Optimized primarily for Korean and English.
## License
This model inherits the [HyperCLOVAX license](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE) from the original model.
## Citation
If you use this model, please cite the original:
```bibtex
@misc{hyperclovax-seed-think-32b,
  title={HyperCLOVA X SEED Think 32B},
  author={NAVER Cloud},
  year={2025},
  url={https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B}
}
```
## Reproduce This Extraction
Want to extract the LLM yourself? Use the included [`extract_llm.py`](extract_llm.py) script.
### Prerequisites
```bash
pip install safetensors torch tqdm huggingface_hub
```
### Step 1: Download Original VLM (~66GB)
```bash
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
--local-dir ./HyperCLOVAX-SEED-Think-32B
```
### Step 2: Run Extraction Script
```bash
# Download the extraction script
wget https://huggingface.co/minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf/resolve/main/extract_llm.py
# Run extraction
python extract_llm.py \
--input ./HyperCLOVAX-SEED-Think-32B \
--output ./HyperCLOVAX-SEED-Text-Think-32B
```
### What the Script Does
1. **Extracts LLM weights**: Filters `model.language_model.*` tensors from the VLM
2. **Remaps keys**: Converts to standard LLaMA format
   - `model.language_model.model.*` → `model.*`
   - `model.language_model.lm_head.*` → `lm_head.*`
3. **Creates config**: Generates LLaMA-compatible `config.json` from VLM's `text_config`
4. **Copies tokenizer**: Preserves all tokenizer files unchanged
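Step 3 can be sketched as lifting the nested `text_config` into a standalone LLaMA config. The field names in the toy input are illustrative; the real script may carry over more keys:

```python
import json

def make_llm_config(vlm_config: dict) -> dict:
    """Build a standalone LLaMA config.json from the VLM's nested text_config."""
    cfg = dict(vlm_config["text_config"])
    cfg["architectures"] = ["LlamaForCausalLM"]
    cfg["model_type"] = "llama"
    return cfg

vlm = {"model_type": "hyperclovax_vlm",  # outer VLM config (illustrative)
       "text_config": {"hidden_size": 5120, "num_hidden_layers": 72}}
print(json.dumps(make_llm_config(vlm), indent=2))
```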
### Output Structure
```
HyperCLOVAX-SEED-Text-Think-32B/
├── config.json # LLaMA config
├── generation_config.json
├── model-00001-of-00013.safetensors # ~5GB shards
├── ...
├── model-00013-of-00013.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── added_tokens.json
├── vocab.json
├── merges.txt
└── chat_template.jinja
```
### Verify Extraction
```bash
# Quick test with vLLM
vllm serve ./HyperCLOVAX-SEED-Text-Think-32B \
--dtype bfloat16 \
--tensor-parallel-size 2
# In another terminal
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "./HyperCLOVAX-SEED-Text-Think-32B", "messages": [{"role": "user", "content": "Hello!"}]}'
```
## Acknowledgments
- Original model by [NAVER Cloud HyperCLOVA X](https://huggingface.co/naver-hyperclovax)
- Extraction performed to enable text-only inference without vision dependencies