---
license: other
license_name: hyperclovax
license_link: https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE
library_name: transformers
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
tags:
- llama
- text-generation
- korean
- reasoning
language:
- ko
- en
pipeline_tag: text-generation
---

# HyperCLOVAX-SEED-Text-Think-32B

**Extracted text-only LLM from [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B)**

This model contains only the language-model component extracted from the original Vision-Language Model (VLM). The vision encoder and multimodal projector have been removed, making it a pure text-to-text model compatible with standard LLaMA inference pipelines.

## Model Details

| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM |
| Parameters | ~33B |
| Hidden Size | 5120 |
| Layers | 72 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 24192 |
| Context Length | 128K |
| Vocab Size | 128,256 |
| Precision | bfloat16 |
| RoPE Theta | 50,000,000 |

## What Was Extracted

The original VLM consists of:

- **Vision Encoder**: Qwen2.5-VL based (~600M params) - **removed**
- **MM Projector**: Multimodal projection layers - **removed**
- **Language Model**: HyperCLOVAX LLM (~33B params) - **extracted** ✓

Only the `model.language_model.*` weights were extracted and remapped to the standard LLaMA format.
## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the capital of South Korea?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
)

outputs = model.generate(inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With vLLM

```bash
vllm serve minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf \
  --dtype bfloat16 \
  --tensor-parallel-size 2
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf",
    # "Hello! Can you converse in Korean?"
    messages=[{"role": "user", "content": "안녕하세요! 한국어로 대화할 수 있나요?"}],
)
print(response.choices[0].message.content)
```

## Thinking Mode

The model supports a "thinking mode" for complex reasoning tasks. Use the `<|thinking|>` token to trigger extended reasoning:

```python
messages = [
    {
        "role": "user",
        "content": "Solve this step by step: If x + 2y = 10 and 3x - y = 5, find x and y.",
    }
]
# The model may produce <|thinking|>... blocks with its reasoning process
```

## Hardware Requirements

- **Minimum**: 2x NVIDIA A100 40GB (with tensor parallelism)
- **Recommended**: 2x NVIDIA A100 80GB or 4x NVIDIA A6000

## Limitations

- This is a **text-only** model. It cannot process images or videos.
- The model inherits any limitations of the original HyperCLOVAX-SEED-Think-32B.
- Optimized primarily for Korean and English.

## License

This model inherits the [HyperCLOVAX license](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE) from the original model.
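When consuming raw thinking-mode completions, the reasoning blocks described above usually need to be separated from the final answer. A minimal post-processing sketch: the card documents the `<|thinking|>` token, but the closing marker `<|/thinking|>` used here is an assumption; verify the real delimiter against the model's `chat_template.jinja` and special-token files.

```python
import re

# Assumed delimiters: `<|thinking|>` comes from the model card;
# the closing `<|/thinking|>` marker is an assumption, not confirmed by it.
THINK_RE = re.compile(r"<\|thinking\|>(.*?)<\|/thinking\|>", re.DOTALL)


def split_thinking(text: str) -> tuple[list[str], str]:
    """Separate reasoning blocks from the final answer text."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer


raw = (
    "<|thinking|>Substitute y = 3x - 5 into the first equation."
    "<|/thinking|>x = 20/7 and y = 25/7."
)
thoughts, answer = split_thinking(raw)
print(thoughts)  # ['Substitute y = 3x - 5 into the first equation.']
print(answer)    # x = 20/7 and y = 25/7.
```

Keeping the thoughts in a separate list lets an application log or display the reasoning on demand while showing only the answer by default.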
## Citation

If you use this model, please cite the original:

```bibtex
@misc{hyperclovax-seed-think-32b,
  title={HyperCLOVA X SEED Think 32B},
  author={NAVER Cloud},
  year={2025},
  url={https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B}
}
```

## Reproduce This Extraction

Want to extract the LLM yourself? Use the included [`extract_llm.py`](extract_llm.py) script.

### Prerequisites

```bash
pip install safetensors torch tqdm huggingface_hub
```

### Step 1: Download the Original VLM (~66GB)

```bash
huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
  --local-dir ./HyperCLOVAX-SEED-Think-32B
```

### Step 2: Run the Extraction Script

```bash
# Download the extraction script
wget https://huggingface.co/minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf/resolve/main/extract_llm.py

# Run extraction
python extract_llm.py \
  --input ./HyperCLOVAX-SEED-Think-32B \
  --output ./HyperCLOVAX-SEED-Text-Think-32B
```

### What the Script Does

1. **Extracts LLM weights**: Filters `model.language_model.*` tensors from the VLM
2. **Remaps keys**: Converts to the standard LLaMA format
   - `model.language_model.model.*` → `model.*`
   - `model.language_model.lm_head.*` → `lm_head.*`
3. **Creates config**: Generates a LLaMA-compatible `config.json` from the VLM's `text_config`
4. **Copies tokenizer**: Preserves all tokenizer files unchanged

### Output Structure

```
HyperCLOVAX-SEED-Text-Think-32B/
├── config.json                        # LLaMA config
├── generation_config.json
├── model-00001-of-00013.safetensors   # ~5GB shards
├── ...
├── model-00013-of-00013.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── added_tokens.json
├── vocab.json
├── merges.txt
└── chat_template.jinja
```

### Verify Extraction

```bash
# Quick test with vLLM
vllm serve ./HyperCLOVAX-SEED-Text-Think-32B \
  --dtype bfloat16 \
  --tensor-parallel-size 2

# In another terminal
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "./HyperCLOVAX-SEED-Text-Think-32B", "messages": [{"role": "user", "content": "Hello!"}]}'
```

## Acknowledgments

- Original model by [NAVER Cloud HyperCLOVA X](https://huggingface.co/naver-hyperclovax)
- Extraction performed to enable text-only inference without vision dependencies
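The verification step above requires GPUs; a lighter sanity check is to compare the generated `config.json` against the Model Details table. A minimal sketch, assuming the standard transformers LLaMA config field names; the inline dict stands in for `json.load(open(".../config.json"))` so the check can be read without the actual files.

```python
import json


def check_config(cfg: dict) -> None:
    """Assert the extracted config matches the Model Details table."""
    assert cfg["architectures"] == ["LlamaForCausalLM"]
    assert cfg["hidden_size"] == 5120
    assert cfg["num_hidden_layers"] == 72
    assert cfg["num_attention_heads"] == 40
    assert cfg["num_key_value_heads"] == 8  # GQA
    assert cfg["intermediate_size"] == 24192
    assert cfg["vocab_size"] == 128256
    assert cfg["rope_theta"] == 50_000_000
    assert cfg["torch_dtype"] == "bfloat16"


# In practice:
#   check_config(json.load(open("HyperCLOVAX-SEED-Text-Think-32B/config.json")))
# Inline example with the expected values from the table:
check_config({
    "architectures": ["LlamaForCausalLM"],
    "hidden_size": 5120,
    "num_hidden_layers": 72,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "intermediate_size": 24192,
    "vocab_size": 128256,
    "rope_theta": 50_000_000,
    "torch_dtype": "bfloat16",
})
print("config OK")
```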