Reproducible leading-space digit token sensitivity in GSM8K-style prompts
Hi, thank you for releasing HyperCLOVAX-SEED-Think-14B.
While reproducing TinyGSM8K / GSM8K-style arithmetic evaluations, I observed a reproducible sensitivity related to leading-space digit single-token forms such as " 1", " 2", " 4", and " 5".
This does not appear to be a prompt text corruption issue. The visible input text, tokenizer decode output, and API payload preserved the original quantities correctly. However, during generation, quantities encoded as leading-space digit single tokens were sometimes reconstructed as unrelated placeholder-like strings or special-token-like text.
Example 1
Input text:
Rory orders 2 subs for $7.50 each, 2 bags of chips for $1.50 each, and 2 cookies for $1.00 each.
Token observation:
" 2" -> 109896
The token appeared three times in the prompt.
Observed generation behavior:
the problem mentions "dog subs", "dog bags of chips", and "dog cookies"
"dog" here must be a variable representing a quantity
Assuming "dog" means a certain quantity ...
Expected answer:
29
Example 2
Input text:
To make 1 liter of juice, Sam needs 5 kilograms of oranges ... make 4 liters ...
Observed token forms:
" 1", " 5", and " 4" were encoded as leading-space digit tokens
Observed generation behavior:
to make<|stop|> liters of juice
Sam needs<|stop|> kilograms of oranges
needs.webdriver kilograms
Expected answer:
60
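Leakage of the kind shown in both examples can be flagged automatically when scanning many generations. The following is only a minimal sketch: the function name and the regular expression are illustrative assumptions, and the pattern only catches strings shaped like `<|name|>`. It would not catch substitutions such as `dog` or `.webdriver`, which still need manual inspection or an answer-accuracy check.

```python
import re

# Assumption: special tokens in this chat format look like <|name|>.
SPECIAL_TOKEN_PATTERN = re.compile(r"<\|[^|<>\s]+\|>")


def find_special_token_leaks(generated_text):
    """Return every special-token-like substring found in the text, in order."""
    return SPECIAL_TOKEN_PATTERN.findall(generated_text)


if __name__ == "__main__":
    # Sample line taken from the observed generation in Example 2.
    print(find_special_token_leaks("to make<|stop|> liters of juice"))
```

Legitimate end-of-turn markers also match this pattern, so the check is most useful on the reasoning or answer body after stop strings have been trimmed.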
Checks performed
I tried to isolate where the issue comes from.
- HF tokenizer encode/decode preserved the visible quantities such as `2 subs`, `2 bags`, and `2 cookies`.
- The lm-eval / OpenAI-compatible API payload preserved the same quantities in `messages[0].content`.
- The vLLM server-generated `prompt_text` and `prompt_token_ids` matched the HF tokenizer output.
- HF Transformers generation reproduced the same type of failure.
- vLLM offline generation with directly supplied HF `prompt_token_ids` produced the same generated token IDs as HF.

Therefore, this does not seem to be caused only by lm-eval payload construction, tokenizer decode corruption, or the vLLM OpenAI wrapper.
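For anyone who wants to repeat these checks, here is a rough sketch of the three comparisons involved: a text round trip, a token-ID comparison between two inference paths, and a count of specific token IDs in a prompt. All function names are hypothetical, and `encode` / `decode` are passed in as plain callables so the helpers do not depend on any particular library.

```python
from collections import Counter


def check_round_trip(text, encode, decode):
    """Return (ok, token_ids); ok is True when decode(encode(text)) == text."""
    token_ids = list(encode(text))
    return decode(token_ids) == text, token_ids


def first_mismatch(ids_a, ids_b):
    """Return the first index where two ID sequences differ, or None if equal."""
    for index, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return index
    if len(ids_a) != len(ids_b):
        # One sequence is a strict prefix of the other.
        return min(len(ids_a), len(ids_b))
    return None


def count_tokens(token_ids, watched_ids):
    """Count how often each watched token ID occurs; absent IDs are omitted."""
    counts = Counter(token_ids)
    return {tid: counts[tid] for tid in watched_ids if counts[tid]}
```

With a Hugging Face tokenizer, `encode` could be something like `lambda s: tokenizer.encode(s, add_special_tokens=False)` and `decode` could be `tokenizer.decode`. `first_mismatch` can then compare the HF prompt IDs against the `prompt_token_ids` reported by the server.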
Mitigation tested
I tested a decode-equivalent prompt token ID rewrite for leading-space digit tokens.
Example:
Original tokenization:
" 2" -> [109896]
Rewritten tokenization:
" 2" -> [220, 17]
decode(original) == decode(rewritten) == " 2"
The visible prompt text is unchanged. Only the prompt token IDs are changed from a leading-space digit single token to a space token plus a digit token.
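As a rough illustration, the rewrite amounts to a single pass over the prompt token IDs. This sketch is not the utility from the reproduction repository; the function and constant names are made up for this example, and the IDs are the ones reported in this post's rewrite map. Decode equivalence should be re-verified against the actual tokenizer before relying on it.

```python
# Leading-space digit single token -> [space token, digit token].
# IDs are taken from the rewrite map reported in this post.
SPACE_DIGIT_REWRITE = {
    109658: [220, 15],  # " 0"
    103647: [220, 16],  # " 1"
    109896: [220, 17],  # " 2"
    103590: [220, 18],  # " 3"
    101709: [220, 19],  # " 4"
    110217: [220, 20],  # " 5"
    105778: [220, 21],  # " 6"
    103650: [220, 22],  # " 7"
    101994: [220, 23],  # " 8"
    102409: [220, 24],  # " 9"
}


def rewrite_space_digit_tokens(token_ids, rewrite_map=SPACE_DIGIT_REWRITE):
    """Replace each mapped single token with its two-token form.

    Token IDs that are not in the map are copied through unchanged.
    """
    rewritten = []
    for token_id in token_ids:
        rewritten.extend(rewrite_map.get(token_id, [token_id]))
    return rewritten


if __name__ == "__main__":
    # 1000 and 2000 are placeholder IDs, not real vocabulary entries.
    print(rewrite_space_digit_tokens([1000, 109896, 2000]))
```

The rewritten list can then be supplied directly as prompt token IDs to an engine that accepts pre-tokenized input, which bypasses re-tokenization of the prompt text.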
This mitigation improved answer stability in GSM8K-style prompts in my reproduction.
The rewrite map used was:
" 0" 109658 -> [220, 15]
" 1" 103647 -> [220, 16]
" 2" 109896 -> [220, 17]
" 3" 103590 -> [220, 18]
" 4" 101709 -> [220, 19]
" 5" 110217 -> [220, 20]
" 6" 105778 -> [220, 21]
" 7" 103650 -> [220, 22]
" 8" 101994 -> [220, 23]
" 9" 102409 -> [220, 24]
Environment
The issue was reproduced in the following setup:
- Model: `naver-hyperclovax/HyperCLOVAX-SEED-Think-14B`
- Evaluation: TinyGSM8K / GSM8K-style arithmetic prompts
- Inference paths tested:
  - Hugging Face Transformers generation
  - vLLM OpenAI-compatible server
  - vLLM offline generation with directly supplied HF `prompt_token_ids`
- Serving / inference engine: vLLM OpenAI-compatible server and vLLM offline engine
- vLLM version observed in response metadata: `vllm-0.22.0-tp4-85d53a4e`
- Tensor parallelism: TP=4
- Decoding for deterministic path checks: greedy / `temperature=0`
- Additional lm-eval runs also reproduced the issue under non-greedy settings
- Stop strings tested: `["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"]`
- `skip_special_tokens`: `false` for special-token leakage checks
- Tokenizer check: HF tokenizer encode/decode preserved the visible prompt text
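To make the deterministic server settings concrete, a request body along these lines could be sent to the OpenAI-compatible chat completions endpoint. This is a sketch under assumptions: `build_chat_request` is a hypothetical helper, and passing `skip_special_tokens` as a top-level field relies on vLLM's extra sampling parameters, which may differ between versions and should be checked against the server's documentation.

```python
import json

MODEL_ID = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
STOP_STRINGS = ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"]


def build_chat_request(question, model=MODEL_ID, stop=STOP_STRINGS):
    """Build a greedy-decoding chat request body for a single user question."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0,  # greedy decoding for deterministic path checks
        "stop": list(stop),
        # Assumed vLLM extension: keep special tokens visible in the output
        # so that leakage can be inspected.
        "skip_special_tokens": False,
    }


if __name__ == "__main__":
    body = build_chat_request("What is 2 + 2?")
    print(json.dumps(body, indent=2))
```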
Question
Is this behavior expected or already known for this model/tokenizer combination?
If there is an official recommended evaluation setup for GSM8K-style arithmetic prompts, especially regarding prompt tokenization, stop strings, and answer extraction, I would appreciate guidance.
If useful, I can also open a model card PR documenting this as a benchmark reproduction caveat.
I also prepared a small public reproduction repository with the summary artifacts, rewrite map, symptom screenshot, and a minimal token-id rewrite utility:
https://github.com/skykang25/hcx-space-digit-repro
This repo does not contain full benchmark outputs or private environment details. It is intended only as a compact reproduction note for this discussion.