Model response is not generating opening <think> tag
#3
by bmiles
Hi, I'm having an issue where the model response is not generating the opening `<think>` tag. I am running the model with vLLM and Docker, see below:
```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4 \
  --tensor-parallel-size 1 \
  --max-num-seqs 64 \
  --max-model-len 131072 \
  --trust-remote-code \
  --mamba_ssm_cache_dtype float32
```
Example request and response:
```bash
curl -s http://xxx:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4",
    "messages": [
      {"role": "system", "content": "/think"},
      {"role": "user", "content": "Hello"}
    ],
    "add_generation_prompt": true
  }' | jq -r '.choices[0].message.content'
```
```
Okay, the user just said "Hello". I should respond politely. Let me greet them back and ask how I can assist. Keep it friendly and open-ended.
</think>
Hello! How can I assist you today?
```
I also see the following in the server logs:
```
(APIServer pid=1) INFO 11-23 09:10:02 [chat_utils.py:560] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
```
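In case it's useful to anyone hitting the same thing, here is a minimal client-side workaround sketch I'm using for now. It assumes the opening tag is being consumed by the chat template as part of the generation prompt (my guess, not confirmed), so the client just re-adds it when the block closes without ever opening. `xxx` is the placeholder host from above, and the `openai` Python package is assumed.

```python
from openai import OpenAI

# Point the standard OpenAI client at the vLLM server (placeholder host).
client = OpenAI(base_url="http://xxx:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4",
    messages=[
        {"role": "system", "content": "/think"},
        {"role": "user", "content": "Hello"},
    ],
)

content = resp.choices[0].message.content

# The reasoning block closes with </think> but never opens, so assume the
# opening tag lives in the generation prompt and re-attach it client-side.
if "</think>" in content and "<think>" not in content:
    content = "<think>\n" + content

print(content)
```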