Update README.md
README.md
@@ -84,4 +84,15 @@ When loading the raw model via transformers then quantizing and saving, transfor
config to be missing critical values (like `tie_word_embeddings`). This was patched in vLLM for InternVL models (https://github.com/vllm-project/vllm/pull/19992), but the issue remains for Skywork and will hopefully be resolved soon.
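
Until that fix lands, one workaround is to copy the dropped keys back into the quantized checkpoint's config by hand. A minimal sketch, assuming placeholder paths for the original and quantized checkpoints:

```python
import json
from pathlib import Path

# Placeholder paths: the original Skywork checkpoint and the quantized save directory.
base_cfg = json.loads(Path("Skywork-R1V-38B/config.json").read_text())
quant_cfg_path = Path("skywork-r1v-quantized/config.json")
quant_cfg = json.loads(quant_cfg_path.read_text())

# Copy back critical keys that can be dropped during quantize-and-save.
for key in ("tie_word_embeddings",):
    if key not in quant_cfg and key in base_cfg:
        quant_cfg[key] = base_cfg[key]

quant_cfg_path.write_text(json.dumps(quant_cfg, indent=2))
```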

## vLLM Reasoning Parsing Issues

See: https://github.com/vllm-project/vllm/pull/21041
See: https://github.com/SkyworkAI/Skywork-R1V/issues/42

Because the Skywork tokenizer does not encode `<think>` and `</think>` as single tokens, vLLM struggles to parse out the reasoning. Additionally, the chat template for Skywork opens the assistant turn with `'<|im_start|>assistant\n<think>\n'`, which already includes the first `<think>`, so your generation output may not contain an opening `<think>` at all and only emit `</think>`. There is ongoing work to add a string-based reasoning parser to vLLM that parses the `<think></think>` markers as strings (multi-token sequences) as a workaround for this issue.
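
Until that parser lands, the reasoning can be recovered client-side by matching the tags as plain strings. A minimal sketch (the helper is ours, not part of vLLM) that also covers the case where the template already consumed the opening `<think>`:

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) by plain string matching."""
    if "</think>" not in text:
        # No closing tag: treat the whole output as the final answer.
        return "", text
    reasoning, _, answer = text.partition("</think>")
    # The chat template already emits the opening <think>, so it is usually
    # absent from the completion; strip it only if the model echoed it.
    reasoning = reasoning.removeprefix("<think>")
    return reasoning.strip(), answer.strip()
```

With vLLM's offline API this would be applied to each completion, e.g. `reasoning, answer = split_reasoning(outputs[0].outputs[0].text)`.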

The Skywork team has mentioned that they will use a single-token `<think>` in the next model version, so this won't be an issue moving forward.

*Quantized with ❤️ using LLM Compressor for the open-source community*