Comment about config issues
README.md CHANGED
@@ -73,4 +73,13 @@ torch==2.7.0
vllm==0.9.2
```

+## vLLM Workaround for FP8
+See: https://github.com/vllm-project/vllm/issues/19876
+
+Currently, the Skywork chat config (https://github.com/vllm-project/vllm/blob/e8cc53af5e17205470c04f442e67f276e08623a1/vllm/transformers_utils/configs/skyworkr1v.py#L14)
+uses a custom config class (not a standard transformers AutoConfig), so it does not pick up all the default values that an AutoConfig provides.
+When the raw model is loaded via transformers and then quantized and saved, transformers does not write default values to the config, leaving the saved
+config missing critical values (such as tie_word_embeddings). This was patched in vLLM for InternVL models (https://github.com/vllm-project/vllm/pull/19992) but
+still affects Skywork, and will hopefully be resolved soon.
+
*Quantized with ❤️ using LLM Compressor for the open-source community*
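Until that fix lands, a practical workaround is to patch the missing fields into the quantized checkpoint's config.json before serving it: transformers' save_pretrained only serializes values that differ from the config class defaults, so fields the custom config never set are simply absent from the file. The sketch below is a minimal example of such a patch; the checkpoint path is hypothetical, and the False value for tie_word_embeddings is only a placeholder that must be checked against the original (unquantized) model's config.

```python
import json
from pathlib import Path

# Hypothetical local path to the quantized checkpoint directory.
ckpt = Path("Skywork-R1V-FP8")
cfg_path = ckpt / "config.json"

cfg = json.loads(cfg_path.read_text())

# Restore fields that were dropped when the custom config was saved.
# Placeholder value: copy the real one from the original model's config.json.
cfg.setdefault("tie_word_embeddings", False)

cfg_path.write_text(json.dumps(cfg, indent=2))
print(f"patched {cfg_path}")
```

Using setdefault keeps the patch idempotent: if a future checkpoint already carries the field, re-running the script leaves it untouched.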