Question about vLLM

#1
by qingy2024 - opened

Hi!

I saw that a while ago you posted this question about trying to run a fine-tuned Mistral-Small-3.2 model in vLLM:
https://discuss.vllm.ai/t/mistral-small-3-2-finetune-errors-out-there-is-no-module-or-parameter-named-language-model-in-llamaforcausallm/1764
I was wondering if you eventually found a fix for it? I am running into the same exact issue.

Thanks

Hey there, and thanks for reaching out.

The simple answer is that, due to this bug, I stopped using vLLM. I did not get it working and nobody was able to help, unfortunately.

I switched to llama.cpp since it just does what it's supposed to do. I'm sure there is a simple fix for vLLM, and that sticking with vLLM would probably have been better, but it's strange that running such a popular language model on such a popular inference engine is this hard, so I didn't want to bother.

Switching to llama.cpp meant a lot of extra work for me. I've since started maintaining a serverless endpoint repository for the engine on the RunPod hub (so if you're using RunPod, check https://console.runpod.io/hub/Jacob-ML/inference-worker). Community support and adoption are also bigger for llama.cpp, so getting help is much easier.

Please keep me updated if you find a fix! That would be of great help.

Good luck :) would be so cool if anyone from vLLM could help with that issue...

Thanks for getting back to me!

It's unfortunate that you didn't find a fix; I've been asking around in a few Discord servers too. I think the issue is that vLLM's implementation for Mistral models expects them all to use the custom Mistral config/weights/tokenizer layout that the official models are uploaded in. Fine-tuned models are usually saved in the more standard HuggingFace/Transformers format, which I think trips up vLLM's model-loading logic.
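To illustrate the difference (this is my guess at the cause, not a confirmed fix): official Mistral releases typically ship `params.json` plus `consolidated.*.safetensors`, while fine-tunes saved via Transformers ship a `config.json`. A quick sketch of a check you could run on your checkpoint directory, assuming those usual file layouts:

```python
from pathlib import Path

def detect_checkpoint_format(model_dir: str) -> str:
    """Guess whether a checkpoint uses the Mistral-native layout
    (params.json) or the standard HuggingFace/Transformers layout
    (config.json). Returns "mistral", "hf", or "unknown"."""
    d = Path(model_dir)
    if (d / "params.json").exists():
        return "mistral"
    if (d / "config.json").exists():
        return "hf"
    return "unknown"
```

If it reports "hf", it might be worth trying vLLM without the Mistral-specific loader flags (`--tokenizer-mode mistral`, `--config-format mistral`, `--load-format mistral`), or conversely adding them if the checkpoint is Mistral-native; I haven't verified that this resolves the `LlamaForCausalLM` error, though.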

Anyways, I'll let you know if I figure something out. In the meantime, I guess llama.cpp is an option...

qingy2024 changed discussion status to closed
