Tool Calling issue with stelterlab/Mistral-Small-24B-Instruct-2501-AWQ

#4
by sbhatt765 - opened

I am getting a 400 error while trying to get tool calling working with this model.

Here is the docker config:

```shell
sudo docker run --runtime nvidia --gpus all \
  -v /data/Mistral-Small-24B-Instruct-2501-AWQ:/models/Mistral-Small-24B-Instruct-2501-AWQ \
  -p 8085:8085 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model /models/Mistral-Small-24B-Instruct-2501-AWQ \
  --gpu-memory-utilization 0.85 \
  --max-model-len 4096 \
  --tokenizer-mode mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --enforce-eager \
  --dtype half
```

Let me know if the vLLM config looks alright. I am not using the `--chat-template` flag (I get the same error with it as well).

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="/models/Mistral-Small-24B-Instruct-2501-AWQ",
    base_url="http://localhost:8085/v1",
    api_key="EMPTY",
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "family": "unknown",
    },
)
```

Hi!

You might get a faster response if you ask at the source (of the model) or in the vLLM project's discussions.

Try calling it with:

```shell
--tokenizer mistralai/Mistral-Small-24B-Instruct-2501
```

See also https://github.com/vllm-project/vllm/discussions/12749
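To narrow down where the 400 comes from, it can help to send a tool-call request straight to the vLLM server with the plain `openai` SDK, bypassing the AutoGen wrapper. A minimal sketch, assuming the server from the command above is reachable on localhost:8085 (`get_weather` is a hypothetical example tool, not anything from the model or vLLM):

```python
# Hypothetical example tool schema in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # made-up tool for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def request_tool_call(base_url="http://localhost:8085/v1"):
    # Imported lazily so the schema above can be inspected without the SDK.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    # Same model path and tool_choice mode as in the post above.
    return client.chat.completions.create(
        model="/models/Mistral-Small-24B-Instruct-2501-AWQ",
        messages=[{"role": "user", "content": "What is the weather in Paris?"}],
        tools=tools,
        tool_choice="auto",
    )
```

If this bare request also returns 400, the problem is on the server side (tokenizer mode, tool-call parser, or chat template); if it succeeds, the AutoGen client configuration is the next place to look.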
