Tool hallucinations and bad tool calls
When I run this model (or the bf16 version), I get hallucinated tools, and the tool calls show up in the output stream.
This is an example of what it returns when I ask what tools it has:
- File operations: read, write, and search through files in the project
- Terminal/bash: execute shell commands
- Web browsing/searching: only when explicitly requested
It has none of these tools.
This is what the output looks like when it attempts to call one of these non-existent tools:
<minimax:tool_call>
<invoke name="cli-mcp-server_run_command">
<parameter name="command">ls -la</parameter>
</invoke>
</minimax:tool_call>
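As a client-side workaround while this gets sorted out, the leaked markup can be parsed out of the content stream. This is just a sketch based on the tag names visible in the output above (it is not vLLM's `minimax_m2` parser, and the format may differ in edge cases):

```python
import re

# Raw tool-call markup as it leaks into the content stream
# (format copied from the example output above).
raw = """<minimax:tool_call>
<invoke name="cli-mcp-server_run_command">
<parameter name="command">ls -la</parameter>
</invoke>
</minimax:tool_call>"""

TOOL_CALL_RE = re.compile(r"<minimax:tool_call>(.*?)</minimax:tool_call>", re.DOTALL)
INVOKE_RE = re.compile(r'<invoke name="([^"]+)">(.*?)</invoke>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)

def extract_tool_calls(text):
    """Split leaked markup into (cleaned_text, list of {name, arguments})."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in INVOKE_RE.findall(block):
            args = dict(PARAM_RE.findall(body))
            calls.append({"name": name, "arguments": args})
    cleaned = TOOL_CALL_RE.sub("", text).strip()
    return cleaned, calls

cleaned, calls = extract_tool_calls(raw)
print(calls)
# [{'name': 'cli-mcp-server_run_command', 'arguments': {'command': 'ls -la'}}]
```

If the parser were working server-side, these would arrive as structured `tool_calls` entries instead of text.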
Care to also share your vllm command line, so I can get a proper understanding of what's going on? I have used the model extensively, and aside from the infamous "plurals" bug, this quant of mratsim's has been flawless so far!
This is my fp8 config; the `--kv-cache-dtype` flag is removed for bf16:
SAMPLER_OVERRIDE='{"temperature": 1, "top_p": 0.95, "top_k": 40, "repetition_penalty": 1.1, "frequency_penalty": 0.40}'
vllm serve /models/MiniMax-M2.5-FP8-INT4-AWQ \
--gpu-memory-utilization 0.92 \
--enable-expert-parallel \
--kv-cache-dtype fp8_e4m3 \
--override-generation-config "${SAMPLER_OVERRIDE}" \
--max-model-len 195000 \
--async-scheduling \
--no-enable-prefix-caching \
--tensor-parallel-size 4 \
--enable-auto-tool-choice --tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2 \
--served-model-name minimax-m2.5 \
--compilation-config '{"pass_config":{"fuse_allreduce_rms":false}}' \
--port 9000
Running on 4x H100.
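One quick sanity check worth doing: a malformed `--override-generation-config` string can fail at server startup, so it's cheap to validate the JSON locally first. A minimal sketch using the override string from the command above:

```python
import json

# Same string passed via --override-generation-config above;
# json.loads raises ValueError if the quoting/syntax is broken.
SAMPLER_OVERRIDE = '{"temperature": 1, "top_p": 0.95, "top_k": 40, "repetition_penalty": 1.1, "frequency_penalty": 0.40}'

cfg = json.loads(SAMPLER_OVERRIDE)
assert 0.0 < cfg["top_p"] <= 1.0   # top_p is a probability mass
assert cfg["top_k"] > 0            # top_k must be a positive cutoff
print(cfg["temperature"])          # 1
```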
8x RTX 3090 (P2P enabled - patched vllm) --> vllm 0.15.1 works fine with:
vllm serve mratsim/MiniMax-M2.5-FP8-INT4-AWQ \
--served-model-name "MiniMax-M2.5-FP8-INT4-AWQ" \
--tensor-parallel-size 8 \
--max-model-len 196608 \
--gpu-memory-utilization 0.90 \
--max-num-seqs 4 \
--kv-cache-dtype fp8 \
--reasoning-parser minimax_m2 \
--tool-call-parser minimax_m2 \
--enable-auto-tool-choice \
--host 0.0.0.0 \
--port 5005 \
--disable-log-requests \
--disable-uvicorn-access-log \
--swap-space 8 \
--override-generation-config '{"temperature": 1, "top_p": 0.95, "top_k": 40, "repetition_penalty": 1.1, "frequency_penalty": 0.40}' \
--trust-remote-code
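Also worth checking how the client is asking about tools: with `--enable-auto-tool-choice`, the model only learns about tools from the `tools` array in each request, so if the client sends none, any tool inventory the model describes is pure hallucination. A minimal OpenAI-compatible payload sketch (the `run_command` tool here is a made-up example, not something your client necessarily defines):

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/chat/completions call.
# Only tools declared in "tools" should ever be callable by the model.
payload = {
    "model": "minimax-m2.5",
    "messages": [
        {"role": "user", "content": "List the files in the current directory."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "run_command",  # example tool name, not from the thread
                "description": "Execute a shell command",
                "parameters": {
                    "type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"],
                },
            },
        }
    ],
}
body = json.dumps(payload)
```

If responses reference tools that are not in this array, the problem is on the model/template side rather than the client.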
Not sure which vllm version you're running (probably a newer one) that lets you skip `--trust-remote-code`.
I was running a vllm nightly build from a few days before my post.
When you say "patched vllm", what are you referring to?