Thanks!
#1
by kbuettner - opened
Hey, I just wanted to thank you for making this work available. I've been using it as my daily driver using 2 x NVIDIA RTX Pro 6000 Blackwell for several weeks now.
In case it's of benefit to anyone else, here is the command that I'm using to start it:
VLLM_MARLIN_USE_ATOMIC_ADD=1 vllm serve
AImhotep/GLM-4.7-REAP-265B-mixed-AutoRound
-tp 2
--max-num-seqs 8
--gpu-memory-utilization 0.95
--trust_remote_code
--reasoning-parser glm45
--tool-call-parser glm47
--enable-auto-tool-choice
--kv-cache-dtype fp8 --calculate-kv-scales
--kv_offloading_backend native --kv_offloading_size 80
--disable-hybrid-kv-cache-manager
--max-cudagraph-capture-size 32