My recipe for deployment
Using vLLM, you need to manually install transformers>=5.3 to support the new RoPE embedding.
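For example, the upgrade can be done with pip (version pin taken from the note above; adjust to whatever release actually ships the new RoPE embedding):

```shell
# Upgrade transformers in the same environment as vLLM
pip install --upgrade "transformers>=5.3"
```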
My hardware: RTX 3090 + RTX 4090
Deployment script:
# Enable the memory profiler's CUDA-graphs estimate (v0.19 functionality)
export VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1
export MODEL_NAME="mconcat/Qwopus3.5-27B-v3-FP8-Dynamic"
# Start vLLM with reduced swap space
vllm serve $MODEL_NAME \
--served-model-name vllm/Qwen3.5-27B \
--trust-remote-code \
--tensor-parallel-size 2 \
--max-model-len 219520 \
--gpu-memory-utilization 0.92 \
--enable-auto-tool-choice \
--enable-chunked-prefill \
--enable-prefix-caching \
--max-num-batched-tokens 4096 \
--max-num-seqs 4 \
--kv-cache-dtype fp8 \
--tool-call-parser hermes \
--reasoning-parser qwen3 \
--no-use-tqdm-on-load \
--host 0.0.0.0 \
--port 8000 \
--language-model-only
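Once the server is up, you can smoke-test it against the OpenAI-compatible endpoint vLLM exposes (a minimal sketch, assuming the default host/port from the script above; note that requests must use the --served-model-name value, not the Hugging Face repo name):

```shell
# Query vLLM's OpenAI-compatible chat completions endpoint.
# "vllm/Qwen3.5-27B" matches the --served-model-name flag above.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm/Qwen3.5-27B",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16
  }'
```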
I feel the stability is a bit improved: the original 27B failed and malfunctioned on tool calling, but this one has been fine so far. However, I haven't tried long-context conversation / agentic coding yet. Anyone have data on it?
Thanks for sharing!
Quick question – why use --language-model-only? This model supports vision (image-text-to-text).
I use Claude Code. After running a few rounds, it would suddenly stop and then say "Continue" before resuming the run.
> Quick question – why use --language-model-only? This model supports vision (image-text-to-text).
Simply because I don't need the image part, and skipping it saves some RAM.
> I use Claude Code. After running a few rounds, it would suddenly stop and then say "Continue" before resuming the run.
Do you mean Claude Code automatically types "Continue" in the text box and lets it run, or do you have to manually type "Continue" to make the model continue?
