How many tokens/s are you seeing on those two 6000 Blackwells?

by zelias - opened Nov 26, 2025

Nov 26, 2025

I've tried running this model on a single RTX Pro 6000, but the tokens/s are terrible with the latest vLLM image.
I get about 58-60 t/s according to the vLLM console with 4-bit AWQ. I'm wondering if I should try this one for a meaningful increase.
GLM4.5 Air 4-bit AWQ does 180 t/s, which is a pretty insane difference considering their respective sizes.

Firworks

Owner Nov 27, 2025

I grabbed a 2 x RTX Pro 6000 Blackwell instance and loaded it up. I'm seeing ~106 tok/s running with the command on my model card. I'm actually surprised there would be much of a difference between running on 2 GPUs and 1 with this model, since it should easily fit on a single RTX Pro 6000.
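
For a rough sense of scale, here is a quick sketch of the ratios between the numbers quoted in this thread. The `speedup` helper and the variable names are just illustrative, and the runs use different quantizations and launch commands, so treat this as back-of-the-envelope arithmetic rather than a controlled benchmark.

```python
def speedup(new_tps: float, base_tps: float) -> float:
    """Return how many times faster new_tps is than base_tps."""
    if base_tps <= 0:
        raise ValueError("base_tps must be positive")
    return new_tps / base_tps


# Tokens/s figures quoted in this thread (approximate, single-stream).
single_gpu_awq = (58.0, 60.0)  # range reported for 4-bit AWQ on one RTX Pro 6000
dual_gpu = 106.0               # ~106 tok/s reported on 2 x RTX Pro 6000
glm_air_awq = 180.0            # GLM4.5 Air 4-bit AWQ figure from the first post

# Use the midpoint of the reported range as the baseline.
baseline = sum(single_gpu_awq) / 2

dual_vs_single = speedup(dual_gpu, baseline)
glm_vs_single = speedup(glm_air_awq, baseline)

print(f"2 x GPU run vs single-GPU AWQ run: {dual_vs_single:.2f}x")
print(f"GLM4.5 Air AWQ vs single-GPU AWQ run: {glm_vs_single:.2f}x")
```

So the 2-GPU run is a bit under 2x the single-GPU AWQ number, while GLM4.5 Air sits at roughly 3x.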
