120 TPS on sglang - very nice indeed
🔥 1
#7 opened about 1 month ago
by
bbouldin
win10/SVD-Qwen3-Coder-Next-Thinking
5
#6 opened about 2 months ago
by
win10
Feel almost bad for asking this, but do you plan an 8-bit version too?
1
#4 opened about 2 months ago
by
MrMoonsilver
Could you provide a 4-bit AWQ quantization of the Step-3.5-Flash model? vLLM can run it.
1
#3 opened about 2 months ago
by
lsm03624
The model quantization results are not ideal
5
#2 opened about 2 months ago
by
mediali
how to fix: KeyError: 'model.layers.30.mlp.shared_expert.gate_gate_up_proj.weight'
🔥 1
2
#1 opened about 2 months ago
by
kq