Note FP8 KV cache needs vLLM 0.22.0; drop scrambled-output workaround (vllm#42650) 571346d joerowell committed on 4 days ago
Drop VLLM_USE_DEEP_GEMM=0 from vllm serve recipe (DeepGEMM is supported on Hopper and datacenter Blackwell) 514daf4 verified joerowell committed on 19 days ago
Enable thinking by default in non-Hopper FP8-KV serve command 62a5860 verified joerowell committed on Apr 29
Update non-Hopper FP8-KV serve command and link to vLLM recipes page 92f8b44 verified joerowell committed on Apr 29