Use chat_template.jinja as the single source: drop the {% include %} chat_template field from tokenizer_config.json ebba672 verified joerowell committed 3 days ago
Note FP8 KV cache needs vLLM 0.22.0; drop scrambled-output workaround (vllm#42650) 8af858a joerowell committed 29 days ago
Drop VLLM_USE_DEEP_GEMM=0 from vllm serve recipe (DeepGEMM is supported on Hopper and datacenter Blackwell) 8c57f62 verified joerowell committed on May 20
Enable thinking by default in non-Hopper FP8-KV serve command c7a758e verified joerowell committed on Apr 29
Update non-Hopper FP8-KV serve command and link to vLLM recipes page 2c0d22b verified joerowell committed on Apr 29