Commit History

Use chat_template.jinja as the single source: drop the {% include %} chat_template field from tokenizer_config.json
ebba672
verified

joerowell committed on

small self-contained fixes
e163f76
verified

joerowell committed on

Note FP8 KV cache needs vLLM 0.22.0; drop scrambled-output workaround (vllm#42650)
8af858a

joerowell committed on

update sampling parameters to match evals
451ad2d

joerowell committed on

Update model card for 256K context length
3219161
verified

varunrandery committed on

increase context length to 256k
2d55328

joerowell committed on

Drop VLLM_USE_DEEP_GEMM=0 from vllm serve recipe (DeepGEMM is supported on Hopper and datacenter Blackwell)
8c57f62
verified

joerowell committed on

Enable thinking by default in non-Hopper FP8-KV serve command
c7a758e
verified

joerowell committed on

Update non-Hopper FP8-KV serve command and link to vLLM recipes page
2c0d22b
verified

joerowell committed on

Laguna XS.2 upload
f82b43d

joerowell committed on