poolside/Laguna-XS.2-INT4

Text Generation

compressed-tensors

Commit History

Use chat_template.jinja as the single source: drop the {% include %} chat_template field from tokenizer_config.json

ebba672
verified

joerowell committed 3 days ago

small self-contained fixes

e163f76
verified

joerowell committed 15 days ago

Note FP8 KV cache needs vLLM 0.22.0; drop scrambled-output workaround (vllm#42650)

8af858a

joerowell committed 29 days ago

update sampling parameters to match evals

451ad2d

joerowell committed on May 27

Update model card for 256K context length

3219161
verified

varunrandery committed on May 26

increase context length to 256k

2d55328

joerowell committed on May 26

Drop VLLM_USE_DEEP_GEMM=0 from vllm serve recipe (DeepGEMM is supported on Hopper and datacenter Blackwell)

8c57f62
verified

joerowell committed on May 20

Add base_model (#1)

f14ef97

mgoin committed on May 10

Enable thinking by default in non-Hopper FP8-KV serve command

c7a758e
verified

joerowell committed on Apr 29

Update non-Hopper FP8-KV serve command and link to vLLM recipes page

2c0d22b
verified

joerowell committed on Apr 29

Laguna XS.2 upload

f82b43d

joerowell committed on Apr 28