docs: recommend vLLM or SGLang for single-node serving

by Jiminator - opened 14 days ago


Files changed (3)

README.md CHANGED

@@ -140,27 +140,7 @@ See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for ou
 #### SGLang
-Laguna M.1 can be served with SGLang using its OpenAI-compatible server, including support for tool calling, streaming responses, and reasoning parsing:
-> [!NOTE]
-> Laguna support was added to SGLang in [sgl-project/sglang#24204](https://github.com/sgl-project/sglang/pull/24204). The integration is shared with [Laguna XS.2](https://huggingface.co/poolside/Laguna-XS.2) and is currently available on SGLang main.
-```shell
-# Laguna M.1 support is currently on SGLang main, so install from source
-git clone https://github.com/sgl-project/sglang.git
-cd sglang
-pip install -e "python[all]"
-sglang serve \
-    --trust-remote-code \
-    --model-path poolside/Laguna-M.1 \
-    --tool-call-parser poolside_v1 \
-    --reasoning-parser poolside_v1 \
-    --tp 8 \
-    --host 0.0.0.0
-```
-Quantized Laguna M.1 checkpoints are also available as [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4). SGLang reads the checkpoint `quantization_config`, so you can use the same launch command after replacing the model ID. For more SGLang-specific deployment details, see the [SGLang Cookbook](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-M.1).
 #### Transformers

 #### SGLang
+Laguna M.1 is supported in SGLang. A full serving recipe will be added here; for now, build SGLang from `main`.
 #### Transformers

generation_config.json CHANGED

@@ -9,10 +9,5 @@
   "pad_token_id": 9,
   "temperature": 1.0,
   "top_p": 1.0,
-  "min_p": 0.0,
-  "tool_call_parser": "poolside_v1",
-  "reasoning_parser": "poolside_v1",
-  "default_chat_template_kwargs": {
-    "enable_thinking": true
-  }
-}

   "pad_token_id": 9,
   "temperature": 1.0,
   "top_p": 1.0,
+  "min_p": 0.0
+}

tokenizer_config.json CHANGED

@@ -571,5 +571,6 @@
   "pad_token": "〈|PAD|〉",
   "sep_token": "〈|SEP|〉",
   "tokenizer_class": "PreTrainedTokenizerFast",
-  "unk_token": "〈|UNK|〉"
-}

   "pad_token": "〈|PAD|〉",
   "sep_token": "〈|SEP|〉",
   "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "〈|UNK|〉",
+  "chat_template": "{% include 'chat_template.jinja' %}"
+}
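
Both JSON hunks move the closing brace and change which key comes last, so a stray trailing comma or a leftover key is the likely failure mode. The following is a minimal sketch of a sanity check, not part of the PR: the `check_config` helper and the inline sample texts are hypothetical, and the samples reproduce only the keys visible in the hunks above rather than the full files.

```python
import json

# Hypothetical post-change file tails; only keys visible in the diff are shown.
GENERATION_CONFIG = """
{
  "pad_token_id": 9,
  "temperature": 1.0,
  "top_p": 1.0,
  "min_p": 0.0
}
"""

TOKENIZER_CONFIG = """
{
  "tokenizer_class": "PreTrainedTokenizerFast",
  "chat_template": "{% include 'chat_template.jinja' %}"
}
"""

# Keys this PR removes from generation_config.json.
REMOVED_KEYS = ("tool_call_parser", "reasoning_parser", "default_chat_template_kwargs")


def check_config(text, required=(), forbidden=()):
    """Parse JSON text and return a list of human-readable problems."""
    try:
        config = json.loads(text)  # rejects trailing commas and unbalanced braces
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    problems = [f"missing key: {key}" for key in required if key not in config]
    problems += [f"unexpected key: {key}" for key in forbidden if key in config]
    return problems


if __name__ == "__main__":
    print(check_config(GENERATION_CONFIG, required=("min_p",), forbidden=REMOVED_KEYS))
    print(check_config(TOKENIZER_CONFIG, required=("chat_template",)))
```

To run it against a real checkout, replace the inline strings with the contents of the two files; an empty list for each means the file parses and the expected keys are in place.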