docs: recommend vLLM or SGLang for single-node serving

#3
by Jiminator - opened
Files changed (3)
  1. README.md +1 -21
  2. generation_config.json +2 -7
  3. tokenizer_config.json +3 -2
README.md CHANGED
@@ -140,27 +140,7 @@ See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for ou
 
  #### SGLang
 
- Laguna M.1 can be served with SGLang using its OpenAI-compatible server, including support for tool calling, streaming responses, and reasoning parsing:
-
- > [!NOTE]
- > Laguna support was added to SGLang in [sgl-project/sglang#24204](https://github.com/sgl-project/sglang/pull/24204). The integration is shared with [Laguna XS.2](https://huggingface.co/poolside/Laguna-XS.2) and is currently available on SGLang main.
-
- ```shell
- # Laguna M.1 support is currently on SGLang main, so install from source
- git clone https://github.com/sgl-project/sglang.git
- cd sglang
- pip install -e "python[all]"
-
- sglang serve \
- --trust-remote-code \
- --model-path poolside/Laguna-M.1 \
- --tool-call-parser poolside_v1 \
- --reasoning-parser poolside_v1 \
- --tp 8 \
- --host 0.0.0.0
- ```
-
- Quantized Laguna M.1 checkpoints are also available as [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4). SGLang reads the checkpoint `quantization_config`, so you can use the same launch command after replacing the model ID. For more SGLang-specific deployment details, see the [SGLang Cookbook](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-M.1).
+ Laguna M.1 is supported in SGLang. A full serving recipe will be added here; for now, build SGLang from `main`.
 
  #### Transformers
 
generation_config.json CHANGED
@@ -9,10 +9,5 @@
  "pad_token_id": 9,
  "temperature": 1.0,
  "top_p": 1.0,
- "min_p": 0.0,
- "tool_call_parser": "poolside_v1",
- "reasoning_parser": "poolside_v1",
- "default_chat_template_kwargs": {
- "enable_thinking": true
- }
- }
+ "min_p": 0.0
+ }
 
tokenizer_config.json CHANGED
@@ -571,5 +571,6 @@
  "pad_token": "〈|PAD|〉",
  "sep_token": "〈|SEP|〉",
  "tokenizer_class": "PreTrainedTokenizerFast",
- "unk_token": "〈|UNK|〉"
- }
+ "unk_token": "〈|UNK|〉",
+ "chat_template": "{% include 'chat_template.jinja' %}"
+ }