docs: recommend vLLM or SGLang for single-node serving
#3
by Jiminator - opened
- README.md +1 -21
- generation_config.json +2 -7
- tokenizer_config.json +3 -2
README.md
CHANGED
|
@@ -140,27 +140,7 @@ See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for ou
|
|
| 140 |
|
| 141 |
#### SGLang
|
| 142 |
|
| 143 |
-
Laguna M.1
|
| 144 |
-
|
| 145 |
-
> [!NOTE]
|
| 146 |
-
> Laguna support was added to SGLang in [sgl-project/sglang#24204](https://github.com/sgl-project/sglang/pull/24204). The integration is shared with [Laguna XS.2](https://huggingface.co/poolside/Laguna-XS.2) and is currently available on SGLang main.
|
| 147 |
-
|
| 148 |
-
```shell
|
| 149 |
-
# Laguna M.1 support is currently on SGLang main, so install from source
|
| 150 |
-
git clone https://github.com/sgl-project/sglang.git
|
| 151 |
-
cd sglang
|
| 152 |
-
pip install -e "python[all]"
|
| 153 |
-
|
| 154 |
-
sglang serve \
|
| 155 |
-
--trust-remote-code \
|
| 156 |
-
--model-path poolside/Laguna-M.1 \
|
| 157 |
-
--tool-call-parser poolside_v1 \
|
| 158 |
-
--reasoning-parser poolside_v1 \
|
| 159 |
-
--tp 8 \
|
| 160 |
-
--host 0.0.0.0
|
| 161 |
-
```
|
| 162 |
-
|
| 163 |
-
Quantized Laguna M.1 checkpoints are also available as [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4). SGLang reads the checkpoint `quantization_config`, so you can use the same launch command after replacing the model ID. For more SGLang-specific deployment details, see the [SGLang Cookbook](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-M.1).
|
| 164 |
|
| 165 |
#### Transformers
|
| 166 |
|
|
|
|
| 140 |
|
| 141 |
#### SGLang
|
| 142 |
|
| 143 |
+
Laguna M.1 is supported in SGLang. A full serving recipe will be added here; for now, build SGLang from `main`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 144 |
|
| 145 |
#### Transformers
|
| 146 |
|
generation_config.json
CHANGED
|
@@ -9,10 +9,5 @@
|
|
| 9 |
"pad_token_id": 9,
|
| 10 |
"temperature": 1.0,
|
| 11 |
"top_p": 1.0,
|
| 12 |
-
"min_p": 0.0
|
| 13 |
-
|
| 14 |
-
"reasoning_parser": "poolside_v1",
|
| 15 |
-
"default_chat_template_kwargs": {
|
| 16 |
-
"enable_thinking": true
|
| 17 |
-
}
|
| 18 |
-
}
|
|
|
|
| 9 |
"pad_token_id": 9,
|
| 10 |
"temperature": 1.0,
|
| 11 |
"top_p": 1.0,
|
| 12 |
+
"min_p": 0.0
|
| 13 |
+
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
tokenizer_config.json
CHANGED
|
@@ -571,5 +571,6 @@
|
|
| 571 |
"pad_token": "〈|PAD|〉",
|
| 572 |
"sep_token": "〈|SEP|〉",
|
| 573 |
"tokenizer_class": "PreTrainedTokenizerFast",
|
| 574 |
-
"unk_token": "〈|UNK|〉"
|
| 575 |
-
}
|
|
|
|
|
|
| 571 |
"pad_token": "〈|PAD|〉",
|
| 572 |
"sep_token": "〈|SEP|〉",
|
| 573 |
"tokenizer_class": "PreTrainedTokenizerFast",
|
| 574 |
+
"unk_token": "〈|UNK|〉",
|
| 575 |
+
"chat_template": "{% include 'chat_template.jinja' %}"
|
| 576 |
+
}
|