Model card: add license + Transformers/TRT-LLM deployment; add LICENSE.md
#2
by joerowell - opened
- .eval_results/swe-bench_pro.yaml +0 -7
- .eval_results/swe-bench_verified.yaml +0 -7
- .eval_results/terminal-bench-2.0.yaml +0 -7
- README.md +3 -28
- config.json +2 -2
- generation_config.json +2 -7
.eval_results/swe-bench_pro.yaml
DELETED
|
@@ -1,7 +0,0 @@
|
|
| 1 |
-
- dataset:
|
| 2 |
-
id: ScaleAI/SWE-bench_Pro
|
| 3 |
-
task_id: SWE_Bench_Pro
|
| 4 |
-
value: 49.2
|
| 5 |
-
source:
|
| 6 |
-
url: https://huggingface.co/poolside/Laguna-M.1
|
| 7 |
-
name: Model Card
.eval_results/swe-bench_verified.yaml
DELETED
|
@@ -1,7 +0,0 @@
|
|
| 1 |
-
- dataset:
|
| 2 |
-
id: SWE-bench/SWE-bench_Verified
|
| 3 |
-
task_id: swe_bench_%_resolved
|
| 4 |
-
value: 74.6
|
| 5 |
-
source:
|
| 6 |
-
url: https://huggingface.co/poolside/Laguna-M.1
|
| 7 |
-
name: Model Card
.eval_results/terminal-bench-2.0.yaml
DELETED
|
@@ -1,7 +0,0 @@
|
|
| 1 |
-
- dataset:
|
| 2 |
-
id: harborframework/terminal-bench-2.0
|
| 3 |
-
task_id: terminalbench_2
|
| 4 |
-
value: 45.8
|
| 5 |
-
source:
|
| 6 |
-
url: https://huggingface.co/poolside/Laguna-M.1
|
| 7 |
-
name: Model Card
README.md
CHANGED
|
@@ -7,7 +7,6 @@ extra_gated_description: >-
|
|
| 7 |
tags:
|
| 8 |
- laguna-m.1
|
| 9 |
- vllm
|
| 10 |
-
- sglang
|
| 11 |
- bf16
|
| 12 |
- moe
|
| 13 |
license: apache-2.0
|
|
@@ -28,7 +27,7 @@ pipeline_tag: text-generation
|
|
| 28 |
|
| 29 |
# Laguna M.1
|
| 30 |
|
| 31 |
-
Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work.
|
| 32 |
|
| 33 |
> [!NOTE]
|
| 34 |
> For more details on how we trained this model, including our Model Factory approach, post-training recipe, async off-policy agent RL, and evaluations, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive) and [technical report](https://poolside.ai/assets/laguna/laguna-m1-xs2-technical-report.pdf).
|
|
@@ -87,7 +86,7 @@ Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated
|
|
| 87 |
|
| 88 |
## Usage
|
| 89 |
|
| 90 |
-
Laguna M.1 has upstream support in vLLM
|
| 91 |
|
| 92 |
### pool
|
| 93 |
|
|
@@ -138,36 +137,12 @@ vllm serve \
|
|
| 138 |
|
| 139 |
See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for our Laguna XS.2 model with which the implementation is shared for additional deployment guidance. FP8 and NVFP4 quantized checkpoints are available at [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4); quantization is detected automatically from `quantization_config`, so the same command works with the model ID substituted.
|
| 140 |
|
| 141 |
-
#### SGLang
|
| 142 |
-
|
| 143 |
-
Laguna M.1 can be served with SGLang using its OpenAI-compatible server, including support for tool calling, streaming responses, and reasoning parsing:
|
| 144 |
-
|
| 145 |
-
> [!NOTE]
|
| 146 |
-
> Laguna support was added to SGLang in [sgl-project/sglang#24204](https://github.com/sgl-project/sglang/pull/24204). The integration is shared with [Laguna XS.2](https://huggingface.co/poolside/Laguna-XS.2) and is currently available on SGLang main.
|
| 147 |
-
|
| 148 |
-
```shell
|
| 149 |
-
# Laguna M.1 support is currently on SGLang main, so install from source
|
| 150 |
-
git clone https://github.com/sgl-project/sglang.git
|
| 151 |
-
cd sglang
|
| 152 |
-
pip install -e "python[all]"
|
| 153 |
-
|
| 154 |
-
sglang serve \
|
| 155 |
-
--trust-remote-code \
|
| 156 |
-
--model-path poolside/Laguna-M.1 \
|
| 157 |
-
--tool-call-parser poolside_v1 \
|
| 158 |
-
--reasoning-parser poolside_v1 \
|
| 159 |
-
--tp 8 \
|
| 160 |
-
--host 0.0.0.0
|
| 161 |
-
```
|
| 162 |
-
|
| 163 |
-
Quantized Laguna M.1 checkpoints are also available as [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4). SGLang reads the checkpoint `quantization_config`, so you can use the same launch command after replacing the model ID. For more SGLang-specific deployment details, see the [SGLang Cookbook](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-M.1).
|
| 164 |
-
|
| 165 |
#### Transformers
|
| 166 |
|
| 167 |
Laguna is supported in Transformers `v5.7.0` and later ([huggingface/transformers#45673](https://github.com/huggingface/transformers/pull/45673)).
|
| 168 |
|
| 169 |
> [!NOTE]
|
| 170 |
-
> Laguna M.1 is a 225B-parameter model; loading the BF16 checkpoint in Transformers requires substantial multi-GPU memory (`device_map="auto"` shards across available devices). For single-node serving, vLLM
|
| 171 |
|
| 172 |
```python
|
| 173 |
import torch
|
|
|
|
| 7 |
tags:
|
| 8 |
- laguna-m.1
|
| 9 |
- vllm
|
|
|
|
| 10 |
- bf16
|
| 11 |
- moe
|
| 12 |
license: apache-2.0
|
|
|
|
| 27 |
|
| 28 |
# Laguna M.1
|
| 29 |
|
| 30 |
+
Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work. This release has upstream support in vLLM and is part of stable releases from version v0.21.0.
|
| 31 |
|
| 32 |
> [!NOTE]
|
| 33 |
> For more details on how we trained this model, including our Model Factory approach, post-training recipe, async off-policy agent RL, and evaluations, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive) and [technical report](https://poolside.ai/assets/laguna/laguna-m1-xs2-technical-report.pdf).
|
|
|
|
| 86 |
|
| 87 |
## Usage
|
| 88 |
|
| 89 |
+
Laguna M.1 has upstream support in vLLM and Transformers, as well as in TRT-LLM thanks to the support of the team at NVIDIA.
|
| 90 |
|
| 91 |
### pool
|
| 92 |
|
|
|
|
| 137 |
|
| 138 |
See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for our Laguna XS.2 model with which the implementation is shared for additional deployment guidance. FP8 and NVFP4 quantized checkpoints are available at [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4); quantization is detected automatically from `quantization_config`, so the same command works with the model ID substituted.
|
| 139 |
|
|
|
|
| 140 |
#### Transformers
|
| 141 |
|
| 142 |
Laguna is supported in Transformers `v5.7.0` and later ([huggingface/transformers#45673](https://github.com/huggingface/transformers/pull/45673)).
|
| 143 |
|
| 144 |
> [!NOTE]
|
| 145 |
+
> Laguna M.1 is a 225B-parameter model; loading the BF16 checkpoint in Transformers requires substantial multi-GPU memory (`device_map="auto"` shards across available devices). For single-node serving, vLLM is recommended.
|
| 146 |
|
| 147 |
```python
|
| 148 |
import torch
|
config.json
CHANGED
|
@@ -14,7 +14,7 @@
|
|
| 14 |
"num_attention_heads": 64,
|
| 15 |
"num_key_value_heads": 8,
|
| 16 |
"head_dim": 128,
|
| 17 |
-
"max_position_embeddings":
|
| 18 |
"attention_bias": false,
|
| 19 |
"attention_dropout": 0.0,
|
| 20 |
"rms_norm_eps": 1e-06,
|
|
@@ -45,7 +45,7 @@
|
|
| 45 |
"full_attention": {
|
| 46 |
"rope_theta": 500000.0,
|
| 47 |
"rope_type": "yarn",
|
| 48 |
-
"factor":
|
| 49 |
"original_max_position_embeddings": 4096,
|
| 50 |
"beta_slow": 1.0,
|
| 51 |
"beta_fast": 64.0,
|
|
|
|
| 14 |
"num_attention_heads": 64,
|
| 15 |
"num_key_value_heads": 8,
|
| 16 |
"head_dim": 128,
|
| 17 |
+
"max_position_embeddings": 131072,
|
| 18 |
"attention_bias": false,
|
| 19 |
"attention_dropout": 0.0,
|
| 20 |
"rms_norm_eps": 1e-06,
|
|
|
|
| 45 |
"full_attention": {
|
| 46 |
"rope_theta": 500000.0,
|
| 47 |
"rope_type": "yarn",
|
| 48 |
+
"factor": 32.0,
|
| 49 |
"original_max_position_embeddings": 4096,
|
| 50 |
"beta_slow": 1.0,
|
| 51 |
"beta_fast": 64.0,
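The two values added in the `config.json` hunks can be cross-checked against each other. The sketch below assumes the common YaRN convention that the extended context equals `factor` times `original_max_position_embeddings`. The dictionary name `rope_parameters` and the helper `yarn_extended_context` are hypothetical and only hold the values visible in the diff; they are not taken from the repository.

```python
# Values copied from the "+" lines of the config.json hunks.
max_position_embeddings = 131072

# Hypothetical container for the fields shown under "full_attention".
rope_parameters = {
    "rope_type": "yarn",
    "factor": 32.0,
    "original_max_position_embeddings": 4096,
}


def yarn_extended_context(params: dict) -> int:
    """Context length implied by a YaRN block, assuming
    extended = factor * original_max_position_embeddings."""
    return int(params["factor"] * params["original_max_position_embeddings"])


extended = yarn_extended_context(rope_parameters)

# True when the scaling factor and the advertised context length agree.
print(extended == max_position_embeddings)
```

Under that assumption the check passes for the values in this PR, since 32 × 4096 = 131072.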
|
generation_config.json
CHANGED
|
@@ -9,10 +9,5 @@
|
|
| 9 |
"pad_token_id": 9,
|
| 10 |
"temperature": 1.0,
|
| 11 |
"top_p": 1.0,
|
| 12 |
-
"min_p": 0.0
|
| 13 |
-
|
| 14 |
-
"reasoning_parser": "poolside_v1",
|
| 15 |
-
"default_chat_template_kwargs": {
|
| 16 |
-
"enable_thinking": true
|
| 17 |
-
}
|
| 18 |
-
}
|
|
|
|
| 9 |
"pad_token_id": 9,
|
| 10 |
"temperature": 1.0,
|
| 11 |
"top_p": 1.0,
|
| 12 |
+
"min_p": 0.0
|
| 13 |
+
}