Model card: add license + Transformers/TRT-LLM deployment; add LICENSE.md

#2
by joerowell - opened
.eval_results/swe-bench_pro.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: ScaleAI/SWE-bench_Pro
-     task_id: SWE_Bench_Pro
-   value: 49.2
-   source:
-     url: https://huggingface.co/poolside/Laguna-M.1
-     name: Model Card
 
.eval_results/swe-bench_verified.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: SWE-bench/SWE-bench_Verified
-     task_id: swe_bench_%_resolved
-   value: 74.6
-   source:
-     url: https://huggingface.co/poolside/Laguna-M.1
-     name: Model Card
 
.eval_results/terminal-bench-2.0.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: harborframework/terminal-bench-2.0
-     task_id: terminalbench_2
-   value: 45.8
-   source:
-     url: https://huggingface.co/poolside/Laguna-M.1
-     name: Model Card
 
README.md CHANGED
@@ -7,7 +7,6 @@ extra_gated_description: >-
  tags:
  - laguna-m.1
  - vllm
- - sglang
  - bf16
  - moe
  license: apache-2.0
@@ -28,7 +27,7 @@ pipeline_tag: text-generation
 
  # Laguna M.1
 
- Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work.
 
  > [!NOTE]
  > For more details on how we trained this model, including our Model Factory approach, post-training recipe, async off-policy agent RL, and evaluations, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive) and [technical report](https://poolside.ai/assets/laguna/laguna-m1-xs2-technical-report.pdf).
@@ -87,7 +86,7 @@ Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated
 
  ## Usage
 
- Laguna M.1 has upstream support in vLLM, SGLang, and Transformers, and TRT-LLM thanks to the support of the team at NVIDIA.
 
  ### pool
 
@@ -138,36 +137,12 @@ vllm serve \
 
  See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for our Laguna XS.2 model with which the implementation is shared for additional deployment guidance. FP8 and NVFP4 quantized checkpoints are available at [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4); quantization is detected automatically from `quantization_config`, so the same command works with the model ID substituted.
 
- #### SGLang
-
- Laguna M.1 can be served with SGLang using its OpenAI-compatible server, including support for tool calling, streaming responses, and reasoning parsing:
-
- > [!NOTE]
- > Laguna support was added to SGLang in [sgl-project/sglang#24204](https://github.com/sgl-project/sglang/pull/24204). The integration is shared with [Laguna XS.2](https://huggingface.co/poolside/Laguna-XS.2) and is currently available on SGLang main.
-
- ```shell
- # Laguna M.1 support is currently on SGLang main, so install from source
- git clone https://github.com/sgl-project/sglang.git
- cd sglang
- pip install -e "python[all]"
-
- sglang serve \
- --trust-remote-code \
- --model-path poolside/Laguna-M.1 \
- --tool-call-parser poolside_v1 \
- --reasoning-parser poolside_v1 \
- --tp 8 \
- --host 0.0.0.0
- ```
-
- Quantized Laguna M.1 checkpoints are also available as [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4). SGLang reads the checkpoint `quantization_config`, so you can use the same launch command after replacing the model ID. For more SGLang-specific deployment details, see the [SGLang Cookbook](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-M.1).
-
  #### Transformers
 
  Laguna is supported in Transformers `v5.7.0` and later ([huggingface/transformers#45673](https://github.com/huggingface/transformers/pull/45673)).
 
  > [!NOTE]
- > Laguna M.1 is a 225B-parameter model; loading the BF16 checkpoint in Transformers requires substantial multi-GPU memory (`device_map="auto"` shards across available devices). For single-node serving, vLLM or SGLang is recommended.
 
  ```python
  import torch
 
  tags:
  - laguna-m.1
  - vllm
  - bf16
  - moe
  license: apache-2.0
 
 
  # Laguna M.1
 
+ Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work. This release has upstream support in vLLM and is part of stable releases from version v0.21.0.
 
  > [!NOTE]
  > For more details on how we trained this model, including our Model Factory approach, post-training recipe, async off-policy agent RL, and evaluations, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive) and [technical report](https://poolside.ai/assets/laguna/laguna-m1-xs2-technical-report.pdf).
 
 
  ## Usage
 
+ Laguna M.1 has upstream support in vLLM and Transformers, and TRT-LLM thanks to the support of the team at NVIDIA.
 
  ### pool
 
 
 
  See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for our Laguna XS.2 model with which the implementation is shared for additional deployment guidance. FP8 and NVFP4 quantized checkpoints are available at [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4); quantization is detected automatically from `quantization_config`, so the same command works with the model ID substituted.
 
  #### Transformers
 
  Laguna is supported in Transformers `v5.7.0` and later ([huggingface/transformers#45673](https://github.com/huggingface/transformers/pull/45673)).
 
  > [!NOTE]
+ > Laguna M.1 is a 225B-parameter model; loading the BF16 checkpoint in Transformers requires substantial multi-GPU memory (`device_map="auto"` shards across available devices). For single-node serving, vLLM is recommended.
 
  ```python
  import torch
config.json CHANGED
@@ -14,7 +14,7 @@
  "num_attention_heads": 64,
  "num_key_value_heads": 8,
  "head_dim": 128,
- "max_position_embeddings": 262144,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "rms_norm_eps": 1e-06,
@@ -45,7 +45,7 @@
  "full_attention": {
  "rope_theta": 500000.0,
  "rope_type": "yarn",
- "factor": 64.0,
  "original_max_position_embeddings": 4096,
  "beta_slow": 1.0,
  "beta_fast": 64.0,
 
  "num_attention_heads": 64,
  "num_key_value_heads": 8,
  "head_dim": 128,
+ "max_position_embeddings": 131072,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "rms_norm_eps": 1e-06,
 
  "full_attention": {
  "rope_theta": 500000.0,
  "rope_type": "yarn",
+ "factor": 32.0,
  "original_max_position_embeddings": 4096,
  "beta_slow": 1.0,
  "beta_fast": 64.0,
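
The two `config.json` edits appear to move together. As a minimal sketch, assuming the common YaRN convention that `factor` is the ratio of the extended context length to `original_max_position_embeddings` (the diff itself does not state this), the arithmetic can be checked as follows. The helper name `extended_context` is hypothetical and used only for illustration.

```python
# Values copied from the config.json hunks in this diff.
ORIGINAL_MAX_POSITION_EMBEDDINGS = 4096


def extended_context(factor: float,
                     original: int = ORIGINAL_MAX_POSITION_EMBEDDINGS) -> int:
    """Context length implied by a YaRN scaling factor.

    Assumption: factor == extended length / original length.
    """
    return int(factor * original)


old_context = extended_context(64.0)  # "factor" before this change
new_context = extended_context(32.0)  # "factor" after this change

print(old_context, new_context)
```

Under that assumption, both edits are consistent: halving `factor` from 64.0 to 32.0 corresponds to halving `max_position_embeddings` from 262144 to 131072.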
generation_config.json CHANGED
@@ -9,10 +9,5 @@
  "pad_token_id": 9,
  "temperature": 1.0,
  "top_p": 1.0,
- "min_p": 0.0,
- "tool_call_parser": "poolside_v1",
- "reasoning_parser": "poolside_v1",
- "default_chat_template_kwargs": {
- "enable_thinking": true
- }
- }
 
  "pad_token_id": 9,
  "temperature": 1.0,
  "top_p": 1.0,
+ "min_p": 0.0
+ }