Model card: add license + Transformers/TRT-LLM deployment; add LICENSE.md

by joerowell - opened 14 days ago

base: refs/heads/main ← from: refs/pr/2

Files changed (6) +7 -58

.eval_results/swe-bench_pro.yaml +0 -7
.eval_results/swe-bench_verified.yaml +0 -7
.eval_results/terminal-bench-2.0.yaml +0 -7
README.md +3 -28
config.json +2 -2
generation_config.json +2 -7

.eval_results/swe-bench_pro.yaml DELETED

@@ -1,7 +0,0 @@
-- dataset:
-    id: ScaleAI/SWE-bench_Pro
-    task_id: SWE_Bench_Pro
-  value: 49.2
-  source:
-    url: https://huggingface.co/poolside/Laguna-M.1
-    name: Model Card

.eval_results/swe-bench_verified.yaml DELETED

@@ -1,7 +0,0 @@
-- dataset:
-    id: SWE-bench/SWE-bench_Verified
-    task_id: swe_bench_%_resolved
-  value: 74.6
-  source:
-    url: https://huggingface.co/poolside/Laguna-M.1
-    name: Model Card

.eval_results/terminal-bench-2.0.yaml DELETED

@@ -1,7 +0,0 @@
-- dataset:
-    id: harborframework/terminal-bench-2.0
-    task_id: terminalbench_2
-  value: 45.8
-  source:
-    url: https://huggingface.co/poolside/Laguna-M.1
-    name: Model Card

README.md CHANGED

@@ -7,7 +7,6 @@ extra_gated_description: >-
 tags:
 - laguna-m.1
 - vllm
-- sglang
 - bf16
 - moe
 license: apache-2.0
@@ -28,7 +27,7 @@ pipeline_tag: text-generation
 # Laguna M.1
-Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work.
 > [!NOTE]
 > For more details on how we trained this model, including our Model Factory approach, post-training recipe, async off-policy agent RL, and evaluations, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive) and [technical report](https://poolside.ai/assets/laguna/laguna-m1-xs2-technical-report.pdf).
@@ -87,7 +86,7 @@ Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated
 ## Usage
-Laguna M.1 has upstream support in vLLM, SGLang, and Transformers, and TRT-LLM thanks to the support of the team at NVIDIA.
 ### pool
@@ -138,36 +137,12 @@ vllm serve \
 See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for our Laguna XS.2 model with which the implementation is shared for additional deployment guidance. FP8 and NVFP4 quantized checkpoints are available at [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4); quantization is detected automatically from `quantization_config`, so the same command works with the model ID substituted.
-#### SGLang
-Laguna M.1 can be served with SGLang using its OpenAI-compatible server, including support for tool calling, streaming responses, and reasoning parsing:
-> [!NOTE]
-> Laguna support was added to SGLang in [sgl-project/sglang#24204](https://github.com/sgl-project/sglang/pull/24204). The integration is shared with [Laguna XS.2](https://huggingface.co/poolside/Laguna-XS.2) and is currently available on SGLang main.
-```shell
-# Laguna M.1 support is currently on SGLang main, so install from source
-git clone https://github.com/sgl-project/sglang.git
-cd sglang
-pip install -e "python[all]"
-sglang serve \
-    --trust-remote-code \
-    --model-path poolside/Laguna-M.1 \
-    --tool-call-parser poolside_v1 \
-    --reasoning-parser poolside_v1 \
-    --tp 8 \
-    --host 0.0.0.0
-```
-Quantized Laguna M.1 checkpoints are also available as [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4). SGLang reads the checkpoint `quantization_config`, so you can use the same launch command after replacing the model ID. For more SGLang-specific deployment details, see the [SGLang Cookbook](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-M.1).
 #### Transformers
 Laguna is supported in Transformers `v5.7.0` and later ([huggingface/transformers#45673](https://github.com/huggingface/transformers/pull/45673)).
 > [!NOTE]
-> Laguna M.1 is a 225B-parameter model; loading the BF16 checkpoint in Transformers requires substantial multi-GPU memory (`device_map="auto"` shards across available devices). For single-node serving, vLLM or SGLang is recommended.
 ```python
 import torch

 tags:
 - laguna-m.1
 - vllm
 - bf16
 - moe
 license: apache-2.0
 # Laguna M.1
+Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work. This release has upstream support in vLLM and is part of stable releases from version v0.21.0.
 > [!NOTE]
 > For more details on how we trained this model, including our Model Factory approach, post-training recipe, async off-policy agent RL, and evaluations, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive) and [technical report](https://poolside.ai/assets/laguna/laguna-m1-xs2-technical-report.pdf).
 ## Usage
+Laguna M.1 has upstream support in vLLM and Transformers, and TRT-LLM thanks to the support of the team at NVIDIA.
 ### pool
 See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for our Laguna XS.2 model with which the implementation is shared for additional deployment guidance. FP8 and NVFP4 quantized checkpoints are available at [Laguna-M.1-FP8](https://huggingface.co/poolside/Laguna-M.1-FP8) and [Laguna-M.1-NVFP4](https://huggingface.co/poolside/Laguna-M.1-NVFP4); quantization is detected automatically from `quantization_config`, so the same command works with the model ID substituted.
 #### Transformers
 Laguna is supported in Transformers `v5.7.0` and later ([huggingface/transformers#45673](https://github.com/huggingface/transformers/pull/45673)).
 > [!NOTE]
+> Laguna M.1 is a 225B-parameter model; loading the BF16 checkpoint in Transformers requires substantial multi-GPU memory (`device_map="auto"` shards across available devices). For single-node serving, vLLM is recommended.
 ```python
 import torch

config.json CHANGED

@@ -14,7 +14,7 @@
   "num_attention_heads": 64,
   "num_key_value_heads": 8,
   "head_dim": 128,
-  "max_position_embeddings": 262144,
   "attention_bias": false,
   "attention_dropout": 0.0,
   "rms_norm_eps": 1e-06,
@@ -45,7 +45,7 @@
     "full_attention": {
       "rope_theta": 500000.0,
       "rope_type": "yarn",
-      "factor": 64.0,
       "original_max_position_embeddings": 4096,
       "beta_slow": 1.0,
       "beta_fast": 64.0,

   "num_attention_heads": 64,
   "num_key_value_heads": 8,
   "head_dim": 128,
+  "max_position_embeddings": 131072,
   "attention_bias": false,
   "attention_dropout": 0.0,
   "rms_norm_eps": 1e-06,
     "full_attention": {
       "rope_theta": 500000.0,
       "rope_type": "yarn",
+      "factor": 32.0,
       "original_max_position_embeddings": 4096,
       "beta_slow": 1.0,
       "beta_fast": 64.0,
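
The two `config.json` changes above appear to be consistent with each other. Assuming the common YaRN convention that the extended context length equals `factor` times `original_max_position_embeddings` (an assumption; the PR does not state it), halving `factor` from 64.0 to 32.0 halves `max_position_embeddings` from 262144 to 131072. A minimal sketch of that arithmetic, where `extended_context` is a hypothetical helper and not part of any library:

```python
# Value taken from the config.json hunk above (unchanged by this PR).
ORIGINAL_MAX_POSITION_EMBEDDINGS = 4096


def extended_context(factor: float,
                     original: int = ORIGINAL_MAX_POSITION_EMBEDDINGS) -> int:
    """Context length implied by a YaRN scaling factor.

    Assumes extended length = factor * original length, which is the
    usual convention but is not confirmed by the PR itself.
    """
    return int(factor * original)


old_context = extended_context(64.0)  # factor before this PR
new_context = extended_context(32.0)  # factor after this PR

# Both results match the max_position_embeddings values in the diff.
assert old_context == 262144
assert new_context == 131072
print(old_context, new_context)
```

Under that assumption, the `factor` and `max_position_embeddings` edits need to move together; changing only one would leave the config internally inconsistent.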

generation_config.json CHANGED

@@ -9,10 +9,5 @@
   "pad_token_id": 9,
   "temperature": 1.0,
   "top_p": 1.0,
-  "min_p": 0.0,
-  "tool_call_parser": "poolside_v1",
-  "reasoning_parser": "poolside_v1",
-  "default_chat_template_kwargs": {
-    "enable_thinking": true
-  }
-}

   "pad_token_id": 9,
   "temperature": 1.0,
   "top_p": 1.0,
+  "min_p": 0.0
+}
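
To make the `generation_config.json` change easier to scan, the sketch below diffs the keys visible in the hunk above. It covers only the fields shown in the diff (the file has earlier fields that the hunk omits), and `removed_keys` is a hypothetical helper written for this illustration:

```python
# Tail of generation_config.json before the PR (fields shown in the hunk only).
old_tail = {
    "pad_token_id": 9,
    "temperature": 1.0,
    "top_p": 1.0,
    "min_p": 0.0,
    "tool_call_parser": "poolside_v1",
    "reasoning_parser": "poolside_v1",
    "default_chat_template_kwargs": {"enable_thinking": True},
}

# Tail of generation_config.json after the PR.
new_tail = {
    "pad_token_id": 9,
    "temperature": 1.0,
    "top_p": 1.0,
    "min_p": 0.0,
}


def removed_keys(old: dict, new: dict) -> list[str]:
    """Return the keys present in `old` but absent from `new`, sorted."""
    return sorted(set(old) - set(new))


def changed_values(old: dict, new: dict) -> dict:
    """Return keys present in both dicts whose values differ."""
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}


print(removed_keys(old_tail, new_tail))
print(changed_values(old_tail, new_tail))
```

The three dropped keys are the two parser names and the chat-template default; the sampling fields shown in the hunk keep their values.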