Instructions to use poolside/Laguna-XS.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use poolside/Laguna-XS.2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="poolside/Laguna-XS.2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("poolside/Laguna-XS.2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("poolside/Laguna-XS.2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use poolside/Laguna-XS.2 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "poolside/Laguna-XS.2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "poolside/Laguna-XS.2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/poolside/Laguna-XS.2
```
- SGLang
How to use poolside/Laguna-XS.2 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "poolside/Laguna-XS.2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "poolside/Laguna-XS.2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "poolside/Laguna-XS.2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "poolside/Laguna-XS.2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use poolside/Laguna-XS.2 with Docker Model Runner:
```shell
docker model run hf.co/poolside/Laguna-XS.2
```
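The curl examples above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes a server started with one of the commands above is listening locally, and it passes the sampling settings reported in the README's benchmarking notes (temperature=0.7, top_k=20). The helper names `build_chat_request` and `send_chat_request` are hypothetical, and sending `top_k` as a top-level request field is an assumption about what the serving backend accepts as an extension to the OpenAI-compatible schema.

```python
import json
import urllib.request


def build_chat_request(prompt, model="poolside/Laguna-XS.2", temperature=0.7, top_k=20):
    """Build the JSON body for a /v1/chat/completions call.

    The defaults mirror the sampling settings quoted in the README.
    `top_k` is not part of the core OpenAI schema; we assume the server
    accepts it as an extra sampling parameter.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_k": top_k,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to a locally running server and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_chat_request("What is the capital of France?")
    print(json.dumps(payload, indent=2))
    # Uncomment once a server is running (use port 30000 for the SGLang commands):
    # reply = send_chat_request(payload)
    # print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it easy to inspect or log the exact request before anything is sent.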
Update README.md
README.md
CHANGED
```diff
@@ -13,7 +13,7 @@ pipeline_tag: text-generation
 ---
 
 <p align="center">
-<img alt="poolside-banner" src="">
+<img alt="poolside-banner" src="https://poolside.ai/assets/laguna/laguna-xs2-banner.svg" width="800px">
 </p>
 
 <p align="center">
@@ -59,14 +59,14 @@ For more details on how we trained this model, including on data automixing and
 
 We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see the Usage section below to download and run locally), across all benchmarks. For other models, we use the best available publicly-reported score; if not available, we calculate baselines using OpenHands (SWE-bench family) or Terminus 2 (Terminal-Bench 2.0) using the settings below.
 
-| Model | Size (total params.) | SWE-bench
-|---------------------------|----------------------|--------------------------------
-| **Laguna XS.2** | 33B |
-| Devstral Small 2 | 24B dense |
-| Gemma 4 31B IT | 31B dense |
-| Qwen3.5-35B-A3B | 35B |
-| GPT-5.4 Nano | - |
-| Qwen3.6-27B | 27B dense |
+| Model | Size (total params.) | SWE-bench Verified | SWE-bench Multilingual | SWE-bench Pro (Public Dataset) | Terminal-Bench 2.0 |
+|---------------------------|----------------------|--------------------|------------------------|--------------------------------|--------------------|
+| **Laguna XS.2** | 33B | 68.2% | 62.4% | 44.5% | 30.1% |
+| Devstral Small 2 | 24B dense | 68.0% | 55.7% | - | 22.5% |
+| Gemma 4 31B IT | 31B dense | 52.0% | 51.7% | 35.7% | 42.9% |
+| Qwen3.5-35B-A3B | 35B | 69.2% | 60.3% | 44.6% | 40.5% |
+| GPT-5.4 Nano | - | - | - | 52.4% | 46.3% |
+| Qwen3.6-27B | 27B dense | 77.2% | 71.3% | 53.2% | 59.3% |
 
 *We used the highest publicly-referenced scores for all comparison models across each benchmark. In all cases these were official scores published in release blog posts or equivalent, with the exception of Gemma 4 31B IT where the highest published scores were [reported by the Qwen team](https://qwen.ai/blog?id=qwen3.6-35b-a3b).*
 
@@ -75,9 +75,9 @@ We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see th
 
 All benchmarking for Laguna XS.2 was completed using the Laude Institute’s Harbor Framework with our [agent harness](https://github.com/poolsideai/pool), using a maximum of 500 steps and sandboxed execution using 8 GB RAM/2 CPUs (with the exception of Terminal-Bench 2.0; see below). The same sampling parameters were used for all benchmarking: temperature=0.7 and top_k=20. Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details outlining these updates and other findings will follow in a future technical blog post.
 
-- SWE-bench Pro: mean pass@1 averaged over 3 runs.
 - SWE-bench Verified: mean pass@1 averaged over 4 runs.
 - SWE-bench Multilingual: mean pass@1 averaged over 7 runs.
+- SWE-bench Pro: mean pass@1 averaged over 3 runs.
 - Terminal-Bench 2.0: mean pass@1 averaged over 5 runs. 48GB RAM/32 CPUs.
 
 </details>
@@ -128,8 +128,6 @@ ollama launch pool --model laguna.xs-2
 
 Submit feedback with `/feedback` and read the [full documentation on GitHub](https://github.com/poolsideai/pool).
 
-*By downloading and using pool, you agree to the Poolside [End User License Agreement (EULA)](https://poolside.ai/legal/eula).*
-
 ### Local deployment
 
 [vLLM, Transformers v5, TRT-LLM, SGLang, ...]
```
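The evaluation notes in the diff report each score as "mean pass@1 averaged over N runs". The README does not spell out the computation, so the following is only a sketch under a common reading: within one run, pass@1 is the fraction of tasks solved in a single attempt, and the reported number is the arithmetic mean of that fraction across runs. The function names and the toy data are hypothetical and are not taken from the Harbor framework or the pool harness.

```python
from statistics import mean


def pass_at_1(results):
    """Fraction of tasks solved in a single attempt within one run.

    `results` holds one boolean per task (True = task resolved).
    """
    if not results:
        raise ValueError("a run must contain at least one task result")
    return sum(results) / len(results)


def mean_pass_at_1(runs):
    """Average the per-run pass@1 over all runs (assumed aggregation)."""
    if not runs:
        raise ValueError("at least one run is required")
    return mean(pass_at_1(run) for run in runs)


# Hypothetical toy data: 3 runs over the same 4 tasks.
runs = [
    [True, False, True, True],   # 3/4 solved
    [True, False, False, True],  # 2/4 solved
    [True, True, True, True],    # 4/4 solved
]

print(f"{mean_pass_at_1(runs):.1%}")  # prints "75.0%"
```

Averaging over several runs smooths out the variance introduced by sampling at temperature=0.7, which is presumably why the README states a run count for each benchmark.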