NOTICE

Lean Laguna is a serving recipe, a benchmark harness, and a reusable RL environment built on Poolside's publicly released Laguna XS.2 model. It does not redistribute any model weights.

Built on

Laguna XS.2: poolside/Laguna-XS.2 (Apache-2.0). The base model; not included here.
DFlash speculator: poolside/Laguna-XS.2-speculator.dflash. The 0.6B draft model used for speculative decoding, trained by Poolside; not included here.
DFlash method: the speculative-decoding drafting method, available in vLLM via --speculative-config '{"method":"dflash", ...}'.

What is original here (Apache-2.0)

The benchmark and serving harness (scripts/, bench/, evals/), the measured A/B results (results/), the spec_rl verifiers environment (spec_rl/), and the endpoint and configuration seam (configs/). Under greedy decoding, outputs are byte-identical to those of the base model alone, so the speedup is lossless.

Cite

Built for the Poolside Research Hackathon. See README.md for the method, measured results, and reproduction steps.