NOTICE
Lean Laguna is a serving recipe, benchmark harness, and reusable RL environment built on top of Poolside's released Laguna XS.2. It does not redistribute model weights.
Built on
- Laguna XS.2: poolside/Laguna-XS.2 (Apache-2.0). The base model; not included here.
- DFlash speculator: poolside/Laguna-XS.2-speculator.dflash. The 0.6B draft model used for speculative decoding; not included here. The Laguna DFlash speculator checkpoint was trained by Poolside.
- DFlash method: the speculative-decoding drafting method, integrated in vLLM via --speculative-config '{"method":"dflash", ...}'.
What is original here (Apache-2.0)
The benchmark/serving harness (scripts/, bench/, evals/), the measured A/B results (results/),
the spec_rl verifiers environment (spec_rl/), and the endpoint/configuration seam (configs/).
Under greedy decoding, the outputs are byte-identical to the base model's, so the speedup is lossless.
Cite
Built for the Poolside Research Hackathon. See README.md for the method, measured results, and
reproduction steps.