Lean Laguna: Laguna XS.2 + DFlash — lossless single-GPU speedup + cheaper RL rollouts


{
  "label": "baseline",
  "model": "poolside/Laguna-XS.2",
  "n": 14,
  "tokens_per_s_mean": 19.64077204940069,
  "ttft_s_mean": 6.58612985270364,
  "acceptance_length_tau": 1.0,
  "source": "HF Job 6a19d8b73a4b8cae6044dfdf (h200), 2026-05-29; vLLM 0.22.0, --enforce-eager, --max-model-len 4096, greedy (temperature=0), no speculator",
  "prompt_set": "14 distinct mixed-difficulty Python prompts (trivial fib/is_prime -> medium binary_search/roman_to_int -> hard lcs/parse_duration/dijkstra/LRUCache)",
  "corroborating_run": "An earlier 20-prompt trivial-only run (job 6a19d2105c8d10ffa1107774) gave baseline 19.47 tok/s.",
  "note": "ttft_s_mean here is full-completion latency, NOT true time-to-first-token; we make no TTFT claim. Summary stats are over all n=14."
}
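
The record above can be sanity-checked with a few lines of Python. The sketch below is illustrative only and is not part of the benchmark harness: the helper names `relative_gap` and `check_baseline` are hypothetical, and the numeric fields are copied by hand from the JSON rather than read from a file. It checks that a run with no speculator reports an acceptance length of exactly 1.0 (one token per forward pass) and measures how closely the 14-prompt mean agrees with the earlier 20-prompt run quoted in `corroborating_run`.

```python
import json

# Numeric fields copied from the baseline record above (string fields omitted).
record = json.loads("""
{
  "label": "baseline",
  "n": 14,
  "tokens_per_s_mean": 19.64077204940069,
  "ttft_s_mean": 6.58612985270364,
  "acceptance_length_tau": 1.0
}
""")

# Throughput of the earlier 20-prompt run, taken from "corroborating_run".
EARLIER_RUN_TOK_S = 19.47


def relative_gap(value, reference):
    """Absolute difference between two values as a fraction of the reference."""
    return abs(value - reference) / reference


def check_baseline(rec):
    """Hypothetical sanity checks for a no-speculator baseline record."""
    # Without a speculator, every decoding step yields exactly one token.
    assert rec["acceptance_length_tau"] == 1.0
    assert rec["n"] > 0
    assert rec["tokens_per_s_mean"] > 0
    return True


if __name__ == "__main__":
    check_baseline(record)
    gap = relative_gap(record["tokens_per_s_mean"], EARLIER_RUN_TOK_S)
    print(f"gap between the two baseline runs: {gap:.2%}")
```

A gap below one percent indicates that the two runs agree on baseline throughput, although the two prompt sets differ. Note that `ttft_s_mean` is deliberately left out of the checks, since the record states it is full-completion latency rather than time-to-first-token.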