Localpager GEPA Final Report

Promoted Result

Metric	Value	Note
Final mean score	0.6233	delta 0.1911 vs v9.1
Micro-F1	0.7985	target >= 0.7000
Precision / recall	0.9375 / 0.6954	FP weighted harder than FN
False positives	7	target <= 20
False negatives	46	target <= 58
Over-label events	0	target 0 or 1
Structural failures	0	target 0
Mean predicted labels	1.8667	same as v9.1 baseline

The promoted v3 candidate clears every strict gate and keeps mean predicted labels at the v9.1 baseline of 1.8667. The main caveat is scientific, not mechanical: the manual repair was guided by mistakes from this same 60-row set, so the scores are likely optimistic, and a fresh holdout is still needed before treating them as deployment evidence.
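As a sanity check on the headline numbers, micro-averaged precision, recall, and F1 can be recomputed from pooled label counts. The sketch below is illustrative only: the report does not list true positives, so TP = 105 is an assumption derived here from 60 rows times 1.8667 mean predicted labels (112 predictions) minus the 7 false positives in the v3 row. The function name `micro_prf` is hypothetical.

```python
def micro_prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 from pooled label counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# FP and FN are taken from the v3 row of the candidate table.
# TP is NOT reported; it is derived here (assumption).
rows, mean_predicted_labels, fp, fn = 60, 1.8667, 7, 46
tp = round(rows * mean_predicted_labels) - fp  # 112 predictions - 7 FP = 105

precision, recall, f1 = micro_prf(tp, fp, fn)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")  # 0.9375 0.6954 0.7985
```

Under that assumption the recomputed values agree with the reported precision, recall, and micro-F1 to four decimal places.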

60-Row Score Trajectory

Candidate Metrics

Candidate	Mean	Delta vs v9.1	Precision	Recall	F1	FP	FN	Over-label	Structural	Exact	Mean Labels	Label Delta	Source
v9.1 seed baseline	0.4322	0.0000	0.7411	0.5804	0.6510	29	60	2	3	10	1.8667	0.0000	source
GEPA six early GEPA	0.4912	0.0590	0.7965	0.5960	0.6818	23	61	1	0	12	n/a	n/a	source
Previous proper best proper GEPA	0.5355	0.1033	0.7778	0.6490	0.7076	28	53	5	2	21	2.1000	0.2333	source
Prop20 best proper GEPA	0.5936	0.1613	0.8707	0.6689	0.7566	15	50	2	4	24	1.9333	0.0667	source
Hardcase repair v2 manual repair, rejected	0.7729	0.3407	0.9343	0.8477	0.8889	9	23	4	0	38	2.2833	0.4167	source
Cardinality repair v3 promoted	0.6233	0.1911	0.9375	0.6954	0.7985	7	46	0	0	19	1.8667	0.0000	source

Strict Gate Check

Gate	Target	Observed	Result
mean weighted score	>= 0.5400	0.6233	pass
score delta vs v9.1	>= +0.1000	0.1911	pass
micro-F1	>= 0.7000	0.7985	pass
precision	>= 0.8000	0.9375	pass
recall	>= 0.6100	0.6954	pass
false positives	<= 20	7	pass
false negatives	<= 58	46	pass
over-label events	0 or 1	0	pass
structural failures	0	0	pass
exact matches	>= 15	19	pass
mean predicted-label delta	<= +0.10	0.0000	pass
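The gate table can be expressed as a small data-driven check. The sketch below is one possible encoding, not the project's actual gating code: the key names, `GATES`, and `check_gates` are hypothetical, "0 or 1" is read as "at most 1", and the observed values are copied from the candidate metrics table.

```python
import operator

# (metric key, comparison, target), mirroring the Strict Gate Check table.
GATES = [
    ("mean_score", operator.ge, 0.5400),
    ("score_delta", operator.ge, 0.1000),
    ("micro_f1", operator.ge, 0.7000),
    ("precision", operator.ge, 0.8000),
    ("recall", operator.ge, 0.6100),
    ("false_positives", operator.le, 20),
    ("false_negatives", operator.le, 58),
    ("over_label_events", operator.le, 1),   # "0 or 1" read as <= 1
    ("structural_failures", operator.eq, 0),
    ("exact_matches", operator.ge, 15),
    ("mean_label_delta", operator.le, 0.10),
]


def check_gates(observed: dict) -> dict:
    """Return {metric key: True/False} for every gate."""
    return {name: bool(cmp(observed[name], target)) for name, cmp, target in GATES}


# Observed values copied from the candidate metrics table.
V3 = {
    "mean_score": 0.6233, "score_delta": 0.1911, "micro_f1": 0.7985,
    "precision": 0.9375, "recall": 0.6954, "false_positives": 7,
    "false_negatives": 46, "over_label_events": 0, "structural_failures": 0,
    "exact_matches": 19, "mean_label_delta": 0.0000,
}
HARDCASE_V2 = {
    "mean_score": 0.7729, "score_delta": 0.3407, "micro_f1": 0.8889,
    "precision": 0.9343, "recall": 0.8477, "false_positives": 9,
    "false_negatives": 23, "over_label_events": 4, "structural_failures": 0,
    "exact_matches": 38, "mean_label_delta": 0.4167,
}

v3_failed = [name for name, ok in check_gates(V3).items() if not ok]
v2_failed = [name for name, ok in check_gates(HARDCASE_V2).items() if not ok]
print("v3 failed gates:", v3_failed)
print("v2 failed gates:", v2_failed)
```

Run against the Hardcase repair v2 row, the same check fails on over-label events (4) and mean predicted-label delta (0.4167), even though v2 has the higher mean score. The report does not state why v2 was rejected; this is only what the listed gates imply.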

Updated Links

Prop20 GEPA score report
Prop20 candidate tree
Prompt diffs

Promoted routing-policy SHA-256: b2576ca027148e109a1e72029c192ea9e26be486508671ec7c11025ac80f948b.
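A recorded digest like this is most useful when it is checked before a policy file is deployed or evaluated. The following is a minimal sketch using Python's standard `hashlib`; the helper names and any file path passed to them are hypothetical, since the report does not say where the routing policy is stored or which bytes were hashed.

```python
import hashlib
from pathlib import Path

# Digest copied from the report.
EXPECTED_SHA256 = "b2576ca027148e109a1e72029c192ea9e26be486508671ec7c11025ac80f948b"


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_promoted_policy(path) -> bool:
    """True if the file at `path` (hypothetical location) has the promoted digest."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Hashing the raw file bytes is an assumption; if the digest was taken over a normalized form of the policy (for example, stripped whitespace), the same normalization would have to be applied first.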

Remaining v3 Misses

The 41 rows (of 60) where v3 is not an exact match. These are the next useful targets for a clean, holdout-aware follow-up.

Target	Title	Score	Gold	Predicted	FP	FN
#44379	fix(pi-runner): harden context-overflow recovery with one suppress-hook retry	0.1667	coding_agents, memory, hooks, reliability	agent_runtime, reliability	agent_runtime	coding_agents, memory, hooks
#45393	fix(errors): friendly message and last-message repair for tool_use/tool_result mismatch (#45385)	0.2000	tool_calling, coding_agents, reliability	tool_calling, agent_runtime	agent_runtime	coding_agents, reliability
#47083	fix: respect totalTokensFresh flag to avoid showing stale token counts	0.2000	sessions, telemetry_usage	ui_tui	ui_tui	sessions, telemetry_usage
#81957	ci: harden GitHub Actions supply-chain boundaries	0.2500	security	packaging_deployment	packaging_deployment	security
#65364	feat(plugins): add registerProviderRuntimeAuthOverride API	0.2500	auth_identity, api_surface	skills_plugins, auth_identity	skills_plugins	api_surface
#80008	feat(plugins): expose ACP spawn and prompt in plugin runtime	0.2500	acp, coding_agents	skills_plugins, acp	skills_plugins	coding_agents
#90146	google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error	0.2500	local_model_providers, reliability	local_model_providers, model_serving	model_serving	reliability
#73910	BUG: OpenClaw-managed Codex ACP uses isolated CODEX_HOME without auth bridge and sends unsupported timeout config	0.3333	codex, acp, acpx, auth_identity	codex, acp	none	acpx, auth_identity
#52249	ACP parent session stuck until refresh when yielded waiting for child completion	0.5000	acp, sessions, reliability	acp, sessions	none	reliability
#83863	ACP/Codex child tasks can be marked succeeded with progress-only output and no final deliverable	0.5000	acp, codex, agent_runtime	acp, codex	none	agent_runtime
#48940	ACP: add gateway-owned node-backed runtime	0.5000	acp, gateway, agent_runtime	acp, gateway	none	agent_runtime
#48580	Bug: session created by acpx codex sessions exits immediately - stdin is not a terminal	0.5000	acpx, codex, sessions	acpx, codex	none	sessions
#39248	Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization	0.5000	coding_agents, sandboxing, agent_runtime	sandboxing, agent_runtime	none	coding_agents
#71216	Config schema: add `sandbox`, `routing.rules`, `instances`, and `gateway.nodes.denyPaths`	0.5000	config, sandboxing, gateway	config, gateway	none	sandboxing
#84477	Discord embedded-run prep wedge before strict-agentic, recovery skips sessionId=unknown lanes	0.5000	sessions, agent_runtime, reliability	sessions, reliability	none	agent_runtime
#51667	Feature: Native Audio Input for Omni-Modal Models (skip STT transcription)	0.5000	model_serving, security, config	model_serving, config	none	security
#43765	Improve runtime recovery for heartbeat, Feishu, and exec sessions	0.5000	reliability, exec_tools, cron_automation	reliability, exec_tools	none	cron_automation
#80783	Policy: add model, network, and MCP conformance checks	0.5000	mcp_tooling, config, security	config, mcp_tooling	none	security
#68187	SSE-backed MCP sessions can stay stale after server restart and fail with 'Session not found'	0.5000	mcp_tooling, sessions, gateway	mcp_tooling, sessions	none	gateway
#78528	Security: skill SecretRef API keys still leak into exec child environments	0.5000	security, exec_tools, skills_plugins	security, exec_tools	none	skills_plugins
#84715	[Bug]: @openclaw/codex peer link failure reproduced on 2026.5.19 after update	0.5000	codex, packaging_deployment	codex	none	packaging_deployment
#70529	[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT	0.5000	browser_automation, packaging_deployment	browser_automation	none	packaging_deployment
#84757	[Bug]: Telegram session can get stuck after compaction when encrypted reasoning content fails verification	0.5000	sessions, chat_integrations, reliability	sessions, chat_integrations	none	reliability
#44202	[Bug]: local memory embeddings on Apple Silicon can crash gateway in ggml-metal / node-llama-cpp; need official Metal/GPU guidance	0.5000	local_models, memory, self_hosted_inference	local_models, memory	none	self_hosted_inference
#10467	[Feature Request]: Multi-lane concurrency support for sub-agents via sessions_spawn	0.5000	queueing, sessions, coding_agents	queueing, sessions	none	coding_agents
#82507	[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)	0.5000	acpx, codex, skills_plugins	acpx, skills_plugins	none	codex
#40332	[Feature]: Per-binding and per-agent permissionMode for ACP sessions	0.5000	acp, approvals, acpx	approvals, acpx	none	acp
#84670	[codex] fix webchat full-message reader for truncated history	0.5000	gateway, api_surface, ui_tui	gateway, ui_tui	none	api_surface
#84583	cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting	0.5000	cron_automation, sessions, reliability	cron_automation, sessions	none	reliability
#68725	feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models	0.5000	open_weight_models, local_model_providers	open_weight_models	none	local_model_providers
#56442	feat: Add opt-in ACP parent completion notify for sessions_spawn	0.5000	acp, sessions, agent_runtime	acp, sessions	none	agent_runtime
#60979	feature: sessions_spawn ACP delivery to channel (stream output to Zulip/Discord topic)	0.5000	acp, chat_integrations, sessions	acp, sessions	none	chat_integrations
#52747	fix(acp): time out stuck session lane tasks	0.5000	acp, sessions, reliability	acp, sessions	none	reliability
#84763	fix(acpx): scrub provider credential env from ACP harness spawns	0.5000	acpx, acp, security	acp, acpx	none	security
#69256	fix(cron): prevent premature session cleanup when subagents are running	0.5000	cron_automation, sessions, reliability	cron_automation, sessions	none	reliability
#65242	fix: CompletionDeliveryGate to prevent duplicate ACP completion delivery	0.5000	acp, coding_agents, reliability	acp, reliability	none	coding_agents
#77827	fix: LM Studio thinking blocks invisible with Responses API	0.5000	model_serving, local_models	local_models	none	model_serving
#42027	fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock	0.5000	exec_tools, browser_automation, cron_automation	exec_tools, browser_automation	none	cron_automation
#84752	fix: self-heal lane wedges + restore openai-codex OAuth on embedded path	0.5000	reliability, auth_identity, sessions	reliability, auth_identity	none	sessions
#63826	security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification	0.5000	security, hooks, skills_plugins	security, skills_plugins	none	hooks
#62428	test(exec): land exec v2 contract follow-through	0.5000	exec_tools, sandboxing, approvals	exec_tools, sandboxing	none	approvals
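The FP and FN columns above are plain set differences between predicted and gold labels. The report does not state the per-row scoring formula, but every listed score is consistent with 1 / (1 + 2*FP + FN), which also matches the note that false positives are weighted harder than false negatives. The sketch below uses that inferred formula as an assumption; the function names and the `fp_weight` parameter are hypothetical.

```python
def row_errors(gold, predicted):
    """Return (false positives, false negatives) as sorted label lists."""
    gold, predicted = set(gold), set(predicted)
    return sorted(predicted - gold), sorted(gold - predicted)


def inferred_score(n_fp: int, n_fn: int, fp_weight: float = 2.0) -> float:
    """Inferred (not documented) row score: 1 / (1 + fp_weight*FP + FN)."""
    return 1.0 / (1.0 + fp_weight * n_fp + n_fn)


# Row #44379 from the table above.
gold = ["coding_agents", "memory", "hooks", "reliability"]
predicted = ["agent_runtime", "reliability"]
fp, fn = row_errors(gold, predicted)
score = inferred_score(len(fp), len(fn))
print(fp, fn, round(score, 4))  # ['agent_runtime'] ['coding_agents', 'hooks', 'memory'] 0.1667

# (FP count, FN count) -> number of rows, tallied from the table above,
# plus the 19 exact matches that score 1.0.
error_histogram = {(1, 3): 1, (1, 2): 2, (1, 1): 4, (0, 2): 1, (0, 1): 33, (0, 0): 19}
total_rows = sum(error_histogram.values())
mean_score = sum(inferred_score(f, n) * count
                 for (f, n), count in error_histogram.items()) / total_rows
print(total_rows, round(mean_score, 4))  # 60 0.6233
```

With the inferred weighting, the 41 miss rows plus the 19 exact matches average to the reported mean score of 0.6233, which supports the formula but does not prove that it is the one actually used.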