Localpager GEPA Reports

Bottom Line

Full 330 GEPA mean

0.7350

v10 0.7307, delta +0.0043

Full 330 Micro-F1

0.8206

v10 0.8231, delta -0.0025

Precision / Recall

0.8246 / 0.8167

precision down, recall up

FP / FN

110 / 116

v10 102 / 119

Heldout Micro-F1

0.8417

v10 0.8296, delta +0.0121

Best Pareto score

0.6979

seed 0.5742

GEPA-best is not a clear replacement for v10. On the full 330 rows it improves the GEPA mean score (0.7307 to 0.7350) and exact match (0.5242 to 0.5424), but false positives rise from 102 to 110 and micro-F1 slips from 0.8231 to 0.8206.

Whole 330 Score Graph

Chart: v10 seed vs. GEPA best

Open The Detailed Graphs

Proposal Graphs: Every proposal attempt, accepted/rejected status, subsample deltas, and best-so-far Pareto score.

Whole Dataset Comparison: Full 330-row GEPA-best versus v10 charts and metric table.

Iteration Graph: Original GEPA score report for the run.

Prompt Diff Picker: Dropdown-to-dropdown comparison for individual candidate prompts.

Candidate Tree: GEPA-native candidate tree visualization.

Final Report: Run settings, heldout comparison, whole-330 check, and artifact links.

Metric Table

Metric	v10 seed	GEPA best	Delta
GEPA mean score	0.7307	0.7350	+0.0043
Micro-F1	0.8231	0.8206	-0.0025
Precision	0.8344	0.8246	-0.0098
Recall	0.8120	0.8167	+0.0047
Exact match	0.5242	0.5424	+0.0182
False positives	102	110	+8
False negatives	119	116	-3
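
The precision, recall, and micro-F1 rows can be cross-checked against the false positive and false negative counts. The sketch below is an illustration, not the report's own scoring code: it assumes the standard micro-averaged definitions, and because the report does not list true positives, it infers that count from the reported precision and false positives. The function names are hypothetical.

```python
def implied_true_positives(precision: float, false_positives: int) -> int:
    """Solve precision = TP / (TP + FP) for TP, rounded to the nearest count.

    Assumption: the report does not state TP, so it is inferred here.
    """
    return round(precision * false_positives / (1.0 - precision))


def micro_metrics(tp: int, fp: int, fn: int) -> dict:
    """Standard micro-averaged precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1": round(f1, 4),
    }


# (precision, false positives, false negatives) as reported in the metric table
reported = {
    "v10 seed": (0.8344, 102, 119),
    "GEPA best": (0.8246, 110, 116),
}

for name, (precision, fp, fn) in reported.items():
    tp = implied_true_positives(precision, fp)
    print(name, "implied TP:", tp, micro_metrics(tp, fp, fn))
```

Under these assumptions the rebuilt recall and micro-F1 agree with the table to four decimal places for both candidates, which suggests the table rows are internally consistent.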
