Localpager GEPA Reports

Default dashboard for the Evalstate Qwen GEPA overlay run. Updated 2026-06-17.

Bottom Line

Full 330 GEPA mean: 0.7350 (v10 0.7307, delta +0.0043)
Full 330 Micro-F1: 0.8206 (v10 0.8231, delta -0.0025)
Precision / Recall: 0.8246 / 0.8167 (precision down, recall up)
FP / FN: 110 / 116 (v10 102 / 119)
Heldout Micro-F1: 0.8417 (v10 0.8296, delta +0.0121)
Best Pareto score: 0.6979 (seed 0.5742)

GEPA-best is not a clear replacement for v10. On the full 330 it improves the GEPA objective (+0.0043) and exact match (+0.0182), but false positives rise from 102 to 110 and full-set micro-F1 is slightly lower (-0.0025).
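The micro-F1 drop follows from the precision and recall figures above, because micro-F1 is the harmonic mean of micro-averaged precision and recall. The sketch below is a quick consistency check, not part of the reporting pipeline. It assumes the reported precision and recall are micro-averaged over the same 330 items, and the names `f1`, `reported`, and `recomputed` are illustrative. Because the inputs are rounded to four places, the recomputed values can differ from the reported ones in the last digit.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from this dashboard (already rounded to four places).
reported = {
    "v10 seed": {"precision": 0.8344, "recall": 0.8120, "micro_f1": 0.8231},
    "GEPA best": {"precision": 0.8246, "recall": 0.8167, "micro_f1": 0.8206},
}

# Recompute F1 from the rounded precision/recall pairs.
recomputed = {
    name: f1(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, m in reported.items():
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {m['micro_f1']:.4f}")
```

The precision loss (-0.0098) is roughly twice the recall gain (+0.0047), so the harmonic mean ends up slightly lower for GEPA best.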

Whole 330 Score Graph

[Bar chart: v10 seed vs. GEPA best on the full 330 repaired outputs. Groups: GEPA mean, Micro-F1, Precision, Recall. Y-axis: 0.50 to 0.80.]

Open The Detailed Graphs

Metric Table

Metric            v10 seed   GEPA best   Delta
GEPA mean score   0.7307     0.7350      +0.0043
Micro-F1          0.8231     0.8206      -0.0025
Precision         0.8344     0.8246      -0.0098
Recall            0.8120     0.8167      +0.0047
Exact match       0.5242     0.5424      +0.0182
False positives   102        110         +8
False negatives   119        116         -3
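The Delta column can be rechecked from the two score columns. The following is a minimal sketch and not the dashboard's own code. The `higher_is_better` flags are an assumption based on the usual convention (scores up is good, error counts down is good), and `ROWS`, `compare`, and `results` are hypothetical names. `Decimal` is used so the four-place subtraction is exact.

```python
from decimal import Decimal

# (metric, v10 seed, GEPA best, higher_is_better) copied from the table above.
# The higher_is_better flags are an assumption, not stated on the dashboard.
ROWS = [
    ("GEPA mean score", "0.7307", "0.7350", True),
    ("Micro-F1", "0.8231", "0.8206", True),
    ("Precision", "0.8344", "0.8246", True),
    ("Recall", "0.8120", "0.8167", True),
    ("Exact match", "0.5242", "0.5424", True),
    ("False positives", "102", "110", False),
    ("False negatives", "119", "116", False),
]


def compare(rows):
    """Return {metric: (delta, verdict)} for GEPA best relative to v10 seed."""
    out = {}
    for name, seed, best, higher_is_better in rows:
        delta = Decimal(best) - Decimal(seed)
        if delta == 0:
            verdict = "unchanged"
        elif (delta > 0) == higher_is_better:
            verdict = "improved"
        else:
            verdict = "regressed"
        out[name] = (delta, verdict)
    return out


results = compare(ROWS)
for name, (delta, verdict) in results.items():
    print(f"{name:<16} {delta:+}  {verdict}")
```

Under these assumptions, four metrics improve (GEPA mean score, recall, exact match, false negatives) and three regress (micro-F1, precision, false positives), which matches the mixed verdict in the Bottom Line.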

Archive

2026-06-14 12B cardinality report
2026-06-14 12B score graph
2026-06-14 prompt diffs