Suhasdev committed on
Commit
b4d947d
·
1 Parent(s): 6c19fd6

Fix: GEPA not generating candidates - training minibatch was incorrectly cached

Browse files

ROOT CAUSE:
- When GEPA calls adapter.evaluate(capture_traces=True), it needs trajectories
for reflective mutation to generate new candidates
- The adapter was returning cached results with trajectories=None for training
minibatches because len(batch)=4 >= valset_size=3 was True
- GEPA checks 'if not trajectories: skip', so no new candidates were generated

FIX:
1. When capture_traces=True, ALWAYS treat as training minibatch (dfeedback)
- Never use cache when traces are needed
2. Changed valset detection from '>=' to '==' for exact match
3. Added 'and not capture_traces' check before using cache

src/gepa_optimizer/core/universal_adapter.py CHANGED
@@ -236,13 +236,19 @@ Output the improved prompt directly and only the prompt."""
236
  # Determine dataset type first (needed for cache check)
237
  batch_size_threshold = self._config.batch_size if hasattr(self, '_config') and self._config else 8
238
 
239
- # 🔥 CRITICAL FIX: If _is_baseline_evaluation is True, we KNOW this is the validation set
240
- # This fixes the issue where valset_size might not be set yet when baseline detection happens
241
- if hasattr(self, '_is_baseline_evaluation') and self._is_baseline_evaluation:
 
 
 
 
 
242
  dataset_type = 'dpareto' # Baseline is ALWAYS evaluated on validation set
243
  self.logger.debug(f"🎯 Forced dataset_type to 'dpareto' (baseline evaluation flag is True)")
244
- elif hasattr(self, '_valset_size') and self._valset_size is not None and len(batch) >= self._valset_size:
245
- dataset_type = 'dpareto' # Full validation set size = Dpareto
 
246
  elif len(batch) > batch_size_threshold * 1.5:
247
  dataset_type = 'dpareto' # Much larger than batch = likely full valset
248
  else:
@@ -250,7 +256,8 @@ Output the improved prompt directly and only the prompt."""
250
 
251
  # 🔥 CRITICAL: Check cache to avoid re-evaluating same prompt on Dpareto
252
  # This ensures seed prompt is evaluated ONLY ONCE
253
- if dataset_type == 'dpareto':
 
254
  normalized_prompt = system_prompt.strip().strip('"\'')
255
  if normalized_prompt in self._dpareto_evaluated_candidates:
256
  existing_score, existing_type, _ = self._dpareto_evaluated_candidates[normalized_prompt]
 
236
  # Determine dataset type first (needed for cache check)
237
  batch_size_threshold = self._config.batch_size if hasattr(self, '_config') and self._config else 8
238
 
239
+ # 🔥 CRITICAL FIX: If capture_traces=True, this is a TRAINING MINIBATCH for reflection
240
+ # GEPA calls with capture_traces=True when it needs trajectories for reflective mutation
241
+ # We must NEVER use cache in this case, otherwise trajectories=None breaks GEPA!
242
+ if capture_traces:
243
+ dataset_type = 'dfeedback' # Training minibatch - need fresh evaluation with trajectories
244
+ self.logger.debug(f"🎯 Forced dataset_type to 'dfeedback' (capture_traces=True)")
245
+ # If _is_baseline_evaluation is True, we KNOW this is the validation set
246
+ elif hasattr(self, '_is_baseline_evaluation') and self._is_baseline_evaluation:
247
  dataset_type = 'dpareto' # Baseline is ALWAYS evaluated on validation set
248
  self.logger.debug(f"🎯 Forced dataset_type to 'dpareto' (baseline evaluation flag is True)")
249
+ elif hasattr(self, '_valset_size') and self._valset_size is not None and len(batch) == self._valset_size:
250
+ # 🔥 FIX: Use == not >= to avoid misclassifying training minibatches as validation set
251
+ dataset_type = 'dpareto' # EXACT validation set size = Dpareto
252
  elif len(batch) > batch_size_threshold * 1.5:
253
  dataset_type = 'dpareto' # Much larger than batch = likely full valset
254
  else:
 
256
 
257
  # 🔥 CRITICAL: Check cache to avoid re-evaluating same prompt on Dpareto
258
  # This ensures seed prompt is evaluated ONLY ONCE
259
+ # NOTE: Only applies when capture_traces=False (validation set evaluation)
260
+ if dataset_type == 'dpareto' and not capture_traces:
261
  normalized_prompt = system_prompt.strip().strip('"\'')
262
  if normalized_prompt in self._dpareto_evaluated_candidates:
263
  existing_score, existing_type, _ = self._dpareto_evaluated_candidates[normalized_prompt]