| # Codex Dev Notes |
|
|
| ## Shinka Search And Memory Redesign |
|
|
| This update makes the Shinka-side evolution loop less parent-anchored: it now chooses an explicit search mode for each generation and records every evaluated program in a structured memory. |
|
|
| ### 1. Structured universal memory |
|
|
| Added `shinka/core/universal_memory.py`. |
|
|
| The new memory layer records one structured entry per evaluated program, including: |
| - generation |
| - program id and parent id |
| - search mode |
| - patch type |
| - patch name |
| - prompt intent |
| - correctness |
| - combined score |
| - parent score and score delta |
| - failure mode |
| - verdict (`win`, `loss`, `neutral`, `invalid`) |
| - summarized `aux_*` metrics |
| - tags |
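
One way to picture a single entry is as a small record type. The sketch below is illustrative only: `MemoryEntry`, its field names, and the verdict rule (including the `eps` tolerance) are assumptions derived from the field list above, not the actual contents of `universal_memory.py`.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    """Hypothetical record for one evaluated program."""

    generation: int
    program_id: str
    parent_id: Optional[str]
    search_mode: str
    patch_type: str
    patch_name: str
    prompt_intent: str
    correct: bool
    combined_score: Optional[float] = None
    parent_score: Optional[float] = None
    failure_mode: Optional[str] = None
    aux_metrics: dict = field(default_factory=dict)  # summarized aux_* metrics
    tags: list = field(default_factory=list)

    def score_delta(self) -> Optional[float]:
        """Child score minus parent score, or None if either is missing."""
        if self.combined_score is None or self.parent_score is None:
            return None
        return self.combined_score - self.parent_score

    def verdict(self, eps: float = 1e-9) -> str:
        """Assumed rule: incorrect -> invalid, otherwise the sign of the delta."""
        if not self.correct:
            return "invalid"
        delta = self.score_delta()
        if delta is None or abs(delta) <= eps:
            return "neutral"
        return "win" if delta > 0 else "loss"

    def to_dict(self) -> dict:
        """JSON-friendly view that includes the derived fields."""
        data = asdict(self)
        data["score_delta"] = self.score_delta()
        data["verdict"] = self.verdict()
        return data
```

Deriving `score_delta` and `verdict` from the stored scores, rather than storing them independently, keeps the two from drifting apart; whether the real module does this is not stated here.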
|
|
| The memory is persisted at: |
|
|
| `<results_dir>/universal_memory.json` |
|
|
| It also maintains aggregated strategy statistics so future prompts can see: |
| - recent wins |
| - recent failures |
| - mode-specific outcomes |
| - underexplored strategy families |
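
As a rough sketch of how entries and aggregated statistics could be written to `universal_memory.json` together: the function names, the JSON layout (`entries`, `strategy_stats`, `underexplored`), and the `min_trials` cutoff are all assumptions for illustration, and search modes stand in here for "strategy families".

```python
import json
from collections import Counter, defaultdict
from pathlib import Path

MODES = ("refine", "recombine", "diverge", "restart", "theory")


def summarize_by_mode(entries, min_trials=2):
    """Count verdicts per search mode and flag modes with few trials."""
    per_mode = defaultdict(Counter)
    for entry in entries:
        per_mode[entry["search_mode"]][entry["verdict"]] += 1

    stats = {}
    for mode in MODES:
        counts = per_mode[mode]
        trials = sum(counts.values())
        stats[mode] = {
            "trials": trials,
            "wins": counts["win"],
            "invalid": counts["invalid"],
            "win_rate": counts["win"] / trials if trials else 0.0,
        }
    underexplored = [m for m in MODES if stats[m]["trials"] < min_trials]
    return stats, underexplored


def save_memory(results_dir, entries):
    """Write entries plus derived statistics to <results_dir>/universal_memory.json."""
    stats, underexplored = summarize_by_mode(entries)
    payload = {
        "entries": entries,
        "strategy_stats": stats,
        "underexplored": underexplored,
    }
    path = Path(results_dir) / "universal_memory.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Recomputing the statistics from the entries on every save is the simplest option; an incremental update would be cheaper for long runs.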
|
|
| ### 2. Adaptive search-mode controller |
|
|
| Added `shinka/core/search_mode.py`. |
|
|
| The controller chooses a search mode per generation: |
| - `refine` |
| - `recombine` |
| - `diverge` |
| - `restart` |
| - `theory` |
|
|
| The choice is currently heuristic and uses: |
| - recent best-score change |
| - recent invalid-rate |
| - recent mode repetition |
| - whether some modes are underexplored |
|
|
| Current behavior: |
| - high invalid-rate pushes toward `restart` |
| - plateau after repeated `refine` pushes toward `diverge` |
| - periodic generations can trigger `theory` |
| - underexplored modes or scheduled turns can trigger `recombine` |
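
The rules above can be read as a priority-ordered chain of checks. The following is a minimal sketch under stated assumptions: the ordering of the checks, the thresholds (`invalid_threshold`, `plateau_window`, `theory_period`), and the `Decision` type are hypothetical and are not taken from `search_mode.py`.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical stand-in for the controller's decision object."""

    mode: str
    rationale: str


def choose_mode(generation, best_scores, recent_verdicts, recent_modes,
                underexplored, invalid_threshold=0.5, plateau_window=3,
                theory_period=10):
    """Pick a search mode from recent history (assumed rule order)."""
    # 1. Many invalid programs recently: push toward restart.
    if recent_verdicts:
        invalid_rate = recent_verdicts.count("invalid") / len(recent_verdicts)
        if invalid_rate >= invalid_threshold:
            return Decision("restart", f"invalid rate {invalid_rate:.2f}")

    # 2. Best score flat while the last modes were all refine: diverge.
    scores = best_scores[-plateau_window:]
    modes = recent_modes[-plateau_window:]
    plateau = (len(scores) == plateau_window
               and max(scores) - min(scores) <= 1e-9)
    stuck_refining = len(modes) == plateau_window and set(modes) == {"refine"}
    if plateau and stuck_refining:
        return Decision("diverge", "plateau after repeated refine")

    # 3. Periodic generations trigger a theory turn.
    if generation > 0 and generation % theory_period == 0:
        return Decision("theory", f"scheduled every {theory_period} generations")

    # 4. Give recombine a turn when it has been underexplored.
    if "recombine" in underexplored:
        return Decision("recombine", "recombine is underexplored")

    # Default: keep improving locally.
    return Decision("refine", "default local improvement")
```

Returning the rationale alongside the mode makes it easy to record why a generation used a given mode.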
|
|
| ### 3. Runner integration |
|
|
| Updated `shinka/core/runner.py`. |
|
|
| Main integration points: |
| - initialize `UniversalMemory` |
| - initialize `SearchModeController` |
| - choose a search mode before each generation is patched |
| - adapt parent/inspiration usage based on search mode |
| - inject search metadata into patch metadata |
| - write evaluated outcomes back into structured memory |
|
|
| Important effects: |
| - generation 0 is recorded into universal memory |
| - completed jobs now store `search_mode`, `search_rationale`, and `prompt_intent` |
| - post-evaluation writes structured results back into universal memory |
| - `restart` mode can fall back to a generation-0 style parent and clear inspirations |
| - `diverge`, `recombine`, and `theory` trim or reshape inspiration context |
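
To make the parent/inspiration adaptation concrete, here is a small sketch. It assumes programs are plain dicts with `id` and `score` keys, that "trim" means keeping the first `keep` inspirations, and that "reshape" for `recombine` means ranking by score; none of these details come from `runner.py`.

```python
def adapt_context(mode, parent, inspirations, gen0_program, keep=2):
    """Return the (parent, inspirations) pair to use for this search mode."""
    if mode == "restart":
        # Fall back to a generation-0 style parent and clear inspirations.
        return gen0_program, []
    if mode in ("diverge", "theory"):
        # Trim the context so the prompt is less tied to the current lineage.
        return parent, list(inspirations[:keep])
    if mode == "recombine":
        # Reshape: keep the highest-scoring inspirations for crossover.
        ranked = sorted(inspirations, key=lambda p: p["score"], reverse=True)
        return parent, ranked[:keep]
    # refine (and anything unknown): leave the context unchanged.
    return parent, list(inspirations)


def with_search_metadata(patch_metadata, mode, rationale, intent):
    """Copy patch metadata and add the three search fields."""
    merged = dict(patch_metadata)
    merged.update(
        search_mode=mode,
        search_rationale=rationale,
        prompt_intent=intent,
    )
    return merged
```

Both helpers return new objects instead of mutating their inputs, so the caller's parent and inspiration lists stay intact for other generations.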
|
|
| ### 4. Prompt changes |
|
|
| Updated `shinka/core/sampler.py`. |
|
|
| Prompt construction now supports: |
| - `search_mode` |
| - `search_rationale` |
| - `prompt_intent` |
| - `memory_context` |
|
|
| New prompt sections: |
| - `# Search Mode` |
| - `# Universal Memory` |
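
A possible rendering of the two sections is sketched below. Only the section titles and the four parameter names come from these notes; the function name `render_search_sections` and the line layout inside each section are assumptions.

```python
def render_search_sections(search_mode, search_rationale, prompt_intent,
                           memory_context=""):
    """Build the `# Search Mode` and `# Universal Memory` prompt sections."""
    lines = [
        "# Search Mode",
        f"Mode: {search_mode}",
        f"Rationale: {search_rationale}",
        f"Intent: {prompt_intent}",
    ]
    # Skip the memory section entirely when there is nothing to show.
    if memory_context.strip():
        lines += ["", "# Universal Memory", memory_context.strip()]
    return "\n".join(lines)
```

Omitting the memory section when the context is empty fits modes such as `restart`, where context is meant to be cleared.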
|
|
| Mode-specific behavior: |
| - `restart` strongly prefers the `full` patch type |
| - `theory` biases toward the `full` or `cross` patch types |
| - `diverge` biases toward larger jumps |
| - `recombine` biases toward crossover |
| - context is reduced or cleared for modes that should not overfit to the current lineage |
|
|
| This is intended to reduce the previous tendency to always optimize as a local descendant of the current parent. |
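
The mode-specific biases could be expressed as multipliers over patch-type sampling weights. This is a sketch with made-up numbers: the multiplier values, the presence of a third `diff` patch type, and the mapping of "larger jumps" to `full` are all assumptions, not the behavior of `sampler.py`.

```python
# Hypothetical baseline: every patch type equally likely.
BASE_WEIGHTS = {"diff": 1.0, "full": 1.0, "cross": 1.0}

# Hypothetical per-mode multipliers; missing keys mean "no bias".
MODE_MULTIPLIERS = {
    "refine": {},
    "restart": {"full": 8.0},                # strongly prefer full rewrites
    "theory": {"full": 3.0, "cross": 3.0},   # bias toward full or cross
    "diverge": {"full": 3.0},                # larger jumps, assumed to mean full
    "recombine": {"cross": 4.0},             # bias toward crossover
}


def patch_type_probs(mode):
    """Return normalized patch-type probabilities for a search mode."""
    multipliers = MODE_MULTIPLIERS.get(mode, {})
    weights = {
        name: weight * multipliers.get(name, 1.0)
        for name, weight in BASE_WEIGHTS.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}
```

Using multipliers rather than hard switches keeps every patch type reachable in every mode, so a bias never rules out a patch type completely.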
|
|
| ### 5. Package exports |
|
|
| Updated `shinka/core/__init__.py` to export: |
| - `UniversalMemory` |
| - `SearchModeController` |
| - `SearchModeDecision` |
|
|
| ### 6. Verification |
|
|
| Performed: |
| - Python syntax check with `python3 -m py_compile` on the modified core files |
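
The same syntax check can be scripted with the standard-library `py_compile` module. This is a sketch: the helper name `syntax_check` is hypothetical, and the caller supplies the list of files to check.

```python
import py_compile


def syntax_check(paths):
    """Compile each file and return {path: error message} for failures."""
    failures = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
    return failures
```

An empty result means every file compiled. It only proves the files parse; it does not exercise any of the new behavior.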
|
|
| Not yet performed: |
| - full end-to-end experiment validation |
| - behavioral verification of mode switching during long runs |
|
|
| ### 7. Expected runtime artifacts |
|
|
| After running an experiment, expect: |
| - `universal_memory.json` in `<results_dir>` |
| - search metadata in program `metadata` |
| - prompt traces showing explicit search modes |
| - search modes that vary across generations instead of repeated parent-local refinement |
|
|
| ### 8. Known limitations of this first pass |
|
|
| This is a first implementation, not the final architecture. |
|
|
| Current limitations: |
| - search-mode control is heuristic, not learned |
| - universal memory retrieval is still recent-history oriented |
| - memory does not yet cluster strategy families semantically |
| - prompt memory is summarized text, not selective structured retrieval |
| - no dedicated controller yet for allocating a fixed exploration budget by mode |
|
|
| The goal of this pass was to lay a working foundation for: |
| - structured experiment memory |
| - explicit mode switching |
| - reduced parent anchoring |
|
|