# Codex Dev Notes
## Shinka Search And Memory Redesign
This update changed the Shinka-side evolution loop to be less parent-anchored and more deliberate about search strategy.
### 1. Structured universal memory
Added `shinka/core/universal_memory.py`.
The new memory layer records one structured entry per evaluated program, including:
- generation
- program id and parent id
- search mode
- patch type
- patch name
- prompt intent
- correctness
- combined score
- parent score and score delta
- failure mode
- verdict (`win`, `loss`, `neutral`, `invalid`)
- summarized `aux_*` metrics
- tags
The memory is persisted at:
`<results_dir>/universal_memory.json`
It also maintains aggregated strategy statistics so future prompts can see:
- recent wins
- recent failures
- mode-specific outcomes
- underexplored strategy families
### 2. Adaptive search-mode controller
Added `shinka/core/search_mode.py`.
The controller chooses a search mode per generation:
- `refine`
- `recombine`
- `diverge`
- `restart`
- `theory`
The choice is currently heuristic and uses:
- recent best-score change
- recent invalid-rate
- recent mode repetition
- whether some modes are underexplored
Current behavior:
- high invalid-rate pushes toward `restart`
- plateau after repeated `refine` pushes toward `diverge`
- periodic generations can trigger `theory`
- underexplored or scheduled turns can trigger `recombine`
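The heuristics above can be sketched as a small decision function. This is a toy reconstruction, not the actual controller in `shinka/core/search_mode.py`; the thresholds (0.5 invalid-rate, three repeated refines, theory every ten generations) are invented for illustration.

```python
from collections import Counter

MODES = ("refine", "recombine", "diverge", "restart", "theory")


def choose_mode(recent_modes, best_delta, invalid_rate, generation,
                theory_every=10):
    """Toy version of the heuristics described above; thresholds are
    illustrative, not those of the real SearchModeController."""
    # High invalid-rate: recent patches are mostly broken, start fresh.
    if invalid_rate > 0.5:
        return "restart"
    # Periodic generations trigger a theory turn.
    if theory_every and generation > 0 and generation % theory_every == 0:
        return "theory"
    counts = Counter(recent_modes)
    # Plateau after repeated refine pushes toward diverge.
    if best_delta <= 0 and counts.get("refine", 0) >= 3:
        return "diverge"
    # An underexplored recombine gets a scheduled turn.
    if counts.get("recombine", 0) == 0 and recent_modes:
        return "recombine"
    return "refine"


print(choose_mode(["refine"] * 4, best_delta=0.0, invalid_rate=0.1, generation=3))
# → diverge
```

The ordering matters: correctness problems (`restart`) are checked before exploration concerns, so a broken population is never asked to diversify first.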
### 3. Runner integration
Updated `shinka/core/runner.py`.
Main integration points:
- initialize `UniversalMemory`
- initialize `SearchModeController`
- choose a search mode before each generation is patched
- adapt parent/inspiration usage based on search mode
- inject search metadata into patch metadata
- write evaluated outcomes back into structured memory
Important effects:
- generation 0 is recorded into universal memory
- completed jobs now store `search_mode`, `search_rationale`, and `prompt_intent`
- post-evaluation writes structured results back into universal memory
- `restart` mode can fall back to a generation-0 style parent and clear inspirations
- `diverge`, `recombine`, and `theory` trim or reshape inspiration context
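The last two effects can be sketched as a single context-shaping step. The function below is a hypothetical simplification of the logic in `shinka/core/runner.py`; the name `shape_context` and the trim limit are assumptions.

```python
def shape_context(mode, parent, gen0_parent, inspirations, max_keep=2):
    """Sketch of mode-dependent parent/inspiration reshaping
    (hypothetical; the real logic lives in shinka/core/runner.py)."""
    if mode == "restart":
        # Fall back to a generation-0 style parent and clear inspirations.
        return gen0_parent, []
    if mode in ("diverge", "recombine", "theory"):
        # Trim inspiration context so the prompt does not overfit
        # to the current lineage.
        return parent, inspirations[:max_keep]
    # refine keeps the full lineage context.
    return parent, inspirations


parent, insp = shape_context("restart", "p42", "seed", ["a", "b", "c"])
print(parent, insp)  # → seed []
```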
### 4. Prompt changes
Updated `shinka/core/sampler.py`.
Prompt construction now supports:
- `search_mode`
- `search_rationale`
- `prompt_intent`
- `memory_context`
New prompt sections:
- `# Search Mode`
- `# Universal Memory`
Mode-specific behavior:
- `restart` strongly prefers `full`
- `theory` biases toward `full` or `cross`
- `diverge` biases toward larger jumps
- `recombine` biases toward crossover
- context is reduced or cleared for modes that should not overfit to the current lineage
This is intended to reduce the previous tendency to always optimize as a local descendant of the current parent.
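A rough sketch of how the new prompt sections might be assembled. The section headings match those listed above, but the function name and assembly logic are assumptions, not the actual code in `shinka/core/sampler.py`.

```python
def build_search_sections(search_mode, search_rationale, memory_context=""):
    """Hypothetical assembly of the two new prompt sections; headings
    match the notes above, the wiring is illustrative."""
    sections = [
        "# Search Mode",
        f"Mode: {search_mode}",
        f"Rationale: {search_rationale}",
    ]
    if memory_context:
        # Only attach memory when there is something to show, so modes
        # that clear context produce a shorter prompt.
        sections += ["# Universal Memory", memory_context]
    return "\n".join(sections)


prompt = build_search_sections(
    "diverge",
    "score plateau after repeated refine",
    "recent wins: 2, recent failures: 1",
)
print(prompt)
```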
### 5. Package exports
Updated `shinka/core/__init__.py` to export:
- `UniversalMemory`
- `SearchModeController`
- `SearchModeDecision`
### 6. Verification
Performed:
- Python syntax check with `python3 -m py_compile` on the modified core files
Not yet performed:
- full end-to-end experiment validation
- behavioral verification of mode switching during long runs
### 7. Expected runtime artifacts
After running an experiment, expect:
- `universal_memory.json` in the experiment root
- search metadata in program `metadata`
- prompt traces showing explicit search modes
- more varied search behavior across generations
### 8. Known limitations of this first pass
This is a first implementation, not the final architecture.
Current limitations:
- search-mode control is heuristic, not learned
- universal memory retrieval is still recent-history oriented
- memory does not yet cluster strategy families semantically
- prompt memory is summarized text, not selective structured retrieval
- no dedicated controller yet for allocating a fixed exploration budget by mode
The goal of this pass was to establish a first working foundation for:
- structured experiment memory
- explicit mode switching
- reduced parent anchoring