# Codex Dev Notes
## Shinka Search And Memory Redesign
This update changed the Shinka-side evolution loop to be less parent-anchored and more deliberate about search strategy.
### 1. Structured universal memory
Added `shinka/core/universal_memory.py`.
The new memory layer records one structured entry per evaluated program, including:
- generation
- program id and parent id
- search mode
- patch type
- patch name
- prompt intent
- correctness
- combined score
- parent score and score delta
- failure mode
- verdict (`win`, `loss`, `neutral`, `invalid`)
- summarized `aux_*` metrics
- tags
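
A minimal sketch of what one entry might look like, assuming a flat dataclass. The field names (`aux_metrics`, `correct`), the `NEUTRAL_EPS` tolerance, and the rule that derives the verdict from correctness and score delta are all assumptions for illustration. They are not the actual layout of `universal_memory.py`.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical tolerance: score changes smaller than this count as "neutral".
NEUTRAL_EPS = 1e-6


def classify_verdict(
    correct: bool,
    score: Optional[float],
    parent_score: Optional[float],
    eps: float = NEUTRAL_EPS,
) -> str:
    """Map an evaluation outcome to win / loss / neutral / invalid (assumed rule)."""
    if not correct or score is None:
        return "invalid"
    if parent_score is None:
        # No parent to compare against (e.g. generation 0).
        return "neutral"
    delta = score - parent_score
    if delta > eps:
        return "win"
    if delta < -eps:
        return "loss"
    return "neutral"


@dataclass
class MemoryEntry:
    """One record per evaluated program (hypothetical field names)."""

    generation: int
    program_id: str
    parent_id: Optional[str]
    search_mode: str
    patch_type: str
    patch_name: str
    prompt_intent: str
    correct: bool
    combined_score: Optional[float]
    parent_score: Optional[float]
    failure_mode: Optional[str] = None
    aux_metrics: dict = field(default_factory=dict)  # summarized aux_* values
    tags: list = field(default_factory=list)

    @property
    def score_delta(self) -> Optional[float]:
        if self.combined_score is None or self.parent_score is None:
            return None
        return self.combined_score - self.parent_score

    @property
    def verdict(self) -> str:
        return classify_verdict(self.correct, self.combined_score, self.parent_score)

    def to_dict(self) -> dict:
        """JSON-ready dict that also includes the derived fields."""
        record = asdict(self)
        record["score_delta"] = self.score_delta
        record["verdict"] = self.verdict
        return record
```

Under this rule, a correct child scoring 0.82 against a parent at 0.80 is recorded as a `win`. Any incorrect program is `invalid` regardless of its score.
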
The memory is persisted at:
`<results_dir>/universal_memory.json`
It also maintains aggregated strategy statistics so future prompts can see:
- recent wins
- recent failures
- mode-specific outcomes
- underexplored strategy families
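
One way to derive these views from a flat list of entry dicts is sketched below, together with persistence to the path above. The `{"entries": ..., "mode_stats": ...}` file layout, the `min_trials` threshold, and all function names are hypothetical.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

MODES = ("refine", "recombine", "diverge", "restart", "theory")
FAILURE_VERDICTS = {"loss", "invalid"}


def mode_stats(entries: List[dict]) -> Dict[str, Dict[str, int]]:
    """Verdict counts per search mode, e.g. {"refine": {"win": 2, "loss": 1}}."""
    stats: Dict[str, Counter] = defaultdict(Counter)
    for entry in entries:
        stats[entry["search_mode"]][entry["verdict"]] += 1
    return {mode: dict(counts) for mode, counts in stats.items()}


def recent(entries: List[dict], verdicts: Iterable[str], k: int = 5) -> List[dict]:
    """The k most recent entries whose verdict is in `verdicts`, newest first."""
    wanted = set(verdicts)
    return [e for e in reversed(entries) if e["verdict"] in wanted][:k]


def underexplored(entries: List[dict], min_trials: int = 2) -> List[str]:
    """Modes tried fewer than `min_trials` times (hypothetical threshold)."""
    counts = Counter(e["search_mode"] for e in entries)
    return [mode for mode in MODES if counts[mode] < min_trials]


def save(results_dir: str, entries: List[dict]) -> Path:
    """Persist entries plus aggregates to <results_dir>/universal_memory.json."""
    path = Path(results_dir) / "universal_memory.json"
    payload = {"entries": entries, "mode_stats": mode_stats(entries)}
    path.write_text(json.dumps(payload, indent=2))
    return path
```

With these helpers, recent wins are `recent(entries, {"win"})` and recent failures are `recent(entries, FAILURE_VERDICTS)`.
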
### 2. Adaptive search-mode controller
Added `shinka/core/search_mode.py`.
The controller chooses a search mode per generation:
- `refine`
- `recombine`
- `diverge`
- `restart`
- `theory`
The choice is currently heuristic and uses:
- recent best-score change
- recent invalid rate
- recent mode repetition
- whether some modes are underexplored
Current behavior:
- a high invalid rate pushes toward `restart`
- plateau after repeated `refine` pushes toward `diverge`
- periodic generations can trigger `theory`
- underexplored or scheduled turns can trigger `recombine`
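
The following minimal sketch orders these rules into a single decision function. All thresholds, the window semantics of the inputs, and the `ModeChoice` type are hypothetical. The real `SearchModeController` may weigh the same signals differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModeChoice:
    mode: str
    rationale: str


def choose_mode(
    generation: int,
    best_scores: Sequence[float],   # best-so-far score over a recent window, oldest first
    invalid_flags: Sequence[bool],  # True where a recent program was invalid
    recent_modes: Sequence[str],    # modes used in recent generations, oldest first
    invalid_threshold: float = 0.5,
    plateau_eps: float = 1e-6,
    refine_streak: int = 3,
    theory_period: int = 10,
    recombine_period: int = 5,
) -> ModeChoice:
    # High invalid rate: abandon the current lineage.
    if invalid_flags:
        invalid_rate = sum(invalid_flags) / len(invalid_flags)
        if invalid_rate >= invalid_threshold:
            return ModeChoice("restart", f"invalid rate {invalid_rate:.2f}")

    # Plateau after a streak of refine turns: take a larger jump.
    tail = list(recent_modes)[-refine_streak:]
    streak = len(tail) == refine_streak and all(m == "refine" for m in tail)
    if streak and len(best_scores) >= 2:
        if best_scores[-1] - best_scores[0] <= plateau_eps:
            return ModeChoice("diverge", f"no best-score gain over {refine_streak} refine turns")

    # Periodic theory turns.
    if generation > 0 and generation % theory_period == 0:
        return ModeChoice("theory", "scheduled theory turn")

    # Recombine when it is absent from a full recent window, or on schedule.
    window_full = len(recent_modes) >= recombine_period
    if (window_full and "recombine" not in recent_modes) or (
        generation > 0 and generation % recombine_period == 0
    ):
        return ModeChoice("recombine", "recombine underexplored or scheduled")

    return ModeChoice("refine", "default local improvement")
```

The order of the checks encodes priority. A broken lineage is handled before a plateau, and a plateau before any scheduled turn.
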
### 3. Runner integration
Updated `shinka/core/runner.py`.
Main integration points:
- initialize `UniversalMemory`
- initialize `SearchModeController`
- choose a search mode before each generation is patched
- adapt parent/inspiration usage based on search mode
- inject search metadata into patch metadata
- write evaluated outcomes back into structured memory
Important effects:
- generation 0 is recorded into universal memory
- completed jobs now store `search_mode`, `search_rationale`, and `prompt_intent`
- post-evaluation writes structured results back into universal memory
- `restart` mode can fall back to a generation-0-style parent and clear inspirations
- `diverge`, `recombine`, and `theory` trim or reshape inspiration context
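
The integration points above can be sketched as two hooks around the patch/evaluate step. `Program`, `INSPIRATION_CAP`, `prepare_generation`, and `record_outcome` are hypothetical names. The per-mode inspiration caps in particular are placeholders, since the notes only say that these modes trim or reshape context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Program:
    id: str
    generation: int
    score: float
    metadata: dict = field(default_factory=dict)


# Hypothetical caps on inspiration context per mode ("restart" clears it).
INSPIRATION_CAP: Dict[str, int] = {"refine": 4, "recombine": 2, "diverge": 1, "theory": 1, "restart": 0}


def prepare_generation(
    mode: str,
    rationale: str,
    prompt_intent: str,
    parent: Program,
    inspirations: List[Program],
    archive: List[Program],
) -> Tuple[Program, List[Program], dict]:
    """Adapt parent/inspirations to the chosen mode and build patch metadata."""
    if mode == "restart":
        # Fall back to the best generation-0 program, if one exists.
        roots = [p for p in archive if p.generation == 0]
        if roots:
            parent = max(roots, key=lambda p: p.score)
    cap = INSPIRATION_CAP.get(mode, len(inspirations))
    inspirations = inspirations[:cap]
    patch_metadata = {
        "search_mode": mode,
        "search_rationale": rationale,
        "prompt_intent": prompt_intent,
    }
    return parent, inspirations, patch_metadata


def record_outcome(memory: List[dict], child: Program, parent: Optional[Program], correct: bool) -> dict:
    """Append the evaluated child, with its search metadata, to memory."""
    entry = {
        "generation": child.generation,
        "program_id": child.id,
        "parent_id": parent.id if parent else None,
        "search_mode": child.metadata.get("search_mode"),
        "correct": correct,
        "combined_score": child.score,
        "parent_score": parent.score if parent else None,
    }
    memory.append(entry)
    return entry
```

Because the search metadata travels with the child program, the post-evaluation write can recover the mode from the program itself rather than from runner state.
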
### 4. Prompt changes
Updated `shinka/core/sampler.py`.
Prompt construction now supports:
- `search_mode`
- `search_rationale`
- `prompt_intent`
- `memory_context`
New prompt sections:
- `# Search Mode`
- `# Universal Memory`
Mode-specific behavior:
- `restart` strongly prefers `full`
- `theory` biases toward `full` or `cross`
- `diverge` biases toward larger jumps
- `recombine` biases toward crossover
- context is reduced or cleared for modes that should not overfit to the current lineage
This is intended to reduce the previous tendency to always optimize as a local descendant of the current parent.
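
The mode-specific behavior could be expressed as a filter over the available patch types plus two rendered prompt sections. This sketch treats "bias" as a hard filter with a fallback, whereas the real sampler may weight patch types rather than filter them. The hint wording, `PATCH_BIAS`, `MODE_HINT`, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical patch-type preferences per mode ("refine" has no bias).
PATCH_BIAS: Dict[str, Tuple[str, ...]] = {
    "restart": ("full",),
    "theory": ("full", "cross"),
    "diverge": ("full",),
    "recombine": ("cross",),
}

# Hypothetical one-line instructions rendered under "# Search Mode".
MODE_HINT: Dict[str, str] = {
    "refine": "Make a focused improvement to the current parent.",
    "recombine": "Combine the strongest ideas from the parent and the inspirations.",
    "diverge": "Take a larger step away from the current lineage.",
    "restart": "Write a fresh solution instead of editing the current parent.",
    "theory": "Reason about the problem from first principles before coding.",
}


def preferred_patch_types(mode: str, available: Sequence[str]) -> List[str]:
    """Filter patch types by the mode's bias, falling back to all of them."""
    bias = PATCH_BIAS.get(mode)
    if not bias:
        return list(available)
    preferred = [t for t in available if t in bias]
    return preferred or list(available)


def build_prompt_sections(
    search_mode: str, search_rationale: str, prompt_intent: str, memory_context: str
) -> str:
    """Render the "# Search Mode" and (optional) "# Universal Memory" sections."""
    lines = ["# Search Mode", f"Mode: {search_mode}"]
    if search_mode in MODE_HINT:
        lines.append(MODE_HINT[search_mode])
    if search_rationale:
        lines.append(f"Rationale: {search_rationale}")
    if prompt_intent:
        lines.append(f"Intent: {prompt_intent}")
    if memory_context:
        lines += ["", "# Universal Memory", memory_context]
    return "\n".join(lines)
```

The fallback keeps a mode usable when its preferred patch type is disabled in the run configuration.
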
### 5. Package exports
Updated `shinka/core/__init__.py` to export:
- `UniversalMemory`
- `SearchModeController`
- `SearchModeDecision`
### 6. Verification
Performed:
- Python syntax check with `python3 -m py_compile` on the modified core files
Not yet performed:
- full end-to-end experiment validation
- behavioral verification of mode switching during long runs
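
The syntax check above can also be scripted with the standard-library `py_compile` module, so that it can sit in a test suite. The helper name `syntax_check` is hypothetical. The file list simply mirrors the modules named in these notes and assumes the script runs from the repository root.

```python
import py_compile
from typing import Dict, Iterable

# The modified core files named in these notes (paths relative to repo root).
CORE_FILES = [
    "shinka/core/universal_memory.py",
    "shinka/core/search_mode.py",
    "shinka/core/runner.py",
    "shinka/core/sampler.py",
    "shinka/core/__init__.py",
]


def syntax_check(paths: Iterable[str]) -> Dict[str, str]:
    """Compile each file; return {path: error} for the ones that fail."""
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            py_compile.compile(path, doraise=True)
        except (py_compile.PyCompileError, OSError) as exc:
            # PyCompileError covers syntax errors; OSError covers missing files.
            failures[path] = str(exc)
    return failures
```

Asserting `syntax_check(CORE_FILES) == {}` only proves that the files parse. It does not cover either of the unverified items above.
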
### 7. Expected runtime artifacts
After running an experiment, expect:
- `universal_memory.json` in the experiment root
- search metadata in program `metadata`
- prompt traces showing explicit search modes
- more varied search behavior across generations
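
After a run, a quick tally of the memory file is one way to check the last point. This sketch assumes each entry carries `search_mode` and `verdict` keys and that the file holds either a bare list or an `{"entries": [...]}` object. Both layouts are assumptions, since the notes do not specify the schema.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Tuple


def summarize_memory(results_dir: str) -> Tuple[Counter, Counter]:
    """Count entries per search mode and per verdict in universal_memory.json."""
    data = json.loads((Path(results_dir) / "universal_memory.json").read_text())
    # Assumed layout: either a bare list of entries or {"entries": [...]}.
    entries = data["entries"] if isinstance(data, dict) else data
    by_mode = Counter(e.get("search_mode", "unknown") for e in entries)
    by_verdict = Counter(e.get("verdict", "unknown") for e in entries)
    return by_mode, by_verdict
```

If mode switching works as intended, `by_mode` should show several modes over a long run. A run that is almost entirely `refine` would suggest that the controller heuristics are not firing.
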
### 8. Known limitations of this first pass
This is a first implementation, not the final architecture.
Current limitations:
- search-mode control is heuristic, not learned
- universal memory retrieval is still recent-history oriented
- memory does not yet cluster strategy families semantically
- prompt memory is summarized text, not selective structured retrieval
- no dedicated controller yet for allocating a fixed exploration budget by mode
The goal of this pass was to lay a working foundation for:
- structured experiment memory
- explicit mode switching
- reduced parent anchoring