Lars Talian
# Mutation Policy Weights
`PopulationMutationPolicy` is a hand-authored heuristic policy, but its
weights and shaping constants are now explicit in
`src/open_range/builder/mutation_policy.py` under `MutationPolicySettings`.
The policy has three jobs:
1. Choose which stored snapshot is the best parent to mutate next.
2. Choose which structural mutation op to apply.
3. Choose which security/noise mutation op to apply.
## Parent Selection Terms
These fields live in `MutationPolicySettings.parent`.
| Field | Default | Why it exists |
| --- | ---: | --- |
| `frontier_weight` | `0.28` | Prefer snapshots near the current learning frontier instead of trivially solved or impossible ones. |
| `replay_weight` | `0.18` | Revisit under-played snapshots so the curriculum does not collapse to a tiny subset. |
| `novelty_weight` | `0.16` | Favor rarer vulnerability mixes across the population. |
| `weak_overlap_weight` | `0.18` | Bias parent choice toward snapshots that exercise known weak areas. |
| `lineage_balance_weight` | `0.08` | Prevent one root lineage from dominating the pool. |
| `depth_balance_weight` | `0.04` | Avoid over-sampling very deep descendant chains. |
| `recency_weight` | `0.04` | Cool down parents that were used repeatedly in the recent window. |
| `complexity_weight` | `0.04` | Slightly prefer richer parents with more structure to mutate from. |
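
The table gives the weights but not the formula that combines them. The following is a minimal sketch, assuming each raw signal is normalized to `[0, 1]` and the parent score is a plain weighted sum. `PARENT_WEIGHTS`, `parent_contributions`, and the example signal values are hypothetical names and numbers for illustration, not the project's API. Under that assumption the defaults sum to 1.0, so the total also stays in `[0, 1]`.

```python
# Default parent-selection weights, copied from the table above.
PARENT_WEIGHTS = {
    "frontier": 0.28,
    "replay": 0.18,
    "novelty": 0.16,
    "weak_overlap": 0.18,
    "lineage_balance": 0.08,
    "depth_balance": 0.04,
    "recency": 0.04,
    "complexity": 0.04,
}


def parent_contributions(signals: dict[str, float]) -> dict[str, float]:
    """Per-term weighted contributions (assumption: linear weighted sum)."""
    return {
        name: weight * signals.get(name, 0.0)
        for name, weight in PARENT_WEIGHTS.items()
    }


# Hypothetical raw signals for one snapshot; missing signals count as 0.
signals = {"frontier": 0.9, "replay": 0.5, "novelty": 0.2, "weak_overlap": 1.0}
contributions = parent_contributions(signals)
total = sum(contributions.values())
print(f"total={total:.3f}")
```
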
Shaping constants in the same model control how those raw signals are computed:
| Field | Default | Meaning |
| --- | ---: | --- |
| `minimum_total` | `0.05` | Sampling floor for low-scoring parents. |
| `unplayed_frontier_score` | `0.40` | Frontier score used before any play stats exist. |
| `empty_vuln_novelty_score` | `0.25` | Novelty fallback for snapshots with no typed vulnerabilities. |
| `preferred_generation_depth` | `3.0` | Depth after which descendant chains start being penalized. |
| `complexity_vuln_factor` | `0.25` | Complexity contribution per vulnerability. |
| `complexity_golden_path_factor` | `0.03` | Complexity contribution per golden-path step. |
| `complexity_dependency_edge_factor` | `0.02` | Complexity contribution per dependency edge. |
| `complexity_trust_edge_factor` | `0.02` | Complexity contribution per trust edge. |
| `complexity_cap` | `1.0` | Cap for the normalized complexity score. |
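
As an illustration of how the complexity constants could fit together, here is a sketch that assumes the per-item factors are summed and then clipped at `complexity_cap`, and that `minimum_total` is applied as a lower bound on the final score. The additive form and the function names are assumptions; the actual computation lives in `mutation_policy.py`.

```python
def complexity_score(
    vulns: int,
    golden_path_steps: int,
    dependency_edges: int,
    trust_edges: int,
) -> float:
    """Assumed form: sum of per-item factors, clipped at complexity_cap."""
    raw = (
        0.25 * vulns                # complexity_vuln_factor
        + 0.03 * golden_path_steps  # complexity_golden_path_factor
        + 0.02 * dependency_edges   # complexity_dependency_edge_factor
        + 0.02 * trust_edges        # complexity_trust_edge_factor
    )
    return min(raw, 1.0)            # complexity_cap


def apply_floor(total: float, minimum_total: float = 0.05) -> float:
    """Assumed form: low-scoring parents keep a small sampling weight."""
    return max(total, minimum_total)


modest = complexity_score(2, 8, 5, 3)    # below the cap
rich = complexity_score(4, 10, 10, 10)   # raw value exceeds the cap
floored = apply_floor(0.01)
```

With these inputs, `modest` comes out near 0.90, `rich` is clipped to the cap of 1.0, and `floored` is raised to 0.05.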
## Mutation Selection Terms
These fields live in `MutationPolicySettings.mutation`.
| Field | Default | Why it exists |
| --- | ---: | --- |
| `curriculum_weight` | `0.38` | Prefer ops that target the agent's current weakness. |
| `novelty_weight` | `0.24` | Prefer ops that open new surfaces or vary episode shape. |
| `structural_gain_weight` | `0.28` | Prefer ops that materially expand the scenario graph. |
| `lineage_weight` | `0.10` | Slight bias toward shallower lineage when all else is equal. |
| `minimum_total` | `0.05` | Sampling floor for low-scoring mutation ops. |
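
A sketch of how these weights might drive op selection, assuming a weighted sum per candidate op, a floor at `minimum_total`, and sampling proportional to the resulting totals. The proportional sampling step, the helper name `op_total`, and the feature values are assumptions for illustration only.

```python
import random

# Default mutation-selection weights, copied from the table above.
MUTATION_WEIGHTS = {
    "curriculum": 0.38,
    "novelty": 0.24,
    "structural_gain": 0.28,
    "lineage": 0.10,
}
MINIMUM_TOTAL = 0.05


def op_total(features: dict[str, float]) -> float:
    """Weighted sum of an op's features, floored at MINIMUM_TOTAL."""
    total = sum(w * features.get(k, 0.0) for k, w in MUTATION_WEIGHTS.items())
    return max(total, MINIMUM_TOTAL)


# Hypothetical feature values for two candidate ops.
candidates = {
    "add_service": {
        "curriculum": 0.2,
        "novelty": 0.8,
        "structural_gain": 1.0,
        "lineage": 0.5,
    },
    "add_benign_noise": {},  # no signal at all, so it falls back to the floor
}
totals = {op: op_total(features) for op, features in candidates.items()}

# Seeded RNG so the draw is reproducible in this sketch.
rng = random.Random(0)
chosen = rng.choices(list(totals), weights=list(totals.values()), k=1)[0]
```
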
Raw novelty bonuses in `MutationPolicySettings.novelty`:
| Field | Default | Meaning |
| --- | ---: | --- |
| `base_bonus` | `0.40` | Baseline novelty for every op. |
| `new_vuln_class_bonus` | `1.0` | Extra novelty for a vulnerability class not seen recently. |
| `new_noise_surface_bonus` | `0.50` | Extra novelty for noise on a new attack surface. |
| `structural_op_bonus` | `0.40` | Extra novelty for non-security ops that change the graph. |
Raw curriculum bonuses in `MutationPolicySettings.curriculum`:
| Field | Default | Meaning |
| --- | ---: | --- |
| `base_bonus` | `0.35` | Baseline curriculum value for every op. |
| `weak_area_bonus` | `1.50` | Reward seeding a vulnerability in a known weak area. |
| `new_vuln_bonus` | `0.40` | Reward introducing a vulnerability class not present in the parent. |
| `chain_length_bonus` | `0.60` | Reward edges that help satisfy multi-hop chain requirements. |
| `focus_identity_bonus` | `0.50` | Reward identity-layer ops when curriculum focus is identity. |
| `focus_infra_bonus` | `0.50` | Reward infra-layer ops when curriculum focus is infra. |
| `focus_process_bonus` | `0.40` | Reward benign noise when focus is process realism. |
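
The two bonus tables can be read as a base value plus whichever extras apply to a given op. The sketch below assumes the bonuses are simply added; `raw_score` and the sets of active bonuses are hypothetical, and the real policy may gate or combine them differently.

```python
# Defaults copied from the novelty and curriculum tables above.
NOVELTY = {
    "base_bonus": 0.40,
    "new_vuln_class_bonus": 1.0,
    "new_noise_surface_bonus": 0.50,
    "structural_op_bonus": 0.40,
}
CURRICULUM = {
    "base_bonus": 0.35,
    "weak_area_bonus": 1.50,
    "new_vuln_bonus": 0.40,
    "chain_length_bonus": 0.60,
    "focus_identity_bonus": 0.50,
    "focus_infra_bonus": 0.50,
    "focus_process_bonus": 0.40,
}


def raw_score(bonuses: dict[str, float], active: set[str]) -> float:
    """Base bonus plus every active extra (assumption: purely additive)."""
    return bonuses["base_bonus"] + sum(bonuses[name] for name in active)


# Hypothetical case: seeding a new vulnerability class in a known weak area.
novelty = raw_score(NOVELTY, {"new_vuln_class_bonus"})
curriculum = raw_score(CURRICULUM, {"weak_area_bonus", "new_vuln_bonus"})
```

In this example the raw novelty is about 1.40 and the raw curriculum value about 2.25. Both exceed 1.0, so if the bonuses are additive, the raw values would need some normalization before the selection weights are applied; the tables here do not say how that is done.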
## Structural Gain Table
These fields live in `MutationPolicySettings.structural_gains`.
| Op Type | Default |
| --- | ---: |
| `add_service` | `1.00` |
| `add_dependency_edge` | `0.90` |
| `add_trust_edge` | `0.85` |
| `add_user` | `0.80` |
| `seed_vuln` | `0.70` |
| `add_benign_noise` | `0.30` |
| `default_gain` | `0.20` |
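
One plausible reading of this table is a lookup keyed by op type, with `default_gain` used for any op type that has no explicit entry. That fallback behavior is an assumption, as are the helper name and the `remove_service` op type used to exercise it.

```python
# Defaults copied from the structural gain table above.
STRUCTURAL_GAINS = {
    "add_service": 1.00,
    "add_dependency_edge": 0.90,
    "add_trust_edge": 0.85,
    "add_user": 0.80,
    "seed_vuln": 0.70,
    "add_benign_noise": 0.30,
}
DEFAULT_GAIN = 0.20


def structural_gain(op_type: str) -> float:
    """Look up an op's gain; unlisted op types get DEFAULT_GAIN (assumed)."""
    return STRUCTURAL_GAINS.get(op_type, DEFAULT_GAIN)


listed = structural_gain("add_trust_edge")
unlisted = structural_gain("remove_service")  # hypothetical op type
```
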
## Tuning Path
You can swap weights without touching policy code:
1. Write a JSON or YAML file matching `MutationPolicySettings`.
2. Load it with `load_mutation_policy_settings(path)` or pass it into `PopulationMutationPolicy(settings=...)`.
3. Compare it against the default policy with:
```bash
PYTHONPATH=src .venv/bin/python scripts/calibrate_mutation_policy.py \
    --store-dir snapshots \
    --stats path/to/snapshot_stats.json \
    --context path/to/build_context.json \
    --settings tuned=path/to/policy_settings.yaml
```
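
For step 1, a settings file could be generated from Python rather than written by hand. This sketch assumes the file mirrors the nested section names used in this document (`parent`, `mutation`) and that a file containing only some fields is accepted; neither is confirmed here, so check `MutationPolicySettings` before relying on it. The example moves 0.04 of weight between two terms in each section so that each section's weights still sum to 1.0.

```python
import json
from pathlib import Path

# Hypothetical overrides; keys follow the section and field names in this doc.
overrides = {
    "parent": {
        "frontier_weight": 0.32,  # default 0.28
        "replay_weight": 0.14,    # default 0.18
    },
    "mutation": {
        "curriculum_weight": 0.42,  # default 0.38
        "novelty_weight": 0.20,     # default 0.24
    },
}

settings_path = Path("policy_settings.json")
settings_path.write_text(json.dumps(overrides, indent=2) + "\n", encoding="utf-8")

# Read the file back to confirm it is valid JSON with the expected sections.
loaded = json.loads(settings_path.read_text(encoding="utf-8"))
print(sorted(loaded))
```
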
The calibration output is JSON so it can be diffed, archived, or fed into
notebooks. Parent-selection logs and `MutationPlan.score_breakdown` now expose
weighted contributions instead of only raw feature values.