Spaces:

abrown31
/

open-range

Runtime error

App Files Files Community

open-range / docs /mutation_policy.md

Lars Talian

Make mutation policy weights explicit

228ed67 2 months ago

preview code

raw

history blame contribute delete

5.16 kB

	# Mutation Policy Weights

	`PopulationMutationPolicy` is a hand-authored heuristic policy, but its
	weights and shaping constants are now explicit in
	`src/open_range/builder/mutation_policy.py` under `MutationPolicySettings`.

	The policy has three jobs:

	1. Choose which stored snapshot is the best parent to mutate next.
	2. Choose which structural mutation op to apply.
	3. Choose which security/noise mutation op to apply.

	## Parent Selection Terms

	These fields live in `MutationPolicySettings.parent`.

	\| Field \| Default \| Why it exists \|
	\| --- \| ---: \| --- \|
	\| `frontier_weight` \| `0.28` \| Prefer snapshots near the current learning frontier instead of trivially solved or impossible ones. \|
	\| `replay_weight` \| `0.18` \| Revisit under-played snapshots so the curriculum does not collapse to a tiny subset. \|
	\| `novelty_weight` \| `0.16` \| Favor rarer vulnerability mixes across the population. \|
	\| `weak_overlap_weight` \| `0.18` \| Bias parent choice toward snapshots that exercise known weak areas. \|
	\| `lineage_balance_weight` \| `0.08` \| Prevent one root lineage from dominating the pool. \|
	\| `depth_balance_weight` \| `0.04` \| Avoid over-sampling very deep descendant chains. \|
	\| `recency_weight` \| `0.04` \| Cool down parents that were used repeatedly in the recent window. \|
	\| `complexity_weight` \| `0.04` \| Slightly prefer richer parents with more structure to mutate from. \|

	Shaping constants in the same model explain how those raw signals are formed:

	\| Field \| Default \| Meaning \|
	\| --- \| ---: \| --- \|
	\| `minimum_total` \| `0.05` \| Sampling floor for low-scoring parents. \|
	\| `unplayed_frontier_score` \| `0.40` \| Frontier score used before any play stats exist. \|
	\| `empty_vuln_novelty_score` \| `0.25` \| Novelty fallback for snapshots with no typed vulnerabilities. \|
	\| `preferred_generation_depth` \| `3.0` \| Depth after which descendant chains start being penalized. \|
	\| `complexity_vuln_factor` \| `0.25` \| Complexity contribution per vulnerability. \|
	\| `complexity_golden_path_factor` \| `0.03` \| Complexity contribution per golden-path step. \|
	\| `complexity_dependency_edge_factor` \| `0.02` \| Complexity contribution per dependency edge. \|
	\| `complexity_trust_edge_factor` \| `0.02` \| Complexity contribution per trust edge. \|
	\| `complexity_cap` \| `1.0` \| Cap for the normalized complexity score. \|

	## Mutation Selection Terms

	These fields live in `MutationPolicySettings.mutation`.

	\| Field \| Default \| Why it exists \|
	\| --- \| ---: \| --- \|
	\| `curriculum_weight` \| `0.38` \| Prefer ops that target the agent's current weakness. \|
	\| `novelty_weight` \| `0.24` \| Prefer ops that open new surfaces or vary episode shape. \|
	\| `structural_gain_weight` \| `0.28` \| Prefer ops that materially expand the scenario graph. \|
	\| `lineage_weight` \| `0.10` \| Slight bias toward shallower lineage when all else is equal. \|
	\| `minimum_total` \| `0.05` \| Sampling floor for low-scoring mutation ops. \|

	Raw novelty bonuses in `MutationPolicySettings.novelty`:

	\| Field \| Default \| Meaning \|
	\| --- \| ---: \| --- \|
	\| `base_bonus` \| `0.40` \| Baseline novelty for every op. \|
	\| `new_vuln_class_bonus` \| `1.0` \| Extra novelty for a vulnerability class not seen recently. \|
	\| `new_noise_surface_bonus` \| `0.50` \| Extra novelty for noise on a new attack surface. \|
	\| `structural_op_bonus` \| `0.40` \| Extra novelty for non-security ops that change the graph. \|

	Raw curriculum bonuses in `MutationPolicySettings.curriculum`:

	\| Field \| Default \| Meaning \|
	\| --- \| ---: \| --- \|
	\| `base_bonus` \| `0.35` \| Baseline curriculum value for every op. \|
	\| `weak_area_bonus` \| `1.50` \| Reward seeding a vulnerability in a known weak area. \|
	\| `new_vuln_bonus` \| `0.40` \| Reward introducing a vulnerability class not present in the parent. \|
	\| `chain_length_bonus` \| `0.60` \| Reward edges that help satisfy multi-hop chain requirements. \|
	\| `focus_identity_bonus` \| `0.50` \| Reward identity-layer ops when curriculum focus is identity. \|
	\| `focus_infra_bonus` \| `0.50` \| Reward infra-layer ops when curriculum focus is infra. \|
	\| `focus_process_bonus` \| `0.40` \| Reward benign noise when focus is process realism. \|

	## Structural Gain Table

	These fields live in `MutationPolicySettings.structural_gains`.

	\| Op Type \| Default \|
	\| --- \| ---: \|
	\| `add_service` \| `1.00` \|
	\| `add_dependency_edge` \| `0.90` \|
	\| `add_trust_edge` \| `0.85` \|
	\| `add_user` \| `0.80` \|
	\| `seed_vuln` \| `0.70` \|
	\| `add_benign_noise` \| `0.30` \|
	\| `default_gain` \| `0.20` \|

	## Tuning Path

	You can swap weights without touching policy code:

	1. Write a JSON or YAML file matching `MutationPolicySettings`.
	2. Load it with `load_mutation_policy_settings(path)` or pass it into `PopulationMutationPolicy(settings=...)`.
	3. Compare it against the default policy with:

	```bash
	PYTHONPATH=src .venv/bin/python scripts/calibrate_mutation_policy.py \
	--store-dir snapshots \
	--stats path/to/snapshot_stats.json \
	--context path/to/build_context.json \
	--settings tuned=path/to/policy_settings.yaml
	```

	The calibration output is JSON so it can be diffed, archived, or fed into
	notebooks. Parent-selection logs and `MutationPlan.score_breakdown` now expose
	weighted contributions instead of only raw feature values.

	# Mutation Policy Weights

	`PopulationMutationPolicy` is a hand-authored heuristic policy, but its
	weights and shaping constants are now explicit in
	`src/open_range/builder/mutation_policy.py` under `MutationPolicySettings`.

	The policy has three jobs:

	1. Choose which stored snapshot is the best parent to mutate next.
	2. Choose which structural mutation op to apply.
	3. Choose which security/noise mutation op to apply.

	## Parent Selection Terms

	These fields live in `MutationPolicySettings.parent`.

	\| Field \| Default \| Why it exists \|
	\| --- \| ---: \| --- \|
	\| `frontier_weight` \| `0.28` \| Prefer snapshots near the current learning frontier instead of trivially solved or impossible ones. \|
	\| `replay_weight` \| `0.18` \| Revisit under-played snapshots so the curriculum does not collapse to a tiny subset. \|
	\| `novelty_weight` \| `0.16` \| Favor rarer vulnerability mixes across the population. \|
	\| `weak_overlap_weight` \| `0.18` \| Bias parent choice toward snapshots that exercise known weak areas. \|
	\| `lineage_balance_weight` \| `0.08` \| Prevent one root lineage from dominating the pool. \|
	\| `depth_balance_weight` \| `0.04` \| Avoid over-sampling very deep descendant chains. \|
	\| `recency_weight` \| `0.04` \| Cool down parents that were used repeatedly in the recent window. \|
	\| `complexity_weight` \| `0.04` \| Slightly prefer richer parents with more structure to mutate from. \|

	Shaping constants in the same model explain how those raw signals are formed:

	\| Field \| Default \| Meaning \|
	\| --- \| ---: \| --- \|
	\| `minimum_total` \| `0.05` \| Sampling floor for low-scoring parents. \|
	\| `unplayed_frontier_score` \| `0.40` \| Frontier score used before any play stats exist. \|
	\| `empty_vuln_novelty_score` \| `0.25` \| Novelty fallback for snapshots with no typed vulnerabilities. \|
	\| `preferred_generation_depth` \| `3.0` \| Depth after which descendant chains start being penalized. \|
	\| `complexity_vuln_factor` \| `0.25` \| Complexity contribution per vulnerability. \|
	\| `complexity_golden_path_factor` \| `0.03` \| Complexity contribution per golden-path step. \|
	\| `complexity_dependency_edge_factor` \| `0.02` \| Complexity contribution per dependency edge. \|
	\| `complexity_trust_edge_factor` \| `0.02` \| Complexity contribution per trust edge. \|
	\| `complexity_cap` \| `1.0` \| Cap for the normalized complexity score. \|

	## Mutation Selection Terms

	These fields live in `MutationPolicySettings.mutation`.

	\| Field \| Default \| Why it exists \|
	\| --- \| ---: \| --- \|
	\| `curriculum_weight` \| `0.38` \| Prefer ops that target the agent's current weakness. \|
	\| `novelty_weight` \| `0.24` \| Prefer ops that open new surfaces or vary episode shape. \|
	\| `structural_gain_weight` \| `0.28` \| Prefer ops that materially expand the scenario graph. \|
	\| `lineage_weight` \| `0.10` \| Slight bias toward shallower lineage when all else is equal. \|
	\| `minimum_total` \| `0.05` \| Sampling floor for low-scoring mutation ops. \|

	Raw novelty bonuses in `MutationPolicySettings.novelty`:

	\| Field \| Default \| Meaning \|
	\| --- \| ---: \| --- \|
	\| `base_bonus` \| `0.40` \| Baseline novelty for every op. \|
	\| `new_vuln_class_bonus` \| `1.0` \| Extra novelty for a vulnerability class not seen recently. \|
	\| `new_noise_surface_bonus` \| `0.50` \| Extra novelty for noise on a new attack surface. \|
	\| `structural_op_bonus` \| `0.40` \| Extra novelty for non-security ops that change the graph. \|

	Raw curriculum bonuses in `MutationPolicySettings.curriculum`:

	\| Field \| Default \| Meaning \|
	\| --- \| ---: \| --- \|
	\| `base_bonus` \| `0.35` \| Baseline curriculum value for every op. \|
	\| `weak_area_bonus` \| `1.50` \| Reward seeding a vulnerability in a known weak area. \|
	\| `new_vuln_bonus` \| `0.40` \| Reward introducing a vulnerability class not present in the parent. \|
	\| `chain_length_bonus` \| `0.60` \| Reward edges that help satisfy multi-hop chain requirements. \|
	\| `focus_identity_bonus` \| `0.50` \| Reward identity-layer ops when curriculum focus is identity. \|
	\| `focus_infra_bonus` \| `0.50` \| Reward infra-layer ops when curriculum focus is infra. \|
	\| `focus_process_bonus` \| `0.40` \| Reward benign noise when focus is process realism. \|

	## Structural Gain Table

	These fields live in `MutationPolicySettings.structural_gains`.

	\| Op Type \| Default \|
	\| --- \| ---: \|
	\| `add_service` \| `1.00` \|
	\| `add_dependency_edge` \| `0.90` \|
	\| `add_trust_edge` \| `0.85` \|
	\| `add_user` \| `0.80` \|
	\| `seed_vuln` \| `0.70` \|
	\| `add_benign_noise` \| `0.30` \|
	\| `default_gain` \| `0.20` \|

	## Tuning Path

	You can swap weights without touching policy code:

	1. Write a JSON or YAML file matching `MutationPolicySettings`.
	2. Load it with `load_mutation_policy_settings(path)` or pass it into `PopulationMutationPolicy(settings=...)`.
	3. Compare it against the default policy with:

	```bash
	PYTHONPATH=src .venv/bin/python scripts/calibrate_mutation_policy.py \
	--store-dir snapshots \
	--stats path/to/snapshot_stats.json \
	--context path/to/build_context.json \
	--settings tuned=path/to/policy_settings.yaml
	```

	The calibration output is JSON so it can be diffed, archived, or fed into
	notebooks. Parent-selection logs and `MutationPlan.score_breakdown` now expose
	weighted contributions instead of only raw feature values.