reward.md — Reward System Reference
core/reward.py — Task-aware reward orchestrator.
Overview
Two reward functions are available:
| Function | Used when |
|---|---|
compute_reward(...) |
Legacy / no-task episodes |
compute_task_reward(...) |
All task-driven episodes (v2.0+) |
compute_task_reward — Components
reward = (0.35 × milestone) # Reaching key progress markers
+ (0.25 × completion) # Final goal achievement (binary 1.0 if any goal met)
+ (0.15 × outcome) # Isolated local metric improvement
+ (0.10 × replan_bonus) # Recovery after ExoEvents
+ (0.10 × efficiency) # Resource preservation relative to delta
+ (0.05 × reasoning) # Logical coherence & action alignment
+ penalties
Penalties
| Penalty | Value | Level | Trigger |
|---|---|---|---|
INACTION_PENALTY |
-0.40 |
Step | actions_taken == 0 |
TASK_INACTION_PENALTY |
-0.20 |
Task | actions_taken == 0 (additive to step penalty) |
CRITICAL_FLOOR_VIOLATION |
-0.50 |
Step | Any metric drops below 20 |
DEAD_END |
-0.50 |
Task | All viable routes closed without success |
CASCADE_SPREAD_WIDER |
-0.30 |
Step | Changes spread wider than disruption baseline |
RELATIONSHIP_COLLAPSE |
-0.15 |
Step | Relationships drop more than 20 points in one step |
CUMULATIVE_RELATIONSHIP_EROSION |
-0.15 |
Episode | Cumulative relationship drop more than 20 points |
PLAUSIBILITY_VIOLATION |
-0.10 to -0.30 |
Step | Implausible metric/cost ratio |
TIMEOUT |
-0.20 |
Task | Max steps reached without resolution |
Return Value
Both functions return (reward: float, breakdown: dict), but the component keys differ slightly.
breakdown = {
"components": {
# compute_reward(...)
"outcome": float,
"containment": float,
"efficiency": float,
"preservation": float,
"format_compliance": float,
"plausibility": float,
"reasoning_alignment": float,
# compute_task_reward(...)
"local_metric_delta": float,
"milestone": float,
"completion": float,
"replan": float,
"reasoning": float,
"timeout_penalty": float,
},
"penalties_fired": list[str],
"base_reward": float,
"penalties_total": float,
}
Change Log
| Date | Change |
|---|---|
| 2026-04-23 | Initial doc created |