Soham Banerjee
# reward.md — Reward System Reference
`core/reward.py` — Task-aware reward orchestrator.
---
## Overview
Two reward functions are available:
| Function | Used when |
|---|---|
| `compute_reward(...)` | Legacy / no-task episodes |
| `compute_task_reward(...)` | All task-driven episodes (v2.0+) |
---
## `compute_task_reward` — Components
```
reward = (0.35 × milestone)      # Reaching key progress markers
       + (0.25 × completion)     # Final goal achievement (binary 1.0 if any goal met)
       + (0.15 × outcome)        # Isolated local metric improvement
       + (0.10 × replan_bonus)   # Recovery after ExoEvents
       + (0.10 × efficiency)     # Resource preservation relative to delta
       + (0.05 × reasoning)      # Logical coherence & action alignment
       + penalties
```
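As a rough illustration of the weighting above, the following is a minimal sketch, not the code in `core/reward.py`. The helper name `weighted_task_reward`, the `WEIGHTS` dict, and the idea that each component score lies in `[0, 1]` are assumptions made for this example; only the weights themselves come from the formula.

```python
# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "milestone": 0.35,
    "completion": 0.25,
    "outcome": 0.15,
    "replan_bonus": 0.10,
    "efficiency": 0.10,
    "reasoning": 0.05,
}


def weighted_task_reward(components: dict[str, float], penalties: float = 0.0) -> float:
    """Hypothetical helper: weighted sum of component scores plus penalties.

    `penalties` is expected to be zero or negative, matching the
    `+ penalties` term in the formula. Missing components count as 0.0.
    """
    base = sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())
    return base + penalties


# Illustrative scores only (assumed to lie in [0, 1]).
scores = {"milestone": 1.0, "completion": 1.0, "outcome": 0.5}
partial_reward = weighted_task_reward(scores)
```

Because the weights sum to 1.0, a step that scores 1.0 on every component and fires no penalty would reach a base reward of 1.0 under this sketch.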
### Penalties
| Penalty | Value | Level | Trigger |
|---|---|---|---|
| `INACTION_PENALTY` | `-0.40` | Step | `actions_taken == 0` |
| `TASK_INACTION_PENALTY` | `-0.20` | Task | `actions_taken == 0` (additive to step penalty) |
| `CRITICAL_FLOOR_VIOLATION` | `-0.50` | Step | Any metric drops below 20 |
| `DEAD_END` | `-0.50` | Task | All viable routes closed without success |
| `CASCADE_SPREAD_WIDER` | `-0.30` | Step | Changes spread wider than disruption baseline |
| `RELATIONSHIP_COLLAPSE` | `-0.15` | Step | Relationships drop more than 20 points in one step |
| `CUMULATIVE_RELATIONSHIP_EROSION` | `-0.15` | Episode | Cumulative relationship drop exceeds 20 points over the episode |
| `PLAUSIBILITY_VIOLATION` | `-0.10 to -0.30` | Step | Implausible metric/cost ratio |
| `TIMEOUT` | `-0.20` | Task | Max steps reached without resolution |
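The fixed-value penalties in the table can be summed as in the sketch below. This is illustrative only: the `PENALTIES` dict and the `total_penalty` helper are hypothetical names, and simple addition of all fired penalties is an assumption (the table only states additivity explicitly for the two inaction penalties). `PLAUSIBILITY_VIOLATION` is left out because it is a range rather than a single value.

```python
# Fixed penalty values copied from the table above.
PENALTIES = {
    "INACTION_PENALTY": -0.40,
    "TASK_INACTION_PENALTY": -0.20,
    "CRITICAL_FLOOR_VIOLATION": -0.50,
    "DEAD_END": -0.50,
    "CASCADE_SPREAD_WIDER": -0.30,
    "RELATIONSHIP_COLLAPSE": -0.15,
    "CUMULATIVE_RELATIONSHIP_EROSION": -0.15,
    "TIMEOUT": -0.20,
}


def total_penalty(fired: list[str]) -> float:
    """Hypothetical helper: sum the values of the penalties that fired."""
    return sum(PENALTIES[name] for name in fired)


# A step with no actions inside a task with no actions fires both
# inaction penalties, for a combined penalty of about -0.60.
inaction_total = total_penalty(["INACTION_PENALTY", "TASK_INACTION_PENALTY"])
```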
---
## Return Value
Both functions return `(reward: float, breakdown: dict)`. The keys under `components` depend on which function produced the breakdown; the remaining top-level keys are the same for both.
```python
breakdown = {
    "components": {
        # compute_reward(...)
        "outcome": float,
        "containment": float,
        "efficiency": float,
        "preservation": float,
        "format_compliance": float,
        "plausibility": float,
        "reasoning_alignment": float,
        # compute_task_reward(...)
        "local_metric_delta": float,
        "milestone": float,
        "completion": float,
        "replan": float,
        "reasoning": float,
        "timeout_penalty": float,
    },
    "penalties_fired": list[str],
    "base_reward": float,
    "penalties_total": float,
}
```
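A caller might inspect the returned tuple along the lines of the sketch below. `summarize_breakdown` is a hypothetical helper, and the `sample` values are made up for illustration rather than taken from a real episode; the reward functions themselves are not called because their parameters are not listed here.

```python
def summarize_breakdown(reward: float, breakdown: dict) -> str:
    """Hypothetical helper: render a (reward, breakdown) pair as readable text."""
    lines = [f"reward={reward:+.2f}"]
    # Only the keys actually present are printed, so this works for either function.
    for name, value in sorted(breakdown["components"].items()):
        lines.append(f"  {name}: {value:+.2f}")
    fired = ", ".join(breakdown["penalties_fired"]) or "none"
    lines.append(f"  penalties: {fired} (total {breakdown['penalties_total']:+.2f})")
    return "\n".join(lines)


# Made-up breakdown using compute_task_reward-style component keys.
sample = {
    "components": {"milestone": 1.0, "completion": 0.0, "reasoning": 0.5},
    "penalties_fired": ["TIMEOUT"],
    "base_reward": 0.50,
    "penalties_total": -0.20,
}
summary = summarize_breakdown(0.30, sample)
```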
---
## Change Log
| Date | Change |
|---|---|
| 2026-04-23 | Initial doc created |