Soham Banerjee

reward.md — Reward System Reference

core/reward.py — Task-aware reward orchestrator.


Overview

Two reward functions are available:

| Function | Used when |
| --- | --- |
| compute_reward(...) | Legacy / no-task episodes |
| compute_task_reward(...) | All task-driven episodes (v2.0+) |

compute_task_reward — Components

reward = (0.35 × milestone)          # Reaching key progress markers
       + (0.25 × completion)         # Final goal achievement (binary 1.0 if any goal met)
       + (0.15 × outcome)            # Isolated local metric improvement
       + (0.10 × replan_bonus)       # Recovery after ExoEvents
       + (0.10 × efficiency)         # Resource preservation relative to delta
       + (0.05 × reasoning)          # Logical coherence & action alignment
       + penalties
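
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the code in core/reward.py: the helper name `weighted_task_reward`, the assumption that each component score lies in [0, 1], and the choice to treat a missing component as 0.0 are all hypothetical. Only the weights come from the formula.

```python
# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "milestone": 0.35,
    "completion": 0.25,
    "outcome": 0.15,
    "replan_bonus": 0.10,
    "efficiency": 0.10,
    "reasoning": 0.05,
}


def weighted_task_reward(components: dict[str, float], penalties: float = 0.0) -> float:
    """Hypothetical helper: weighted component sum plus a (negative) penalty total.

    Assumes each component score is in [0, 1]; a missing component counts as 0.0.
    """
    base = sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())
    return base + penalties


# Illustrative scores only: a partly completed task that also timed out (-0.20).
scores = {
    "milestone": 0.5,
    "completion": 1.0,
    "outcome": 0.4,
    "replan_bonus": 0.0,
    "efficiency": 0.8,
    "reasoning": 1.0,
}
reward = weighted_task_reward(scores, penalties=-0.20)
```

Because the weights sum to 1.0, the base reward stays in [0, 1] under the stated assumption, and any penalties pull the total below that range.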

Penalties

| Penalty | Value | Level | Trigger |
| --- | --- | --- | --- |
| INACTION_PENALTY | -0.40 | Step | actions_taken == 0 |
| TASK_INACTION_PENALTY | -0.20 | Task | actions_taken == 0 (additive to step penalty) |
| CRITICAL_FLOOR_VIOLATION | -0.50 | Step | Any metric drops below 20 |
| DEAD_END | -0.50 | Task | All viable routes closed without success |
| CASCADE_SPREAD_WIDER | -0.30 | Step | Changes spread wider than disruption baseline |
| RELATIONSHIP_COLLAPSE | -0.15 | Step | Relationships drop by more than 20 points in one step |
| CUMULATIVE_RELATIONSHIP_EROSION | -0.15 | Episode | Cumulative relationship drop of more than 20 points |
| PLAUSIBILITY_VIOLATION | -0.10 to -0.30 | Step | Implausible metric/cost ratio |
| TIMEOUT | -0.20 | Task | Max steps reached without resolution |
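
As a rough sketch of how these values might combine, the snippet below sums the fixed penalties for whichever names fired and takes the variable plausibility penalty as a separate argument. The helper `total_penalty` is hypothetical, and plain summation with no clamping is an assumption rather than something this document states; only the values are taken from the table.

```python
# Fixed penalty values copied from the table above.
PENALTY_VALUES = {
    "INACTION_PENALTY": -0.40,
    "TASK_INACTION_PENALTY": -0.20,
    "CRITICAL_FLOOR_VIOLATION": -0.50,
    "DEAD_END": -0.50,
    "CASCADE_SPREAD_WIDER": -0.30,
    "RELATIONSHIP_COLLAPSE": -0.15,
    "CUMULATIVE_RELATIONSHIP_EROSION": -0.15,
    "TIMEOUT": -0.20,
}


def total_penalty(fired: list[str], plausibility: float = 0.0) -> float:
    """Hypothetical helper: sum the fixed penalties that fired.

    PLAUSIBILITY_VIOLATION varies between -0.30 and -0.10, so it is passed
    in separately; 0.0 means it did not fire.
    """
    if plausibility != 0.0 and not (-0.30 <= plausibility <= -0.10):
        raise ValueError("plausibility penalty must be 0.0 or within [-0.30, -0.10]")
    return sum(PENALTY_VALUES[name] for name in fired) + plausibility


# An idle step inside an idle task: the task-level penalty adds to the step-level one.
idle_penalty = total_penalty(["INACTION_PENALTY", "TASK_INACTION_PENALTY"])
```

Under this assumption, an agent that takes no action at both the step and task level loses 0.60 in total, which matches the "additive" note in the table.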

Return Value

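Both functions return a tuple (reward: float, breakdown: dict). The breakdown has the same top-level shape in both cases, but the keys under "components" depend on which function was called; the listing below shows both sets.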

breakdown = {
    "components": {
        # compute_reward(...)
        "outcome": float,
        "containment": float,
        "efficiency": float,
        "preservation": float,
        "format_compliance": float,
        "plausibility": float,
        "reasoning_alignment": float,

        # compute_task_reward(...)
        "local_metric_delta": float,
        "milestone": float,
        "completion": float,
        "replan": float,
        "reasoning": float,
        "timeout_penalty": float,
    },
    "penalties_fired": list[str],
    "base_reward": float,
    "penalties_total": float,
}
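
To show how a caller might read this structure, here is a small sketch that formats a breakdown for logging. The helper `summarize` is hypothetical, the sample values are illustrative rather than real output, and the relationship reward = base_reward + penalties_total is an assumption this document does not state.

```python
def summarize(reward: float, breakdown: dict) -> str:
    """Hypothetical helper: render a reward breakdown as a short multi-line string."""
    lines = [f"reward={reward:+.3f}"]
    # Components are sorted by name so the output is stable across runs.
    for name, value in sorted(breakdown["components"].items()):
        lines.append(f"  {name}: {value:+.3f}")
    fired = breakdown["penalties_fired"]
    lines.append("  penalties: " + (", ".join(fired) if fired else "none"))
    return "\n".join(lines)


# Illustrative breakdown using the compute_task_reward(...) component keys.
sample = {
    "components": {
        "local_metric_delta": 0.4,
        "milestone": 0.5,
        "completion": 1.0,
        "replan": 0.0,
        "reasoning": 1.0,
        "timeout_penalty": -0.20,
    },
    "penalties_fired": ["TIMEOUT"],
    "base_reward": 0.615,
    "penalties_total": -0.20,
}
text = summarize(0.415, sample)
print(text)
```

Iterating over whatever keys are present in "components" keeps the same helper usable for both compute_reward(...) and compute_task_reward(...) breakdowns.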

Change Log

Date Change
2026-04-23 Initial doc created