Soham Banerjee

deploy: pure lifestack with partitioned wisdom pool

77da5ce about 1 month ago

reward.md — Reward System Reference

core/reward.py — Task-aware reward orchestrator.

Overview

Two reward functions are available:

| Function | Used when |
| --- | --- |
| `compute_reward(...)` | Legacy / no-task episodes |
| `compute_task_reward(...)` | All task-driven episodes (v2.0+) |

`compute_task_reward` — Components

reward = (0.35 × milestone)          # Reaching key progress markers
       + (0.25 × completion)         # Final goal achievement (binary 1.0 if any goal met)
       + (0.15 × outcome)            # Isolated local metric improvement
       + (0.10 × replan_bonus)       # Recovery after ExoEvents
       + (0.10 × efficiency)         # Resource preservation relative to delta
       + (0.05 × reasoning)          # Logical coherence & action alignment
       + penalties
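
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `core/reward.py`: the helper name `weighted_task_reward`, the `WEIGHTS` dict, and the assumption that each component score lies in `[0, 1]` are all hypothetical. Only the weights themselves are taken from the formula.

```python
# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "milestone": 0.35,
    "completion": 0.25,
    "outcome": 0.15,
    "replan_bonus": 0.10,
    "efficiency": 0.10,
    "reasoning": 0.05,
}


def weighted_task_reward(components: dict[str, float], penalties: float = 0.0) -> float:
    """Hypothetical helper: weighted component sum plus penalties.

    `penalties` is expected to be zero or negative, matching the
    negative values listed in the Penalties table.
    """
    base = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    return base + penalties


# Illustrative component scores (assumed to be in [0, 1]).
example = {
    "milestone": 0.5,
    "completion": 1.0,
    "outcome": 0.4,
    "replan_bonus": 0.0,
    "efficiency": 0.8,
    "reasoning": 1.0,
}

# Base is 0.175 + 0.25 + 0.06 + 0.0 + 0.08 + 0.05 = 0.615; one -0.20 penalty applied.
reward = weighted_task_reward(example, penalties=-0.20)
print(round(reward, 3))
```

Because the weights sum to 1.0, the base reward stays in `[0, 1]` under the stated assumption, and the final reward can go negative only through penalties.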

Penalties

| Penalty | Value | Level | Trigger |
| --- | --- | --- | --- |
| `INACTION_PENALTY` | `-0.40` | Step | `actions_taken == 0` |
| `TASK_INACTION_PENALTY` | `-0.20` | Task | `actions_taken == 0` (added on top of the step-level penalty) |
| `CRITICAL_FLOOR_VIOLATION` | `-0.50` | Step | Any metric drops below 20 |
| `DEAD_END` | `-0.50` | Task | All viable routes closed without success |
| `CASCADE_SPREAD_WIDER` | `-0.30` | Step | Changes spread wider than the disruption baseline |
| `RELATIONSHIP_COLLAPSE` | `-0.15` | Step | Relationships drop by more than 20 points in one step |
| `CUMULATIVE_RELATIONSHIP_EROSION` | `-0.15` | Episode | Cumulative relationship drop exceeds 20 points |
| `PLAUSIBILITY_VIOLATION` | `-0.10 to -0.30` | Step | Implausible metric/cost ratio |
| `TIMEOUT` | `-0.20` | Task | Max steps reached without resolution |
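
As a rough sketch of how a few of these triggers might be evaluated and summed, consider the snippet below. It covers only four of the penalties. The function name `fired_penalties`, its parameters, and the simplification of "drops below 20" to "current value is below 20" are assumptions made for illustration. The penalty names and values come from the table.

```python
# Penalty values copied from the table above (subset only).
PENALTIES = {
    "INACTION_PENALTY": -0.40,
    "TASK_INACTION_PENALTY": -0.20,
    "CRITICAL_FLOOR_VIOLATION": -0.50,
    "TIMEOUT": -0.20,
}
CRITICAL_FLOOR = 20  # threshold from the CRITICAL_FLOOR_VIOLATION trigger


def fired_penalties(
    actions_taken: int,
    metrics: dict[str, float],
    task_active: bool,
    timed_out: bool,
) -> tuple[list[str], float]:
    """Hypothetical helper: return (names of fired penalties, their sum)."""
    fired: list[str] = []
    if actions_taken == 0:
        fired.append("INACTION_PENALTY")
        if task_active:
            # The task-level penalty stacks on the step-level one.
            fired.append("TASK_INACTION_PENALTY")
    # Simplification: checks the current value, not the transition.
    if any(value < CRITICAL_FLOOR for value in metrics.values()):
        fired.append("CRITICAL_FLOOR_VIOLATION")
    if timed_out:
        fired.append("TIMEOUT")
    return fired, sum(PENALTIES[name] for name in fired)


# Illustrative state: no action taken during a task, one metric under the floor.
names, total = fired_penalties(
    actions_taken=0,
    metrics={"health": 55.0, "finances": 18.0},  # metric names are made up
    task_active=True,
    timed_out=False,
)
print(names, round(total, 2))
```

In this example an idle step during a task costs `-0.40 + -0.20`, and the floor violation adds another `-0.50`, which shows how quickly stacked penalties can outweigh the positive components.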

Return Value

Both functions return a tuple `(reward: float, breakdown: dict)`. The top-level keys of `breakdown` are the same for both, but the keys under `"components"` depend on which function produced it.

breakdown = {
    "components": {
        # compute_reward(...)
        "outcome": float,
        "containment": float,
        "efficiency": float,
        "preservation": float,
        "format_compliance": float,
        "plausibility": float,
        "reasoning_alignment": float,

        # compute_task_reward(...)
        "local_metric_delta": float,
        "milestone": float,
        "completion": float,
        "replan": float,
        "reasoning": float,
        "timeout_penalty": float,
    },
    "penalties_fired": list[str],
    "base_reward": float,
    "penalties_total": float,
}
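
A caller might unpack the returned pair along these lines. The helper `summarize_breakdown` and every number in `sample_breakdown` are invented for illustration. Only the key names follow the structure shown above, and the sketch does not assume any particular arithmetic relation between the fields.

```python
def summarize_breakdown(reward: float, breakdown: dict) -> str:
    """Hypothetical helper: render a (reward, breakdown) pair as one log line."""
    parts = [
        f"{name}={value:+.2f}"
        for name, value in sorted(breakdown["components"].items())
    ]
    fired = ", ".join(breakdown["penalties_fired"]) or "none"
    return (
        f"reward={reward:+.3f} base={breakdown['base_reward']:+.3f} "
        f"penalties={breakdown['penalties_total']:+.3f} [{fired}] | "
        + " ".join(parts)
    )


# Invented sample shaped like a compute_task_reward(...) breakdown.
sample_reward = 0.115
sample_breakdown = {
    "components": {
        "local_metric_delta": 0.4,
        "milestone": 0.5,
        "completion": 1.0,
        "replan": 0.0,
        "reasoning": 1.0,
        "timeout_penalty": 0.0,
    },
    "penalties_fired": ["CRITICAL_FLOOR_VIOLATION"],
    "base_reward": 0.615,
    "penalties_total": -0.5,
}

line = summarize_breakdown(sample_reward, sample_breakdown)
print(line)
```

Iterating over `breakdown["components"]` rather than hard-coding key names lets the same helper handle output from either reward function.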

Change Log

| Date | Change |
| --- | --- |
| 2026-04-23 | Initial doc created |