
Soham Banerjee

deploy: pure lifestack with partitioned wisdom pool

77da5ce about 1 month ago


# reward.md: Reward System Reference

`core/reward.py`: task-aware reward orchestrator.

---

## Overview

Two reward functions are available:

| Function | Used when |
|---|---|
| `compute_reward(...)` | Legacy / no-task episodes |
| `compute_task_reward(...)` | All task-driven episodes (v2.0+) |

---

## `compute_task_reward`: Components

```
reward = (0.35 × milestone)       # Reaching key progress markers
       + (0.25 × completion)      # Final goal achievement (binary: 1.0 if any goal is met)
       + (0.15 × outcome)         # Isolated local metric improvement
       + (0.10 × replan_bonus)    # Recovery after ExoEvents
       + (0.10 × efficiency)      # Resource preservation relative to delta
       + (0.05 × reasoning)       # Logical coherence and action alignment
       + penalties                # Sum of all fired penalties (each value is negative)
```
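A minimal Python sketch of the weighted sum above. The names `TASK_WEIGHTS` and `weighted_task_reward` are hypothetical and not taken from `core/reward.py`; the sketch assumes the six component scores have already been computed and that `penalties` is the summed (negative) value of every penalty that fired.

```python
# Hypothetical sketch of the compute_task_reward weighting; not the actual core/reward.py code.
TASK_WEIGHTS: dict[str, float] = {
    "milestone": 0.35,     # reaching key progress markers
    "completion": 0.25,    # final goal achievement (1.0 if any goal is met)
    "outcome": 0.15,       # isolated local metric improvement
    "replan_bonus": 0.10,  # recovery after ExoEvents
    "efficiency": 0.10,    # resource preservation
    "reasoning": 0.05,     # logical coherence and action alignment
}


def weighted_task_reward(components: dict[str, float], penalties: float = 0.0) -> float:
    """Return the weighted component sum plus penalties (penalties are negative)."""
    missing = TASK_WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    base = sum(weight * components[name] for name, weight in TASK_WEIGHTS.items())
    return base + penalties


# The six weights sum to 1.0 (allowing for floating-point rounding).
assert abs(sum(TASK_WEIGHTS.values()) - 1.0) < 1e-9

# Illustrative scores: half the milestones reached, goal completed, one inaction penalty.
example = {
    "milestone": 0.5, "completion": 1.0, "outcome": 0.0,
    "replan_bonus": 0.0, "efficiency": 0.0, "reasoning": 0.0,
}
print(round(weighted_task_reward(example, penalties=-0.40), 3))  # 0.025
```

Because the weights sum to 1.0, the penalty-free reward stays within [0, 1] whenever every component does; penalties are what can push the total below zero.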

### Penalties

| Penalty | Value | Level | Trigger |
|---|---|---|---|
| `INACTION_PENALTY` | `-0.40` | Step | `actions_taken == 0` |
| `TASK_INACTION_PENALTY` | `-0.20` | Task | `actions_taken == 0` (applied in addition to the step penalty) |
| `CRITICAL_FLOOR_VIOLATION` | `-0.50` | Step | Any metric drops below 20 |
| `DEAD_END` | `-0.50` | Task | All viable routes closed without success |
| `CASCADE_SPREAD_WIDER` | `-0.30` | Step | Changes spread wider than the disruption baseline |
| `RELATIONSHIP_COLLAPSE` | `-0.15` | Step | Relationships drop by more than 20 points in a single step |
| `CUMULATIVE_RELATIONSHIP_EROSION` | `-0.15` | Episode | Cumulative relationship drop exceeds 20 points over the episode |
| `PLAUSIBILITY_VIOLATION` | `-0.10` to `-0.30` | Step | Implausible metric-to-cost ratio |
| `TIMEOUT` | `-0.20` | Task | Maximum step count reached without resolution |
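The step-level checks that have fixed numeric thresholds can be sketched as follows. `StepObservation`, `step_penalties`, and the `"relationships"` metric key are hypothetical; `CASCADE_SPREAD_WIDER` and `PLAUSIBILITY_VIOLATION` are left out because their exact trigger rules are not documented here, and task- and episode-level penalties are assumed to be applied elsewhere.

```python
from dataclasses import dataclass

# Step-level penalty values copied from the table above.
INACTION_PENALTY = -0.40
CRITICAL_FLOOR_VIOLATION = -0.50
RELATIONSHIP_COLLAPSE = -0.15

CRITICAL_FLOOR = 20.0      # "any metric drops below 20"
COLLAPSE_THRESHOLD = 20.0  # "more than 20 points in a single step"


@dataclass
class StepObservation:
    """Hypothetical per-step input; not a type from core/reward.py."""
    actions_taken: int
    prev_metrics: dict[str, float]
    metrics: dict[str, float]


def step_penalties(obs: StepObservation) -> tuple[float, list[str]]:
    """Return (total penalty, names of fired penalties) for one step."""
    total = 0.0
    fired: list[str] = []

    if obs.actions_taken == 0:
        total += INACTION_PENALTY
        fired.append("INACTION_PENALTY")

    if any(value < CRITICAL_FLOOR for value in obs.metrics.values()):
        total += CRITICAL_FLOOR_VIOLATION
        fired.append("CRITICAL_FLOOR_VIOLATION")

    # Assumes the relationships metric lives under the hypothetical key "relationships".
    drop = obs.prev_metrics.get("relationships", 0.0) - obs.metrics.get("relationships", 0.0)
    if drop > COLLAPSE_THRESHOLD:
        total += RELATIONSHIP_COLLAPSE
        fired.append("RELATIONSHIP_COLLAPSE")

    return total, fired


# An idle step where health falls to 15 and relationships fall by 25 points.
idle = StepObservation(
    actions_taken=0,
    prev_metrics={"health": 40.0, "relationships": 70.0},
    metrics={"health": 15.0, "relationships": 45.0},
)
```

For `idle`, all three checks fire, for a total of -1.05 (up to floating-point rounding). Under the table's rules, an idle step inside a task would also receive `TASK_INACTION_PENALTY`, so inaction alone costs -0.40 + -0.20 = -0.60.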

---

## Return Value

Both functions return `(reward: float, breakdown: dict)` with the same top-level layout, but the keys inside `components` differ between them.

```python
breakdown = {
    "components": {
        # Keys populated by compute_reward(...)
        "outcome": float,
        "containment": float,
        "efficiency": float,
        "preservation": float,
        "format_compliance": float,
        "plausibility": float,
        "reasoning_alignment": float,

        # Keys populated by compute_task_reward(...)
        "local_metric_delta": float,
        "milestone": float,
        "completion": float,
        "replan": float,
        "reasoning": float,
        "timeout_penalty": float,
    },
    "penalties_fired": list[str],
    "base_reward": float,
    "penalties_total": float,
}
```
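For logging or static type checking, the layout above can be mirrored with a `TypedDict`. This is a sketch: `RewardBreakdown` and `summarize` are hypothetical helpers rather than part of `core/reward.py`, and the values in `sample` are illustrative placeholders.

```python
from typing import TypedDict


class RewardBreakdown(TypedDict):
    """Hypothetical static type mirroring the breakdown dict above."""
    components: dict[str, float]
    penalties_fired: list[str]
    base_reward: float
    penalties_total: float


def summarize(reward: float, breakdown: RewardBreakdown) -> str:
    """Format a (reward, breakdown) pair as a single log line."""
    fired = ", ".join(breakdown["penalties_fired"]) or "none"
    # Sort components by name so the log line is stable across runs.
    parts = " ".join(
        f"{name}={value:.2f}" for name, value in sorted(breakdown["components"].items())
    )
    return (
        f"reward={reward:.3f} base={breakdown['base_reward']:.3f} "
        f"penalties={breakdown['penalties_total']:.3f} fired=[{fired}] {parts}"
    )


# Illustrative values only.
sample: RewardBreakdown = {
    "components": {"milestone": 0.5, "completion": 1.0},
    "penalties_fired": ["INACTION_PENALTY"],
    "base_reward": 0.425,
    "penalties_total": -0.40,
}
print(summarize(0.025, sample))
```

Since `TypedDict` adds no runtime checks, the helper accepts the plain dict returned by either function; the annotation only helps a type checker flag missing or misspelled top-level keys.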

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | Initial doc created |