Spaces:

openenv-community
/

lifeops

Running

App Files Files Community

lifeops / ARCHITECTURE_REVIEW.md

avlukas

Add EpisodeTrace import and enhance travel issue checks

5bc5483 3 months ago

preview code

raw

history blame contribute delete

11.5 kB

	# LifeOps Architecture & Logic Review

	## Executive Summary

	The LifeOps environment is well-structured and mostly correct. Several bugs, edge cases, and design inconsistencies were identified. The most critical issues are: (1) baseline agent creates double-booked focus blocks, (2) dead reward code for conflict resolution, (3) no travel feasibility check for the first event of the day.

	---

	## 1. Environment Logic

	### 1.1 `reset()`

	Status: Generally correct.

	- Loads scenario, persona, calendar, tasks, pending requests, travel times.
	- `max_steps = max(5, len(pending) + 5)` is reasonable.
	- Edge case: `reset("invalid_id")` raises `KeyError` — consider catching and raising a clearer error.

	### 1.2 `step()`

	Status: Correct flow.

	- Validates action against `valid_actions()` via `_action_key`.
	- Applies request or focus action, increments step count, computes reward and done.

	Potential issue: `_action_key` does not include all fields that distinguish actions. For `block_focus_time`, two actions with same `(new_start_min, duration_min)` but different `new_end_min` (one None, one computed) could theoretically collide — in practice `new_end_min` is None for focus blocks, so this is fine.

	### 1.3 State Transitions

	Status: Correct.

	- Request actions: accept/reschedule add to calendar; reject/propose do not; all pop the current request.
	- Focus actions: add focus event, progress highest-priority unfinished task.

	### 1.4 Termination Conditions (`_is_done`)

	Status: Correct.

	- Done when: `step_count >= max_steps` OR (no pending requests AND all tasks complete).
	- Edge case: If `valid_actions()` returns empty (e.g., hypothetical scenario with no request and no unfinished tasks but `_is_done` False), the demo runner would crash on `valid[0]`. Current scenarios do not hit this.

	---

	## 2. Reward Calculation

	### 2.1 Correctness

	Overlap penalty: Correct. `-5.0 * len(next_overlaps)`.

	Travel penalty: Correct. `-4.0 * len(issues) * travel_aversion_weight`.

	Rejected important penalty: Correct. `-4.0` when rejecting importance ≥ 3.

	Preference penalty: Correct. Applied to accept/reschedule/propose for meeting-like events.

	Focus reward: Correct. `(1.0 + 0.02 * progress) * focus_time_weight`.

	Wasted focus penalty: `-0.5` when `block_focus_time` with `progress == 0`. In practice this is rare because focus blocks are only generated when `has_unfinished` is true, and progress is always made when there is an unfinished task. Defensive.

	### 2.2 Dead Code: `conflict_resolved_bonus`

	Bug: The reward includes:

	```python
	if prev_overlaps and len(next_overlaps) < len(prev_overlaps):
	reward += 3.0
	breakdown["conflict_resolved_bonus"] = 3.0
	```

	The calendar is append-only — events are never removed. Therefore `next_overlaps` can never have fewer pairs than `prev_overlaps`. This branch is never executed.

	Fix: Remove this block, or redesign if you later add event-removal/cancellation.

	### 2.3 Missing Penalties / Rewards

	- No reward for accepting important requests — only penalty for rejecting. Consider a small positive reward for accepting high-importance requests.
	- No explicit penalty for `propose_new_time` that suggests an infeasible time — the preference penalty applies, but overlap/travel of the proposed time are not penalized (since it is not added to the calendar). This may be intentional (proposal quality is soft).

	### 2.4 Unintended Reward Loops

	- None identified. The reward structure is straightforward.

	---

	## 3. State Consistency

	### 3.1 Calendar Updates

	Status: Correct. Events are appended; no removal or modification.

	### 3.2 Task Tracking

	Status: Correct. `remaining_minutes` is decremented in-place for the highest-priority unfinished task during focus blocks.

	### 3.3 Message/Request Handling

	Status: Correct. FIFO via `pending_requests.pop(0)`. `current_request` is always the first pending.

	---

	## 4. Travel Feasibility

	### 4.1 Detection of Impossible Travel

	Status: Correct for consecutive events. `travel_issues()` sorts by start time and checks each pair.

	### 4.2 Missing: Travel to First Event

	Bug: `travel_issues()` only checks `prev → next` for consecutive events. It never checks whether the user can reach the first event of the day. The model assumes the user is already at the location of the first event at its start time.

	Example: First event at 8:00 at Office, persona at Home, travel 25 min. User would need to leave by 7:35. This is not validated.

	Fix: Add an optional `start_location` (e.g., Home) to the persona/state and check travel from that location to the first event.

	### 4.3 Overlap Logic

	Status: Correct. `_overlap(a_start, a_end, b_start, b_end)` uses `a_start < b_end and b_start < a_end`. Touching events (a_end == b_start) do not overlap.

	### 4.4 Rescheduling Edge Cases

	- Reschedule/propose options use fixed deltas (-30, 30, 60). At day boundaries, `new_start` can clamp to the same value for different deltas, producing duplicate actions. This is harmless (same key).
	- No validation that rescheduled time is free — overlaps are penalized by reward. Acceptable for RL.

	---

	## 5. Action Handling

	### 5.1 All Actions Update State Correctly

	\| Action \| Calendar \| Pending Requests \| Tasks \|
	\|-------------------\|-----------------\|------------------\|------------\|
	\| accept_event \| +1 event \| pop \| — \|
	\| reject_event \| — \| pop \| — \|
	\| reschedule_event \| +1 event \| pop \| — \|
	\| propose_new_time \| — \| pop \| — \|
	\| block_focus_time \| +1 focus event \| — \| progress \|

	Status: Correct.

	### 5.2 Invalid Action Handling

	Status: Correct. `step()` raises `ValueError` if action key is not in `valid_keys`.

	### 5.3 Action Constraints

	Issue: `generate_valid_actions()` does not filter out:

	- Focus blocks that overlap with existing calendar events.
	- Reschedule/propose times that would overlap or cause travel issues.

	This is acceptable for RL (agent learns from penalties) but means the baseline can choose “valid” actions that create overlaps.

	---

	## 6. Demo Runner Correctness

	Note: There is no separate `play_episode.py`; the demo lives in `env/lifeops_env.py` under `if __name__ == "__main__"`.

	### 6.1 Reflects Real Environment Behavior

	Status: Yes. Uses `env.reset()`, `env.observation()`, `env.valid_actions()`, `env.step()`.

	### 6.2 Trajectories Exercise Key Logic

	Status: Partially. Tests cover accept, reject, propose, focus, overlap penalty, travel penalty. However:

	Bug: The baseline agent can create double-booked focus blocks. Observed in a run:

	- After handling the request, it scheduled focus blocks at 9:00, 11:00, 14:00, then again at 9:00 and again at 11:00, causing overlaps.

	Cause: `_choose_simple_action` scores focus blocks by simulating each option against the current calendar. Once a slot is used (e.g., 9:00), the next time it considers 9:00 vs 11:00 vs 14:00 vs 16:00, they may all overlap with existing focus blocks. When scores tie, it picks the first (9:00). So it reuses occupied slots.

	Fix: Filter focus blocks to exclude slots that overlap with the current calendar, or improve the baseline to prefer slots with zero overlaps (and handle ties by picking a free slot).

	---

	## 7. Baseline Agent Logic

	### 7.1 Avoids Double Booking?

	No. As above, the baseline can schedule overlapping focus blocks. For request actions it minimizes overlaps when choosing accept/reschedule/propose, so it tends to avoid double-booking requests. But for focus blocks it does not.

	### 7.2 Respects Travel Constraints?

	Yes. For request actions, it scores by `(overlaps, travel_issues)` and picks the action with the fewest. For focus blocks, it also minimizes travel issues. So it prefers feasible travel.

	### 7.3 Prioritizes High-Priority Obligations?

	Partially. It strongly prefers scheduling over rejecting (reject scores (999, 999)), so it rarely rejects important requests. But it does not explicitly prioritize by `importance`. It only minimizes overlaps and travel. For optional low-importance meetings it may still accept if that minimizes violations, instead of rejecting to free time for high-priority tasks.

	---

	## 8. Summary of Issues

	\| Severity \| Issue \| Location \| Fix \| Status \|
	\|----------\|-------\|----------\|-----\|--------\|
	\| High \| Baseline creates overlapping focus blocks \| `lifeops_env.py` `_choose_simple_action` \| Filter or re-score focus blocks to avoid already-used slots \| Fixed – prefer non-overlapping slots; fall back to least-bad when all overlap \|
	\| Medium \| `conflict_resolved_bonus` never triggers \| `reward.py` \| Remove dead code or add event removal to enable it \| Fixed – removed dead code \|
	\| Medium \| No travel check to first event of day \| `reward.py` `travel_issues` \| Add optional check from `start_location` to first event \| Fixed – added `start_location` param, uses `home_location` \|
	\| Low \| `reset("bad_id")` raises raw `KeyError` \| `lifeops_env.py` \| Catch and re-raise with clearer message \| Not applied (minor) \|
	\| Low \| Duplicate reschedule actions at boundaries \| `actions.py` \| Optional: deduplicate by `(new_start, new_end)` \| Not applied (harmless) \|
	\| Low \| Baseline never rejects (scores reject as 999,999) \| `lifeops_env.py` \| Consider allowing reject when all scheduling options are bad \| Fixed – reject now scores (0, 0) so it wins when scheduling causes issues \|

	---

	## 9. Suggested Fixes (Minimal Changes)

	### Fix 1: Baseline focus block selection

	In `_choose_simple_action`, when scoring focus blocks, prefer actions that result in zero overlaps. If all have overlaps, pick the one with the smallest overlap count, and among those prefer the one that overlaps with the fewest events (e.g., break ties by total overlap duration or event count).

	A simpler approach: filter `focus_actions` to exclude those whose `(new_start_min, duration_min)` would overlap with any existing calendar event. Use `detect_overlaps` with a simulated calendar including the candidate focus block.

	### Fix 2: Remove dead `conflict_resolved_bonus`

	Delete or comment out lines 99–101 in `reward.py` until the environment supports event removal.

	### Fix 3: Travel to first event (optional)

	Add a parameter `start_location` (default `None`) to the scenario or persona. If set, prepend a synthetic “start” event at `start_location` with `end_min=0` before the first real event, so `travel_issues` checks the first leg.

	---

	## 10. Edge Cases Not Handled

	1. Empty valid_actions: If both `current_request` is None and `has_unfinished` is False, `valid_actions` is empty. The demo would crash on `valid[0]`. Current scenarios avoid this.
	2. Event at midnight (0) or end of day (1440): Logic uses `<= 1440`; should be verified for boundary events.
	3. Zero-duration events: `_overlap` would treat (100, 100) and (100, 100) as overlapping (`100 < 100` is false, so no overlap). Zero-duration events are not generated.
	4. Multiple events at same start/end: Sorting by `(start_min, end_min)` is deterministic; `travel_issues` order is stable.
	5. Unknown locations in travel_times: Default 30 minutes is conservative; no explicit handling for missing keys.