Spaces: openenv-community/lifeops
avlukas committed on Mar 7

Commit

5bc5483

1 Parent(s): d40203a

Add EpisodeTrace import and enhance travel issue checks

- Imported EpisodeTrace in lifeops_env.py.
- Added last_task_id_progressed to LifeOpsState for tracking task progress.
- Updated travel_issues function in reward.py to accept start_location for improved travel feasibility checks.
- Modified action selection logic to consider overlaps and travel issues more effectively, including home location in calculations.

Files changed (6)

ARCHITECTURE_REVIEW.md +230 -0
env/episode_trace.py +191 -0
env/lifeops_env.py +86 -22
env/reward.py +19 -5
training/__init__.py +1 -0
training/train_rl.py +217 -0

ARCHITECTURE_REVIEW.md ADDED

	@@ -0,0 +1,230 @@

+# LifeOps Architecture & Logic Review
+## Executive Summary
+The LifeOps environment is well-structured and mostly correct. Several bugs, edge cases, and design inconsistencies were identified. The most critical issues are: (1) baseline agent creates double-booked focus blocks, (2) dead reward code for conflict resolution, (3) no travel feasibility check for the first event of the day.
+---
+## 1. Environment Logic
+### 1.1 `reset()`
+**Status:** Generally correct.
+- Loads scenario, persona, calendar, tasks, pending requests, travel times.
+- `max_steps = max(5, len(pending) + 5)` is reasonable.
+- **Edge case:** `reset("invalid_id")` raises `KeyError` — consider catching and raising a clearer error.
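The edge case above could be handled roughly as follows. This is a minimal sketch, not the project's code: `SCENARIOS`, `load_scenario`, and `max_steps_for` are hypothetical names, and the real environment loads scenarios through `get_scenario()`.

```python
from typing import Any, Dict

# Hypothetical registry used only for this sketch.
SCENARIOS: Dict[str, Dict[str, Any]] = {
    "busy_monday": {"pending_requests": [{"title": "Sync"}, {"title": "Lunch"}]},
}


def load_scenario(scenario_id: str) -> Dict[str, Any]:
    """Look up a scenario, turning a bare KeyError into a descriptive ValueError."""
    try:
        return SCENARIOS[scenario_id]
    except KeyError:
        known = ", ".join(sorted(SCENARIOS))
        raise ValueError(
            f"Unknown scenario_id {scenario_id!r}; known ids: {known}"
        ) from None


def max_steps_for(scenario: Dict[str, Any]) -> int:
    """Step budget as quoted in the review: max(5, len(pending) + 5)."""
    return max(5, len(scenario["pending_requests"]) + 5)
```

Because `len(pending)` is never negative, `max(5, len(pending) + 5)` always evaluates to `len(pending) + 5`; the `max` only documents the intended floor.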
+### 1.2 `step()`
+**Status:** Correct flow.
+- Validates action against `valid_actions()` via `_action_key`.
+- Applies request or focus action, increments step count, computes reward and done.
+**Potential issue:** `_action_key` does not include all fields that distinguish actions. For `block_focus_time`, two actions with same `(new_start_min, duration_min)` but different `new_end_min` (one None, one computed) could theoretically collide — in practice `new_end_min` is None for focus blocks, so this is fine.
+### 1.3 State Transitions
+**Status:** Correct.
+- Request actions: accept/reschedule add to calendar; reject/propose do not; all pop the current request.
+- Focus actions: add focus event, progress highest-priority unfinished task.
+### 1.4 Termination Conditions (`_is_done`)
+**Status:** Correct.
+- Done when: `step_count >= max_steps` OR (no pending requests AND all tasks complete).
+- **Edge case:** If `valid_actions()` returns empty (e.g., hypothetical scenario with no request and no unfinished tasks but `_is_done` False), the demo runner would crash on `valid[0]`. Current scenarios do not hit this.
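The termination rule and the empty-action edge case can be sketched as below. The function names and the plain-dict state are assumptions made for illustration; the actual `_is_done` operates on `LifeOpsState`.

```python
from typing import Any, Dict, List, Optional


def is_done(
    step_count: int,
    max_steps: int,
    pending_requests: List[Dict[str, Any]],
    tasks: List[Dict[str, Any]],
) -> bool:
    """Done when the step budget is spent, or nothing is left to handle."""
    if step_count >= max_steps:
        return True
    all_tasks_complete = all(int(t["remaining_minutes"]) <= 0 for t in tasks)
    return not pending_requests and all_tasks_complete


def pick_first(valid_actions: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Guarded replacement for `valid[0]`: returns None instead of raising IndexError."""
    return valid_actions[0] if valid_actions else None
```

A runner using `pick_first` could end the episode when it receives `None` rather than crashing.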
+---
+## 2. Reward Calculation
+### 2.1 Correctness
+**Overlap penalty:** Correct. `-5.0 * len(next_overlaps)`.
+**Travel penalty:** Correct. `-4.0 * len(issues) * travel_aversion_weight`.
+**Rejected important penalty:** Correct. `-4.0` when rejecting importance ≥ 3.
+**Preference penalty:** Correct. Applied to accept/reschedule/propose for meeting-like events.
+**Focus reward:** Correct. `(1.0 + 0.02 * progress) * focus_time_weight`.
+**Wasted focus penalty:** `-0.5` when `block_focus_time` with `progress == 0`. In practice this is rare because focus blocks are only generated when `has_unfinished` is true, and progress is always made when there is an unfinished task. Defensive.
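As a worked example of the coefficients listed above, the sketch below combines them into one breakdown. Several things are assumptions: the function name, its parameters, and the breakdown keys are hypothetical; the preference penalty is left out because the review does not state its magnitude; and the sketch treats the focus reward and the wasted-focus penalty as mutually exclusive, which the review does not confirm.

```python
from typing import Dict, Optional


def reward_sketch(
    n_overlaps: int = 0,
    n_travel_issues: int = 0,
    travel_aversion_weight: float = 1.0,
    rejected_importance: Optional[int] = None,
    focus_progress_min: Optional[int] = None,
    focus_time_weight: float = 1.0,
) -> Dict[str, float]:
    """Combine the penalties and rewards quoted in the review (hypothetical keys)."""
    breakdown: Dict[str, float] = {}
    if n_overlaps:
        breakdown["overlap_penalty"] = -5.0 * n_overlaps
    if n_travel_issues:
        breakdown["travel_penalty"] = -4.0 * n_travel_issues * travel_aversion_weight
    if rejected_importance is not None and rejected_importance >= 3:
        breakdown["rejected_important_penalty"] = -4.0
    if focus_progress_min is not None:
        if focus_progress_min > 0:
            breakdown["focus_reward"] = (1.0 + 0.02 * focus_progress_min) * focus_time_weight
        else:
            # Assumption: a focus block with no progress earns only the penalty.
            breakdown["wasted_focus_penalty"] = -0.5
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For one overlap and two travel issues with `travel_aversion_weight=1.5`, the total is `-5.0 + (-4.0 * 2 * 1.5) = -17.0`.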
+### 2.2 Dead Code: `conflict_resolved_bonus`
+**Bug:** The reward includes:
+```python
+if prev_overlaps and len(next_overlaps) < len(prev_overlaps):
+    reward += 3.0
+    breakdown["conflict_resolved_bonus"] = 3.0
+```
+The calendar is **append-only** — events are never removed. Therefore `next_overlaps` can never have fewer pairs than `prev_overlaps`. This branch is **never executed**.
+**Fix:** Remove this block, or redesign if you later add event-removal/cancellation.
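The reasoning above can be checked with a small sketch: appending an event leaves every existing overlapping pair in place, so the pair count can only stay the same or grow. `count_overlaps` and `overlap_counts_while_appending` are hypothetical helpers, not the project's `detect_overlaps`.

```python
from itertools import combinations
from typing import Dict, List

Event = Dict[str, int]


def count_overlaps(events: List[Event]) -> int:
    """Count unordered event pairs whose half-open intervals intersect."""
    return sum(
        1
        for a, b in combinations(events, 2)
        if a["start_min"] < b["end_min"] and b["start_min"] < a["end_min"]
    )


def overlap_counts_while_appending(initial: List[Event], additions: List[Event]) -> List[int]:
    """Overlap count after each append to an append-only calendar."""
    calendar = list(initial)
    counts = [count_overlaps(calendar)]
    for event in additions:
        calendar.append(event)
        counts.append(count_overlaps(calendar))
    return counts
```

With such a calendar, a condition like `len(next_overlaps) < len(prev_overlaps)` has no way to become true.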
+### 2.3 Missing Penalties / Rewards
+- **No reward for accepting important requests** — only penalty for rejecting. Consider a small positive reward for accepting high-importance requests.
+- **No explicit penalty for `propose_new_time` that suggests an infeasible time** — the preference penalty applies, but overlap/travel of the proposed time are not penalized (since it is not added to the calendar). This may be intentional (proposal quality is soft).
+### 2.4 Unintended Reward Loops
+- None identified. The reward structure is straightforward.
+---
+## 3. State Consistency
+### 3.1 Calendar Updates
+**Status:** Correct. Events are appended; no removal or modification.
+### 3.2 Task Tracking
+**Status:** Correct. `remaining_minutes` is decremented in-place for the highest-priority unfinished task during focus blocks.
+### 3.3 Message/Request Handling
+**Status:** Correct. FIFO via `pending_requests.pop(0)`. `current_request` is always the first pending.
+---
+## 4. Travel Feasibility
+### 4.1 Detection of Impossible Travel
+**Status:** Correct for consecutive events. `travel_issues()` sorts by start time and checks each pair.
+### 4.2 Missing: Travel to First Event
+**Bug:** `travel_issues()` only checks `prev → next` for consecutive events. It never checks whether the user can reach the **first** event of the day. The model assumes the user is already at the location of the first event at its start time.
+**Example:** First event at 8:00 at Office, persona at Home, travel 25 min. User would need to leave by 7:35. This is not validated.
+**Fix:** Add an optional `start_location` (e.g., Home) to the persona/state and check travel from that location to the first event.
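One way to realise this fix is to prepend a synthetic start event, as sketched below. This is an illustration under stated assumptions, not the code in `reward.py`: the `(src, dst)` tuple keys for `travel_times`, zero travel time within one location, the `"_start"` event id, and the `(from_id, to_id, needed_min)` issue tuple are all hypothetical. The 30-minute default for unknown pairs is the one the review mentions.

```python
from typing import Any, Dict, List, Optional, Tuple

DEFAULT_TRAVEL_MIN = 30  # conservative default for unknown location pairs


def travel_minutes(travel_times: Dict[Tuple[str, str], int], src: str, dst: str) -> int:
    """Travel time between two locations; assumed to be 0 within one location."""
    if src == dst:
        return 0
    return int(travel_times.get((src, dst), DEFAULT_TRAVEL_MIN))


def travel_issues_sketch(
    events: List[Dict[str, Any]],
    travel_times: Dict[Tuple[str, str], int],
    start_location: Optional[str] = None,
) -> List[Tuple[str, str, int]]:
    """Flag consecutive events whose gap is shorter than the travel time needed."""
    ordered = sorted(events, key=lambda e: (int(e["start_min"]), int(e["end_min"])))
    if start_location is not None and ordered:
        # Synthetic event ending at minute 0, so the first leg is checked too.
        start = {"event_id": "_start", "start_min": 0, "end_min": 0, "location": start_location}
        ordered = [start] + ordered
    issues = []
    for prev, nxt in zip(ordered, ordered[1:]):
        need = travel_minutes(travel_times, prev["location"], nxt["location"])
        gap = max(int(nxt["start_min"]) - int(prev["end_min"]), 0)
        if need > gap:
            issues.append((prev["event_id"], nxt["event_id"], need))
    return issues
```

Because the synthetic event ends at minute 0, this variant only flags a first event that starts sooner after midnight than the travel time. An 8:00 event with a 25-minute commute passes the check, which matches the intuition that the user can simply leave home by 7:35.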
+### 4.3 Overlap Logic
+**Status:** Correct. `_overlap(a_start, a_end, b_start, b_end)` uses `a_start < b_end and b_start < a_end`. Touching events (a_end == b_start) do not overlap.
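The predicate quoted above treats events as half-open intervals. A minimal standalone version, named `overlap` here to keep it distinct from the project's `_overlap`, makes the boundary behaviour easy to check:

```python
def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when [a_start, a_end) and [b_start, b_end) share at least one minute."""
    return a_start < b_end and b_start < a_end


# Boundary cases, in minutes since midnight.
touching = overlap(540, 600, 600, 660)         # back-to-back meetings
one_minute = overlap(540, 601, 600, 660)       # first meeting runs one minute over
zero_vs_zero = overlap(100, 100, 100, 100)     # two zero-duration events
zero_inside = overlap(100, 100, 90, 110)       # zero-duration event inside another
```

Touching events and identical zero-duration events do not overlap, while a zero-duration event placed strictly inside another event does.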
+### 4.4 Rescheduling Edge Cases
+- Reschedule/propose options use fixed deltas (-30, 30, 60). At day boundaries, `new_start` can clamp to the same value for different deltas, producing duplicate actions. This is harmless (same key).
+- No validation that rescheduled time is free — overlaps are penalized by reward. Acceptable for RL.
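The duplicate-action case at day boundaries, and the optional deduplication by `(new_start, new_end)`, might look like the sketch below. The clamping rule and the function name are assumptions; only the deltas (-30, 30, 60) come from the review.

```python
from typing import List, Tuple

DAY_END_MIN = 1440
DELTAS_MIN = (-30, 30, 60)  # fixed deltas named in the review


def reschedule_options(start_min: int, end_min: int) -> List[Tuple[int, int]]:
    """Shift an event by each delta, clamp it into the day, and drop repeated slots."""
    duration = end_min - start_min
    seen = set()
    options = []
    for delta in DELTAS_MIN:
        # Assumed clamping: keep the whole event inside [0, 1440].
        new_start = min(max(start_min + delta, 0), DAY_END_MIN - duration)
        slot = (new_start, new_start + duration)
        if slot not in seen:
            seen.add(slot)
            options.append(slot)
    return options
```

For an event at 23:00 to 24:00, both the +30 and +60 deltas clamp to the same slot, so only two distinct options remain.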
+---
+## 5. Action Handling
+### 5.1 All Actions Update State Correctly
+| Action            | Calendar        | Pending Requests | Tasks      |
+|-------------------|-----------------|------------------|------------|
+| accept_event      | +1 event        | pop              | —          |
+| reject_event      | —               | pop              | —          |
+| reschedule_event  | +1 event        | pop              | —          |
+| propose_new_time  | —               | pop              | —          |
+| block_focus_time  | +1 focus event  | —                | progress   |
+**Status:** Correct.
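The table can be read as a small state-transition function. The sketch below uses plain dicts and string action types; the `priority` field (larger means more important), the `focus_` event id, and `apply_action` itself are hypothetical and stand in for the environment's internal methods.

```python
import copy
from typing import Any, Dict


def apply_action(state: Dict[str, Any], action: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with the calendar, pending queue, and tasks updated."""
    new = copy.deepcopy(state)
    kind = action["action_type"]
    if kind == "block_focus_time":
        start, dur = action["new_start_min"], action["duration_min"]
        new["calendar"].append(
            {"event_id": f"focus_{start}", "start_min": start, "end_min": start + dur, "kind": "focus"}
        )
        unfinished = [t for t in new["tasks"] if t["remaining_minutes"] > 0]
        if unfinished:
            task = max(unfinished, key=lambda t: t["priority"])
            task["remaining_minutes"] -= min(dur, task["remaining_minutes"])
        return new
    # All request actions consume the request at the head of the FIFO queue.
    request = new["pending_requests"].pop(0)
    if kind in {"accept_event", "reschedule_event"}:
        event = dict(request)
        if kind == "reschedule_event":
            event["start_min"] = action["new_start_min"]
            event["end_min"] = action["new_end_min"]
        new["calendar"].append(event)
    return new
```

Only `accept_event` and `reschedule_event` add the request to the calendar; `reject_event` and `propose_new_time` just pop it, and `block_focus_time` leaves the queue untouched.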
+### 5.2 Invalid Action Handling
+**Status:** Correct. `step()` raises `ValueError` if action key is not in `valid_keys`.
+### 5.3 Action Constraints
+**Issue:** `generate_valid_actions()` does **not** filter out:
+- Focus blocks that overlap with existing calendar events.
+- Reschedule/propose times that would overlap or cause travel issues.
+This is acceptable for RL (agent learns from penalties) but means the baseline can choose “valid” actions that create overlaps.
+---
+## 6. Demo Runner Correctness
+**Note:** There is no separate `play_episode.py`; the demo lives in `env/lifeops_env.py` under `if __name__ == "__main__"`.
+### 6.1 Reflects Real Environment Behavior
+**Status:** Yes. Uses `env.reset()`, `env.observation()`, `env.valid_actions()`, `env.step()`.
+### 6.2 Trajectories Exercise Key Logic
+**Status:** Partially. Tests cover accept, reject, propose, focus, overlap penalty, travel penalty. However:
+**Bug:** The baseline agent **can create double-booked focus blocks**. Observed in a run:
+- After handling the request, it scheduled focus blocks at 9:00, 11:00, 14:00, then **again at 9:00** and **again at 11:00**, causing overlaps.
+**Cause:** `_choose_simple_action` scores focus blocks by simulating each option against the **current** calendar. Once a slot is used (e.g., 9:00), the next time it considers 9:00 vs 11:00 vs 14:00 vs 16:00, they may all overlap with existing focus blocks. When scores tie, it picks the first (9:00). So it reuses occupied slots.
+**Fix:** Filter focus blocks to exclude slots that overlap with the current calendar, or improve the baseline to prefer slots with zero overlaps (and handle ties by picking a free slot).
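The tie behaviour and the proposed fix can be illustrated with a simplified slot picker. It works on bare start times and `(start, end)` busy intervals rather than `Action` objects, and `pick_focus_slot` is a hypothetical name, so treat it as a sketch of the idea rather than the baseline's code.

```python
from typing import List, Tuple


def pick_focus_slot(
    candidate_starts: List[int],
    busy: List[Tuple[int, int]],
    duration: int = 60,
) -> int:
    """Prefer a free slot; if none is free, fall back to the least-overlapping one."""

    def overlap_count(start: int) -> int:
        end = start + duration
        return sum(1 for b_start, b_end in busy if start < b_end and b_start < end)

    free = [s for s in candidate_starts if overlap_count(s) == 0]
    pool = free if free else list(candidate_starts)
    # min() returns the first minimal element, so ties go to the earliest candidate.
    return min(pool, key=overlap_count)
```

With candidates at 9:00, 11:00, 14:00, and 16:00 and the first three occupied, the picker returns 16:00. When every slot is occupied once, the scores tie and it falls back to 9:00, which is the reuse pattern described above.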
+---
+## 7. Baseline Agent Logic
+### 7.1 Avoids Double Booking?
+**No.** As above, the baseline can schedule overlapping focus blocks. For **request** actions it minimizes overlaps when choosing accept/reschedule/propose, so it tends to avoid double-booking requests. But for focus blocks it does not.
+### 7.2 Respects Travel Constraints?
+**Yes.** For request actions, it scores by `(overlaps, travel_issues)` and picks the action with the fewest. For focus blocks, it also minimizes travel issues. So it prefers feasible travel.
+### 7.3 Prioritizes High-Priority Obligations?
+**Partially.** It strongly prefers scheduling over rejecting (reject scores (999, 999)), so it rarely rejects important requests. But it does not explicitly prioritize by `importance`. It only minimizes overlaps and travel. For optional low-importance meetings it may still accept if that minimizes violations, instead of rejecting to free time for high-priority tasks.
+---
+## 8. Summary of Issues
+| Severity | Issue | Location | Fix | Status |
+|----------|-------|----------|-----|--------|
+| High     | Baseline creates overlapping focus blocks | `lifeops_env.py` `_choose_simple_action` | Filter or re-score focus blocks to avoid already-used slots | **Fixed** – prefer non-overlapping slots; fall back to least-bad when all overlap |
+| Medium   | `conflict_resolved_bonus` never triggers | `reward.py` | Remove dead code or add event removal to enable it | **Fixed** – removed dead code |
+| Medium   | No travel check to first event of day | `reward.py` `travel_issues` | Add optional check from `start_location` to first event | **Fixed** – added `start_location` param, uses `home_location` |
+| Low      | `reset("bad_id")` raises raw `KeyError` | `lifeops_env.py` | Catch and re-raise with clearer message | Not applied (minor) |
+| Low      | Duplicate reschedule actions at boundaries | `actions.py` | Optional: deduplicate by `(new_start, new_end)` | Not applied (harmless) |
+| Low      | Baseline never rejects (scores reject as 999,999) | `lifeops_env.py` | Consider allowing reject when all scheduling options are bad | **Fixed** – reject now scores (0, 0) so it wins when scheduling causes issues |
+---
+## 9. Suggested Fixes (Minimal Changes)
+### Fix 1: Baseline focus block selection
+In `_choose_simple_action`, when scoring focus blocks, prefer actions that result in **zero** overlaps. If all have overlaps, pick the one with the smallest overlap count, and among those prefer the one that overlaps with the fewest events (e.g., break ties by total overlap duration or event count).
+A simpler approach: **filter** `focus_actions` to exclude those whose `(new_start_min, duration_min)` would overlap with any existing calendar event. Use `detect_overlaps` with a simulated calendar including the candidate focus block.
+### Fix 2: Remove dead `conflict_resolved_bonus`
+Delete or comment out lines 99–101 in `reward.py` until the environment supports event removal.
+### Fix 3: Travel to first event (optional)
+Add a parameter `start_location` (default `None`) to the scenario or persona. If set, prepend a synthetic “start” event at `start_location` with `end_min=0` before the first real event, so `travel_issues` checks the first leg.
+---
+## 10. Edge Cases Not Handled
+1. **Empty valid_actions:** If both `current_request` is None and `has_unfinished` is False, `valid_actions` is empty. The demo would crash on `valid[0]`. Current scenarios avoid this.
+2. **Event at midnight (0) or end of day (1440):** Logic uses `<= 1440`; should be verified for boundary events.
+3. **Zero-duration events:** `_overlap` treats (100, 100) and (100, 100) as non-overlapping, because `100 < 100` is false. Zero-duration events are not generated.
+4. **Multiple events at same start/end:** Sorting by `(start_min, end_min)` is deterministic; `travel_issues` order is stable.
+5. **Unknown locations in travel_times:** Default 30 minutes is conservative; no explicit handling for missing keys.

env/episode_trace.py ADDED

	@@ -0,0 +1,191 @@

+"""
+Episode tracing and structured logging for LifeOps.
+Provides human-readable step-by-step logs and a timeline view for hackathon demos.
+"""
+from __future__ import annotations
+import copy
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+def _min_to_time(m: int) -> str:
+    """Convert minutes since midnight to HH:MM."""
+    h, mm = divmod(m, 60)
+    return f"{h:02d}:{mm:02d}"
+@dataclass
+class StepRecord:
+    """Record of a single environment step."""
+    step: int
+    action: Dict[str, Any]
+    prev_calendar_count: int
+    next_calendar_count: int
+    prev_pending_count: int
+    next_pending_count: int
+    reward: float
+    breakdown: Dict[str, Any]
+    overlaps: List[tuple]
+    travel_issues: List[tuple]
+    done: bool
+    # State changes (compact summaries)
+    added_event: Optional[Dict[str, Any]] = None
+    handled_request: Optional[Dict[str, Any]] = None
+    task_progress: Optional[Dict[str, str]] = None  # task_id -> "X min"
+@dataclass
+class EpisodeTrace:
+    """Trace of an entire episode for logging and timeline display."""
+    scenario_id: str
+    persona_name: str
+    initial_calendar: List[Dict[str, Any]] = field(default_factory=list)
+    initial_tasks: List[Dict[str, Any]] = field(default_factory=list)
+    initial_pending_count: int = 0
+    steps: List[StepRecord] = field(default_factory=list)
+    total_reward: float = 0.0
+    def log_step(
+        self,
+        step: int,
+        action: Dict[str, Any],
+        prev_obs: Dict[str, Any],
+        next_obs: Dict[str, Any],
+        reward: float,
+        breakdown: Dict[str, Any],
+        info: Dict[str, Any],
+        done: bool,
+        last_added_event: Optional[Dict[str, Any]] = None,
+        last_handled_request: Optional[Dict[str, Any]] = None,
+        last_task_progress_minutes: int = 0,
+        task_id_progressed: Optional[str] = None,
+    ) -> None:
+        """Record one step."""
+        task_progress = None
+        if last_task_progress_minutes and task_id_progressed:
+            task_progress = {task_id_progressed: f"{last_task_progress_minutes} min"}
+        self.steps.append(
+            StepRecord(
+                step=step,
+                action=copy.deepcopy(action),
+                prev_calendar_count=len(prev_obs.get("calendar", [])),
+                next_calendar_count=len(next_obs.get("calendar", [])),
+                prev_pending_count=prev_obs.get("pending_request_count", 0),
+                next_pending_count=next_obs.get("pending_request_count", 0),
+                reward=reward,
+                breakdown=copy.deepcopy(breakdown),
+                overlaps=info.get("overlaps", []),
+                travel_issues=info.get("travel_issues", []),
+                done=done,
+                added_event=copy.deepcopy(last_added_event) if last_added_event else None,
+                handled_request=copy.deepcopy(last_handled_request) if last_handled_request else None,
+                task_progress=task_progress,
+            )
+        )
+    def _format_action(self, a: Dict[str, Any]) -> str:
+        at = a.get("action_type", "?")
+        if at == "block_focus_time":
+            start = a.get("new_start_min")
+            dur = a.get("duration_min")
+            return f"block_focus_time @ {_min_to_time(start or 0)} for {dur} min"
+        if at == "accept_event":
+            return f"accept_event (request_id={a.get('request_id', '?')})"
+        if at == "reject_event":
+            return f"reject_event (request_id={a.get('request_id', '?')})"
+        if at == "reschedule_event":
+            ns, ne = a.get("new_start_min"), a.get("new_end_min")
+            return f"reschedule_event → {_min_to_time(ns or 0)}–{_min_to_time(ne or 0)}"
+        if at == "propose_new_time":
+            ns, ne = a.get("new_start_min"), a.get("new_end_min")
+            return f"propose_new_time → {_min_to_time(ns or 0)}–{_min_to_time(ne or 0)}"
+        return str(a)
+    def _format_breakdown(self, b: Dict[str, Any]) -> str:
+        parts = []
+        for k, v in b.items():
+            if k == "total":
+                continue
+            if isinstance(v, (int, float)) and v != 0:
+                parts.append(f"{k}={v:+.1f}")
+        return ", ".join(parts) if parts else "(none)"
+    def print_step_log(self, step_record: StepRecord) -> None:
+        """Print a single step in human-readable form."""
+        s = step_record
+        print(f"\n  Step {s.step}")
+        print(f"    Action: {self._format_action(s.action)}")
+        print(f"    Reward: {s.reward:+.2f}  ({self._format_breakdown(s.breakdown)})")
+        if s.added_event:
+            e = s.added_event
+            print(f"    + Added: {e.get('title', '?')} @ {_min_to_time(e.get('start_min', 0))}–{_min_to_time(e.get('end_min', 0))} ({e.get('location', '?')})")
+        if s.handled_request and s.action.get("action_type") != "block_focus_time":
+            r = s.handled_request
+            at = s.action.get("action_type", "")
+            if at == "reject_event":
+                outcome = "rejected"
+            elif at == "propose_new_time":
+                outcome = "proposed new time (not scheduled)"
+            else:
+                outcome = "accepted/scheduled"
+            print(f"    Request {outcome}: {r.get('title', '?')}")
+        if s.task_progress:
+            for tid, prog in s.task_progress.items():
+                print(f"    Task progress: {tid} ({prog})")
+        if s.overlaps:
+            print(f"    ⚠ Overlaps: {s.overlaps}")
+        if s.travel_issues:
+            print(f"    ⚠ Travel issues: {[(t[0], t[1], f'need {t[2]}min') for t in s.travel_issues]}")
+    def print_timeline(self, final_calendar: Optional[List[Dict[str, Any]]] = None) -> None:
+        """Print a readable timeline of the final calendar."""
+        if final_calendar is not None:
+            events = list(final_calendar)
+        else:
+            # Fallback: merge initial + all added events from steps
+            events = list(self.initial_calendar)
+            for s in self.steps:
+                if s.added_event:
+                    events.append(s.added_event)
+        if not events:
+            print("\n  (No events on calendar)")
+            return
+        ordered = sorted(events, key=lambda e: (int(e["start_min"]), int(e["end_min"])))
+        print("\n  Timeline (final calendar):")
+        print("  " + "-" * 60)
+        for e in ordered:
+            start = int(e["start_min"])
+            end = int(e["end_min"])
+            title = e.get("title", e.get("event_id", "?"))
+            loc = e.get("location", "?")
+            kind = e.get("kind", "meeting")
+            print(f"  {_min_to_time(start)} – {_min_to_time(end)}  {title}  @ {loc}  [{kind}]")
+        print("  " + "-" * 60)
+    def print_full(self, final_calendar: Optional[List[Dict[str, Any]]] = None) -> None:
+        """Print the complete episode trace (header, steps, timeline, summary)."""
+        print("\n" + "=" * 60)
+        print("EPISODE TRACE")
+        print("=" * 60)
+        print(f"Scenario: {self.scenario_id}")
+        print(f"Persona: {self.persona_name}")
+        print(f"Initial: {len(self.initial_calendar)} events, {len(self.initial_tasks)} tasks, {self.initial_pending_count} pending requests")
+        print("-" * 60)
+        for s in self.steps:
+            self.print_step_log(s)
+        self.print_timeline(final_calendar)
+        print("\n" + "-" * 60)
+        print(f"Total reward: {self.total_reward:+.2f}")
+        print("=" * 60)

env/lifeops_env.py CHANGED

@@ -22,6 +22,7 @@ from typing import Any, Dict, List, Optional, Tuple
 try:
     # Normal usage (tests / `python -m ...`) expects repo root on sys.path.
     from env.actions import Action, ActionType, generate_valid_actions
     from env.personas import Persona, get_personas
     from env.reward import compute_reward, detect_overlaps, travel_issues
     from env.scenario_generator import Scenario, get_scenario, list_scenario_ids, sample_scenarios
@@ -30,6 +31,7 @@ except ModuleNotFoundError:
     repo_root = Path(__file__).resolve().parent.parent
     sys.path.insert(0, str(repo_root))
     from env.actions import Action, ActionType, generate_valid_actions
     from env.personas import Persona, get_personas
     from env.reward import compute_reward, detect_overlaps, travel_issues
     from env.scenario_generator import Scenario, get_scenario, list_scenario_ids, sample_scenarios
@@ -75,6 +77,7 @@ class LifeOpsState:
     last_added_event: Optional[Dict[str, Any]] = None
     last_handled_request: Optional[Dict[str, Any]] = None
     last_task_progress_minutes: int = 0
     def current_request(self) -> Optional[Dict[str, Any]]:
         return self.pending_requests[0] if self.pending_requests else None
@@ -168,6 +171,7 @@ class LifeOpsEnv:
         self._state.last_added_event = None
         self._state.last_handled_request = None
         self._state.last_task_progress_minutes = 0
         at = str(action_dict.get("action_type"))
         if at in {ActionType.accept_event.value, ActionType.reject_event.value, ActionType.reschedule_event.value, ActionType.propose_new_time.value}:
@@ -195,7 +199,15 @@ class LifeOpsEnv:
         info: Dict[str, Any] = {
             "reward_breakdown": breakdown,
             "overlaps": detect_overlaps(next_obs.get("calendar", [])),
-            "travel_issues": travel_issues(next_obs.get("calendar", []), next_obs.get("travel_times", {})),
         }
         return next_obs, float(reward), bool(done), info
@@ -260,8 +272,10 @@ class LifeOpsEnv:
             progress = min(duration, int(t["remaining_minutes"]))
             t["remaining_minutes"] = int(t["remaining_minutes"]) - progress
             self._state.last_task_progress_minutes = int(progress)
         else:
             self._state.last_task_progress_minutes = 0
     def _is_done(self) -> bool:
         if self._state.step_count >= self._state.max_steps:
@@ -273,26 +287,44 @@ class LifeOpsEnv:
         return True
 def _choose_simple_action(env: LifeOpsEnv) -> Action:
     """
     Tiny heuristic policy for manual running:
-    - If accept would cause overlap/travel issues, try reschedule actions next.
     - Otherwise accept the request.
-    - If no request, block focus time.
     """
     valid = env.valid_actions()
     obs = env.observation()
     req = obs.get("current_request")
     if req is None:
         focus_actions = [a for a in valid if a.action_type == ActionType.block_focus_time]
         if not focus_actions:
             return valid[0]
         def focus_score(a: Action) -> Tuple[int, int]:
             start = int(a.new_start_min or 0)
             dur = int(a.duration_min or 0)
-            sim = list(obs.get("calendar", [])) + [
                 {
                     "event_id": "focus_sim",
                     "start_min": start,
@@ -300,43 +332,58 @@ def _choose_simple_action(env: LifeOpsEnv) -> Action:
                     "location": obs["persona"].get("primary_work_location", "Home"),
                 }
             ]
-            return (len(detect_overlaps(sim)), len(travel_issues(sim, obs.get("travel_times", {}))))
-        focus_actions.sort(key=focus_score)
-        return focus_actions[0]
     # Pick the request-handling action that minimizes feasibility violations.
     def score_action(a: Action) -> Tuple[int, int]:
         # (overlap_count, travel_issue_count) — smaller is better
         if a.action_type == ActionType.reject_event:
-            return (999, 999)  # prefer scheduling over rejecting (manual runner)
         if a.action_type in {ActionType.accept_event, ActionType.reschedule_event, ActionType.propose_new_time}:
             added = dict(req)
             if a.action_type in {ActionType.reschedule_event, ActionType.propose_new_time}:
                 added["start_min"] = int(a.new_start_min or added["start_min"])
                 added["end_min"] = int(a.new_end_min or added["end_min"])
-            # NOTE: propose_new_time does not actually schedule; don't add it to sim calendar.
-            sim_events = list(obs.get("calendar", [])) + ([] if a.action_type == ActionType.propose_new_time else [added])
-            return (len(detect_overlaps(sim_events)), len(travel_issues(sim_events, obs.get("travel_times", {}))))
         return (500, 500)
-    candidates = [a for a in valid if a.action_type in {ActionType.accept_event, ActionType.reschedule_event, ActionType.propose_new_time, ActionType.reject_event}]
     candidates.sort(key=score_action)
     return candidates[0] if candidates else valid[0]
 if __name__ == "__main__":
-    # Simple manual episode runner: `python env/lifeops_env.py`
     env = LifeOpsEnv(seed=7)
     obs = env.reset()
-    print("Scenario:", obs["scenario_id"])
-    print("Persona:", obs["persona"]["name"])
     done = False
     total_reward = 0.0
     while not done:
-        obs = env.observation()
-        req = obs.get("current_request")
         if req is not None:
             print(f"\nCurrent request: {req['title']} ({req['start_min']}..{req['end_min']}) @ {req['location']}")
         else:
@@ -344,13 +391,30 @@ if __name__ == "__main__":
         action = _choose_simple_action(env)
         next_obs, reward, done, info = env.step(action)
         total_reward += reward
-        print("Action:", action.to_dict())
-        print("Reward:", reward)
         if info.get("overlaps"):
-            print("Overlaps:", info["overlaps"])
         if info.get("travel_issues"):
-            print("Travel issues:", info["travel_issues"])
-    print("\nEpisode done. Total reward:", total_reward)

 try:
     # Normal usage (tests / `python -m ...`) expects repo root on sys.path.
     from env.actions import Action, ActionType, generate_valid_actions
+    from env.episode_trace import EpisodeTrace
     from env.personas import Persona, get_personas
     from env.reward import compute_reward, detect_overlaps, travel_issues
     from env.scenario_generator import Scenario, get_scenario, list_scenario_ids, sample_scenarios
     repo_root = Path(__file__).resolve().parent.parent
     sys.path.insert(0, str(repo_root))
     from env.actions import Action, ActionType, generate_valid_actions
+    from env.episode_trace import EpisodeTrace
     from env.personas import Persona, get_personas
     from env.reward import compute_reward, detect_overlaps, travel_issues
     from env.scenario_generator import Scenario, get_scenario, list_scenario_ids, sample_scenarios
     last_added_event: Optional[Dict[str, Any]] = None
     last_handled_request: Optional[Dict[str, Any]] = None
     last_task_progress_minutes: int = 0
+    last_task_id_progressed: Optional[str] = None
     def current_request(self) -> Optional[Dict[str, Any]]:
         return self.pending_requests[0] if self.pending_requests else None
         self._state.last_added_event = None
         self._state.last_handled_request = None
         self._state.last_task_progress_minutes = 0
+        self._state.last_task_id_progressed = None
         at = str(action_dict.get("action_type"))
         if at in {ActionType.accept_event.value, ActionType.reject_event.value, ActionType.reschedule_event.value, ActionType.propose_new_time.value}:
         info: Dict[str, Any] = {
             "reward_breakdown": breakdown,
             "overlaps": detect_overlaps(next_obs.get("calendar", [])),
+            "travel_issues": travel_issues(
+                next_obs.get("calendar", []),
+                next_obs.get("travel_times", {}),
+                start_location=next_obs.get("persona", {}).get("home_location"),
+            ),
+            "last_added_event": copy.deepcopy(self._state.last_added_event),
+            "last_handled_request": copy.deepcopy(self._state.last_handled_request),
+            "last_task_progress_minutes": int(self._state.last_task_progress_minutes),
+            "last_task_id_progressed": self._state.last_task_id_progressed,
         }
         return next_obs, float(reward), bool(done), info
             progress = min(duration, int(t["remaining_minutes"]))
             t["remaining_minutes"] = int(t["remaining_minutes"]) - progress
             self._state.last_task_progress_minutes = int(progress)
+            self._state.last_task_id_progressed = str(t.get("task_id", "?"))
         else:
             self._state.last_task_progress_minutes = 0
+            self._state.last_task_id_progressed = None
     def _is_done(self) -> bool:
         if self._state.step_count >= self._state.max_steps:
         return True
+def _focus_overlaps_calendar(a: Action, calendar: List[Dict[str, Any]]) -> bool:
+    """True if adding this focus block would overlap with existing calendar events."""
+    start = int(a.new_start_min or 0)
+    dur = int(a.duration_min or 0)
+    sim = list(calendar) + [
+        {"event_id": "_", "start_min": start, "end_min": start + dur, "location": "x"},
+    ]
+    return len(detect_overlaps(sim)) > 0
 def _choose_simple_action(env: LifeOpsEnv) -> Action:
     """
     Tiny heuristic policy for manual running:
+    - If accept would cause overlap/travel issues, try reschedule/propose, or reject.
     - Otherwise accept the request.
+    - If no request, block focus time (prefer non-overlapping slots).
     """
     valid = env.valid_actions()
     obs = env.observation()
     req = obs.get("current_request")
+    calendar = obs.get("calendar", [])
+    travel_times = obs.get("travel_times", {})
+    home = obs.get("persona", {}).get("home_location")
     if req is None:
         focus_actions = [a for a in valid if a.action_type == ActionType.block_focus_time]
         if not focus_actions:
             return valid[0]
+        # Prefer focus blocks that don't overlap with existing calendar.
+        non_overlapping = [a for a in focus_actions if not _focus_overlaps_calendar(a, calendar)]
+        candidates = non_overlapping if non_overlapping else focus_actions
         def focus_score(a: Action) -> Tuple[int, int]:
             start = int(a.new_start_min or 0)
             dur = int(a.duration_min or 0)
+            sim = list(calendar) + [
                 {
                     "event_id": "focus_sim",
                     "start_min": start,
                     "location": obs["persona"].get("primary_work_location", "Home"),
                 }
             ]
+            return (len(detect_overlaps(sim)), len(travel_issues(sim, travel_times, home)))
+        candidates.sort(key=focus_score)
+        return candidates[0]
     # Pick the request-handling action that minimizes feasibility violations.
+    # Reject scores (0, 0) so we prefer it when all scheduling options cause issues.
     def score_action(a: Action) -> Tuple[int, int]:
         # (overlap_count, travel_issue_count) — smaller is better
         if a.action_type == ActionType.reject_event:
+            return (0, 0)  # no new overlaps/travel; prefer when scheduling options are bad
         if a.action_type in {ActionType.accept_event, ActionType.reschedule_event, ActionType.propose_new_time}:
             added = dict(req)
             if a.action_type in {ActionType.reschedule_event, ActionType.propose_new_time}:
                 added["start_min"] = int(a.new_start_min or added["start_min"])
                 added["end_min"] = int(a.new_end_min or added["end_min"])
+            sim_events = list(calendar) + (
+                [] if a.action_type == ActionType.propose_new_time else [added]
+            )
+            return (len(detect_overlaps(sim_events)), len(travel_issues(sim_events, travel_times, home)))
         return (500, 500)
+    candidates = [
+        a
+        for a in valid
+        if a.action_type
+        in {ActionType.accept_event, ActionType.reschedule_event, ActionType.propose_new_time, ActionType.reject_event}
+    ]
     candidates.sort(key=score_action)
     return candidates[0] if candidates else valid[0]
 if __name__ == "__main__":
+    # Simple manual episode runner with tracing: `python env/lifeops_env.py`
     env = LifeOpsEnv(seed=7)
     obs = env.reset()
+    trace = EpisodeTrace(
+        scenario_id=obs["scenario_id"],
+        persona_name=obs["persona"]["name"],
+        initial_calendar=copy.deepcopy(obs.get("calendar", [])),
+        initial_tasks=copy.deepcopy(obs.get("tasks", [])),
+        initial_pending_count=obs.get("pending_request_count", 0),
+    )
     done = False
     total_reward = 0.0
+    step_num = 0
     while not done:
+        prev_obs = env.observation()
+        req = prev_obs.get("current_request")
         if req is not None:
             print(f"\nCurrent request: {req['title']} ({req['start_min']}..{req['end_min']}) @ {req['location']}")
         else:
         action = _choose_simple_action(env)
         next_obs, reward, done, info = env.step(action)
+        step_num += 1
         total_reward += reward
+        trace.log_step(
+            step=step_num,
+            action=action.to_dict(),
+            prev_obs=prev_obs,
+            next_obs=next_obs,
+            reward=reward,
+            breakdown=info.get("reward_breakdown", {}),
+            info=info,
+            done=done,
+            last_added_event=info.get("last_added_event"),
+            last_handled_request=info.get("last_handled_request"),
+            last_task_progress_minutes=info.get("last_task_progress_minutes", 0),
+            task_id_progressed=info.get("last_task_id_progressed"),
+        )
+        print(f"  → Action: {trace._format_action(action.to_dict())}  |  Reward: {reward:+.2f}")
         if info.get("overlaps"):
+            print(f"  ⚠ Overlaps: {info['overlaps']}")
         if info.get("travel_issues"):
+            print(f"  ⚠ Travel issues: {info['travel_issues']}")
+    trace.total_reward = total_reward
+    trace.print_full(final_calendar=next_obs.get("calendar", []))
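
Note on the ranking used in `_choose_simple_action`: each candidate is scored as a tuple `(overlap_count, travel_issue_count)` and Python compares tuples lexicographically, so overlaps dominate and travel issues only break ties. A minimal, self-contained sketch of that ordering is given here. `count_overlaps` and the toy event dicts are simplified stand-ins for illustration and are not the repository's `detect_overlaps` or `travel_issues`; the travel issue count is passed in directly rather than computed.

```python
from typing import Dict, List, Tuple

Event = Dict[str, int]


def count_overlaps(events: List[Event]) -> int:
    """Count pairs of events whose [start_min, end_min) intervals intersect."""
    n = 0
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if a["start_min"] < b["end_min"] and b["start_min"] < a["end_min"]:
                n += 1
    return n


def score(calendar: List[Event], candidate: Event, travel_issue_count: int) -> Tuple[int, int]:
    """Return (overlap_count, travel_issue_count); smaller is better."""
    return (count_overlaps(calendar + [candidate]), travel_issue_count)


calendar = [{"start_min": 540, "end_min": 600}]  # one existing event, 09:00-10:00
candidates = {
    "clash": ({"start_min": 570, "end_min": 630}, 0),     # overlaps, no travel issue
    "far_away": ({"start_min": 600, "end_min": 660}, 1),  # no overlap, one travel issue
    "clean": ({"start_min": 660, "end_min": 720}, 0),     # no overlap, no travel issue
}

# Tuples sort lexicographically: the first element decides, the second breaks ties.
ranked = sorted(candidates, key=lambda name: score(calendar, *candidates[name]))
print(ranked)  # ['clean', 'far_away', 'clash']
```

Because `sorted` is stable, candidates with equal scores keep their original relative order, which is why the position of a `(0, 0)` reject action in the valid-action list matters when a scheduling option also scores `(0, 0)`.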

env/reward.py CHANGED

@@ -10,7 +10,7 @@ The reward is intentionally small and readable. It's "shaped" to encourage:
 from __future__ import annotations
-from typing import Any, Dict, List, Tuple
 def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
     return a_start < b_end and b_start < a_end
@@ -46,10 +46,14 @@ def _travel_time_minutes(travel_times: Dict[str, Dict[str, int]], a_loc: str, b_
 def travel_issues(
     events: List[Dict[str, Any]],
     travel_times: Dict[str, Dict[str, int]],
 ) -> List[Tuple[str, str, int, int]]:
     """
     Returns travel feasibility issues between consecutive events.
     Output tuple: (from_event_id, to_event_id, needed_minutes, available_minutes)
     """
@@ -58,6 +62,15 @@ def travel_issues(
     ordered = sorted(events, key=lambda e: (int(e["start_min"]), int(e["end_min"])))
     issues: List[Tuple[str, str, int, int]] = []
     for prev, nxt in zip(ordered, ordered[1:]):
         prev_end = int(prev["end_min"])
         nxt_start = int(nxt["start_min"])
@@ -96,11 +109,12 @@ def compute_reward(
     if next_overlaps:
         reward -= 5.0 * len(next_overlaps)
         breakdown["overlap_penalty"] = -5.0 * len(next_overlaps)
-    if prev_overlaps and len(next_overlaps) < len(prev_overlaps):
-        reward += 3.0
-        breakdown["conflict_resolved_bonus"] = 3.0
-    issues = travel_issues(next_events, travel_times)
     if issues:
         # Penalize per infeasible leg.
         travel_pen = -4.0 * len(issues) * float(persona.get("travel_aversion_weight", 1.0))

 from __future__ import annotations
+from typing import Any, Dict, List, Optional, Tuple
 def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
     return a_start < b_end and b_start < a_end
 def travel_issues(
     events: List[Dict[str, Any]],
     travel_times: Dict[str, Dict[str, int]],
+    start_location: Optional[str] = None,
 ) -> List[Tuple[str, str, int, int]]:
     """
     Returns travel feasibility issues between consecutive events.
+    If start_location is provided (e.g. persona home), also checks whether the
+    user can reach the first event of the day in time.
     Output tuple: (from_event_id, to_event_id, needed_minutes, available_minutes)
     """
     ordered = sorted(events, key=lambda e: (int(e["start_min"]), int(e["end_min"])))
     issues: List[Tuple[str, str, int, int]] = []
+    # Check travel from start_location to first event (if provided).
+    if start_location is not None:
+        first = ordered[0]
+        available = int(first["start_min"])
+        needed = _travel_time_minutes(travel_times, start_location, str(first["location"]))
+        if needed > available:
+            issues.append(("__start__", str(first["event_id"]), needed, available))
     for prev, nxt in zip(ordered, ordered[1:]):
         prev_end = int(prev["end_min"])
         nxt_start = int(nxt["start_min"])
     if next_overlaps:
         reward -= 5.0 * len(next_overlaps)
         breakdown["overlap_penalty"] = -5.0 * len(next_overlaps)
+    issues = travel_issues(
+        next_events,
+        travel_times,
+        start_location=persona.get("home_location"),
+    )
     if issues:
         # Penalize per infeasible leg.
         travel_pen = -4.0 * len(issues) * float(persona.get("travel_aversion_weight", 1.0))
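
Note on the new `start_location` check: it compares the travel time from the start location to the earliest event against that event's `start_min`, so it implicitly treats minute 0 as the moment the user can leave. The sketch below isolates that first-leg check so it can be run on its own. It is an illustration under assumptions, not the repository code: `travel_minutes` is a hypothetical stand-in for `_travel_time_minutes` (whose body is not shown in this diff), `first_leg_issues` is a made-up name, and the early return on an empty event list is this sketch's own choice.

```python
from typing import Any, Dict, List, Optional, Tuple

TravelTimes = Dict[str, Dict[str, int]]
Issue = Tuple[str, str, int, int]


def travel_minutes(travel_times: TravelTimes, a_loc: str, b_loc: str) -> int:
    """Assumed lookup: 0 minutes for the same place or an unknown pair."""
    if a_loc == b_loc:
        return 0
    return int(travel_times.get(a_loc, {}).get(b_loc, 0))


def first_leg_issues(
    events: List[Dict[str, Any]],
    travel_times: TravelTimes,
    start_location: Optional[str] = None,
) -> List[Issue]:
    """Check only the leg from start_location to the earliest event."""
    ordered = sorted(events, key=lambda e: (int(e["start_min"]), int(e["end_min"])))
    if start_location is None or not ordered:
        return []  # nothing to check without a start location or any events
    first = ordered[0]
    available = int(first["start_min"])  # minutes elapsed since minute 0
    needed = travel_minutes(travel_times, start_location, str(first["location"]))
    if needed > available:
        return [("__start__", str(first["event_id"]), needed, available)]
    return []


travel_times = {"Home": {"Office": 30}}
early = [{"event_id": "e1", "start_min": 20, "end_min": 80, "location": "Office"}]
late = [{"event_id": "e2", "start_min": 540, "end_min": 600, "location": "Office"}]

early_issues = first_leg_issues(early, travel_times, "Home")
late_issues = first_leg_issues(late, travel_times, "Home")
empty_issues = first_leg_issues([], travel_times, "Home")
print(early_issues)  # [('__start__', 'e1', 30, 20)]
print(late_issues)   # []
print(empty_issues)  # []
```

If `start_min` counts minutes from midnight, as the `HH:MM` formatting in `training/train_rl.py` suggests, this check only fires for events scheduled very early in the day relative to the travel time.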

training/__init__.py ADDED

	@@ -0,0 +1 @@


+"""Training utilities for LifeOps RL."""

training/train_rl.py ADDED

	@@ -0,0 +1,217 @@

+#!/usr/bin/env python3
+"""
+Minimal RL training loop for LifeOps.
+Runs episodes in the environment, collects trajectories, and prints results.
+Uses a simple policy (random or heuristic). No external RL frameworks required.
+For learned policies, consider adding HuggingFace TRL or a small PyTorch policy.
+"""
+from __future__ import annotations
+import random
+import sys
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+# Add repo root for imports
+repo_root = Path(__file__).resolve().parent.parent
+if str(repo_root) not in sys.path:
+    sys.path.insert(0, str(repo_root))
+from env.actions import Action
+from env.lifeops_env import LifeOpsEnv, _choose_simple_action
+def random_policy(env: LifeOpsEnv) -> Action:
+    """Pick uniformly from valid actions."""
+    valid = env.valid_actions()
+    if not valid:
+        raise RuntimeError("No valid actions")
+    return random.choice(valid)
+def collect_trajectory(
+    env: LifeOpsEnv,
+    policy: str = "random",
+    scenario_id: Optional[str] = None,
+) -> Tuple[List[Dict[str, Any]], float, int, str]:
+    """
+    Run one episode and collect trajectory.
+    Returns:
+        trajectory: list of per-step dicts with keys "obs", "action", "reward", "next_obs", "done", "info"
+        total_reward: sum of rewards
+        episode_length: number of steps
+        scenario_id: scenario used
+    """
+    obs = env.reset(scenario_id=scenario_id)
+    scenario_id = obs["scenario_id"]
+    trajectory: List[Dict[str, Any]] = []
+    total_reward = 0.0
+    step_count = 0
+    policy_fn = _choose_simple_action if policy == "heuristic" else random_policy
+    done = False
+    while not done:
+        action = policy_fn(env)
+        action_dict = action.to_dict()
+        next_obs, reward, done, info = env.step(action)
+        step_count += 1
+        total_reward += reward
+        trajectory.append({
+            "obs": obs,
+            "action": action_dict,
+            "reward": reward,
+            "next_obs": next_obs,
+            "done": done,
+            "info": info,
+        })
+        obs = next_obs
+    return trajectory, total_reward, step_count, scenario_id
+def _format_action_short(a: Dict[str, Any]) -> str:
+    """Format action for key decisions summary."""
+    at = a.get("action_type", "?")
+    if at == "block_focus_time":
+        start = a.get("new_start_min", 0)
+        dur = a.get("duration_min", 0)
+        h, m = divmod(start or 0, 60)
+        return f"block_focus @ {h:02d}:{m:02d} ({dur}min)"
+    if at == "accept_event":
+        return f"accept request {a.get('request_id', '?')}"
+    if at == "reject_event":
+        return f"reject request {a.get('request_id', '?')}"
+    if at == "reschedule_event":
+        ns = a.get("new_start_min", 0)
+        h, m = divmod(ns or 0, 60)
+        return f"reschedule → {h:02d}:{m:02d}"
+    if at == "propose_new_time":
+        ns = a.get("new_start_min", 0)
+        h, m = divmod(ns or 0, 60)
+        return f"propose → {h:02d}:{m:02d}"
+    return at
+def print_episode_results(
+    episode: int,
+    total_reward: float,
+    episode_length: int,
+    scenario_id: str,
+    trajectory: List[Dict[str, Any]],
+    verbose: bool = False,
+) -> None:
+    """Print human-readable episode results."""
+    print(f"\n--- Episode {episode} ---")
+    print(f"  Scenario:      {scenario_id}")
+    print(f"  Steps:         {episode_length}")
+    print(f"  Total reward:  {total_reward:+.2f}")
+    # Key decisions taken
+    if trajectory:
+        decisions = [_format_action_short(t["action"]) for t in trajectory]
+        print(f"  Key decisions: {', '.join(decisions)}")
+    if verbose:
+        for i, t in enumerate(trajectory):
+            a = t["action"]
+            at = a.get("action_type", "?")
+            r = t["reward"]
+            print(f"    Step {i + 1}: {at}  reward={r:+.2f}")
+def train(
+    num_episodes: int = 20,
+    seed: Optional[int] = 42,
+    policy: str = "random",
+    scenario_id: Optional[str] = None,
+    verbose: bool = False,
+) -> Dict[str, Any]:
+    """
+    Run RL training loop: collect trajectories and print results.
+    Args:
+        num_episodes: number of episodes to run
+        seed: random seed for env (None = random)
+        policy: "random" or "heuristic"
+        scenario_id: fix scenario (None = random each episode)
+        verbose: print per-step details
+    Returns:
+        Summary dict with episode rewards and stats
+    """
+    env = LifeOpsEnv(seed=seed)
+    all_rewards: List[float] = []
+    all_lengths: List[int] = []
+    all_scenarios: List[str] = []
+    print("=" * 50)
+    print("LifeOps RL Training")
+    print("=" * 50)
+    print(f"Episodes: {num_episodes}  |  Policy: {policy}  |  Seed: {seed}")
+    for ep in range(1, num_episodes + 1):
+        trajectory, total_reward, ep_len, scenario_id_used = collect_trajectory(
+            env, policy=policy, scenario_id=scenario_id
+        )
+        all_rewards.append(total_reward)
+        all_lengths.append(ep_len)
+        all_scenarios.append(scenario_id_used)
+        print_episode_results(
+            episode=ep,
+            total_reward=total_reward,
+            episode_length=ep_len,
+            scenario_id=scenario_id_used,
+            trajectory=trajectory,
+            verbose=verbose,
+        )
+    # Summary
+    avg_reward = sum(all_rewards) / len(all_rewards)
+    avg_len = sum(all_lengths) / len(all_lengths)
+    print("\n" + "=" * 50)
+    print("Training Summary")
+    print("=" * 50)
+    print(f"  Episodes:     {num_episodes}")
+    print(f"  Avg reward:   {avg_reward:+.2f}")
+    print(f"  Avg length:   {avg_len:.1f} steps")
+    print(f"  Best reward:  {max(all_rewards):+.2f}")
+    print(f"  Worst reward: {min(all_rewards):+.2f}")
+    print("=" * 50)
+    return {
+        "rewards": all_rewards,
+        "lengths": all_lengths,
+        "scenarios": all_scenarios,
+        "avg_reward": avg_reward,
+        "avg_length": avg_len,
+    }
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser(description="Train RL agent on LifeOps")
+    parser.add_argument("-n", "--episodes", type=int, default=10, help="Number of episodes")
+    parser.add_argument("-p", "--policy", choices=["random", "heuristic"], default="random")
+    parser.add_argument("-s", "--seed", type=int, default=42)
+    parser.add_argument("--scenario", type=str, default=None, help="Fix scenario (e.g. s1_basic_conflict)")
+    parser.add_argument("-v", "--verbose", action="store_true")
+    args = parser.parse_args()
+    train(
+        num_episodes=args.episodes,
+        seed=args.seed,
+        policy=args.policy,
+        scenario_id=args.scenario,
+        verbose=args.verbose,
+    )
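
Note on reproducibility: `random_policy` draws from Python's module-level `random` generator, while `train()` passes `seed` to `LifeOpsEnv` and does not call `random.seed` itself. Whether `LifeOpsEnv` seeds the global generator internally is not visible in this diff. If reproducible random rollouts are wanted, one option is a policy that owns its generator. The sketch below is hypothetical: `make_random_policy` and `StubEnv` are not part of the repository, and `StubEnv` only mimics the `valid_actions()` method with plain strings in place of `Action` objects.

```python
import random
from typing import Callable, List, Optional


class StubEnv:
    """Stand-in exposing only valid_actions(); not the real LifeOpsEnv."""

    def valid_actions(self) -> List[str]:
        return ["accept_event", "reject_event", "reschedule_event"]


def make_random_policy(seed: Optional[int]) -> Callable[[StubEnv], str]:
    """Return a policy that samples uniformly using its own generator."""
    rng = random.Random(seed)  # private generator, independent of global state

    def policy(env: StubEnv) -> str:
        valid = env.valid_actions()
        if not valid:
            raise RuntimeError("No valid actions")
        return rng.choice(valid)

    return policy


env = StubEnv()
policy_a = make_random_policy(42)
policy_b = make_random_policy(42)
picks_a = [policy_a(env) for _ in range(5)]
picks_b = [policy_b(env) for _ in range(5)]
print(picks_a == picks_b)  # True: same seed, same sequence of choices
```

A closure keeps the call signature `policy(env)` identical to `random_policy`, so it could be swapped in for `policy_fn` in `collect_trajectory` without changing the loop.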