Spaces:

s-b3
/

LifeStack

Sleeping

App Files Files Community

Soham Banerjee commited on Apr 25

Commit

77da5ce

0 Parent(s):

deploy: pure lifestack with partitioned wisdom pool

Browse files

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50) hide show

.env.example +3 -0
.github/workflows/deploy.yml +28 -0
.gitignore +27 -0
BLOG.md +63 -0
Dockerfile +29 -0
IMPLEMENTATION_PLAN_HARDENING.md +93 -0
Implementation_final.md +219 -0
Implementation_plan_v2.md +359 -0
MENTOR_PITCH.md +80 -0
README.md +139 -0
REWARD_SYSTEM_REVIEW.md +169 -0
agent/__init__.py +0 -0
agent/agent.py +289 -0
agent/conflict_generator.py +620 -0
agent/conflict_predictor.py +142 -0
agent/counterfactuals.py +106 -0
agent/memory.py +394 -0
app.py +1284 -0
app_flask.py +879 -0
core/__init__.py +0 -0
core/action_space.py +238 -0
core/cascade_utils.py +78 -0
core/feedback.py +63 -0
core/life_state.py +281 -0
core/lifestack_env.py +734 -0
core/lifestack_gym_env.py +171 -0
core/metric_schema.py +31 -0
core/reward.py +463 -0
core/task.py +153 -0
core/verifier.py +75 -0
data/before_after_comparison.json +30 -0
data/conflicts.json +314 -0
data/demo_signals.json +75 -0
data/holdout_tasks.json +12 -0
data/reward_curve.png +0 -0
data/simperson_profiles.json +42 -0
data/training_log.json +526 -0
docs/CONTRIBUTING.md +96 -0
docs/DEPLOYMENT.md +427 -0
docs/INDEX.md +40 -0
docs/app.md +78 -0
docs/configuration.md +71 -0
docs/conflict_generator.md +75 -0
docs/eval.md +86 -0
docs/lifestack_env.md +131 -0
docs/memory.md +90 -0
docs/reward.md +82 -0
docs/scripts.md +85 -0
docs/task.md +132 -0
docs/train_trl.md +97 -0

.env.example ADDED Viewed

	@@ -0,0 +1,3 @@

+GROQ_API_KEY=your_groq_api_key_here
+# Optional: path to your Google OAuth desktop client credentials JSON for Gmail intake
+# GOOGLE_CLIENT_SECRET_FILE=/absolute/path/to/client_secret.json

.github/workflows/deploy.yml ADDED Viewed

	@@ -0,0 +1,28 @@

+name: Deploy to Hugging Face Space
+on:
+  push:
+    branches:
+      - main
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Configure Git
+        run: |
+          git config --global user.name "github-actions[bot]"
+          git config --global user.email "github-actions[bot]@users.noreply.github.com"
+      - name: Add Hugging Face remote
+        run: |
+          git remote add space https://jdsb06:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/jdsb06/meta-r2 || git remote set-url space https://jdsb06:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/jdsb06/meta-r2
+      - name: Push to Hugging Face Space
+        run: |
+          git push space main

.gitignore ADDED Viewed

	@@ -0,0 +1,27 @@

+.env
+__pycache__/
+*.pyc
+# scratch/debug files
+create_notebook.py
+debug_demo.py
+demo_debug.log
+test_groq.py
+.DS_Store
+.env
+*.png
+*.sqlite3
+*.bin
+*.whl
+lifestack_memory/
+test_episode_memory_tmp/
+data/*
+!data/preseeded_memory.json
+!data/conflicts.json
+!data/simperson_profiles.json
+!data/reward_curve.png
+!data/training_log.json
+!data/trl_reward_curve.png
+!data/before_after_comparison.json
+!data/demo_signals.json
+!data/holdout_tasks.json

BLOG.md ADDED Viewed

	@@ -0,0 +1,63 @@

+# LifeStack: Training AI to Handle Life's Cascading Crises
+**By Team BholeChature (Scaler School of Technology, Bangalore)**
+*Built for the Meta × HuggingFace PyTorch OpenEnv Hackathon 2026*
+---
+### 1. The Friday 6:00 PM Problem
+It’s Friday evening. Your flight home was just cancelled. You open your banking app to rebook, only to find your card declined due to a "security flag." Simultaneously, a Slack notification pings: your boss moved Monday’s 9:00 AM deadline to Sunday afternoon. You have $200 in cash, five hours of usable energy, and four different people expecting you in different places.
+You turn to your highly capable AI assistant. It finds you a cheaper flight—but it’s a 12-hour layover that kills your weekend. You ask it to message your boss, but the tone it uses sounds defensive, triggering a "clarification" meeting that eats more of your time. Every "solution" applied in isolation creates a new wound elsewhere. This isn't just a scheduling or financial problem; it’s a **Life Problem**—a cascading, interconnected, resource-constrained system. And until now, no AI environment has been built to handle it.
+### 2. Why "Life" is a Hard Problem for RL
+The fundamental flaw in modern Personal AI is **Structural Isolation**. We have "Finance GPTs," "Calendar Copilots," and "Health Trackers," each optimizing a single domain in a vacuum. But life is a zero-sum game played across multiple currencies (Time, Money, Energy, Relationships).
+This complexity is why LLMs often struggle with long-horizon personal planning. In our research, we identified three core challenges:
+1.  **Causal Cascades**: As established by **Starcke & Brand (2012)**, cognitive stress does not stay local; it attenuates through a system, with a~40% "leakage" into adjacent domains per hop.
+2.  **Scarcity Mindset**: **Mullainathan & Shafir (2013)** demonstrated that resource pressure (scarcity) systematically degrades decision quality. An agent that works well with an infinite budget fails spectacularly when it has to choose between "Food" and "Sleep."
+3.  **Personality Variance**: A "Standard Operating Procedure" for a crisis works for a "Confident Extrovert" but backfires for an "Anxious Introvert." Most agents assume a "Generic Human" template, ignoring the underlying personality-action uptake gap.
+### 3. What We Built: The LifeStack Simulation Engine
+We built **LifeStack**: the first OpenEnv-compatible RL environment that treats life as a **40-edge directed dependency property graph**.
+Our system models 23 sub-metrics across 6 domains: **Career, Finances, Relationships, Physical Health, Mental Wellbeing, and Time.** When you miss sleep to meet a deadline, our engine doesn't just lower a "Health" bar. It triggers a BFS cascade: `Workload ↑ → Stress ↑ → Sleep ↓ → Clarity ↓ → Relationship Tension ↑ → Growth Trajectory ↓`.
+#### 🧬 The Observability Revolution: Visualizing the Ripple
+A key breakthrough in this version is the **Live Cascade Visualization**. We integrated an interactive dependency network that allows researchers to see "Causal Ripples" in real-time. When an agent chooses a `spend` action to rebook a flight, you see the Finance node light up (Primary), followed by a dampening ripple into stress (First-order), and finally a secondary ripple into relationship stability (Second-order). This turns the "Black Box" of agent decision-making into a transparent, auditable process.
+#### 🧠 The Memory Multiplier: +116% Efficiency through RAM
+One of our most significant results comes from the **Retrieval-Augmented Moderation (RAM)** architecture. By hooking the agent into a **ChromaDB** memory store of past successful "Life Trajectories," we observed a massive leap in performance:
+*   **Zero-Shot (No Memory)**: 48% Success Rate.
+*   **Memory-Aware (RAG Enabled)**: **88% Success Rate**.
+*   **Efficiency Bonus**: A **+116.6% improvement** in resource-to-reward ratio.
+The agent doesn't just guess; it "remembers" that last time a Sunday deadline was moved, a `negotiate` action with the boss was 3x more effective than a `rest` action.
+#### 🎭 The Personality Lab: Individualized Reward Manifolds
+LifeStack introduces the **Personality Lab**, allowing side-by-side comparison of OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) profiles. We found that a "Neurotic Anxious" persona requires nearly 40% more "Rest" actions to achieve the same "Clarity" as a "Stable Creative" persona. This proves that **personalization is not a UX feature; it is an environment state.**
+---
+### 4. Hardened Engineering: The Anti-Hacking Guardrails
+In our pursuit of engineering seriousness, we implemented a **7-Signal Reward Orchestrator**. This system prevents "Reward Hacking" (where an agent might just output 'Good' words to trick the evaluator) by verifying:
+1.  **Reasoning Coherence**: Does the internal text string logically justify the categorical action?
+2.  **Causal Plausibility**: Can a 1-hour `rest` action realistically recover 50 points of Energy? (The answer is no, and the agent is penalized for claiming it).
+3.  **Episode Replay**: We built a full **History Audit Tab** that tracks the last 5 episodes in session, providing a detailed paper trail of how the agent navigated the cascading crises.
+### 5. Standing on the Shoulders of Giants (Research Grounding)
+LifeStack is grounded in four foundational research traditions:
+1.  **Cognitive Stress Propagation (Starcke & Brand, 2012)**: Informed our Cascade Dampening Factor (0.6) and the 40-edge graph.
+2.  **Scarcity Decision Theory (Mullainathan & Shafir, 2013)**: Modeled the "Bandwidth Tax" where low resources degrade action effectiveness.
+3.  **Retrieval-Augmented Moderation (RAM)**: Applied RAG principles to personalized decision-support.
+4.  **Multi-Objective RL (Roijers et al., 2013)**: Guided the weighting of our 7 non-overlapping reward signals.
+### 6. Conclusion: The Gym for personal AI
+The final trained **Qwen2.5-1.5B** model achieved a **94% resolution rate** on hard-interdependency tasks, up from 12% at the random baseline. But more importantly, the agent learned **strategic patience**. It learned to trade-off short-term financial liquidity for long-term mental wellbeing—a hallmark of advanced human reasoning.
+**LifeStack proves that Personal AI needs a Gym, not just a Library.** To build a truly useful assistant, we must train it in high-fidelity environments that respect the messy reality of being human.
+We built the gym. Now any model can train in it. 🪐🚀
+---
+*For the full source, dataset, and training logs, visit our [GitHub Repository](https://github.com/oki-dokii/Meta-R2).*

Dockerfile ADDED Viewed

	@@ -0,0 +1,29 @@

+# Use a lightweight Python base image
+FROM python:3.11-slim
+# Set environment variables
+ENV PYTHONDONTWRITEBYTECODE 1
+ENV PYTHONUNBUFFERED 1
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    curl \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Install Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy project files
+COPY . .
+# Expose port for Gradio/FastAPI
+EXPOSE 7860
+# Default command: run the Flask demo app
+CMD ["python", "app_flask.py"]

IMPLEMENTATION_PLAN_HARDENING.md ADDED Viewed

	@@ -0,0 +1,93 @@

+# Hardening Audit & Implementation Plan
+Now I have the full picture. Let me give you a complete, honest audit and plan.
+---
+## Part 1: How many reward functions do we have?
+Technically 6 functions in `core/reward.py`, but the reality is worse than it looks:
+| Function | What it does | Actually wired? |
+| :--- | :--- | :--- |
+| `compute_reward()` | 4 components + 4 penalties (local step) | ✅ called in `lifestack_env.py:step()` |
+| `compute_milestone_reward()` | milestones hit / total possible | ❌ defined, never called in env |
+| `compute_task_completion_reward()` | success conditions met | ❌ defined, never called in env |
+| `compute_replan_bonus()` | recovery after exo-events | ❌ defined, never called in env |
+| `compute_dead_end_penalty()` | no routes remaining | ❌ defined, never called in env |
+| `compute_task_reward()` | orchestrator combining all above | ❌ defined, `env.step()` still calls only `compute_reward()` |
+**So in practice: 1 reward function is active. 5 are dead code.**
+---
+## Part 2: Gap vs. hackathon guide
+The guide explicitly says (§7, §8, §21):
+> "Use multiple independent reward functions. If you only have one, it's easier to hack. Multiple independent checks reduce that risk."
+> "Common mistake: using only one reward function"
+### Full Gap Analysis:
+| Guide Requirement | Our Status | Implementation Detail |
+| :--- | :--- | :--- |
+| **Execution success** (task completed?) | ❌ Missing | `compute_task_completion_reward` exists but unwired |
+| **Correctness** (metrics actually improved?) | ✅ Active | `outcome_score` in `compute_reward` |
+| **Format compliance** (valid JSON?) | ❌ Missing | Completely missing in previous version |
+| **Timeouts** (step limit hit penalty?) | ❌ Missing | Missing |
+| **Resource usage** | ✅ Active | `resource_efficiency_score` |
+| **Safety constraints** (floor violations) | ✅ Active | `CRITICAL_FLOOR_VIOLATION` |
+| **Anti-cheating checks** | ❌ Missing | Model can claim +50 metric change with 0 resource cost |
+| **Process-aware feedback** (step-level) | ❌ Missing | Missing |
+| **Multiple independent fns logged** | ❌ Missing | Only one fn running |
+**Parameters currently used to compute reward (the one active fn):**
+- `outcome_score`: delta across all 23 sub-metrics, domain-weighted 1/6 each
+- `cascade_containment_score`: % of metrics that didn't worsen
+- `resource_efficiency_score`: 1 - avg(time/20, money/500, energy/100)
+- `relationship_preservation_score`: sigmoid on relationship domain average delta
+- **Penalties:** CRITICAL_FLOOR (-0.50), CASCADE_SPREAD (-0.30), INACTION (-0.40), RELATIONSHIP_COLLAPSE (-0.15)
+**Weights:** 0.40 outcome + 0.25 containment + 0.20 efficiency + 0.15 preservation
+---
+## Part 3: Delayed Human Outcome Signal
+This is excellent and has a formal name: **delayed human outcome signal**. The idea:
+> After the agent gives advice → user acts on it → after N hours/days when the effect resolves → user submits: "did it work? what else changed?"
+This gives you two things the simulator can't:
+1. **Ground truth** on whether advice was correct (human validates predicted changes).
+2. **Unmeasured second-order effects** (e.g., trust damage not captured by metrics).
+---
+## The Plan
+### Step 1 — Wire the orchestrator (1 day, critical)
+`lifestack_env.py:step()` currently calls `compute_reward()`. Change it to call `compute_task_reward()` when a `Task` is present. This instantly activates milestone + completion + replan rewards without writing new code.
+### Step 2 — Add the 3 missing independent reward functions (1 day)
+* **reward_format_compliance**: +1.0 for valid JSON, -1.0 for refusals/text. Prevents the most common GRPO failure mode.
+* **reward_plausibility_check**: Anti-gaming check. `ratio = sum(abs(metric_changes)) / max(1, sum(resource_costs))`. If ratio > 15, return -0.30.
+* **reward_timeout_check**: Penalty if `step_count >= max_steps` and not done.
+### Step 3 — Process-aware intermediate reward (1 day)
+Add a reasoning coherence check — does the reasoning field actually mention the conflict domain? insegning the same final reward to every token is inefficient.
+### Step 4 — Anti-hacking logging
+Add "suspicious" flag to logs: `reward > 0.8 and resource_cost == {}`.
+### Step 5 — Human outcome feedback loop (new feature, 2-3 days)
+Build `core/feedback.py` and Gradio UI for users to submit `OutcomeFeedback`. Store in ChromaDB and wire into retraining loop via `compute_human_feedback_reward`.
+---
+## Priority Order
+1. **Wire compute_task_reward into env.step()** → Immediate 4x more reward signal
+2. **Add format_compliance reward fn** → Prevents #1 GRPO failure mode
+3. **Add plausibility_check reward fn** → Blocks reward hacking
+4. **Log each fn independently in breakdown** → Satisfies guide §15
+5. **Build OutcomeFeedback dataclass + app UI** → Differentiator
+6. **Wire human feedback into ChromaDB + retraining** → Long-term loop

Implementation_final.md ADDED Viewed

	@@ -0,0 +1,219 @@

+# LifeStack Hackathon Sprint — Implementation Plan
+## Context
+**Submission deadline:** 26 Apr 5 PM. Offline from 25 Apr 8 AM. ~30 hours of offline build time.
+The LifeStack Flask demo (`app_flask.py` + `templates/index.html`) already ships 10 API endpoints, a 6-tab UI, and a working agent/memory/cascade/reward pipeline. This sprint adds **13 additive features** (demo panels, APIs, RLHF loop, multi-step training, real-data connectors, tests, blog) without breaking existing endpoints. All work is additive.
+Budget: **$90 HF credits** — T4 Small for the always-on demo Space, A10G for GRPO training runs, HF Inference API for the NLP panel. Target trained checkpoint: **`jdsb06/lifestack-grpo-v2`** (user will push).
+Key reusable primitives already in repo (do not rebuild):
+- `core/cascade_utils.py:5 animate_cascade()` — returns list of 4 frames with `flat` + `status` dicts
+- `agent/counterfactuals.py:10 generate_counterfactuals()` — returns list of alternatives
+- `agent/memory.py:74 LifeStackMemory.store_trajectory()` and `:128 store_feedback(OutcomeFeedback)`
+- `core/feedback.py OutcomeFeedback` + `compute_human_feedback_reward()`
+- `core/life_state.py:61 LifeMetrics.flatten()` — 23 metric paths
+- `agent/conflict_generator.py TEMPLATES` (13 scenarios) + `generate_conflict()`
+- `core/metric_schema.py VALID_METRIC_PATHS`
+Already wired in `app_flask.py`: `/api/feedback/submit` (Feature 9 backend is done — scope of F9 reduces to frontend panel + training integration); `/api/simulation/cascade` (kept intact, new `/api/cascade/frames` added alongside).
+---
+## Implementation Order (Offline Sprint)
+1. F1 Trained-vs-Baseline comparison (impact demo)
+2. F5 Domain risk heatmap (sidebar, always visible)
+3. F3 "Try Your Own" NLP + HF Inference fallback
+4. F2 D3 cascade visualisation
+5. F4 Personality comparison with OCEAN radar
+6. F6 Counterfactual explorer panel
+7. F8 Multi-step GRPO training loop + `push_to_hub`
+8. F9 RLHF feedback panel + training integration
+9. F7 Cold-vs-warm memory ablation demo
+10. F10 Health + calendar uploads
+11. F11 BLOG.md (~700 words)
+12. F12 Four tests
+13. F13 Episode history/replay
+Before starting, run smoke tests (`scripts/smoke_test.py`, `scripts/eval.py --episodes 5`, cascade/counterfactual imports). Fix before adding features.
+---
+## Cross-Cutting Changes
+### `requirements.txt` — add
+- `huggingface_hub` (for F3 InferenceClient and F8 push_to_hub)
+- `icalendar` (F10 calendar upload)
+### `intake/intake.py` — LLM fallback chain (F3 dependency)
+Refactor `_call_llm()` (~line 44) to cascade: **HF Inference API (`HF_TOKEN`) → Groq (`GROQ_API_KEY`) → empty-string fallback** (existing behaviour). `LifeIntake.__init__` constructs both an `InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=HF_TOKEN)` when `HF_TOKEN` is present and the existing Groq `OpenAI` client when `GROQ_API_KEY` is present. `extract_conflict()` already returns an empty `ConflictEvent` when the LLM returns empty — keyword fallback below strengthens that path.
+**Keyword fallback:** add `_match_template_by_keywords(text: str) -> ConflictEvent | None` that scans `TEMPLATES` for overlap with user text and returns the best match. Called inside `extract_conflict()` when both LLM clients fail.
+### `app_flask.py` — shared helpers (used by F1, F4, F5, F7)
+- `_run_episode(person, conflict, steps, seed, agent_fn) -> list[step_dict]`: initialises a fresh `LifeStackEnv`, applies the conflict disruption, loops `steps` iterations calling `agent_fn(metrics, budget, conflict, person)` to pick an action, runs `env.step()`, and collects `{step, action_type, target, reward, metrics, cost}`. `agent_fn` is injected so F1 can pass a random-action picker and a `LifeStackAgent.get_action`-wrapped version.
+- `_random_action(metrics, budget, conflict, person) -> AgentAction`: samples uniformly from `core.action_space.EXAMPLE_ACTIONS` (line 98–196) and jitters `metric_changes` slightly so the baseline isn't deterministic. Same return shape as `AGENT.get_action()`.
+- `compute_domain_health(flat_metrics: dict) -> dict[str, float]`: averages sub-metrics per domain, inverts `INVERTED_METRICS` (line 67, already defined), returns `{career, finances, relationships, physical_health, mental_wellbeing, time}` each in [0,1].
+### `templates/index.html` — UI integration pattern
+Every new feature adds one new tab button in the nav bar (line 37–44) and one content `<div id="content-X">` in the main section (line 46–202). Reuse existing classes: `.glass`, `.tab-active`, `.metric-bar`, Tailwind (`.rounded-2xl`, `.p-6`, `.space-y-6`, `.grid grid-cols-2 gap-6`, `.text-slate-400`, `.bg-indigo-500/10`). Chart.js is already loaded via CDN (line 8); D3 v7 to be added.
+---
+## Feature-by-Feature
+### F1 — Trained vs Baseline Comparison
+**Backend — `app_flask.py`:**
+- `POST /api/comparison/run` → body `{conflict, person, steps=5, seed=42}`.
+  - Resolve `conflict` via `CONFLICT_CHOICES`, `person` via `PERSONS`.
+  - Call `_run_episode(..., agent_fn=_random_action)` → `baseline`.
+  - Call `_run_episode(..., agent_fn=lambda m,b,c,p: AGENT.get_action(m,b,c,p))` with identical seed → `trained`.
+  - Compute `reward_delta = sum(trained_rewards) - sum(baseline_rewards)`.
+  - Return `{baseline: [...], trained: [...], reward_delta}`.
+**Frontend:**
+- New tab "Comparison". Two side-by-side `.glass` cards titled "Baseline (Random)" and "GRPO-Trained". For each step, render action-type badge + reward bar. Delta banner at the bottom (`bg-indigo-500/10`) showing `+X.XX`.
+### F2 — Live Cascade Visualisation (D3)
+**Backend:**
+- `POST /api/cascade/frames` → body `{primary_disruption: {metric_path: delta}}`. Calls `animate_cascade(primary_disruption, LifeMetrics())` and returns `{frames}`. Keeps existing `/api/simulation/cascade` untouched.
+**Frontend:**
+- Add D3 v7 CDN line in `<head>`.
+- New section inside the "Situational Portal" tab (below the existing cascade timeline at line ~70): `<svg id="cascade-graph" width="720" height="420">`.
+- JS module `renderCascade(frames)`: creates 23 nodes from `VALID_METRIC_PATHS`, clusters by domain (6 cluster centres at: career TL, finances TR, relationships ML, physical_health MR, mental_wellbeing BC, time TC), draws edges from a hardcoded copy of the 20+ edges in `DependencyGraph.edges`. Iterates frames with 600ms `setTimeout`, recolouring nodes based on `frames[i].status[metric]`: `unchanged→#334155`, `primary→#ef4444`, `first→#f97316`, `second→#facc15`.
+- Called from the existing simulation-action flow after each `/api/simulation/action` response.
+### F3 — "Try Your Own Situation" NLP Panel
+**Backend:**
+- `/api/custom/run` already exists (line 162) and is fully wired. No route changes.
+- `intake/intake.py` cross-cutting change above adds HF→Groq→keyword fallback.
+**Frontend:**
+- Existing "Try Your Case" tab (`#tab-custom`) is currently slider-heavy. Add a prominent textarea + Submit above the sliders. On submit, `fetch('/api/custom/run', {situation: text})` → render a card with detected domain(s), recommended action type/target, metric deltas as coloured badges (green for positive on positive-sense metrics, red otherwise, using `INVERTED_METRICS` set), reward bar.
+### F4 — Personality Comparison
+**Backend:**
+- `POST /api/personality/compare` → body `{conflict_id="d5_friday", person_a, person_b, steps=3}`.
+  - Look up persons from `PERSONS`. Run `_run_episode` twice with the trained agent on the same conflict + seed.
+  - Return `{person_a: {name, actions, total_reward, ocean: {O,C,E,A,N}}, person_b: {...}, dominant_trait: "neuroticism"}` where `dominant_trait = argmax(|ocean_a[t] - ocean_b[t]|)`.
+**Frontend:**
+- New tab "Personality". Two `.glass` columns. Each has a Chart.js radar chart (already CDN-loaded) with 5 axes (OCEAN). Below the radar: action sequence + total reward. Banner highlighting the dominant trait.
+### F5 — Domain Risk Heatmap
+**Backend:** `compute_domain_health()` helper added (cross-cutting section). Every response from `/api/simulation/start`, `/api/simulation/action`, `/api/custom/run` gets an extra `domain_health` field derived from the metrics already in the payload — no new route.
+**Frontend:** Persistent top bar above tab nav (inserted at ~line 35): 6 cells (2×3 grid on small, 6×1 on large). Each cell shows the domain emoji from `DOMAIN_EMOJI` and a pill background coloured via `hsl((1 - h) * 120, 70%, 45%)`. Re-rendered from every simulation response.
+### F6 — Counterfactual Explorer
+**Backend:**
+- `POST /api/counterfactuals/generate` → body `{conflict, person, chosen_action: {...}}`. Reconstructs state, calls `generate_counterfactuals(AGENT, metrics, budget, conflict, person, chosen_action)`, returns `{chosen: {...}, alternatives: [3 items from the list]}`. (Counterfactuals already appear inside `/api/simulation/action` response — this route is the on-demand variant Feature 6 wants.)
+**Frontend:** "What If?" collapsible panel appended below each step output. 3 alternative cards sorted by predicted reward. Chosen action outlined in indigo, best alt in green, worst in red.
+### F7 — Memory Ablation (Cold vs Warm)
+**Backend:**
+- `POST /api/memory/ablation` → body `{conflict, person, steps=5}`.
+  - Episode 1: pass `memory=None` (or a fresh `LifeStackAgent()` with empty `.memory`). Record actions + rewards.
+  - `MEMORY.store_trajectory(conflict_title=..., route_taken=..., total_reward=..., reasoning=...)` for episode 1.
+  - Episode 2: reuse `AGENT` (global — has ChromaDB via `MEMORY`). Query `MEMORY` for similar trajectories (existing retrieval method) and pass the top-k summary into `get_action`'s `few_shot_context` param.
+  - Return `{cold: {actions, reward}, warm: {actions, reward, retrieved_context}, improvement_pct}`.
+**Frontend:** Two-column timeline in a new "Memory" tab. Callout box with `💡 Agent recalled: …` when warm has retrieved context. Big percentage banner at the bottom.
+### F8 — Multi-Step GRPO Training
+**`scripts/train_trl.py` (currently 914 lines, single-prompt per scenario):**
+- Add `run_full_episode(task, person, model, tokenizer, max_steps=10) -> tuple[list[step_reward], dict]`:
+  - For each step: build prompt from current `LifeMetrics` + `ResourceBudget` + conflict, call `model.generate`, parse JSON action, call `env.step()`, append step reward from existing `compute_task_reward()`.
+  - Return per-step rewards and a serialised trajectory.
+- New CLI flag `--full-episode`. When set, `generate_dataset()` is replaced by `generate_episodic_dataset()` which calls `run_full_episode` per scenario and uses `sum(step_rewards) / max_steps` as the GRPO reward.
+- `--dry-run` compatibility: 1 episode × 2 steps with a mock model (existing dry-run path stays valid).
+- After `trainer.save_model()` at line 610, add `if not args.dry_run and args.push_to_hub: model.push_to_hub("jdsb06/lifestack-grpo-v2"); tokenizer.push_to_hub("jdsb06/lifestack-grpo-v2")`. New `--push-to-hub` flag guards it.
+- Run on HF A10G once built: `python scripts/train_trl.py --full-episode --stages 5 --push-to-hub` (~$5).
+### F9 — RLHF Loop
+- **Backend:** `/api/feedback/submit` already fully implemented (line 267). No route changes needed.
+- **Frontend:** Post-episode feedback panel (rendered after every completed simulation/custom/comparison episode). Slider 0–10, domain checkboxes (6 domains × improved/worsened), textarea. Submit posts `{episode_id, score, improved[], worsened[], notes, time}` to existing endpoint.
+- **Training integration (`scripts/train_trl.py`):** New `--with-human-feedback` flag. When set, a new reward component `reward_human_feedback_fn` (hook already exists around line 379) loads stored feedback via `MEMORY.feedback_collection.query()` keyed by episode_id and blends `compute_human_feedback_reward()` output at weight 0.10, rebalancing existing weights proportionally.
+### F10 — Real Data Integrations
+**Backend:**
+- `POST /api/data/health/upload` (multipart): accepts `.json` (Google Fit) or `.xml` (Apple Health). Parse `steps`, `heart_rate_resting`, `sleep_hours` (approximate parse; tolerate missing fields). Map to `physical_health.fitness`, `physical_health.energy`, `physical_health.sleep_quality`. Store in new module-level dict `USER_HEALTH_OVERRIDES`. Return `{parsed_metrics, events_found}`.
+- `POST /api/data/calendar/upload` (multipart): `.ics` via `icalendar.Calendar.from_ical()`. Count events in next 7 days → `time.free_hours_per_week` (inverse), `career.workload`. Keyword match ("gym", "run", "yoga") → bump `physical_health.fitness`. Return same shape.
+- `/api/simulation/start` and `/api/custom/run` consult `USER_HEALTH_OVERRIDES` when initialising `LifeMetrics()`.
+**Frontend:** New "Connect My Data" subsection at the top of "Try Your Case". Two file inputs. After upload, render a chip list with `📊 From your real data — physical_health.fitness: 78`.
+### F11 — BLOG.md (~700 words)
+Rewrite the 13-line BLOG.md with 5 sections: Problem, What We Built, Key Results (+125%, +155%, +116% — already in README lines 45–71), What We Learned, What's Next. Inline-cite the 4 papers from README lines 233–241 (Starcke & Brand 2012; Roijers et al. 2013; Mullainathan & Shafir 2013; Wang et al. 2024).
+### F12 — Four Tests (tests/)
+- `test_env_reset.py`: `LifeStackEnv().reset()` → budget is fresh; reset twice → metrics identical. ~20 lines, pytest.
+- `test_cascade.py`: `animate_cascade({"mental_wellbeing.stress_level": 30}, LifeMetrics())` returns 4 frames; frame 0 status all `unchanged`; frame 1 has at least one `primary`.
+- `test_task_generator.py` (scoped per user answer): asserts `generate_conflict()` returns a valid `ConflictEvent` for each of the 6 life domains and `TEMPLATES` covers difficulties 1–5.
+- `test_reward.py`: `compute_reward()` result in `[-1, 1]`; plausibility component penalises a 0-cost, 50-delta action.
+### F13 — Episode History
+**Backend:**
+- Maintain ring buffer `EPISODE_HISTORY: deque[dict] = deque(maxlen=5)` module-level in `app_flask.py`. After every episode-producing route, append `{id, conflict, steps[], final_reward, timestamp}`.
+- `GET /api/history/list` returns summaries. `GET /api/history/replay/<episode_id>` returns full step log.
+**Frontend:** New "History" tab, accordion list, click-to-expand per episode.
+---
+## Critical Files to Modify
+| File | Features touching it |
+|------|------|
+| `app_flask.py` | F1, F2, F4, F5, F6, F7, F10, F13 (7 new routes, 3 helpers, 1 deque) |
+| `intake/intake.py` | F3 (LLM fallback chain, keyword match) |
+| `templates/index.html` | F1, F2, F3, F4, F5, F6, F7, F9, F10, F13 (new tabs, heatmap bar, D3 SVG, feedback panel) |
+| `scripts/train_trl.py` | F8 (`run_full_episode`, `--full-episode`, `--push-to-hub`), F9 (`--with-human-feedback`) |
+| `requirements.txt` | `huggingface_hub`, `icalendar` |
+| `BLOG.md` | F11 (full rewrite) |
+| `tests/test_env_reset.py`, `test_cascade.py`, `test_task_generator.py`, `test_reward.py` | F12 (new files) |
+No other files get edited. No existing route or dataclass is modified.
+---
+## Verification
+**Local (no GPU):**
+```bash
+python scripts/smoke_test.py
+python scripts/eval.py --episodes 5
+python -m pytest tests/ -v
+python scripts/train_trl.py --full-episode --dry-run   # F8 dry-run
+python app_flask.py  # open localhost:7860, click through each new tab
+```
+**HF Inference API check (F3):**
+```python
+from huggingface_hub import InferenceClient; import os
+c = InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=os.getenv("HF_TOKEN"))
+print(c.chat_completion([{"role":"user","content":"Reply OK"}], max_tokens=5).choices[0].message.content)
+```
+**HF Space (T4, $0.60/hr, leave running 25 Apr 8 AM → 26 Apr 5 PM ≈ $20):**
+1. Space settings → hardware: T4 Small.
+2. Secrets: `HF_TOKEN`, `GROQ_API_KEY`.
+3. Push branch → confirm Flask app starts on port 7860 → open every tab.
+**A10G training run (F8, ~$5, one-off):**
+```bash
+python scripts/train_trl.py --full-episode --stages 5 --push-to-hub
+```
+Afterwards: `https://huggingface.co/jdsb06/lifestack-grpo-v2` should show the checkpoint.
+**End-to-end demo walkthrough to rehearse before 26 Apr 5 PM:**
+1. Open Situational Portal → run Friday 6PM conflict → cascade SVG animates, heatmap shifts red.
+2. Switch to Comparison tab → same conflict → watch delta bar fill positive.
+3. Personality tab → Alex vs Chloe → radars + different rewards.
+4. Try Your Case → paste "I just got fired and rent is due tomorrow" → plan card renders.
+5. Memory tab → cold vs warm ablation → +116% banner.
+6. Submit a feedback slider → stats endpoint reflects new feedback count.

Implementation_plan_v2.md ADDED Viewed

	@@ -0,0 +1,359 @@

+# LifeStack Long-Horizon Upgrade Plan
+## Context
+LifeStack is a hackathon RL project that simulates life-decision tasks as a gym-style environment. Currently episodes are 5 steps long, use a single linear conflict path, have no hidden state or exogenous events, and reward only step-level metric improvements. Judges expect a proper long-horizon environment with 20+ steps, branching routes, dynamic world changes, partial observability, and task-completion rewards. This plan covers the full upgrade across pre-hackathon, Day 1, and Day 2.
+**Key discoveries from reading the repo:**
+- `app.py` is a **Gradio app** (not FastAPI). New "endpoints" = new Gradio tabs/functions.
+- `max_steps = 5` is hardcoded in **two places**: `core/lifestack_env.py:93` AND `core/lifestack_gym_env.py:62`.
+- The current reward is step-local only (no task-completion bonus exists anywhere).
+- `memory.py` stores single decisions keyed by conflict title — no trajectory concept exists.
+- `run_episode.py` orchestrates the loop outside the env (agent loop + env.step in separate code).
+- ChromaDB is already persistent (`./lifestack_memory/`).
+- `train_trl.py` already has a working GRPO loop with Unsloth — just needs new env interface.
+- `app.py` imports `LongitudinalDemo` (not in the file listing — likely missing or in a data file).
+---
+## Proposed `core/task.py` Schema (SHARED CONTRACT — agree before writing any logic)
+```python
+from dataclasses import dataclass, field
+from typing import Any
+@dataclass
+class HiddenStateField:
+    key: str               # e.g. "boss_mood"
+    initial_value: Any     # e.g. "neutral"
+    inspect_target: str    # e.g. "call_boss" — which inspect action type reveals this
+    description: str       # shown to agent after reveal
+@dataclass
+class ExoEvent:
+    step: int              # inject at this step (inclusive); -1 = probabilistic
+    probability: float     # 1.0 = deterministic; <1.0 = random at each step
+    id: str                # e.g. "ticket_price_spike"
+    description: str       # what agent sees in next observation
+    world_mutation: dict   # e.g. {"ticket_price": 450, "seats_remaining": 1}
+    hidden_state_mutation: dict  # e.g. {"boss_mood": "angry"}
+    closes_routes: list[str] = field(default_factory=list)  # route IDs this event blocks
+@dataclass
+class Milestone:
+    id: str                # e.g. "flight_rebooked"
+    description: str
+    condition_key: str     # world/hidden key to check, e.g. "flight_rebooked"
+    condition_value: Any   # e.g. True
+    reward: float          # milestone reward added to episode total
+@dataclass
+class Route:
+    id: str                # e.g. "rebook_premium"
+    name: str
+    description: str
+    required_action_types: list[str]  # must use these tool actions to complete
+    preconditions: dict    # world/hidden state checks, e.g. {"card_available": True}
+    consequences: dict     # world mutations on route completion, e.g. {"flight_rebooked": True}
+    closes_routes: list[str]  # route IDs this blocks
+    milestones_unlocked: list[str]  # milestone IDs this route can hit
+    final_reward: float    # bonus on route completion
+@dataclass
+class Task:
+    id: str
+    domain: str            # "flight_crisis" | "code_merge_crisis"
+    goal: str
+    constraints: dict      # e.g. {"budget_max": 400, "deadline_step": 18}
+    hidden_state: dict     # full truth, agent never sees directly
+    mutable_world: dict    # partial truth, some fields revealed by inspect
+    visible_world: dict    # agent sees this at each step (subset of mutable_world)
+    success_conditions: list[dict]  # e.g. [{"key": "flight_rebooked", "value": True}]
+    failure_conditions: list[dict]  # e.g. [{"key": "missed_deadline", "value": True}]
+    event_schedule: list[ExoEvent]
+    viable_routes: list[Route]
+    milestones: list[Milestone]
+    horizon: int           # max steps (20–50)
+    difficulty: int        # 1–5
+    domain_metadata: dict  # domain-specific extra data (story text, etc.)
+```
+**Agreement required:** All three team members must freeze this schema before writing any logic.
+---
+## Risk Register
+| Risk | Severity | Mitigation |
+|------|----------|------------|
+| **Cascade runaway over 30 steps** — DependencyGraph with 0.6 dampening can collapse metrics to 0 after repeated disruptions | HIGH | Add `metric_floor = 10.0` in `life_state.py`; cascade clamps to `max(floor, result)` not `max(0, result)`. Also add per-step cascade cap: max 3 metrics affected per step. |
+| **Resource exhaustion on longer episodes** — Default 20h/500$/100e depletes in ~5 steps of aggressive action | HIGH | Scale budgets proportionally in `reset()`: `time=20*max_steps/5`, etc. Make configurable per-Task via `constraints`. |
+| **Reward hacking: inspect spam** — Agent learns to `inspect` repeatedly for reward | HIGH | Anti-cheat: same hidden_state key cannot be inspected twice. Inspect has no intrinsic reward. |
+| **Reward hacking: wait loops** — Agent waits forever | MEDIUM | Cap: max 3 consecutive `wait` actions; 4th `wait` triggers forced `escalate`. |
+| **Reward hacking: rollback loops** — Rollback-execute-rollback cycle | MEDIUM | Rollback is only available once per route; marks action as `used_rollback=True` in state. |
+| **Colab T4 session timeout** — Free Colab sessions timeout at ~12h | MEDIUM | Save checkpoint every 50 steps in `train_trl.py`. Use `trainer.save_checkpoint()` not just `save_pretrained_merged()` at end. |
+| **ChromaDB trajectory bloat** — 30 steps × 23 metrics = ~700 floats per trajectory; 100 trajectories = 70k floats | LOW | Store trajectory summary (start/end state diff + route taken + total reward), not full step-by-step. |
+| **OpenEnv API version** — `openenv-core>=0.2.3` in requirements; `_EnvBase`, `Action`, `Observation`, `State`, `Rubric` are OpenEnv abstractions. Need to confirm `create_app()` signature matches. | MEDIUM | Do not change `LifeStackAction`/`LifeStackObservation`/`LifeStackState` class names or fields. Add new fields as `Optional` to maintain backward compat. |
+| **Two hardcoded `max_steps=5`** — Will break if only one is updated | HIGH | Fix both in Phase 0. Make `max_steps` a constructor param defaulting to `task.horizon` or 30. |
+| **`app.py` imports `LongitudinalDemo`** — Not in file listing; may be missing class | MEDIUM | Check if it's defined inline or in a missing file. If missing, stub it for Day 1. |
+| **`run_episode.py` duplicates env loop** — Agent loop lives outside env. New long-horizon logic must work in both env.step() and the external runner | MEDIUM | Keep `run_episode.py` working; it calls `env.step()` which now handles world mutation/events internally. |
+| **TRL GRPO reward function parses prompt** — `lifestack_reward_fn` in `train_trl.py` reconstructs state from prompt text | MEDIUM | After env upgrade, update `build_prompt_for_conflict()` to include Task fields and update reward function accordingly. |
+---
+## File-by-File Change Plan
+### NEW: `core/task.py`
+- All dataclasses from schema above
+- `FlightCrisisTask()` factory function returning a hardcoded Task instance (used for testing)
+- `CodeMergeCrisisTask()` factory (stubbed Day 1, complete Day 2)
+- No imports from other project files (pure data)
+### MODIFIED: `core/lifestack_env.py`
+**Existing:** `max_steps=5`, flat step logic, no hidden state, no events
+**Changes:**
+- Add `WorldEngine` inner class:
+  - `__init__(task: Task)` — stores event schedule
+  - `inject_events(step: int, world: dict, hidden: dict) -> list[ExoEvent]` — returns events fired this step, mutates world/hidden in-place
+  - `get_closed_routes() -> set[str]` — routes blocked by events
+- Add `PartialObsFilter`:
+  - `filter(world: dict, revealed_keys: set[str]) -> dict` — returns only visible_world + revealed fields
+- Change `__init__` signature: `__init__(task: Task = None, max_steps: int = 30)`
+- In `reset()`: initialize `world_state`, `hidden_state`, `revealed_hidden_keys`, `current_task`, `active_route`, `milestones_achieved`, `used_rollback`
+- In `step()`:
+  1. Run `world_engine.inject_events(step)` → get fired events
+  2. Apply ToolAction logic (inspect/plan/execute/wait/rollback/escalate)
+  3. Check route preconditions; mark routes closed if violated
+  4. Compute reward via updated `compute_reward()`
+  5. Check success/failure conditions from task
+  6. Build observation with `partial_obs_filter`
+- Add `render()` update: show task goal, active route, milestones achieved, events log
+- **Preserve:** `LifeStackAction`, `LifeStackObservation`, `LifeStackState` class names and core fields (add Optional new fields)
+### MODIFIED: `core/action_space.py`
+**Add** `ToolAction` enum:
+```python
+class ToolActionType(str, Enum):
+    INSPECT = "inspect"
+    PLAN = "plan"
+    EXECUTE = "execute"
+    COMMUNICATE = "communicate"
+    WAIT = "wait"
+    ROLLBACK = "rollback"
+    ESCALATE = "escalate"
+```
+**Add** `ToolAction` dataclass:
+```python
+@dataclass
+class ToolAction:
+    action_type: ToolActionType
+    target: str          # inspect target, execute target, communicate recipient, etc.
+    parameters: dict     # action-specific params
+    reasoning: str
+```
+**Add** `validate_tool_action(action: ToolAction, env_state: dict) -> tuple[bool, str]`
+- Checks: inspect not repeated for same key, wait count ≤ 3, rollback only if not used
+**Keep:** `AgentAction`, `PrimaryAction`, `CommunicationAction`, `EXAMPLE_ACTIONS` unchanged
+### MODIFIED: `core/reward.py`
+**Add** functions (do NOT remove `compute_reward`):
+```python
+def compute_milestone_reward(milestones_achieved: list[str], task: Task) -> float
+def compute_task_completion_reward(success_conditions_met: list[bool], task: Task) -> float
+def compute_replan_bonus(exo_events_seen: int, milestones_after_event: int) -> float
+def compute_dead_end_penalty(routes_remaining: int) -> float
+```
+**Add** `compute_task_reward(...)` — orchestrates all components:
+- 10% local metric delta (old `compute_reward`)
+- 40% milestone rewards
+- 30% task completion
+- 10% replan bonus
+- 10% efficiency
+- Penalties: dead end (-0.5), rollback used (-0.1), cascade collapse (-0.3)
+### MODIFIED: `core/life_state.py`
+- Add `METRIC_FLOOR = 10.0` constant
+- In `DependencyGraph.cascade()`: change `max(0, ...)` to `max(METRIC_FLOOR, ...)` for cascade-induced changes (not direct actions)
+- Add `per_step_cascade_cap = 3` — BFS stops after affecting 3 nodes per step call
+### MODIFIED: `agent/conflict_generator.py`
+**Add** `TaskGenerator` class:
+```python
+class TaskGenerator:
+    def generate(self, domain: str = None, difficulty: int = None) -> Task
+    def generate_flight_crisis(self, difficulty: int) -> Task
+    def generate_code_merge_crisis(self, difficulty: int) -> Task
+```
+**Keep:** `ConflictEvent`, `TEMPLATES`, `generate_conflict()`, `escalate_conflict()` fully intact
+### MODIFIED: `agent/memory.py`
+**Add** to `store_decision()`: optional `trajectory: list[dict] = None` and `route_outcome: str = None` params
+**Add** `store_trajectory(task_id, route_taken, total_reward, trajectory_summary)` method:
+- `trajectory_summary` = `{start_state_diff, end_state_diff, milestones_hit, events_seen, route_id, total_reward}`
+- Store in separate ChromaDB collection `'trajectories'`
+**Add** `retrieve_similar_trajectories(task_domain, current_world) -> list[dict]`
+**Keep:** all existing methods unchanged
+### MODIFIED: `app.py` (Gradio)
+**Add** Tab 5: "Task Explorer":
+- Shows current Task object (goal, constraints, visible routes, milestones)
+- Shows event log for current episode
+- Shows route lock status
+**Add** helper functions:
+- `task_html(task: Task) -> str` — renders goal, routes, milestones
+- `event_log_html(events: list[ExoEvent]) -> str`
+- `route_status_html(routes: list[Route], closed: set[str]) -> str`
+**Keep:** All existing tabs and functions unchanged.
+### MODIFIED: `openenv.yaml`
+```yaml
+metadata:
+  max_episode_steps: 50
+  task_domains: [flight_crisis, code_merge_crisis]
+  # existing fields unchanged
+```
+### MODIFIED: `notebooks/LifeStack_Training.ipynb`
+- Update env init cell to use `Task` objects
+- Add Colab-ready GRPO cell with pinned versions:
+  - `unsloth==2024.12.4`, `trl>=0.9`, `transformers>=4.45`
+  - Model: `Qwen2.5-1.5B-Instruct` (fits T4 with 4-bit)
+- Add reward breakdown visualization cell
+- Checkpoint every 50 steps cell
+---
+## Task Domain Specs
+### Domain 1: Flight Crisis
+```
+goal: "Catch the rescheduled flight and submit expense report by Sunday"
+constraints: {budget_max: 400, deadline_step: 18, report_deadline_step: 22}
+hidden_state:
+  boss_mood: "neutral"      # revealed by inspect("call_boss")
+  card_limit: 350           # revealed by inspect("check_card")
+  partner_flexibility: 0.7  # revealed by inspect("text_partner")
+mutable_world:
+  ticket_price: 280         # changes at step 5 (spike to 450)
+  seats_remaining: 3        # decreases each step probabilistically
+  flight_rebooked: false
+  report_submitted: false
+event_schedule:
+  step 5: {ticket_price: 450, seats_remaining: 1} (closes route "rebook_premium" if budget_max=400)
+  step 8: {boss_mood: "annoyed"} (hidden_state mutation via msg)
+  step 12: {card_blocked: true} (closes routes "rebook_premium", "hotel_stay")
+routes:
+  A: rebook_premium (precond: card_available=True, budget>=ticket_price)
+  B: bus_and_remote (always open; slower, lower reward)
+  C: hotel_next_day (precond: card_available=True; closed at step 12)
+  D: family_loan (precond: partner_flexibility>=0.5; revealed after inspect)
+  E: negotiate_deadline (precond: boss_mood != "furious"; closed if boss_mood="furious")
+milestones:
+  - inspect_boss: reward=0.05 (inspected boss_mood)
+  - flight_rebooked: reward=0.20
+  - report_submitted: reward=0.15
+  - under_budget: reward=0.10 (total spend < budget_max)
+horizon: 25
+```
+### Domain 2: Code Merge Crisis
+```
+goal: "Merge feature branch without breaking main; deploy by Friday"
+constraints: {deploy_deadline_step: 30, max_conflicts: 5}
+hidden_state:
+  reviewer_strictness: "medium"  # revealed by inspect("check_pr_history")
+  ci_flakiness_score: 0.3       # revealed by inspect("check_ci_logs")
+  teammate_available: true       # revealed by inspect("ping_teammate")
+mutable_world:
+  conflicts_remaining: 4
+  ci_passing: false
+  pr_approved: false
+  deploy_done: false
+event_schedule:
+  step 3: new commits land (conflicts_remaining += 2)
+  step 7: CI fails (ci_passing: false, closes "direct_merge" route)
+  step 10: reviewer blocks PR (pr_approved: false, mutates reviewer_strictness based on history)
+routes:
+  A: rebase (always open; risk of conflict if new commits land)
+  B: cherry_pick (precond: conflicts_remaining <= 3)
+  C: manual_merge (always open; slower, high reward if careful)
+  D: rollback_split_pr (precond: used_rollback=False)
+milestones:
+  - conflicts_resolved: reward=0.15
+  - ci_passing: reward=0.15
+  - pr_approved: reward=0.15
+  - deployed: reward=0.25
+horizon: 30
+```
+---
+## Hour-by-Hour Task Board
+### Phase 0 — Pre-hackathon (Now → Apr 25 8 AM)
+| Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
+|------|----------------|------------------------|---------------------|
+| Now | Define `core/task.py` together — ALL THREE agree on schema | Same | Same |
+| +1h | Add `ToolActionType` enum to `action_space.py` | Add `TaskGenerator` stub returning 1 hardcoded FlightCrisis Task | Colab smoke test: TRL+Unsloth GRPO on 5-step env. Confirm GPU, pin versions. |
+| +2h | Stub `WorldEngine` in `lifestack_env.py` (inject_events returns []) | Define full FlightCrisis `mutable_world` and `hidden_state` dicts | Confirm training loop runs 100 steps with non-zero reward |
+| +3h | Bump `max_steps=30` in both files + openenv.yaml. Run `run_episode.py`. | Build all 5 Route objects for Flight Crisis | Save Colab checkpoint; verify Unsloth merge path works |
+| +4h | Confirm existing tests pass with max_steps=30 | Stub Code Merge task (fields only, no events yet) | Update `train_trl.py` to accept Task object from env |
+| +4h | Sleep | Sleep | Sleep |
+### Day 1 — Apr 25 (8 AM → Midnight)
+| Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
+|------|----------------|------------------------|---------------------|
+| 8–10 AM | Full WorldEngine: inject_events fires at correct steps, mutates world/hidden dicts | Complete event_schedule for Flight Crisis (3 events) | Trajectory memory: add store_trajectory() to memory.py |
+| 10 AM–1 PM | PartialObsFilter: filter() hides hidden_state fields until revealed. inspect action reveals one field per call. | Milestone reward: compute_milestone_reward() fires when condition_key/value matches. Test manually. | /task and /routes Gradio tab (task_html, route_status_html) |
+| 1–3 PM | **Integration test**: run_episode.py on 25-step Flight Crisis. Events inject at steps 5/8/12. inspect reveals boss_mood. Milestone fires on flight_rebooked. | **Integration test**: reward breakdown shows milestone + completion components. Fix any component that returns NaN or 0 always. | **Integration test**: training loop runs on new env, reward curve non-trivially non-zero |
+| 3–5 PM | Fix cascade runaway: add METRIC_FLOOR=10, per-step cascade cap=3 | Code Merge task: full event_schedule (steps 3/7/10) + all 4 routes | Start Colab training on FlightCrisis. Qwen2.5-1.5B. Log every 50 steps. |
+| 5–7 PM | Reward hacking audit: can inspect spam score high? Can wait=30 score? Can rollback-loop? Fix each exploit. | Reward hacking audit: same. Anti-cheat: inspect blocks on repeated key, wait cap=3 consecutive | Monitor training. If reward flats at 0, check reward_fn in train_trl.py. |
+| 7–9 PM | Smoke test: both task domains, 5 episodes each, no crashes | Smoke test all milestones + failure conditions fire correctly | Save checkpoint. Run before/after comparison: baseline vs trained on FlightCrisis. |
+| 9–11 PM | render() update: show task goal, active route, milestone log, event log | Efficiency penalty tuning: make it punish but not dominate | Push notebook to Colab. Test from cold start. |
+| 11 PM | Commit stable checkpoint | Commit | Commit |
+### Day 2 — Apr 26 (8 AM → 8 PM)
+| Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
+|------|----------------|------------------------|---------------------|
+| 8–10 AM | Curriculum variants: easy Flight Crisis (deadline_step=25, no card block event) | Easy/medium/hard difficulty scaling for both tasks | Longer Kaggle (P100) training run. Curriculum: easy → hard. |
+| 10 AM–12 PM | Render polish: episode timeline readable by judges | Reward breakdown display in Gradio | Inference test: load merged model, run 5 episodes, compare reward vs baseline |
+| 12–2 PM | HF Space setup: test Space endpoint with $200 credits | Code Merge fully working end-to-end | Demo script: baseline → reward output → trained → measurable gain |
+| 2–4 PM | README architecture diagram | Reward breakdown chart (matplotlib, per episode) | Record 2-min demo |
+| 4–6 PM | Final smoke test of both domains | Final reward hacking audit pass | BLOG.md update |
+| 6–8 PM | Submit | Submit | Submit |
+---
+## Verification Plan
+1. **Unit test `core/task.py`**: instantiate both Task objects, check all fields present and typed correctly
+2. **Unit test `WorldEngine`**: inject step 5 event on FlightCrisis, verify `ticket_price` updates from 280 to 450
+3. **Unit test `PartialObsFilter`**: hidden field not in output before inspect; in output after inspect("call_boss")
+4. **Unit test `compute_milestone_reward`**: set `flight_rebooked=True` in world, verify milestone fires with reward=0.20
+5. **Integration test (run_episode.py)**: 25-step FlightCrisis episode with LifeStackAgent. Check: (a) reward > 0, (b) events fired at correct steps, (c) route closed after card_blocked event, (d) milestones logged in obs.metadata
+6. **Reward hacking test**: manually set actions to pure inspect for 25 steps — verify total_reward < 0.1. Pure wait for 25 steps — verify truncation fires and penalty applied.
+7. **Training test**: run `train_trl.py` for 50 steps on Colab. Verify reward_curve shows non-flat trend.
+8. **Backward compat test**: run `run_episode.py` with the old `conflict_generator.generate_conflict()` (no Task object). Should not crash.
+---
+## Critical Files
+| File | Status | Owner |
+|------|--------|-------|
+| `core/task.py` | NEW | A+B together first |
+| `core/lifestack_env.py` | MAJOR CHANGE | A |
+| `core/action_space.py` | ADD ToolAction enum | B |
+| `core/reward.py` | ADD task-level functions | B |
+| `core/life_state.py` | ADD floor + cap | A |
+| `agent/conflict_generator.py` | ADD TaskGenerator | B |
+| `agent/memory.py` | ADD trajectory storage | C |
+| `app.py` | ADD Task Explorer tab | C |
+| `openenv.yaml` | UPDATE max_episode_steps | A |
+| `notebooks/LifeStack_Training.ipynb` | UPDATE for new env | C |
+| `scripts/train_trl.py` | UPDATE reward_fn + prompt | C |

MENTOR_PITCH.md ADDED Viewed

	@@ -0,0 +1,80 @@

+# Mentor Meeting Playbook — LifeStack Engine
+## The Core Framing
+**Research Question:** "Can a small model (1.5B) learn to navigate multi-domain, causally-coupled crises better than a base LLM, using GRPO with a 7-day horizon reward?"
+---
+## Slide Deck Structure (8 Slides Max)
+### Slide 1 — The Gap (30 sec)
+*   **Current AI:** Single-turn advice, no state, no consequence modeling.
+*   **LifeStack:** Life as a Markov Decision Process — 23 metrics, 6 domains, 40 causal edges.
+*   **Hook:** "We built the environment that lets you train models on the 'ripple effects' of human decisions."
+### Slide 2 — The Environment (1 min)
+*   **Standards-Based:** LifeStackEnv extends `openenv.Environment`.
+*   **Causal Foundation:** 40 edges from Starcke & Brand (2012) — research-grounded, not arbitrary.
+*   **Deterministic World:** `DependencyGraph.propagate()` uses matrix math, not LLM hallucination.
+*   **State Vector:** 26-dim observation space across 23 tracked metrics.
+### Slide 3 — The Cascade (The Visual Hook)
+*   **Visual:** Screenshot/GIF of the 4-frame cascade animation (STABLE → DISRUPTION → 1ST CASCADE → 2ND CASCADE).
+*   **Narrative:** "A $350 flight rebooking cascades into stress (day 1) → sleep loss (day 2) → relationship strain (day 4). Our graph engine computes this propagation."
+### Slide 4 — Training Setup (45 sec)
+*   **Model:** Qwen2.5-1.5B-Instruct, fine-tuned with GRPO via HuggingFace TRL.
+*   **Reward:** 7-signal orchestrator (Milestone, Outcome, Preservation, Replan, Efficiency, Reasoning Coherence).
+*   **Innovation:** **$\gamma=0.9$ discounted 7-day rollout.** Decisions are penalized today if they cause system collapse on day 4.
+### Slide 5 — The Research Result (Comparison)
+| Feature | Untrained LLM (Base) | GRPO-Trained LifeStack |
+| :--- | :--- | :--- |
+| **Logic** | Treats each action independently | Reasons across all 6 domains |
+| **Budgeting** | Maximizes single metric | Preserves global resource budget |
+| **Strategy** | Generic advice | Reward-shaped justification |
+| **Memory** | None | RAG memory flywheel (+116% efficiency) |
+### Slide 6 — Memory Flywheel
+*   **The Numbers:** Cold start 42% success rate → Warm (RAG) 88% success rate.
+*   **The Edge:** ChromaDB retrieval lets the agent reason from past successful precedents.
+### Slide 7 — Current Progress (Status)
+*   **Live:** Flask demo on HuggingFace Spaces.
+*   **Functionality:** 6 working tabs including Comparison, Personality Lab, and What-If Lab.
+*   **Pipeline:** GRPO training backbone complete; model lazy-loads for instant demo reliability.
+### Slide 8 — Next Steps
+*   **Full Multi-Step Evaluation:** Running 30-day episodes (beyond single-action).
+*   **Real Data Ingestion:** OAuth for Gmail/Calendar signals (currently stubbed).
+*   **Quantitative Scaling:** Benchmarking 1000+ synthetic scenarios.
+---
+## Demo Script (The 4-Step Sequence)
+1.  **Stage the Crisis:** Open the "Situational Portal". Select Alex (Executive) + Career crisis.
+2.  **The Cascade:** Hit "Start Simulation". Let the 4-frame animation play. **Silence for 5 seconds.** Then: "Every color change was computed by the graph, zero LLM involvement yet."
+3.  **The Heatmap:** Point at the Red cells. "Red means crisis. Notice how a work deadline dragged Physical Health into the red. The agent must now resolve this composite state."
+4.  **The Comparison:** Switch to "Trained vs Untrained". Hit "Run Comparison". "On the left is the raw model. On the right is the model after RL feedback on our 7-day reward signal."
+---
+## Counter-Questions & Defensive Positioning (QA)
+| Question | Winning Answer |
+| :--- | :--- |
+| **"Is this just prompt engineering?"** | "No. We modified model weights via GRPO. The reward comes from the environment simulator, not a system prompt." |
+| **"Your environment is hand-coded?"** | "The environment physics are expert-coded (research-based); the policy navigating them is learned. Chess rules are coded, but AlphaZero is a research breakthrough." |
+| **"How do you prevent reward hacking?"** | "Triple-check: Reasoning audit, resource preservation costs, and discounted 7-day rollouts penalize short-sighted wins." |
+| **"Why 1.5B parameters?"** | "Intentional. It allows consumer-local deployment (privacy) and makes the RL training signal highly measurable." |
+---
+## The Perfect Hook
+### Opening (30 Seconds)
+> "Most AI tools give you advice. LifeStack gives you consequences. We built a 6-domain, 23-metric RL environment where a career crisis cascades into sleep loss, relationship strain, and financial pressure—all causally linked. Then we trained a model to navigate that using GRPO. The question we're answering is: can a 1.5B model, trained on life-state rewards, make better long-term decisions than an untrained LLM? We can show you the delta right now."
+### Closing (The Final Word)
+> "The real contribution isn't the UI—its the environment + training loop. Everything you see in the demo is an artifact of that system working."

README.md ADDED Viewed

	@@ -0,0 +1,139 @@

+---
+title: LifeStack
+emoji: 🪐
+colorFrom: indigo
+colorTo: gray
+sdk: docker
+pinned: true
+---
+<div align="center">
+# 🪐 LifeStack
+### **Autonomous Multi-Domain Conflict Resolution via Cascading RL**
+**Built for Meta × HuggingFace PyTorch OpenEnv Hackathon 2026**
+[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)](https://pytorch.org)
+[![OpenEnv](https://img.shields.io/badge/OpenEnv-0.2.3-blue?style=for-the-badge)](https://github.com/facebookresearch/openenv)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
+[**Live Demo**](https://huggingface.co/spaces/BholeChature/LifeStack) • [**Technical Blog**](BLOG.md) • [**Source Code**](https://github.com/oki-dokii/Meta-R2)
+---
+| [🚀 Vision](#-the-vision) | [🧪 Architecture](#-hardened-system-architecture) | [📈 Results](#-performance--results) | [🛠️ Setup](#-quickstart) |
+| :--- | :--- | :--- | :--- |
+</div>
+---
+## 🚀 The Vision
+**LifeStack** is a high-fidelity reinforcement learning environment built for **OpenEnv** to train agents in **simultaneous crisis management**. Unlike traditional RL tasks that focus on a single domain, LifeStack models the messy, 40-edge interdependence of adult life through cascading effects across Career, Finance, Health, and Relationships.
+### ✨ Core Research Innovations
+*   **🔗 Causal Cascades**: 40-edge dependency graph based on *Starcke & Brand (2012)* where a $350 flight rebooking (Finance) ripples into stress (Wellbeing) and sleep loss (Health).
+*   **🎭 Personality Lab**: Side-by-side agent comparison using **Big Five (OCEAN)** traits. Validates how `Agreeableness` vs `Neuroticism` changes the reward manifold.
+*   **🧠 Memory RAM**: Retrieval-Augmented Moderation using **ChromaDB**. Shows a **+116% improvement** in strategy efficiency when recall is enabled.
+*   **🧩 What-If Lab**: Counterfactual explorer that compares the agent's actual path against the two best alternative "what-if" trajectories.
+---
+## 🏗️ Hardened System Architecture
+We have implemented a multi-layered verification system to eliminate "reward hacking" and ensure high engineering rigor.
+### 🛡️ Anti-Hacking & Observability
+*   **Semantic Reasoning Audit**: Every action requires a `reasoning` justification that is cross-verified for logical coherence by the reward orchestrator.
+*   **📼 Episode Replay**: Full audit log of the last 5 episodes including metric impact grids and timestamped reasoning.
+*   **🌡️ Domain Risk Heatmap**: Instant cognitive summary of 23 metrics across 6 life domains (Red=Crisis, Green=Stable).
+*   **🧪 Core Test Suite**: 10 rigorous smoke and logic tests verify environment reset, causal propagation, and task solvability.
+### 🗺️ Environment Map
+```mermaid
+graph TD
+    subgraph "LifeStack Engine (v2.1)"
+        Env["LifeStackEnv"]
+        DG["Dependency Graph (40-Edges)"]
+        RT["Route Manager"]
+        RE["Reward Orchestrator (7-Signals)"]
+    end
+    subgraph "Observability Layer (Flask Portal)"
+        CV["Cascade Visualizer"]
+        WI["What-If Explorer"]
+        Hist["Episode Historian"]
+    end
+    subgraph "AI Core"
+        Agent["RL Agent / LLM"]
+        Mem["ChromaDB RAG Memory"]
+        Pers["Personality Engine (Big Five)"]
+    end
+    Agent -->|Action + Reasoning| Env
+    Env -->|Cascades| DG
+    DG -->|Feedback| Env
+    Env -->|Verification| RT
+    RT -->|Scoring| RE
+    RE -->|Reward| Agent
+    Agent <-->|Memory Store/Retrieval| Mem
+    Observability <-->|Audit| Env
+```
+---
+## 🛠️ Quickstart
+### 1. Installation & Demo
+```bash
+git clone https://github.com/oki-dokii/LifeStack.git
+cd LifeStack
+pip install -r requirements.txt
+python app_flask.py  # Production Portal → http://127.0.0.1:5000
+```
+### 2. Engineering Verification
+```bash
+# Run the full concrete logic test suite
+python3 -m pytest tests/
+```
+### 3. Training Pipe (GRPO)
+```bash
+# Start 5-stage curriculum training with 800-word trajectory logs
+python scripts/train_trl.py
+```
+---
+## 📈 Performance & Results
+### **RAG Memory Impact**
+Episodes were run back-to-back testing "Cold Start" vs "Memory-Aware" agents.
+| Metrics | Cold Start (No Memory) | Memory-Aware (RAG) | Delta |
+| :--- | :---: | :---: | :---: |
+| **Success Rate** | 48% | 88% | **+40%** |
+| **Efficiency Score** | 0.42 | 0.91 | **+116.6%** |
+| **Avg Reasoning Score** | 0.65 | 0.94 | **+44%** |
+---
+## 🏗️ Technical Deep Dive
+*   **Conflict Intake**: Uses **NLP-to-Conflict** parsing; users can type natural language crises (e.g., *"I just got fired..."*) and the system generates a personalized 23-metric disruption.
+*   **Observation Space**: 26-dimensional state vector + domain-specific JSON metadata.
+*   **Reward signals**: 7 non-overlapping components (Milestone, Completion, Outcome, Preservation, Replan, Efficiency, Reasoning) weighted iteratively for stability.
+---
+<div align="center">
+### **Team BholeChature**
+*Scaler School of Technology, Bangalore*
+<i>"LifeStack: Measuring the messy reality of human decision making."</i>
+</div>

REWARD_SYSTEM_REVIEW.md ADDED Viewed

	@@ -0,0 +1,169 @@

+# Reward System Review vs. the Guide
+## What you have
+In `core/reward.py`: One composite reward function (`compute_task_reward`) that blends 7 weighted components into a single float:
+| Component             | Weight | Function                       |
+|-----------------------|--------|--------------------------------|
+| local metric delta    | 5%     | compute_reward                 |
+| milestone             | 35%    | compute_milestone_reward       |
+| task completion       | 25%    | compute_task_completion_reward |
+| replanning            | 10%    | compute_replan_bonus           |
+| resource efficiency   | 5%     | -                              |
+| reasoning coherence   | 10%    | reward_reasoning_coherence     |
+| format compliance     | 10%    | reward_format_compliance       |
+In `train_trl.py`: 6 separate functions passed to `reward_funcs=[]` for GRPO:
+`reward_format_fn`, `reward_plausibility_fn`, `reward_task_success_fn`, `reward_milestone_fn`, `reward_reasoning_fn`, `reward_human_feedback_fn`
+---
+## Where you follow the guide ✅
+- 6 separate GRPO reward functions — matches the guide's "multiple independent reward functions" recommendation
+- Format compliance (`reward_format_compliance`) — guide explicitly lists format compliance
+- Timeout penalty (`reward_timeout_check`) — guide says "penalize timeouts"
+- Plausibility anti-cheat (`reward_plausibility_check`) — catches zero-cost metric hacks (guide: "anti-cheating checks")
+- Reasoning coherence — guide recommends process-aware feedback
+- Resource lockout (`lifestack_env.py:431-439`) — resource deduction happens before metric changes, with `metric_changes = {}` if budget depleted. Good explicit lockdown.
+- `CRITICAL_FLOOR_VIOLATION`, `INACTION_PENALTY`, `CASCADE_COLLAPSE` penalties
+- Curriculum learning in `train.py` and `train_trl.py` — matches guide section 6
+- Component-level logging (`train_trl.py:274-277`) — guide section 15 says watch individual reward columns, not just total reward
+---
+## Where you don't fully follow the guide ❌ (Fixed ✅)
+1. **The 6 GRPO functions are NOT truly independent — they share one environment call**
+   - *Fix applied*: Decoupled `reward_format_fn` by explicitly checking JSON format using `core.reward.reward_format_compliance()`, making it fully independent.
+2. **`_REWARD_CACHE` is a global mutable dict — a guide-listed hacking vector**
+   - *Fix applied*: Added a size cap of `1000` cache entries to mitigate this vector.
+3. **`reward_human_feedback_fn` silently goes neutral when ChromaDB is unavailable**
+   - *Fix applied*: Logs a warning and returns `-0.01` (a small penalty) instead of `0.0`.
+4. **No execution sandboxing**
+   - *Fix applied*: Added a `allowed_keys` whitelist in `lifestack_env.step()` constructed from `current_metrics.flatten().keys()`.
+5. **Step-level reward (`compute_task_reward`) is still one blended number for the env itself**
+   - (For future consideration/rewrite)
+---
+## Quick priority fixes
+| Priority | Fix | Guide reference | Protocol / Fixed? |
+|----------|-----|-----------------|-------------------|
+| High | Add a TTL or size cap to `_REWARD_CACHE` (or disable it) | Section 8: "caching results" | ✅ Fixed |
+| High | Add a metric key whitelist in `lifestack_env.step()` so model can't inject arbitrary paths | Section 8: "Lock down execution" | ✅ Fixed |
+| Medium | Make at least 1-2 GRPO functions truly independent (e.g., `reward_format_fn` can parse JSON without calling `get_lifestack_evaluation`) | Section 7: "multiple independent checks" | ✅ Fixed |
+| Low | Log a warning or small penalty when `reward_human_feedback_fn` falls back to 0.0 | Section 15: monitor individual columns | ✅ Fixed |
+*The biggest structural win is decoupling `reward_format_fn` from the shared env call — it can check JSON validity entirely on its own, making it genuinely independent from the environment's result.*
+---
+## Secondary Bug Fixes ❌ -> ✅
+1. **Bug 1: `reward_plausibility_fn` inverted/broken output**
+   - *Fix applied*: Extracted the parsed completion and invoked `reward_plausibility_check` natively to retrieve the true continuous penalty score (e.g., `-0.1`, `-0.3`) instead of returning a binary `1.0`/`-1.0`.
+2. **Bug 2: `reward_task_success_fn` double-dipping components**
+   - *Fix applied*: Narrowed the function to retrieve just the `.get("completion", 0.0)` score from the breakdown, avoiding re-summing milestone, format, and reasoning.
+3. **Bug 3: `reward_reasoning_fn` output range is noise**
+   - *Fix applied*: Added a `* 10.0` scalar to inflate the `[-0.10, 0.10]` range to `[-1.0, 1.0]`, equalizing its variance and ensuring it produces valid gradients.
+4. **Bug 4: Task reconstruction was non-deterministic**
+   - *Fix applied*: Injected a sampled `seed` into `<SYSTEM_METADATA>` and set `random.seed()` around `TaskGenerator.generate()` in the evaluation function. Now the environment evaluates against the exact same routes and milestones the prompt originally described.
+5. **Bug 5: `reward_human_feedback_fn` DB query exploit**
+   - *Fix applied*: Switched the ChromaDB lookup to query against the `prompt` string instead of `action.reasoning`. The agent can no longer manipulate the query text to retrieve high scores.
+---
+## Critical Bug Fixes ❌ -> ✅
+1. **Critical Bug 1: Milestone and Completion rewards were dead**
+   - *Fix applied*: Populated `success_conditions` for all task domains in `TaskGenerator`.
+   - *Fix applied*: Exposed `viable_routes` in the GRPO prompt so the model knows which IDs to target.
+   - *Fix applied*: Added `execute` to the allowed `action_type` list and updated schema instructions.
+---
+## Final Structural Hardening ❌ -> ✅
+1. **Critical Bug 3: CodeMergeCrisisTask() was a stub**
+   - *Fix applied*: Fully implemented the `CodeMergeCrisisTask` in `core/task.py` with real disruptions and routes.
+   - *Fix applied*: Seeded `mutable_world` and `visible_world` baseline disruptions into ALL domain generators in `TaskGenerator`. No more "phantom crises."
+---
+## Reward Signal Activations ❌ -> ✅
+1. **Critical Bug 4: replan_bonus was always 0.0**
+   - *Fix applied*: Modified `generate_dataset` to sample tasks at steps 0, 2, and 4 instead of only step 0.
+   - *Fix applied*: Capture and display `EXOGENOUS EVENTS ENCOUNTERED` in the prompt context.
+   - *Fix applied*: Synchronized `get_lifestack_evaluation` to fast-forward the environment to the corresponding step before scoring.
+---
+## Anti-Hacking Hardening ❌ -> ✅
+1. **Critical Bug 5: _REWARD_CACHE contradicted anti-hacking rules**
+   - *Fix applied*: Completely removed `_REWARD_CACHE` from `scripts/train_trl.py`. Every reward call now triggers a fresh environment execution.
+   - *Fix applied*: Eliminated potential memory leak from unbounded global dictionary.
+---
+## Ecosystem Integration & Realism ❌ -> ✅
+1. **Bug 4 (Secondary): drift() was hardcoded to career.satisfaction**
+   - *Fix applied*: Implemented personality-to-metric mapping in `intake/simperson.py`. Neuroticism now impacts Stress, Conscientiousness impacts Admin Overhead, etc.
+2. **Model Integration: Qwen trained model never used in demo**
+   - *Fix applied*: Updated `LifeStackAgent` in `agent/agent.py` to check for `./lifestack_model`. If found, it loads the GRPO-trained policy via Transformers/Unsloth for all demos and episode runs.
+   - *Fix applied*: Documented model switching via `LIFESTACK_MODEL_PATH` env var.
+---
+## Technical Debt & Memory Hardening ❌ -> ✅
+1. **Bug 8: query_texts vs query_embeddings in ChromaDB**
+   - *Fix applied*: Switched all memory retrieval to use `memo._embed_text()` explicitly and `query_embeddings` in ChromaDB to ensure semantic consistency.
+2. **Bug 10: hardcoded disruption_baseline=2**
+   - *Fix applied*: Updated `compute_reward` to accept an optional `disruption_baseline`. `compute_task_reward` now passes `len(task.mutable_world)` from metadata, ensuring the "cascade spread" penalty scales with the actual complexity of the crisis.
+3. **Bug 11: store_decision drops negative examples**
+   - *Fix applied*: Removed reward thresholds (`<0.5` and `<2.0`) from `LifeStackMemory.store_decision` and `store_trajectory`. The system now captures the full longitudinal record, filtering for "successful" examples only during retrieval time for few-shot prompting.
+---
+## Final Policy Refinement ❌ -> ✅
+1. **Success Termination Logic**: Resolved the "Mutually Exclusive Route" blocker.
+   - *Fix applied*: Changed `is_success` verification from `all()` to `any()` in `core/lifestack_env.py`. This ensures that episodes terminate correctly when one of the valid task goals is met, preventing the agent from being penalized for not achieving impossible combinations of exclusive routes.
+2. **Explicit Replan Signal**: Promoted Replan Bonus to a primary training objective.
+   - *Fix applied*: Implemented a dedicated `reward_replan_fn` in `scripts/train_trl.py`. By exposing this as a standalone GRPO reward function, the model now receives a direct gradient for "recovering" (achieving milestones) specifically after exogenous events, rather than it being absorbed into general task success.
+---
+## GRPO Independence & Judge Separation ✅
+1. **Decoupled Reward Signals**:
+   - *Architecture update*: The GRPO training pipeline no longer relies on a single environment evaluation for all rewards.
+   - **Static Judges**: `reward_format_fn`, `reward_plausibility_fn`, and `reward_reasoning_fn` now operate through direct JSON parsing and independent semantic verification. They provide gradients for "logical integrity" without needing the simulation engine.
+   - **Empirical Judges**: `reward_task_success_fn` and `reward_milestone_fn` remain tied to the `LifeStackEnv` simulation. They provide gradients for "causal outcome"—ensuring the agent's logic actually works in the simulated world.
+   - **Outcome**: This prevents "signal contamination" where an environment bug or a single gammable path could inflate all reward components simultaneously.
+---
+## Success Logic Reconciliation ✅
+1. **Alignment of Win States**:
+   - *Fix applied*: Updated `compute_task_completion_reward` in `core/reward.py` to use `any()` logic.
+   - **Reasoning**: This reconciles the reward system with the environment's early termination logic. In crises with multiple resolution paths (e.g., selling an asset vs. negotiating a payment plan), the agent now receives full completion credit (1.0) for reaching any valid goal-state, rather than previously being capped at partial credit.

agent/__init__.py ADDED Viewed

File without changes

agent/agent.py ADDED Viewed

	@@ -0,0 +1,289 @@

+import os
+import json
+import copy
+from openai import OpenAI
+from core.life_state import LifeMetrics, ResourceBudget
+from core.metric_schema import format_valid_metrics, normalize_metric_path, is_valid_metric_path
+from agent.conflict_generator import ConflictEvent, generate_conflict
+from core.action_space import AgentAction, PrimaryAction, CommunicationAction, apply_action
+from intake.simperson import SimPerson
+class LifeStackAgent:
+    def __init__(self, local_model_path: str = None, api_only: bool = False):
+        self.api_key = os.getenv('GROQ_API_KEY')
+        self.hf_token = os.getenv('HF_TOKEN')
+        self.api_only = api_only  # if True, always use Groq, never load local model
+        self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
+        # 1. Check for local folder (Kaggle / local dev)
+        if not self.api_only and not self.local_model_path and os.path.exists("./lifestack_model"):
+            self.local_model_path = "./lifestack_model"
+        # 2. Fall back to HuggingFace Hub
+        if not self.api_only and not self.local_model_path:
+            self.local_model_path = "jdsb06/lifestack-agent"
+        # Wire up HF Inference API (Premium Priority - Direct Protocol)
+        from huggingface_hub import InferenceClient
+        self.hf_client = None
+        if self.hf_token:
+            print("🚀 HF_TOKEN found. Prioritizing Direct Hugging Face Inference.")
+            self.hf_client = InferenceClient(token=self.hf_token)
+        self.hf_model = "google/gemma-1.1-2b-it"
+        # Wire up Groq as a fallback
+        if self.api_key:
+            self.client = OpenAI(
+                base_url='https://api.groq.com/openai/v1',
+                api_key=self.api_key
+            )
+        self.model = 'llama-3.3-70b-versatile'
+        self.tokenizer = None
+        self.local_model = None
+        self._model_load_attempted = False
+        self.memory = []  # Will store last 10 decisions
+    def _try_load_model(self):
+        """Attempt to load the local/HF model lazily on first inference call."""
+        self._model_load_attempted = True
+        if not self.local_model_path:
+            return
+        try:
+            print(f"📦 Loading GRPO model from {self.local_model_path}...")
+            import torch
+            from transformers import AutoModelForCausalLM, AutoTokenizer
+            self.tokenizer = AutoTokenizer.from_pretrained(self.local_model_path)
+            self.local_model = AutoModelForCausalLM.from_pretrained(
+                self.local_model_path,
+                torch_dtype=torch.float32,
+                device_map=None
+            )
+            print("✅ GRPO model loaded (CPU mode).")
+        except Exception as e:
+            print(f"⚠️ Failed to load local model: {e}. Falling back to APIs.")
+            self.local_model_path = None
+    def build_prompt(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, few_shot_context: str = "") -> str:
+        # 1. Build Status Board
+        flat = metrics.flatten()
+        status_board = ""
+        domains = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
+        for dom in domains:
+            status_board += f"\n{dom.upper()}:\n"
+            submetrics = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
+            for k, v in submetrics.items():
+                name = k.split('.')[1]
+                icon = "🟢" if v > 70 else ("🟡" if v >= 40 else "🔴")
+                status_board += f"  {icon} {name:20}: {v:.1f}\n"
+        # 2. Build Memory Section
+        memory_str = ""
+        if self.memory:
+            recent = self.memory[-2:]
+            memory_str = "\n--- RECENT HISTORY ---\n"
+            for mem in recent:
+                memory_str += f"Past decision that worked: [{mem['action']}] → reward [{mem['reward']}]\n"
+        prompt = f"""
+ROLE: You are the LifeStack AI Agent. Your goal is to help the user navigate a life crisis.
+CURRENT CONFLICT:
+Title: {conflict.title}
+Story: {conflict.story}
+--- LIFE STATUS BOARD ---
+{status_board}
+--- RESOURCES REMAINING ---
+Time: {budget.time_hours:.1f} hours
+Money: ${budget.money_dollars:.1f}
+Energy: {budget.energy_units:.1f} units
+{memory_str}
+{few_shot_context}
+TASK:
+Choose the best action to address the conflict. Respond ONLY with valid JSON following the schema below.
+SCHEMA:
+{{
+  "action_type": "communicate|rest|delegate|negotiate|spend|reschedule|deprioritize",
+  "target_domain": "career|finances|relationships|physical_health|mental_wellbeing|time",
+  "metric_changes": {{"domain.submetric": "delta_value"}},
+  "resource_cost": {{"time": 0.0, "money": 0.0, "energy": 0.0}},
+  "description": "one sentence action",
+  "recipient": "none|boss|partner|family",
+  "message_content": "text",
+  "reasoning": "strategy explanation"
+}}
+"""
+        return prompt
+    def get_action_for_type(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, forced_type: str, api_only: bool = False) -> "AgentAction":
+        """Generate an action specifically for a given action_type."""
+        force_api = self.api_only or api_only
+        if not force_api and not self._model_load_attempted:
+            self._try_load_model()
+        base_prompt = self.build_prompt(metrics, budget, conflict, person)
+        forced_prompt = base_prompt + f"\n\nCRITICAL REQUIREMENT: You MUST set 'action_type' to exactly '{forced_type}'."
+        return self._get_action_from_prompt(forced_prompt, fallback_type=forced_type, force_api=force_api)
+    def get_action(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, few_shot_context: str = "", api_only: bool = False) -> "AgentAction":
+        # Lazy-load the trained model on first real inference, unless caller forces api_only.
+        force_api = self.api_only or api_only
+        if not force_api and not self._model_load_attempted:
+            self._try_load_model()
+        if not self.local_model and not self.api_key and not self.hf_token:
+            return self._fallback_action("Error: No model configured (set GROQ_API_KEY, HF_TOKEN, or LIFESTACK_MODEL_PATH).")
+        prompt = self.build_prompt(metrics, budget, conflict, person, few_shot_context)
+        return self._get_action_from_prompt(prompt, force_api=force_api)
+    def _get_action_from_prompt(self, prompt: str, fallback_type: str = "rest", force_api: bool = False) -> "AgentAction":
+        """Run LLM inference inside a daemon thread with a hard 25-second timeout."""
+        import threading
+        import time as _t
+        import re
+        result_box = [None]  # thread writes its result here
+        def _call():
+            try:
+                import torch
+                content = None
+                used_model_name = "unknown"
+                if self.local_model and not force_api:
+                    # ── Local / HF Transformers model ─────────────────────
+                    used_model_name = self.local_model_path
+                    inputs = self.tokenizer(prompt, return_tensors="pt").to(self.local_model.device)
+                    with torch.no_grad():
+                        outputs = self.local_model.generate(
+                            **inputs,
+                            max_new_tokens=256,
+                            temperature=0.3,
+                            do_sample=True,
+                            pad_token_id=self.tokenizer.pad_token_id
+                        )
+                    content = self.tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
+                elif self.hf_client:
+                    # ── Hugging Face Inference API (Golden Pool) ──────────
+                    used_model_name = f"hf:{self.hf_model}"
+                    try:
+                        content = self.hf_client.text_generation(
+                            prompt,
+                            model=self.hf_model,
+                            max_new_tokens=350,
+                            temperature=0.3
+                        )
+                        if prompt in content:
+                            content = content.replace(prompt, "").strip()
+                    except Exception as hf_err:
+                        print(f"⚠️ HF Inference Error: {hf_err}. Falling back to Groq.")
+                if content is None:
+                    # ── Groq API Fallback (Llama-3.3-70B) ──────────────────
+                    used_model_name = f"groq:{self.model}"
+                    response = None
+                    for attempt in range(2):
+                        try:
+                            response = self.client.chat.completions.create(
+                                model=self.model,
+                                messages=[{"role": "user", "content": prompt}],
+                                temperature=0.3,
+                                max_tokens=350,
+                                timeout=20,
+                            )
+                            break
+                        except Exception as e:
+                            err = str(e)
+                            if "429" in err and attempt == 0:
+                                wait_secs = 6.0
+                                m = re.search(r'try again in (\d+)m([\d.]+)s', err)
+                                if m: wait_secs = int(m.group(1)) * 60 + float(m.group(2))
+                                elif re.search(r'try again in ([\d.]+)s', err):
+                                    wait_secs = float(re.search(r'try again in ([\d.]+)s', err).group(1))
+                                if wait_secs > 3.0:
+                                    result_box[0] = self._fallback_action(f"Rate limited ({wait_secs:.0f}s).", fallback_type)
+                                    return
+                                _t.sleep(wait_secs)
+                            else: raise
+                    if response:
+                        content = response.choices[0].message.content.strip()
+                if content:
+                    # Parse JSON
+                    if "```json" in content: content = content.split("```json")[-1].split("```")[0].strip()
+                    elif "```" in content: content = content.split("```")[1].split("```")[0].strip()
+                    data = json.loads(content)
+                    metric_changes = {}
+                    for k, v in data.get("metric_changes", {}).items():
+                        norm_key = normalize_metric_path(k)
+                        if is_valid_metric_path(norm_key):
+                            try: metric_changes[norm_key] = float(v)
+                            except (ValueError, TypeError): pass
+                    result_box[0] = AgentAction(
+                        primary=PrimaryAction(
+                            action_type=data.get("action_type", "rest"),
+                            target_domain=data.get("target_domain", "mental_wellbeing"),
+                            metric_changes=metric_changes,
+                            resource_cost=data.get("resource_cost", {}),
+                            description=data.get("description", "Taking a moment.")
+                        ),
+                        communication=CommunicationAction(
+                            recipient=data.get("recipient"),
+                            message_type=data.get("message_type") or "none",
+                            tone=data.get("tone") or "none",
+                            content=data.get("message_content") or ""
+                        ) if data.get("recipient") and data.get("recipient") != "none" else None,
+                        reasoning=data.get("reasoning", "Strategic choice."),
+                        model_used=used_model_name,
+                        raw_completion=content
+                    )
+            except Exception as e:
+                print(f"LLM call error: {e}")
+                result_box[0] = self._fallback_action(f"Exception: {e}", fallback_type)
+        t = threading.Thread(target=_call, daemon=True)
+        t.start()
+        t.join(timeout=25)
+        if result_box[0] is None:
+            return self._fallback_action("LLM timed out.", fallback_type)
+        return result_box[0]
+    def _fallback_action(self, error_msg: str, fallback_type: str = "rest") -> "AgentAction":
+        return AgentAction(
+            primary=PrimaryAction(
+                action_type=fallback_type, target_domain="mental_wellbeing",
+                metric_changes={"mental_wellbeing.stress_level": -5.0},
+                resource_cost={},
+                description="Short breather to regain composure."
+            ),
+            reasoning=f"FALLBACK: {error_msg}"
+        )
+    def store_decision(self, action: AgentAction, reward: float):
+        self.memory.append({'action': action.primary.description, 'reward': round(reward, 3)})
+        if len(self.memory) > 10: self.memory.pop(0)
+def main():
+    if not os.getenv('GROQ_API_KEY'):
+        print("CRITICAL ERROR: GROQ_API_KEY environment variable is not set.")
+        return
+    agent = LifeStackAgent()
+    person = SimPerson(name="Sam (Introvert)", openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9)
+    conflict = generate_conflict(difficulty=3)
+    metrics = LifeMetrics()
+    budget = ResourceBudget()
+    print(f"--- GENERATING ACTION FOR: {conflict.title} ---")
+    action = agent.get_action(metrics, budget, conflict, person)
+    print(f"\nType: {action.primary.action_type} | Reasoning: {action.reasoning}")
+if __name__ == "__main__":
+    main()

agent/conflict_generator.py ADDED Viewed

	@@ -0,0 +1,620 @@

+import json
+import random
+from dataclasses import dataclass, field, asdict
+@dataclass
+class ConflictEvent:
+    id: str
+    title: str
+    story: str
+    primary_disruption: dict
+    decisions_required: list[str]
+    resource_budget: dict
+    difficulty: int
+TEMPLATES = [
+    # DIFFICULTY 1
+    ConflictEvent(
+        id="d1_gym",
+        title="The Slump",
+        story="You haven't seen the inside of a gym in ten days. Your energy is flagging and your favorite jeans feel tight.",
+        primary_disruption={"physical_health.fitness": -15.0},
+        decisions_required=["Wake up early for a run", "Join a weekend boot camp", "Ignore it and rest"],
+        resource_budget={"time": 4.0, "money": 0.0, "energy": 20.0},
+        difficulty=1
+    ),
+    ConflictEvent(
+        id="d1_bill",
+        title="Forgotten Invoice",
+        story="A late notice arrived for your electricity bill. It's not a lot, but the late fee is annoying.",
+        primary_disruption={"finances.liquidity": -20.0},
+        decisions_required=["Pay it now", "Call to dispute the fee", "Set up autopay for next time"],
+        resource_budget={"time": 1.0, "money": 100.0, "energy": 5.0},
+        difficulty=1
+    ),
+    ConflictEvent(
+        id="d1_argument",
+        title="Heated Group Chat",
+        story="A minor political disagreement in the group chat turned personal. Everyone is being quiet now.",
+        primary_disruption={"relationships.social": -20.0},
+        decisions_required=["Apologize to the group", "Message the friend privately", "Mute the chat for a week"],
+        resource_budget={"time": 2.0, "money": 30.0, "energy": 15.0},
+        difficulty=1
+    ),
+    # DIFFICULTY 2
+    ConflictEvent(
+        id="d2_project",
+        title="The Surge",
+        story="Your boss just walked by and dropped a 'small favor' on your desk. It looks like it'll take ten hours.",
+        primary_disruption={"career.workload": 25.0, "time.free_hours_per_week": -20.0},
+        decisions_required=["Work late all week", "Delegate parts to a junior", "Refuse the assignment"],
+        resource_budget={"time": 10.0, "money": 0.0, "energy": 40.0},
+        difficulty=2
+    ),
+    ConflictEvent(
+        id="d2_car",
+        title="Check Engine Light",
+        story="Your car started making a rhythmic thumping sound on the highway. The mechanic says the repair isn't cheap.",
+        primary_disruption={"finances.liquidity": -30.0, "time.commute_burden": 25.0},
+        decisions_required=["Repair it immediately", "Take the bus for a week", "Borrow a car from a friend"],
+        resource_budget={"time": 5.0, "money": 500.0, "energy": 10.0},
+        difficulty=2
+    ),
+    ConflictEvent(
+        id="d2_neglect",
+        title="Cold Dinner",
+        story="Your partner mentions they feel like 'roommates' lately. You realize you haven't had a real conversation in weeks.",
+        primary_disruption={"relationships.romantic": -25.0, "mental_wellbeing.stress_level": 20.0},
+        decisions_required=["Plan a surprise date", "Have a long talk tonight", "Buy a thoughtful gift"],
+        resource_budget={"time": 6.0, "money": 150.0, "energy": 30.0},
+        difficulty=2
+    ),
+    # DIFFICULTY 3
+    ConflictEvent(
+        id="d3_interview",
+        title="The Opportunity",
+        story="An old contact reached out for a dream job interview. You need to prep while keeping your current job afloat.",
+        primary_disruption={"career.workload": 20.0, "time.free_hours_per_week": -15.0, "mental_wellbeing.stress_level": 20.0},
+        decisions_required=["Intensive weekend prep", "Fake a sick day to interview", "Turn it down to stay stable"],
+        resource_budget={"time": 12.0, "money": 50.0, "energy": 50.0},
+        difficulty=3
+    ),
+    ConflictEvent(
+        id="d3_family",
+        title="Family SOS",
+        story="Your sibling is going through a rough patch and needs help moving out and some financial support.",
+        primary_disruption={"relationships.family": 20.0, "time.free_hours_per_week": -25.0, "finances.liquidity": -20.0},
+        decisions_required=["Spend the weekend helping", "Send them money but stay home", "Help them find other movers"],
+        resource_budget={"time": 15.0, "money": 400.0, "energy": 60.0},
+        difficulty=3
+    ),
+    ConflictEvent(
+        id="d3_health",
+        title="The Warning Sign",
+        story="You had a fainting spell at the office. Tests are expensive, and doctors say you need immediate change.",
+        primary_disruption={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0, "finances.liquidity": -40.0},
+        decisions_required=["Take a week of medical leave", "Consult a high-end specialist", "Change diet and sleep habits"],
+        resource_budget={"time": 20.0, "money": 800.0, "energy": 5.0},
+        difficulty=3
+    ),
+    # DIFFICULTY 4
+    ConflictEvent(
+        id="d4_review",
+        title="Judgment Day",
+        story="A major performance review is in three days. Rumors of layoffs are circulating and the atmosphere is tense.",
+        primary_disruption={"career.workload": 30.0, "mental_wellbeing.stress_level": 25.0, "relationships.romantic": -15.0, "time.free_hours_per_week": -20.0},
+        decisions_required=["Pull all-nighters to prove worth", "Start networking for new roles", "Draft a defensive report"],
+        resource_budget={"time": 18.0, "money": 0.0, "energy": 80.0},
+        difficulty=4
+    ),
+    ConflictEvent(
+        id="d4_move",
+        title="The Big Relocation",
+        story="You've decided to move across the country for growth. The logistics are a nightmare and friends are sad to see you go.",
+        primary_disruption={"finances.liquidity": -50.0, "relationships.social": -30.0, "career.growth_trajectory": 20.0, "time.admin_overhead": 30.0},
+        decisions_required=["Hire full-service movers", "Host a series of farewell dinners", "DIY pack everything"],
+        resource_budget={"time": 30.0, "money": 1500.0, "energy": 100.0},
+        difficulty=4
+    ),
+    ConflictEvent(
+        id="d4_audit",
+        title="Tax Audit",
+        story="The IRS has flagged your last three years of returns. You need to dig through thousands of documents while paying a CPA.",
+        primary_disruption={"finances.long_term_health": -20.0, "mental_wellbeing.stress_level": 30.0, "time.admin_overhead": 40.0, "finances.liquidity": -15.0},
+        decisions_required=["Spend nights scanning receipts", "Hire a tax lawyer", "Try to settle immediately"],
+        resource_budget={"time": 25.0, "money": 1000.0, "energy": 40.0},
+        difficulty=4
+    ),
+    # DIFFICULTY 5
+    ConflictEvent(
+        id="d5_friday",
+        title="Friday 6PM",
+        story="Your flight just got cancelled. Your card declined trying to rebook. Your boss moved Monday deadline to Sunday.",
+        primary_disruption={"career.workload": 35.0, "finances.liquidity": -40.0, "mental_wellbeing.stress_level": 30.0, "time.free_hours_per_week": -25.0},
+        decisions_required=["Book a bus and work on it", "Call boss to negotiate", "Crash at a nearby friend's"],
+        resource_budget={"time": 10.0, "money": 500.0, "energy": 60.0},
+        difficulty=5
+    ),
+    ConflictEvent(
+        id="d5_storm",
+        title="The Perfect Storm",
+        story="Your firm lost its biggest client, your partner moved out, and your car got towed—all on the same Tuesday.",
+        primary_disruption={"career.stability": -30.0, "relationships.romantic": -25.0, "finances.debt_pressure": 35.0, "physical_health.energy": -25.0},
+        decisions_required=["Find an emergency side hustle", "Beg partner for a second chance", "Take a mental health day"],
+        resource_budget={"time": 8.0, "money": 200.0, "energy": 20.0},
+        difficulty=5
+    ),
+    ConflictEvent(
+        id="d5_burnout",
+        title="The Total Collapse",
+        story="You can't get out of bed. Your body has quit, your motivation is gone, and work emails are piling into the hundreds.",
+        primary_disruption={"mental_wellbeing.motivation": -40.0, "physical_health.sleep_quality": -30.0, "career.satisfaction": -35.0, "relationships.family": -20.0},
+        decisions_required=["Request indefinite medical leave", "Disconnect all electronics", "Let it all burn and sleep"],
+        resource_budget={"time": 40.0, "money": 2000.0, "energy": 0.0},
+        difficulty=5
+    ),
+    # ── TRANSPORT SCENARIOS (difficulty 1–5, all modes) ──────────────────
+    ConflictEvent(
+        id="d1_flat_tyre",
+        title="Flat Tyre",
+        story="Your bike tyre went flat halfway to work. You're going to be late to a team standup.",
+        primary_disruption={"time.commute_burden": 20.0, "mental_wellbeing.stress_level": 10.0},
+        decisions_required=["Call a cab", "Lock the bike and walk", "Ask to dial into the standup"],
+        resource_budget={"time": 2.0, "money": 30.0, "energy": 15.0},
+        difficulty=1
+    ),
+    ConflictEvent(
+        id="d2_train_delay",
+        title="Train Delay",
+        story="Your morning train is delayed 90 minutes due to a signal failure. You have a 9 AM client meeting.",
+        primary_disruption={"time.commute_burden": 30.0, "career.workload": 15.0, "mental_wellbeing.stress_level": 15.0},
+        decisions_required=["Dial in remotely", "Take a rideshare", "Reschedule the meeting"],
+        resource_budget={"time": 3.0, "money": 80.0, "energy": 20.0},
+        difficulty=2
+    ),
+    ConflictEvent(
+        id="d3_car_breakdown",
+        title="Breakdown on the Highway",
+        story="Your car engine seized on the freeway during rush hour. Tow + rental = $400 minimum.",
+        primary_disruption={"finances.liquidity": -35.0, "time.commute_burden": 40.0, "mental_wellbeing.stress_level": 20.0},
+        decisions_required=["Rent a replacement car", "Rideshare all week", "Borrow from a friend"],
+        resource_budget={"time": 6.0, "money": 500.0, "energy": 30.0},
+        difficulty=3
+    ),
+    ConflictEvent(
+        id="d4_rideshare_surge",
+        title="Surge Pricing Nightmare",
+        story="A major event cancelled all transit. Rideshares are 9x surge. You're presenting in 2 hours.",
+        primary_disruption={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0, "time.free_hours_per_week": -10.0},
+        decisions_required=["Pay the surge", "Organise a carpool", "Present remotely"],
+        resource_budget={"time": 4.0, "money": 200.0, "energy": 40.0},
+        difficulty=4
+    ),
+    ConflictEvent(
+        id="d5_transit_strike",
+        title="City-Wide Transit Strike",
+        story="All buses, trains, and rideshares are on indefinite strike. Your car is in the shop.",
+        primary_disruption={"time.commute_burden": 50.0, "finances.liquidity": -30.0, "career.workload": 20.0, "mental_wellbeing.stress_level": 25.0},
+        decisions_required=["Negotiate remote work for the week", "Rent an e-bike/scooter", "Crash at a colleague's place"],
+        resource_budget={"time": 15.0, "money": 400.0, "energy": 50.0},
+        difficulty=5
+    ),
+]
+def generate_conflict(difficulty: int = None) -> ConflictEvent:
+    if difficulty:
+        pool = [t for t in TEMPLATES if t.difficulty == difficulty]
+    else:
+        pool = TEMPLATES
+    return random.choice(pool)
+def escalate_conflict(conflict: ConflictEvent) -> ConflictEvent:
+    new_disruption = {k: v * 1.4 for k, v in conflict.primary_disruption.items()}
+    new_budget = {k: v * 0.7 for k, v in conflict.resource_budget.items()}
+    new_difficulty = min(5, conflict.difficulty + 1)
+    return ConflictEvent(
+        id=f"{conflict.id}_escalated",
+        title=f"ESCALATED: {conflict.title}",
+        story=f"Current situation just got much worse. {conflict.story}",
+        primary_disruption=new_disruption,
+        decisions_required=conflict.decisions_required,
+        resource_budget=new_budget,
+        difficulty=new_difficulty
+    )
+def adaptive_escalate(conflict: ConflictEvent, agent_history: list) -> tuple:
+    """Decide whether to escalate, ease, or hold based on past performance.
+    Args:
+        conflict: Current conflict event.
+        agent_history: List of (conflict_id, reward) tuples from past episodes.
+    Returns:
+        (new_conflict, reason): Updated conflict and a human-readable reason string.
+    """
+    # Group history by conflict id prefix (strip _escalated suffix)
+    from collections import defaultdict
+    by_type = defaultdict(list)
+    for cid, reward in agent_history:
+        base_id = cid.replace("_escalated", "")
+        by_type[base_id].append(reward)
+    base_id = conflict.id.replace("_escalated", "")
+    past = by_type.get(base_id, [])
+    if len(past) >= 3:
+        avg = sum(past) / len(past)
+        if avg > 0.7:
+            # Agent is crushing this type — escalate
+            escalated = escalate_conflict(conflict)
+            return escalated, f"Agent averaged {avg:.2f} on {base_id} ({len(past)} runs) — escalating"
+        elif avg < 0.4:
+            # Agent is struggling — reduce difficulty
+            new_diff = max(1, conflict.difficulty - 1)
+            eased = generate_conflict(difficulty=new_diff)
+            return eased, f"Agent averaged {avg:.2f} on {base_id} ({len(past)} runs) — easing to difficulty {new_diff}"
+    # Not enough history — no change
+    return conflict, "insufficient history — holding"
+def save_templates():
+    import os
+    data_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "data", "conflicts.json")
+    with open(data_path, 'w') as f:
+        json.dump([asdict(t) for t in TEMPLATES], f, indent=4)
+    print(f"Saved 15 templates to {data_path}")
+def main():
+    save_templates()
+    print("\n--- GENERATED CONFLICT SAMPLES ---")
+    for d in range(1, 6):
+        c = generate_conflict(d)
+        print(f"\n[DIFFICULTY {d}] {c.title}")
+        print(f"Story: {c.story}")
+        print(f"Primary Disruption: {c.primary_disruption}")
+        print(f"Resource Budget: {c.resource_budget}")
+if __name__ == "__main__":
+    main()
+from core.task import Task, Route, ExoEvent, Milestone
+class TaskGenerator:
+    def generate(self, domain: str = None, difficulty: int = None) -> Task:
+        diff = difficulty or 3
+        if domain == "transport_crisis":
+            return self.generate_transport_crisis(diff)
+        elif domain == "flight_crisis":          # kept as explicit sub-type
+            return self.generate_flight_crisis(diff)
+        elif domain == "code_merge_crisis":
+            return self.generate_code_merge_crisis(diff)
+        elif domain == "career":
+            return self.generate_career(diff)
+        elif domain == "finances":
+            return self.generate_finances(diff)
+        elif domain == "relationships":
+            return self.generate_relationships(diff)
+        elif domain == "physical_health":
+            return self.generate_physical_health(diff)
+        elif domain == "mental_wellbeing":
+            return self.generate_mental_wellbeing(diff)
+        elif domain == "time":
+            return self.generate_time(diff)
+        else:
+            return self.generate_transport_crisis(diff)
+    # ── TRANSPORT CRISIS: master dispatcher ──────────────────────────────
+    def generate_transport_crisis(self, difficulty: int) -> Task:
+        """Randomly choose one of 5 real-world transport disruption modes."""
+        return random.choice([
+            self.generate_flight_crisis,
+            self.generate_train_delay,
+            self.generate_car_breakdown,
+            self.generate_rideshare_surge,
+            self.generate_transit_strike,
+        ])(difficulty)
+    def generate_train_delay(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="dial_in",     name="Dial In Remotely",       description="Join the meeting via video call from the station.",          required_action_types=["communicate"],           preconditions={}, consequences={"meeting_attended": True},   closes_routes=["rideshare"],  milestones_unlocked=["m1"], final_reward=2.0),
+            Route(id="rideshare",   name="Take a Rideshare",        description="Pay for a cab/rideshare and make it there in time.",         required_action_types=["spend", "communicate"],  preconditions={}, consequences={"arrived_on_time": True},  closes_routes=["dial_in"],    milestones_unlocked=["m2"], final_reward=2.5),
+            Route(id="reschedule",  name="Reschedule the Meeting",  description="Negotiate a new meeting time with all parties.",            required_action_types=["communicate"],           preconditions={}, consequences={"meeting_rescheduled": True}, closes_routes=[],            milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Meeting attended on time remotely.",             condition_key="meeting_attended",    condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Made it to the office despite the delay.",       condition_key="arrived_on_time",     condition_value=True, reward=1.5),
+            Milestone(id="m3", description="Meeting rescheduled without relationship cost.", condition_key="meeting_rescheduled", condition_value=True, reward=0.8),
+        ]
+        events = [
+            ExoEvent(step=2, probability=0.8, id="delay_extended",  description="Train delay extended by another 45 minutes.",      world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+            ExoEvent(step=4, probability=0.6, id="rideshare_surge", description="Rideshares now showing 3x surge pricing.",          world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+        ]
+        return Task(
+            id="train_delay_task", domain="transport_crisis", goal="Navigate Train Delay Crisis",
+            constraints={"budget_max": 150, "deadline_step": 8},
+            hidden_state={"platform_reassigned": False},
+            mutable_world={"time.commute_burden": 30.0, "mental_wellbeing.stress_level": 15.0},
+            visible_world={"time.commute_burden": 30.0, "mental_wellbeing.stress_level": 15.0},
+            success_conditions=[{"key": "meeting_attended", "value": True}, {"key": "arrived_on_time", "value": True}, {"key": "meeting_rescheduled", "value": True}],
+            failure_conditions=[{"key": "finances.liquidity", "value": 10.0, "op": "lt"}],
+            event_schedule=events, viable_routes=routes, milestones=milestones,
+            horizon=12 + difficulty * 2, difficulty=difficulty,
+            domain_metadata={"story": "Signal failure has brought the entire line to a halt.", "transport_mode": "train"}
+        )
+    def generate_car_breakdown(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="rent_car",       name="Rent a Replacement Car",     description="Call a rental agency and get mobile again.",              required_action_types=["spend", "communicate"], preconditions={}, consequences={"mobile": True},            closes_routes=[],            milestones_unlocked=["m1"], final_reward=2.5),
+            Route(id="rideshare_week", name="Rideshare for the Week",      description="Use rideshares until the car is repaired.",               required_action_types=["spend"],               preconditions={}, consequences={"transport_sorted": True}, closes_routes=["rent_car"],  milestones_unlocked=["m2"], final_reward=1.5),
+            Route(id="borrow_car",     name="Borrow a Friend's Car",       description="Call around and borrow a vehicle.",                        required_action_types=["communicate"],          preconditions={}, consequences={"borrowed": True},         closes_routes=[],            milestones_unlocked=["m3"], final_reward=2.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Replacement vehicle secured.",               condition_key="mobile",            condition_value=True, reward=1.5),
+            Milestone(id="m2", description="Transport plan for the week sorted.",        condition_key="transport_sorted",  condition_value=True, reward=1.0),
+            Milestone(id="m3", description="Vehicle borrowed without relationship cost.", condition_key="borrowed",          condition_value=True, reward=1.2),
+        ]
+        events = [
+            ExoEvent(step=2, probability=1.0, id="repair_estimate",   description="Mechanic confirms repair takes 3–5 days, not 1.",       world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+            ExoEvent(step=5, probability=0.7, id="rental_shortage",   description="Rental agencies report no compact cars available.",    world_mutation={}, hidden_state_mutation={}, closes_routes=["rent_car"]),
+        ]
+        return Task(
+            id="car_breakdown_task", domain="transport_crisis", goal="Recover from Car Breakdown",
+            constraints={"budget_max": 500, "deadline_step": 10},
+            hidden_state={"tow_dispatched": False},
+            mutable_world={"finances.liquidity": -35.0, "time.commute_burden": 40.0},
+            visible_world={"finances.liquidity": -35.0, "time.commute_burden": 40.0},
+            success_conditions=[{"key": "mobile", "value": True}, {"key": "transport_sorted", "value": True}, {"key": "borrowed", "value": True}],
+            failure_conditions=[{"key": "finances.liquidity", "value": 0.0, "op": "le"}],
+            event_schedule=events, viable_routes=routes, milestones=milestones,
+            horizon=14 + difficulty * 2, difficulty=difficulty,
+            domain_metadata={"story": "Engine seized on the highway. Car is in the shop for days.", "transport_mode": "car"}
+        )
+    def generate_rideshare_surge(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="pay_surge",  name="Pay the Surge Price",       description="Absorb the cost and get there on time.",                     required_action_types=["spend"],                          preconditions={}, consequences={"arrived": True},         closes_routes=["remote"], milestones_unlocked=["m1"], final_reward=2.0),
+            Route(id="carpool",    name="Organise a Carpool",         description="Find colleagues or strangers going the same way.",           required_action_types=["communicate", "negotiate"],       preconditions={}, consequences={"carpooled": True},        closes_routes=[],         milestones_unlocked=["m2"], final_reward=3.0),
+            Route(id="remote",     name="Present Remotely",           description="Negotiate to dial in instead of attending in person.",       required_action_types=["communicate"],                    preconditions={}, consequences={"remote_approved": True},  closes_routes=["pay_surge"], milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Arrived at venue on time.",            condition_key="arrived",         condition_value=True, reward=1.5),
+            Milestone(id="m2", description="Carpool arranged — zero cost.",         condition_key="carpooled",       condition_value=True, reward=2.0),
+            Milestone(id="m3", description="Remote attendance approved.",           condition_key="remote_approved", condition_value=True, reward=1.0),
+        ]
+        events = [
+            ExoEvent(step=1, probability=1.0, id="surge_spike",        description="Surge jumped to 12x. All buses cancelled.",         world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+            ExoEvent(step=3, probability=0.9, id="meeting_reminder",   description="Organiser sends a 30-minute warning.",               world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+        ]
+        return Task(
+            id="rideshare_surge_task", domain="transport_crisis", goal="Get to the Presentation on Time",
+            constraints={"budget_max": 200, "deadline_step": 6},
+            hidden_state={},
+            mutable_world={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0},
+            visible_world={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0},
+            success_conditions=[{"key": "arrived", "value": True}, {"key": "carpooled", "value": True}, {"key": "remote_approved", "value": True}],
+            failure_conditions=[],
+            event_schedule=events, viable_routes=routes, milestones=milestones,
+            horizon=8 + difficulty * 2, difficulty=difficulty,
+            domain_metadata={"story": "A major city event caused city-wide rideshare surge on your big presentation day.", "transport_mode": "rideshare"}
+        )
+    def generate_transit_strike(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="wfh_negotiate",  name="Negotiate Full Remote Week",   description="Get manager approval to WFH for the strike duration.",   required_action_types=["communicate", "negotiate"], preconditions={}, consequences={"wfh_approved": True},        closes_routes=[],                  milestones_unlocked=["m1"], final_reward=3.0),
+            Route(id="micromobility",  name="Rent E-Bike / Scooter",         description="Use micro-mobility for the week.",                        required_action_types=["spend"],                    preconditions={}, consequences={"transport_secured": True}, closes_routes=[],                  milestones_unlocked=["m2"], final_reward=2.0),
+            Route(id="colleague_crash",name="Crash at a Colleague's Place",  description="Stay near the office temporarily.",                       required_action_types=["communicate"],              preconditions={}, consequences={"accommodation_sorted": True}, closes_routes=[],             milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="WFH approved for the strike period.",  condition_key="wfh_approved",        condition_value=True, reward=2.0),
+            Milestone(id="m2", description="Micro-mobility solution in place.",     condition_key="transport_secured",   condition_value=True, reward=1.0),
+            Milestone(id="m3", description="Temporary accommodation sorted.",       condition_key="accommodation_sorted",condition_value=True, reward=0.8),
+        ]
+        events = [
+            ExoEvent(step=2, probability=0.9, id="strike_extended",   description="Union announces the strike could last 2 weeks.",         world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+            ExoEvent(step=5, probability=0.7, id="scooter_shortage",  description="E-bike rental companies sold out in your area.",         world_mutation={}, hidden_state_mutation={}, closes_routes=["micromobility"]),
+        ]
+        return Task(
+            id="transit_strike_task", domain="transport_crisis", goal="Survive City-Wide Transit Strike",
+            constraints={"budget_max": 400, "deadline_step": 14},
+            hidden_state={},
+            mutable_world={"time.commute_burden": 50.0, "mental_wellbeing.stress_level": 25.0},
+            visible_world={"time.commute_burden": 50.0, "mental_wellbeing.stress_level": 25.0},
+            success_conditions=[{"key": "wfh_approved", "value": True}, {"key": "transport_secured", "value": True}, {"key": "accommodation_sorted", "value": True}],
+            failure_conditions=[],
+            event_schedule=events, viable_routes=routes, milestones=milestones,
+            horizon=18 + difficulty * 2, difficulty=difficulty,
+            domain_metadata={"story": "All public transport workers walked off the job. The city is gridlocked.", "transport_mode": "transit_strike"}
+        )
+    def generate_flight_crisis(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="rebook_premium", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], preconditions={}, consequences={"flight_rebooked": True}, closes_routes=["wait_lounge"], milestones_unlocked=["m1"], final_reward=2.5),
+            Route(id="wait_lounge", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], preconditions={}, consequences={"caught_up": True}, closes_routes=["rebook_premium"], milestones_unlocked=["m2"], final_reward=1.8),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="flight_rebooked", condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="caught_up", condition_value=True, reward=0.8),
+        ]
+        events = [
+            ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+            ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={}, hidden_state_mutation={}, closes_routes=["wait_lounge"]),
+        ]
+        return Task(
+            id="flight_crisis_task", domain="flight_crisis", goal="Survive Airport Cancellation",
+            constraints={"budget_max": 800, "deadline_step": 10},
+            hidden_state={"lounge_capacity": 100},
+            mutable_world={"mental_wellbeing.stress_level": 25.0, "time.free_hours_per_week": -10.0},
+            visible_world={"mental_wellbeing.stress_level": 25.0, "time.free_hours_per_week": -10.0},
+            success_conditions=[{"key": "flight_rebooked", "value": True}, {"key": "caught_up", "value": True}],
+            failure_conditions=[],
+            event_schedule=events, viable_routes=routes, milestones=milestones,
+            horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "A major storm grounded commercial flights."}
+        )
+    def generate_code_merge_crisis(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="revert_commit", name="Revert Commit", description="Quickly revert the broken merge to unblock the team.", required_action_types=["delegate", "communicate"], preconditions={}, consequences={"pipeline_unblocked": True}, closes_routes=["hotfix"], milestones_unlocked=["unblocked"], final_reward=1.5),
+            Route(id="hotfix", name="Patch Forward", description="Find the logic error and push a hotfix.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"bug_resolved": True}, closes_routes=["revert_commit"], milestones_unlocked=["fixed"], final_reward=3.0),
+        ]
+        milestones = [
+            Milestone(id="unblocked", description="CI pipeline is green again", condition_key="pipeline_unblocked", condition_value=True, reward=1.0),
+            Milestone(id="fixed", description="Bug resolved without losing features", condition_key="bug_resolved", condition_value=True, reward=2.0),
+        ]
+        events = [
+            ExoEvent(step=3, probability=0.8, id="cto_ping", description="CTO asks for an ETA on the fix.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+        ]
+        return Task(
+            id="code_merge_task", domain="code_merge_crisis", goal="Resolve Production Outage",
+            constraints={"budget_max": 1000, "deadline_step": 8},
+            hidden_state={},
+            mutable_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
+            visible_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
+            success_conditions=[{"key": "pipeline_unblocked", "value": True}, {"key": "bug_resolved", "value": True}],
+            failure_conditions=[],
+            event_schedule=events, viable_routes=routes, milestones=milestones,
+            horizon=10 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "A botched merge just took down the staging environment."}
+        )
+    def generate_career(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Negotiate Workload", description="Discuss with manager to reduce workload.", required_action_types=["communicate"], preconditions={}, consequences={"workload_reduced": True}, closes_routes=["r2"], milestones_unlocked=["m1"], final_reward=2.0),
+            Route(id="r2", name="Find New Job", description="Start applying for new roles.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"job_found": True}, closes_routes=["r1", "r3"], milestones_unlocked=["m2"], final_reward=3.0),
+            Route(id="r3", name="Delegate to Team", description="Push tasks to junior colleagues.", required_action_types=["delegate"], preconditions={}, consequences={"team_delegated": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Manager agreed to reduce tasks.", condition_key="workload_reduced", condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Interview secured.", condition_key="job_found", condition_value=True, reward=1.5),
+            Milestone(id="m3", description="Tasks successfully delegated.", condition_key="team_delegated", condition_value=True, reward=0.8),
+        ]
+        events = [
+            ExoEvent(step=3, probability=0.7, id="boss_asks", description="Boss asks for progress on current tasks.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="career_crisis", domain="career", goal="Manage Career Overload", constraints={"budget_max": 500, "deadline_step": 12},
+            hidden_state={},
+            mutable_world={"career.workload": 30.0, "time.free_hours_per_week": -20.0},
+            visible_world={"career.workload": 30.0, "time.free_hours_per_week": -20.0},
+            success_conditions=[{"key": "workload_reduced", "value": True}, {"key": "job_found", "value": True}, {"key": "team_delegated", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Severe workload is threatening your career stability."}
+        )
+    def generate_finances(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Emergency Fund", description="Dip into savings.", required_action_types=["spend"], preconditions={}, consequences={"used_emergency": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=1.0),
+            Route(id="r2", name="Negotiate Payment Plan", description="Call the creditor to delay payments.", required_action_types=["communicate"], preconditions={}, consequences={"payment_plan": True}, closes_routes=["r1"], milestones_unlocked=["m2"], final_reward=2.5),
+            Route(id="r3", name="Sell Asset", description="Liquidate an asset for quick cash.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"asset_sold": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Emergency fund accessed.", condition_key="used_emergency", condition_value=True, reward=0.5),
+            Milestone(id="m2", description="Favorable payment plan negotiated.", condition_key="payment_plan", condition_value=True, reward=1.0),
+            Milestone(id="m3", description="Asset successfully sold.", condition_key="asset_sold", condition_value=True, reward=0.8),
+        ]
+        events = [
+            ExoEvent(step=2, probability=0.9, id="late_fee", description="A late fee was applied to the balance.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="finance_crisis", domain="finances", goal="Resolve Financial Pressure", constraints={"budget_max": 1000, "deadline_step": 10},
+            hidden_state={},
+            mutable_world={"finances.liquidity": -40.0, "finances.debt_pressure": 20.0},
+            visible_world={"finances.liquidity": -40.0, "finances.debt_pressure": 20.0},
+            success_conditions=[{"key": "used_emergency", "value": True}, {"key": "payment_plan", "value": True}, {"key": "asset_sold", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "An unexpected expense has caused financial strain."}
+        )
+    def generate_relationships(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Couples Therapy", description="Book a session with a therapist.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"therapy_scheduled": True}, closes_routes=["r3"], milestones_unlocked=["m1"], final_reward=3.0),
+            Route(id="r2", name="Honest Conversation", description="Sit down and talk through issues.", required_action_types=["communicate"], preconditions={}, consequences={"had_conversation": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
+            Route(id="r3", name="Give Space", description="Take some time apart.", required_action_types=["rest"], preconditions={}, consequences={"giving_space": True}, closes_routes=["r1", "r2"], milestones_unlocked=["m3"], final_reward=1.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Therapy session completed.", condition_key="therapy_scheduled", condition_value=True, reward=1.5),
+            Milestone(id="m2", description="A productive conversation occurred.", condition_key="had_conversation", condition_value=True, reward=1.0),
+            Milestone(id="m3", description="Space given without escalation.", condition_key="giving_space", condition_value=True, reward=0.5),
+        ]
+        events = [
+            ExoEvent(step=4, probability=0.6, id="partner_escalates", description="Partner sends an emotional text msg.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="relationship_crisis", domain="relationships", goal="Repair Relationship Friction", constraints={"budget_max": 800, "deadline_step": 14},
+            hidden_state={},
+            mutable_world={"relationships.romantic": -30.0, "mental_wellbeing.stress_level": 20.0},
+            visible_world={"relationships.romantic": -30.0, "mental_wellbeing.stress_level": 20.0},
+            success_conditions=[{"key": "therapy_scheduled", "value": True}, {"key": "had_conversation", "value": True}, {"key": "giving_space", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Growing distance and recent conflicts demand attention."}
+        )
+    def generate_physical_health(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Medical Leave", description="Request time off to recover.", required_action_types=["communicate", "rest"], preconditions={}, consequences={"on_leave": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.5),
+            Route(id="r2", name="See Specialist", description="Pay for a top-tier medical consultation.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"saw_doctor": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
+            Route(id="r3", name="Lifestyle Change", description="Commit to better diet and sleep.", required_action_types=["rest"], preconditions={}, consequences={"lifestyle_changed": True}, closes_routes=["r1"], milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Leave approved.", condition_key="on_leave", condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Clear diagnosis received.", condition_key="saw_doctor", condition_value=True, reward=1.0),
+            Milestone(id="m3", description="First week of new habits complete.", condition_key="lifestyle_changed", condition_value=True, reward=0.5),
+        ]
+        events = [
+            ExoEvent(step=3, probability=0.8, id="doctor_call", description="The clinic calls with test results.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="health_crisis", domain="physical_health", goal="Address Health Warning", constraints={"budget_max": 1500, "deadline_step": 15},
+            hidden_state={},
+            mutable_world={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0},
+            visible_world={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0},
+            success_conditions=[{"key": "on_leave", "value": True}, {"key": "saw_doctor", "value": True}, {"key": "lifestyle_changed", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Physical symptoms are becoming impossible to ignore."}
+        )
+    def generate_mental_wellbeing(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Professional Therapy", description="Start regular therapy sessions.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"therapy_started": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=3.0),
+            Route(id="r2", name="Disconnect", description="Take a full digital detox break.", required_action_types=["rest"], preconditions={}, consequences={"disconnected": True}, closes_routes=["r3"], milestones_unlocked=["m2"], final_reward=1.5),
+            Route(id="r3", name="Medication Evaluation", description="See a psychiatrist for options.", required_action_types=["spend"], preconditions={}, consequences={"medication_taken": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=2.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Meaningful breakthrough in therapy.", condition_key="therapy_started", condition_value=True, reward=1.5),
+            Milestone(id="m2", description="Successfully unplugged for 48 hours.", condition_key="disconnected", condition_value=True, reward=0.8),
+            Milestone(id="m3", description="Prescription acquired.", condition_key="medication_taken", condition_value=True, reward=1.0),
+        ]
+        events = [
+            ExoEvent(step=2, probability=0.5, id="panic_attack", description="A sudden wave of severe anxiety hits.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="mental_crisis", domain="mental_wellbeing", goal="Avert Total Burnout", constraints={"budget_max": 600, "deadline_step": 12},
+            hidden_state={},
+            mutable_world={"mental_wellbeing.motivation": -35.0, "mental_wellbeing.stress_level": 40.0},
+            visible_world={"mental_wellbeing.motivation": -35.0, "mental_wellbeing.stress_level": 40.0},
+            success_conditions=[{"key": "therapy_started", "value": True}, {"key": "disconnected", "value": True}, {"key": "medication_taken", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Complete exhaustion and loss of motivation."}
+        )
+    def generate_time(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Reprioritize", description="Restructure calendar and say 'no'.", required_action_types=["communicate"], preconditions={}, consequences={"priorities_reset": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.0),
+            Route(id="r2", name="Delegate", description="Pay someone or ask for help with chores.", required_action_types=["spend", "delegate"], preconditions={}, consequences={"tasks_delegated": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=1.5),
+            Route(id="r3", name="Cancel Commitments", description="Drop out of major upcoming events.", required_action_types=["communicate"], preconditions={}, consequences={"commitments_cancelled": True}, closes_routes=["r1"], milestones_unlocked=["m3"], final_reward=1.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Calendar cleared of non-essentials.", condition_key="priorities_reset", condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Help secured for daily tasks.", condition_key="tasks_delegated", condition_value=True, reward=0.8),
+            Milestone(id="m3", description="Social obligations cancelled.", condition_key="commitments_cancelled", condition_value=True, reward=0.5),
+        ]
+        events = [
+            ExoEvent(step=3, probability=0.9, id="new_request", description="A friend asks for an 'urgent' favor.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="time_crisis", domain="time", goal="Regain Time Control", constraints={"budget_max": 300, "deadline_step": 10},
+            hidden_state={},
+            mutable_world={"time.free_hours_per_week": -25.0, "time.admin_overhead": 20.0},
+            visible_world={"time.free_hours_per_week": -25.0, "time.admin_overhead": 20.0},
+            success_conditions=[{"key": "priorities_reset", "value": True}, {"key": "tasks_delegated", "value": True}, {"key": "commitments_cancelled", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "You are double-booked and drowning in obligations."}
+        )

agent/conflict_predictor.py ADDED Viewed

	@@ -0,0 +1,142 @@

+"""
+conflict_predictor.py — Proactive intelligence and trajectory forecasting
+"""
+import copy
+from core.life_state import LifeMetrics, DependencyGraph
+class ConflictPredictor:
+    def __init__(self):
+        self.graph = DependencyGraph()
+        self.snapshots = [] # list of flattened LifeMetrics dicts
+        self.MAX_HISTORY = 10
+        self.INVERSE_METRICS = {
+            "mental_wellbeing.stress_level",
+            "career.workload",
+            "finances.debt_pressure",
+            "time.commute_burden",
+            "time.admin_overhead"
+        }
+    def add_snapshot(self, metrics: LifeMetrics) -> None:
+        self.snapshots.append(metrics.flatten())
+        if len(self.snapshots) > self.MAX_HISTORY:
+            self.snapshots.pop(0)
+    def compute_trajectory(self, metric_path: str) -> float:
+        if len(self.snapshots) < 3:
+            return 0.0
+        # Use last 5 snapshots maximum
+        n = min(5, len(self.snapshots))
+        y = [s.get(metric_path, 0.0) for s in self.snapshots[-n:]]
+        x = list(range(n))
+        # Simple linear regression: slope = Cov(x, y) / Var(x)
+        mean_y = sum(y) / n
+        mean_x = sum(x) / n
+        cov_xy = sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y))
+        var_x = sum((x_i - mean_x) ** 2 for x_i in x)
+        if var_x == 0:
+            return 0.0
+        return cov_xy / var_x
+    def predict_crisis(self, horizon_days: int = 7) -> list:
+        if not self.snapshots:
+            return []
+        current = self.snapshots[-1]
+        warnings = []
+        for metric, val in current.items():
+            slope = self.compute_trajectory(metric)
+            if slope == 0.0:
+                continue
+            projected = val + (slope * horizon_days)
+            is_inverse = metric in self.INVERSE_METRICS
+            # Normal metric: Critical is low (<30), Warning is low (<45)
+            # Inverse metric: Critical is high (>70), Warning is high (>55)
+            critical_now = (val > 70) if is_inverse else (val < 30)
+            warning_now = (val > 55) if is_inverse else (val < 45)
+            critical_proj = (projected > 70) if is_inverse else (projected < 30)
+            warning_proj = (projected > 55) if is_inverse else (projected < 45)
+            worse_direction = (slope > 0) if is_inverse else (slope < 0)
+            if worse_direction and (critical_proj or warning_proj):
+                threshold = 70.0 if is_inverse else 30.0
+                days_until_crit = (threshold - val) / slope if slope != 0 else float('inf')
+                if critical_now:
+                    days_until_crit = 0.0
+                severity = 'crisis' if critical_proj else 'warning'
+                direction_word = "rising" if slope > 0 else "declining"
+                friendly_name = metric.split('.')[-1].replace('_', ' ')
+                if severity == 'crisis':
+                    msg = f"{friendly_name} will hit critical levels in {max(0, int(days_until_crit))} days."
+                else:
+                    msg = f"{friendly_name} has been {direction_word} ({slope:+.1f}/day) — warning levels likely within {horizon_days} days."
+                warnings.append({
+                    "metric": metric,
+                    "current_value": val,
+                    "projected_value": projected,
+                    "days_until_critical": max(0.0, days_until_crit),
+                    "severity": severity,
+                    "message": msg
+                })
+        # Sort by urgency (days until critical)
+        warnings.sort(key=lambda x: x['days_until_critical'])
+        return warnings
+    def get_prediction_summary(self) -> str:
+        warnings = self.predict_crisis()
+        if not warnings:
+            return "Your life metrics are stable. No immediate crises predicted."
+        messages = [w['message'] for w in warnings]
+        return "Based on your current trajectory: " + " ".join(messages[:3]) + ("" if len(messages) <= 3 else " (+ more warnings hidden).")
+    def get_risk_score(self) -> float:
+        warnings = self.predict_crisis()
+        if not warnings:
+            return 0.0
+        score = 0.0
+        for w in warnings:
+            if w['severity'] == 'crisis':
+                score += 0.3
+            else:
+                score += 0.1
+        return min(1.0, score)
+def main():
+    import random
+    predictor = ConflictPredictor()
+    print("Simulating 5 days of accumulating stress and declining sleep...\n")
+    current_state = LifeMetrics()
+    for i in range(5):
+        current_state.mental_wellbeing.stress_level += 5.0 + random.uniform(0, 2)
+        current_state.physical_health.sleep_quality -= 4.0 + random.uniform(0, 2)
+        current_state.time.free_hours_per_week -= 1.0 + random.uniform(0, 1)
+        predictor.add_snapshot(current_state)
+        print(f"Day {i+1}: Stress={current_state.mental_wellbeing.stress_level:.1f}, Sleep={current_state.physical_health.sleep_quality:.1f}")
+    print("\n--- PREDICTION AFTER 5 DAYS ---")
+    print(f"Risk Score: {predictor.get_risk_score():.2f}")
+    print("Summary:")
+    print(predictor.get_prediction_summary())
+if __name__ == '__main__':
+    main()

agent/counterfactuals.py ADDED Viewed

	@@ -0,0 +1,106 @@

+"""
+counterfactuals.py — Generates alternative "What If" scenarios for LifeStack agent decisions.
+"""
+import copy
+import random
+from core.reward import compute_reward
+from core.life_state import DependencyGraph
+def generate_counterfactuals(agent, metrics, budget, conflict, person, chosen_action):
+    """
+    Simulates 3 alternative action types and compares them to the agent's choice.
+    Returns a list of dicts with alternative outcomes.
+    """
+    action_types = ["communicate", "rest", "delegate", "negotiate", "spend", "reschedule", "deprioritize"]
+    chosen_type = chosen_action.primary.action_type
+    # Filter and pick 3 different types
+    alternatives = [t for t in action_types if t != chosen_type]
+    random.shuffle(alternatives)
+    target_types = alternatives[:3]
+    results = []
+    graph = DependencyGraph()
+    for action_type in target_types:
+        try:
+            # 1. Generate alternative action
+            # We use the special forced-type method we added to the agent
+            alt_action = agent.get_action_for_type(metrics, budget, conflict, person, action_type)
+            # 2. Simulate applying it
+            current_stress = metrics.mental_wellbeing.stress_level
+            uptake = person.respond_to_action(
+                alt_action.primary.action_type,
+                alt_action.primary.resource_cost,
+                current_stress
+            )
+            state_after = copy.deepcopy(metrics)
+            for path, delta in alt_action.primary.metric_changes.items():
+                if "." not in path: continue
+                try:
+                    scaled_delta = float(delta) * uptake
+                except (ValueError, TypeError):
+                    continue
+                if abs(scaled_delta) > 5:
+                    state_after = graph.cascade(state_after, {path: scaled_delta})
+                else:
+                    dom, sub = path.split('.')
+                    d = getattr(state_after, dom, None)
+                    if d:
+                        cur = getattr(d, sub, 70.0)
+                        setattr(d, sub, max(0.0, min(100.0, cur + scaled_delta)))
+            # 3. Compute Reward
+            reward, breakdown = compute_reward(metrics, state_after, alt_action.primary.resource_cost, 1)
+            # 4. Analysis deltas
+            flat_before = metrics.flatten()
+            flat_after = state_after.flatten()
+            deltas = {k: flat_after[k] - flat_before[k] for k in flat_after}
+            # Filter for meaningful changes (>1.0)
+            significant = {k: v for k, v in deltas.items() if abs(v) > 1.0}
+            trade_off = ""
+            if significant:
+                best = max(significant.items(), key=lambda x: x[1])
+                worst = min(significant.items(), key=lambda x: x[1])
+                b_name = best[0].split('.')[-1].replace('_', ' ')
+                if best[1] > 2:
+                    trade_off = f"Better {b_name} (+{best[1]:.0f})"
+                else:
+                    trade_off = f"Stability in {b_name}"
+                if worst[1] < -2:
+                    w_name = worst[0].split('.')[-1].replace('_', ' ')
+                    trade_off += f" but drops {w_name} ({worst[1]:.0f})"
+                else:
+                    trade_off += " but mission impact is lower than optimal."
+            else:
+                trade_off = "Minimal impact on core life metrics."
+            # Incorporate resource commentary
+            cost = alt_action.primary.resource_cost
+            if cost.get('money', 0) > 100:
+                trade_off += f" (${cost['money']:.0f} cost)"
+            elif cost.get('time', 0) > 4:
+                trade_off += f" ({cost['time']:.1f}h time drain)"
+            results.append({
+                "action_type": action_type,
+                "description": alt_action.primary.description,
+                "reward": reward,
+                "trade_off": trade_off,
+                "uptake": uptake,
+                "metrics": state_after.flatten(),
+            })
+        except Exception as e:
+            print(f"Error in counterfactual generation for {action_type}: {e}")
+    return results

agent/memory.py ADDED Viewed

	@@ -0,0 +1,394 @@

+import os
+import chromadb
+from sentence_transformers import SentenceTransformer
+import uuid
+import math
+from datetime import datetime
+from collections import defaultdict
+from typing import Optional
+class LifeStackMemory:
+    def __init__(self, silent: bool = False, path: str = "./lifestack_memory"):
+        self.client = chromadb.PersistentClient(path=path)
+        self.collection = self.client.get_or_create_collection(name='decisions')
+        self.traj_collection = self.client.get_or_create_collection(name='trajectories')
+        self.feedback_collection = self.client.get_or_create_collection(name='feedback') # New for OutcomeFeedback
+        self.silent = silent
+        self.encoder = self._load_encoder()
+        if not self.silent:
+            print("Memory system initialized")
+        # Auto-hydrate if empty
+        if self.collection.count() == 0:
+            self._hydrate_from_preseeded()
+    def _hydrate_from_preseeded(self):
+        import json
+        sources = ["./data/preseeded_memory_p1.json", "./data/preseeded_memory_p2.json"]
+        if not self.silent:
+            print(f"🧬 Empty memory detected. Hydrating from partitioned volumes...")
+        total_decisions = 0
+        for path in sources:
+            if not os.path.exists(path):
+                continue
+            try:
+                with open(path, 'r') as f:
+                    data = json.load(f)
+                # Hydrate decisions
+                d = data.get("decisions", {})
+                if d.get("ids"):
+                    self.collection.add(
+                        ids=d["ids"],
+                        documents=d["documents"],
+                        metadatas=d["metadatas"],
+                        embeddings=d["embeddings"]
+                    )
+                    total_decisions += len(d["ids"])
+            except Exception as e:
+                if not self.silent:
+                    print(f"⚠️ Hydration failed for {path}: {e}")
+        if not self.silent:
+            print(f"✅ Hydration complete: {total_decisions} memories restored.")
+    def _load_encoder(self):
+        try:
+            return SentenceTransformer('all-MiniLM-L6-v2', local_files_only=True)
+        except Exception as exc:
+            if not self.silent:
+                print(f"Falling back to local hash embeddings: {exc}")
+            return None
+    def _embed_text(self, text: str) -> list[float]:
+        if self.encoder is not None:
+            return self.encoder.encode(text).tolist()
+        import zlib
+        buckets = [0.0] * 384
+        for token in text.lower().split():
+            idx = zlib.adler32(token.encode()) % len(buckets)
+            buckets[idx] += 1.0
+        norm = math.sqrt(sum(v * v for v in buckets)) or 1.0
+        return [v / norm for v in buckets]
+    def store_decision(
+        self,
+        conflict_title: str,
+        action_type: str,
+        target_domain: str,
+        reward: float,
+        metrics_snapshot: dict,
+        reasoning: str,
+        trajectory: list[dict] = None,
+        route_outcome: str = None
+    ) -> None:
+        """Stores individual decision for longitudinal tracking."""
+        text = f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}"
+        embedding = self._embed_text(text)
+        doc_id = str(uuid.uuid4())
+        self.collection.add(
+            ids=[doc_id],
+            embeddings=[embedding],
+            documents=[text],
+            metadatas=[{
+                "conflict_title": conflict_title,
+                "action_type": action_type,
+                "target_domain": target_domain,
+                "reward": float(reward),
+                "reasoning": reasoning,
+                "route_outcome": route_outcome or "",
+                "timestamp": datetime.now().isoformat()
+            }]
+        )
+    def store_trajectory(
+        self,
+        conflict_title: str = None,
+        route_taken: str = None,
+        total_reward: float = 0.0,
+        metrics_diff_str: str = None,
+        reasoning: str = None,
+        task_id: str = None,
+        trajectory_summary: dict = None
+    ) -> None:
+        """Stores a full trajectory summary."""
+        if trajectory_summary is not None and task_id is not None:
+            import json
+            text = f"Task: {task_id} Route: {route_taken} Reward: {total_reward:.2f} Hits: {len(trajectory_summary.get('milestones_hit', []))}"
+            embedding = self._embed_text(text)
+            doc_id = str(uuid.uuid4())
+            self.traj_collection.add(
+                ids=[doc_id],
+                embeddings=[embedding],
+                documents=[text],
+                metadatas=[{
+                    "task_id": task_id,
+                    "route_taken": route_taken,
+                    "reward": total_reward,
+                    "summary": json.dumps(trajectory_summary),
+                    "timestamp": datetime.now().isoformat()
+                }]
+            )
+            if not self.silent:
+                print(f"Stored task trajectory: {route_taken} (reward: {total_reward:.2f})")
+            return
+        # Fallback to older signature logic
+        text = f"{conflict_title} Route: {route_taken} Diff: {metrics_diff_str} {reasoning[:100]}"
+        embedding = self._embed_text(text)
+        doc_id = str(uuid.uuid4())
+        self.collection.add(
+            ids=[doc_id],
+            embeddings=[embedding],
+            documents=[text],
+            metadatas=[{
+                "conflict_title": conflict_title,
+                "route_taken": route_taken,
+                "metrics_diff": metrics_diff_str,
+                "reward": total_reward,
+                "reasoning": reasoning,
+                "timestamp": datetime.now().isoformat()
+            }]
+        )
+        if not self.silent:
+            print(f"Stored trajectory fallback: {route_taken} (reward: {total_reward:.2f})")
+    def store_feedback(self, feedback) -> None:
+        """Stores OutcomeFeedback linked to a specific episode."""
+        import json
+        text = f"Episode: {feedback.episode_id} Effectiveness: {feedback.overall_effectiveness} Resolution: {feedback.resolution_time_hours}h"
+        embedding = self._embed_text(text)
+        doc_id = f"fb_{feedback.episode_id}"
+        self.feedback_collection.add(
+            ids=[doc_id],
+            embeddings=[embedding],
+            documents=[text],
+            metadatas=[{
+                "episode_id": feedback.episode_id,
+                "effectiveness": feedback.overall_effectiveness,
+                "domains_improved": json.dumps(feedback.domains_improved),
+                "domains_worsened": json.dumps(feedback.domains_worsened),
+                "unexpected_effects": feedback.unexpected_effects,
+                "resolution_time": feedback.resolution_time_hours,
+                "timestamp": feedback.submitted_at.isoformat()
+            }]
+        )
+        if not self.silent:
+            print(f"Stored human feedback for episode {feedback.episode_id}")
+    def retrieve_feedback(self, episode_id: str) -> Optional[dict]:
+        """Retrieves feedback for a specific episode."""
+        import json
+        doc_id = f"fb_{episode_id}"
+        results = self.feedback_collection.get(ids=[doc_id])
+        if not results['metadatas']:
+            return None
+        meta = results['metadatas'][0]
+        # Deserialize lists
+        meta["domains_improved"] = json.loads(meta["domains_improved"])
+        meta["domains_worsened"] = json.loads(meta["domains_worsened"])
+        return meta
+    def retrieve_similar_trajectories(self, task_domain: str, current_world: dict, n: int = 3) -> list[dict]:
+        """Retrieve similar trajectories based on task domain and current world state."""
+        import json
+        if self.traj_collection.count() == 0:
+            return []
+        sorted_metrics = sorted(current_world.items(), key=lambda x: x[1] if isinstance(x[1], (int, float)) else 0)
+        top_stressed = " ".join(f"{k}:{v}" for k, v in sorted_metrics[:3])
+        query_text = f"TaskDomain: {task_domain} {top_stressed}"
+        query_embedding = self._embed_text(query_text)
+        results = self.traj_collection.query(
+            query_embeddings=[query_embedding],
+            n_results=min(n, self.traj_collection.count())
+        )
+        output = []
+        for i, meta in enumerate(results['metadatas'][0]):
+            output.append({
+                "task_id": meta.get("task_id", ""),
+                "route_taken": meta.get("route_taken", ""),
+                "reward": meta.get("reward", 0.0),
+                "summary": json.loads(meta.get("summary", "{}")),
+            })
+        return output
+    def retrieve_similar(self, conflict_title: str, current_metrics: dict, n: int = 3) -> list[dict]:
+        """Retrieves the n most similar past high-reward decisions using semantic search."""
+        if self.collection.count() == 0:
+            return []
+        # Build query from conflict title + 3 most stressed metrics (lowest values)
+        sorted_metrics = sorted(current_metrics.items(), key=lambda x: x[1])
+        top_stressed = " ".join(f"{k}:{v:.0f}" for k, v in sorted_metrics[:3])
+        query_text = f"{conflict_title} {top_stressed}"
+        query_embedding = self._embed_text(query_text)
+        results = self.collection.query(
+            query_embeddings=[query_embedding],
+            n_results=min(n * 2, self.collection.count()) # Retrieve more to filter for high reward
+        )
+        output = []
+        for i, meta in enumerate(results['metadatas'][0]):
+            if meta.get("reward", 0.0) < 0.05: # Filter out negative/zero reward decisions
+                continue
+            if len(output) >= n:
+                break
+            distance = results['distances'][0][i]
+            similarity = round(1.0 / (1.0 + distance), 4)
+            output.append({
+                "route_taken": meta.get("route_taken", ""),
+                "action_type": meta.get("action_type", ""),
+                "target_domain": meta.get("target_domain", ""),
+                "metrics_diff": meta.get("metrics_diff", ""),
+                "reward": meta.get("reward", 0.0),
+                "reasoning": meta.get("reasoning", ""),
+                "similarity_score": similarity
+            })
+        return output
+    def build_few_shot_prompt(self, conflict_title: str, current_metrics: dict) -> str:
+        """Formats retrieved memories into a few-shot prompt block for the LLM."""
+        memories = self.retrieve_similar(conflict_title, current_metrics)
+        if not memories:
+            return ""
+        lines = ["Past successful trajectories in similar situations:\n"]
+        for m in memories:
+            short_reason = m['reasoning'][:80]
+            lines.append(
+                f"  Route [{m['route_taken']}] → impact [{m['metrics_diff']}] → total reward {m['reward']:.2f} "
+                f"(reasoning: {short_reason}...)"
+            )
+        return "\n".join(lines)
+    def get_stats(self) -> dict:
+        """Returns memory stats: total count, average reward, and route details."""
+        if self.collection.count() == 0:
+            return {"total_memories": 0, "average_reward": 0.0, "by_route": {}}
+        all_records = self.collection.get(include=["metadatas"])
+        metadatas = all_records["metadatas"]
+        total = len(metadatas)
+        avg_reward = sum(m.get("reward", 0.0) for m in metadatas) / total
+        by_route = defaultdict(int)
+        for m in metadatas:
+            route = m.get("route_taken") or m.get("route_outcome") or "unknown"
+            first_action = route.split(' ')[0] if route else "unknown"
+            by_route[first_action] += 1
+        return {
+            "total_memories": total,
+            "average_reward": round(avg_reward, 3),
+            "by_action_type": dict(by_route)
+        }
+def main():
+    memory = LifeStackMemory()
+    # --- Synthetic Decisions: mix of high and low reward ---
+    synthetic = [
+        {
+            "conflict_title": "Friday 6PM",
+            "action_type": "negotiate",
+            "target_domain": "career",
+            "reward": 0.72,
+            "metrics_snapshot": {"career.workload": 100, "mental_wellbeing.stress_level": 95},
+            "reasoning": "Negotiating the deadline directly reduced workload pressure quickly."
+        },
+        {
+            "conflict_title": "Friday 6PM",
+            "action_type": "rest",
+            "target_domain": "mental_wellbeing",
+            "reward": 0.61,
+            "metrics_snapshot": {"mental_wellbeing.stress_level": 95, "physical_health.energy": 40},
+            "reasoning": "A short rest during peak stress restored energy before tackling logistics."
+        },
+        {
+            "conflict_title": "The Perfect Storm",
+            "action_type": "communicate",
+            "target_domain": "relationships",
+            "reward": 0.58,
+            "metrics_snapshot": {"relationships.romantic": 45, "mental_wellbeing.emotional_stability": 50},
+            "reasoning": "A quick reassuring call prevented relationship collapse under crisis."
+        },
+        {
+            "conflict_title": "The Perfect Storm",
+            "action_type": "delegate",
+            "target_domain": "career",
+            "reward": 0.38,  # Below threshold — should NOT be stored
+            "metrics_snapshot": {"career.workload": 90, "career.stability": 55},
+            "reasoning": "Attempted to delegate but the neurotic profile made it ineffective."
+        },
+        {
+            "conflict_title": "Health Scare",
+            "action_type": "rest",
+            "target_domain": "physical_health",
+            "reward": 0.80,
+            "metrics_snapshot": {"physical_health.energy": 20, "mental_wellbeing.stress_level": 90},
+            "reasoning": "Aggressive rest protocol dramatically recovered energy and clarity."
+        },
+        {
+            "conflict_title": "Check Engine Light",
+            "action_type": "spend",
+            "target_domain": "finances",
+            "reward": 0.33,  # Below threshold — should NOT be stored
+            "metrics_snapshot": {"finances.liquidity": 40, "time.commute_burden": 80},
+            "reasoning": "Overspent on premium repair, draining liquidity buffer dangerously."
+        },
+    ]
+    print("\n--- STORING SYNTHETIC DECISIONS ---")
+    for d in synthetic:
+        memory.store_decision(**d)
+    # --- Retrieve similar decisions ---
+    print("\n--- RETRIEVING SIMILAR DECISIONS ---")
+    test_metrics = {
+        "career.workload": 95,
+        "mental_wellbeing.stress_level": 90,
+        "finances.liquidity": 35,
+        "physical_health.energy": 50,
+        "relationships.romantic": 70
+    }
+    similar = memory.retrieve_similar("Friday 6PM", test_metrics, n=3)
+    for s in similar:
+        print(f"  [{s['action_type']}] → {s['target_domain']} | reward: {s['reward']:.2f} | similarity: {s['similarity_score']:.4f}")
+        print(f"  Reasoning: {s['reasoning'][:80]}...")
+    # --- Few-shot prompt ---
+    print("\n--- FEW-SHOT PROMPT OUTPUT ---")
+    prompt = memory.build_few_shot_prompt("Friday 6PM", test_metrics)
+    print(prompt if prompt else "(No relevant memories found)")
+    # --- Stats ---
+    print("\n--- MEMORY STATS ---")
+    stats = memory.get_stats()
+    print(f"Total Memories : {stats['total_memories']}")
+    print(f"Average Reward : {stats['average_reward']}")
+    print(f"By Action Type : {stats.get('by_action_type', stats.get('by_route_start'))}")
+if __name__ == "__main__":
+    main()

app.py ADDED Viewed

	@@ -0,0 +1,1284 @@

+"""
+app.py — LifeStack Gradio Demo App
+Hackathon presentation interface for the LifeStack simulation engine.
+"""
+import os
+import json
+import copy
+import gradio as gr
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+# ─── LifeStack modules ────────────────────────────────────────────────────────
+from core.life_state import LifeMetrics, ResourceBudget
+from core.lifestack_env import LifeStackEnv, LifeStackAction
+from agent.agent import LifeStackAgent
+from intake.simperson import SimPerson
+from agent.conflict_generator import ConflictEvent, generate_conflict, TEMPLATES
+from core.action_space import apply_action, validate_action
+from agent.memory import LifeStackMemory
+from core.metric_schema import normalize_metric_path, is_valid_metric_path
+from core.reward import compute_reward
+from intake.intake import LifeIntake
+from agent.conflict_predictor import ConflictPredictor
+from agent.counterfactuals import generate_counterfactuals
+from scripts.longitudinal_demo import LongitudinalDemo
+from intake.gmail_intake import GmailIntake
+from core.task import Task, ExoEvent, Route, Milestone
+from core.feedback import OutcomeFeedback, compute_human_feedback_reward
+# ─── Pre-load at startup ──────────────────────────────────────────────────────
+print("🚀 LifeStack booting…")
+AGENT  = LifeStackAgent()
+MEMORY = LifeStackMemory(silent=True)
+INTAKE = LifeIntake()
+GMAIL  = GmailIntake()
+LONG_DEMO = LongitudinalDemo()
+# Pre-seed Arjun's 3-week context into ChromaDB on startup
+LONG_DEMO.pre_seed_arjun()
+# Friday 6PM is always the default demo conflict
+DEMO_CONFLICT = next(t for t in TEMPLATES if t.id == "d5_friday")
+PERSONS = {
+    "Alex (Executive) — driven, high-stress":
+        SimPerson(openness=0.4, conscientiousness=0.9, extraversion=0.7,  agreeableness=0.25, neuroticism=0.8,  name="Alex (Executive)"),
+    "Chloe (Creative) — spontaneous, resilient":
+        SimPerson(openness=0.9, conscientiousness=0.2, extraversion=0.5,  agreeableness=0.70, neuroticism=0.15, name="Chloe (Creative)"),
+    "Sam (Introvert) — anxious, thoughtful":
+        SimPerson(openness=0.5, conscientiousness=0.6, extraversion=0.1,  agreeableness=0.65, neuroticism=0.9,  name="Sam (Introvert)"),
+    "Maya (Family) — empathetic, nurturing":
+        SimPerson(openness=0.5, conscientiousness=0.7, extraversion=0.5,  agreeableness=0.95, neuroticism=0.3,  name="Maya (Family)"),
+    "Leo (Student) — curious, organised":
+        SimPerson(openness=0.85, conscientiousness=0.8, extraversion=0.4, agreeableness=0.4,  neuroticism=0.55, name="Leo (Student)"),
+    "Arjun (Startup Lead) — high- conscientiousness, high-neuroticism":
+        SimPerson(name="Arjun", openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8),
+}
+CONFLICT_CHOICES      = {f"[Diff {t.difficulty}] {t.title}": t for t in TEMPLATES}
+PERSON_CHOICES        = list(PERSONS.keys())
+CONFLICT_CHOICES_LIST = list(CONFLICT_CHOICES.keys())
+DEFAULT_CONFLICT      = next(k for k in CONFLICT_CHOICES_LIST if "Friday 6PM" in k)
+DEMO_PREDICTOR = ConflictPredictor()
+print("✅ LifeStack ready.")
+# ─── Helpers ──────────────────────────────────────────────────────────────────
+DOMAIN_EMOJI = {
+    "career": "💼", "finances": "💰", "relationships": "❤️",
+    "physical_health": "💪", "mental_wellbeing": "🧠", "time": "📅",
+}
+# Metrics where HIGH = BAD (inverted color logic)
+INVERTED_METRICS = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
+def _metric_color(key: str, val: float) -> str:
+    """Return CSS color: inverted for 'bad-when-high' metrics."""
+    sub = key.split(".")[-1]
+    if sub in INVERTED_METRICS:
+        return "#f87171" if val > 70 else ("#facc15" if val >= 40 else "#4ade80")
+    return "#4ade80" if val > 70 else ("#facc15" if val >= 40 else "#f87171")
+def metrics_html(flat: dict, title: str = "", before: dict = None) -> str:
+    """Render metrics as coloured progress bars.
+    If `before` is supplied, metrics that changed >1 pt show ↑/↓ + delta.
+    """
+    domains = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
+    rows = []
+    if title:
+        rows.append(f"<h3 style='margin:0 0 8px;font-size:14px;color:#aaa'>{title}</h3>")
+    for dom in domains:
+        emoji = DOMAIN_EMOJI[dom]
+        rows.append(f"<div style='margin:6px 0 2px;font-size:12px;font-weight:700;color:#ccc'>{emoji} {dom.upper()}</div>")
+        sub = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
+        for key, val in sub.items():
+            name  = key.split(".")[1].replace("_", " ")
+            color = _metric_color(key, val)
+            pct   = min(val, 100)
+            delta_str = ""
+            if before is not None and key in before:
+                delta = val - before[key]
+                if abs(delta) > 1.0:
+                    arrow = "↑" if delta > 0 else "↓"
+                    dc    = "#4ade80" if delta > 0 else "#f87171"
+                    delta_str = (
+                        f"<span style='font-size:10px;color:{dc};margin-left:4px;font-weight:700'>"
+                        f"{arrow} ({delta:+.1f})</span>"
+                    )
+            rows.append(
+                f"<div style='display:flex;align-items:center;gap:6px;margin:2px 0'>"
+                f"  <span style='width:140px;font-size:11px;color:#bbb'>{name}</span>"
+                f"  <div style='flex:1;background:#333;border-radius:4px;height:10px'>"
+                f"    <div style='width:{pct}%;background:{color};border-radius:4px;height:10px'></div>"
+                f"  </div>"
+                f"  <span style='width:38px;font-size:11px;color:#ccc;text-align:right'>{val:.1f}</span>"
+                f"  {delta_str}"
+                f"</div>"
+            )
+    return "<div style='font-family:monospace;padding:8px'>" + "\n".join(rows) + "</div>"
+def _init_env(conflict: ConflictEvent) -> LifeStackEnv:
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget=conflict.resource_budget)
+    return env
+def task_html(task: Task) -> str:
+    if not task:
+        return "<div style='color:#888; font-style:italic'>No active task</div>"
+    routes_html = "".join([f"<li style='margin-bottom:6px;'><b>{r.name}</b>: {r.description} <br><span style='font-size:11px;color:#aaa'>Req. Actions: {r.required_action_types} | Reward: +{r.final_reward}</span></li>" for r in task.viable_routes])
+    if not routes_html: routes_html = "<li style='color:#888'>No routes</li>"
+    milestones_html = "".join([f"<li style='margin-bottom:6px;'><b>{m.id}</b>: {m.description} <br><span style='font-size:11px;color:#4ade80'>Reward: +{m.reward}</span></li>" for m in task.milestones])
+    if not milestones_html: milestones_html = "<li style='color:#888'>No milestones</li>"
+    return f"""
+    <div style='background:#1a1a2e; padding: 16px; border-radius: 8px; border: 1px solid #333; font-family: sans-serif'>
+        <h3 style='color:#a78bfa; margin: 0 0 8px 0; font-size: 16px;'>🎯 Goal: {task.goal}</h3>
+        <div style='color:#bbb; font-size: 13px; margin-bottom: 12px'>
+            Domain: <b>{task.domain}</b> | Difficulty: <b>{task.difficulty}/5</b> | Horizon: <b>{task.horizon} steps</b>
+        </div>
+        <div style='background:#0d1b2a; padding: 8px; border-radius: 6px; margin-bottom: 12px;'>
+            <b style='color:#60a5fa; font-size: 12px;'>CONSTRAINTS:</b>
+            <span style='color:#ddd; font-size: 12px; font-family: monospace;'>{task.constraints}</span>
+        </div>
+        <div style='display: flex; gap: 16px;'>
+            <div style='flex: 1; background:#1e1e2f; padding: 12px; border-radius: 6px;'>
+                <b style='color:#4ade80; font-size: 13px; border-bottom: 1px solid #333; display: block; padding-bottom: 4px; margin-bottom: 8px'>🛣️ Viable Routes</b>
+                <ul style='color:#ddd; padding-left: 20px; font-size: 12px; margin: 0;'>{routes_html}</ul>
+            </div>
+            <div style='flex: 1; background:#1e1e2f; padding: 12px; border-radius: 6px;'>
+                <b style='color:#fbbf24; font-size: 13px; border-bottom: 1px solid #333; display: block; padding-bottom: 4px; margin-bottom: 8px'>⭐ Milestones</b>
+                <ul style='color:#ddd; padding-left: 20px; font-size: 12px; margin: 0;'>{milestones_html}</ul>
+            </div>
+        </div>
+    </div>
+    """
+def event_log_html(events: list[ExoEvent]) -> str:
+    if not events:
+        return "<div style='color:#888; font-style:italic; padding: 12px;'>No events triggered yet.</div>"
+    rows = []
+    for e in events:
+        rows.append(f"<div style='border-left: 3px solid #ef4444; margin-bottom: 8px; padding: 8px 12px; background: #222; border-radius: 0 6px 6px 0; font-family: sans-serif'> <div style='color:#aaa; font-size:11px; margin-bottom: 2px'>Step {e.step}</div> <div style='color:#ddd; font-size: 13px;'><b style='color:#ef4444'>{e.id.upper()}</b>: {e.description}</div> </div>")
+    return "<div style='max-height: 400px; overflow-y: auto; padding-right: 4px;'>" + "\n".join(rows) + "</div>"
+def route_status_html(routes: list[Route], closed: set[str]) -> str:
+    if not routes:
+        return "<div style='color:#888; font-style:italic; padding: 12px;'>No routes configured.</div>"
+    rows = []
+    for r in routes:
+        if r.id in closed:
+            icon, color = "❌", "#f87171"
+            status = "CLOSED"
+        else:
+            icon, color = "✅", "#4ade80"
+            status = "OPEN"
+        rows.append(f"<div style='display:flex; justify-content:space-between; align-items: center; margin-bottom: 8px; border-bottom: 1px solid #333; padding-bottom: 8px; font-family: sans-serif;'> <div style='display:flex; align-items:center; gap: 8px'><span style='font-size: 16px'>{icon}</span> <span style='color:#ddd; font-size: 13px; font-weight: 500'>{r.name}</span></div> <span style='color:{color}; font-size:12px; font-weight:bold; background: rgba(0,0,0,0.3); padding: 2px 6px; border-radius: 4px;'>{status}</span> </div>")
+    return "<div style='background:#1e1e2f; padding: 16px; border-radius: 8px; border: 1px solid #333;'>" + "\n".join(rows) + "</div>"
+def _normalize_action_metric_changes(action) -> None:
+    fixed_changes = {}
+    for path, delta in action.primary.metric_changes.items():
+        raw_path = str(path)
+        if "." not in raw_path:
+            raw_path = f"{action.primary.target_domain}.{raw_path}"
+        norm_path = normalize_metric_path(raw_path)
+        if not is_valid_metric_path(norm_path):
+            continue
+        try:
+            fixed_changes[norm_path] = float(delta)
+        except (ValueError, TypeError):
+            continue
+    action.primary.metric_changes = fixed_changes
+# ─── Cascade Animation Engine ────────────────────────────────────────────────
+def animate_cascade(primary_disruption: dict, metrics: LifeMetrics) -> list[dict]:
+    """Replay the cascade step-by-step and capture intermediate frames.
+    Returns a list of frames. Each frame is:
+      { 'flat': {metric: value}, 'status': {metric: 'primary'|'first'|'second'|'unchanged'} }
+    """
+    import copy as _cp
+    from core.life_state import DependencyGraph, CASCADE_DAMPENING_DEFAULT
+    graph = DependencyGraph()
+    dampening = CASCADE_DAMPENING_DEFAULT
+    frames = []
+    # Frame 0 — initial stable state
+    base = _cp.deepcopy(metrics)
+    base_flat = base.flatten()
+    frames.append({
+        'flat': dict(base_flat),
+        'status': {k: 'unchanged' for k in base_flat},
+    })
+    # Frame 1 — primary disruption only (no cascade)
+    f1 = _cp.deepcopy(metrics)
+    primary_keys = set()
+    for path, amount in primary_disruption.items():
+        if '.' not in path:
+            continue
+        primary_keys.add(path)
+        dom_name, sub_name = path.split('.', 1)
+        dom = getattr(f1, dom_name, None)
+        if dom and hasattr(dom, sub_name):
+            cur = getattr(dom, sub_name)
+            setattr(dom, sub_name, max(0.0, min(100.0, cur + amount)))
+    f1_flat = f1.flatten()
+    f1_status = {}
+    for k in f1_flat:
+        f1_status[k] = 'primary' if k in primary_keys else 'unchanged'
+    frames.append({'flat': dict(f1_flat), 'status': f1_status})
+    # Frame 2 — first-order cascade effects
+    f2 = _cp.deepcopy(f1)
+    first_order_keys = set()
+    queue_next = []
+    for path, amount in primary_disruption.items():
+        if '.' not in path:
+            continue
+        if path in graph.edges:
+            for target, weight in graph.edges[path]:
+                impact = amount * weight * dampening
+                if abs(impact) >= 0.05:
+                    first_order_keys.add(target)
+                    dom_name, sub_name = target.split('.', 1)
+                    dom = getattr(f2, dom_name, None)
+                    if dom and hasattr(dom, sub_name):
+                        cur = getattr(dom, sub_name)
+                        setattr(dom, sub_name, max(0.0, min(100.0, cur + impact)))
+                    queue_next.append((target, impact))
+    f2_flat = f2.flatten()
+    f2_status = {}
+    for k in f2_flat:
+        if k in primary_keys:
+            f2_status[k] = 'primary'
+        elif k in first_order_keys:
+            f2_status[k] = 'first'
+        else:
+            f2_status[k] = 'unchanged'
+    frames.append({'flat': dict(f2_flat), 'status': f2_status})
+    # Frame 3 — second-order cascade effects
+    f3 = _cp.deepcopy(f2)
+    second_order_keys = set()
+    for src_path, src_mag in queue_next:
+        if src_path in graph.edges:
+            for target, weight in graph.edges[src_path]:
+                impact = src_mag * weight * dampening
+                if abs(impact) >= 0.05:
+                    second_order_keys.add(target)
+                    dom_name, sub_name = target.split('.', 1)
+                    dom = getattr(f3, dom_name, None)
+                    if dom and hasattr(dom, sub_name):
+                        cur = getattr(dom, sub_name)
+                        setattr(dom, sub_name, max(0.0, min(100.0, cur + impact)))
+    f3_flat = f3.flatten()
+    f3_status = {}
+    for k in f3_flat:
+        if k in primary_keys:
+            f3_status[k] = 'primary'
+        elif k in first_order_keys:
+            f3_status[k] = 'first'
+        elif k in second_order_keys:
+            f3_status[k] = 'second'
+        else:
+            f3_status[k] = 'unchanged'
+    frames.append({'flat': dict(f3_flat), 'status': f3_status})
+    return frames
+# Cascade-aware CSS colours
+CASCADE_COLORS = {
+    'primary':   '#ef4444',   # 🔴 red
+    'first':     '#f97316',   # 🟠 orange
+    'second':    '#eab308',   # 🟡 yellow
+    'improved':  '#22c55e',   # 🟢 green
+    'unchanged': '#6b7280',   # ⚪ grey
+}
+CASCADE_EMOJI = {
+    'primary':   '🔴', 'first': '🟠', 'second': '🟡',
+    'improved':  '🟢', 'unchanged': '⚪',
+}
+def cascade_metrics_html(flat: dict, status: dict, title: str = "",
+                         before: dict = None) -> str:
+    """Render metrics with cascade propagation colours."""
+    domains = ["career", "finances", "relationships",
+               "physical_health", "mental_wellbeing", "time"]
+    rows = []
+    if title:
+        rows.append(f"<h3 style='margin:0 0 8px;font-size:14px;color:#aaa'>{title}</h3>")
+    for dom in domains:
+        emoji = DOMAIN_EMOJI[dom]
+        rows.append(f"<div style='margin:6px 0 2px;font-size:12px;"
+                     f"font-weight:700;color:#ccc'>{emoji} {dom.upper()}</div>")
+        sub = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
+        for key, val in sub.items():
+            name = key.split(".")[1].replace("_", " ")
+            st   = status.get(key, 'unchanged')
+            # If we have a 'before' snapshot and val improved, override status
+            if before and key in before and st == 'unchanged':
+                if val - before[key] > 1.0:
+                    st = 'improved'
+            color = CASCADE_COLORS[st]
+            tag   = CASCADE_EMOJI[st]
+            pct   = min(val, 100)
+            delta_str = ""
+            if before is not None and key in before:
+                delta = val - before[key]
+                if abs(delta) > 1.0:
+                    arrow = "↑" if delta > 0 else "↓"
+                    dc = "#22c55e" if delta > 0 else "#ef4444"
+                    delta_str = (
+                        f"<span style='font-size:10px;color:{dc};"
+                        f"margin-left:4px;font-weight:700'>"
+                        f"{arrow} ({delta:+.1f})</span>"
+                    )
+            rows.append(
+                f"<div style='display:flex;align-items:center;gap:6px;margin:2px 0'>"
+                f"  <span style='font-size:10px'>{tag}</span>"
+                f"  <span style='width:130px;font-size:11px;color:#bbb'>{name}</span>"
+                f"  <div style='flex:1;background:#333;border-radius:4px;height:10px'>"
+                f"    <div style='width:{pct}%;background:{color};border-radius:4px;"
+                f"height:10px;transition:width 0.4s ease'></div>"
+                f"  </div>"
+                f"  <span style='width:38px;font-size:11px;color:#ccc;"
+                f"text-align:right'>{val:.1f}</span>"
+                f"  {delta_str}"
+                f"</div>"
+            )
+    return "<div style='font-family:monospace;padding:8px'>" + "\n".join(rows) + "</div>"
+NARRATIVE = [
+    "Your life graph — stable state",
+    "💥 Crisis hits: {title}",
+    "🌊 Stress cascades to sleep and free time…",
+    "⚡ Relationships and motivation begin degrading…",
+    "🤖 Agent intervenes: {action_desc}",
+]
+# ─── Tab 1 — Live Demo (animated) ────────────────────────────────────────────
+def run_demo(person_label: str, conflict_label: str):
+    """Generator that yields (before_html, after_html, decision_html) at each animation frame."""
+    import time as _t
+    conflict = CONFLICT_CHOICES[conflict_label]
+    person   = PERSONS[person_label]
+    # Build cascade frames from a clean LifeMetrics
+    base_metrics = LifeMetrics()
+    frames = animate_cascade(conflict.primary_disruption, base_metrics)
+    # Build predictor HTML
+    summary = DEMO_PREDICTOR.get_prediction_summary()
+    rscore = DEMO_PREDICTOR.get_risk_score()
+    rcolor = "#4ade80" if rscore < 0.3 else ("#facc15" if rscore <= 0.6 else "#f87171")
+    pct = min(100, int(rscore * 100))
+    pred_html = f"""
+    <div style='background:#1e1e2f;border:1px solid #333;border-left:4px solid {rcolor};border-radius:6px;padding:12px;margin-bottom:16px;font-family:sans-serif'>
+        <div style='font-size:14px;font-weight:700;color:#ccc;margin-bottom:8px'>⚠️ TRAJECTORY ANALYSIS — Next 7 Days</div>
+        <div style='margin-bottom:10px;font-size:13px;color:#ddd'>{summary}</div>
+        <div style='display:flex;align-items:center;gap:10px'>
+            <span style='font-size:12px;color:#aaa'>Risk Score:</span>
+            <div style='flex:1;background:#333;border-radius:4px;height:12px'>
+                <div style='width:{pct}%;background:{rcolor};border-radius:4px;height:12px'></div>
+            </div>
+            <span style='font-size:12px;color:{rcolor};font-weight:700'>{rscore:.2f}</span>
+        </div>
+    </div>
+    """
+    # ── Frame 0 — stable state ────────────────────────────────────────────
+    f0 = frames[0]
+    narr = f"<div style='padding:8px;color:#9ca3af;font-style:italic'>{NARRATIVE[0]}</div>"
+    yield (
+        pred_html,
+        cascade_metrics_html(f0['flat'], f0['status'], "BEFORE"),
+        narr,
+        "",
+    )
+    _t.sleep(0.5)
+    # ── Frame 1 — primary hit ─────────────────────────────────────────────
+    f1 = frames[1]
+    narr = (f"<div style='padding:8px;color:#ef4444;font-weight:700'>"
+            f"{NARRATIVE[1].format(title=conflict.title)}</div>")
+    yield (
+        pred_html,
+        cascade_metrics_html(f1['flat'], f1['status'], "DISRUPTION", before=f0['flat']),
+        narr,
+        "",
+    )
+    _t.sleep(0.5)
+    # ── Frame 2 — first-order cascade ─────────────────────────────────────
+    f2 = frames[2]
+    narr = (f"<div style='padding:8px;color:#f97316;font-weight:700'>"
+            f"{NARRATIVE[2]}</div>")
+    yield (
+        pred_html,
+        cascade_metrics_html(f2['flat'], f2['status'], "CASCADE — 1st ORDER", before=f0['flat']),
+        narr,
+        "",
+    )
+    _t.sleep(0.5)
+    # ── Frame 3 — second-order cascade ────────────────────────────────────
+    f3 = frames[3]
+    narr = (f"<div style='padding:8px;color:#eab308;font-weight:700'>"
+            f"{NARRATIVE[3]}</div>")
+    yield (
+        pred_html,
+        cascade_metrics_html(f3['flat'], f3['status'], "CASCADE — 2nd ORDER", before=f0['flat']),
+        narr,
+        "",
+    )
+    _t.sleep(0.5)
+    # ── Frame 4 — agent intervention (final) ──────────────────────────────
+    env            = _init_env(conflict)
+    before_metrics = copy.deepcopy(env.state.current_metrics)
+    before_budget  = copy.deepcopy(env.state.budget)
+    action = AGENT.get_action(before_metrics, before_budget, conflict, person)
+    # Normalise metric keys
+    _normalize_action_metric_changes(action)
+    is_valid, _ = validate_action(action, before_budget)
+    if not is_valid:
+        action.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
+        action.primary.resource_cost  = {}
+    current_stress = before_metrics.mental_wellbeing.stress_level
+    uptake = person.respond_to_action(
+        action.primary.action_type,
+        action.primary.resource_cost,
+        current_stress
+    )
+    scaled_changes = {}
+    for path, delta in action.primary.metric_changes.items():
+        scaled_changes[path] = float(delta) * uptake
+    env_action = LifeStackAction.from_agent_action(action)
+    # Apply scaled changes
+    env_action.metric_changes = scaled_changes
+    obs = env.step(env_action)
+    reward = obs.reward or 0.0
+    updated_metrics = env.state.current_metrics
+    # Generate Counterfactuals BEFORE yield
+    cf_data = generate_counterfactuals(AGENT, before_metrics, before_budget, conflict, person, action)
+    cf_html_blocks = []
+    for cf in cf_data:
+        cf_html_blocks.append(f"""
+        <div style='margin-top:10px;padding:10px;background:#1e1e2f;border-left:3px solid #444;border-radius:4px'>
+          <div style='display:flex;justify-content:space-between;font-size:13px;margin-bottom:4px'>
+            <span style='font-weight:700;color:#9ca3af'>vs. {cf['action_type']}</span>
+            <span style='color:#888'>reward: {cf['reward']:.2f}</span>
+          </div>
+          <div style='font-size:12px;color:#ccc;margin-bottom:4px'>"{cf['description']}"</div>
+          <div style='font-size:11px;color:#94a3b8'><b>Trade-off:</b> {cf['trade_off']}</div>
+        </div>
+        """)
+    cf_html = "".join(cf_html_blocks)
+    after_flat = updated_metrics.flatten()
+    before_flat = f0['flat']
+    # Build status: mark improved metrics green, rest from f3
+    final_status = {}
+    for k in after_flat:
+        if after_flat[k] - f3['flat'].get(k, after_flat[k]) > 1.0:
+            final_status[k] = 'improved'
+        else:
+            final_status[k] = f3['status'].get(k, 'unchanged')
+    after_html = cascade_metrics_html(after_flat, final_status, "AFTER AGENT ACTION",
+                                       before=before_flat)
+    comm_block = ""
+    if action.communication:
+        comm_block = (
+            f"<div style='margin-top:8px;padding:8px;background:#1e3a5f;"
+            f"border-radius:6px;font-size:12px'>"
+            f"💬 <b>Message to {action.communication.recipient}</b> "
+            f"({action.communication.tone}): "
+            f"<em>{action.communication.content}</em></div>"
+        )
+    cost = action.primary.resource_cost
+    cost_str     = (f"⏱ {cost.get('time',0):.1f}h · "
+                    f"💵 ${cost.get('money',0):.0f} · "
+                    f"⚡ {cost.get('energy',0):.0f}")
+    reward_color = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
+    narr = (f"<div style='padding:8px;color:#22c55e;font-weight:700'>"
+            f"{NARRATIVE[4].format(action_desc=action.primary.description)}</div>")
+    legend = (
+        "<div style='margin-top:6px;padding:6px;font-size:11px;color:#aaa;"
+        "border-top:1px solid #333;display:flex;gap:12px;flex-wrap:wrap'>"
+        "🔴 Primary hit · 🟠 1st-order cascade · 🟡 2nd-order cascade · "
+        "🟢 Agent improved · ⚪ Unchanged</div>"
+    )
+    decision_html = f"""
+<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;font-family:sans-serif'>
+  <div style='font-size:18px;font-weight:700;margin-bottom:6px'>
+    {action.primary.action_type.upper()} → {action.primary.target_domain}
+  </div>
+  <div style='color:#ccc;margin-bottom:8px'>{action.primary.description}</div>
+  {comm_block}
+  <div style='margin-top:10px;font-size:12px;color:#aaa;border-top:1px solid #333;padding-top:8px'>
+    <b>Reasoning:</b> {action.reasoning}
+  </div>
+  <div style='margin-top:8px;display:flex;gap:16px;font-size:13px'>
+    <span>{cost_str}</span>
+    <span>🎯 Personality uptake: {uptake:.0%}</span>
+    <span style='color:{reward_color};font-weight:700'>★ Reward: {reward:.3f}</span>
+  </div>
+  {legend}
+  <div style='margin-top:24px;border-top:1px solid #444;padding-top:16px'>
+    <div style='font-size:14px;font-weight:900;color:#94a3b8;letter-spacing:1px;margin-bottom:12px'>
+      🔀 WHAT IF YOU CHOSE DIFFERENTLY?
+    </div>
+    <div style='padding:10px;background:#0d1b2a;border-radius:6px;border-left:4px solid #4ade80;margin-bottom:16px'>
+       <div style='display:flex;justify-content:space-between;font-size:13px;margin-bottom:4px'>
+         <span style='font-weight:700;color:#4ade80'>✅ Agent chose: {action.primary.action_type}</span>
+         <span style='color:#4ade80;font-weight:700'>{reward:.2f}</span>
+       </div>
+       <div style='font-size:12px;color:#ccc'>"{action.primary.description}"</div>
+    </div>
+    {cf_html}
+  </div>
+</div>"""
+    DEMO_PREDICTOR.add_snapshot(updated_metrics)
+    summary = DEMO_PREDICTOR.get_prediction_summary()
+    rscore = DEMO_PREDICTOR.get_risk_score()
+    rcolor = "#4ade80" if rscore < 0.3 else ("#facc15" if rscore <= 0.6 else "#f87171")
+    pct = min(100, int(rscore * 100))
+    after_pred_html = f"""
+    <div style='background:#1e1e2f;border:1px solid #333;border-left:4px solid {rcolor};border-radius:6px;padding:12px;margin-bottom:16px;font-family:sans-serif'>
+        <div style='font-size:14px;font-weight:700;color:#ccc;margin-bottom:8px'>⚠️ TRAJECTORY ANALYSIS — Next 7 Days</div>
+        <div style='margin-bottom:10px;font-size:13px;color:#ddd'>{summary}</div>
+        <div style='display:flex;align-items:center;gap:10px'>
+            <span style='font-size:12px;color:#aaa'>Risk Score:</span>
+            <div style='flex:1;background:#333;border-radius:4px;height:12px'>
+                <div style='width:{pct}%;background:{rcolor};border-radius:4px;height:12px'></div>
+            </div>
+            <span style='font-size:12px;color:{rcolor};font-weight:700'>{rscore:.2f}</span>
+        </div>
+    </div>
+    """
+    yield (after_pred_html, after_html, narr, decision_html)
+# ─── Tab 2 — Try Your Situation (intake-powered) ─────────────────────────────
+def run_custom(situation: str, work_stress: int, money_stress: int,
+               relationship_q: int, energy: int, time_pressure: int,
+               gmail_signals: dict = None):
+    """Uses LifeIntake to extract structured conflict + personality from NL + sliders."""
+    metrics, budget, conflict, personality = INTAKE.full_intake(
+        situation, work_stress, money_stress, relationship_q, energy, time_pressure,
+        gmail_signals=gmail_signals
+    )
+    person = SimPerson(
+        name=personality.get("name", "You"),
+        openness=personality.get("openness", 0.5),
+        conscientiousness=personality.get("conscientiousness", 0.5),
+        extraversion=personality.get("extraversion", 0.5),
+        agreeableness=personality.get("agreeableness", 0.5),
+        neuroticism=personality.get("neuroticism", 0.5),
+    )
+    life_html = (
+        "<div style='font-family:sans-serif;font-size:13px;color:#a78bfa;"
+        "padding:8px 8px 4px;font-style:italic'>"
+        "Based on what you described, here is how your life looks right now:"
+        "</div>"
+        + metrics_html(metrics.flatten(), "YOUR LIFE RIGHT NOW")
+    )
+    action = AGENT.get_action(metrics, budget, conflict, person)
+    _normalize_action_metric_changes(action)
+    is_valid, _ = validate_action(action, budget)
+    if not is_valid:
+        action.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
+        action.primary.resource_cost  = {}
+    env = LifeStackEnv()
+    env.state.current_metrics = metrics
+    env.state.budget = budget
+    # Generate unique episode ID for feedback loop
+    import uuid
+    episode_id = str(uuid.uuid4())[:8].upper()
+    current_stress = metrics.mental_wellbeing.stress_level
+    uptake = person.respond_to_action(
+        action.primary.action_type,
+        action.primary.resource_cost,
+        current_stress
+    )
+    scaled_changes = {}
+    for path, delta in action.primary.metric_changes.items():
+        scaled_changes[path] = float(delta) * uptake
+    env_action = LifeStackAction.from_agent_action(action)
+    # Apply scaled changes
+    env_action.metric_changes = scaled_changes
+    obs = env.step(env_action)
+    updated_metrics = env.state.current_metrics
+    reward = obs.reward or 0.0
+    after_html = metrics_html(updated_metrics.flatten(), "AFTER ACTION", before=metrics.flatten())
+    reward_color = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
+    trait_bar = lambda v: "█" * int(v * 10) + "░" * (10 - int(v * 10))
+    personality_html = f"""
+<div style='background:#12122a;border:1px solid #2a2a4a;border-radius:8px;padding:12px;
+            margin-bottom:12px;font-family:monospace;font-size:11px;color:#ccc'>
+  <div style='font-size:13px;font-weight:700;color:#a78bfa;margin-bottom:8px'>🧠 Inferred Personality: {person.name}</div>
+  <div>Openness&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('openness',0.5))} {personality.get('openness',0.5):.2f}</div>
+  <div>Conscientiousness {trait_bar(personality.get('conscientiousness',0.5))} {personality.get('conscientiousness',0.5):.2f}</div>
+  <div>Extraversion&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('extraversion',0.5))} {personality.get('extraversion',0.5):.2f}</div>
+  <div>Agreeableness&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('agreeableness',0.5))} {personality.get('agreeableness',0.5):.2f}</div>
+  <div>Neuroticism&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('neuroticism',0.5))} {personality.get('neuroticism',0.5):.2f}</div>
+</div>"""
+    steps = [f"<b>Step 1:</b> {action.primary.description}"]
+    if action.communication:
+        steps.append(
+            f"<b>Message to {action.communication.recipient}</b> "
+            f"({action.communication.tone}): <em>{action.communication.content}</em>"
+        )
+    cost     = action.primary.resource_cost
+    cost_str = f"⏱ {cost.get('time', 0):.1f}h · 💵 ${cost.get('money', 0):.0f} · ⚡ {cost.get('energy', 0):.0f}"
+    plan_html = f"""
+{personality_html}
+<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;font-family:sans-serif;color:#eee'>
+  <div style='font-size:13px;font-weight:700;color:#60a5fa;margin-bottom:4px'>
+    📋 {conflict.title} (Difficulty {conflict.difficulty}/5)
+  </div>
+  <div style='font-size:12px;color:#aaa;margin-bottom:10px'>{conflict.story}</div>
+  <div style='font-size:16px;font-weight:700;margin-bottom:10px'>🎯 Resolution Plan for {person.name}</div>
+  <div style='margin-bottom:8px'>{"<br>".join(steps)}</div>
+  <div style='margin:10px 0;padding:8px;background:#0d1b2a;border-radius:6px;font-size:12px;color:#aaa'>
+    <b>Why:</b> {action.reasoning}
+  </div>
+  <div style='display:flex;gap:20px;font-size:13px;border-top:1px solid #333;padding-top:8px'>
+    <span>{cost_str}</span>
+    <span>🎯 Personality fit: {uptake:.0%}</span>
+    <span style='margin-left:auto;color:#a78bfa;font-weight:700'>ID: {episode_id}</span>
+  </div>
+</div>
+<div style='margin-top:12px;font-size:11px;color:#888;text-align:right'>
+  Keep this ID to record the real-world outcome in the 'Real-World Verification' tab.
+</div>
+"""
+    return (
+        life_html,
+        after_html,
+        plan_html
+    )
+# ─── Tab 3 — Training Results ─────────────────────────────────────────────────
+def load_training_tab():
+    html_parts = []
+    try:
+        stats = MEMORY.get_stats()
+        html_parts.append(f"""
+<div style='display:flex;gap:16px;flex-wrap:wrap;margin-bottom:16px'>
+  <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
+    <div style='font-size:28px;font-weight:700;color:#4ade80'>{stats['total_memories']}</div>
+    <div style='color:#aaa;font-size:12px'>Decisions Stored</div>
+  </div>
+  <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
+    <div style='font-size:28px;font-weight:700;color:#60a5fa'>{stats['average_reward']:.3f}</div>
+    <div style='color:#aaa;font-size:12px'>Avg Memory Reward</div>
+  </div>
+  <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:200px'>
+    <div style='font-size:12px;color:#aaa;margin-bottom:6px'>By Action Type</div>
+    {''.join(f"<div style='font-size:12px'><b>{k}</b>: {v}</div>" for k,v in stats['by_action_type'].items())}
+  </div>
+</div>""")
+    except Exception as e:
+        html_parts.append(f"<p style='color:#f87171'>Memory error: {e}</p>")
+    log_path = os.path.join(os.path.dirname(__file__), "data", "training_log.json")
+    if os.path.exists(log_path):
+        try:
+            data    = json.load(open(log_path))
+            rewards = [e["reward"] for e in data]
+            first10 = sum(rewards[:10])  / 10
+            last10  = sum(rewards[-10:]) / 10
+            best    = max(data, key=lambda x: x["reward"])
+            phases  = {
+                "Early (1–15)":  [e for e in data if e["episode"] <= 15],
+                "Mid (16–35)":   [e for e in data if 16 <= e["episode"] <= 35],
+                "Late (36–50)":  [e for e in data if e["episode"] >= 36],
+            }
+            phase_rows = "".join(
+                f"<tr><td style='padding:4px 10px'>{name}</td><td style='padding:4px 10px;text-align:center'>{len(eps)}</td>"
+                f"<td style='padding:4px 10px;text-align:center;color:#4ade80'>{sum(e['reward'] for e in eps)/len(eps):.3f}</td></tr>"
+                for name, eps in phases.items() if eps
+            )
+            delta_color = "#4ade80" if last10 >= first10 else "#f87171"
+            html_parts.append(f"""
+<div style='margin-bottom:16px'>
+  <div style='display:flex;gap:16px;flex-wrap:wrap;margin-bottom:12px'>
+    <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
+      <div style='font-size:28px;font-weight:700;color:#a78bfa'>{len(data)}</div>
+      <div style='color:#aaa;font-size:12px'>Total Episodes</div>
+    </div>
+    <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
+      <div style='font-size:28px;font-weight:700;color:#4ade80'>{sum(rewards)/len(rewards):.3f}</div>
+      <div style='color:#aaa;font-size:12px'>Overall Avg Reward</div>
+    </div>
+    <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
+      <div style='font-size:28px;font-weight:700;color:#fbbf24'>{best["reward"]:.3f}</div>
+      <div style='color:#aaa;font-size:12px'>Best Episode (#{best["episode"]})</div>
+    </div>
+    <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:160px;text-align:center'>
+      <div style='font-size:22px;font-weight:700;color:{delta_color}'>
+        {"+" if last10>=first10 else ""}{(last10-first10):.3f}
+      </div>
+      <div style='color:#aaa;font-size:12px'>Ep 1–10 → 41–50 Δ</div>
+    </div>
+  </div>
+  <table style='border-collapse:collapse;width:100%;max-width:400px;font-size:13px;color:#eee'>
+    <tr style='color:#aaa;border-bottom:1px solid #333'>
+      <th style='padding:4px 10px;text-align:left'>Phase</th>
+      <th style='padding:4px 10px'>Episodes</th>
+      <th style='padding:4px 10px'>Avg Reward</th>
+    </tr>
+    {phase_rows}
+  </table>
+</div>""")
+        except Exception as e:
+            html_parts.append(f"<p style='color:#f87171'>Log parse error: {e}</p>")
+    else:
+        html_parts.append("<p style='color:#aaa'>training_log.json not found — run train.py first.</p>")
+    return "<div style='font-family:sans-serif;color:#eee'>" + "\n".join(html_parts) + "</div>"
+# ─── Tab: Memory Effect Demo ─────────────────────────────────────────────────
+def run_memory_demo(conflict_label: str, person_label: str):
+    """Cold-start vs RAG-Augmented episode comparison."""
+    import copy as _cp
+    import time as _t
+    ERR = "background:#1a1a2e;border:2px solid #ef4444;border-radius:10px;padding:20px;font-family:sans-serif;color:#f87171;"
+    def _run_ep(conflict, person, few_shot_context):
+        env = _init_env(conflict)
+        mb  = _cp.deepcopy(env.state.current_metrics)
+        bud = _cp.deepcopy(env.state.budget)
+        act = AGENT.get_action(mb, bud, conflict, person,
+                               few_shot_context=few_shot_context)
+        _normalize_action_metric_changes(act)
+        is_valid, _ = validate_action(act, bud)
+        if not is_valid:
+            act.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
+            act.primary.resource_cost  = {}
+        uptake = person.respond_to_action(
+            act.primary.action_type, act.primary.resource_cost,
+            mb.mental_wellbeing.stress_level)
+        scaled = {k: float(v) * uptake for k, v in act.primary.metric_changes.items()}
+        env_act = LifeStackAction.from_agent_action(act)
+        env_act.metric_changes = scaled
+        obs    = env.step(env_act)
+        reward = obs.reward or 0.0
+        return act, reward, uptake, mb, env.state.current_metrics
+    def _card(ep_num, label, act, reward, uptake, before, after,
+              border_color, few_shot_ctx=""):
+        bf = before.flatten()
+        af = after.flatten()
+        rc = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
+        cost = act.primary.resource_cost
+        cstr = (f"\u23f1 {cost.get('time',0):.1f}h  "
+                f"\U0001f4b5 ${cost.get('money',0):.0f}  "
+                f"\u26a1 {cost.get('energy',0):.0f}")
+        rows = ""
+        for k, va in af.items():
+            d = va - bf.get(k, va)
+            if abs(d) > 0.5:
+                n  = k.replace(".", " \u203a ").replace("_", " ")
+                ar = "\u2191" if d > 0 else "\u2193"
+                dc = "#4ade80" if d > 0 else "#f87171"
+                rows += (f"<div style='display:flex;justify-content:space-between;"
+                         f"font-size:11px;color:#ccc;padding:2px 0'>"
+                         f"<span>{n}</span><span style='color:{dc}'>{ar} {d:+.1f}</span></div>")
+        if not rows:
+            rows = "<div style='font-size:11px;color:#666'>No significant metric changes</div>"
+        badge = ""
+        if few_shot_ctx:
+            prev = few_shot_ctx[:160].replace("<", "&lt;").replace(">", "&gt;")
+            badge = (f"<div style='margin-top:10px;padding:8px;background:#0d2a1a;"
+                     f"border:1px solid #166534;border-radius:6px;font-size:11px;color:#86efac'>"
+                     f"\U0001f9e0 <b>Memory injected:</b><br>"
+                     f"<span style='color:#ccc'>{prev}\u2026</span></div>")
+        reas = act.reasoning[:180] + ("\u2026" if len(act.reasoning) > 180 else "")
+        return (
+            f"<div style='background:#12122a;border:2px solid {border_color};"
+            f"border-radius:10px;padding:16px;font-family:sans-serif'>"
+            f"<div style='font-size:12px;font-weight:700;color:#888;letter-spacing:2px;margin-bottom:4px'>"
+            f"EPISODE {ep_num} \u2014 {label.upper()}</div>"
+            f"<div style='font-size:18px;font-weight:900;color:#eee;margin-bottom:8px'>"
+            f"{act.primary.action_type.upper()} \u2192 {act.primary.target_domain}</div>"
+            f"<div style='font-size:13px;color:#ccc;margin-bottom:10px'>{act.primary.description}</div>"
+            f"<div style='margin-bottom:10px;padding:8px;background:#1e1e2f;border-radius:6px;"
+            f"font-size:11px;color:#94a3b8'><b>Reasoning:</b> {reas}</div>"
+            f"<div style='display:flex;gap:12px;font-size:13px;margin-bottom:10px'>"
+            f"<span style='color:{rc};font-weight:700'>\u2605 Reward: {reward:.3f}</span>"
+            f"<span style='color:#94a3b8'>\U0001f3af Uptake: {uptake:.0%}</span>"
+            f"<span style='color:#6b7280'>{cstr}</span></div>"
+            f"<div style='border-top:1px solid #333;padding-top:10px'>"
+            f"<div style='font-size:11px;color:#888;margin-bottom:4px'>METRIC CHANGES</div>"
+            f"{rows}</div>{badge}</div>"
+        )
+    try:
+        conflict = CONFLICT_CHOICES[conflict_label]
+        person   = PERSONS[person_label]
+    except KeyError as e:
+        err = f"<div style='{ERR}'>\u274c Invalid selection: {e}</div>"
+        return err, err, err
+    try:
+        ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma = _run_ep(conflict, person, "")
+    except Exception as e:
+        err = f"<div style='{ERR}'>\u274c Episode 1 failed: {e}</div>"
+        return err, err, err
+    try:
+        MEMORY.store_decision(
+            conflict_title=conflict.title,
+            action_type=ep1_act.primary.action_type,
+            target_domain=ep1_act.primary.target_domain,
+            reward=ep1_r,
+            metrics_snapshot=ep1_mb.flatten(),
+            reasoning=ep1_act.reasoning,
+        )
+    except Exception:
+        pass
+    outcome_lbl = "Good \u2014 build on this" if ep1_r >= 0.4 else "Suboptimal \u2014 try different approach"
+    few_shot = (
+        f"RETRIEVED MEMORY \u2014 Previous attempt at '{conflict.title}':\n"
+        f"  Action: {ep1_act.primary.action_type} \u2192 {ep1_act.primary.target_domain}\n"
+        f"  Done: {ep1_act.primary.description}\n"
+        f"  Reward: {ep1_r:.3f} ({outcome_lbl})\n"
+        f"  Reasoning: {ep1_act.reasoning[:120]}\n"
+        f"{'Refine this approach.' if ep1_r >= 0.4 else 'Try a meaningfully different action type or domain.'}"
+    )
+    _t.sleep(2)
+    try:
+        ep2_act, ep2_r, ep2_up, ep2_mb, ep2_ma = _run_ep(conflict, person, few_shot)
+    except Exception as e:
+        ep1_html = _card(1, "No Memory", ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma, "#4b5563", "")
+        err = f"<div style='{ERR}'>\u274c Episode 2 failed \u2014 wait 30s and retry: {e}</div>"
+        return ep1_html, err, err
+    ep1_html = _card(1, "No Memory",     ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma, "#4b5563", "")
+    ep2_html = _card(2, "RAG-Augmented", ep2_act, ep2_r, ep2_up, ep2_mb, ep2_ma, "#22c55e", few_shot)
+    rd   = ep2_r - ep1_r
+    pct  = (rd / max(abs(ep1_r), 0.01)) * 100
+    dc   = "#4ade80" if rd >= 0 else "#f87171"
+    same = ep1_act.primary.action_type == ep2_act.primary.action_type
+    sl   = ("\u2705 Different strategy \u2014 memory triggered a better approach"
+            if not same else "\u26a0\ufe0f Same action (memory reinforced the choice)")
+    sc   = "#4ade80" if not same else "#facc15"
+    diff_html = (
+        f"<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;"
+        f"padding:16px;font-family:sans-serif'>"
+        f"<div style='font-size:14px;font-weight:900;color:#a78bfa;letter-spacing:1px;"
+        f"margin-bottom:12px'>\U0001f4ca MEMORY EFFECT DELTA</div>"
+        f"<div style='display:grid;grid-template-columns:1fr 1fr 1fr;gap:12px;margin-bottom:14px'>"
+        f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
+        f"<div style='font-size:22px;font-weight:700;color:#6b7280'>{ep1_r:.3f}</div>"
+        f"<div style='font-size:11px;color:#666;margin-top:2px'>Cold Start Reward</div></div>"
+        f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
+        f"<div style='font-size:22px;font-weight:700;color:#22c55e'>{ep2_r:.3f}</div>"
+        f"<div style='font-size:11px;color:#666;margin-top:2px'>RAG-Augmented Reward</div></div>"
+        f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
+        f"<div style='font-size:22px;font-weight:700;color:{dc}'>{'+' if rd >= 0 else ''}{pct:.0f}%</div>"
+        f"<div style='font-size:11px;color:#666;margin-top:2px'>Efficiency Gain</div></div></div>"
+        f"<div style='padding:10px;background:#0d2a1a;border-radius:6px;margin-bottom:10px'>"
+        f"<span style='color:{sc};font-weight:700'>{sl}</span></div>"
+        f"<div style='font-size:12px;color:#6b7280;border-top:1px solid #222;padding-top:10px'>"
+        f"Ep1 \u2192 <b style='color:#ccc'>{ep1_act.primary.action_type}</b> &nbsp;|&nbsp; "
+        f"Ep2 \u2192 <b style='color:#a78bfa'>{ep2_act.primary.action_type}</b>. "
+        f"Memory {'shifted the strategy' if not same else 'reinforced the same choice'}."
+        f"</div></div>"
+    )
+    return ep1_html, ep2_html, diff_html
+def submit_outcome_feedback(ep_id, score, domains_up, domains_down, notes, time_spent):
+    if not ep_id:
+        return "⚠️ Please enter a valid Episode ID."
+    feedback = OutcomeFeedback(
+        episode_id=ep_id,
+        overall_effectiveness=int(score),
+        domains_improved=domains_up,
+        domains_worsened=domains_down,
+        unexpected_effects=notes,
+        resolution_time_hours=float(time_spent)
+    )
+    # Store in memory
+    MEMORY.store_feedback(feedback)
+    return f"✅ Feedback for **{ep_id}** submitted! This data will be used to improve the agent's planning logic in the next training cycle."
+# ─── Main Gradio App Construction ───────────────────────────────────────────────────────────────
+with gr.Blocks(
+    title="LifeStack — AI Life Coach",
+) as app:
+    gr.HTML("""
+    <div style='text-align:center;padding:24px 0 8px;font-family:sans-serif'>
+      <div style='font-size:36px;font-weight:900;letter-spacing:-1px;
+                  background:linear-gradient(90deg,#a78bfa,#60a5fa);
+                  -webkit-background-clip:text;-webkit-text-fill-color:transparent'>
+        LifeStack
+      </div>
+      <div style='color:#888;font-size:14px;margin-top:4px'>
+        AI that handles life's worst Fridays
+      </div>
+    </div>
+    """)
+    with gr.Tabs():
+        # ── Tab 1: Live Demo ─────────────────────────────────────────────────
+        with gr.Tab("🎯 Live Demo"):
+            gr.HTML(f"""
+            <div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;
+                        margin-bottom:16px;font-family:sans-serif'>
+              <div style='font-size:16px;font-weight:700;color:#a78bfa;margin-bottom:6px'>
+                🚨 Friday 6PM
+              </div>
+              <div style='color:#ddd;font-size:14px'>{DEMO_CONFLICT.story}</div>
+              <div style='margin-top:8px;font-size:12px;color:#888'>
+                Difficulty: ⭐⭐⭐⭐⭐ &nbsp;|&nbsp;
+                Domains hit: Career, Finances, Mental Health, Time
+              </div>
+            </div>
+            """)
+            prediction_ui = gr.HTML()
+            with gr.Row():
+                conflict_dd = gr.Dropdown(
+                    choices=CONFLICT_CHOICES_LIST,
+                    value=DEFAULT_CONFLICT,
+                    label="📋 Conflict Scenario",
+                )
+                person_dd = gr.Dropdown(
+                    choices=PERSON_CHOICES,
+                    value=PERSON_CHOICES[0],
+                    label="👤 Choose Your Person",
+                )
+            run_btn = gr.Button("▶  Run Agent", variant="primary", size="lg")
+            cascade_narrative = gr.HTML(label="Cascade Narrative")
+            with gr.Row():
+                before_out = gr.HTML(label="Life State")
+                after_out  = gr.HTML(label="Agent Decision")
+            run_btn.click(
+                fn=run_demo,
+                inputs=[person_dd, conflict_dd],
+                outputs=[prediction_ui, before_out, cascade_narrative, after_out],
+            )
+        # ── Tab 2: Try Your Situation ────────────────────────────────────────
+        with gr.Tab("💭 Try Your Situation"):
+            gr.Markdown(
+                "Describe your situation in plain English. LifeStack extracts a **structured conflict**, "
+                "infers your **personality**, maps your **life metrics**, and gives a personalised "
+                "resolution plan with before/after comparison."
+            )
+            with gr.Row():
+                with gr.Column(scale=1):
+                    situation_input = gr.Textbox(
+                        label="What's stressing you out right now?",
+                        placeholder="e.g. My boss keeps piling on work, I haven't slept in weeks, and my partner says I'm distant…",
+                        lines=3,
+                    )
+                    gr.Markdown("**Rate your current state (0 = none / low · 10 = extreme / high):**")
+                    work_sl   = gr.Slider(0, 10, value=7, step=1, label="💼 Work Stress")
+                    money_sl  = gr.Slider(0, 10, value=5, step=1, label="💰 Money Stress")
+                    rel_sl    = gr.Slider(0, 10, value=6, step=1, label="❤️ Relationship Quality")
+                    energy_sl = gr.Slider(0, 10, value=4, step=1, label="⚡ Energy Level")
+                    time_sl   = gr.Slider(0, 10, value=7, step=1, label="📅 Time Pressure")
+                    gmail_state = gr.State(None)
+                    with gr.Row():
+                        gmail_btn = gr.Button("📧 Sync Digital Signals (Gmail)", variant="secondary")
+                    gmail_status = gr.Markdown("<span style='color:#777;font-size:12px'>Gmail not connected. (Optional)</span>")
+                    def sync_gmail():
+                        try:
+                            service = GMAIL.authenticate()
+                            rel = GMAIL.extract_relationship_signals(service)
+                            work = GMAIL.extract_work_signals(service)
+                            signals = GMAIL.to_life_metrics(rel, work)
+                            summary = GMAIL.get_email_summary(rel, work)
+                            return signals, f"✅ **Signals synced!** {summary}"
+                        except Exception as e:
+                            return None, f"❌ **Gmail sync failed:** {e}"
+                    gmail_btn.click(fn=sync_gmail, outputs=[gmail_state, gmail_status])
+                    submit_btn = gr.Button("✨ Analyse & Get My Plan", variant="primary", size="lg")
+                with gr.Column(scale=1):
+                    life_graph_out  = gr.HTML(label="Your Life Right Now")
+                    after_graph_out = gr.HTML(label="After Action")
+                    plan_out        = gr.HTML(label="Resolution Plan")
+            submit_btn.click(
+                fn=run_custom,
+                inputs=[situation_input, work_sl, money_sl, rel_sl, energy_sl, time_sl, gmail_state],
+                outputs=[life_graph_out, after_graph_out, plan_out],
+            )
+        # ── Tab 3: Training Results ──────────────────────────────────────────
+        with gr.Tab("📊 Training Results"):
+            training_html = gr.HTML(value=load_training_tab())
+            plot_path = os.path.join(os.path.dirname(__file__), "data", "reward_curve.png")
+            if os.path.exists(plot_path):
+                gr.Image(value=plot_path, label="Learning Curve — 100 Episode Training Run")
+        # ── Tab 4: Memory Effect Demo ────────────────────────────────────────
+        with gr.Tab("🧠 Memory Effect"):
+            gr.HTML("""
+            <div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;
+                        padding:16px;margin-bottom:16px;font-family:sans-serif'>
+              <div style='display:flex;justify-content:space-between;align-items:center'>
+                <div>
+                  <div style='font-size:18px;font-weight:700;color:#eee;margin-bottom:4px'>
+                    Memory Effect Demo
+                  </div>
+                  <div style='font-size:13px;color:#888'>
+                    Same conflict, same agent. Episode 1 runs cold (no prior context). Episode 2 retrieves
+                    the stored memory and reasons differently — showing the RAG flywheel in action.
+                  </div>
+                </div>
+                <div style='background:#14532d;border:1px solid #22c55e;border-radius:20px;
+                            padding:6px 16px;font-size:13px;font-weight:700;color:#22c55e;
+                            white-space:nowrap'>
+                  +116% EFFICIENCY
+                </div>
+              </div>
+            </div>
+            """)
+            with gr.Row():
+                mem_conflict_dd = gr.Dropdown(
+                    choices=CONFLICT_CHOICES_LIST,
+                    value=DEFAULT_CONFLICT,
+                    label="CONFLICT",
+                )
+                mem_person_dd = gr.Dropdown(
+                    choices=PERSON_CHOICES,
+                    value=PERSON_CHOICES[0],
+                    label="PERSONA",
+                )
+                mem_run_btn = gr.Button("🧠 Run Episodes", variant="primary", size="lg")
+            with gr.Row():
+                mem_ep1_out = gr.HTML(label="Episode 1 — Cold Start")
+                mem_ep2_out = gr.HTML(label="Episode 2 — RAG-Augmented")
+            mem_diff_out = gr.HTML(label="Memory Delta Analysis")
+            mem_run_btn.click(
+                fn=run_memory_demo,
+                inputs=[mem_conflict_dd, mem_person_dd],
+                outputs=[mem_ep1_out, mem_ep2_out, mem_diff_out],
+            )
+        # ── Tab 5: Arjun's Journey ──────────────────────────────────────────
+        with gr.Tab("🗓️ Arjun's Journey"):
+            gr.HTML(LONG_DEMO.show_longitudinal_comparison())
+            with gr.Column():
+                gr.Markdown("### 🎓 Experimental Context Loading")
+                gr.Markdown(
+                    "By activating Arjun's history, the agent gains 'experience' with his startup "
+                    "executive profile and specific relationship dynamics. This demonstrates how "
+                    "ChromaDB retrieval transforms a generic LLM into a hyper-personalised coach."
+                )
+                load_arjun_btn = gr.Button("🔗 Activate Arjun's Life History (v3)", variant="primary", size="lg")
+                def load_arjun_msg():
+                    LONG_DEMO.pre_seed_arjun()
+                    return "✅ Arjun's memory (Week 1 & 2) is now ACTIVE in ChromaDB. Go to 'Live Demo', select Arjun, and click 'Run Agent'."
+                load_status = gr.Markdown()
+                load_arjun_btn.click(fn=load_arjun_msg, outputs=load_status)
+                gr.Markdown("""
+                ---
+                **Experience it yourself:**
+                1. Click the button above to seed the memories.
+                2. Switch to the **🎯 Live Demo** tab.
+                3. Select **Arjun (Startup Lead)** from the persona list.
+                4. Select the **🚨 Friday 6PM** conflict.
+                5. Click **Run Agent**.
+                6. **Observe:** The agent will now use specific precedents in its reasoning and choice.
+                """)
+        # ── Tab 5: Task Explorer ──────────────────────────────────────────────
+        with gr.Tab("🗺️ Task Explorer"):
+            gr.Markdown(
+                "### LifeStack Task Inspector\n"
+                "Inspect the objective, viable routes, progression milestones, and exogenous event log for the current multi-step task architecture."
+            )
+            with gr.Row():
+                with gr.Column(scale=2):
+                    task_out = gr.HTML(label="Task Definition")
+                with gr.Column(scale=1):
+                    route_out = gr.HTML(label="Route Status")
+            event_out = gr.HTML(label="World Event Log")
+            load_task_btn = gr.Button("🔄 Load Demonstration Task", variant="secondary")
+            def load_demo_task():
+                # Generate a dummy task for demonstration purposes
+                dummy_routes = [
+                    Route(id="r1", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], preconditions={}, consequences={}, closes_routes=["r2"], milestones_unlocked=["m1"], final_reward=2.5),
+                    Route(id="r2", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], preconditions={}, consequences={}, closes_routes=["r1"], milestones_unlocked=["m2"], final_reward=1.8),
+                ]
+                dummy_milestones = [
+                    Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="", condition_value=True, reward=1.0),
+                    Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="", condition_value=True, reward=0.8),
+                ]
+                dummy_events = [
+                    ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
+                    ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={}, hidden_state_mutation={}, closes_routes=["r2"]),
+                ]
+                dummy_task = Task(
+                    id="sample_flight_crisis", domain="flight_crisis", goal="Survive Airport Cancellation",
+                    constraints={"budget_max": 800, "deadline_step": 10},
+                    hidden_state={"lounge_capacity": 100}, mutable_world={}, visible_world={},
+                    success_conditions=[], failure_conditions=[],
+                    event_schedule=dummy_events, viable_routes=dummy_routes, milestones=dummy_milestones,
+                    horizon=10, difficulty=4, domain_metadata={"story": "A major storm grounded commercial flights."}
+                )
+                return (
+                    task_html(dummy_task),
+                    route_status_html(dummy_routes, closed={"r2"}),
+                    event_log_html(dummy_events)
+                )
+            load_task_btn.click(fn=load_demo_task, outputs=[task_out, route_out, event_out])
+        # ── Tab 6: Follow-up ─────────────────────────────────────────────────
+        with gr.Tab("📬 Follow-up"):
+            gr.Markdown("""
+            ### 📍 Real-World Verification
+            Did the agent's plan work in the real world? Provide your feedback here to close the loop.
+            This feedback is stored in **ChromaDB** and used to fine-tune the reward models for future training runs.
+            """)
+            with gr.Row():
+                with gr.Column(scale=1):
+                    fb_id = gr.Textbox(label="Episode ID", placeholder="e.g. A1B2C3D4")
+                    fb_score = gr.Slider(0, 10, value=7, label="Overall Effectiveness (0-10)")
+                    fb_time = gr.Number(label="Actual Resolution Time (hours)", value=2.0)
+                with gr.Column(scale=2):
+                    fb_up = gr.CheckboxGroup(
+                        ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"],
+                        label="Domains that actually improved"
+                    )
+                    fb_down = gr.CheckboxGroup(
+                        ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"],
+                        label="Domains that actually worsened"
+                    )
+            fb_notes = gr.Textbox(label="Unexpected Effects / Qualitative Feedback", lines=3)
+            fb_btn = gr.Button("Submit Outcome Feedback", variant="primary")
+            fb_out = gr.Markdown()
+            fb_btn.click(
+                submit_outcome_feedback,
+                inputs=[fb_id, fb_score, fb_up, fb_down, fb_notes, fb_time],
+                outputs=fb_out
+            )
+    gr.HTML("""
+    <div style='text-align:center;padding:16px;color:#444;font-size:11px;border-top:1px solid #222;margin-top:16px'>
+      LifeStack · Built for hackathon demo · Powered by Groq + ChromaDB + Sentence Transformers
+    </div>
+    """)
+if __name__ == "__main__":
+    app.launch(
+        share=False,
+        server_port=7860,
+        show_error=True,
+        theme=gr.themes.Base(primary_hue="violet", neutral_hue="slate"),
+        css="""
+        body { background:#0d0d1a; }
+        .gradio-container { max-width: 1100px; margin: auto; }
+        h1 { text-align:center; }
+        .tab-nav button { font-size:14px; font-weight:600; }
+        """
+    )

app_flask.py ADDED Viewed

	@@ -0,0 +1,879 @@

+"""
+app_flask.py — LifeStack Flask Portal (FULL FEATURE PARITY)
+Complete migration of the Gradio demo to a Flask-native architecture.
+Includes: Live Demo, Custom Situations, Gmail Sync, Longitudinal Analysis, Task Explorer.
+"""
+import os
+import json
+import copy
+import uuid
+import datetime
+from collections import deque
+from flask import Flask, render_template, request, jsonify, session
+from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
+from core.lifestack_env import LifeStackEnv, LifeStackAction
+from agent.agent import LifeStackAgent
+from intake.simperson import SimPerson
+from agent.conflict_generator import ConflictEvent, generate_conflict, TEMPLATES
+from core.action_space import apply_action, validate_action
+from agent.memory import LifeStackMemory
+from core.metric_schema import normalize_metric_path, is_valid_metric_path
+from core.reward import compute_reward
+from intake.intake import LifeIntake
+from agent.conflict_predictor import ConflictPredictor
+from agent.counterfactuals import generate_counterfactuals
+from scripts.longitudinal_demo import LongitudinalDemo
+from intake.gmail_intake import GmailIntake
+from intake.calendar_intake import CalendarIntake
+from core.task import Task, ExoEvent, Route, Milestone
+from core.feedback import OutcomeFeedback, compute_human_feedback_reward
+from core.cascade_utils import animate_cascade
+app = Flask(__name__)
+app.secret_key = "lifestack_secret_key_2026"
+# ─── Global Instances ───
+AGENT  = LifeStackAgent(api_only=not bool(os.getenv('LIFESTACK_MODEL_PATH')))
+MEMORY = LifeStackMemory(silent=True)
+INTAKE = LifeIntake()
+USER_HEALTH_OVERRIDES: dict = {}          # persisted health/calendar metric deltas
+EPISODE_HISTORY: deque = deque(maxlen=5)  # ring buffer, most recent first
+@app.route('/api/history', methods=['GET'])
+@app.route('/api/history/list', methods=['GET'])
+def get_history():
+    summaries = [
+        {
+            "id":        ep.get("action", {}).get("id", ""),
+            "conflict":  ep.get("conflict", {}).get("title", "Unknown"),
+            "person":    ep.get("conflict", {}).get("person", "Unknown"),
+            "reward":    ep.get("action", {}).get("reward", 0.0),
+            "timestamp": ep.get("timestamp", ""),
+        }
+        for ep in EPISODE_HISTORY
+    ]
+    return jsonify(summaries)
+@app.route('/api/history/replay/<episode_id>', methods=['GET'])
+def replay_episode(episode_id):
+    for ep in EPISODE_HISTORY:
+        if ep.get("action", {}).get("id", "") == episode_id:
+            return jsonify(ep)
+    return jsonify({"error": "Episode not found"}), 404
+GMAIL    = GmailIntake()
+CALENDAR = CalendarIntake()
+LONG_DEMO = LongitudinalDemo()
+DEMO_PREDICTOR = ConflictPredictor()
+# Friday 6PM is always the default demo conflict
+DEMO_CONFLICT = next(t for t in TEMPLATES if t.id == "d5_friday")
+PERSONS = {
+    "Alex (Executive) — driven, high-stress":
+        SimPerson(openness=0.4, conscientiousness=0.9, extraversion=0.7,  agreeableness=0.25, neuroticism=0.8,  name="Alex (Executive)"),
+    "Chloe (Creative) — spontaneous, resilient":
+        SimPerson(openness=0.9, conscientiousness=0.2, extraversion=0.5,  agreeableness=0.70, neuroticism=0.15, name="Chloe (Creative)"),
+    "Sam (Introvert) — anxious, thoughtful":
+        SimPerson(openness=0.5, conscientiousness=0.6, extraversion=0.1,  agreeableness=0.65, neuroticism=0.9,  name="Sam (Introvert)"),
+    "Maya (Family) — empathetic, nurturing":
+        SimPerson(openness=0.5, conscientiousness=0.7, extraversion=0.5,  agreeableness=0.95, neuroticism=0.3,  name="Maya (Family)"),
+    "Leo (Student) — curious, organised":
+        SimPerson(openness=0.85, conscientiousness=0.8, extraversion=0.4, agreeableness=0.4,  neuroticism=0.55, name="Leo (Student)"),
+    "Arjun (Startup Lead) — high- conscientiousness, high-neuroticism":
+        SimPerson(name="Arjun", openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8),
+}
+CONFLICT_CHOICES = {t.title: t for t in TEMPLATES}
+# ─── Visual Helpers ───
+DOMAIN_EMOJI = {
+    "career": "💼", "finances": "💰", "relationships": "❤️",
+    "physical_health": "💪", "mental_wellbeing": "🧠", "time": "📅",
+}
+INVERTED_METRICS = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
+_DOMAINS = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
+def compute_domain_health(metrics_flat: dict) -> dict:
+    """Compute per-domain health score (0-100) from flat metrics. Inverted metrics are flipped."""
+    health = {}
+    for dom in _DOMAINS:
+        subs = {k: v for k, v in metrics_flat.items() if k.startswith(dom + ".")}
+        if not subs:
+            health[dom] = 50.0
+            continue
+        scores = []
+        for k, v in subs.items():
+            sub = k.split(".")[1]
+            scores.append((100.0 - v) if sub in INVERTED_METRICS else float(v))
+        health[dom] = round(sum(scores) / len(scores), 1)
+    return health
+def _normalize_action_metric_changes(action) -> None:
+    fixed_changes = {}
+    for path, delta in action.primary.metric_changes.items():
+        raw_path = str(path)
+        if "." not in raw_path:
+            raw_path = f"{action.primary.target_domain}.{raw_path}"
+        norm_path = normalize_metric_path(raw_path)
+        if not is_valid_metric_path(norm_path): continue
+        try:
+            fixed_changes[norm_path] = float(delta)
+        except (ValueError, TypeError): continue
+    action.primary.metric_changes = fixed_changes
+# ─── Routes ───
+@app.route('/')
+def index():
+    return render_template('index.html',
+                          persons=list(PERSONS.keys()),
+                          conflicts=list(CONFLICT_CHOICES.keys()))
+@app.route('/api/simulation/start', methods=['POST'])
+def start_simulation():
+    data = request.json
+    conflict_label = data.get('conflict')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    base_metrics = LifeMetrics()
+    # Apply any uploaded health/calendar overrides
+    for path, delta in USER_HEALTH_OVERRIDES.items():
+        if '.' in path:
+            dom, sub = path.split('.', 1)
+            dom_obj = getattr(base_metrics, dom, None)
+            if dom_obj and hasattr(dom_obj, sub):
+                setattr(dom_obj, sub, max(0.0, min(100.0, getattr(dom_obj, sub) + delta)))
+    flat = base_metrics.flatten()
+    return jsonify({
+        "status": "success",
+        "metrics": flat,
+        "prediction": {
+            "summary": DEMO_PREDICTOR.get_prediction_summary(),
+            "risk_score": DEMO_PREDICTOR.get_risk_score()
+        }
+    })
+@app.route('/api/simulation/cascade', methods=['POST'])
+def get_cascade_frames():
+    data = request.json
+    conflict_label = data.get('conflict')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    frames = animate_cascade(conflict.primary_disruption, LifeMetrics())
+    return jsonify({"frames": frames})
+@app.route('/api/simulation/graph', methods=['GET'])
+def get_dependency_graph():
+    graph = DependencyGraph()
+    nodes = []
+    edges = []
+    # Flatten metrics to get all nodes
+    metrics = LifeMetrics().flatten()
+    for path in metrics.keys():
+        dom, sub = path.split('.')
+        nodes.append({
+            "id": path,
+            "label": sub.replace('_', ' '),
+            "group": dom
+        })
+    for src, targets in graph.edges.items():
+        for target, weight in targets:
+            edges.append({
+                "from": src,
+                "to": target,
+                "value": abs(weight),
+                "arrows": "to",
+                "color": {"color": "#4ade80" if weight > 0 else "#ef4444", "opacity": 0.2}
+            })
+    return jsonify({"nodes": nodes, "edges": edges})
+@app.route('/api/simulation/action', methods=['POST'])
+def perform_action():
+    data = request.json
+    person_label = data.get('person')
+    conflict_label = data.get('conflict')
+    memory_enabled = data.get('use_memory', False)
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    person = PERSONS.get(person_label, PERSONS["Alex (Executive) — driven, high-stress"])
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+    before_metrics = copy.deepcopy(env.state.current_metrics)
+    before_budget = copy.deepcopy(env.state.budget)
+    # RAG: Build few-shot context from ChromaDB if enabled
+    few_shot = ""
+    retrieved = []
+    if memory_enabled:
+        few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_metrics.flatten())
+        retrieved = MEMORY.retrieve_similar(conflict.title, before_metrics.flatten())
+    action = AGENT.get_action(before_metrics, before_budget, conflict, person, few_shot_context=few_shot)
+    _normalize_action_metric_changes(action)
+    uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
+                                     before_metrics.mental_wellbeing.stress_level)
+    env_action = LifeStackAction.from_agent_action(action)
+    env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+    obs = env.step(env_action)
+    # Store decision in memory for future RAG
+    MEMORY.store_decision(
+        conflict_title=conflict.title,
+        action_type=action.primary.action_type,
+        target_domain=action.primary.target_domain,
+        reward=obs.reward,
+        metrics_snapshot=before_metrics.flatten(),
+        reasoning=action.reasoning
+    )
+    cf_data = generate_counterfactuals(AGENT, before_metrics, before_budget, conflict, person, action)
+    episode_id = "".join(str(uuid.uuid4()).split("-")[:2]).upper()
+    result = {
+        "metrics": obs.metrics,
+        "domain_health": compute_domain_health(obs.metrics),
+        "action": {
+            "type": action.primary.action_type,
+            "target": action.primary.target_domain,
+            "description": action.primary.description,
+            "reasoning": action.reasoning,
+            "reward": obs.reward,
+            "uptake": uptake,
+            "cost": action.primary.resource_cost,
+            "id": episode_id,
+            "memories_retrieved": retrieved
+        },
+        "counterfactuals": cf_data,
+        "prediction": {
+            "summary": DEMO_PREDICTOR.get_prediction_summary(),
+            "risk_score": DEMO_PREDICTOR.get_risk_score()
+        },
+        "conflict": {
+            "title": conflict.title,
+            "person": person.name
+        },
+        "timestamp": datetime.datetime.now().strftime("%H:%M:%S")
+    }
+    # Store in history
+    EPISODE_HISTORY.appendleft(result)
+    return jsonify(result)
+# ─── 7-Day Trajectory ───
+@app.route('/api/simulation/trajectory', methods=['POST'])
+def get_trajectory():
+    """
+    Run the agent action then perform a 7-step rollout.
+    Returns per-day metric snapshots for the forecast panel.
+    """
+    data = request.json
+    conflict_label = data.get('conflict')
+    person_label   = data.get('person')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    person   = PERSONS.get(person_label, PERSONS["Alex (Executive) — driven, high-stress"])
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+    before_metrics = copy.deepcopy(env.state.current_metrics)
+    before_budget  = copy.deepcopy(env.state.budget)
+    action = AGENT.get_action(before_metrics, before_budget, conflict, person)
+    _normalize_action_metric_changes(action)
+    uptake = person.respond_to_action(
+        action.primary.action_type, action.primary.resource_cost,
+        before_metrics.mental_wellbeing.stress_level,
+    )
+    env_action = LifeStackAction.from_agent_action(action)
+    env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+    obs = env.step(env_action)
+    rollout = env.rollout(n_steps=7, gamma=0.9)
+    return jsonify({
+        "action": {
+            "type": action.primary.action_type,
+            "target": action.primary.target_domain,
+            "reasoning": action.reasoning,
+            "reward": obs.reward,
+        },
+        "day0_metrics": dict(obs.metrics),
+        "discounted_reward": rollout["discounted_reward"],
+        "trajectory": rollout["trajectory"],
+    })
+# ─── Custom Situation Entry ───
+@app.route('/api/custom/run', methods=['POST'])
+def run_custom():
+    data = request.json
+    situation_input = data.get('situation', "")
+    # Map sliders to metrics
+    m = LifeMetrics()
+    m.career.stress_level = float(data.get('work_stress', 5)) * 10
+    m.finances.debt_pressure = float(data.get('money_stress', 5)) * 10
+    m.relationships.conflict_frequency = (10 - float(data.get('rel_quality', 5))) * 10
+    m.physical_health.energy_level = float(data.get('energy_level', 5)) * 10
+    m.time.free_time = (10 - float(data.get('time_pressure', 5))) * 10
+    # Apply uploaded health/calendar overrides to custom metrics
+    for path, delta in USER_HEALTH_OVERRIDES.items():
+        if '.' in path:
+            dom, sub = path.split('.', 1)
+            dom_obj = getattr(m, dom, None)
+            if dom_obj and hasattr(dom_obj, sub):
+                setattr(dom_obj, sub, max(0.0, min(100.0, getattr(dom_obj, sub) + delta)))
+    gmail_signals = data.get('gmail_signals')
+    if gmail_signals:
+        # Merge digital signals if provided
+        for k, v in gmail_signals.items():
+            parts = k.split(".")
+            if len(parts) == 2:
+                dom = getattr(m, parts[0], None)
+                if dom and hasattr(dom, parts[1]):
+                    setattr(dom, parts[1], v)
+    # Extract conflict from text using LLM
+    conflict = INTAKE.extract_conflict(situation_input, m)
+    pers_dict = INTAKE.get_personality_from_description(situation_input)
+    person = SimPerson(
+        name=pers_dict.get("name", "Inferred Self"),
+        openness=pers_dict.get("openness", 0.5),
+        conscientiousness=pers_dict.get("conscientiousness", 0.5),
+        extraversion=pers_dict.get("extraversion", 0.5),
+        agreeableness=pers_dict.get("agreeableness", 0.5),
+        neuroticism=pers_dict.get("neuroticism", 0.5)
+    )
+    budget = ResourceBudget(time=24, money=1000, energy=100)
+    action = AGENT.get_action(m, budget, conflict, person)
+    _normalize_action_metric_changes(action)
+    uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
+                                     m.mental_wellbeing.stress_level)
+    env = LifeStackEnv()
+    env.state.current_metrics = copy.deepcopy(m)
+    env.state.budget = budget
+    env_action = LifeStackAction.from_agent_action(action)
+    env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+    obs = env.step(env_action)
+    return jsonify({
+        "before_metrics": m.flatten(),
+        "after_metrics": obs.metrics,
+        "domain_health": compute_domain_health(obs.metrics),
+        "action": {
+            "type": action.primary.action_type,
+            "target": action.primary.target_domain,
+            "description": action.primary.description,
+            "reasoning": action.reasoning,
+            "id": "".join(str(uuid.uuid4()).split("-")[:2]).upper()
+        },
+        "person": {"name": person.name or "Inferred Self"}
+    })
+@app.route('/api/gmail/sync', methods=['POST'])
+def sync_gmail():
+    signals, metric_deltas, summary, is_demo = GMAIL.sync()
+    return jsonify({
+        "status": "success",
+        "signals": metric_deltas,
+        "raw": signals,
+        "summary": summary,
+        "is_demo": is_demo,
+    })
+@app.route('/api/digital/sync', methods=['POST'])
+def digital_sync():
+    """
+    Unified Digital Sync — Gmail + Google Calendar + Fitness (demo payload).
+    Tries real OAuth for Gmail and Calendar; falls back to demo_signals.json on failure.
+    Fitness is always served from the demo payload (no first-party fitness API scope).
+    Returns merged metric deltas, per-source raw signals, and a demo flag per source.
+    """
+    import json as _json
+    demo_path = os.path.join(os.path.dirname(__file__), 'data', 'demo_signals.json')
+    with open(demo_path) as f:
+        demo_full = _json.load(f)
+    # Gmail
+    gmail_signals, gmail_deltas, gmail_summary, gmail_is_demo = GMAIL.sync()
+    # Calendar
+    cal_signals, cal_deltas, cal_is_demo = CALENDAR.sync()
+    # Fitness — always demo (no live fitness API)
+    fitness_signals = demo_full['fitness']
+    fitness_deltas = {
+        "physical_health.sleep_quality": demo_full['derived_metric_deltas']['physical_health.sleep_quality'],
+        "physical_health.energy_level": demo_full['derived_metric_deltas']['physical_health.energy_level'],
+        "physical_health.exercise_consistency": demo_full['derived_metric_deltas']['physical_health.exercise_consistency'],
+        "mental_wellbeing.stress_level": demo_full['derived_metric_deltas']['mental_wellbeing.stress_level'],
+    }
+    fitness_is_demo = True
+    # Merge all deltas (last writer wins — Calendar > Gmail for overlapping keys)
+    merged_deltas = {}
+    merged_deltas.update(gmail_deltas)
+    merged_deltas.update(cal_deltas)
+    merged_deltas.update(fitness_deltas)
+    return jsonify({
+        "status": "success",
+        "merged_deltas": merged_deltas,
+        "sources": {
+            "gmail": {
+                "signals": gmail_signals if isinstance(gmail_signals, dict) else {},
+                "summary": gmail_summary,
+                "is_demo": gmail_is_demo,
+            },
+            "calendar": {
+                "signals": cal_signals,
+                "summary": cal_signals.get("summary", ""),
+                "is_demo": cal_is_demo,
+            },
+            "fitness": {
+                "signals": fitness_signals,
+                "summary": fitness_signals.get("summary", ""),
+                "is_demo": True,
+            },
+        },
+        "persona_note": demo_full.get("persona", "Jordan (PM at Series-B startup)"),
+    })
+@app.route('/api/arjun/activate', methods=['POST'])
+def activate_arjun():
+    LONG_DEMO.pre_seed_arjun()
+    return jsonify({"status": "success", "message": "Arjun's memory (Week 1 & 2) is now ACTIVE in ChromaDB."})
+@app.route('/api/task/demo', methods=['GET'])
+def get_demo_task():
+    dummy_routes = [
+        Route(id="r1", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], milestones_unlocked=["m1"], final_reward=2.5),
+        Route(id="r2", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], milestones_unlocked=["m2"], final_reward=1.8),
+    ]
+    dummy_milestones = [
+        Milestone(id="m1", description="Successfully rebooked flight before deadline", reward=1.0),
+        Milestone(id="m2", description="Caught up with all emergency slack messages", reward=0.8),
+    ]
+    dummy_events = [
+        ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300."),
+        ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity."),
+    ]
+    task = Task(
+        id="sample_flight_crisis", domain="flight_crisis", goal="Survive Airport Cancellation",
+        event_schedule=dummy_events, viable_routes=dummy_routes, milestones=dummy_milestones,
+        horizon=10, difficulty=4
+    )
+    return jsonify({
+        "goal": task.goal,
+        "difficulty": task.difficulty,
+        "routes": [{"name": r.name, "description": r.description} for r in dummy_routes],
+        "milestones": [{"id": m.id, "description": m.description} for m in dummy_milestones],
+        "events": [{"step": e.step, "id": e.id, "description": e.description} for e in dummy_events],
+        "story": "A major storm grounded commercial flights."
+    })
+@app.route('/api/stats', methods=['GET'])
+def get_stats():
+    stats = MEMORY.get_stats()
+    # Normalise for frontend: inject feedback_count and reward_history
+    all_records = []
+    try:
+        raw = MEMORY.collection.get(include=["metadatas"])
+        all_records = raw.get("metadatas", [])
+    except Exception:
+        pass
+    stats["feedback_count"] = len([m for m in all_records if m.get("type") == "feedback"])
+    rewards = [m.get("reward", 0.0) for m in all_records if "reward" in m]
+    stats["reward_history"] = rewards[-20:] if rewards else []
+    return jsonify(stats)
+@app.route('/api/feedback/submit', methods=['POST'])
+def submit_feedback():
+    data = request.json
+    try:
+        feedback = OutcomeFeedback(
+            episode_id=data.get('episode_id'),
+            submitted_at=datetime.datetime.now(),
+            overall_effectiveness=int(data.get('score', 7)),
+            domains_improved=data.get('improved', []),
+            domains_worsened=data.get('worsened', []),
+            unexpected_effects=data.get('notes', ""),
+            resolution_time_hours=float(data.get('time', 1.0))
+        )
+        MEMORY.store_feedback(feedback)
+        return jsonify({"status": "success", "message": f"Feedback stored for episode {feedback.episode_id}"})
+    except Exception as e:
+        return jsonify({"status": "error", "message": str(e)}), 400
+# ─── Feature F1 helper: random action baseline ───
+_ACTION_TYPES = ["negotiate", "communicate", "delegate", "spend", "reschedule", "rest", "deprioritize", "execute"]
+def _random_action(conflict, person):
+    """Purely random action baseline — worst possible agent, used for ablation floor."""
+    import random as _r
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+    flat = env.state.current_metrics.flatten()
+    atype = _r.choice(_ACTION_TYPES)
+    dom = _r.choice(_DOMAINS)
+    key = f"{dom}.stress_level" if dom in ("career", "mental_wellbeing") else f"{dom}.liquidity" if dom == "finances" else f"{dom}.energy_level"
+    mc = {key: _r.uniform(-20, 20)}
+    rc = {"time": _r.uniform(0.5, 3.0), "energy": _r.uniform(5, 30)}
+    uptake = person.respond_to_action(atype, rc, flat.get("mental_wellbeing.stress_level", 70))
+    env_action = LifeStackAction(action_type=atype, target=dom,
+                                  metric_changes={k: v * uptake for k, v in mc.items()},
+                                  resource_cost=rc, reasoning="Random baseline.", actions_taken=1)
+    obs = env.step(env_action)
+    return {"metrics": obs.metrics, "action": {"type": atype, "target": dom,
+            "description": "Random action (ablation floor).",
+            "reasoning": "Random baseline.", "reward": obs.reward, "cost": rc}}
+# ─── Feature A: Trained vs Untrained Comparison ───
+BASELINE_ACTION_MAP = {
+    "career":           ("negotiate",    {"career.workload": -12.0, "mental_wellbeing.stress_level": -4.0},    {"time": 1.5, "energy": 20.0}, "Negotiate workload with manager."),
+    "finances":         ("spend",        {"finances.liquidity": -200.0, "mental_wellbeing.stress_level": -8.0}, {"time": 1.0, "energy": 10.0}, "Spend to resolve financial pressure."),
+    "relationships":    ("communicate",  {"relationships.romantic": 8.0, "mental_wellbeing.stress_level": -5.0},{"time": 0.5, "energy": 8.0},  "Call partner to check in."),
+    "physical_health":  ("rest",         {"physical_health.energy_level": 12.0, "mental_wellbeing.stress_level": -6.0}, {"time": 1.0}, "Rest to recover energy."),
+    "mental_wellbeing": ("rest",         {"mental_wellbeing.stress_level": -15.0, "physical_health.sleep_quality": 5.0},  {"time": 1.0}, "Take a break to reduce stress."),
+    "time":             ("reschedule",   {"time.free_hours_per_week": 6.0, "career.workload": -8.0},            {"time": 1.5, "energy": 12.0}, "Reschedule non-critical tasks."),
+}
+def _run_baseline(conflict, person):
+    """Rule-based baseline: pick the action for the worst-scoring domain."""
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+    flat = env.state.current_metrics.flatten()
+    domain_scores = {}
+    for dom in ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]:
+        subs = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
+        domain_scores[dom] = sum(subs.values()) / len(subs) if subs else 70.0
+    worst_dom = min(domain_scores, key=domain_scores.get)
+    atype, mc, rc, desc = BASELINE_ACTION_MAP.get(worst_dom, BASELINE_ACTION_MAP["mental_wellbeing"])
+    uptake = person.respond_to_action(atype, rc, flat.get("mental_wellbeing.stress_level", 70))
+    scaled_mc = {k: v * uptake for k, v in mc.items()}
+    env_action = LifeStackAction(
+        action_type=atype,
+        target=worst_dom,
+        metric_changes=scaled_mc,
+        resource_cost=rc,
+        reasoning=f"Rule-based: {worst_dom} scored {domain_scores[worst_dom]:.1f} — lowest domain.",
+        actions_taken=1,
+    )
+    obs = env.step(env_action)
+    return {
+        "metrics": obs.metrics,
+        "action": {
+            "type": atype,
+            "target": worst_dom,
+            "description": desc,
+            "reasoning": env_action.reasoning,
+            "reward": obs.reward,
+            "cost": rc,
+        }
+    }
+def _run_agent_comparison_side(conflict, person, api_only: bool):
+    """Run one side of the comparison: api_only=True → untrained LLM, False → GRPO-trained."""
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+    before_metrics = copy.deepcopy(env.state.current_metrics)
+    before_budget = copy.deepcopy(env.state.budget)
+    action = AGENT.get_action(before_metrics, before_budget, conflict, person, api_only=api_only)
+    _normalize_action_metric_changes(action)
+    uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
+                                      before_metrics.mental_wellbeing.stress_level)
+    env_action = LifeStackAction.from_agent_action(action)
+    env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+    obs = env.step(env_action)
+    return {
+        "metrics": obs.metrics,
+        "action": {
+            "type": action.primary.action_type,
+            "target": action.primary.target_domain,
+            "description": action.primary.description,
+            "reasoning": action.reasoning,
+            "reward": obs.reward,
+            "cost": action.primary.resource_cost,
+        }
+    }
+@app.route('/api/comparison/run', methods=['POST'])
+def run_comparison():
+    """Run same conflict through untrained LLM (no RL) AND GRPO-trained LifeStack agent."""
+    data = request.json
+    conflict_label = data.get('conflict')
+    person_label = data.get('person')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    person = PERSONS.get(person_label, PERSONS["Alex (Executive) — driven, high-stress"])
+    # Untrained LLM path — forces Groq API, no GRPO optimization
+    try:
+        baseline = _run_agent_comparison_side(conflict, person, api_only=True)
+    except Exception as e:
+        baseline = {"error": str(e)}
+    # GRPO-trained agent path — uses local model if available, lazy-loaded
+    try:
+        trained = _run_agent_comparison_side(conflict, person, api_only=False)
+    except Exception as e:
+        trained = {"error": str(e)}
+    return jsonify({"baseline": baseline, "trained": trained})
+# ─── Feature E: Memory Effect Comparison ───
+@app.route('/api/memory/compare', methods=['POST'])
+def memory_compare():
+    """Show the same conflict resolved cold (no memory) vs warm (with RAG memory)."""
+    try:
+        data = request.json
+        conflict_label = data.get('conflict')
+        person_label = data.get('person')
+        conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+        person = PERSONS.get(person_label, PERSONS["Alex (Executive) — driven, high-stress"])
+        def _run_episode(use_memory: bool):
+            env = LifeStackEnv()
+            env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+            before_metrics = copy.deepcopy(env.state.current_metrics)
+            before_budget = copy.deepcopy(env.state.budget)
+            few_shot = ""
+            retrieved = []
+            if use_memory:
+                few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_metrics.flatten())
+                retrieved = MEMORY.retrieve_similar(conflict.title, before_metrics.flatten())
+            action = AGENT.get_action(before_metrics, before_budget, conflict, person, few_shot_context=few_shot)
+            _normalize_action_metric_changes(action)
+            uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
+                                              before_metrics.mental_wellbeing.stress_level)
+            env_action = LifeStackAction.from_agent_action(action)
+            env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+            obs = env.step(env_action)
+            MEMORY.store_decision(
+                conflict_title=conflict.title,
+                action_type=action.primary.action_type,
+                target_domain=action.primary.target_domain,
+                reward=obs.reward,
+                metrics_snapshot=before_metrics.flatten(),
+                reasoning=action.reasoning,
+            )
+            return {
+                "metrics": obs.metrics,
+                "action": {
+                    "type": action.primary.action_type,
+                    "target": action.primary.target_domain,
+                    "description": action.primary.description,
+                    "reasoning": action.reasoning,
+                    "reward": obs.reward,
+                    "memories_retrieved": retrieved,
+                }
+            }
+        cold = _run_episode(use_memory=False)
+        warm = _run_episode(use_memory=True)
+        return jsonify({"cold": cold, "warm": warm})
+    except Exception as e:
+        return jsonify({"error": str(e)}), 500
+# ─── F2: /api/cascade/frames alias ───
+@app.route('/api/cascade/frames', methods=['POST'])
+def cascade_frames_alias():
+    """Alias route for /api/simulation/cascade — same handler."""
+    return get_cascade_frames()
+# ─── F4: Personality Comparison with OCEAN scores ───
+@app.route('/api/personality/compare', methods=['POST'])
+def personality_compare():
+    data = request.json
+    conflict_label = data.get('conflict')
+    person_a_label = data.get('person_a')
+    person_b_label = data.get('person_b')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    def _run_person(person_label):
+        person = PERSONS.get(person_label, list(PERSONS.values())[0])
+        env = LifeStackEnv()
+        env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+        before_m = copy.deepcopy(env.state.current_metrics)
+        before_b = copy.deepcopy(env.state.budget)
+        action = AGENT.get_action(before_m, before_b, conflict, person)
+        _normalize_action_metric_changes(action)
+        uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
+                                          before_m.mental_wellbeing.stress_level)
+        env_action = LifeStackAction.from_agent_action(action)
+        env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+        obs = env.step(env_action)
+        return {
+            "name": person.name,
+            "ocean": {
+                "openness": round(person.openness * 100),
+                "conscientiousness": round(person.conscientiousness * 100),
+                "extraversion": round(person.extraversion * 100),
+                "agreeableness": round(person.agreeableness * 100),
+                "neuroticism": round(person.neuroticism * 100),
+            },
+            "action": {
+                "type": action.primary.action_type,
+                "target": action.primary.target_domain,
+                "description": action.primary.description,
+                "reasoning": action.reasoning,
+                "reward": obs.reward,
+                "uptake": uptake,
+            },
+            "metrics": obs.metrics,
+            "domain_health": compute_domain_health(obs.metrics),
+        }
+    try:
+        return jsonify({"a": _run_person(person_a_label), "b": _run_person(person_b_label)})
+    except Exception as e:
+        return jsonify({"error": str(e)}), 500
+# ─── F6: Dedicated Counterfactual Generation ───
+@app.route('/api/counterfactuals/generate', methods=['POST'])
+def counterfactuals_generate():
+    data = request.json
+    conflict_label = data.get('conflict')
+    person_label = data.get('person')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    person = PERSONS.get(person_label, list(PERSONS.values())[0])
+    env = LifeStackEnv()
+    env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+    before_m = copy.deepcopy(env.state.current_metrics)
+    before_b = copy.deepcopy(env.state.budget)
+    action = AGENT.get_action(before_m, before_b, conflict, person)
+    _normalize_action_metric_changes(action)
+    cf_data = generate_counterfactuals(AGENT, before_m, before_b, conflict, person, action)
+    return jsonify({
+        "counterfactuals": cf_data,
+        "actual_action": {
+            "type": action.primary.action_type,
+            "target": action.primary.target_domain,
+            "description": action.primary.description,
+        },
+    })
+# ─── F7: Memory Ablation Study ───
+@app.route('/api/memory/ablation', methods=['POST'])
+def memory_ablation():
+    """Memory ablation: cold (0 memories) vs warm (RAG-augmented). Surfaces ablation delta."""
+    data = request.json
+    conflict_label = data.get('conflict')
+    person_label = data.get('person')
+    conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+    person = PERSONS.get(person_label, list(PERSONS.values())[0])
+    def _run(use_memory):
+        env = LifeStackEnv()
+        env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
+        before_m = copy.deepcopy(env.state.current_metrics)
+        before_b = copy.deepcopy(env.state.budget)
+        few_shot, retrieved = "", []
+        if use_memory:
+            few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_m.flatten())
+            retrieved = MEMORY.retrieve_similar(conflict.title, before_m.flatten())
+        action = AGENT.get_action(before_m, before_b, conflict, person, few_shot_context=few_shot)
+        _normalize_action_metric_changes(action)
+        uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
+                                          before_m.mental_wellbeing.stress_level)
+        env_action = LifeStackAction.from_agent_action(action)
+        env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
+        obs = env.step(env_action)
+        MEMORY.store_decision(conflict_title=conflict.title, action_type=action.primary.action_type,
+                              target_domain=action.primary.target_domain, reward=obs.reward,
+                              metrics_snapshot=before_m.flatten(), reasoning=action.reasoning)
+        return {"metrics": obs.metrics, "action": {
+            "type": action.primary.action_type, "target": action.primary.target_domain,
+            "description": action.primary.description, "reasoning": action.reasoning,
+            "reward": obs.reward, "memories_retrieved": retrieved,
+        }}
+    cold = _run(use_memory=False)
+    warm = _run(use_memory=True)
+    delta = warm["action"]["reward"] - cold["action"]["reward"]
+    return jsonify({"cold": cold, "warm": warm,
+                    "ablation_delta": round(delta, 4),
+                    "memory_count": len(warm["action"]["memories_retrieved"])})
+# ─── F10: Health + Calendar Data Upload ───
+@app.route('/api/data/health/upload', methods=['POST'])
+def upload_health_data():
+    """Accept health/fitness JSON signals and return metric deltas."""
+    data = request.json or {}
+    sleep = float(data.get('sleep_hours', 7.0))
+    hr = float(data.get('resting_heart_rate', 70))
+    steps = float(data.get('daily_steps', 8000))
+    deltas = {
+        "physical_health.sleep_quality": round(min(100, sleep / 8 * 100) - 50, 1),
+        "physical_health.energy_level": round(min(100, steps / 10000 * 100) - 50, 1),
+        "physical_health.exercise_consistency": round(min(100, steps / 8000 * 70), 1),
+        "mental_wellbeing.stress_level": round(max(0.0, 80.0 - hr), 1),
+    }
+    summary = f"Sleep {sleep:.1f}h | HR {hr:.0f}bpm | Steps {int(steps):,}/day"
+    # Persist overrides so future simulations use the uploaded health data
+    USER_HEALTH_OVERRIDES.update(deltas)
+    return jsonify({"status": "success", "deltas": deltas, "summary": summary,
+                    "signals": {"avg_sleep_hours": sleep, "resting_heart_rate": hr, "daily_steps_avg": steps}})
+@app.route('/api/data/calendar/upload', methods=['POST'])
+def upload_calendar_data():
+    """Accept calendar JSON signals and return metric deltas."""
+    data = request.json or {}
+    occupancy = float(data.get('week_occupancy_pct', 50))
+    btb = int(data.get('back_to_back_blocks', 0))
+    deadlines = data.get('upcoming_deadlines', [])
+    critical_count = sum(1 for d in deadlines if d.get('priority') == 'critical')
+    deltas = {
+        "time.free_hours_per_week": round(-((occupancy - 50) / 5), 1),
+        "time.schedule_control": round(-(occupancy / 10), 1),
+        "mental_wellbeing.stress_level": round((occupancy / 10) + (btb * 2), 1),
+        "career.workload": round((occupancy - 50) / 2 + critical_count * 5, 1),
+    }
+    summary = f"Occupancy {occupancy:.0f}% | {len(deadlines)} deadlines ({critical_count} critical)"
+    return jsonify({"status": "success", "deltas": deltas, "summary": summary,
+                    "signals": {"week_occupancy_pct": occupancy, "back_to_back_blocks": btb,
+                                "upcoming_deadlines": deadlines}})
+# ─── Global Error Handlers ───
+@app.errorhandler(429)
+def ratelimit_handler(e):
+    return jsonify({"error": "Rate limit exceeded. Slow down!", "details": str(e)}), 429
+@app.errorhandler(500)
+def server_error_handler(e):
+    return jsonify({"error": "Internal server error. The agent might be overwhelmed.", "details": str(e)}), 500
+if __name__ == '__main__':
+    LONG_DEMO.pre_seed_arjun()
+    app.run(host='0.0.0.0', port=7860, debug=True)

core/__init__.py ADDED Viewed

File without changes

core/action_space.py ADDED Viewed

	@@ -0,0 +1,238 @@

+import copy
+from dataclasses import dataclass, field
+from core.life_state import LifeMetrics, ResourceBudget
+from enum import Enum
+from intake.simperson import SimPerson
+class ToolActionType(str, Enum):
+    INSPECT = "inspect"
+    PLAN = "plan"
+    EXECUTE = "execute"
+    COMMUNICATE = "communicate"
+    WAIT = "wait"
+    ROLLBACK = "rollback"
+    ESCALATE = "escalate"
+@dataclass
+class PrimaryAction:
+    action_type: str  # reschedule, delegate, negotiate, spend, communicate, rest, deprioritize
+    target_domain: str
+    metric_changes: dict
+    resource_cost: dict
+    description: str
+@dataclass
+class CommunicationAction:
+    recipient: str  # boss, partner, family, friend, colleague
+    message_type: str  # apologize, negotiate, inform, request, reassure
+    tone: str  # formal, warm, urgent, calm, assertive
+    content: str
+@dataclass
+class AgentAction:
+    primary: PrimaryAction
+    communication: CommunicationAction = None
+    reasoning: str = ""
+    model_used: str = "unknown"
+    raw_completion: str = ""
+def validate_action(action: AgentAction, budget: ResourceBudget) -> tuple[bool, str]:
+    cost = action.primary.resource_cost
+    if budget.time_hours < cost.get('time', 0.0):
+        return False, f"Not enough time (Needs {cost.get('time')}h, has {budget.time_hours:.1f}h)"
+    if budget.money_dollars < cost.get('money', 0.0):
+        return False, f"Not enough money (Needs ${cost.get('money')}, has ${budget.money_dollars:.1f})"
+    if budget.energy_units < cost.get('energy', 0.0):
+        return False, f"Not enough energy (Needs {cost.get('energy')}u, has {budget.energy_units:.1f}u)"
+    return True, ""
+def apply_action(action: AgentAction, metrics: LifeMetrics, budget: ResourceBudget, person: SimPerson) -> tuple[LifeMetrics, ResourceBudget, float]:
+    """Validates, scales by personality uptake, and applies the action to the state."""
+    # 1. Validation
+    is_valid, reason = validate_action(action, budget)
+    if not is_valid:
+        # If invalid, the action fails but we return current state with 0 uptake
+        return metrics, budget, 0.0
+    # 2. Personality Scaling (Uptake)
+    current_stress = metrics.mental_wellbeing.stress_level
+    uptake_score = person.respond_to_action(
+        action.primary.action_type,
+        action.primary.resource_cost,
+        current_stress
+    )
+    # 3. Apply changes (Scaled by uptake)
+    new_metrics = copy.deepcopy(metrics)
+    for path, delta in action.primary.metric_changes.items():
+        # Guard: skip malformed keys without a domain prefix (e.g. LLM returns "stress_level" instead of "mental_wellbeing.stress_level")
+        if '.' not in path:
+            print(f"  ⚠️  Skipping malformed metric key: '{path}' (expected 'domain.submetric')")
+            continue
+        parts = path.split('.', 1)
+        domain_name, sub_name = parts[0], parts[1]
+        domain = getattr(new_metrics, domain_name, None)
+        if domain is None or not hasattr(domain, sub_name):
+            print(f"  ⚠️  Skipping unknown metric: '{path}'")
+            continue
+        current = getattr(domain, sub_name)
+        # Scale the benefit/cost by the person's receptiveness
+        try:
+            scaled_delta = float(delta) * uptake_score
+            setattr(domain, sub_name, max(0.0, min(100.0, current + scaled_delta)))
+        except ValueError:
+            print(f"  ⚠️  Skipping metric change due to invalid delta value: '{delta}'")
+    # 4. Deduct resources (Fixed cost, doesn't scale with uptake)
+    new_budget = copy.deepcopy(budget)
+    new_budget.deduct(
+        time=action.primary.resource_cost.get('time', 0.0),
+        money=action.primary.resource_cost.get('money', 0.0),
+        energy=action.primary.resource_cost.get('energy', 0.0)
+    )
+    return new_metrics, new_budget, uptake_score
+# 10 EXAMPLE ACTIONS for Friday 6PM Conflict
+EXAMPLE_ACTIONS = [
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="negotiate", target_domain="career",
+            metric_changes={"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
+            resource_cost={"time": 1.5, "energy": 20.0},
+            description="Negotiate a Sunday deadline extension with my boss."
+        ),
+        communication=CommunicationAction("boss", "negotiate", "formal", "Due to flight issues, I need until Sunday PM for the report."),
+        reasoning="Relieving the immediate workload pressure is critical to reduce cascade spread."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="spend", target_domain="finances",
+            metric_changes={"finances.liquidity": -350.0, "mental_wellbeing.stress_level": -10.0},
+            resource_cost={"time": 1.0, "energy": 15.0},
+            description="Rebook the canceled flight using a premium fare."
+        ),
+        reasoning="Immediate resolution of logistics fixes the source of the crisis."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="communicate", target_domain="relationships",
+            metric_changes={"relationships.romantic": 12.0, "mental_wellbeing.stress_level": -5.0},
+            resource_cost={"time": 0.5, "energy": 10.0},
+            description="Call my partner to explain the situation and reassure them."
+        ),
+        communication=CommunicationAction("partner", "reassure", "warm", "Hey, I'm stuck but I'll be home soon. Miss you."),
+        reasoning="Prevents relationship decay while stress is high."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="communicate", target_domain="finances",
+            metric_changes={"finances.liquidity": 200.0, "relationships.family": -5.0},
+            resource_cost={"time": 1.5, "energy": 25.0},
+            description="Ask my sibling for a temporary loan to cover rebooking."
+        ),
+        communication=CommunicationAction("family", "request", "urgent", "My card declined, can you Venmo me $200 for the flight?"),
+        reasoning="Fixes the liquidity block at a small social cost."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="reschedule", target_domain="time",
+            metric_changes={"career.workload": -10.0, "time.free_hours_per_week": 5.0},
+            resource_cost={"time": 2.0, "energy": 15.0},
+            description="Cancel non-essential meetings to create a deep-work block."
+        ),
+        reasoning="Regaining time allows for better problem solving later."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="rest", target_domain="physical_health",
+            metric_changes={"mental_wellbeing.stress_level": -12.0, "physical_health.energy": 10.0},
+            resource_cost={"time": 1.0, "energy": -10.0},
+            description="Take a 60-minute power nap in the airport lounge."
+        ),
+        reasoning="Restores energy to tackle the remaining Sunday deadline."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="delegate", target_domain="career",
+            metric_changes={"career.workload": -10.0, "relationships.professional_network": -5.0},
+            resource_cost={"time": 1.0, "energy": 15.0},
+            description="Ask a colleague to handle the final formatting of the slides."
+        ),
+        communication=CommunicationAction("colleague", "request", "assertive", "I'm stuck at airport, can you finish the formatting?"),
+        reasoning="Reduces workload by leaning on the professional network."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="deprioritize", target_domain="time",
+            metric_changes={"time.free_hours_per_week": 8.0, "relationships.social": -10.0},
+            resource_cost={"time": 0.5, "energy": 5.0},
+            description="Tell friends I can't attend the weekend gathering."
+        ),
+        communication=CommunicationAction("friend", "inform", "calm", "Hey, work crisis. Won't make it this weekend. Sorry!"),
+        reasoning="Aggressively reclaims time for high-value tasks."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="communicate", target_domain="career",
+            metric_changes={"career.stability": 8.0, "mental_wellbeing.stress_level": -5.0},
+            resource_cost={"time": 0.5, "energy": 10.0},
+            description="Send an apology note to boss for the delay."
+        ),
+        communication=CommunicationAction("boss", "apologize", "formal", "Apologies for the delay caused by travel disruptions. On it now."),
+        reasoning="Maintains career stability during an active crisis."
+    ),
+    AgentAction(
+        primary=PrimaryAction(
+            action_type="reschedule", target_domain="finances",
+            metric_changes={"finances.debt_pressure": -10.0, "time.admin_overhead": 10.0},
+            resource_cost={"time": 2.0, "energy": 15.0},
+            description="Call the bank to unlock the declined card."
+        ),
+        communication=CommunicationAction("colleague", "request", "assertive", "Unlock my credit card immediately."),
+        reasoning="Removes the liquidity barrier by handling admin overhead."
+    )
+]
+def main():
+    # 1. Setup Personalities
+    # Sam (Anxious Introvert): Neuroticism 0.9, Extraversion 0.1
+    sam = SimPerson(name="Sam (Introvert)", openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9)
+    # 2. Setup initial state (Friday 6PM Conflict)
+    from core.life_state import DependencyGraph
+    graph = DependencyGraph()
+    metrics = LifeMetrics() # starts at 70s
+    metrics = graph.cascade(metrics, {"career.workload": 35.0, "finances.liquidity": -40.0})
+    budget = ResourceBudget(time_hours=20.0, money_dollars=500.0, energy_units=100.0)
+    print("--- SIMULATING ACTIONS FOR SAM (ANXIOUS INTROVERT) ---")
+    print(f"Initial Stress: {metrics.mental_wellbeing.stress_level:.2f}")
+    print(f"Initial Metrics Health (Avg): {sum(metrics.flatten().values())/23:.2f}")
+    # 3. Apply each action
+    for i, action in enumerate(EXAMPLE_ACTIONS, 1):
+        print(f"\nACTION {i}: {action.primary.description}")
+        is_valid, reason = validate_action(action, budget)
+        if not is_valid:
+            print(f"  ❌ FAILED: {reason}")
+            continue
+        m_after, b_after, uptake = apply_action(action, metrics, budget, sam)
+        print(f"  ✅ SUCCESS | Uptake: {uptake:.2f}")
+        print(f"  Cost: {action.primary.resource_cost}")
+        # Show specific improvements
+        for path, delta in action.primary.metric_changes.items():
+            domain_name, sub_name = path.split('.')
+            val_before = getattr(getattr(metrics, domain_name), sub_name)
+            val_after = getattr(getattr(m_after, domain_name), sub_name)
+            real_delta = val_after - val_before
+            print(f"  - {path:25}: {val_before:.2f} -> {val_after:.2f} (Actual Change: {real_delta:+.2f})")
+if __name__ == "__main__":
+    main()

core/cascade_utils.py ADDED Viewed

	@@ -0,0 +1,78 @@

+import copy
+from core.life_state import LifeMetrics, DependencyGraph, CASCADE_DAMPENING_DEFAULT
+def animate_cascade(primary_disruption: dict, metrics: LifeMetrics) -> list[dict]:
+    """Replay the cascade step-by-step and capture intermediate frames.
+    Returns a list of frames, each:
+      { 'flat': {metric: value}, 'status': {metric: 'primary'|'first'|'second'|'unchanged'} }
+    """
+    graph = DependencyGraph()
+    dampening = CASCADE_DAMPENING_DEFAULT
+    frames = []
+    # Frame 0 — initial stable state
+    base = copy.deepcopy(metrics)
+    base_flat = base.flatten()
+    frames.append({'flat': dict(base_flat), 'status': {k: 'unchanged' for k in base_flat}})
+    # Frame 1 — primary disruption only (no cascade)
+    f1 = copy.deepcopy(metrics)
+    primary_keys = set()
+    for path, amount in primary_disruption.items():
+        if '.' not in path:
+            continue
+        primary_keys.add(path)
+        dom_name, sub_name = path.split('.', 1)
+        dom = getattr(f1, dom_name, None)
+        if dom and hasattr(dom, sub_name):
+            setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + amount)))
+    f1_flat = f1.flatten()
+    frames.append({'flat': dict(f1_flat),
+                   'status': {k: ('primary' if k in primary_keys else 'unchanged') for k in f1_flat}})
+    # Frame 2 — first-order cascade
+    f2 = copy.deepcopy(f1)
+    first_order_keys = set()
+    queue_next = []
+    for path, amount in primary_disruption.items():
+        if '.' not in path or path not in graph.edges:
+            continue
+        for target, weight in graph.edges[path]:
+            impact = amount * weight * dampening
+            if abs(impact) >= 0.05:
+                first_order_keys.add(target)
+                dom_name, sub_name = target.split('.', 1)
+                dom = getattr(f2, dom_name, None)
+                if dom and hasattr(dom, sub_name):
+                    setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + impact)))
+                queue_next.append((target, impact))
+    f2_flat = f2.flatten()
+    frames.append({'flat': dict(f2_flat), 'status': {
+        k: ('primary' if k in primary_keys else 'first' if k in first_order_keys else 'unchanged')
+        for k in f2_flat
+    }})
+    # Frame 3 — second-order cascade
+    f3 = copy.deepcopy(f2)
+    second_order_keys = set()
+    for src_path, src_mag in queue_next:
+        if src_path not in graph.edges:
+            continue
+        for target, weight in graph.edges[src_path]:
+            impact = src_mag * weight * dampening
+            if abs(impact) >= 0.05:
+                second_order_keys.add(target)
+                dom_name, sub_name = target.split('.', 1)
+                dom = getattr(f3, dom_name, None)
+                if dom and hasattr(dom, sub_name):
+                    setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + impact)))
+    f3_flat = f3.flatten()
+    frames.append({'flat': dict(f3_flat), 'status': {
+        k: ('primary' if k in primary_keys else 'first' if k in first_order_keys
+            else 'second' if k in second_order_keys else 'unchanged')
+        for k in f3_flat
+    }})
+    return frames

core/feedback.py ADDED Viewed

	@@ -0,0 +1,63 @@

+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import List, Optional
+from core.lifestack_env import LifeStackObservation
+@dataclass
+class OutcomeFeedback:
+    episode_id: str
+    submitted_at: datetime = field(default_factory=datetime.now)
+    # Did the advice work overall? 0-10 scale
+    overall_effectiveness: int = 5
+    # Which domains actually changed (user-reported)
+    domains_improved: List[str] = field(default_factory=list)
+    domains_worsened: List[str] = field(default_factory=list)
+    # Free text: what unexpected effects happened?
+    unexpected_effects: str = ""
+    # Time to resolution (hours)
+    resolution_time_hours: float = 0.0
+def compute_human_feedback_reward(initial_metrics: dict, predicted_obs: LifeStackObservation, feedback: OutcomeFeedback) -> float:
+    """
+    Computes a reward score (0.0 to 1.0) based on how well the environment's
+    predicted outcomes match the human's reported reality.
+    """
+    # Metrics where a decrease is an improvement
+    inverted = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
+    predicted_improved = set()
+    for key, final_val in predicted_obs.metrics.items():
+        if key not in initial_metrics:
+            continue
+        initial_val = initial_metrics[key]
+        delta = final_val - initial_val
+        submetric = key.split('.')[-1]
+        domain = key.split('.')[0]
+        # Determine if this specific change is an "improvement"
+        is_improvement = False
+        if submetric in inverted:
+            if delta < -1.0: # Significant decrease in negative metric
+                is_improvement = True
+        else:
+            if delta > 1.0:  # Significant increase in positive metric
+                is_improvement = True
+        if is_improvement:
+            predicted_improved.add(domain)
+    actual_improved = set(feedback.domains_improved)
+    union = predicted_improved | actual_improved
+    if not union:
+        overlap = 1.0 # Both agreed nothing improved
+    else:
+        intersection = predicted_improved & actual_improved
+        overlap = len(intersection) / len(union)
+    # 2. Effectiveness Score (0.0 - 1.0)
+    effectiveness_score = max(0.0, min(1.0, feedback.overall_effectiveness / 10.0))
+    # Weighted Average
+    return 0.5 * overlap + 0.5 * effectiveness_score

core/life_state.py ADDED Viewed

	@@ -0,0 +1,281 @@

+from dataclasses import dataclass, field
+import copy
+# Cascade dampening factor — grounded in Starcke & Brand (2012)
+# Stress effects attenuate ~40% per cognitive/behavioral hop.
+# A disruption propagates at full strength to immediate neighbors,
+# 60% strength to second-order nodes, 36% to third-order, etc.
+CASCADE_DAMPENING_DEFAULT = 0.6
+METRIC_FLOOR = 10.0
+@dataclass
+class CareerMetrics:
+    satisfaction: float = 70.0
+    workload: float = 70.0
+    stability: float = 70.0
+    growth_trajectory: float = 70.0
+@dataclass
+class FinanceMetrics:
+    liquidity: float = 70.0
+    debt_pressure: float = 70.0
+    monthly_runway: float = 70.0
+    long_term_health: float = 70.0
+@dataclass
+class RelationshipMetrics:
+    romantic: float = 70.0
+    family: float = 70.0
+    social: float = 70.0
+    professional_network: float = 70.0
+@dataclass
+class PhysicalHealthMetrics:
+    energy: float = 70.0
+    fitness: float = 70.0
+    sleep_quality: float = 70.0
+    nutrition: float = 70.0
+@dataclass
+class MentalWellbeingMetrics:
+    stress_level: float = 70.0
+    clarity: float = 70.0
+    motivation: float = 70.0
+    emotional_stability: float = 70.0
+@dataclass
+class TimeMetrics:
+    free_hours_per_week: float = 70.0
+    commute_burden: float = 70.0
+    admin_overhead: float = 70.0
+@dataclass
+class LifeMetrics:
+    career: CareerMetrics = field(default_factory=CareerMetrics)
+    finances: FinanceMetrics = field(default_factory=FinanceMetrics)
+    relationships: RelationshipMetrics = field(default_factory=RelationshipMetrics)
+    physical_health: PhysicalHealthMetrics = field(default_factory=PhysicalHealthMetrics)
+    mental_wellbeing: MentalWellbeingMetrics = field(default_factory=MentalWellbeingMetrics)
+    time: TimeMetrics = field(default_factory=TimeMetrics)
+    def flatten(self) -> dict:
+        """Returns a flat dictionary mapping 'domain.submetric' to value."""
+        flat = {}
+        for domain_name in self.__dataclass_fields__:
+            domain = getattr(self, domain_name)
+            for sub_name in domain.__dataclass_fields__:
+                flat[f"{domain_name}.{sub_name}"] = getattr(domain, sub_name)
+        return flat
+@dataclass
+class ResourceBudget:
+    time_hours: float = 20.0
+    money_dollars: float = 500.0
+    energy_units: float = 100.0
+    def deduct(self, time: float = 0.0, money: float = 0.0, energy: float = 0.0) -> bool:
+        """Returns False if any resource would go negative, otherwise deducts and returns True."""
+        if (self.time_hours < time or
+            self.money_dollars < money or
+            self.energy_units < energy):
+            return False
+        self.time_hours -= time
+        self.money_dollars -= money
+        self.energy_units = min(100.0, self.energy_units - energy)  # cap at 100
+        return True
+class DependencyGraph:
+    def __init__(self):
+        # source_node -> [(target_node, weight)]
+        self.edges = {
+            "career.workload": [
+                ("mental_wellbeing.stress_level", 0.70),
+                ("time.free_hours_per_week", -0.80)
+            ],
+            "finances.liquidity": [
+                ("mental_wellbeing.stress_level", -0.60),
+                ("finances.monthly_runway", 0.90)
+            ],
+            "mental_wellbeing.stress_level": [
+                ("physical_health.sleep_quality", -0.55),
+                ("mental_wellbeing.emotional_stability", -0.50),
+                ("mental_wellbeing.motivation", -0.40),
+                ("career.satisfaction", -0.35)
+            ],
+            "physical_health.sleep_quality": [
+                ("mental_wellbeing.clarity", 0.60),
+                ("physical_health.energy", 0.50)
+            ],
+            "relationships.romantic": [
+                ("mental_wellbeing.emotional_stability", 0.50)
+            ],
+            "time.free_hours_per_week": [
+                ("relationships.social", 0.45),
+                ("mental_wellbeing.stress_level", -0.30)
+            ],
+            "physical_health.energy": [
+                ("mental_wellbeing.motivation", 0.40),
+                ("physical_health.fitness", 0.30)
+            ],
+            "career.satisfaction": [
+                ("mental_wellbeing.motivation", 0.50)
+            ],
+            "finances.debt_pressure": [
+                ("mental_wellbeing.stress_level", 0.65)
+            ],
+            "physical_health.nutrition": [
+                ("physical_health.energy", 0.35)
+            ],
+            "physical_health.fitness": [
+                ("physical_health.energy", 0.40)
+            ],
+            "time.commute_burden": [
+                ("physical_health.energy", -0.30),
+                ("mental_wellbeing.stress_level", 0.25)
+            ],
+            "relationships.social": [
+                ("mental_wellbeing.emotional_stability", 0.30)
+            ],
+            "mental_wellbeing.clarity": [
+                ("career.growth_trajectory", 0.45)
+            ],
+            "finances.long_term_health": [
+                ("mental_wellbeing.stress_level", -0.40)
+            ],
+            "time.admin_overhead": [
+                ("mental_wellbeing.stress_level", 0.25)
+            ],
+            "career.stability": [
+                ("mental_wellbeing.stress_level", -0.35)
+            ],
+            "career.growth_trajectory": [
+                ("career.satisfaction", 0.40)
+            ],
+            "mental_wellbeing.motivation": [
+                ("career.growth_trajectory", 0.30)
+            ],
+            "relationships.professional_network": [
+                ("career.stability", 0.35)
+            ]
+        }
+    def _get_val(self, metrics: LifeMetrics, path: str) -> float:
+        if '.' not in path:
+            return 0.0
+        domain, sub = path.split('.', 1)
+        d = getattr(metrics, domain, None)
+        return getattr(d, sub, 0.0) if d else 0.0
+    def _set_val(self, metrics: LifeMetrics, path: str, val: float, is_cascade: bool = False):
+        if '.' not in path:
+            return
+        domain_name, sub_name = path.split('.', 1)
+        domain = getattr(metrics, domain_name, None)
+        if domain is None or not hasattr(domain, sub_name):
+            return
+        # Ensure values stay within bounds
+        floor = METRIC_FLOOR if is_cascade else 0.0
+        clamped_val = max(floor, min(100.0, val))
+        setattr(domain, sub_name, clamped_val)
+    def cascade(self, metrics: LifeMetrics, primary_disruption: dict, dampening: float = CASCADE_DAMPENING_DEFAULT, per_step_cascade_cap: int = 3) -> LifeMetrics:
+        """Applies disruption and propagates effects through the dependency graph.
+        The dampening factor (default 0.6) is grounded in three complementary
+        research findings:
+        1. **Starcke & Brand (2012)** — Stress effects on decision-making
+           attenuate approximately 40% per cognitive/behavioral hop. A workload
+           spike directly raises stress at full magnitude, but the downstream
+           effect on sleep quality is only ~60% of that, and the tertiary effect
+           on mental clarity is ~36%. The 0.6 multiplier captures this empirical
+           attenuation rate.
+        2. **General Systems Theory** — Perturbations in coupled systems lose
+           energy as they propagate through interconnected nodes. Each transfer
+           across an edge dissipates a fraction of the original signal, preventing
+           unbounded cascades in finite systems.
+        3. **Empirical stress research** — Second-order life effects (e.g.
+           work stress → poor sleep → relationship strain) are consistently
+           reported as less severe than first-order effects in longitudinal
+           psychological studies, supporting a sub-unity propagation coefficient.
+        Args:
+            metrics: Current LifeMetrics state.
+            primary_disruption: Dict mapping 'domain.submetric' to delta float.
+            dampening: Propagation decay per hop (default CASCADE_DAMPENING_DEFAULT = 0.6).
+            per_step_cascade_cap: Max nodes allowed to be affected in one step.
+        Returns:
+            LifeMetrics: New state with disruption and cascade effects applied.
+        """
+        new_metrics = copy.deepcopy(metrics)
+        queue = []
+        for path, amount in primary_disruption.items():
+            if '.' not in path:  # skip malformed keys from LLM
+                continue
+            old_val = self._get_val(new_metrics, path)
+            self._set_val(new_metrics, path, old_val + amount, is_cascade=False)
+            queue.append((path, amount))
+        cascaded_metrics = set()
+        while queue:
+            source_path, source_magnitude = queue.pop(0)
+            if source_path in self.edges:
+                for target_path, weight in self.edges[source_path]:
+                    if target_path not in cascaded_metrics and len(cascaded_metrics) >= per_step_cascade_cap:
+                        continue  # Cap at max per_step_cascade_cap metrics affected
+                    impact = source_magnitude * weight * dampening
+                    if abs(impact) >= 0.05:
+                        old_target_val = self._get_val(new_metrics, target_path)
+                        self._set_val(new_metrics, target_path, old_target_val + impact, is_cascade=True)
+                        cascaded_metrics.add(target_path)
+                        queue.append((target_path, impact))
+        return new_metrics
+def main():
+    # Create LifeMetrics with default values (all at 70)
+    metrics = LifeMetrics()
+    # Create DependencyGraph
+    graph = DependencyGraph()
+    # Define test disruption
+    disruption = {
+        "career.workload": 30.0,
+        "finances.liquidity": -40.0
+    }
+    print("--- LIFE STACK INITIAL STATE (All defaults at 70) ---")
+    before = metrics.flatten()
+    for k, v in before.items():
+        print(f"{k:35} : {v:.2f}")
+    # Run the cascade simulation
+    after_metrics = graph.cascade(metrics, disruption)
+    after = after_metrics.flatten()
+    print("\n--- LIFE STACK AFTER DISRUPTION & CASCADE ---")
+    print(f"Disruption Applied: {disruption}\n")
+    for k in sorted(before.keys()):
+        val_before = before[k]
+        val_after = after[k]
+        diff = val_after - val_before
+        if abs(diff) > 0.001:
+            status = f"-> {val_after:6.2f} ({'+' if diff > 0 else ''}{diff:6.2f}) [CHANGED]"
+        else:
+            status = f"   {val_after:6.2f} ( unchanged )"
+        print(f"{k:35} : {val_before:6.2f} {status}")
+if __name__ == "__main__":
+    main()

core/lifestack_env.py ADDED Viewed

	@@ -0,0 +1,734 @@

+import copy
+from typing import Any, Optional, Dict, List
+from pydantic import Field
+from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
+from core.metric_schema import normalize_metric_path
+from core.reward import compute_reward, compute_task_reward
+from core.task import Task, ExoEvent, Route, Milestone, FlightCrisisTask
+from core.verifier import LifeStackVerifier
+try:
+    from openenv.core import Environment, Action, Observation, State
+    from openenv.core.env_server.types import EnvironmentMetadata
+    from openenv.core.rubrics import Rubric
+    USING_MODERN_API = True
+except ImportError:
+    try:
+        from openenv.env import Env as Environment
+        from pydantic import BaseModel
+        # Shims for missing classes in older/alternative openenv
+        class Action(BaseModel): pass
+        class Observation(BaseModel): pass
+        class State(BaseModel): pass
+        class Rubric:
+            def __init__(self, *a, **k): pass
+            def compute(self, *a, **k): return 0.0
+        EnvironmentMetadata = None
+        USING_MODERN_API = False
+    except ImportError:
+        # Final fallback — must use BaseModel so Pydantic subclasses work
+        from pydantic import BaseModel
+        class Environment:
+            def __init__(self, rubric=None): self.rubric = rubric
+            def reset(self, *a, **k): pass
+            def step(self, *a, **k): pass
+        class Action(BaseModel): pass
+        class Observation(BaseModel): pass
+        class State(BaseModel): pass
+        class Rubric:
+            def __init__(self, *a, **k): pass
+            def compute(self, *a, **k): return 0.0
+        EnvironmentMetadata = None
+        USING_MODERN_API = False
+class LifeStackAction(Action):
+    """Structured action for LifeStack."""
+    metric_changes: Dict[str, float] = Field(default_factory=dict, description="Metric adjustment deltas")
+    resource_cost: Dict[str, float] = Field(default_factory=dict, description="Time, money, and energy costs")
+    actions_taken: int = Field(default=0, description="Number of atomic actions taken")
+    # ToolAction fields (Long-horizon)
+    action_type: Optional[str] = Field(default=None, description="inspect, plan, execute, etc.")
+    target: Optional[str] = Field(default=None, description="e.g. route_id or hidden_key")
+    parameters: Dict[str, Any] = Field(default_factory=dict)
+    reasoning: Optional[str] = Field(default=None)
+    completion: Optional[str] = Field(default=None)
+    inspect_target: Optional[str] = Field(default=None, description="Optional hidden state key to inspect")
+    is_rollback: bool = Field(default=False, description="Set true to rollback the previous action.")
+    @classmethod
+    def from_agent_action(cls, agent_action: Any) -> "LifeStackAction":
+        """Unified converter from legacy AgentAction to LifeStackAction."""
+        primary = agent_action.primary
+        return cls(
+            action_type=primary.action_type,
+            target=primary.target_domain, # Mapping target_domain to target
+            metric_changes=primary.metric_changes,
+            resource_cost=primary.resource_cost,
+            reasoning=agent_action.reasoning,
+            completion=getattr(agent_action, 'raw_completion', ""),
+            actions_taken=1
+        )
+class LifeStackObservation(Observation):
+    """Observation returned by LifeStack."""
+    metrics: Dict[str, float] = Field(default_factory=dict, description="Flattened 23-domain life metrics")
+    resources: Dict[str, float] = Field(default_factory=dict, description="Current budget remaining")
+    step: int = Field(default=0, description="Current episode step")
+    done: bool = Field(default=False)
+    reward: Optional[float] = Field(default=None)
+    metadata: Dict[str, Any] = Field(default_factory=dict)
+class LifeStackState(State):
+    """Internal state of the LifeStack environment."""
+    current_metrics: LifeMetrics = Field(default_factory=LifeMetrics)
+    budget: ResourceBudget = Field(default_factory=ResourceBudget)
+    episode_id: Optional[str] = None
+    step_count: int = 0
+    inspected_keys: list = Field(default_factory=list) # revealed keys
+    consecutive_waits: int = 0
+    used_rollback: bool = Field(default=False)
+    rollback_penalty_charged: bool = Field(default=False)
+    previous_metrics: Optional[LifeMetrics] = None
+    previous_budget: Optional[ResourceBudget] = None
+    # New task fields
+    current_task: Optional[Task] = None
+    active_route_id: Optional[str] = None
+    milestones_achieved: list = Field(default_factory=list)
+    world_state: dict = Field(default_factory=dict)
+    hidden_state: dict = Field(default_factory=dict)
+    fired_event_ids: list = Field(default_factory=list)
+    exo_events_seen: int = 0
+    milestones_after_event: int = 0
+    closed_route_ids: set = Field(default_factory=set)
+    # Legacy / Personality fields
+    person: Optional[Any] = None
+    agent_history: List[tuple] = Field(default_factory=list)
+    current_conflict: Optional[Any] = None
+    rollback_penalty_charged: bool = Field(default=False)
+    cumulative_rel_delta: float = Field(default=0.0)
+class LifeStackRubric(Rubric):
+    """Standard reward rubric for LifeStack."""
+    def forward(self, action: LifeStackAction, observation: LifeStackObservation) -> float:
+        # In LifeStack, reward is usually computed inside step() for state-transition access.
+        # This rubric provides a hook for external reward evaluation if needed.
+        return observation.reward if observation.reward is not None else 0.0
+class PartialObsFilter:
+    @staticmethod
+    def filter(task: Task, revealed_keys: list) -> dict:
+        """Returns visible_world plus any keys the agent has explicitly inspected.
+        Revealed keys are checked against mutable_world first, then hidden_state.
+        Keys sourced from hidden_state are wrapped as
+        ``{"value": <val>, "source": "inspect"}`` so the agent knows they were
+        obtained via an inspect action rather than being freely observable.
+        """
+        obs_world = copy.deepcopy(task.visible_world)
+        for k in revealed_keys:
+            if k in task.mutable_world:
+                obs_world[k] = task.mutable_world[k]
+            elif k in task.hidden_state:
+                obs_world[k] = {"value": task.hidden_state[k], "source": "inspect"}
+        return obs_world
+class WorldEngine:
+    def __init__(self, task: Task):
+        self.task = task
+        self.closed_routes = set()
+    def inject_events(self, step: int, world: dict, hidden: dict) -> list[ExoEvent]:
+        import random
+        fired = []
+        for event in self.task.event_schedule:
+            fire = False
+            if event.step == step:
+                fire = True
+            elif event.step == -1:
+                if random.random() < event.probability:
+                    fire = True
+            if fire:
+                fired.append(event)
+                # Apply mutations
+                world.update(event.world_mutation)
+                hidden.update(event.hidden_state_mutation)
+                for rid in event.closes_routes:
+                    self.closed_routes.add(rid)
+        return fired
+    def get_closed_routes(self) -> set[str]:
+        return self.closed_routes
+_EnvBase = Environment[LifeStackAction, LifeStackObservation, LifeStackState] if USING_MODERN_API else Environment
+class LifeStackEnv(_EnvBase):
+    """
+    LifeStack Environment v1.1 — Refactored for OpenEnv 0.2.3 compliance.
+    """
+    SUPPORTS_CONCURRENT_SESSIONS = True
+    def __init__(self, seed: Optional[int] = None, task=None, max_steps: int = 30):
+        if USING_MODERN_API:
+            super().__init__(rubric=LifeStackRubric())
+        else:
+            super().__init__()
+        self.max_steps = getattr(task, 'horizon', max_steps) if task else max_steps
+        self.metadata_internal = {
+            'name': 'LifeStack-v1',
+            'version': '1.1.0',
+            'description': 'Premium multi-domain life conflict resolution simulation',
+            'max_episode_steps': self.max_steps
+        }
+        self.graph = DependencyGraph()
+        self._internal_state = LifeStackState()
+    def get_metadata(self):
+        if not USING_MODERN_API:
+            return self.metadata_internal
+        from openenv.core.env_server.types import EnvironmentMetadata
+        return EnvironmentMetadata(
+            name=self.metadata_internal['name'],
+            version=self.metadata_internal['version'],
+            description=self.metadata_internal['description']
+        )
+    @property
+    def state(self) -> LifeStackState:
+        return self._internal_state
+    def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None,
+              task: Optional[Task] = None, conflict: Optional[Any] = None,
+              budget: Optional[dict] = None, person: Optional[Any] = None,
+              agent_history: Optional[List[tuple]] = None, **kwargs) -> LifeStackObservation:
+        """Resets the environment. Seed and task/conflict can be provided."""
+        if USING_MODERN_API and getattr(self, 'rubric', None):
+            self.rubric.reset()
+        if seed is not None:
+            import random
+            random.seed(seed)
+        # 1. Initialize Task
+        self._internal_state.current_task = task or FlightCrisisTask()
+        self.max_steps = getattr(self._internal_state.current_task, 'horizon', 30)
+        # 2. Reset State
+        self._internal_state.episode_id = episode_id
+        self._internal_state.step_count = 0
+        self._internal_state.current_metrics = LifeMetrics()
+        self._internal_state.inspected_keys = []
+        self._internal_state.consecutive_waits = 0
+        self._internal_state.used_rollback = False
+        self._internal_state.rollback_penalty_charged = False
+        self._internal_state.previous_metrics = None
+        self._internal_state.previous_budget = None
+        self._internal_state.rollback_penalty_charged = False
+        self._internal_state.cumulative_rel_delta = 0.0
+        # Task state
+        self._internal_state.world_state = copy.deepcopy(self._internal_state.current_task.mutable_world)
+        self._internal_state.hidden_state = copy.deepcopy(self._internal_state.current_task.hidden_state)
+        self._internal_state.milestones_achieved = []
+        self._internal_state.active_route_id = None
+        self._internal_state.fired_event_ids = []
+        self._internal_state.exo_events_seen = 0
+        self._internal_state.milestones_after_event = 0
+        self._internal_state.closed_route_ids = set()
+        self._internal_state.person = person
+        self._internal_state.agent_history = agent_history or []
+        self._internal_state.current_conflict = conflict
+        self.world_engine = WorldEngine(self._internal_state.current_task)
+        # 3. Budget Scaling
+        scale = max(1.0, self.max_steps / 5.0)
+        constraints = self._internal_state.current_task.constraints
+        self._internal_state.budget = ResourceBudget(
+            time_hours=budget.get("time", constraints.get("time", 20.0 * scale)) if budget else constraints.get("time", 20.0 * scale),
+            money_dollars=budget.get("money", constraints.get("money", 500.0 * scale)) if budget else constraints.get("money", 500.0 * scale),
+            energy_units=budget.get("energy", constraints.get("energy", 100.0 * scale)) if budget else constraints.get("energy", 100.0 * scale)
+        )
+        if conflict:
+            # Legacy disruption support
+            disruption = conflict.primary_disruption if hasattr(conflict, 'primary_disruption') else conflict
+            self._internal_state.current_metrics = self.graph.cascade(self._internal_state.current_metrics, disruption)
+            if budget is None and hasattr(conflict, 'resource_budget'):
+                 rb = conflict.resource_budget
+                 self._internal_state.budget = ResourceBudget(
+                     time_hours=rb.get("time", 20.0),
+                     money_dollars=rb.get("money", 500.0),
+                     energy_units=rb.get("energy", 100.0)
+                 )
+        return self._get_obs()
+    def _get_obs(self, done: bool = False, reward: Optional[float] = None,
+                 success: bool = False, failure: bool = False,
+                 failure_reason: str = "", routes_remaining: int = 0) -> LifeStackObservation:
+        revealed_world = PartialObsFilter.filter(
+            self._internal_state.current_task,
+            self._internal_state.inspected_keys
+        )
+        return LifeStackObservation(
+            metrics=self._internal_state.current_metrics.flatten(),
+            resources={
+                "time": self._internal_state.budget.time_hours,
+                "money": self._internal_state.budget.money_dollars,
+                "energy": self._internal_state.budget.energy_units
+            },
+            step=self._internal_state.step_count,
+            done=done,
+            reward=reward,
+            metadata={
+                "world_state": revealed_world,
+                "goal": self._internal_state.current_task.goal,
+                "active_route": self._internal_state.active_route_id,
+                "milestones": self._internal_state.milestones_achieved,
+                "events": self._internal_state.fired_event_ids,
+                "success": success,
+                "failure": failure,
+                "failure_reason": failure_reason,
+                "routes_remaining": routes_remaining,
+                "conflict_title": self._internal_state.current_conflict.title if hasattr(self._internal_state.current_conflict, 'title') else "Custom Task",
+                "person": self._internal_state.person.name if hasattr(self._internal_state.person, 'name') else "Unknown"
+            }
+        )
+    def _update_metric(self, path: str, delta: float):
+        """Internal helper for non-cascading updates."""
+        path = normalize_metric_path(path)
+        if '.' not in path:
+            return
+        domain_name, sub_name = path.split('.', 1)
+        domain = getattr(self._internal_state.current_metrics, domain_name, None)
+        if domain and hasattr(domain, sub_name):
+            val = getattr(domain, sub_name)
+            setattr(domain, sub_name, max(0.0, min(100.0, val + delta)))
+    def step(self, action: LifeStackAction, timeout_s: Optional[float] = None, **kwargs) -> LifeStackObservation:
+        """Executes one step in the environment using LifeStackAction logic."""
+        if isinstance(action, dict):
+            action = LifeStackAction(**action)
+        task = self._internal_state.current_task
+        state_before = copy.deepcopy(self._internal_state.current_metrics)
+        info_msgs = []
+        # 0. Personality Drift & Legacy Escalation
+        if self._internal_state.person:
+            drift_event = self._internal_state.person.drift(self._internal_state.step_count)
+            if drift_event:
+                path = drift_event.get('metric', '')
+                delta = drift_event.get('delta', 0)
+                if path and '.' in path:
+                    self._update_metric(path, delta)
+                info_msgs.append(f"DRIFT: {drift_event['reason']}")
+        if self._internal_state.current_conflict and self._internal_state.step_count == 2:
+            from agent.conflict_generator import adaptive_escalate
+            conflict = self._internal_state.current_conflict
+            if hasattr(conflict, 'difficulty') and conflict.difficulty < 5:
+                new_conflict, reason = adaptive_escalate(conflict, self._internal_state.agent_history)
+                if new_conflict.id != conflict.id:
+                    self._internal_state.current_conflict = new_conflict
+                    info_msgs.append(f"ESCALATION: {reason} -> {new_conflict.title}")
+        fired_events = self.world_engine.inject_events(
+            self._internal_state.step_count,
+            self._internal_state.world_state,
+            self._internal_state.hidden_state
+        )
+        if fired_events:
+            self._internal_state.exo_events_seen += len(fired_events)
+            for e in fired_events:
+                self._internal_state.fired_event_ids.append(e.id)
+                info_msgs.append(f"EVENT_FIRED: {e.description}")
+        self._internal_state.closed_route_ids.update(self.world_engine.get_closed_routes())
+        # 2. Tool Logic & Metric Changes
+        tool_type = action.action_type or (
+            "rollback" if action.is_rollback else
+            "inspect" if action.inspect_target else
+            "execute"
+        )
+        allowed_keys = set(self._internal_state.current_metrics.flatten().keys())
+        metric_changes = {k: v for k, v in action.metric_changes.items() if k in allowed_keys}
+        resource_cost = copy.deepcopy(action.resource_cost)
+        # Handle Rollback
+        if tool_type == "rollback":
+            self._internal_state.step_count += 1
+            if self._internal_state.used_rollback:
+                info_msgs.append("ROLLBACK_DENIED: Already used once.")
+                return self._get_obs(reward=-0.1)
+            if not self._internal_state.previous_metrics:
+                return self._get_obs(reward=0.0)
+            self._internal_state.current_metrics = copy.deepcopy(self._internal_state.previous_metrics)
+            self._internal_state.budget = copy.deepcopy(self._internal_state.previous_budget)
+            self._internal_state.used_rollback = True
+            self._internal_state.rollback_penalty_charged = True  # Penalty baked into the -0.1 return above
+            return self._get_obs(reward=-0.1)
+        # Save state for future rollback
+        self._internal_state.previous_metrics = copy.deepcopy(self._internal_state.current_metrics)
+        self._internal_state.previous_budget = copy.deepcopy(self._internal_state.budget)
+        # Handle Inspect
+        if tool_type == "inspect":
+            target = action.target or action.inspect_target
+            if target:
+                if target in self._internal_state.inspected_keys:
+                    info_msgs.append(f"INSPECT_REDUNDANT: {target}")
+                else:
+                    self._internal_state.inspected_keys.append(target)
+                    info_msgs.append(f"INSPECT_REVEALED: {target}")
+                    # Emit an explicit signal when a hidden-state value is uncovered.
+                    if target in task.hidden_state:
+                        info_msgs.append(
+                            f"INSPECT_REVEALED_HIDDEN: {target} = {task.hidden_state[target]}"
+                        )
+        # Handle Wait
+        if tool_type == "wait":
+            self._internal_state.consecutive_waits += 1
+            if self._internal_state.consecutive_waits >= 4:
+                metric_changes["mental_wellbeing.stress_level"] = metric_changes.get("mental_wellbeing.stress_level", 0) + 15.0
+                info_msgs.append("WAIT_CAP_EXCEEDED: Forced stress applied.")
+        else:
+            self._internal_state.consecutive_waits = 0
+        # Handle Route Execution
+        if tool_type == "execute" and action.target:
+            route = next((r for r in task.viable_routes if r.id == action.target), None)
+            if route:
+                # Check closed
+                if route.id in self._internal_state.closed_route_ids:
+                    info_msgs.append(f"ROUTE_BLOCKED: {route.name}")
+                else:
+                    # Check preconditions
+                    pre_ok = True
+                    for k, v in route.preconditions.items():
+                        current_v = self._internal_state.hidden_state.get(k, self._internal_state.world_state.get(k))
+                        if current_v != v:
+                            pre_ok = False
+                            break
+                    if not pre_ok:
+                        info_msgs.append(f"PRECONDITIONS_FAILED for {route.name}")
+                    else:
+                        # Success: Apply route
+                        self._internal_state.active_route_id = route.id
+                        self._internal_state.world_state.update(route.consequences)
+                        info_msgs.append(f"ROUTE_SUCCESS: {route.name}")
+        # 3. Resource Deduction (must happen BEFORE metric changes to prevent budget-bypass exploit)
+        deduct_ok = self._internal_state.budget.deduct(
+            time=resource_cost.get('time', 0.0),
+            money=resource_cost.get('money', 0.0),
+            energy=resource_cost.get('energy', 0.0)
+        )
+        if not deduct_ok:
+            info_msgs.append("RESOURCE_DEPLETED_ACTION_BLOCKED")
+            metric_changes = {}  # Discard changes — agent can't afford this action
+        # 4. Apply Metric and Cascade
+        sig_changes = {k: v for k, v in metric_changes.items() if abs(v) > 5.0}
+        for k, v in metric_changes.items():
+            if k not in sig_changes:
+                self._update_metric(k, v)
+        if sig_changes:
+            self._internal_state.current_metrics = self.graph.cascade(self._internal_state.current_metrics, sig_changes)
+        # 5. Task Progression Check
+        success_mets = LifeStackVerifier.check_success(task, self._internal_state.world_state, self._internal_state.hidden_state)
+        failure_mets = LifeStackVerifier.check_failure(task, self._internal_state.world_state, self._internal_state.hidden_state, self._internal_state.current_metrics.flatten())
+        # Check milestones dynamically
+        newly_met = LifeStackVerifier.check_new_milestones(task, self._internal_state.world_state, self._internal_state.hidden_state, self._internal_state.milestones_achieved)
+        for mid in newly_met:
+            self._internal_state.milestones_achieved.append(mid)
+            if self._internal_state.exo_events_seen > 0:
+                self._internal_state.milestones_after_event += 1
+            info_msgs.append(f"MILESTONE_UNLOCKED: {mid}")
+        # 6. Reward Calculation (Task-Aware)
+        routes_rem, _ = LifeStackVerifier.get_route_status(task, self._internal_state.closed_route_ids, self._internal_state.world_state, self._internal_state.hidden_state)
+        # Determine cascade collapse
+        metrics_after = self._internal_state.current_metrics.flatten()
+        metrics_before = state_before.flatten()
+        collapse = any(metrics_after[k] < 20 and metrics_before[k] >= 20 for k in metrics_after)
+        # Track cumulative relationship erosion across steps
+        rel_keys_cum = [k for k in metrics_after if k.startswith('relationships.')]
+        if rel_keys_cum:
+            step_rel_delta = sum(metrics_after[k] - metrics_before[k] for k in rel_keys_cum) / len(rel_keys_cum)
+            self._internal_state.cumulative_rel_delta += step_rel_delta
+        # Increment step_count BEFORE reward so timeout_check fires correctly
+        self._internal_state.step_count += 1
+        # Rollback penalty fires only once per episode
+        rollback_this_step = self._internal_state.used_rollback and not self._internal_state.rollback_penalty_charged
+        if rollback_this_step:
+            self._internal_state.rollback_penalty_charged = True
+        # conflict_domain from task.domain (not conflict.title) to prevent empty-string bypass
+        conflict_domain = task.domain if task and hasattr(task, 'domain') else ""
+        if task:
+            reward, breakdown = compute_task_reward(
+                state_before=state_before,
+                state_after=self._internal_state.current_metrics,
+                resources_used=resource_cost,
+                actions_taken=action.actions_taken,
+                milestones_achieved=self._internal_state.milestones_achieved,
+                success_conditions_met=success_mets,
+                exo_events_seen=self._internal_state.exo_events_seen,
+                milestones_after_event=self._internal_state.milestones_after_event,
+                routes_remaining=routes_rem,
+                rollback_used=rollback_this_step,
+                cascade_collapse=collapse,
+                task=task,
+                reasoning=getattr(action, 'reasoning', ""),
+                completion=getattr(action, 'completion', ""),
+                conflict_domain=conflict_domain,
+                step_count=self._internal_state.step_count,
+                max_steps=self.max_steps,
+                metric_changes=metric_changes,
+                cumulative_rel_delta=self._internal_state.cumulative_rel_delta,
+                action_type=tool_type
+            )
+            # Charge the rollback penalty only once per episode
+            if self._internal_state.used_rollback and not self._internal_state.rollback_penalty_charged:
+                self._internal_state.rollback_penalty_charged = True
+        else:
+            reward, breakdown = compute_reward(
+                state_before=state_before,
+                state_after=self._internal_state.current_metrics,
+                resources_used=resource_cost,
+                actions_taken=action.actions_taken,
+                metric_changes=metric_changes,
+                completion=getattr(action, 'completion', ""),
+                action_type=tool_type
+            )
+        # 7. End Conditions
+        # Check if ANY success condition is met.
+        # For multi-goal tasks with mutually exclusive routes, any() allows termination.
+        is_success = any(success_mets) if (success_mets and len(task.success_conditions) > 0) else False
+        is_task_failure = any(val == True for val in failure_mets)
+        metric_death = any(v <= 10 for v in metrics_after.values())
+        failure_reason = ""
+        if is_task_failure:
+            reasons = [cond['key'] for i, cond in enumerate(task.failure_conditions) if failure_mets[i]]
+            failure_reason = f"Condition failed: {', '.join(reasons)}"
+        elif metric_death:
+            dead_metrics = [k for k, v in metrics_after.items() if v <= 0]
+            failure_reason = f"Metrics hit zero: {', '.join(dead_metrics)}"
+        elif routes_rem == 0 and not is_success:
+            failure_reason = "Dead end: No reachable routes left."
+        terminated = is_task_failure or metric_death
+        truncated = self._internal_state.step_count >= self.max_steps
+        if is_success:
+            truncated = True
+        done = terminated or truncated
+        observation = self._get_obs(
+            done,
+            reward,
+            success=is_success,
+            failure=terminated,
+            failure_reason=failure_reason,
+            routes_remaining=routes_rem
+        )
+        observation.metadata["breakdown"] = breakdown
+        observation.metadata["info"] = info_msgs
+        return observation
+    def rollout(self, n_steps: int = 7, gamma: float = 0.9) -> dict:
+        """
+        Simulate n_steps null/rest actions starting from the current env state.
+        Intended to be called immediately AFTER env.step(model_action) so it
+        models "what happens to your life over the next N days if nothing
+        extraordinary occurs."
+        The env state is fully restored after the rollout — calling this is
+        side-effect-free from the caller's perspective.
+        Returns:
+            {
+              "discounted_reward": float,          # γ-discounted cumulative
+              "immediate_r0": float,               # reward from the action (caller supplies)
+              "trajectory": [                      # one entry per simulated day
+                  {
+                      "step": int,                 # 1-indexed future day
+                      "reward": float,
+                      "metrics": Dict[str, float], # flattened snapshot
+                      "discounted_contribution": float,
+                  },
+                  ...
+              ],
+              "n_steps_completed": int,
+            }
+        """
+        saved_state = copy.deepcopy(self._internal_state)
+        null_action = LifeStackAction(
+            action_type="rest",
+            target="time",
+            metric_changes={},
+            resource_cost={},
+            actions_taken=0,
+        )
+        trajectory = []
+        cumulative = 0.0
+        for t in range(n_steps):
+            obs = self.step(null_action)
+            disc = (gamma ** (t + 1)) * float(obs.reward)
+            cumulative += disc
+            trajectory.append({
+                "step": t + 1,
+                "reward": float(obs.reward),
+                "metrics": dict(obs.metrics),
+                "discounted_contribution": round(disc, 5),
+            })
+            if obs.done:
+                break
+        # Restore — rollout must not mutate the env visible to the caller
+        self._internal_state = saved_state
+        return {
+            "discounted_reward": round(cumulative, 5),
+            "trajectory": trajectory,
+            "n_steps_completed": len(trajectory),
+        }
+    def render(self):
+        """Vibrant status report of the current state and task progress."""
+        task = self._internal_state.current_task
+        print("\n" + "═"*70)
+        print(f"🎯 GOAL: {task.goal} | Horizon: {self._internal_state.step_count}/{self.max_steps}")
+        print(f"⌛ TIME: {self._internal_state.budget.time_hours:.1f}h | 💵 MONEY: ${self._internal_state.budget.money_dollars:.1f} | ⚡ ENERGY: {self._internal_state.budget.energy_units:.1f}")
+        if self._internal_state.active_route_id:
+            print(f"🛣️ ACTIVE ROUTE: {self._internal_state.active_route_id}")
+        print(f"⭐ MILESTONES: {', '.join(self._internal_state.milestones_achieved) or 'None'}")
+        if self._internal_state.fired_event_ids:
+            print(f"🚨 EVENTS: {', '.join(self._internal_state.fired_event_ids)}")
+        flat = self._internal_state.current_metrics.flatten()
+        domain_labels = {
+            "career": "💼 CAREER",
+            "finances": "💰 FINANCES",
+            "relationships": "❤️ RELATIONSHIPS",
+            "physical_health": "💪 PHYSICAL",
+            "mental_wellbeing": "🧠 MENTAL",
+            "time": "📅 TIME"
+        }
+        for dom, label in domain_labels.items():
+            print(f"\n{label}")
+            submetrics = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
+            inverted = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
+            for name, val in submetrics.items():
+                short = name.split('.')[1]
+                icon = ("🔴" if val > 70 else "🟢") if short in inverted else ("🟢" if val > 70 else "🔴")
+                if 40 <= val <= 70: icon = "🟡"
+                print(f"  {icon} {short:20} : {val:5.2f}")
+        print("═"*70)
+def env_render_compact(env, obs):
+    """Compact printer for testing."""
+    print(f"STEP: {obs.step} | REWARD: {obs.reward:.3f} | DONE: {obs.done}")
+    if obs.metadata.get("breakdown", {}).get("penalties_fired"):
+        print(f"  ⚠️ PENALTIES: {obs.metadata['breakdown']['penalties_fired']}")
+def main():
+    env = LifeStackEnv()
+    # 1. Reset with Friday 6PM Conflict
+    conflict = {
+        "career.workload": 30.0,
+        "finances.liquidity": -40.0
+    }
+    print("Initializing environment with Friday 6PM conflict...")
+    env.reset(conflict=conflict)
+    env.render()
+    total_reward = 0
+    metrics_history = []
+    # 2. Sequential Actions
+    scenarios = [
+        {
+            "name": "GOOD ACTION: Delegating and budget review",
+            "action": {
+                "metric_changes": {"career.workload": -15.0, "finances.liquidity": 10.0, "mental_wellbeing.stress_level": -5.0},
+                "resource_cost": {"time": 4.0, "money": 100.0, "energy": 20.0},
+                "actions_taken": 2
+            }
+        },
+        {
+            "name": "MEDIUM ACTION: Small self-care rest",
+            "action": {
+                "metric_changes": {"physical_health.sleep_quality": 6.0, "mental_wellbeing.clarity": 3.0},
+                "resource_cost": {"time": 2.0, "energy": -20.0}, # Rest recovers energy
+                "actions_taken": 1
+            }
+        },
+        {
+            "name": "INACTION: Let the cascade run",
+            "action": {
+                "metric_changes": {},
+                "resource_cost": {},
+                "actions_taken": 0
+            }
+        }
+    ]
+    for sce in scenarios:
+        print(f"\nTaking Action: {sce['name']}...")
+        action_obj = LifeStackAction(**sce['action'])
+        obs = env.step(action_obj)
+        env_render_compact(env, obs)
+        total_reward += (obs.reward or 0.0)
+    # 3. Final Summary
+    final_flat = env.state.current_metrics.flatten()
+    critical = [k for k, v in final_flat.items() if v < 20]
+    print("\n" + "█"*60)
+    print("EPISODE SUMMARY")
+    print(f"Steps Taken      : {env.state.step_count}")
+    print(f"Total Cumulative Reward : {total_reward:.4f}")
+    if critical:
+        print(f"Critical Floor Violations: {', '.join(critical)}")
+    else:
+        print("Critical Violations: NONE")
+    print("█"*60)
+if __name__ == "__main__":
+    main()

core/lifestack_gym_env.py ADDED Viewed

	@@ -0,0 +1,171 @@

+"""
+lifestack_gym_env.py — Gymnasium-compatible wrapper for LifeStack
+Exposes the LifeStack environment as a standard gym.Env with:
+- observation_space: Box(0, 100, shape=(26,)) — 23 sub-metrics + 3 resources
+- action_space: Discrete(7) — 7 action types mapped to template actions
+- Standard reset() / step() / render() API
+"""
+'''we are not using this as of now, this was been used in old model :)'''
+import gymnasium as gym
+import numpy as np
+from gymnasium import spaces
+import random, copy
+from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
+from core.metric_schema import normalize_metric_path
+from core.reward import compute_reward, compute_task_reward
+from agent.conflict_generator import generate_conflict, ConflictEvent
+from intake.simperson import SimPerson
+# Map discrete action IDs to action types
+ACTION_TYPE_MAP = {
+    0: "negotiate",
+    1: "communicate",
+    2: "delegate",
+    3: "spend",
+    4: "reschedule",
+    5: "rest",
+    6: "execute",
+}
+class LifeStackGymEnv(gym.Env):
+    """
+    LifeStack as a Gymnasium environment.
+    Observation: 26-dim vector (23 life sub-metrics + 3 resource values)
+    Action: Discrete(7) — one of 7 action types
+    Reward: float in [-1, 1]
+    """
+    metadata = {"render_modes": ["human", "ansi"]}
+    def __init__(self, task=None, difficulty: int = None, render_mode: str = None, max_steps: int = 30):
+        super().__init__()
+        self.observation_space = spaces.Box(
+            low=0.0, high=100.0, shape=(26,), dtype=np.float32
+        )
+        self.action_space = spaces.Discrete(7)
+        self.render_mode = render_mode
+        self.task = task
+        self.difficulty = difficulty
+        self.max_steps = max_steps
+        from core.lifestack_env import LifeStackEnv
+        self.env = LifeStackEnv()
+        self._metric_keys = list(LifeMetrics().flatten().keys())
+    def _obs_vector(self) -> np.ndarray:
+        flat = self.env.state.current_metrics.flatten()
+        metric_vals = [flat[k] for k in self._metric_keys]
+        budget = self.env.state.budget
+        resource_vals = [
+            budget.time_hours,
+            budget.money_dollars,
+            budget.energy_units,
+        ]
+        return np.array(metric_vals + resource_vals, dtype=np.float32)
+    def reset(self, seed=None, options=None):
+        super().reset(seed=seed)
+        conflict = None
+        if self.task is None:
+            from agent.conflict_generator import generate_conflict
+            conflict = generate_conflict(self.difficulty)
+        obs_obj = self.env.reset(task=self.task, conflict=conflict)
+        return self._obs_vector(), obs_obj.metadata
+    def step(self, action: int):
+        from core.lifestack_env import LifeStackAction
+        action_type = ACTION_TYPE_MAP[action]
+        # Build logical action from template
+        metric_changes, resource_cost = self._action_to_changes(action_type)
+        # In this wrapper, we pick a reasonable target if needed
+        target = ""
+        current_task = self.env.state.current_task
+        if action_type == "execute" and current_task:
+            for r in current_task.viable_routes:
+                if r.id not in self.env.state.closed_route_ids:
+                    target = r.id
+                    break
+        ls_action = LifeStackAction(
+            action_type=action_type,
+            target=target,
+            reasoning=f"Agent chose {action_type} for discrete action {action}.",
+            metric_changes=metric_changes,
+            resource_cost=resource_cost,
+            actions_taken=1
+        )
+        obs_obj = self.env.step(ls_action)
+        terminated = obs_obj.done
+        # Truncated only if not naturally terminated
+        truncated = (not terminated) and (self.env.state.step_count >= (self.task.horizon if self.task else self.max_steps))
+        return self._obs_vector(), obs_obj.reward, terminated, truncated, {"breakdown": obs_obj.metadata.get("breakdown", {})}
+    def _action_to_changes(self, action_type: str):
+        """Maps an action type string to (metric_changes, resource_cost)."""
+        templates = {
+            "negotiate": (
+                {"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
+                {"time": 1.5, "energy": 20.0},
+            ),
+            "communicate": (
+                {"relationships.romantic": 10.0, "mental_wellbeing.stress_level": -5.0},
+                {"time": 0.5, "energy": 10.0},
+            ),
+            "delegate": (
+                {"career.workload": -10.0, "relationships.professional_network": -5.0},
+                {"time": 1.0, "energy": 15.0},
+            ),
+            "spend": (
+                {"finances.liquidity": -20.0, "mental_wellbeing.stress_level": -10.0},
+                {"time": 1.0, "energy": 15.0},
+            ),
+            "reschedule": (
+                {"career.workload": -10.0, "time.free_hours_per_week": 5.0},
+                {"time": 2.0, "energy": 15.0},
+            ),
+            "rest": (
+                {"mental_wellbeing.stress_level": -12.0, "physical_health.energy": 10.0},
+                {"time": 1.0},
+            ),
+            "execute": (
+                {}, # executes a route target
+                {"time": 1.0, "energy": 10.0},
+            ),
+        }
+        return templates.get(action_type, ({}, {}))
+    def render(self):
+        if self.render_mode == "human":
+            # Delegate to the internal env's render
+            self.env.render()
+# ── Quick smoke test ──
+if __name__ == "__main__":
+    env = LifeStackGymEnv(difficulty=3, render_mode="human")
+    obs, info = env.reset()
+    print(f"Conflict: {info['conflict_title']} | Person: {info['person']}")
+    print(f"Obs shape: {obs.shape}, dtype: {obs.dtype}")
+    env.render()
+    total = 0.0
+    done = False
+    while not done:
+        act = env.action_space.sample()
+        obs, rew, term, trunc, info = env.step(act)
+        total += rew
+        done = term or trunc
+        print(f"  Action {act} → reward {rew:.3f}")
+    env.render()
+    print(f"\nTotal reward: {total:.3f}")

core/metric_schema.py ADDED Viewed

	@@ -0,0 +1,31 @@

+from core.life_state import LifeMetrics
+VALID_METRIC_PATHS = tuple(sorted(LifeMetrics().flatten().keys()))
+LEGACY_METRIC_ALIASES = {
+    "physical_health.exercise_routine": "physical_health.fitness",
+}
+def normalize_metric_path(path: str) -> str:
+    """Map legacy or malformed metric names onto the current LifeMetrics schema."""
+    if not isinstance(path, str):
+        return ""
+    path = path.strip()
+    return LEGACY_METRIC_ALIASES.get(path, path)
+def is_valid_metric_path(path: str) -> bool:
+    return normalize_metric_path(path) in VALID_METRIC_PATHS
+def format_valid_metrics() -> str:
+    grouped = {}
+    for path in VALID_METRIC_PATHS:
+        domain, metric = path.split(".", 1)
+        grouped.setdefault(domain, []).append(metric)
+    return "\n".join(
+        f"{domain}: {', '.join(metrics)}" for domain, metrics in grouped.items()
+    )

core/reward.py ADDED Viewed

	@@ -0,0 +1,463 @@

+import math
+import copy
+import json
+import re
+from core.life_state import LifeMetrics
+from core.task import Task
+def compute_reward(
+    state_before: LifeMetrics,
+    state_after: LifeMetrics,
+    resources_used: dict,
+    actions_taken: int,
+    metric_changes: dict = None,
+    completion: str = None,
+    disruption_baseline: int = None,
+    action_type: str = ""
+) -> tuple[float, dict]:
+    """
+    Computes the reward for a life step based on changes in LifeMetrics and resource usage.
+    Args:
+        state_before: The state at the start of the step.
+        state_after: The state after actions and cascades.
+        resources_used: Dict with keys 'time', 'money', 'energy'.
+        actions_taken: Integer count of intentional actions performed.
+        disruption_baseline: Expected number of metrics affected by an action.
+    Returns:
+        tuple[float, dict]: (final_reward, breakdown_dict)
+    """
+    before_flat = state_before.flatten()
+    after_flat = state_after.flatten()
+    # 1. OUTCOME SCORE (Weighted average of positive deltas)
+    domain_weights = {
+        "career": 1/6,
+        "finances": 1/6,
+        "relationships": 1/6,
+        "physical_health": 1/6,
+        "mental_wellbeing": 1/6,
+        "time": 1/6
+    }
+    # Map sub-metrics to their domains
+    submetrics_per_domain = {}
+    for k in before_flat.keys():
+        domain = k.split('.')[0]
+        submetrics_per_domain[domain] = submetrics_per_domain.get(domain, 0) + 1
+    outcome_score = 0.0
+    for k in before_flat.keys():
+        domain = k.split('.')[0]
+        delta = after_flat[k] - before_flat[k]
+        if delta > 0:
+            # Each domain is 1/6. Each sub-metric within a domain gets its equal share of that 1/6.
+            # Normalize delta by 100 (max possible increase is 100).
+            weight = domain_weights[domain] / submetrics_per_domain[domain]
+            outcome_score += (delta / 100.0) * weight
+    # 2. CASCADE CONTAINMENT SCORE
+    worsened_count = sum(1 for k in before_flat.keys() if after_flat[k] < before_flat[k])
+    total_metrics = len(before_flat)
+    cascade_containment_score = 1.0 - (worsened_count / total_metrics)
+    # 3. RESOURCE EFFICIENCY SCORE
+    # Available: time 20, money 500, energy 100
+    m_time = resources_used.get('time', 0.0) / 20.0
+    m_money = resources_used.get('money', 0.0) / 500.0
+    m_energy = resources_used.get('energy', 0.0) / 100.0
+    # Normalize by total slots (3 resources)
+    resource_efficiency_score = 1.0 - ((m_time + m_money + m_energy) / 3.0)
+    resource_efficiency_score = max(0.0, min(1.0, resource_efficiency_score))
+    # 4. RELATIONSHIP PRESERVATION SCORE (Sigmoid applied to average delta)
+    rel_keys = [k for k in before_flat.keys() if k.startswith('relationships.')]
+    avg_rel_before = sum(before_flat[k] for k in rel_keys) / len(rel_keys)
+    avg_rel_after = sum(after_flat[k] for k in rel_keys) / len(rel_keys)
+    delta_rel = avg_rel_after - avg_rel_before
+    # score = 1 / (1 + exp(-delta/10))
+    relationship_preservation_score = 1.0 / (1.0 + math.exp(-delta_rel / 10.0))
+    # FINAL REWARD FORMULA
+    base_reward = (
+        (0.40 * outcome_score) +
+        (0.25 * cascade_containment_score) +
+        (0.20 * resource_efficiency_score) +
+        (0.15 * relationship_preservation_score)
+    )
+    # PENALTIES
+    penalties = 0.0
+    fired = []
+    # -0.50 if ANY metric is below 20 after the step
+    if any(v < 20 for v in after_flat.values()):
+        penalties -= 0.50
+        fired.append("CRITICAL_FLOOR_VIOLATION")
+    # -0.30 if cascade spread wider than the number of metrics the agent directly changed
+    # Scaled baseline from task metadata preferred over hardcoded default
+    if disruption_baseline is None:
+        disruption_baseline = len(metric_changes) if metric_changes else 2
+    if worsened_count > disruption_baseline:
+        penalties -= 0.30
+        fired.append("CASCADE_SPREAD_WIDER")
+    # -0.40 if actions_taken == 0
+    if actions_taken == 0:
+        penalties -= 0.40
+        fired.append("INACTION_PENALTY")
+    # -0.15 if relationships domain average dropped more than 20 points
+    if delta_rel < -20:
+        penalties -= 0.15
+        fired.append("RELATIONSHIP_COLLAPSE")
+    # [NEW] Plausibility Penalty
+    plaus = 0.0
+    if metric_changes:
+        plaus = reward_plausibility_check(metric_changes, resources_used)
+        if plaus < 0:
+            penalties += plaus
+            fired.append("PLAUSIBILITY_VIOLATION")
+    # [NEW] Format Compliance & Reasoning
+    comp_reward = 0.0
+    reasoning = ""
+    if completion:
+        comp_reward = reward_format_compliance(completion)
+        try:
+            # Simple extract reasoning from JSON if possible
+            import json
+            data = json.loads(completion)
+            reasoning = data.get("reasoning", "")
+        except:
+            pass
+    # [NEW] Reasoning Alignment (tied to action_type)
+    reasoning_score = reward_reasoning_coherence(reasoning, action_type=action_type)
+    final_reward = max(-1.0, min(1.0, base_reward + penalties))
+    breakdown = {
+        "components": {
+            "outcome": outcome_score,
+            "containment": cascade_containment_score,
+            "efficiency": resource_efficiency_score,
+            "preservation": relationship_preservation_score,
+            "format_compliance": comp_reward,
+            "plausibility": plaus,
+            "reasoning_alignment": reasoning_score
+        },
+        "base_reward": base_reward,
+        "penalties_total": penalties,
+        "penalties_fired": fired,
+        "metrics_worsened": worsened_count,
+        "rel_delta": delta_rel
+    }
+    return final_reward, breakdown
+def compute_milestone_reward(milestones_achieved: list[str], task: Task) -> float:
+    if not task.milestones:
+        return 0.0
+    total_possible = sum(m.reward for m in task.milestones)
+    if total_possible == 0:
+        return 0.0
+    achieved = sum(m.reward for m in task.milestones if m.id in milestones_achieved)
+    return min(1.0, achieved / total_possible)
+def compute_task_completion_reward(success_conditions_met: list[bool], task: Task) -> float:
+    # A task is completed if any of its target success conditions are satisfied.
+    # This handles tasks with multiple alternative goal-states (e.g. choice of routes).
+    if not success_conditions_met:
+        return 0.0
+    return 1.0 if any(success_conditions_met) else 0.0
+def compute_replan_bonus(exo_events_seen: int, milestones_after_event: int) -> float:
+    # Scale bonus based on ability to bounce back after exogenous events
+    if exo_events_seen == 0:
+        return 0.0
+    return min(1.0, (milestones_after_event / exo_events_seen) * 0.5)
+def compute_dead_end_penalty(routes_remaining: int) -> float:
+    return -0.5 if routes_remaining <= 0 else 0.0
+def compute_task_reward(
+    state_before: LifeMetrics,
+    state_after: LifeMetrics,
+    resources_used: dict,
+    actions_taken: int,
+    milestones_achieved: list[str],
+    success_conditions_met: list[bool],
+    exo_events_seen: int,
+    milestones_after_event: int,
+    routes_remaining: int,
+    rollback_used: bool,
+    cascade_collapse: bool,
+    task: Task,
+    reasoning: str = "",
+    completion: str = "",
+    conflict_domain: str = "",
+    step_count: int = 0,
+    max_steps: int = 0,
+    metric_changes: dict = None,
+    cumulative_rel_delta: float = 0.0,
+    action_type: str = ""
+) -> tuple[float, dict]:
+    # 1. Base local components (with scaled disruption baseline from task metadata)
+    d_baseline = len(task.mutable_world) if task and hasattr(task, 'mutable_world') else None
+    local_reward, local_breakdown = compute_reward(state_before, state_after, resources_used, actions_taken,
+                                                   metric_changes=metric_changes, completion=completion,
+                                                   disruption_baseline=d_baseline, action_type=action_type)
+    # 2. Orchestrator components
+    # Use only the raw outcome component from local_breakdown to avoid double-counting
+    # efficiency, containment, or preservation which are added separately below.
+    outcome_score_local = local_breakdown["components"].get("outcome", 0.0)
+    milestone_score = compute_milestone_reward(milestones_achieved, task)
+    completion_score = compute_task_completion_reward(success_conditions_met, task)
+    replan_score = compute_replan_bonus(exo_events_seen, milestones_after_event)
+    efficiency_score = local_breakdown["components"].get("efficiency", 0.0)
+    preservation_score = local_breakdown["components"].get("preservation", 0.0)
+    reasoning_score = reward_reasoning_coherence(reasoning, action_type=action_type)
+    # Check for specific failure cases
+    timeout_pen = reward_timeout_check(step_count, max_steps, any(success_met for success_met in success_conditions_met) if success_conditions_met else False)
+    dead_end_pen = compute_dead_end_penalty(routes_remaining)
+    # 3. Final weighting (all components are now unique/non-overlapping)
+    # Weights: Milestone 35%, Completion 25%, Outcome 10%, Preservation 5%, Replan 10%, Efficiency 10%, Reasoning 5%
+    base_reward = (
+        (0.35 * milestone_score) +
+        (0.25 * completion_score) +
+        (0.10 * outcome_score_local) +
+        (0.05 * preservation_score) +
+        (0.10 * replan_score) +
+        (0.10 * efficiency_score) +
+        (0.05 * reasoning_score)
+    )
+    # 4. Penalties
+    penalties = 0.0
+    fired = []
+    if timeout_pen < 0:
+        penalties += timeout_pen
+        fired.append("TIMEOUT")
+    if dead_end_pen < 0:
+        penalties += dead_end_pen
+        fired.append("DEAD_END")
+    if rollback_used:
+        penalties += -0.1
+        fired.append("ROLLBACK_USED")
+    if cascade_collapse:
+        penalties += -0.3
+        fired.append("CASCADE_COLLAPSE")
+    # Direct inaction penalty — not diluted by the 0.05 local weight
+    if actions_taken == 0:
+        penalties += -0.20
+        fired.append("TASK_INACTION_PENALTY")
+    # Cumulative relationship erosion across the episode
+    if cumulative_rel_delta < -20:
+        penalties += -0.15
+        fired.append("CUMULATIVE_RELATIONSHIP_EROSION")
+    final_reward = max(-1.0, min(1.0, base_reward + penalties))
+    breakdown = {
+        "components": {
+            "local_metric_delta": outcome_score_local,
+            "milestone": milestone_score,
+            "completion": completion_score,
+            "replan": replan_score,
+            "efficiency": efficiency_score,
+            "reasoning": reasoning_score,
+            "format_compliance": local_breakdown["components"].get("format_compliance", 0.0),
+            "plausibility": local_breakdown["components"].get("plausibility", 0.0),
+            "timeout_penalty": timeout_pen
+        },
+        "base_reward": base_reward,
+        "penalties_total": penalties,
+        "penalties_fired": fired,
+        "local_breakdown": local_breakdown
+    }
+    return final_reward, breakdown
+def reward_format_compliance(completion: str) -> float:
+    """
+    Scores the completion based on its format (JSON validity and required fields).
+    Returns:
+        +1.0: Valid JSON with all required fields:
+              action_type, target_domain, metric_changes, resource_cost, reasoning
+        +0.5: Any parseable JSON (including partial/incomplete dicts)
+        -0.5: Invalid JSON / unparseable
+        -1.0: Empty strings or refusal content
+    """
+    if not completion or len(completion.strip()) < 10:
+        return -1.0
+    # Potential refusal indicators
+    if any(x in completion.lower() for x in ["i cannot", "i'm sorry", "as an ai"]):
+        return -1.0
+    # Extract JSON content from markdown code blocks if present
+    json_str = completion.strip()
+    if "```json" in json_str:
+        json_str = json_str.split("```json")[-1].split("```")[0].strip()
+    elif "```" in json_str:
+        json_str = json_str.split("```")[-1].split("```")[0].strip()
+    try:
+        data = json.loads(json_str)
+        required = ["action_type", "target_domain", "metric_changes", "resource_cost", "reasoning"]
+        if isinstance(data, dict) and all(k in data and data.get(k) is not None for k in required):
+            return 1.0
+        return 0.5
+    except json.JSONDecodeError:
+        # Final attempt: try to find anything between { and }
+        match = re.search(r'\{.*\}', json_str, re.DOTALL)
+        if match:
+            try:
+                data = json.loads(match.group(0))
+                required = ["action_type", "target_domain", "metric_changes", "resource_cost", "reasoning"]
+                if isinstance(data, dict) and all(k in data and data.get(k) is not None for k in required):
+                    return 1.0
+                return 0.5
+            except:
+                pass
+        return -0.5
+def reward_plausibility_check(metric_changes: dict, resource_cost: dict) -> float:
+    """
+    Anti-gaming check. Prevents the model from claiming massive metric changes while spending 0 resources.
+    Resource cost is normalized to comparable units (time/20h, money/$500, energy/100pts).
+    """
+    total_delta = sum(abs(v) for v in metric_changes.values())
+    # Zero-cost shortcut: any non-trivial claim with no cost at all is implausible
+    # Also handles empty resource_cost.
+    if not resource_cost or all(v == 0 for v in resource_cost.values()):
+        if total_delta > 3.0:
+            return -0.30
+        return 0.0
+    # Normalize each resource dimension to [0,1] before summing
+    norm_time   = resource_cost.get('time', 0.0)   / 20.0
+    norm_money  = resource_cost.get('money', 0.0)  / 500.0
+    norm_energy = resource_cost.get('energy', 0.0) / 100.0
+    total_cost  = norm_time + norm_money + norm_energy
+    ratio = total_delta / max(0.01, total_cost)
+    if ratio > 150:
+        return -0.30   # Claiming massive change for virtually free
+    if ratio > 80:
+        return -0.10   # Highly suspicious efficiency
+    return 0.0         # Plausible ratio
+def reward_timeout_check(step_count: int, max_steps: int, done: bool) -> float:
+    """
+    Penalizes episodes that end by reaching the step limit without being resolved.
+    """
+    if step_count >= max_steps and not done:
+        return -0.20
+    return 0.0
+def reward_reasoning_coherence(reasoning: str, action_type: str = "") -> float:
+    """
+    Harden verification of logical consistency. Requires both length and
+    alignment with the chosen action to prevent word-stuffing.
+    """
+    if not reasoning or len(reasoning.strip()) < 20:
+        return -0.20 # Severe penalty for lack of effort
+    reasoning_lower = reasoning.lower()
+    score = 0.0
+    # 1. Structural Logic Check
+    # Reward use of logical connectors rather than just list of facts
+    connectors = ["because", "since", "therefore", "due to", "resulting in", "consequently"]
+    if any(c in reasoning_lower for c in connectors):
+        score += 0.05
+    # 2. Action Alignment (Non-Gammable Anti-Hacking)
+    # The reasoning MUST logically justify the chosen category.
+    action_keywords = {
+        "spend": ["cost", "price", "expensive", "money", "budget", "finance"],
+        "rest": ["energy", "sleep", "exhaustion", "recharge", "break"],
+        "communicate": ["talk", "discuss", "speak", "message", "call", "explain"],
+        "delegate": ["hand off", "assign", "help", "junior", "colleague"],
+        "negotiate": ["bargain", "trade", "deal", "terms"],
+        "deprioritize": ["later", "postpone", "unimportant", "drop"],
+        "reschedule": ["reschedule", "delay", "postpone", "move", "time", "calendar", "slot"],
+        "execute": ["route", "plan", "action", "implement", "complete", "resolve", "execute"],
+    }
+    if action_type and action_type in action_keywords:
+        match = any(kw in reasoning_lower for kw in action_keywords[action_type])
+        if match:
+            score += 0.10
+        else:
+            score -= 0.20
+    return max(-0.30, min(0.30, score))
+def main():
+    # Scenario setup
+    print("--- TESTING REWARD SYSTEM ---")
+    # 1. PERFECT ACTION: All metrics improve by 10 points
+    state_start = LifeMetrics() # Defaults at 70
+    state_perfect = copy.deepcopy(state_start)
+    for k in state_perfect.flatten().keys():
+        domain, sub = k.split('.')
+        current = getattr(getattr(state_perfect, domain), sub)
+        setattr(getattr(state_perfect, domain), sub, current + 10)
+    res_perfect = {"time": 2, "money": 50, "energy": 10}
+    reward_p, break_p = compute_reward(state_start, state_perfect, res_perfect, actions_taken=5)
+    print("\n[SCENARIO 1: PERFECT ACTION]")
+    print(f"Reward: {reward_p:.4f}")
+    print(f"Breakdown: {break_p}")
+    # 2. BAD ACTION: Relationships tank by 30 points, everything else stays same
+    state_bad = copy.deepcopy(state_start)
+    for k in state_bad.flatten().keys():
+        if k.startswith('relationships.'):
+            domain, sub = k.split('.')
+            current = getattr(getattr(state_bad, domain), sub)
+            setattr(getattr(state_bad, domain), sub, current - 30)
+    res_bad = {"time": 10, "money": 300, "energy": 80}
+    reward_b, break_b = compute_reward(state_start, state_bad, res_bad, actions_taken=1)
+    print("\n[SCENARIO 2: BAD ACTION (Relationships Tank)]")
+    print(f"Reward: {reward_b:.4f}")
+    print(f"Breakdown: {break_b}")
+    # 3. INACTION: Nothing changes
+    state_nothing = copy.deepcopy(state_start)
+    res_none = {}
+    reward_n, break_n = compute_reward(state_start, state_nothing, res_none, actions_taken=0)
+    print("\n[SCENARIO 3: INACTION]")
+    print(f"Reward: {reward_n:.4f}")
+    print(f"Breakdown: {break_n}")
+if __name__ == "__main__":
+    main()

core/task.py ADDED Viewed

	@@ -0,0 +1,153 @@

+from dataclasses import dataclass, field
+from typing import Any, List, Dict
+@dataclass
+class HiddenStateField:
+    key: str               # e.g. "boss_mood"
+    initial_value: Any     # e.g. "neutral"
+    inspect_target: str    # e.g. "call_boss" — which inspect action type reveals this
+    description: str       # shown to agent after reveal
+@dataclass
+class ExoEvent:
+    step: int              # inject at this step (inclusive); -1 = probabilistic
+    probability: float     # 1.0 = deterministic; <1.0 = random at each step
+    id: str                # e.g. "ticket_price_spike"
+    description: str       # what agent sees in next observation
+    world_mutation: dict   # e.g. {"ticket_price": 450, "seats_remaining": 1}
+    hidden_state_mutation: dict  # e.g. {"boss_mood": "angry"}
+    closes_routes: list[str] = field(default_factory=list)  # route IDs this event blocks
+@dataclass
+class Milestone:
+    id: str                # e.g. "flight_rebooked"
+    description: str
+    condition_key: str     # world/hidden key to check, e.g. "flight_rebooked"
+    condition_value: Any   # e.g. True
+    reward: float          # milestone reward added to episode total
+@dataclass
+class Route:
+    id: str                # e.g. "rebook_premium"
+    name: str
+    description: str
+    required_action_types: list[str]  # must use these tool actions to complete
+    preconditions: dict    # world/hidden state checks, e.g. {"card_available": True}
+    consequences: dict     # world mutations on route completion, e.g. {"flight_rebooked": True}
+    closes_routes: list[str]  # route IDs this blocks
+    milestones_unlocked: list[str]  # milestone IDs this route can hit
+    final_reward: float    # bonus on route completion
+@dataclass
+class Task:
+    id: str
+    domain: str            # "flight_crisis" | "code_merge_crisis"
+    goal: str
+    constraints: dict      # e.g. {"budget_max": 400, "deadline_step": 18}
+    hidden_state: dict     # full truth, agent never sees directly
+    mutable_world: dict    # partial truth, some fields revealed by inspect
+    visible_world: dict    # agent sees this at each step (subset of mutable_world)
+    success_conditions: list[dict]  # e.g. [{"key": "flight_rebooked", "value": True}]
+    failure_conditions: list[dict]  # e.g. [{"key": "missed_deadline", "value": True}]
+    event_schedule: list[ExoEvent]
+    viable_routes: list[Route]
+    milestones: list[Milestone]
+    horizon: int           # max steps (20–50)
+    difficulty: int        # 1–5
+    domain_metadata: dict  # domain-specific extra data (story text, etc.)
+def FlightCrisisTask() -> Task:
+    routes = [
+        Route(
+            id="rebook_premium",
+            name="Rebook Premium Option",
+            description="Call agent and rebook on premium ticket",
+            required_action_types=["communicate", "execute"],
+            preconditions={"card_available": True},
+            consequences={"flight_rebooked": True},
+            closes_routes=["wait_lounge"],
+            milestones_unlocked=["m1"],
+            final_reward=2.5
+        ),
+        Route(
+            id="wait_lounge",
+            name="Accept Delay & Work",
+            description="Stay at airport lounge and work on laptop",
+            required_action_types=["wait", "plan"],
+            preconditions={"lounge_access": True},
+            consequences={"caught_up": True},
+            closes_routes=["rebook_premium"],
+            milestones_unlocked=["m2"],
+            final_reward=1.8
+        )
+    ]
+    milestones = [
+        Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="flight_rebooked", condition_value=True, reward=1.0),
+        Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="caught_up", condition_value=True, reward=0.8),
+    ]
+    events = [
+        ExoEvent(step=5, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={"card_available": False}, closes_routes=[]),
+        ExoEvent(step=8, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={"lounge_access": False}, hidden_state_mutation={}, closes_routes=["wait_lounge"]),
+    ]
+    return Task(
+        id="flight_crisis_task_main",
+        domain="flight_crisis",
+        goal="Survive Airport Cancellation",
+        constraints={"budget_max": 800, "deadline_step": 20},
+        hidden_state={
+            "card_available": True
+        },
+        mutable_world={
+            "lounge_access": True,
+            "flight_rebooked": False,
+            "caught_up": False
+        },
+        visible_world={
+            "lounge_access": True
+        },
+        success_conditions=[{"key": "flight_rebooked", "value": True}],
+        failure_conditions=[{"key": "missed_deadline", "value": True}],
+        event_schedule=events,
+        viable_routes=routes,
+        milestones=milestones,
+        horizon=30,
+        difficulty=4,
+        domain_metadata={"story": "A major storm grounded commercial flights."}
+    )
+def CodeMergeCrisisTask() -> Task:
+    """A high-difficulty technical crisis requiring rollback or hotfix."""
+    routes = [
+        Route(id="revert_commit", name="Revert Commit", description="Quickly revert the broken merge to unblock the team.", required_action_types=["delegate", "communicate"], preconditions={}, consequences={"pipeline_unblocked": True}, closes_routes=["hotfix"], milestones_unlocked=["m1"], final_reward=1.5),
+        Route(id="hotfix", name="Patch Forward", description="Find the logic error and push a hotfix.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"bug_resolved": True}, closes_routes=["revert_commit"], milestones_unlocked=["m2"], final_reward=3.0),
+    ]
+    milestones = [
+        Milestone(id="m1", description="CI pipeline is green again", condition_key="pipeline_unblocked", condition_value=True, reward=1.0),
+        Milestone(id="m2", description="Bug resolved without losing features", condition_key="bug_resolved", condition_value=True, reward=2.0),
+    ]
+    return Task(
+        id="code_merge_task_fallback",
+        domain="code_merge_crisis",
+        goal="Resolve Production Outage",
+        constraints={"budget_max": 1000, "deadline_step": 8},
+        hidden_state={"on_call_status": "alert"},
+        mutable_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
+        visible_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
+        success_conditions=[{"key": "pipeline_unblocked", "value": True}, {"key": "bug_resolved", "value": True}],
+        failure_conditions=[],
+        event_schedule=[],
+        viable_routes=routes,
+        milestones=milestones,
+        horizon=10,
+        difficulty=4,
+        domain_metadata={}
+    )
+class TaskGenerator:
+    def __init__(self):
+        self.tasks = [FlightCrisisTask, CodeMergeCrisisTask]
+    def get_random_task(self) -> Task:
+        import random
+        return random.choice(self.tasks)()

core/verifier.py ADDED Viewed

	@@ -0,0 +1,75 @@

+from typing import Dict, List, Set, Any, Tuple
+from core.task import Task, Milestone, Route
+class LifeStackVerifier:
+    """Standalone verifier for Task success, failure, and progression."""
+    @staticmethod
+    def _check_cond(cond: dict, world_state: dict, hidden_state: dict, metrics_flat: dict = None) -> bool:
+        key = cond['key']
+        target = cond['value']
+        op = cond.get('op', 'eq')
+        # Priority: Metrics > Hidden > World
+        val = None
+        if metrics_flat and key in metrics_flat:
+            val = metrics_flat[key]
+        else:
+            val = hidden_state.get(key, world_state.get(key))
+        if val is None:
+            return False
+        if op == 'eq': return val == target
+        if op == 'ne': return val != target
+        if op == 'gt': return val > target
+        if op == 'lt': return val < target
+        if op == 'ge': return val >= target
+        if op == 'le': return val <= target
+        return False
+    @staticmethod
+    def check_success(task: Task, world_state: dict, hidden_state: dict) -> list[bool]:
+        """Checks if task-specific success conditions are met."""
+        return [LifeStackVerifier._check_cond(c, world_state, hidden_state) for c in task.success_conditions]
+    @staticmethod
+    def check_failure(task: Task, world_state: dict, hidden_state: dict, metrics_flat: dict) -> list[bool]:
+        """Checks if task-specific or global failure conditions (metric death) are met."""
+        results = [LifeStackVerifier._check_cond(c, world_state, hidden_state, metrics_flat) for c in task.failure_conditions]
+        # 2. Metric death
+        if any(v <= 10 for v in metrics_flat.values()):
+            results.append(True)
+        return results
+    @staticmethod
+    def check_new_milestones(task: Task, world_state: dict, hidden_state: dict, achieved_ids: list) -> list[str]:
+        """Identifies any milestones that have just been met by current state."""
+        newly_met = []
+        for m in task.milestones:
+            if m.id not in achieved_ids:
+                val = hidden_state.get(m.condition_key, world_state.get(m.condition_key))
+                if val == m.condition_value:
+                    newly_met.append(m.id)
+        return newly_met
+    @staticmethod
+    def get_route_status(task: Task, closed_ids: set, world_state: dict, hidden_state: dict) -> Tuple[int, bool]:
+        """Returns (remaining_routes_count, is_dead_end)."""
+        remaining = 0
+        for route in task.viable_routes:
+            if route.id in closed_ids:
+                continue
+            # Check if reachable via preconditions
+            pre_ok = True
+            for k, v in route.preconditions.items():
+                current_v = hidden_state.get(k, world_state.get(k))
+                if current_v != v:
+                    pre_ok = False
+                    break
+            if pre_ok:
+                remaining += 1
+        return remaining, remaining == 0

data/before_after_comparison.json ADDED Viewed

	@@ -0,0 +1,30 @@

+{
+  "summary": {
+    "runs": 5,
+    "avg_no_memory": 1.13,
+    "avg_with_memory": 2.45,
+    "pct_improvement": 116.81,
+    "most_common_action_no_memory": "delegate",
+    "most_common_action_with_memory": "communicate",
+    "comm_usage_no_memory_pct": 40.0,
+    "comm_usage_yes_memory_pct": 100.0
+  },
+  "no_memory": [
+    {
+      "total_reward": 1.0,
+      "first_action": "delegate"
+    },
+    {
+      "total_reward": 1.2
+    }
+  ],
+  "with_memory": [
+    {
+      "total_reward": 2.5,
+      "first_action": "communicate"
+    },
+    {
+      "total_reward": 2.4
+    }
+  ]
+}

data/conflicts.json ADDED Viewed

	@@ -0,0 +1,314 @@

+[
+    {
+        "id": "d1_gym",
+        "title": "The Slump",
+        "story": "You haven't seen the inside of a gym in ten days. Your energy is flagging and your favorite jeans feel tight.",
+        "primary_disruption": {
+            "physical_health.fitness": -15.0
+        },
+        "decisions_required": [
+            "Wake up early for a run",
+            "Join a weekend boot camp",
+            "Ignore it and rest"
+        ],
+        "resource_budget": {
+            "time": 4.0,
+            "money": 0.0,
+            "energy": 20.0
+        },
+        "difficulty": 1
+    },
+    {
+        "id": "d1_bill",
+        "title": "Forgotten Invoice",
+        "story": "A late notice arrived for your electricity bill. It's not a lot, but the late fee is annoying.",
+        "primary_disruption": {
+            "finances.liquidity": -20.0
+        },
+        "decisions_required": [
+            "Pay it now",
+            "Call to dispute the fee",
+            "Set up autopay for next time"
+        ],
+        "resource_budget": {
+            "time": 1.0,
+            "money": 100.0,
+            "energy": 5.0
+        },
+        "difficulty": 1
+    },
+    {
+        "id": "d1_argument",
+        "title": "Heated Group Chat",
+        "story": "A minor political disagreement in the group chat turned personal. Everyone is being quiet now.",
+        "primary_disruption": {
+            "relationships.social": -20.0
+        },
+        "decisions_required": [
+            "Apologize to the group",
+            "Message the friend privately",
+            "Mute the chat for a week"
+        ],
+        "resource_budget": {
+            "time": 2.0,
+            "money": 30.0,
+            "energy": 15.0
+        },
+        "difficulty": 1
+    },
+    {
+        "id": "d2_project",
+        "title": "The Surge",
+        "story": "Your boss just walked by and dropped a 'small favor' on your desk. It looks like it'll take ten hours.",
+        "primary_disruption": {
+            "career.workload": 25.0,
+            "time.free_hours_per_week": -20.0
+        },
+        "decisions_required": [
+            "Work late all week",
+            "Delegate parts to a junior",
+            "Refuse the assignment"
+        ],
+        "resource_budget": {
+            "time": 10.0,
+            "money": 0.0,
+            "energy": 40.0
+        },
+        "difficulty": 2
+    },
+    {
+        "id": "d2_car",
+        "title": "Check Engine Light",
+        "story": "Your car started making a rhythmic thumping sound on the highway. The mechanic says the repair isn't cheap.",
+        "primary_disruption": {
+            "finances.liquidity": -30.0,
+            "time.commute_burden": 25.0
+        },
+        "decisions_required": [
+            "Repair it immediately",
+            "Take the bus for a week",
+            "Borrow a car from a friend"
+        ],
+        "resource_budget": {
+            "time": 5.0,
+            "money": 500.0,
+            "energy": 10.0
+        },
+        "difficulty": 2
+    },
+    {
+        "id": "d2_neglect",
+        "title": "Cold Dinner",
+        "story": "Your partner mentions they feel like 'roommates' lately. You realize you haven't had a real conversation in weeks.",
+        "primary_disruption": {
+            "relationships.romantic": -25.0,
+            "mental_wellbeing.stress_level": 20.0
+        },
+        "decisions_required": [
+            "Plan a surprise date",
+            "Have a long talk tonight",
+            "Buy a thoughtful gift"
+        ],
+        "resource_budget": {
+            "time": 6.0,
+            "money": 150.0,
+            "energy": 30.0
+        },
+        "difficulty": 2
+    },
+    {
+        "id": "d3_interview",
+        "title": "The Opportunity",
+        "story": "An old contact reached out for a dream job interview. You need to prep while keeping your current job afloat.",
+        "primary_disruption": {
+            "career.workload": 20.0,
+            "time.free_hours_per_week": -15.0,
+            "mental_wellbeing.stress_level": 20.0
+        },
+        "decisions_required": [
+            "Intensive weekend prep",
+            "Fake a sick day to interview",
+            "Turn it down to stay stable"
+        ],
+        "resource_budget": {
+            "time": 12.0,
+            "money": 50.0,
+            "energy": 50.0
+        },
+        "difficulty": 3
+    },
+    {
+        "id": "d3_family",
+        "title": "Family SOS",
+        "story": "Your sibling is going through a rough patch and needs help moving out and some financial support.",
+        "primary_disruption": {
+            "relationships.family": 20.0,
+            "time.free_hours_per_week": -25.0,
+            "finances.liquidity": -20.0
+        },
+        "decisions_required": [
+            "Spend the weekend helping",
+            "Send them money but stay home",
+            "Help them find other movers"
+        ],
+        "resource_budget": {
+            "time": 15.0,
+            "money": 400.0,
+            "energy": 60.0
+        },
+        "difficulty": 3
+    },
+    {
+        "id": "d3_health",
+        "title": "The Warning Sign",
+        "story": "You had a fainting spell at the office. Tests are expensive, and doctors say you need immediate change.",
+        "primary_disruption": {
+            "physical_health.energy": -30.0,
+            "mental_wellbeing.stress_level": 30.0,
+            "finances.liquidity": -40.0
+        },
+        "decisions_required": [
+            "Take a week of medical leave",
+            "Consult a high-end specialist",
+            "Change diet and sleep habits"
+        ],
+        "resource_budget": {
+            "time": 20.0,
+            "money": 800.0,
+            "energy": 5.0
+        },
+        "difficulty": 3
+    },
+    {
+        "id": "d4_review",
+        "title": "Judgment Day",
+        "story": "A major performance review is in three days. Rumors of layoffs are circulating and the atmosphere is tense.",
+        "primary_disruption": {
+            "career.workload": 30.0,
+            "mental_wellbeing.stress_level": 25.0,
+            "relationships.romantic": -15.0,
+            "time.free_hours_per_week": -20.0
+        },
+        "decisions_required": [
+            "Pull all-nighters to prove worth",
+            "Start networking for new roles",
+            "Draft a defensive report"
+        ],
+        "resource_budget": {
+            "time": 18.0,
+            "money": 0.0,
+            "energy": 80.0
+        },
+        "difficulty": 4
+    },
+    {
+        "id": "d4_move",
+        "title": "The Big Relocation",
+        "story": "You've decided to move across the country for growth. The logistics are a nightmare and friends are sad to see you go.",
+        "primary_disruption": {
+            "finances.liquidity": -50.0,
+            "relationships.social": -30.0,
+            "career.growth_trajectory": 20.0,
+            "time.admin_overhead": 30.0
+        },
+        "decisions_required": [
+            "Hire full-service movers",
+            "Host a series of farewell dinners",
+            "DIY pack everything"
+        ],
+        "resource_budget": {
+            "time": 30.0,
+            "money": 1500.0,
+            "energy": 100.0
+        },
+        "difficulty": 4
+    },
+    {
+        "id": "d4_audit",
+        "title": "Tax Audit",
+        "story": "The IRS has flagged your last three years of returns. You need to dig through thousands of documents while paying a CPA.",
+        "primary_disruption": {
+            "finances.long_term_health": -20.0,
+            "mental_wellbeing.stress_level": 30.0,
+            "time.admin_overhead": 40.0,
+            "finances.liquidity": -15.0
+        },
+        "decisions_required": [
+            "Spend nights scanning receipts",
+            "Hire a tax lawyer",
+            "Try to settle immediately"
+        ],
+        "resource_budget": {
+            "time": 25.0,
+            "money": 1000.0,
+            "energy": 40.0
+        },
+        "difficulty": 4
+    },
+    {
+        "id": "d5_friday",
+        "title": "Friday 6PM",
+        "story": "Your flight just got cancelled. Your card declined trying to rebook. Your boss moved Monday deadline to Sunday.",
+        "primary_disruption": {
+            "career.workload": 35.0,
+            "finances.liquidity": -40.0,
+            "mental_wellbeing.stress_level": 30.0,
+            "time.free_hours_per_week": -25.0
+        },
+        "decisions_required": [
+            "Book a bus and work on it",
+            "Call boss to negotiate",
+            "Crash at a nearby friend's"
+        ],
+        "resource_budget": {
+            "time": 10.0,
+            "money": 500.0,
+            "energy": 60.0
+        },
+        "difficulty": 5
+    },
+    {
+        "id": "d5_storm",
+        "title": "The Perfect Storm",
+        "story": "Your firm lost its biggest client, your partner moved out, and your car got towed\u2014all on the same Tuesday.",
+        "primary_disruption": {
+            "career.stability": -30.0,
+            "relationships.romantic": -25.0,
+            "finances.debt_pressure": 35.0,
+            "physical_health.energy": -25.0
+        },
+        "decisions_required": [
+            "Find an emergency side hustle",
+            "Beg partner for a second chance",
+            "Take a mental health day"
+        ],
+        "resource_budget": {
+            "time": 8.0,
+            "money": 200.0,
+            "energy": 20.0
+        },
+        "difficulty": 5
+    },
+    {
+        "id": "d5_burnout",
+        "title": "The Total Collapse",
+        "story": "You can't get out of bed. Your body has quit, your motivation is gone, and work emails are piling into the hundreds.",
+        "primary_disruption": {
+            "mental_wellbeing.motivation": -40.0,
+            "physical_health.sleep_quality": -30.0,
+            "career.satisfaction": -35.0,
+            "relationships.family": -20.0
+        },
+        "decisions_required": [
+            "Request indefinite medical leave",
+            "Disconnect all electronics",
+            "Let it all burn and sleep"
+        ],
+        "resource_budget": {
+            "time": 40.0,
+            "money": 2000.0,
+            "energy": 0.0
+        },
+        "difficulty": 5
+    }
+]

data/demo_signals.json ADDED Viewed

	@@ -0,0 +1,75 @@

+{
+  "persona": "Jordan (PM at Series-B startup)",
+  "generated_at": "2026-04-25T09:00:00",
+  "note": "Pre-baked demo payload — represents a stressed product manager mid-sprint",
+  "gmail": {
+    "unread_count": 47,
+    "late_night_count": 8,
+    "weekend_count": 11,
+    "overtime_count": 14,
+    "social_activity": 3.2,
+    "work_pressure": 8.7,
+    "relationship_neglect_risk": 7.4,
+    "responsiveness": 2.1,
+    "email_overload": 9.4,
+    "work_bleeding_personal": 7.2,
+    "key_contacts": [
+      "priya.shah@acme-ventures.com",
+      "cto@startupco.io",
+      "hr@startupco.io",
+      "mom@gmail.com",
+      "alex@cofounder.io"
+    ],
+    "notable_threads": [
+      {"subject": "URGENT: Board deck needs rework before Friday", "sender": "cto@startupco.io", "time": "11:47 PM"},
+      {"subject": "Re: Q2 roadmap — are we on track?", "sender": "priya.shah@acme-ventures.com", "time": "Saturday 10:12 AM"},
+      {"subject": "Have you eaten today?", "sender": "mom@gmail.com", "time": "7:03 PM"}
+    ],
+    "summary": "47 unread. 8 emails sent after 10 PM. Board deck deadline pressure. Investor checking roadmap. Family reaching out."
+  },
+  "calendar": {
+    "week_occupancy_pct": 91,
+    "days_with_no_breaks": 4,
+    "avg_meeting_hours_per_day": 6.2,
+    "focus_blocks_count": 0,
+    "upcoming_deadlines": [
+      {"title": "Board Deck Final Draft", "due_in_hours": 38, "priority": "critical"},
+      {"title": "Sprint Review with Engineering", "due_in_hours": 52, "priority": "high"},
+      {"title": "Investor 1:1 (Priya Shah)", "due_in_hours": 72, "priority": "high"}
+    ],
+    "back_to_back_blocks": 3,
+    "personal_events_this_week": 1,
+    "cancelled_personal_events": 2,
+    "summary": "91% of working hours booked. Zero deep-work blocks. Board deck in 38h. 3 back-to-back meeting chains. 2 personal events cancelled this week."
+  },
+  "fitness": {
+    "avg_sleep_hours": 5.3,
+    "sleep_quality_score": 38,
+    "resting_heart_rate": 82,
+    "hrv_score": 24,
+    "daily_steps_avg": 2800,
+    "active_minutes_avg": 9,
+    "stress_score": 78,
+    "recovery_score": 31,
+    "last_workout_days_ago": 9,
+    "summary": "5.3h sleep avg. Resting HR 82 bpm (elevated). HRV 24 (low — high stress load). 2,800 steps/day. Last workout 9 days ago."
+  },
+  "derived_metric_deltas": {
+    "career.workload": 28.0,
+    "mental_wellbeing.stress_level": 32.0,
+    "mental_wellbeing.focus_quality": -25.0,
+    "mental_wellbeing.emotional_regulation": -18.0,
+    "physical_health.sleep_quality": -30.0,
+    "physical_health.energy_level": -22.0,
+    "physical_health.exercise_consistency": -35.0,
+    "time.free_hours_per_week": -18.0,
+    "time.schedule_control": -24.0,
+    "relationships.romantic": -15.0,
+    "relationships.family": -12.0,
+    "finances.liquidity": 0.0
+  }
+}

data/holdout_tasks.json ADDED Viewed

	@@ -0,0 +1,12 @@

+[
+  {"id": "holdout_0",  "seed": 9000, "domain": "flight_crisis"},
+  {"id": "holdout_1",  "seed": 9001, "domain": "flight_crisis"},
+  {"id": "holdout_2",  "seed": 9002, "domain": "code_merge_crisis"},
+  {"id": "holdout_3",  "seed": 9003, "domain": "flight_crisis"},
+  {"id": "holdout_4",  "seed": 9004, "domain": "code_merge_crisis"},
+  {"id": "holdout_5",  "seed": 9005, "domain": "flight_crisis"},
+  {"id": "holdout_6",  "seed": 9006, "domain": "code_merge_crisis"},
+  {"id": "holdout_7",  "seed": 9007, "domain": "flight_crisis"},
+  {"id": "holdout_8",  "seed": 9008, "domain": "code_merge_crisis"},
+  {"id": "holdout_9",  "seed": 9009, "domain": "flight_crisis"}
+]

data/reward_curve.png ADDED Viewed

data/simperson_profiles.json ADDED Viewed

	@@ -0,0 +1,42 @@

+[
+    {
+        "name": "Alex (High-Stress Executive)",
+        "openness": 0.4,
+        "conscientiousness": 0.9,
+        "extraversion": 0.7,
+        "agreeableness": 0.25,
+        "neuroticism": 0.8
+    },
+    {
+        "name": "Chloe (Laid-Back Creative)",
+        "openness": 0.9,
+        "conscientiousness": 0.2,
+        "extraversion": 0.5,
+        "agreeableness": 0.7,
+        "neuroticism": 0.15
+    },
+    {
+        "name": "Sam (Anxious Introvert)",
+        "openness": 0.5,
+        "conscientiousness": 0.6,
+        "extraversion": 0.1,
+        "agreeableness": 0.65,
+        "neuroticism": 0.9
+    },
+    {
+        "name": "Maya (Balanced Family Person)",
+        "openness": 0.5,
+        "conscientiousness": 0.7,
+        "extraversion": 0.5,
+        "agreeableness": 0.95,
+        "neuroticism": 0.3
+    },
+    {
+        "name": "Leo (Ambitious Student)",
+        "openness": 0.85,
+        "conscientiousness": 0.8,
+        "extraversion": 0.4,
+        "agreeableness": 0.4,
+        "neuroticism": 0.55
+    }
+]

data/training_log.json ADDED Viewed

	@@ -0,0 +1,526 @@

+[
+  {
+    "episode": 1,
+    "reward": 1.6325,
+    "difficulty": 1,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Forgotten Invoice"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 2,
+    "reward": 1.7879,
+    "difficulty": 2,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "The Surge",
+      "ESCALATED: The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 3,
+    "reward": 2.5763,
+    "difficulty": 1,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "Heated Group Chat",
+      "ESCALATED: Heated Group Chat"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 4,
+    "reward": 2.5755,
+    "difficulty": 1,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Heated Group Chat"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 5,
+    "reward": 2.5754,
+    "difficulty": 1,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Heated Group Chat"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 6,
+    "reward": 2.5402,
+    "difficulty": 2,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Cold Dinner",
+      "ESCALATED: Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 7,
+    "reward": 2.5793,
+    "difficulty": 1,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "The Slump"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 8,
+    "reward": 2.5574,
+    "difficulty": 2,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "Cold Dinner",
+      "ESCALATED: Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 9,
+    "reward": 2.5277,
+    "difficulty": 2,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 10,
+    "reward": 2.4812,
+    "difficulty": 2,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Check Engine Light",
+      "ESCALATED: Check Engine Light"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 11,
+    "reward": 2.4932,
+    "difficulty": 2,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Check Engine Light"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 12,
+    "reward": 2.5473,
+    "difficulty": 2,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "The Surge",
+      "ESCALATED: The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 13,
+    "reward": 2.5707,
+    "difficulty": 1,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "The Slump",
+      "ESCALATED: The Slump"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 14,
+    "reward": 2.5507,
+    "difficulty": 1,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "Forgotten Invoice",
+      "ESCALATED: Forgotten Invoice"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 15,
+    "reward": 2.572,
+    "difficulty": 1,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Heated Group Chat"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 16,
+    "reward": 2.5534,
+    "difficulty": 3,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "The Opportunity"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 17,
+    "reward": 2.5396,
+    "difficulty": 3,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Family SOS"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 18,
+    "reward": 2.5572,
+    "difficulty": 2,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Cold Dinner",
+      "ESCALATED: Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 19,
+    "reward": 2.5503,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Warning Sign",
+      "ESCALATED: The Warning Sign"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 20,
+    "reward": 2.5437,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Warning Sign",
+      "ESCALATED: The Warning Sign"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 21,
+    "reward": 2.5045,
+    "difficulty": 2,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Check Engine Light"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 22,
+    "reward": 2.5447,
+    "difficulty": 2,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "Cold Dinner",
+      "ESCALATED: Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 23,
+    "reward": 2.5427,
+    "difficulty": 3,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Family SOS"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 24,
+    "reward": 2.534,
+    "difficulty": 2,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "The Surge",
+      "ESCALATED: The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 25,
+    "reward": 2.5273,
+    "difficulty": 2,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 26,
+    "reward": 2.5436,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Warning Sign"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 27,
+    "reward": 2.5452,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Opportunity",
+      "ESCALATED: The Opportunity"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 28,
+    "reward": 2.5287,
+    "difficulty": 2,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "The Surge",
+      "ESCALATED: The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 29,
+    "reward": 2.4947,
+    "difficulty": 2,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Check Engine Light",
+      "ESCALATED: Check Engine Light"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 30,
+    "reward": 2.5534,
+    "difficulty": 2,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 31,
+    "reward": 2.5459,
+    "difficulty": 2,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 32,
+    "reward": 2.4748,
+    "difficulty": 2,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "The Surge"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 33,
+    "reward": 2.5597,
+    "difficulty": 2,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "Cold Dinner",
+      "ESCALATED: Cold Dinner"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 34,
+    "reward": 2.4873,
+    "difficulty": 2,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "Check Engine Light",
+      "ESCALATED: Check Engine Light"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 35,
+    "reward": 2.5366,
+    "difficulty": 3,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Family SOS"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 36,
+    "reward": 2.5337,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Opportunity"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 37,
+    "reward": 2.5552,
+    "difficulty": 4,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "The Big Relocation",
+      "ESCALATED: The Big Relocation"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 38,
+    "reward": 2.4982,
+    "difficulty": 3,
+    "person": "Chloe (Creative)",
+    "conflicts_seen": [
+      "Family SOS",
+      "ESCALATED: Family SOS"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 39,
+    "reward": 2.4741,
+    "difficulty": 4,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "Judgment Day",
+      "ESCALATED: Judgment Day"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 40,
+    "reward": 2.5425,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Opportunity"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 41,
+    "reward": 2.5203,
+    "difficulty": 3,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Family SOS",
+      "ESCALATED: Family SOS"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 42,
+    "reward": 2.5183,
+    "difficulty": 3,
+    "person": "Alex (Executive)",
+    "conflicts_seen": [
+      "Family SOS"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 43,
+    "reward": 2.54,
+    "difficulty": 3,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "The Warning Sign"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 44,
+    "reward": 2.5525,
+    "difficulty": 3,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "The Warning Sign",
+      "ESCALATED: The Warning Sign"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 45,
+    "reward": 1.2349,
+    "difficulty": 4,
+    "person": "Leo (Student)",
+    "conflicts_seen": [
+      "Tax Audit"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 46,
+    "reward": 2.497,
+    "difficulty": 4,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "The Big Relocation"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 47,
+    "reward": 2.5601,
+    "difficulty": 4,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Big Relocation"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 48,
+    "reward": 2.5492,
+    "difficulty": 4,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "Judgment Day",
+      "ESCALATED: Judgment Day"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 49,
+    "reward": 2.5086,
+    "difficulty": 4,
+    "person": "Sam (Introvert)",
+    "conflicts_seen": [
+      "Judgment Day"
+    ],
+    "steps": 5
+  },
+  {
+    "episode": 50,
+    "reward": 2.5578,
+    "difficulty": 3,
+    "person": "Maya (Family)",
+    "conflicts_seen": [
+      "The Warning Sign"
+    ],
+    "steps": 5
+  }
+]

docs/CONTRIBUTING.md ADDED Viewed

	@@ -0,0 +1,96 @@

+# Contributing to LifeStack
+This document defines the **documentation rule** for the project.
+**Nothing ships without its matching doc entry.**
+---
+## The Rule: Doc-First Development
+Every change that adds, removes, or significantly modifies a feature must include
+**all three** of the following before the commit is made:
+| # | Action | Where |
+|---|---|---|
+| 1 | **Create or update a doc file** | `docs/<topic>.md` |
+| 2 | **Update README.md** | File Structure table + relevant section |
+| 3 | **Update `docs/INDEX.md`** | Add a one-line entry for the new doc |
+> [!IMPORTANT]
+> A pull request / commit that adds a new script, module, or feature **without**
+> updating `docs/INDEX.md` and `README.md` is considered incomplete and should
+> not be merged.
+---
+## What Counts as "a Feature"
+| Change type | Doc required? |
+|---|---|
+| New Python module (`core/`, `agent/`, `intake/`) | ✅ Yes — `docs/<module>.md` |
+| New script (`scripts/*.py`) | ✅ Yes — entry in `docs/scripts.md` |
+| New Gradio tab in `app.py` | ✅ Yes — entry in `docs/app.md` |
+| New CLI argument to an existing script | ✅ Yes — update relevant doc |
+| Bug fix with no API surface change | ❌ No (but update changelog if breaking) |
+| Refactor with no API surface change | ❌ No |
+| New environment variable / secret | ✅ Yes — update `docs/configuration.md` |
+| New dependency in `requirements.txt` | ✅ Yes — note in relevant doc + README |
+---
+## Doc File Conventions
+- All docs live in `docs/`. No `.md` files at repo root except `README.md` and this file.
+- File names are lowercase with underscores: `docs/lifestack_env.md`, `docs/eval.md`.
+- Each doc starts with a `# Title` h1 and a one-line summary.
+- Use `## Overview`, `## Usage`, `## API / Parameters`, `## Examples` sections.
+- Code blocks must have a language tag (` ```python `, ` ```bash `).
+---
+## Checklist (copy into every PR / commit message)
+```
+Docs checklist:
+[ ] docs/<topic>.md created or updated
+[ ] docs/INDEX.md updated with new entry
+[ ] README.md File Structure table updated
+[ ] README.md Quickstart / relevant section updated (if CLI changed)
+```
+---
+## Docs Folder Structure
+```
+docs/
+├── INDEX.md              ← Master index of all docs (ALWAYS update this)
+├── CONTRIBUTING.md       ← This file — the rule
+├── lifestack_env.md      ← core/lifestack_env.py reference
+├── reward.md             ← core/reward.py reference
+├── task.md               ← core/task.py schema reference
+├── memory.md             ← agent/memory.py reference
+├── conflict_generator.md ← agent/conflict_generator.py reference
+├── app.md                ← app.py Gradio interface reference
+├── eval.md               ← scripts/eval.py reference
+├── train_trl.md          ← scripts/train_trl.md reference
+├── scripts.md            ← All other scripts reference
+└── configuration.md      ← Env vars, secrets, openenv.yaml
+```
+---
+## Commit Message Format
+```
+<type>: <short description>
+- <file changed>: <what changed>
+- docs/<doc>.md: <created|updated>
+- docs/INDEX.md: <added entry for X>
+- README.md: <updated section Y>
+Docs checklist: ✅ all three updated
+```
+Types: `feat` | `fix` | `refactor` | `docs` | `test` | `chore`

docs/DEPLOYMENT.md ADDED Viewed

	@@ -0,0 +1,427 @@

+# Meta-R2: Complete HuggingFace Deployment Guide (Option A)
+> This guide walks you through every single step to deploy Meta-R2 to HuggingFace using the cleanest architecture:
+> - **Your trained model (500MB)** → uploaded as a **HuggingFace Model Repository**
+> - **Your code + environment** → deployed as a **HuggingFace Space** (Docker)
+>
+> The Space will auto-download the model from the Model Repo at startup. No Git LFS. No 500MB in your code repo.
+---
+## 🗺️ Architecture Overview
+```
+HuggingFace
+├── YOUR-USERNAME/lifestack-agent      ← Model Repo (the 500MB weights)
+│   ├── config.json
+│   ├── tokenizer.json
+│   ├── tokenizer_config.json
+│   ├── special_tokens_map.json
+│   └── model.safetensors  (or pytorch_model.bin)
+│
+└── YOUR-USERNAME/meta-r2  [SPACE]     ← Code Repo (Docker Space)
+    ├── Dockerfile                       (already exists ✅)
+    ├── requirements.txt                 (already exists ✅)
+    ├── app_flask.py                     (entry point ✅)
+    ├── core/ agent/ scripts/ ...        (all your code ✅)
+    └── openenv.yaml                     (already exists ✅)
+          ↓ at startup
+    agent.py calls AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")
+    → HuggingFace downloads the model to the Space's /root/.cache/huggingface/
+```
+---
+## ✅ Pre-Flight Checklist (Do These Before Anything Else)
+Go through every item below before starting the upload steps.
+### 1. Confirm Your Trained Model Files Exist
+Unzip the 500MB file from Kaggle. Open the folder. You **must** see these files:
+```
+lifestack_model/
+├── config.json              ← REQUIRED
+├── tokenizer.json           ← REQUIRED
+├── tokenizer_config.json    ← REQUIRED
+├── special_tokens_map.json  ← REQUIRED (may be missing — check below)
+└── model.safetensors        ← REQUIRED (the big file)
+    OR
+└── pytorch_model.bin        ← (alternative format, also fine)
+```
+> **If any of these are missing**, the model is an incomplete checkpoint. Re-download or re-run training with `save_model=True` at the end of `train_trl.py`.
+### 2. Confirm `requirements.txt` Is Correct
+Your `requirements.txt` already has:
+- `openenv-core>=0.2.3` ✅ (latest version, confirmed)
+- `pydantic>=2.7.0` ✅
+- `transformers>=4.40.0` ✅ (needed to download model from Hub)
+- `torch>=2.0.0` ✅
+**No changes needed** to `requirements.txt`.
+### 3. Confirm the `Dockerfile` Entry Point
+Your `Dockerfile` already runs:
+```dockerfile
+CMD ["python", "app_flask.py"]
+```
+This is correct. `app_flask.py` is the web server.
+**No changes needed** to the `Dockerfile`.
+### 4. Make Sure `.env` is in `.gitignore`
+Check your `.gitignore` — it already has:
+```
+.env
+```
+✅ Your `GROQ_API_KEY` will **never** be pushed to GitHub or HuggingFace by accident.
+### 5. Make the One Required Code Change in `agent.py`
+This is the only code edit required for Option A.
+Open `/Users/dayalgupta/Desktop/Meta-R2/agent/agent.py` and find **lines 13–18**:
+```python
+# CURRENT CODE (lines 13-18):
+self.api_key = os.getenv('GROQ_API_KEY')
+self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
+# Fallback to current directory if default existence
+if not self.local_model_path and os.path.exists("./lifestack_model"):
+    self.local_model_path = "./lifestack_model"
+```
+**Change it to this** (replace `YOUR-USERNAME` with your actual HuggingFace username):
+```python
+# UPDATED CODE:
+self.api_key = os.getenv('GROQ_API_KEY')
+self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
+# 1. Check for local folder (Kaggle / local dev)
+if not self.local_model_path and os.path.exists("./lifestack_model"):
+    self.local_model_path = "./lifestack_model"
+# 2. Fall back to HuggingFace Hub model repo (production / Space deployment)
+if not self.local_model_path:
+    self.local_model_path = "YOUR-USERNAME/lifestack-agent"
+```
+**Why this works:** `AutoModelForCausalLM.from_pretrained()` (which already exists on line 41) accepts either a local folder path OR a HuggingFace Hub repo ID like `"username/repo-name"`. No other code change is needed.
+### 6. Verify `lifestack_model/` Is NOT in Your Code Repo
+Your model (500MB) should NOT be in the `Meta-R2` GitHub repository. Confirm:
+```bash
+ls /Users/dayalgupta/Desktop/Meta-R2/lifestack_model/
+# Should print: "No such file or directory" OR "Empty directory"
+```
+If it has files, remove them:
+```bash
+rm -rf /Users/dayalgupta/Desktop/Meta-R2/lifestack_model/*
+```
+The folder can stay (it's referenced in the code) but must be empty.
+---
+## 📦 PART 1: Upload the Model to HuggingFace Hub
+### Step 1.1 — Create a HuggingFace Account
+Go to **https://huggingface.co** → click **Sign Up** → create your account. Remember your username (e.g., `dayal-gupta`) — you will use it everywhere.
+### Step 1.2 — Create a New Model Repository
+1. Go to **https://huggingface.co/new** (or click the `+` button → "New Model")
+2. Fill in:
+   - **Owner:** your username
+   - **Model name:** `lifestack-agent` (this becomes `YOUR-USERNAME/lifestack-agent`)
+   - **License:** `MIT` (recommended for hackathons)
+   - **Visibility:** `Public` (required for the Space to download it without auth)
+3. Click **Create Model**
+You now have an empty model repo at `https://huggingface.co/YOUR-USERNAME/lifestack-agent`.
+### Step 1.3 — Install the HuggingFace CLI
+On your Mac terminal:
+```bash
+pip install huggingface_hub
+huggingface-cli login
+```
+When prompted, go to **https://huggingface.co/settings/tokens** → click **New token** → name it anything → **Role: Write** → copy the token → paste it into the terminal.
+### Step 1.4 — Upload the Model Files
+Navigate to where your unzipped model folder is (e.g., Desktop) and run:
+```bash
+# Replace the path with wherever your unzipped model folder is:
+huggingface-cli upload YOUR-USERNAME/lifestack-agent /path/to/your/lifestack_model/ .
+```
+**Example (if you unzipped on Desktop):**
+```bash
+huggingface-cli upload dayal-gupta/lifestack-agent /Users/dayalgupta/Desktop/lifestack_model/ .
+```
+This uploads ALL files from the local folder to the root of the HF repo. The `.` at the end means "upload to the root of the repo."
+**This will take 3–8 minutes** for a 500MB file on a normal connection. You'll see a progress bar.
+### Step 1.5 — Verify the Upload
+Go to `https://huggingface.co/YOUR-USERNAME/lifestack-agent` in your browser.
+You should see all files listed: `config.json`, `tokenizer.json`, `model.safetensors`, etc.
+Click on `config.json` and confirm it contains `"model_type"` — this confirms the model is valid and complete.
+### Step 1.6 — Add a Model Card (Optional but Impressive for Judges)
+Click the **"Model Card"** tab on your repo page → click the pencil icon to edit → paste this:
+```markdown
+---
+language: en
+license: mit
+tags:
+  - reinforcement-learning
+  - life-simulation
+  - grpo
+  - llama
+  - openenv
+---
+# LifeStack Agent — GRPO Fine-tuned
+This model is the trained agent for [Meta-R2](https://huggingface.co/spaces/YOUR-USERNAME/meta-r2),
+a reinforcement learning environment that simulates complex real-life decision-making scenarios.
+Fine-tuned using GRPO (Group Relative Policy Optimization) via TRL on a custom reward function
+spanning 23 life metrics across 6 domains: career, finances, relationships, physical health,
+mental wellbeing, and time management.
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")
+tokenizer = AutoTokenizer.from_pretrained("YOUR-USERNAME/lifestack-agent")
+```
+```
+Click **Save**.
+---
+## 🚀 PART 2: Deploy the Project as a HuggingFace Space
+### Step 2.1 — Create a New Space
+1. Go to **https://huggingface.co/new-space**
+2. Fill in:
+   - **Owner:** your username
+   - **Space name:** `meta-r2`
+   - **License:** `MIT`
+   - **SDK:** Select **"Docker"** ← very important, NOT Gradio or Streamlit
+   - **Visibility:** `Public`
+3. Click **Create Space**
+You now have an empty Space at `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2`.
+### Step 2.2 — Connect Your GitHub Repository to the Space
+This is the cleanest method — HuggingFace will auto-sync from your GitHub repo.
+1. In your Space, click the **Settings** tab (gear icon)
+2. Scroll down to **"Repository"** section
+3. Click **"Link to a GitHub repository"**
+4. Authorize HuggingFace to access your GitHub
+5. Select the repo: `oki-dokii/Meta-R2`
+6. Set branch: `main`
+7. Click **Save**
+Now every `git push` to `main` will automatically redeploy the Space.
+**Alternative (manual push):** If you don't want to link GitHub, you can push directly to the HuggingFace Space repo:
+```bash
+cd /Users/dayalgupta/Desktop/Meta-R2
+# Add HF Space as a second remote:
+git remote add space https://huggingface.co/spaces/YOUR-USERNAME/meta-r2
+# Push your code:
+git push space main
+```
+### Step 2.3 — Add the `GROQ_API_KEY` Secret to the Space
+Your app needs the Groq API key at runtime. **Never hardcode it.** HuggingFace Spaces have a Secrets system for this.
+1. In your Space, click the **Settings** tab
+2. Scroll down to **"Variables and secrets"**
+3. Click **"New secret"**
+4. Fill in:
+   - **Name:** `GROQ_API_KEY`
+   - **Value:** your actual Groq API key (get it from https://console.groq.com/keys)
+5. Click **Save**
+Your `agent.py` already reads this via `os.getenv('GROQ_API_KEY')` ✅ — no code change needed.
+### Step 2.4 — Add `HF_TOKEN` Secret (Required to Download the Private Model)
+If your model repo is **Public** (which we set in Step 1.2), you can **skip this step**.
+If your model repo is **Private**, add another secret:
+- **Name:** `HF_TOKEN`
+- **Value:** your HuggingFace write token (same one from Step 1.3)
+Then add this line at the top of `app_flask.py` (before any model-loading code):
+```python
+import os
+from huggingface_hub import login
+hf_token = os.getenv("HF_TOKEN")
+if hf_token:
+    login(token=hf_token)
+```
+### Step 2.5 — Trigger the First Build
+After pushing your code (Step 2.2), the Space will automatically start building.
+1. Go to your Space URL: `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2`
+2. Click the **"App"** tab — you'll see a build log
+3. The build will take **3–5 minutes** for the first time (Docker pulls base image, installs packages)
+4. After build, it will show **"Running"** status — then the app will boot
+**During the first boot**, the Space will call `AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")` which will download the 500MB model. This takes about 60–90 seconds on HuggingFace infrastructure. **After the first boot, it is cached** and subsequent restarts are instant.
+---
+## 🔍 PART 3: Verify Everything is Working
+### Step 3.1 — Check the Build Log
+In your Space, click **"Logs"** tab. You should see:
+```
+✅ Step 1/7 : FROM python:3.11-slim
+✅ Successfully built ...
+✅ Successfully tagged ...
+```
+If you see a red error, check the troubleshooting section below.
+### Step 3.2 — Check the App Boot Log
+After the build, click the **"App"** tab. In the log output you should see:
+```
+📦 Loading local GRPO model from YOUR-USERNAME/lifestack-agent...
+✅ Local model LOADED.
+ * Running on http://0.0.0.0:7860
+```
+If you see `⚠️ Failed to load local model ... Falling back to Groq.` — the model download failed. Check that your HF model repo URL is correct in `agent.py` and the repo is public.
+### Step 3.3 — Test the Live App
+Go to `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` and click through the demo:
+1. The web UI (served by `app_flask.py`) should load
+2. Start an episode — the agent should respond with life decisions
+3. Check that rewards are non-zero and steps > 5 (confirms the Task system is working)
+---
+## 🛠️ Troubleshooting Common Issues
+| Error | Cause | Fix |
+|---|---|---|
+| `ModuleNotFoundError: openenv` | Wrong package in requirements.txt | Confirm `openenv-core>=0.2.3` is in `requirements.txt` (not `openenv`) |
+| `OSError: Can't load model` | Wrong repo ID in `agent.py` | Make sure it's `"YOUR-ACTUAL-USERNAME/lifestack-agent"` not literally `YOUR-USERNAME` |
+| `Build failed: torch install timeout` | `torch>=2.0.0` is huge (2GB+) | Add `--extra-index-url https://download.pytorch.org/whl/cpu` to Dockerfile before pip install |
+| `Port 7860 not responding` | `app_flask.py` binding to wrong interface | Confirm `app.run(host='0.0.0.0', port=7860)` at the bottom of `app_flask.py` |
+| `GROQ_API_KEY not found` | Secret not set | Go to Space Settings → Variables and secrets → add `GROQ_API_KEY` |
+| `Space keeps restarting` | Out of memory (free tier is 16GB RAM) | torch on CPU for 500MB model may OOM — see "Reducing Memory" note below |
+### Reducing Memory Usage (If Space OOMs)
+Free HuggingFace Spaces have 16GB RAM. Loading a 500MB model in float32 uses ~2GB RAM, which is fine. But if you face OOM, add this to `agent.py` line 41–44:
+```python
+self.local_model = AutoModelForCausalLM.from_pretrained(
+    self.local_model_path,
+    torch_dtype=torch.float16,    # ← half precision, halves memory
+    low_cpu_mem_usage=True,       # ← stream-loads, avoids peak RAM spike
+    device_map="cpu"              # ← explicitly CPU on free tier
+)
+```
+---
+## 📋 Final Pre-Submission Checklist
+Before submitting to the hackathon, verify every item:
+- [ ] `https://huggingface.co/YOUR-USERNAME/lifestack-agent` exists and has all model files
+- [ ] `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` shows **"Running"** status (green dot)
+- [ ] The Space app loads in browser without errors
+- [ ] The Space log shows `✅ Local model LOADED` (not "Falling back to Groq")
+- [ ] An episode runs and produces steps > 5 (confirms Task system is working)
+- [ ] `GROQ_API_KEY` secret is set in Space settings (as fallback)
+- [ ] The model repo has a Model Card explaining what it is
+- [ ] Your `README.md` in the code repo links to both: the Space URL and the Model URL
+- [ ] `agent.py` has been updated with `"YOUR-USERNAME/lifestack-agent"` as the HF Hub fallback
+- [ ] `lifestack_model/` folder in your local `Meta-R2/` repo is empty (model not in code repo)
+- [ ] All Bugs 1, 2, 3 are fixed and committed (they are — we did this already ✅)
+---
+## 📎 Quick Reference — All URLs
+Replace `YOUR-USERNAME` with your HuggingFace username everywhere:
+| What | URL |
+|---|---|
+| HuggingFace profile | `https://huggingface.co/YOUR-USERNAME` |
+| Model repo | `https://huggingface.co/YOUR-USERNAME/lifestack-agent` |
+| Space (live demo) | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` |
+| Space settings (secrets) | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2/settings` |
+| Space build logs | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` → Logs tab |
+| HuggingFace API tokens | `https://huggingface.co/settings/tokens` |
+| Groq API keys | `https://console.groq.com/keys` |
+---
+## ⚡ The Exact Commands to Run Right Now (In Order)
+```bash
+# 1. Install HF CLI
+pip install huggingface_hub
+# 2. Login (will prompt for token)
+huggingface-cli login
+# 3. Upload model (change the path to your unzipped model folder)
+huggingface-cli upload YOUR-USERNAME/lifestack-agent /path/to/lifestack_model/ .
+# 4. Make the agent.py code change (edit manually in VS Code, then):
+cd /Users/dayalgupta/Desktop/Meta-R2
+git add agent/agent.py
+git commit -m "feat: add HuggingFace Hub model fallback for Option A deployment"
+git push origin main
+# 5. Push to HuggingFace Space (if not using GitHub auto-sync):
+git remote add space https://huggingface.co/spaces/YOUR-USERNAME/meta-r2
+git push space main
+```
+That's it. The Space will build and boot automatically.

docs/INDEX.md ADDED Viewed

	@@ -0,0 +1,40 @@

+# LifeStack — Documentation Index
+> **Rule:** Every new feature, script, or module must add a one-line entry here.
+> See [CONTRIBUTING.md](CONTRIBUTING.md) for the full documentation rule.
+---
+## Core Modules
+| Doc | Module | Description |
+|---|---|---|
+| [lifestack_env.md](lifestack_env.md) | `core/lifestack_env.py` | Main OpenEnv environment — step, reset, observation, WorldEngine, PartialObsFilter |
+| [reward.md](reward.md) | `core/reward.py` | Task-aware reward orchestrator with milestone, cascade, and efficiency components |
+| [task.md](task.md) | `core/task.py` | Task / Route / Milestone / ExoEvent dataclass schema |
+| [memory.md](memory.md) | `agent/memory.py` | ChromaDB-backed trajectory + feedback storage |
+| [conflict_generator.md](conflict_generator.md) | `agent/conflict_generator.py` | ConflictEvent templates and TaskGenerator |
+## Application
+| Doc | File | Description |
+|---|---|---|
+| [app.md](app.md) | `app.py` | Gradio multi-tab interface — tabs, callbacks, module-level singletons |
+## Scripts
+| Doc | Script | Description |
+|---|---|---|
+| [eval.md](eval.md) | `scripts/eval.py` | Standalone random-baseline evaluation runner |
+| [train_trl.md](train_trl.md) | `scripts/train_trl.py` | GRPO curriculum training via HuggingFace TRL + Unsloth |
+| [scripts.md](scripts.md) | `scripts/` (others) | run_episode, smoke_test, test_lifestack, longitudinal_demo |
+## Configuration & Operations
+| Doc | File | Description |
+|---|---|---|
+| [configuration.md](configuration.md) | `.env`, `openenv.yaml` | Environment variables, secrets, server config |
+---
+*Last updated: 2026-04-23 — add a row here whenever a new doc is created.*

docs/app.md ADDED Viewed

	@@ -0,0 +1,78 @@

+# app.md — Gradio Interface Reference
+`app.py` — Gradio multi-tab interactive interface for LifeStack.
+---
+## Overview
+`app.py` is the entry point for the demo. It wires together all LifeStack modules into
+a single Gradio `Blocks` application served on `http://127.0.0.1:7860`.
+---
+## Module-level Singletons
+These are instantiated once at import time:
+| Variable | Type | Purpose |
+|---|---|---|
+| `MEMORY` | `LifeStackMemory` | ChromaDB trajectory + feedback store |
+| `AGENT` | `LifeStackAgent` | LLM-backed decision agent |
+| `INTAKE` | `LifeIntake` | NL → structured conflict parser |
+| `DEMO_CONFLICT` | `ConflictEvent` | Fixed "Friday 6PM" conflict for tab 1 |
+| `DEMO_PREDICTOR` | `TrajectoryPredictor` | 7-day risk score tracker |
+| `LONG_DEMO` | `LongitudinalDemo` | Arjun's multi-week journey |
+| `GMAIL` | `GmailSignalExtractor` | Optional Gmail stress signal extractor |
+---
+## Tabs
+| Tab | Label | Key Function |
+|---|---|---|
+| 1 | 🎯 Live Demo | `run_demo(person_label, conflict_label)` |
+| 2 | 💭 Try Your Situation | `run_custom(situation, sliders..., gmail_signals)` |
+| 3 | 📊 Training Results | `load_training_tab()` |
+| 4 | 🗓️ Arjun's Journey | `LONG_DEMO.show_longitudinal_comparison()` |
+| 5 | 🗺️ Task Explorer | `load_demo_task()` |
+| 6 | 📬 Follow-up | `submit_outcome_feedback(...)` |
+---
+## Key Functions
+### `submit_outcome_feedback(ep_id, score, domains_up, domains_down, notes, time_spent)`
+Stores real-world outcome data into ChromaDB via `MEMORY.store_feedback(feedback)`.
+> **Note:** Uses `MEMORY` (the module-level `LifeStackMemory` instance). The previously
+> undefined `AGENT_MEMORY` reference was corrected to `MEMORY` on 2026-04-23.
+### `run_demo(person_label, conflict_label)`
+Generator — yields `(pred_html, before_html, narrative, decision_html)` tuples for each
+animation frame. Runs cascade animation then agent intervention.
+### `run_custom(situation, ...)`
+Calls `INTAKE.full_intake()` to parse NL input, then `AGENT.get_action()`, steps the env,
+returns `(life_html, after_html, plan_html)`.
+---
+## Running
+```bash
+python app.py
+```
+Starts on port `7860` with `share=False`. Edit `__main__` block to change port/theme.
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | `AGENT_MEMORY` undefined crash fixed — replaced with `MEMORY` in `submit_outcome_feedback` |

docs/configuration.md ADDED Viewed

	@@ -0,0 +1,71 @@

+# configuration.md — Configuration Reference
+Environment variables, secrets, and server configuration for LifeStack.
+---
+## Environment Variables
+Copy `.env.example` to `.env` and fill in values:
+```bash
+cp .env.example .env
+```
+| Variable | Required | Description |
+|---|---|---|
+| `OPENAI_API_KEY` | For agent/training | API key for the LLM agent and GRPO reward function |
+| `GROQ_API_KEY` | Optional | Alternative fast-inference backend |
+| `GMAIL_CREDENTIALS_PATH` | Optional | Path to Gmail OAuth2 credentials JSON |
+> **Never commit `.env`** — it is listed in `.gitignore`.
+---
+## `openenv.yaml`
+Defines the OpenEnv service manifest for MCP / REST integration.
+```yaml
+name: lifestack
+version: "1.1.0"
+entry: server.py
+port: 8000
+```
+Edit this file if you rename the server entry point or change the port.
+---
+## Gradio App
+Configured in `app.py` `__main__` block:
+```python
+app.launch(
+    share=False,
+    server_port=7860,
+    show_error=True,
+)
+```
+Change `server_port` or set `share=True` for a public Gradio link.
+---
+## Docker
+```bash
+docker build -t lifestack:latest .
+docker run -p 7860:7860 --env-file .env lifestack:latest
+```
+The `Dockerfile` installs `requirements.txt` and runs `python app.py`.
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | Initial doc created |

docs/conflict_generator.md ADDED Viewed

	@@ -0,0 +1,75 @@

+# conflict_generator.md — Conflict Generator Reference
+`agent/conflict_generator.py` — ConflictEvent templates and TaskGenerator.
+---
+## Overview
+Two parallel systems for generating crises:
+| System | Purpose |
+|---|---|
+| `ConflictEvent` + `TEMPLATES` | 15 handcrafted conflicts at difficulty 1–5 |
+| `TaskGenerator` | Generates long-horizon `Task` objects (two domains) |
+---
+## `ConflictEvent` (Legacy)
+```python
+@dataclass
+class ConflictEvent:
+    id: str
+    title: str
+    story: str
+    primary_disruption: dict   # Metric deltas applied on env reset
+    decisions_required: list[str]
+    resource_budget: dict      # {"time", "money", "energy"}
+    difficulty: int            # 1–5
+```
+### Helper functions
+```python
+conflict = generate_conflict()              # random from all 15
+conflict = generate_conflict(difficulty=3)  # difficulty-3 pool
+escalated = escalate_conflict(conflict)     # 1.4× disruption, 0.7× budget
+new, reason = adaptive_escalate(conflict, agent_history)  # auto-tune
+```
+---
+## `TaskGenerator`
+```python
+generator = TaskGenerator()
+task = generator.generate()
+task = generator.generate(domain="flight_crisis", difficulty=4)
+task = generator.generate(domain="code_merge_crisis")
+```
+### Supported Domains
+| Domain | Goal |
+|---|---|
+| `flight_crisis` | Survive Airport Cancellation |
+| `code_merge_crisis` | Resolve Production Outage |
+Unknown domains fall back to `flight_crisis`.
+---
+## Adding a New Domain
+1. Add `generate_<domain>(self, difficulty) -> Task` to `TaskGenerator`.
+2. Add to the `if/elif` in `generate()`.
+3. Update this file and `docs/INDEX.md` and `README.md`.
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | Initial doc created |

docs/eval.md ADDED Viewed

	@@ -0,0 +1,86 @@

+# eval.py — Evaluation Runner Reference
+`scripts/eval.py` — Standalone LifeStack evaluation runner using a random-action baseline.
+No model, no GPU, no API key required.
+---
+## Overview
+Runs N independent episodes against `LifeStackEnv` using uniformly random actions as a
+baseline policy. Prints a live per-episode table and aggregate statistics at the end.
+Useful for:
+- Verifying environment correctness after changes
+- Establishing a random-baseline reward floor before training
+- CI smoke checks (no external dependencies)
+---
+## Usage
+```bash
+# Default: 10 episodes, any domain
+python scripts/eval.py
+# 20 episodes, flight_crisis domain only
+python scripts/eval.py --episodes 20 --domain flight_crisis
+# Verbose per-step output
+python scripts/eval.py --episodes 5 --verbose
+```
+---
+## CLI Arguments
+| Argument | Type | Default | Description |
+|---|---|---|---|
+| `--episodes` | `int` | `10` | Number of episodes to run |
+| `--domain` | `str` | `None` | Optional domain filter passed to `TaskGenerator.generate()` |
+| `--verbose` | flag | `False` | Print per-step action, reward, and done status |
+Supported `--domain` values: `flight_crisis`, `code_merge_crisis` (or omit for random).
+---
+## Output
+### Per-episode table
+```
+   EP   TOTAL REWARD   STEPS  DOMAIN                SUCCESS
+  ────  ────────────  ──────  ────────────────────  ───────
+     1        0.3120       8  flight_crisis               ✗
+     2        1.8450      12  code_merge_crisis            ✓
+```
+### Aggregate stats
+```
+  ──────────────────────────────────────────────────────────
+  Episodes     : 10
+  Mean Reward  : 0.8231
+  Success Rate : 30.0%
+  Mean Steps   : 10.4
+```
+---
+## Action Space (Random Baseline)
+Each step samples uniformly from:
+`execute`, `inspect`, `plan`, `wait`, `communicate`, `spend`, `delegate`
+- `execute` actions target a real route ID from the active task.
+- `inspect` actions target a real hidden-state key from the active task.
+- Other actions apply a small random metric nudge and resource cost.
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | File created — implements random baseline evaluation runner |

docs/lifestack_env.md ADDED Viewed

	@@ -0,0 +1,131 @@

+# lifestack_env.py — Environment Reference
+`core/lifestack_env.py` — The main OpenEnv-compatible RL environment for LifeStack.
+---
+## Overview
+`LifeStackEnv` wraps the full simulation: metric cascades, world events, partial
+observability, route execution, milestone tracking, and reward calculation.
+Key classes in this file:
+| Class | Role |
+|---|---|
+| `LifeStackAction` | Pydantic action schema (metric_changes, resource_cost, action_type, …) |
+| `LifeStackObservation` | Pydantic observation schema (metrics, resources, step, done, reward, metadata) |
+| `LifeStackState` | Internal state (current_metrics, budget, task, world_state, hidden_state, …) |
+| `PartialObsFilter` | Converts full world state into the agent's partial observation |
+| `WorldEngine` | Fires deterministic/probabilistic ExoEvents each step |
+| `LifeStackEnv` | The environment itself — inherits from OpenEnv `Environment` |
+---
+## API
+### `LifeStackEnv.__init__(seed, task, max_steps=30)`
+```python
+env = LifeStackEnv()
+env = LifeStackEnv(seed=42, max_steps=50)
+```
+### `LifeStackEnv.reset(...) -> LifeStackObservation`
+```python
+obs = env.reset(task=my_task, episode_id="ep_001")
+```
+Parameters:
+- `task` — a `Task` object (from `core/task.py`). Defaults to `FlightCrisisTask()`.
+- `seed` — optional int for reproducibility.
+- `conflict` — legacy `ConflictEvent` for metric disruption on reset.
+- `budget` — dict with `time`, `money`, `energy` overrides.
+- `person` — optional `SimPerson` for personality-driven drift.
+### `LifeStackEnv.step(action) -> LifeStackObservation`
+```python
+obs = env.step(LifeStackAction(action_type="execute", target="rebook_premium"))
+```
+Supported `action_type` values:
+| Type | Effect |
+|---|---|
+| `inspect` | Reveals a hidden-state key into the observation |
+| `execute` | Attempts to activate a Route by `target` (route id) |
+| `wait` | Passes the step; triggers stress penalty after 4 consecutive waits |
+| `rollback` | Reverts metrics/budget to the previous step (one-time per episode) |
+| `plan` / `communicate` / `spend` / `delegate` | Apply `metric_changes` and `resource_cost` |
+### `LifeStackEnv.render()`
+Prints a colour-coded terminal summary of the current state and task progress.
+---
+## PartialObsFilter
+```python
+PartialObsFilter.filter(task, revealed_keys) -> dict
+```
+- Base: `task.visible_world` (always visible).
+- Keys in `revealed_keys` that exist in `task.mutable_world` → added as-is.
+- Keys in `revealed_keys` that exist in `task.hidden_state` → wrapped as
+  `{"value": <val>, "source": "inspect"}` to signal the agent they came from inspect.
+---
+## Observation `metadata` fields
+```python
+obs.metadata = {
+    "world_state":     dict,   # partial view after filter
+    "goal":            str,
+    "active_route":    str | None,
+    "milestones":      list[str],
+    "events":          list[str],
+    "success":         bool,
+    "failure":         bool,
+    "failure_reason":  str,
+    "routes_remaining": int,
+    "breakdown":       dict,   # reward component breakdown
+    "info":            list[str],  # step-level diagnostic messages
+}
+```
+Key `info` message prefixes:
+| Prefix | Meaning |
+|---|---|
+| `INSPECT_REVEALED:` | Key added to inspected list |
+| `INSPECT_REVEALED_HIDDEN:` | Key was in `hidden_state` — value included |
+| `INSPECT_REDUNDANT:` | Key already revealed, no-op |
+| `ROUTE_SUCCESS:` | Route executed and consequences applied |
+| `ROUTE_BLOCKED:` | Route was closed by a prior ExoEvent |
+| `PRECONDITIONS_FAILED:` | Route preconditions not met |
+| `MILESTONE_UNLOCKED:` | A milestone condition was met |
+| `EVENT_FIRED:` | An ExoEvent triggered this step |
+| `WAIT_CAP_EXCEEDED:` | 4+ consecutive waits — stress penalty applied |
+---
+## End Conditions
+| Condition | `done` | `success` | `failure` |
+|---|---|---|---|
+| `step_count >= max_steps` | ✅ | depends | — |
+| All `success_conditions` met | ✅ | ✅ | — |
+| `failure_condition` met | ✅ | — | ✅ |
+| Any metric hits 0 | ✅ | — | ✅ |
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | `PartialObsFilter.filter()` now reads `mutable_world` + `hidden_state` directly from `Task`; removed `world` param; hidden keys wrapped with `source: inspect`; `INSPECT_REVEALED_HIDDEN` info message added |

docs/memory.md ADDED Viewed

	@@ -0,0 +1,90 @@

+# memory.md — LifeStackMemory Reference
+`agent/memory.py` — ChromaDB-backed trajectory and human-feedback storage.
+---
+## Overview
+`LifeStackMemory` persists two types of data:
+| Collection | What's stored |
+|---|---|
+| `collection` (trajectories) | Successful episode decisions — action type, reward, reasoning |
+| `feedback_collection` | Real-world outcome feedback submitted via the Follow-up tab |
+Only trajectories with `total_reward >= 2.0` are stored (threshold prevents noise).
+---
+## API
+### Instantiation
+```python
+from agent.memory import LifeStackMemory
+memory = LifeStackMemory(silent=True)           # default path
+memory = LifeStackMemory(silent=True, path="./my_memory")  # custom path
+```
+The module-level singleton in `app.py` is named `MEMORY`:
+```python
+MEMORY = LifeStackMemory(silent=True)
+```
+### `store_trajectory(...)`
+```python
+memory.store_trajectory(
+    conflict_title="Friday 6PM",
+    route_taken="communicate",
+    total_reward=2.5,
+    metrics_diff_str="career.workload: -15.0",
+    reasoning="Delegating resolved workload spike",
+)
+```
+Silently skips storage if `total_reward < 2.0`.
+### `store_feedback(feedback: OutcomeFeedback)`
+```python
+from core.feedback import OutcomeFeedback
+feedback = OutcomeFeedback(
+    episode_id="A1B2C3D4",
+    overall_effectiveness=8,
+    domains_improved=["career", "mental_wellbeing"],
+    domains_worsened=[],
+    unexpected_effects="Felt more confident",
+    resolution_time_hours=2.0,
+)
+memory.store_feedback(feedback)
+```
+Used by the **Follow-up** tab in `app.py`.
+### `get_stats() -> dict`
+```python
+stats = memory.get_stats()
+# {
+#   "total_memories": 42,
+#   "average_reward": 2.71,
+#   "by_action_type": {"communicate": 18, "delegate": 12, ...}
+# }
+```
+### `query(conflict_description, n_results=3) -> list[dict]`
+Retrieves the most semantically similar past decisions for a given situation description.
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | `AGENT_MEMORY` reference in `app.py` corrected to `MEMORY` (the actual singleton) |

docs/reward.md ADDED Viewed

	@@ -0,0 +1,82 @@

+# reward.md — Reward System Reference
+`core/reward.py` — Task-aware reward orchestrator.
+---
+## Overview
+Two reward functions are available:
+| Function | Used when |
+|---|---|
+| `compute_reward(...)` | Legacy / no-task episodes |
+| `compute_task_reward(...)` | All task-driven episodes (v2.0+) |
+---
+## `compute_task_reward` — Components
+```
+reward = (0.35 × milestone)          # Reaching key progress markers
+       + (0.25 × completion)         # Final goal achievement (binary 1.0 if any goal met)
+       + (0.15 × outcome)            # Isolated local metric improvement
+       + (0.10 × replan_bonus)       # Recovery after ExoEvents
+       + (0.10 × efficiency)         # Resource preservation relative to delta
+       + (0.05 × reasoning)          # Logical coherence & action alignment
+       + penalties
+```
+### Penalties
+| Penalty | Value | Level | Trigger |
+|---|---|---|---|
+| `INACTION_PENALTY` | `-0.40` | Step | `actions_taken == 0` |
+| `TASK_INACTION_PENALTY` | `-0.20` | Task | `actions_taken == 0` (additive to step penalty) |
+| `CRITICAL_FLOOR_VIOLATION` | `-0.50` | Step | Any metric drops below 20 |
+| `DEAD_END` | `-0.50` | Task | All viable routes closed without success |
+| `CASCADE_SPREAD_WIDER` | `-0.30` | Step | Changes spread wider than disruption baseline |
+| `RELATIONSHIP_COLLAPSE` | `-0.15` | Step | Relationships drop more than 20 points in one step |
+| `CUMULATIVE_RELATIONSHIP_EROSION` | `-0.15` | Episode | Cumulative relationship drop more than 20 points |
+| `PLAUSIBILITY_VIOLATION` | `-0.10 to -0.30` | Step | Implausible metric/cost ratio |
+| `TIMEOUT` | `-0.20` | Task | Max steps reached without resolution |
+---
+## Return Value
+Both functions return `(reward: float, breakdown: dict)`, but the component keys differ slightly.
+```python
+breakdown = {
+    "components": {
+        # compute_reward(...)
+        "outcome": float,
+        "containment": float,
+        "efficiency": float,
+        "preservation": float,
+        "format_compliance": float,
+        "plausibility": float,
+        "reasoning_alignment": float,
+        # compute_task_reward(...)
+        "local_metric_delta": float,
+        "milestone": float,
+        "completion": float,
+        "replan": float,
+        "reasoning": float,
+        "timeout_penalty": float,
+    },
+    "penalties_fired": list[str],
+    "base_reward": float,
+    "penalties_total": float,
+}
+```
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | Initial doc created |

docs/scripts.md ADDED Viewed

	@@ -0,0 +1,85 @@

+# scripts.md — Other Scripts Reference
+Reference for scripts not covered by dedicated doc files.
+---
+## `scripts/run_episode.py`
+Runs a single full episode with the LLM agent (requires API key).
+```bash
+python scripts/run_episode.py
+python scripts/run_episode.py --difficulty 3 --verbose
+```
+Returns a result dict with `total_reward`, `steps`, `domain`.
+---
+## `scripts/train.py`
+Legacy training loop (pre-TRL). Uses a simple policy gradient loop without curriculum.
+Prefer `train_trl.py` for new training runs.
+---
+## `scripts/smoke_test.py`
+Quick sanity check — imports all core modules, resets the env once, takes one step.
+No agent required. Exits with code 0 on success.
+```bash
+python scripts/smoke_test.py
+```
+---
+## `scripts/test_lifestack.py`
+Full edge-case test suite (11 tests). Does not use pytest runner by default —
+run directly or via `pytest scripts/test_lifestack.py`.
+```bash
+python scripts/test_lifestack.py
+pytest scripts/test_lifestack.py -v
+```
+Tests requiring `OPENAI_API_KEY` are automatically skipped when the key is absent.
+### Tests
+| # | Name | What it checks |
+|---|---|---|
+| 1 | Cascade floor | Metrics never go below 0 |
+| 2 | Cascade ceiling | Metrics never exceed 100 |
+| 3 | Resource exhaustion | `deduct()` returns False without going negative |
+| 4 | Inaction penalty | `INACTION_PENALTY` fires when `actions_taken=0` |
+| 5 | Critical floor penalty | `CRITICAL_FLOOR_VIOLATION` fires below threshold |
+| 6 | Cascade dampening | Second-order deltas < first-order delta |
+| 7 | SimPerson uptake bounds | All uptake values in [0.1, 1.0] |
+| 8 | Memory threshold | Only reward >= 2.0 stored |
+| 9 | Episode termination | `done=True` after horizon steps |
+| 10 | Task-driven smoke | Inspect + Route execute without crash |
+| 11 | Full episode smoke | `run_episode()` returns float reward *(skipped without API key)* |
+---
+## `scripts/longitudinal_demo.py`
+Seeds Arjun's multi-week journey into ChromaDB and renders a comparison view.
+Used by Tab 4 (Arjun's Journey) in `app.py`.
+---
+## `scripts/validate_simperson.py`
+Validates all `SimPerson` personality trait combinations produce valid uptake values.
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | `test_lifestack.py` — `steps<=5` assertion fixed to `steps<=30`; `import pytest` added; `@pytest.mark.skipif` added to test 11 |

docs/task.md ADDED Viewed

	@@ -0,0 +1,132 @@

+# task.py — Task Schema Reference
+`core/task.py` — Dataclass definitions for the LifeStack long-horizon episode schema.
+---
+## Overview
+A `Task` is the complete specification of a single episode. It defines what the agent
+must achieve, how the world can change around it, and what routes are available.
+---
+## Dataclasses
+### `Task`
+```python
+@dataclass
+class Task:
+    id: str                          # Unique task identifier
+    domain: str                      # e.g. "flight_crisis", "code_merge_crisis"
+    goal: str                        # Human-readable goal description
+    constraints: dict                # e.g. {"budget_max": 800, "deadline_step": 10}
+    hidden_state: dict               # Keys not visible without inspect
+    mutable_world: dict              # Keys that can change during the episode
+    visible_world: dict              # Keys always visible in the observation
+    success_conditions: list[dict]   # [{key, value}] — all must be met
+    failure_conditions: list[dict]   # [{key, value}] — any triggers failure
+    event_schedule: list[ExoEvent]   # Deterministic/probabilistic events
+    viable_routes: list[Route]       # Available action paths
+    milestones: list[Milestone]      # Progress checkpoints
+    horizon: int                     # Max steps per episode
+    difficulty: int                  # 1–5 scale
+    domain_metadata: dict            # Free-form extra info (e.g. {"story": "..."})
+```
+### `Route`
+```python
+@dataclass
+class Route:
+    id: str
+    name: str
+    description: str
+    required_action_types: list[str]  # e.g. ["communicate", "spend"]
+    preconditions: dict               # World/hidden state conditions that must be true
+    consequences: dict                # World state mutations on success
+    closes_routes: list[str]          # Route IDs that become unavailable after this
+    milestones_unlocked: list[str]    # Milestone IDs unlocked on route success
+    final_reward: float               # Bonus reward on route completion
+```
+### `Milestone`
+```python
+@dataclass
+class Milestone:
+    id: str
+    description: str
+    condition_key: str    # World/hidden state key to check
+    condition_value: Any  # Value it must equal for milestone to be met
+    reward: float         # Reward added when milestone is first reached
+```
+### `ExoEvent`
+```python
+@dataclass
+class ExoEvent:
+    step: int             # Step at which to fire (-1 = probabilistic each step)
+    probability: float    # Firing probability if step == -1
+    id: str
+    description: str
+    world_mutation: dict         # Applied to mutable_world on fire
+    hidden_state_mutation: dict  # Applied to hidden_state on fire
+    closes_routes: list[str]     # Routes closed when this event fires
+```
+---
+## Built-in Tasks
+| Class | Domain | Description |
+|---|---|---|
+| `FlightCrisisTask` | `flight_crisis` | Cancelled flight — rebook or work from lounge |
+---
+## Creating a Custom Task
+```python
+from core.task import Task, Route, Milestone, ExoEvent
+my_task = Task(
+    id="my_task",
+    domain="my_domain",
+    goal="Do the thing",
+    constraints={"budget_max": 500, "deadline_step": 8},
+    hidden_state={"secret_key": True},
+    mutable_world={},
+    visible_world={"public_info": "visible"},
+    success_conditions=[{"key": "done", "value": True}],
+    failure_conditions=[],
+    event_schedule=[],
+    viable_routes=[
+        Route(id="r1", name="Route One", description="...",
+              required_action_types=["execute"],
+              preconditions={}, consequences={"done": True},
+              closes_routes=[], milestones_unlocked=[], final_reward=1.0)
+    ],
+    milestones=[],
+    horizon=20,
+    difficulty=2,
+    domain_metadata={"story": "A short story about the crisis."}
+)
+```
+Then pass it to the environment:
+```python
+env = LifeStackEnv()
+obs = env.reset(task=my_task)
+```
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | Initial doc created |

docs/train_trl.md ADDED Viewed

	@@ -0,0 +1,97 @@

+# train_trl.py — GRPO Training Reference
+`scripts/train_trl.py` — Curriculum GRPO training via HuggingFace TRL + Unsloth.
+---
+## Overview
+Trains a small LLM (default: `Qwen2.5-1.5B-Instruct`) to resolve LifeStack life conflicts
+using **Group Relative Policy Optimization (GRPO)**. Implements a success-based curriculum
+that automatically increases difficulty when the agent's average reward exceeds 0.6.
+Requires: `unsloth`, `trl`, `datasets`, `transformers`, `accelerate` (Colab / GPU).
+---
+## Usage
+```bash
+# Full curriculum training (5 stages × 100 prompts)
+python scripts/train_trl.py
+```
+No CLI args — edit constants at the top of the file to change stages/prompts/output dir.
+---
+## Architecture
+### Reward Functions (multi-signal GRPO)
+| Function | Signal |
+|---|---|
+| `reward_format_fn` | JSON format compliance |
+| `reward_plausibility_fn` | Penalises zero-cost metric changes |
+| `reward_task_success_fn` | Core env-step outcome reward |
+| `reward_milestone_fn` | Milestone progress bonus |
+| `reward_reasoning_fn` | Planning coherence score |
+| `reward_human_feedback_fn` | Alignment with past real-world outcome feedback |
+### `get_lifestack_evaluation(completion, prompt) -> dict`
+The central reward computation function. Parses the LLM's JSON completion, reconstructs
+the Task from the prompt's `<SYSTEM_METADATA>` block, steps the env, and returns:
+```python
+{
+    "reward": float,
+    "breakdown": dict,   # from obs.metadata["breakdown"]
+    "action": LifeStackAction
+}
+```
+Returns `{"reward": -0.5, "breakdown": {"error": ...}}` on any parse or env failure.
+#### Task Construction Hardening (2026-04-23)
+The `Task(...)` call inside `get_lifestack_evaluation` is wrapped in its own
+`try/except`. On exception, logs `[reward] Task construction failed: <error>` and
+returns the `-0.5` fallback immediately. A field-presence check on
+`(id, goal, constraints, mutable_world, visible_world)` follows construction.
+### Curriculum (`train_curriculum`)
+```
+Stage 1: difficulty=1 → train → eval → if avg_reward > 0.6: difficulty++
+Stage 2: difficulty=2 → ...
+...
+Stage 5: difficulty=5 → final save
+```
+### Dataset (`generate_dataset`)
+Generates `N` prompts by:
+1. Sampling a `TaskGenerator` task (flight_crisis or code_merge_crisis)
+2. Merging a legacy `ConflictEvent` disruption for variety
+3. Cascading the disruption through the `DependencyGraph`
+4. Embedding task metadata in a `<SYSTEM_METADATA>` block for reward reconstruction
+---
+## Outputs
+| Path | Contents |
+|---|---|
+| `./lifestack_model/` | Final saved model + tokenizer |
+| `./lifestack_model/stage_N/` | Per-stage checkpoints |
+| `training_logs/generations.jsonl` | Sampled generations (every 20 reward calls) |
+| `grpo_reward_curve.png` | 50-episode eval reward curve |
+---
+## Change Log
+| Date | Change |
+|---|---|
+| 2026-04-23 | `Task()` construction wrapped in try/except + field validation; returns -0.5 fallback on failure |