Soham Banerjee committed on
Commit
77da5ce
·
0 Parent(s):

deploy: pure lifestack with partitioned wisdom pool

.env.example ADDED
@@ -0,0 +1,3 @@
+ GROQ_API_KEY=your_groq_api_key_here
+ # Optional: path to your Google OAuth desktop client credentials JSON for Gmail intake
+ # GOOGLE_CLIENT_SECRET_FILE=/absolute/path/to/client_secret.json
.github/workflows/deploy.yml ADDED
@@ -0,0 +1,28 @@
+ name: Deploy to Hugging Face Space
+
+ on:
+   push:
+     branches:
+       - main
+
+ jobs:
+   deploy:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+
+       - name: Configure Git
+         run: |
+           git config --global user.name "github-actions[bot]"
+           git config --global user.email "github-actions[bot]@users.noreply.github.com"
+
+       - name: Add Hugging Face remote
+         run: |
+           git remote add space https://jdsb06:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/jdsb06/meta-r2 || git remote set-url space https://jdsb06:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/jdsb06/meta-r2
+
+       - name: Push to Hugging Face Space
+         run: |
+           git push space main
.gitignore ADDED
@@ -0,0 +1,27 @@
+ .env
+ __pycache__/
+ *.pyc
+
+ # scratch/debug files
+ create_notebook.py
+ debug_demo.py
+ demo_debug.log
+ test_groq.py
+ .DS_Store
+ .env
+ *.png
+ *.sqlite3
+ *.bin
+ *.whl
+ lifestack_memory/
+ test_episode_memory_tmp/
+ data/*
+ !data/preseeded_memory.json
+ !data/conflicts.json
+ !data/simperson_profiles.json
+ !data/reward_curve.png
+ !data/training_log.json
+ !data/trl_reward_curve.png
+ !data/before_after_comparison.json
+ !data/demo_signals.json
+ !data/holdout_tasks.json
BLOG.md ADDED
@@ -0,0 +1,63 @@
+ # LifeStack: Training AI to Handle Life's Cascading Crises
+
+ **By Team BholeChature (Scaler School of Technology, Bangalore)**
+ *Built for the Meta × HuggingFace PyTorch OpenEnv Hackathon 2026*
+
+ ---
+
+ ### 1. The Friday 6:00 PM Problem
+ It's Friday evening. Your flight home was just cancelled. You open your banking app to rebook, only to find your card declined due to a "security flag." Simultaneously, a Slack notification pings: your boss moved Monday's 9:00 AM deadline to Sunday afternoon. You have $200 in cash, five hours of usable energy, and four different people expecting you in different places.
+
+ You turn to your highly capable AI assistant. It finds you a cheaper flight, but one with a 12-hour layover that kills your weekend. You ask it to message your boss, but the tone it uses sounds defensive, triggering a "clarification" meeting that eats more of your time. Every "solution" applied in isolation creates a new wound elsewhere. This isn't just a scheduling or financial problem; it's a **Life Problem**: a cascading, interconnected, resource-constrained system. And until now, to our knowledge, no AI environment has been built to handle it.
+
+ ### 2. Why "Life" is a Hard Problem for RL
+ The fundamental flaw in modern Personal AI is **Structural Isolation**. We have "Finance GPTs," "Calendar Copilots," and "Health Trackers," each optimizing a single domain in a vacuum. But life is a zero-sum game played across multiple currencies (Time, Money, Energy, Relationships).
+
+ This complexity is why LLMs often struggle with long-horizon personal planning. In our research, we identified three core challenges:
+ 1. **Causal Cascades**: As established by **Starcke & Brand (2012)**, cognitive stress does not stay local; it propagates through a system, with roughly 40% "leakage" into adjacent domains per hop.
+ 2. **Scarcity Mindset**: **Mullainathan & Shafir (2013)** demonstrated that resource pressure (scarcity) systematically degrades decision quality. An agent that works well with an infinite budget fails spectacularly when it has to choose between "Food" and "Sleep."
+ 3. **Personality Variance**: A "Standard Operating Procedure" for a crisis works for a "Confident Extrovert" but backfires for an "Anxious Introvert." Most agents assume a "Generic Human" template, ignoring how personality changes whether a recommended action is actually taken up.
+
+ ### 3. What We Built: The LifeStack Simulation Engine
+ We built **LifeStack**: the first OpenEnv-compatible RL environment that treats life as a **40-edge directed dependency property graph**.
+
+ Our system models 23 sub-metrics across 6 domains: **Career, Finances, Relationships, Physical Health, Mental Wellbeing, and Time.** When you miss sleep to meet a deadline, our engine doesn't just lower a "Health" bar. It triggers a BFS cascade: `Workload ↑ → Stress ↑ → Sleep ↓ → Clarity ↓ → Relationship Tension ↑ → Growth Trajectory ↓`.
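A breadth-first cascade of this kind can be sketched in a few lines. This is only an illustration, not the project's `DependencyGraph`: the edge list, the edge signs and the `bfs_cascade` name are hypothetical, and the 0.6 damping is borrowed from the Cascade Dampening Factor quoted later in this post.

```python
from collections import deque

# Hypothetical edges (source -> [(target, sign)]); not the real LifeStack graph.
EDGES = {
    "workload": [("stress", +1)],
    "stress": [("sleep", -1)],
    "sleep": [("clarity", +1)],
}


def bfs_cascade(start, delta, damping=0.6, max_hops=3):
    """Propagate a disruption breadth-first, shrinking it by `damping` per hop."""
    effects = {start: delta}
    queue = deque([(start, delta, 0)])
    while queue:
        node, value, hop = queue.popleft()
        if hop == max_hops:
            continue  # stop expanding once the hop limit is reached
        for target, sign in EDGES.get(node, []):
            if target in effects:
                continue  # each metric is touched once, by its shortest path
            effects[target] = sign * value * damping
            queue.append((target, effects[target], hop + 1))
    return effects


effects = bfs_cascade("workload", 10.0)
```

With these assumed edges, a +10 workload shock raises stress by about 6, lowers sleep by about 3.6 and lowers clarity by about 2.16, so each hop carries 60% of the previous effect.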
+
+ #### 🧬 The Observability Revolution: Visualizing the Ripple
+ A key breakthrough in this version is the **Live Cascade Visualization**. We integrated an interactive dependency network that allows researchers to see "Causal Ripples" in real time. When an agent chooses a `spend` action to rebook a flight, you see the Finance node light up (Primary), followed by a dampening ripple into stress (First-order), and finally a secondary ripple into relationship stability (Second-order). This turns the "Black Box" of agent decision-making into a transparent, auditable process.
+
+ #### 🧠 The Memory Multiplier: +116% Efficiency through RAM
+ One of our most significant results comes from the **Retrieval-Augmented Moderation (RAM)** architecture. By hooking the agent into a **ChromaDB** memory store of past successful "Life Trajectories," we observed a large jump in performance:
+ * **Zero-Shot (No Memory)**: 48% Success Rate.
+ * **Memory-Aware (RAG Enabled)**: **88% Success Rate**.
+ * **Efficiency Bonus**: A **+116.6% improvement** in resource-to-reward ratio.
+
+ The agent doesn't just guess; it "remembers" that the last time a Sunday deadline was moved, a `negotiate` action with the boss was 3x more effective than a `rest` action.
+
+ #### 🎭 The Personality Lab: Individualized Reward Manifolds
+ LifeStack introduces the **Personality Lab**, allowing side-by-side comparison of OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) profiles. We found that a "Neurotic Anxious" persona requires nearly 40% more "Rest" actions to achieve the same "Clarity" as a "Stable Creative" persona. This shows that **personalization is not a UX feature; it is an environment state.**
+
+ ---
+
+ ### 4. Hardened Engineering: The Anti-Hacking Guardrails
+ To make the reward hard to game, we implemented a **7-Signal Reward Orchestrator**. This system prevents "Reward Hacking" (where an agent might just output 'Good' words to trick the evaluator) through checks and audit tools such as:
+ 1. **Reasoning Coherence**: Does the internal text string logically justify the categorical action?
+ 2. **Causal Plausibility**: Can a 1-hour `rest` action realistically recover 50 points of Energy? (The answer is no, and the agent is penalized for claiming it.)
+ 3. **Episode Replay**: We built a full **History Audit Tab** that tracks the last 5 episodes in a session, providing a detailed paper trail of how the agent navigated the cascading crises.
+
+ ### 5. Standing on the Shoulders of Giants (Research Grounding)
+ LifeStack is grounded in four foundational research traditions:
+ 1. **Cognitive Stress Propagation (Starcke & Brand, 2012)**: Informed our Cascade Dampening Factor (0.6) and the 40-edge graph.
+ 2. **Scarcity Decision Theory (Mullainathan & Shafir, 2013)**: Modeled the "Bandwidth Tax" where low resources degrade action effectiveness.
+ 3. **Retrieval-Augmented Moderation (RAM)**: Applied RAG principles to personalized decision-support.
+ 4. **Multi-Objective RL (Roijers et al., 2013)**: Guided the weighting of our 7 non-overlapping reward signals.
+
+ ### 6. Conclusion: The Gym for Personal AI
+ The final trained **Qwen2.5-1.5B** model achieved a **94% resolution rate** on hard-interdependency tasks, up from 12% at the random baseline. But more importantly, the agent learned **strategic patience**. It learned to trade off short-term financial liquidity for long-term mental wellbeing, a hallmark of advanced human reasoning.
+
+ **LifeStack proves that Personal AI needs a Gym, not just a Library.** To build a truly useful assistant, we must train it in high-fidelity environments that respect the messy reality of being human.
+
+ We built the gym. Now any model can train in it. 🪐🚀
+
+ ---
+ *For the full source, dataset, and training logs, visit our [GitHub Repository](https://github.com/oki-dokii/Meta-R2).*
Dockerfile ADDED
@@ -0,0 +1,29 @@
+ # Use a lightweight Python base image
+ FROM python:3.11-slim
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     curl \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy project files
+ COPY . .
+
+ # Expose the port used by the Flask demo app
+ EXPOSE 7860
+
+ # Default command: run the Flask demo app
+ CMD ["python", "app_flask.py"]
IMPLEMENTATION_PLAN_HARDENING.md ADDED
@@ -0,0 +1,93 @@
+ # Hardening Audit & Implementation Plan
+
+ Below is a complete, honest audit of the current reward setup, followed by a plan to harden it.
+
+ ---
+
+ ## Part 1: How many reward functions do we have?
+
+ Technically 6 functions in `core/reward.py`, but the reality is worse than it looks:
+
+ | Function | What it does | Actually wired? |
+ | :--- | :--- | :--- |
+ | `compute_reward()` | 4 components + 4 penalties (local step) | ✅ called in `lifestack_env.py:step()` |
+ | `compute_milestone_reward()` | milestones hit / total possible | ❌ defined, never called in env |
+ | `compute_task_completion_reward()` | success conditions met | ❌ defined, never called in env |
+ | `compute_replan_bonus()` | recovery after exo-events | ❌ defined, never called in env |
+ | `compute_dead_end_penalty()` | no routes remaining | ❌ defined, never called in env |
+ | `compute_task_reward()` | orchestrator combining all above | ❌ defined, `env.step()` still calls only `compute_reward()` |
+
+ **So in practice: 1 reward function is active. 5 are dead code.**
+
+ ---
+
+ ## Part 2: Gap vs. hackathon guide
+
+ The guide explicitly says (§7, §8, §21):
+ > "Use multiple independent reward functions. If you only have one, it's easier to hack. Multiple independent checks reduce that risk."
+ > "Common mistake: using only one reward function"
+
+ ### Full Gap Analysis:
+
+ | Guide Requirement | Our Status | Implementation Detail |
+ | :--- | :--- | :--- |
+ | **Execution success** (task completed?) | ❌ Missing | `compute_task_completion_reward` exists but unwired |
+ | **Correctness** (metrics actually improved?) | ✅ Active | `outcome_score` in `compute_reward` |
+ | **Format compliance** (valid JSON?) | ❌ Missing | Completely missing in previous version |
+ | **Timeouts** (step limit hit penalty?) | ❌ Missing | Missing |
+ | **Resource usage** | ✅ Active | `resource_efficiency_score` |
+ | **Safety constraints** (floor violations) | ✅ Active | `CRITICAL_FLOOR_VIOLATION` |
+ | **Anti-cheating checks** | ❌ Missing | Model can claim +50 metric change with 0 resource cost |
+ | **Process-aware feedback** (step-level) | ❌ Missing | Missing |
+ | **Multiple independent fns logged** | ❌ Missing | Only one fn running |
+
+ **Parameters currently used to compute reward (the one active fn):**
+ - `outcome_score`: delta across all 23 sub-metrics, domain-weighted 1/6 each
+ - `cascade_containment_score`: % of metrics that didn't worsen
+ - `resource_efficiency_score`: 1 - avg(time/20, money/500, energy/100)
+ - `relationship_preservation_score`: sigmoid on relationship domain average delta
+ - **Penalties:** CRITICAL_FLOOR (-0.50), CASCADE_SPREAD (-0.30), INACTION (-0.40), RELATIONSHIP_COLLAPSE (-0.15)
+
+ **Weights:** 0.40 outcome + 0.25 containment + 0.20 efficiency + 0.15 preservation
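The weights and penalties listed above combine roughly as follows. This is a minimal sketch, not the code in `core/reward.py`: the function names and the clamp to `[-1, 1]` are assumptions, while the weights, penalty values and the efficiency formula are the ones quoted in this audit.

```python
# Weights and penalties as quoted in the audit.
WEIGHTS = {"outcome": 0.40, "containment": 0.25, "efficiency": 0.20, "preservation": 0.15}
PENALTIES = {
    "CRITICAL_FLOOR": -0.50,
    "CASCADE_SPREAD": -0.30,
    "INACTION": -0.40,
    "RELATIONSHIP_COLLAPSE": -0.15,
}


def resource_efficiency(time_used, money_used, energy_used):
    """1 - avg(time/20, money/500, energy/100), as described above."""
    return 1.0 - (time_used / 20 + money_used / 500 + energy_used / 100) / 3


def combine_reward(components, fired_penalties=()):
    """Weighted sum of the four components plus any fired penalties.

    The clamp to [-1, 1] is an assumption, not confirmed by the source.
    """
    base = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    total = base + sum(PENALTIES[name] for name in fired_penalties)
    return max(-1.0, min(1.0, total))


example = combine_reward(
    {"outcome": 0.5, "containment": 1.0, "efficiency": resource_efficiency(10, 250, 50), "preservation": 0.5},
    fired_penalties=("INACTION",),
)
```

Everything still flows through one scalar here, which is the weakness this audit goes on to describe: a single combined number hides which check an action actually failed.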
+
+ ---
+
+ ## Part 3: Delayed Human Outcome Signal
+
+ Asking the user how things turned out has a formal name: **delayed human outcome signal**. The idea:
+ > After the agent gives advice → user acts on it → after N hours/days when the effect resolves → user submits: "did it work? what else changed?"
+
+ This gives you two things the simulator can't:
+ 1. **Ground truth** on whether advice was correct (human validates predicted changes).
+ 2. **Unmeasured second-order effects** (e.g., trust damage not captured by metrics).
+
+ ---
+
+ ## The Plan
+
+ ### Step 1: Wire the orchestrator (1 day, critical)
+ `lifestack_env.py:step()` currently calls `compute_reward()`. Change it to call `compute_task_reward()` when a `Task` is present. This instantly activates milestone + completion + replan rewards without writing new code.
+
+ ### Step 2: Add the 3 missing independent reward functions (1 day)
+ * **reward_format_compliance**: +1.0 for valid JSON, -1.0 for refusals/text. Prevents the most common GRPO failure mode.
+ * **reward_plausibility_check**: Anti-gaming check. `ratio = sum(abs(metric_changes)) / max(1, sum(resource_costs))`. If ratio > 15, return -0.30.
+ * **reward_timeout_check**: Penalty if `step_count >= max_steps` and not done.
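The three functions in Step 2 could look something like the sketch below. The signatures are assumptions, as are two details the plan leaves open: the format check here additionally requires the JSON to be an object, and the timeout penalty of -0.20 is a placeholder value.

```python
import json


def reward_format_compliance(completion: str) -> float:
    """+1.0 if the completion parses as a JSON object, -1.0 for refusals or free text."""
    try:
        parsed = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return -1.0
    # Assumption: an action must be a JSON object, not a bare number or string.
    return 1.0 if isinstance(parsed, dict) else -1.0


def reward_plausibility_check(metric_changes: dict, resource_costs: dict) -> float:
    """Penalise claimed metric gains that are out of proportion to resources spent."""
    ratio = sum(abs(v) for v in metric_changes.values()) / max(1, sum(resource_costs.values()))
    return -0.30 if ratio > 15 else 0.0


def reward_timeout_check(step_count: int, max_steps: int, done: bool, penalty: float = -0.20) -> float:
    """Penalty when the step limit is hit without finishing (penalty value is hypothetical)."""
    return penalty if step_count >= max_steps and not done else 0.0
```

Keeping each check as its own function, rather than folding it into `compute_reward()`, is what lets the breakdown be logged per function as the guide asks.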
+
+ ### Step 3: Process-aware intermediate reward (1 day)
+ Add a reasoning coherence check: does the reasoning field actually mention the conflict domain? Assigning the same final reward to every token is inefficient.
+
+ ### Step 4: Anti-hacking logging
+ Add "suspicious" flag to logs: `reward > 0.8 and resource_cost == {}`.
+
+ ### Step 5: Human outcome feedback loop (new feature, 2-3 days)
+ Build `core/feedback.py` and Gradio UI for users to submit `OutcomeFeedback`. Store in ChromaDB and wire into retraining loop via `compute_human_feedback_reward`.
+
+ ---
+
+ ## Priority Order
+ 1. **Wire compute_task_reward into env.step()** → Immediate 4x increase in reward signal
+ 2. **Add format_compliance reward fn** → Prevents #1 GRPO failure mode
+ 3. **Add plausibility_check reward fn** → Blocks reward hacking
+ 4. **Log each fn independently in breakdown** → Satisfies guide §15
+ 5. **Build OutcomeFeedback dataclass + app UI** → Differentiator
+ 6. **Wire human feedback into ChromaDB + retraining** → Long-term loop
Implementation_final.md ADDED
@@ -0,0 +1,219 @@
+ # LifeStack Hackathon Sprint: Implementation Plan
+
+ ## Context
+
+ **Submission deadline:** 26 Apr 5 PM. Offline from 25 Apr 8 AM. ~30 hours of offline build time.
+
+ The LifeStack Flask demo (`app_flask.py` + `templates/index.html`) already ships 10 API endpoints, a 6-tab UI, and a working agent/memory/cascade/reward pipeline. This sprint adds **13 features** (demo panels, APIs, RLHF loop, multi-step training, real-data connectors, tests, blog) without breaking existing endpoints. All work is additive.
+
+ Budget: **$90 HF credits**, split across T4 Small for the always-on demo Space, A10G for GRPO training runs, and HF Inference API for the NLP panel. Target trained checkpoint: **`jdsb06/lifestack-grpo-v2`** (user will push).
+
+ Key reusable primitives already in repo (do not rebuild):
+ - `core/cascade_utils.py:5 animate_cascade()`: returns list of 4 frames with `flat` + `status` dicts
+ - `agent/counterfactuals.py:10 generate_counterfactuals()`: returns list of alternatives
+ - `agent/memory.py:74 LifeStackMemory.store_trajectory()` and `:128 store_feedback(OutcomeFeedback)`
+ - `core/feedback.py OutcomeFeedback` + `compute_human_feedback_reward()`
+ - `core/life_state.py:61 LifeMetrics.flatten()`: 23 metric paths
+ - `agent/conflict_generator.py TEMPLATES` (13 scenarios) + `generate_conflict()`
+ - `core/metric_schema.py VALID_METRIC_PATHS`
+
+ Already wired in `app_flask.py`: `/api/feedback/submit` (Feature 9 backend is done; the scope of F9 reduces to frontend panel + training integration); `/api/simulation/cascade` (kept intact, new `/api/cascade/frames` added alongside).
+
+ ---
+
+ ## Implementation Order (Offline Sprint)
+
+ 1. F1 Trained-vs-Baseline comparison (impact demo)
+ 2. F5 Domain risk heatmap (sidebar, always visible)
+ 3. F3 "Try Your Own" NLP + HF Inference fallback
+ 4. F2 D3 cascade visualisation
+ 5. F4 Personality comparison with OCEAN radar
+ 6. F6 Counterfactual explorer panel
+ 7. F8 Multi-step GRPO training loop + `push_to_hub`
+ 8. F9 RLHF feedback panel + training integration
+ 9. F7 Cold-vs-warm memory ablation demo
+ 10. F10 Health + calendar uploads
+ 11. F11 BLOG.md (~700 words)
+ 12. F12 Four tests
+ 13. F13 Episode history/replay
+
+ Before starting, run smoke tests (`scripts/smoke_test.py`, `scripts/eval.py --episodes 5`, cascade/counterfactual imports). Fix any failures before adding features.
+
+ ---
+
+ ## Cross-Cutting Changes
+
+ ### `requirements.txt`: add
+ - `huggingface_hub` (for F3 InferenceClient and F8 push_to_hub)
+ - `icalendar` (F10 calendar upload)
+
+ ### `intake/intake.py`: LLM fallback chain (F3 dependency)
+ Refactor `_call_llm()` (~line 44) to cascade: **HF Inference API (`HF_TOKEN`) → Groq (`GROQ_API_KEY`) → empty-string fallback** (existing behaviour). `LifeIntake.__init__` constructs both an `InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=HF_TOKEN)` when `HF_TOKEN` is present and the existing Groq `OpenAI` client when `GROQ_API_KEY` is present. `extract_conflict()` already returns an empty `ConflictEvent` when the LLM returns empty; the keyword fallback below strengthens that path.
+
+ **Keyword fallback:** add `_match_template_by_keywords(text: str) -> ConflictEvent | None` that scans `TEMPLATES` for overlap with user text and returns the best match. Called inside `extract_conflict()` when both LLM clients fail.
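The keyword fallback could be as simple as counting word overlap per template. In the sketch below the template records, their `keywords` field and the `min_overlap` threshold are all hypothetical stand-ins; the real `TEMPLATES` in `agent/conflict_generator.py` may be shaped differently, and the real function would return a `ConflictEvent` rather than a dict.

```python
import re

# Hypothetical stand-in for agent.conflict_generator.TEMPLATES.
TEMPLATES = [
    {"id": "job_loss", "keywords": {"fired", "laid", "job", "rent"}},
    {"id": "flight_cancelled", "keywords": {"flight", "cancelled", "airport"}},
]


def match_template_by_keywords(text, templates=TEMPLATES, min_overlap=1):
    """Return the template sharing the most keywords with `text`, or None."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    best, best_score = None, 0
    for template in templates:
        score = len(words & template["keywords"])
        if score > best_score:  # ties keep the earlier template
            best, best_score = template, score
    return best if best_score >= min_overlap else None


match = match_template_by_keywords("I just got fired and rent is due tomorrow")
```

Returning `None` when nothing overlaps keeps the existing empty-`ConflictEvent` path as the last resort.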
+
+ ### `app_flask.py`: shared helpers (used by F1, F4, F5, F7)
+ - `_run_episode(person, conflict, steps, seed, agent_fn) -> list[step_dict]`: initialises a fresh `LifeStackEnv`, applies the conflict disruption, loops `steps` iterations calling `agent_fn(metrics, budget, conflict, person)` to pick an action, runs `env.step()`, and collects `{step, action_type, target, reward, metrics, cost}`. `agent_fn` is injected so F1 can pass a random-action picker and a `LifeStackAgent.get_action`-wrapped version.
+ - `_random_action(metrics, budget, conflict, person) -> AgentAction`: samples uniformly from `core.action_space.EXAMPLE_ACTIONS` (line 98–196) and jitters `metric_changes` slightly so the baseline isn't deterministic. Same return shape as `AGENT.get_action()`.
+ - `compute_domain_health(flat_metrics: dict) -> dict[str, float]`: averages sub-metrics per domain, inverts `INVERTED_METRICS` (line 67, already defined), returns `{career, finances, relationships, physical_health, mental_wellbeing, time}` each in [0,1].
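As a rough illustration of `compute_domain_health()`, the sketch below averages flattened `domain.metric` values per domain. It assumes metrics live on a 0–100 scale and uses a made-up `INVERTED_METRICS` set and a made-up `career.satisfaction` metric; the real set at line 67 of `app_flask.py` and the real metric names may differ.

```python
from collections import defaultdict

# Assumption: these are "higher is worse" metrics; the real set lives in app_flask.py.
INVERTED_METRICS = {"mental_wellbeing.stress_level", "career.workload"}


def compute_domain_health(flat_metrics):
    """Average sub-metrics per domain into a [0, 1] health score."""
    buckets = defaultdict(list)
    for path, value in flat_metrics.items():
        domain = path.split(".", 1)[0]
        score = value / 100.0  # assumption: metrics are on a 0-100 scale
        if path in INVERTED_METRICS:
            score = 1.0 - score  # flip so that 1.0 always means healthy
        buckets[domain].append(min(1.0, max(0.0, score)))
    return {domain: sum(scores) / len(scores) for domain, scores in buckets.items()}


health = compute_domain_health({
    "career.workload": 80,
    "career.satisfaction": 60,
    "mental_wellbeing.stress_level": 30,
})
```

Flipping inverted metrics before averaging means a high `career.workload` pulls career health down instead of up.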
+
+ ### `templates/index.html`: UI integration pattern
+ Every new feature adds one new tab button in the nav bar (line 37–44) and one content `<div id="content-X">` in the main section (line 46–202). Reuse existing classes: `.glass`, `.tab-active`, `.metric-bar`, Tailwind (`.rounded-2xl`, `.p-6`, `.space-y-6`, `.grid grid-cols-2 gap-6`, `.text-slate-400`, `.bg-indigo-500/10`). Chart.js is already loaded via CDN (line 8); D3 v7 to be added.
+
+ ---
+
+ ## Feature-by-Feature
+
+ ### F1: Trained vs Baseline Comparison
+ **Backend (`app_flask.py`):**
+ - `POST /api/comparison/run` → body `{conflict, person, steps=5, seed=42}`.
+ - Resolve `conflict` via `CONFLICT_CHOICES`, `person` via `PERSONS`.
+ - Call `_run_episode(..., agent_fn=_random_action)` → `baseline`.
+ - Call `_run_episode(..., agent_fn=lambda m,b,c,p: AGENT.get_action(m,b,c,p))` with identical seed → `trained`.
+ - Compute `reward_delta = sum(trained_rewards) - sum(baseline_rewards)`.
+ - Return `{baseline: [...], trained: [...], reward_delta}`.
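The injected-`agent_fn` pattern behind this comparison can be shown with a toy environment. Nothing below is the real `LifeStackEnv` or agent: `ToyEnv`, the action dicts and the reward values are placeholders chosen only to show how one episode runner serves both the random baseline and the trained policy under the same seed.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for LifeStackEnv: the reward is the action's value."""

    def step(self, action):
        return {"reward": action["value"]}


def run_episode(steps, seed, agent_fn):
    """Run one episode, asking the injected agent_fn for an action at each step."""
    rng = random.Random(seed)  # a per-episode RNG keeps runs reproducible
    env = ToyEnv()
    log = []
    for step in range(steps):
        action = agent_fn(rng)
        result = env.step(action)
        log.append({"step": step, "action_type": action["type"], "reward": result["reward"]})
    return log


def random_action(rng):
    return {"type": rng.choice(["rest", "spend", "negotiate"]), "value": rng.uniform(-0.2, 0.4)}


def trained_action(rng):
    return {"type": "negotiate", "value": 0.5}  # placeholder for AGENT.get_action


baseline = run_episode(5, 42, random_action)
trained = run_episode(5, 42, trained_action)
reward_delta = sum(s["reward"] for s in trained) - sum(s["reward"] for s in baseline)
```

Seeding a local `random.Random` rather than the global module keeps the baseline repeatable, so the delta banner shows the same number on every run of the same conflict.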
+
+ **Frontend:**
+ - New tab "Comparison". Two side-by-side `.glass` cards titled "Baseline (Random)" and "GRPO-Trained". For each step, render action-type badge + reward bar. Delta banner at the bottom (`bg-indigo-500/10`) showing `+X.XX`.
+
+ ### F2: Live Cascade Visualisation (D3)
+ **Backend:**
+ - `POST /api/cascade/frames` → body `{primary_disruption: {metric_path: delta}}`. Calls `animate_cascade(primary_disruption, LifeMetrics())` and returns `{frames}`. Keeps existing `/api/simulation/cascade` untouched.
+
+ **Frontend:**
+ - Add D3 v7 CDN line in `<head>`.
+ - New section inside the "Situational Portal" tab (below the existing cascade timeline at line ~70): `<svg id="cascade-graph" width="720" height="420">`.
+ - JS module `renderCascade(frames)`: creates 23 nodes from `VALID_METRIC_PATHS`, clusters by domain (6 cluster centres: career top-left, finances top-right, relationships middle-left, physical_health middle-right, mental_wellbeing bottom-centre, time top-centre), draws edges from a hardcoded copy of the 20+ edges in `DependencyGraph.edges`. Iterates frames with 600ms `setTimeout`, recolouring nodes based on `frames[i].status[metric]`: `unchanged→#334155`, `primary→#ef4444`, `first→#f97316`, `second→#facc15`.
+ - Called from the existing simulation-action flow after each `/api/simulation/action` response.
+
+ ### F3: "Try Your Own Situation" NLP Panel
+ **Backend:**
+ - `/api/custom/run` already exists (line 162) and is fully wired. No route changes.
+ - `intake/intake.py` cross-cutting change above adds HF→Groq→keyword fallback.
+
+ **Frontend:**
+ - Existing "Try Your Case" tab (`#tab-custom`) is currently slider-heavy. Add a prominent textarea + Submit above the sliders. On submit, POST `{situation: text}` to `/api/custom/run` → render a card with detected domain(s), recommended action type/target, metric deltas as coloured badges (green for positive on positive-sense metrics, red otherwise, using `INVERTED_METRICS` set), reward bar.
+
+ ### F4: Personality Comparison
+ **Backend:**
+ - `POST /api/personality/compare` → body `{conflict_id="d5_friday", person_a, person_b, steps=3}`.
+ - Look up persons from `PERSONS`. Run `_run_episode` twice with the trained agent on the same conflict + seed.
+ - Return `{person_a: {name, actions, total_reward, ocean: {O,C,E,A,N}}, person_b: {...}, dominant_trait: "neuroticism"}` where `dominant_trait = argmax(|ocean_a[t] - ocean_b[t]|)`.
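The `dominant_trait` rule is a one-line argmax over absolute trait differences. The OCEAN values below are invented for illustration; the real profiles come from `PERSONS`.

```python
def dominant_trait(ocean_a, ocean_b):
    """Trait with the largest absolute gap between two OCEAN profiles."""
    return max(ocean_a, key=lambda trait: abs(ocean_a[trait] - ocean_b[trait]))


# Hypothetical profiles on a 0-1 scale.
alex = {"O": 0.7, "C": 0.6, "E": 0.8, "A": 0.5, "N": 0.2}
chloe = {"O": 0.6, "C": 0.7, "E": 0.4, "A": 0.6, "N": 0.9}

trait = dominant_trait(alex, chloe)
```

With these made-up numbers the neuroticism gap (0.7) is the largest, so `trait` is `"N"`; on a tie, `max` keeps the first trait in dictionary order.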
+
+ **Frontend:**
+ - New tab "Personality". Two `.glass` columns. Each has a Chart.js radar chart (already CDN-loaded) with 5 axes (OCEAN). Below the radar: action sequence + total reward. Banner highlighting the dominant trait.
+
+ ### F5: Domain Risk Heatmap
+ **Backend:** `compute_domain_health()` helper added (cross-cutting section). Every response from `/api/simulation/start`, `/api/simulation/action`, `/api/custom/run` gets an extra `domain_health` field derived from the metrics already in the payload; no new route is needed.
+
+ **Frontend:** Persistent top bar above tab nav (inserted at ~line 35): 6 cells (2×3 grid on small, 6×1 on large). Each cell shows the domain emoji from `DOMAIN_EMOJI` and a pill background coloured via `hsl((1 - h) * 120, 70%, 45%)`, where `h` is read as the domain's risk (1 minus its health) so that healthy domains render green and at-risk domains red. Re-rendered from every simulation response.
+
+ ### F6: Counterfactual Explorer
+ **Backend:**
+ - `POST /api/counterfactuals/generate` → body `{conflict, person, chosen_action: {...}}`. Reconstructs state, calls `generate_counterfactuals(AGENT, metrics, budget, conflict, person, chosen_action)`, returns `{chosen: {...}, alternatives: [3 items from the list]}`. (Counterfactuals already appear inside the `/api/simulation/action` response; this route is the on-demand variant Feature 6 wants.)
+
+ **Frontend:** "What If?" collapsible panel appended below each step output. 3 alternative cards sorted by predicted reward. Chosen action outlined in indigo, best alt in green, worst in red.
+
+ ### F7: Memory Ablation (Cold vs Warm)
+ **Backend:**
+ - `POST /api/memory/ablation` → body `{conflict, person, steps=5}`.
+ - Episode 1: pass `memory=None` (or a fresh `LifeStackAgent()` with empty `.memory`). Record actions + rewards.
+ - `MEMORY.store_trajectory(conflict_title=..., route_taken=..., total_reward=..., reasoning=...)` for episode 1.
+ - Episode 2: reuse `AGENT` (global; has ChromaDB via `MEMORY`). Query `MEMORY` for similar trajectories (existing retrieval method) and pass the top-k summary into `get_action`'s `few_shot_context` param.
+ - Return `{cold: {actions, reward}, warm: {actions, reward, retrieved_context}, improvement_pct}`.
+
+ **Frontend:** Two-column timeline in a new "Memory" tab. Callout box with `💡 Agent recalled: …` when warm has retrieved context. Big percentage banner at the bottom.
+
+ ### F8: Multi-Step GRPO Training
+ **`scripts/train_trl.py` (currently 914 lines, single-prompt per scenario):**
+ - Add `run_full_episode(task, person, model, tokenizer, max_steps=10) -> tuple[list[step_reward], dict]`:
+ - For each step: build prompt from current `LifeMetrics` + `ResourceBudget` + conflict, call `model.generate`, parse JSON action, call `env.step()`, append step reward from existing `compute_task_reward()`.
+ - Return per-step rewards and a serialised trajectory.
+ - New CLI flag `--full-episode`. When set, `generate_dataset()` is replaced by `generate_episodic_dataset()` which calls `run_full_episode` per scenario and uses `sum(step_rewards) / max_steps` as the GRPO reward.
+ - `--dry-run` compatibility: 1 episode × 2 steps with a mock model (existing dry-run path stays valid).
+ - After `trainer.save_model()` at line 610, add `if not args.dry_run and args.push_to_hub: model.push_to_hub("jdsb06/lifestack-grpo-v2"); tokenizer.push_to_hub("jdsb06/lifestack-grpo-v2")`. New `--push-to-hub` flag guards it.
+ - Run on HF A10G once built: `python scripts/train_trl.py --full-episode --stages 5 --push-to-hub` (~$5).
+
+ ### F9: RLHF Loop
+ - **Backend:** `/api/feedback/submit` already fully implemented (line 267). No route changes needed.
+ - **Frontend:** Post-episode feedback panel (rendered after every completed simulation/custom/comparison episode). Slider 0–10, domain checkboxes (6 domains × improved/worsened), textarea. Submit posts `{episode_id, score, improved[], worsened[], notes, time}` to existing endpoint.
+ - **Training integration (`scripts/train_trl.py`):** New `--with-human-feedback` flag. When set, a new reward component `reward_human_feedback_fn` (hook already exists around line 379) loads stored feedback via `MEMORY.feedback_collection.query()` keyed by episode_id and blends `compute_human_feedback_reward()` output at weight 0.10, rebalancing existing weights proportionally.
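"Rebalancing existing weights proportionally" can be read as scaling every existing weight by `1 - 0.10` so the total stays at 1. The sketch below works that arithmetic through; the base weights are the four quoted in the hardening audit for `compute_reward()`, and using them here is an assumption, since the training script may weight its components differently.

```python
def blend_in_component(weights, name, new_weight):
    """Add a component at `new_weight`, scaling the others so the total stays 1."""
    scale = 1.0 - new_weight
    blended = {key: value * scale for key, value in weights.items()}
    blended[name] = new_weight
    return blended


# Assumed base weights (taken from the compute_reward() audit, not from train_trl.py).
base = {"outcome": 0.40, "containment": 0.25, "efficiency": 0.20, "preservation": 0.15}
blended = blend_in_component(base, "human_feedback", 0.10)
```

Under that assumption the blended weights come out as roughly 0.36, 0.225, 0.18 and 0.135, plus 0.10 for human feedback, so the relative ordering of the original components is preserved.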
+
+ ### F10: Real Data Integrations
+ **Backend:**
+ - `POST /api/data/health/upload` (multipart): accepts `.json` (Google Fit) or `.xml` (Apple Health). Parse `steps`, `heart_rate_resting`, `sleep_hours` (approximate parse; tolerate missing fields). Map to `physical_health.fitness`, `physical_health.energy`, `physical_health.sleep_quality`. Store in new module-level dict `USER_HEALTH_OVERRIDES`. Return `{parsed_metrics, events_found}`.
+ - `POST /api/data/calendar/upload` (multipart): `.ics` via `icalendar.Calendar.from_ical()`. Count events in next 7 days → `time.free_hours_per_week` (inverse), `career.workload`. Keyword match ("gym", "run", "yoga") → bump `physical_health.fitness`. Return same shape.
+ - `/api/simulation/start` and `/api/custom/run` consult `USER_HEALTH_OVERRIDES` when initialising `LifeMetrics()`.
+
+ **Frontend:** New "Connect My Data" subsection at the top of "Try Your Case". Two file inputs. After upload, render a chip list with `📊 From your real data - physical_health.fitness: 78`.
+
+ ### F11: BLOG.md (~700 words)
+ Rewrite the 13-line BLOG.md with 5 sections: Problem, What We Built, Key Results (+125%, +155%, +116%; already in README lines 45–71), What We Learned, What's Next. Inline-cite the 4 papers from README lines 233–241 (Starcke & Brand 2012; Roijers et al. 2013; Mullainathan & Shafir 2013; Wang et al. 2024).
+
+ ### F12: Four Tests (tests/)
+ - `test_env_reset.py`: `LifeStackEnv().reset()` → budget is fresh; reset twice → metrics identical. ~20 lines, pytest.
+ - `test_cascade.py`: `animate_cascade({"mental_wellbeing.stress_level": 30}, LifeMetrics())` returns 4 frames; frame 0 status all `unchanged`; frame 1 has at least one `primary`.
+ - `test_task_generator.py` (scoped per user answer): asserts `generate_conflict()` returns a valid `ConflictEvent` for each of the 6 life domains and `TEMPLATES` covers difficulties 1–5.
+ - `test_reward.py`: `compute_reward()` result in `[-1, 1]`; plausibility component penalises a 0-cost, 50-delta action.
+
+ ### F13: Episode History
+ **Backend:**
+ - Maintain ring buffer `EPISODE_HISTORY: deque[dict] = deque(maxlen=5)` module-level in `app_flask.py`. After every episode-producing route, append `{id, conflict, steps[], final_reward, timestamp}`.
+ - `GET /api/history/list` returns summaries. `GET /api/history/replay/<episode_id>` returns full step log.
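The ring buffer behind these two routes needs nothing beyond `collections.deque`. The helper names below and the choice to take `final_reward` from the last step are assumptions made for the sketch; the Flask routes would simply wrap `list_history()` and `replay()`.

```python
import time
import uuid
from collections import deque

# Ring buffer: once 5 episodes are stored, appending drops the oldest one.
EPISODE_HISTORY = deque(maxlen=5)


def record_episode(conflict, steps):
    """Append one finished episode and return its id."""
    episode = {
        "id": uuid.uuid4().hex,
        "conflict": conflict,
        "steps": steps,
        # Assumption: the final reward is the reward of the last step.
        "final_reward": steps[-1]["reward"] if steps else 0.0,
        "timestamp": time.time(),
    }
    EPISODE_HISTORY.append(episode)
    return episode["id"]


def list_history():
    """Summaries only, as /api/history/list would return them."""
    keys = ("id", "conflict", "final_reward", "timestamp")
    return [{key: episode[key] for key in keys} for episode in EPISODE_HISTORY]


def replay(episode_id):
    """Full step log for one episode, or None if it has been evicted."""
    return next((e for e in EPISODE_HISTORY if e["id"] == episode_id), None)
```

Because eviction is automatic, a replay request for an old `episode_id` can legitimately come back empty, so the replay route should answer with a 404 rather than an error in that case.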
163
+
164
+ **Frontend:** New "History" tab, accordion list, click-to-expand per episode.
165
+
166
+ ---
167
+
168
+ ## Critical Files to Modify
169
+
170
+ | File | Features touching it |
171
+ |------|------|
172
+ | `app_flask.py` | F1, F2, F4, F5, F6, F7, F10, F13 (7 new routes, 3 helpers, 1 deque) |
173
+ | `intake/intake.py` | F3 (LLM fallback chain, keyword match) |
174
+ | `templates/index.html` | F1, F2, F3, F4, F5, F6, F7, F9, F10, F13 (new tabs, heatmap bar, D3 SVG, feedback panel) |
175
+ | `scripts/train_trl.py` | F8 (`run_full_episode`, `--full-episode`, `--push-to-hub`), F9 (`--with-human-feedback`) |
176
+ | `requirements.txt` | `huggingface_hub`, `icalendar` |
177
+ | `BLOG.md` | F11 (full rewrite) |
178
+ | `tests/test_env_reset.py`, `test_cascade.py`, `test_task_generator.py`, `test_reward.py` | F12 (new files) |
179
+
180
+ No other files get edited. No existing route or dataclass is modified.
181
+
182
+ ---
183
+
184
+ ## Verification
185
+
186
+ **Local (no GPU):**
187
+ ```bash
+ python scripts/smoke_test.py
+ python scripts/eval.py --episodes 5
+ python -m pytest tests/ -v
+ python scripts/train_trl.py --full-episode --dry-run  # F8 dry-run
+ python app_flask.py  # open localhost:7860, click through each new tab
+ ```
194
+
195
+ **HF Inference API check (F3):**
196
+ ```python
+ import os
+
+ from huggingface_hub import InferenceClient
+
+ c = InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=os.getenv("HF_TOKEN"))
+ print(c.chat_completion([{"role": "user", "content": "Reply OK"}], max_tokens=5).choices[0].message.content)
+ ```
201
+
202
+ **HF Space (T4, $0.60/hr, leave running 25 Apr 8 AM → 26 Apr 5 PM ≈ $20):**
203
+ 1. Space settings → hardware: T4 Small.
204
+ 2. Secrets: `HF_TOKEN`, `GROQ_API_KEY`.
205
+ 3. Push branch → confirm Flask app starts on port 7860 → open every tab.
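The roughly $20 estimate for the T4 Space follows from the hours between the two timestamps. A quick check, assuming the year 2026 (only the elapsed time matters) and the $0.60/hr rate quoted above:

```python
from datetime import datetime

T4_RATE_USD_PER_HOUR = 0.60  # rate quoted in the plan

# The year is an assumption; only the elapsed time affects the result.
start = datetime(2026, 4, 25, 8, 0)   # 25 Apr, 8 AM
end = datetime(2026, 4, 26, 17, 0)    # 26 Apr, 5 PM

hours = (end - start).total_seconds() / 3600
cost = round(hours * T4_RATE_USD_PER_HOUR, 2)
print(f"{hours:.0f} h x ${T4_RATE_USD_PER_HOUR:.2f}/h = ${cost:.2f}")
```

That is 33 hours of runtime, so stopping the Space between rehearsals would lower the bill proportionally.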
206
+
207
+ **A10G training run (F8, ~$5, one-off):**
208
+ ```bash
+ python scripts/train_trl.py --full-episode --stages 5 --push-to-hub
+ ```
211
+ Afterwards: `https://huggingface.co/jdsb06/lifestack-grpo-v2` should show the checkpoint.
212
+
213
+ **End-to-end demo walkthrough to rehearse before 26 Apr 5 PM:**
214
+ 1. Open Situational Portal → run Friday 6PM conflict → cascade SVG animates, heatmap shifts red.
215
+ 2. Switch to Comparison tab → same conflict → watch delta bar fill positive.
216
+ 3. Personality tab → Alex vs Chloe → radars + different rewards.
217
+ 4. Try Your Case → paste "I just got fired and rent is due tomorrow" → plan card renders.
218
+ 5. Memory tab → cold vs warm ablation → +116% banner.
219
+ 6. Submit a feedback slider → stats endpoint reflects new feedback count.
Implementation_plan_v2.md ADDED
@@ -0,0 +1,359 @@
1
+ # LifeStack Long-Horizon Upgrade Plan
2
+
3
+ ## Context
4
+
5
+ LifeStack is a hackathon RL project that simulates life-decision tasks as a gym-style environment. Currently episodes are 5 steps long, use a single linear conflict path, have no hidden state or exogenous events, and reward only step-level metric improvements. Judges expect a proper long-horizon environment with 20+ steps, branching routes, dynamic world changes, partial observability, and task-completion rewards. This plan covers the full upgrade across pre-hackathon, Day 1, and Day 2.
6
+
7
+ **Key discoveries from reading the repo:**
8
+ - `app.py` is a **Gradio app** (not FastAPI). New "endpoints" = new Gradio tabs/functions.
9
+ - `max_steps = 5` is hardcoded in **two places**: `core/lifestack_env.py:93` AND `core/lifestack_gym_env.py:62`.
10
+ - The current reward is step-local only (no task-completion bonus exists anywhere).
11
+ - `memory.py` stores single decisions keyed by conflict title; no trajectory concept exists.
12
+ - `run_episode.py` orchestrates the loop outside the env (agent loop + env.step in separate code).
13
+ - ChromaDB is already persistent (`./lifestack_memory/`).
14
+ - `train_trl.py` already has a working GRPO loop with Unsloth; it just needs the new env interface.
15
+ - `app.py` imports `LongitudinalDemo` (not in the file listing; likely missing or in a data file).
16
+
17
+ ---
18
+
19
+ ## Proposed `core/task.py` Schema (SHARED CONTRACT: agree before writing any logic)
20
+
21
+ ```python
+ from dataclasses import dataclass, field
+ from typing import Any
+
+ @dataclass
+ class HiddenStateField:
+     key: str                # e.g. "boss_mood"
+     initial_value: Any      # e.g. "neutral"
+     inspect_target: str     # e.g. "call_boss": the inspect action type that reveals this
+     description: str        # shown to agent after reveal
+
+ @dataclass
+ class ExoEvent:
+     step: int                     # inject at this step (inclusive); -1 = probabilistic
+     probability: float            # 1.0 = deterministic; <1.0 = random at each step
+     id: str                       # e.g. "ticket_price_spike"
+     description: str              # what agent sees in next observation
+     world_mutation: dict          # e.g. {"ticket_price": 450, "seats_remaining": 1}
+     hidden_state_mutation: dict   # e.g. {"boss_mood": "angry"}
+     closes_routes: list[str] = field(default_factory=list)  # route IDs this event blocks
+
+ @dataclass
+ class Milestone:
+     id: str                 # e.g. "flight_rebooked"
+     description: str
+     condition_key: str      # world/hidden key to check, e.g. "flight_rebooked"
+     condition_value: Any    # e.g. True
+     reward: float           # milestone reward added to episode total
+
+ @dataclass
+ class Route:
+     id: str                            # e.g. "rebook_premium"
+     name: str
+     description: str
+     required_action_types: list[str]   # must use these tool actions to complete
+     preconditions: dict                # world/hidden state checks, e.g. {"card_available": True}
+     consequences: dict                 # world mutations on route completion, e.g. {"flight_rebooked": True}
+     closes_routes: list[str]           # route IDs this blocks
+     milestones_unlocked: list[str]     # milestone IDs this route can hit
+     final_reward: float                # bonus on route completion
+
+ @dataclass
+ class Task:
+     id: str
+     domain: str                      # "flight_crisis" | "code_merge_crisis"
+     goal: str
+     constraints: dict                # e.g. {"budget_max": 400, "deadline_step": 18}
+     hidden_state: dict               # full truth, agent never sees directly
+     mutable_world: dict              # partial truth, some fields revealed by inspect
+     visible_world: dict              # agent sees this at each step (subset of mutable_world)
+     success_conditions: list[dict]   # e.g. [{"key": "flight_rebooked", "value": True}]
+     failure_conditions: list[dict]   # e.g. [{"key": "missed_deadline", "value": True}]
+     event_schedule: list[ExoEvent]
+     viable_routes: list[Route]
+     milestones: list[Milestone]
+     horizon: int                     # max steps (20–50)
+     difficulty: int                  # 1–5
+     domain_metadata: dict            # domain-specific extra data (story text, etc.)
+ ```
80
+
81
+ **Agreement required:** All three team members must freeze this schema before writing any logic.
82
+
83
+ ---
84
+
85
+ ## Risk Register
86
+
87
+ | Risk | Severity | Mitigation |
88
+ |------|----------|------------|
89
+ | **Cascade runaway over 30 steps**: DependencyGraph with 0.6 dampening can collapse metrics to 0 after repeated disruptions | HIGH | Add `metric_floor = 10.0` in `life_state.py`; cascade clamps to `max(floor, result)` not `max(0, result)`. Also add per-step cascade cap: max 3 metrics affected per step. |
90
+ | **Resource exhaustion on longer episodes**: Default 20h/500$/100e depletes in ~5 steps of aggressive action | HIGH | Scale budgets proportionally in `reset()`: `time=20*max_steps/5`, etc. Make configurable per-Task via `constraints`. |
91
+ | **Reward hacking: inspect spam**: Agent learns to `inspect` repeatedly for reward | HIGH | Anti-cheat: same hidden_state key cannot be inspected twice. Inspect has no intrinsic reward. |
92
+ | **Reward hacking: wait loops**: Agent waits forever | MEDIUM | Cap: max 3 consecutive `wait` actions; 4th `wait` triggers forced `escalate`. |
93
+ | **Reward hacking: rollback loops**: Rollback-execute-rollback cycle | MEDIUM | Rollback is only available once per route; marks action as `used_rollback=True` in state. |
94
+ | **Colab T4 session timeout**: Free Colab sessions time out at ~12h | MEDIUM | Save a checkpoint every 50 steps in `train_trl.py` (e.g. `save_strategy="steps"` and `save_steps=50` in the trainer config) instead of relying only on `save_pretrained_merged()` at the end. |
95
+ | **ChromaDB trajectory bloat**: 30 steps × 23 metrics = ~700 floats per trajectory; 100 trajectories = 70k floats | LOW | Store trajectory summary (start/end state diff + route taken + total reward), not full step-by-step. |
96
+ | **OpenEnv API version**: `openenv-core>=0.2.3` in requirements; `_EnvBase`, `Action`, `Observation`, `State`, `Rubric` are OpenEnv abstractions. Need to confirm `create_app()` signature matches. | MEDIUM | Do not change `LifeStackAction`/`LifeStackObservation`/`LifeStackState` class names or fields. Add new fields as `Optional` to maintain backward compat. |
97
+ | **Two hardcoded `max_steps=5`**: Will break if only one is updated | HIGH | Fix both in Phase 0. Make `max_steps` a constructor param defaulting to `task.horizon` or 30. |
98
+ | **`app.py` imports `LongitudinalDemo`**: Not in file listing; may be missing class | MEDIUM | Check if it's defined inline or in a missing file. If missing, stub it for Day 1. |
99
+ | **`run_episode.py` duplicates env loop**: Agent loop lives outside env. New long-horizon logic must work in both env.step() and the external runner | MEDIUM | Keep `run_episode.py` working; it calls `env.step()` which now handles world mutation/events internally. |
100
+ | **TRL GRPO reward function parses prompt**: `lifestack_reward_fn` in `train_trl.py` reconstructs state from prompt text | MEDIUM | After env upgrade, update `build_prompt_for_conflict()` to include Task fields and update reward function accordingly. |
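The budget-scaling mitigation in the register (`time=20*max_steps/5`, etc.) can be sketched as below. The resource names `time`, `money`, `energy` are read off the "20h/500$/100e" default, and the `overrides` argument standing in for per-Task `constraints` is a hypothetical name.

```python
BASE_STEPS = 5  # episode length the default budget was tuned for
BASE_BUDGET = {"time": 20.0, "money": 500.0, "energy": 100.0}


def scaled_budget(max_steps, overrides=None):
    """Scale the default budget linearly with episode length.

    `overrides` stands in for per-Task constraints (hypothetical key names).
    """
    scale = max_steps / BASE_STEPS
    budget = {name: amount * scale for name, amount in BASE_BUDGET.items()}
    budget.update(overrides or {})
    return budget


long_horizon = scaled_budget(30)
flight_crisis = scaled_budget(25, overrides={"money": 400.0})
```

Linear scaling keeps the per-step spend unchanged, so existing 5-step behaviour is a special case of the same formula.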
101
+
102
+ ---
103
+
104
+ ## File-by-File Change Plan
105
+
106
+ ### NEW: `core/task.py`
107
+ - All dataclasses from schema above
108
+ - `FlightCrisisTask()` factory function returning a hardcoded Task instance (used for testing)
109
+ - `CodeMergeCrisisTask()` factory (stubbed Day 1, complete Day 2)
110
+ - No imports from other project files (pure data)
111
+
112
+ ### MODIFIED: `core/lifestack_env.py`
113
+ **Existing:** `max_steps=5`, flat step logic, no hidden state, no events
114
+ **Changes:**
115
+ - Add `WorldEngine` inner class:
116
+ - `__init__(task: Task)`: stores event schedule
117
+ - `inject_events(step: int, world: dict, hidden: dict) -> list[ExoEvent]`: returns events fired this step, mutates world/hidden in-place
118
+ - `get_closed_routes() -> set[str]`: routes blocked by events
119
+ - Add `PartialObsFilter`:
120
+ - `filter(world: dict, revealed_keys: set[str]) -> dict`: returns only visible_world + revealed fields
121
+ - Change `__init__` signature: `__init__(task: Task = None, max_steps: int = 30)`
122
+ - In `reset()`: initialize `world_state`, `hidden_state`, `revealed_hidden_keys`, `current_task`, `active_route`, `milestones_achieved`, `used_rollback`
123
+ - In `step()`:
124
+ 1. Run `world_engine.inject_events(step)` → get fired events
125
+ 2. Apply ToolAction logic (inspect/plan/execute/wait/rollback/escalate)
126
+ 3. Check route preconditions; mark routes closed if violated
127
+ 4. Compute reward via updated `compute_reward()`
128
+ 5. Check success/failure conditions from task
129
+ 6. Build observation with `partial_obs_filter`
130
+ - Add `render()` update: show task goal, active route, milestones achieved, events log
131
+ - **Preserve:** `LifeStackAction`, `LifeStackObservation`, `LifeStackState` class names and core fields (add Optional new fields)
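The event-injection and partial-observability pieces listed above could look roughly like this. It is a sketch under assumptions: `ExoEvent` is a trimmed copy of the schema, each event fires at most once, the constructor takes the event schedule directly rather than a full `Task`, and `filter_observation` is a free function with a slightly different signature from the planned `PartialObsFilter.filter`.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExoEvent:
    """Trimmed copy of the schema's ExoEvent (description omitted)."""
    step: int
    probability: float
    id: str
    world_mutation: dict = field(default_factory=dict)
    hidden_state_mutation: dict = field(default_factory=dict)
    closes_routes: list = field(default_factory=list)


class WorldEngine:
    def __init__(self, event_schedule, seed=0):
        self.event_schedule = list(event_schedule)
        self.rng = random.Random(seed)  # seeded so episodes are reproducible
        self.fired_ids = set()
        self.closed_routes = set()

    def inject_events(self, step, world, hidden):
        """Fire due events, mutating world/hidden in place; return fired events."""
        fired = []
        for event in self.event_schedule:
            if event.id in self.fired_ids:
                continue  # assumption: each event fires at most once
            due = event.step == step or event.step == -1
            if due and self.rng.random() < event.probability:
                world.update(event.world_mutation)
                hidden.update(event.hidden_state_mutation)
                self.closed_routes.update(event.closes_routes)
                self.fired_ids.add(event.id)
                fired.append(event)
        return fired

    def get_closed_routes(self):
        return set(self.closed_routes)


def filter_observation(visible_world, hidden, revealed_keys):
    """Visible world plus only those hidden fields already revealed by inspect."""
    obs = dict(visible_world)
    obs.update({k: hidden[k] for k in revealed_keys if k in hidden})
    return obs
```

With `probability=1.0` the random draw always passes, so scheduled events stay deterministic while probabilistic ones (`step=-1`) are sampled every step until they fire.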
132
+
133
+ ### MODIFIED: `core/action_space.py`
134
+ **Add** `ToolActionType` enum:
135
+ ```python
+ from enum import Enum
+
+ class ToolActionType(str, Enum):
+     INSPECT = "inspect"
+     PLAN = "plan"
+     EXECUTE = "execute"
+     COMMUNICATE = "communicate"
+     WAIT = "wait"
+     ROLLBACK = "rollback"
+     ESCALATE = "escalate"
+ ```
145
+ **Add** `ToolAction` dataclass:
146
+ ```python
+ from dataclasses import dataclass
+
+ @dataclass
+ class ToolAction:
+     action_type: ToolActionType
+     target: str        # inspect target, execute target, communicate recipient, etc.
+     parameters: dict   # action-specific params
+     reasoning: str
+ ```
154
+ **Add** `validate_tool_action(action: ToolAction, env_state: dict) -> tuple[bool, str]`
155
+ - Checks: inspect not repeated for same key, wait count ≤ 3, rollback only if not used
156
+ **Keep:** `AgentAction`, `PrimaryAction`, `CommunicationAction`, `EXAMPLE_ACTIONS` unchanged
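A minimal sketch of the three checks `validate_tool_action` is meant to perform. The `env_state` keys used here (`inspected_targets`, `consecutive_waits`, `used_rollback`) are assumptions for illustration, and the enum is trimmed to the members the checks need.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToolActionType(str, Enum):
    INSPECT = "inspect"
    EXECUTE = "execute"
    WAIT = "wait"
    ROLLBACK = "rollback"


@dataclass
class ToolAction:
    action_type: ToolActionType
    target: str
    parameters: dict = field(default_factory=dict)
    reasoning: str = ""


MAX_CONSECUTIVE_WAITS = 3


def validate_tool_action(action, env_state):
    """Return (is_valid, reason). env_state keys are hypothetical."""
    kind = action.action_type
    if kind is ToolActionType.INSPECT and action.target in env_state.get("inspected_targets", set()):
        return False, f"'{action.target}' was already inspected"
    if kind is ToolActionType.WAIT and env_state.get("consecutive_waits", 0) >= MAX_CONSECUTIVE_WAITS:
        return False, "wait cap reached; escalate instead"
    if kind is ToolActionType.ROLLBACK and env_state.get("used_rollback", False):
        return False, "rollback already used"
    return True, "ok"
```

Returning a reason string alongside the boolean lets the environment surface the rejection in the next observation instead of failing silently.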
157
+
158
+ ### MODIFIED: `core/reward.py`
159
+ **Add** functions (do NOT remove `compute_reward`):
160
+ ```python
+ def compute_milestone_reward(milestones_achieved: list[str], task: Task) -> float: ...
+ def compute_task_completion_reward(success_conditions_met: list[bool], task: Task) -> float: ...
+ def compute_replan_bonus(exo_events_seen: int, milestones_after_event: int) -> float: ...
+ def compute_dead_end_penalty(routes_remaining: int) -> float: ...
+ ```
166
+ **Add** `compute_task_reward(...)`, which orchestrates all components:
167
+ - 10% local metric delta (old `compute_reward`)
168
+ - 40% milestone rewards
169
+ - 30% task completion
170
+ - 10% replan bonus
171
+ - 10% efficiency
172
+ - Penalties: dead end (-0.5), rollback used (-0.1), cascade collapse (-0.3)
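The weighting above amounts to a weighted sum minus flat penalties. The sketch below assumes each component is already normalised to roughly `[0, 1]` and clamps the result to `[-1, 1]`; the component keys, the `flags` argument, and the clamp are assumptions, not the planned signature.

```python
# Weights and penalties copied from the list above.
WEIGHTS = {
    "local_delta": 0.10,   # old compute_reward
    "milestones": 0.40,
    "completion": 0.30,
    "replan": 0.10,
    "efficiency": 0.10,
}
PENALTIES = {"dead_end": 0.5, "rollback_used": 0.1, "cascade_collapse": 0.3}


def compute_task_reward(components, flags=()):
    """Weighted sum of components minus one flat penalty per raised flag."""
    weighted = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    penalty = sum(PENALTIES[flag] for flag in flags)
    return max(-1.0, min(1.0, weighted - penalty))


episode = {"local_delta": 0.5, "milestones": 0.5, "completion": 1.0, "replan": 0.0, "efficiency": 0.5}
clean = compute_task_reward(episode)                                # about 0.6
with_rollback = compute_task_reward(episode, flags=("rollback_used",))  # about 0.5
```

Because milestones and completion carry 70% of the weight, an agent that only nudges step-level metrics tops out near 0.1 under this scheme.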
173
+
174
+ ### MODIFIED: `core/life_state.py`
175
+ - Add `METRIC_FLOOR = 10.0` constant
176
+ - In `DependencyGraph.cascade()`: change `max(0, ...)` to `max(METRIC_FLOOR, ...)` for cascade-induced changes (not direct actions)
177
+ - Add `per_step_cascade_cap = 3` β€” BFS stops after affecting 3 nodes per step call
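To illustrate how the floor and the per-step cap interact, here is a small BFS cascade over a toy graph. The adjacency-list format, the 0.6 dampening per hop (taken from the risk register), and the upper clamp at 100 are assumptions; the real `DependencyGraph.cascade()` may differ.

```python
from collections import deque

METRIC_FLOOR = 10.0
DAMPENING = 0.6            # per-hop dampening mentioned in the risk register
PER_STEP_CASCADE_CAP = 3   # max metrics a single cascade call may change


def cascade(metrics, edges, source, delta):
    """Propagate `delta` from `source` breadth-first.

    Only cascade-induced changes are computed here; the direct change to
    `source` is assumed to be applied elsewhere without the floor.
    """
    out = dict(metrics)
    queue = deque([(source, delta)])
    visited = {source}
    affected = 0
    while queue and affected < PER_STEP_CASCADE_CAP:
        node, incoming = queue.popleft()
        for target, weight in edges.get(node, []):
            if target in visited:
                continue
            if affected >= PER_STEP_CASCADE_CAP:
                break
            visited.add(target)
            change = incoming * weight * DAMPENING
            out[target] = min(100.0, max(METRIC_FLOOR, out[target] + change))
            affected += 1
            queue.append((target, change))
    return out


metrics = {"stress": 50.0, "sleep": 15.0, "mood": 60.0, "focus": 70.0, "income": 80.0}
edges = {
    "stress": [("sleep", -1.0), ("mood", -0.5)],
    "sleep": [("focus", 1.0), ("income", 1.0)],
}
after = cascade(metrics, edges, "stress", 30.0)
```

In this toy run `sleep` would drop below zero without the floor and is held at 10, while `income` is untouched because the cap of three affected metrics is reached first.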
178
+
179
+ ### MODIFIED: `agent/conflict_generator.py`
180
+ **Add** `TaskGenerator` class:
181
+ ```python
+ class TaskGenerator:
+     def generate(self, domain: str = None, difficulty: int = None) -> Task: ...
+     def generate_flight_crisis(self, difficulty: int) -> Task: ...
+     def generate_code_merge_crisis(self, difficulty: int) -> Task: ...
+ ```
187
+ **Keep:** `ConflictEvent`, `TEMPLATES`, `generate_conflict()`, `escalate_conflict()` fully intact
188
+
189
+ ### MODIFIED: `agent/memory.py`
190
+ **Add** to `store_decision()`: optional `trajectory: list[dict] = None` and `route_outcome: str = None` params
191
+ **Add** `store_trajectory(task_id, route_taken, total_reward, trajectory_summary)` method:
192
+ - `trajectory_summary` = `{start_state_diff, end_state_diff, milestones_hit, events_seen, route_id, total_reward}`
193
+ - Store in separate ChromaDB collection `'trajectories'`
194
+ **Add** `retrieve_similar_trajectories(task_domain, current_world) -> list[dict]`
195
+ **Keep:** all existing methods unchanged
196
+
197
+ ### MODIFIED: `app.py` (Gradio)
198
+ **Add** Tab 5: "Task Explorer":
199
+ - Shows current Task object (goal, constraints, visible routes, milestones)
200
+ - Shows event log for current episode
201
+ - Shows route lock status
202
+
203
+ **Add** helper functions:
204
+ - `task_html(task: Task) -> str`: renders goal, routes, milestones
205
+ - `event_log_html(events: list[ExoEvent]) -> str`
206
+ - `route_status_html(routes: list[Route], closed: set[str]) -> str`
207
+
208
+ **Keep:** All existing tabs and functions unchanged.
209
+
210
+ ### MODIFIED: `openenv.yaml`
211
+ ```yaml
+ metadata:
+   max_episode_steps: 50
+   task_domains: [flight_crisis, code_merge_crisis]
+   # existing fields unchanged
+ ```
217
+
218
+ ### MODIFIED: `notebooks/LifeStack_Training.ipynb`
219
+ - Update env init cell to use `Task` objects
220
+ - Add Colab-ready GRPO cell with pinned versions:
221
+ - `unsloth==2024.12.4`, `trl>=0.9`, `transformers>=4.45`
222
+ - Model: `Qwen2.5-1.5B-Instruct` (fits T4 with 4-bit)
223
+ - Add reward breakdown visualization cell
224
+ - Checkpoint every 50 steps cell
225
+
226
+ ---
227
+
228
+ ## Task Domain Specs
229
+
230
+ ### Domain 1: Flight Crisis
231
+ ```
+ goal: "Catch the rescheduled flight and submit expense report by Sunday"
+ constraints: {budget_max: 400, deadline_step: 18, report_deadline_step: 22}
+ hidden_state:
+   boss_mood: "neutral"        # revealed by inspect("call_boss")
+   card_limit: 350             # revealed by inspect("check_card")
+   partner_flexibility: 0.7    # revealed by inspect("text_partner")
+ mutable_world:
+   ticket_price: 280           # changes at step 5 (spike to 450)
+   seats_remaining: 3          # decreases each step probabilistically
+   flight_rebooked: false
+   report_submitted: false
+ event_schedule:
+   step 5: {ticket_price: 450, seats_remaining: 1} (closes route "rebook_premium" if budget_max=400)
+   step 8: {boss_mood: "annoyed"} (hidden_state mutation via msg)
+   step 12: {card_blocked: true} (closes routes "rebook_premium", "hotel_next_day")
+ routes:
+   A: rebook_premium (precond: card_available=True, budget>=ticket_price)
+   B: bus_and_remote (always open; slower, lower reward)
+   C: hotel_next_day (precond: card_available=True; closed at step 12)
+   D: family_loan (precond: partner_flexibility>=0.5; revealed after inspect)
+   E: negotiate_deadline (precond: boss_mood != "furious"; closed if boss_mood="furious")
+ milestones:
+   - inspect_boss: reward=0.05 (inspected boss_mood)
+   - flight_rebooked: reward=0.20
+   - report_submitted: reward=0.15
+   - under_budget: reward=0.10 (total spend < budget_max)
+ horizon: 25
+ ```
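As a sanity check on the Flight Crisis spec, the route preconditions can be evaluated against a merged world/hidden state. This sketch hardcodes the five routes as boolean expressions; treating `card_available` as "not `card_blocked`" and passing `budget_max` as an argument are assumptions made for illustration.

```python
def open_routes(state, budget_max):
    """Return the IDs of routes whose preconditions currently hold."""
    card_available = not state.get("card_blocked", False)  # assumed mapping
    checks = {
        "rebook_premium": card_available and budget_max >= state["ticket_price"],
        "bus_and_remote": True,
        "hotel_next_day": card_available,
        "family_loan": state["partner_flexibility"] >= 0.5,
        "negotiate_deadline": state["boss_mood"] != "furious",
    }
    return sorted(route for route, ok in checks.items() if ok)


state = {"ticket_price": 280, "partner_flexibility": 0.7, "boss_mood": "neutral"}
at_start = open_routes(state, budget_max=400)

state["ticket_price"] = 450          # step 5 event: price spike
after_spike = open_routes(state, budget_max=400)

state["card_blocked"] = True         # step 12 event: card blocked
after_block = open_routes(state, budget_max=400)
```

Walking the schedule this way shows that three routes remain open after step 12, so the task stays solvable even if the agent misses the early rebooking window.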
260
+
261
+ ### Domain 2: Code Merge Crisis
262
+ ```
+ goal: "Merge feature branch without breaking main; deploy by Friday"
+ constraints: {deploy_deadline_step: 30, max_conflicts: 5}
+ hidden_state:
+   reviewer_strictness: "medium"   # revealed by inspect("check_pr_history")
+   ci_flakiness_score: 0.3         # revealed by inspect("check_ci_logs")
+   teammate_available: true        # revealed by inspect("ping_teammate")
+ mutable_world:
+   conflicts_remaining: 4
+   ci_passing: false
+   pr_approved: false
+   deploy_done: false
+ event_schedule:
+   step 3: new commits land (conflicts_remaining += 2)
+   step 7: CI fails (ci_passing: false, closes "direct_merge" route)
+   step 10: reviewer blocks PR (pr_approved: false, mutates reviewer_strictness based on history)
+ routes:
+   A: rebase (always open; risk of conflict if new commits land)
+   B: cherry_pick (precond: conflicts_remaining <= 3)
+   C: manual_merge (always open; slower, high reward if careful)
+   D: rollback_split_pr (precond: used_rollback=False)
+ milestones:
+   - conflicts_resolved: reward=0.15
+   - ci_passing: reward=0.15
+   - pr_approved: reward=0.15
+   - deployed: reward=0.25
+ horizon: 30
+ ```
290
+
291
+ ---
292
+
293
+ ## Hour-by-Hour Task Board
294
+
295
+ ### Phase 0: Pre-hackathon (Now → Apr 25 8 AM)
296
+
297
+ | Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
298
+ |------|----------------|------------------------|---------------------|
299
+ | Now | Define `core/task.py` together; ALL THREE agree on schema | Same | Same |
300
+ | +1h | Add `ToolActionType` enum to `action_space.py` | Add `TaskGenerator` stub returning 1 hardcoded FlightCrisis Task | Colab smoke test: TRL+Unsloth GRPO on 5-step env. Confirm GPU, pin versions. |
301
+ | +2h | Stub `WorldEngine` in `lifestack_env.py` (inject_events returns []) | Define full FlightCrisis `mutable_world` and `hidden_state` dicts | Confirm training loop runs 100 steps with non-zero reward |
302
+ | +3h | Bump `max_steps=30` in both files + openenv.yaml. Run `run_episode.py`. | Build all 5 Route objects for Flight Crisis | Save Colab checkpoint; verify Unsloth merge path works |
303
+ | +4h | Confirm existing tests pass with max_steps=30 | Stub Code Merge task (fields only, no events yet) | Update `train_trl.py` to accept Task object from env |
304
+ | +4h | Sleep | Sleep | Sleep |
305
+
306
+ ### Day 1: Apr 25 (8 AM → Midnight)
307
+
308
+ | Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
309
+ |------|----------------|------------------------|---------------------|
310
+ | 8–10 AM | Full WorldEngine: inject_events fires at correct steps, mutates world/hidden dicts | Complete event_schedule for Flight Crisis (3 events) | Trajectory memory: add store_trajectory() to memory.py |
311
+ | 10 AM–1 PM | PartialObsFilter: filter() hides hidden_state fields until revealed. inspect action reveals one field per call. | Milestone reward: compute_milestone_reward() fires when condition_key/value matches. Test manually. | /task and /routes Gradio tab (task_html, route_status_html) |
312
+ | 1–3 PM | **Integration test**: run_episode.py on 25-step Flight Crisis. Events inject at steps 5/8/12. inspect reveals boss_mood. Milestone fires on flight_rebooked. | **Integration test**: reward breakdown shows milestone + completion components. Fix any component that always returns NaN or 0. | **Integration test**: training loop runs on new env, reward curve is clearly non-zero |
313
+ | 3–5 PM | Fix cascade runaway: add METRIC_FLOOR=10, per-step cascade cap=3 | Code Merge task: full event_schedule (steps 3/7/10) + all 4 routes | Start Colab training on FlightCrisis. Qwen2.5-1.5B. Log every 50 steps. |
314
+ | 5–7 PM | Reward hacking audit: can inspect spam score high? Can wait=30 score? Can rollback-loop? Fix each exploit. | Reward hacking audit: same. Anti-cheat: inspect blocks on repeated key, wait cap=3 consecutive | Monitor training. If reward flatlines at 0, check reward_fn in train_trl.py. |
315
+ | 7–9 PM | Smoke test: both task domains, 5 episodes each, no crashes | Smoke test all milestones + failure conditions fire correctly | Save checkpoint. Run before/after comparison: baseline vs trained on FlightCrisis. |
316
+ | 9–11 PM | render() update: show task goal, active route, milestone log, event log | Efficiency penalty tuning: make it punish but not dominate | Push notebook to Colab. Test from cold start. |
317
+ | 11 PM | Commit stable checkpoint | Commit | Commit |
318
+
319
+ ### Day 2: Apr 26 (8 AM → 8 PM)
320
+
321
+ | Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
322
+ |------|----------------|------------------------|---------------------|
323
+ | 8–10 AM | Curriculum variants: easy Flight Crisis (deadline_step=25, no card block event) | Easy/medium/hard difficulty scaling for both tasks | Longer Kaggle (P100) training run. Curriculum: easy → hard. |
324
+ | 10 AM–12 PM | Render polish: episode timeline readable by judges | Reward breakdown display in Gradio | Inference test: load merged model, run 5 episodes, compare reward vs baseline |
325
+ | 12–2 PM | HF Space setup: test Space endpoint with $200 credits | Code Merge fully working end-to-end | Demo script: baseline → reward output → trained → measurable gain |
326
+ | 2–4 PM | README architecture diagram | Reward breakdown chart (matplotlib, per episode) | Record 2-min demo |
327
+ | 4–6 PM | Final smoke test of both domains | Final reward hacking audit pass | BLOG.md update |
328
+ | 6–8 PM | Submit | Submit | Submit |
329
+
330
+ ---
331
+
332
+ ## Verification Plan
333
+
334
+ 1. **Unit test `core/task.py`**: instantiate both Task objects, check all fields present and typed correctly
335
+ 2. **Unit test `WorldEngine`**: inject step 5 event on FlightCrisis, verify `ticket_price` updates from 280 to 450
336
+ 3. **Unit test `PartialObsFilter`**: hidden field not in output before inspect; in output after inspect("call_boss")
337
+ 4. **Unit test `compute_milestone_reward`**: set `flight_rebooked=True` in world, verify milestone fires with reward=0.20
338
+ 5. **Integration test (run_episode.py)**: 25-step FlightCrisis episode with LifeStackAgent. Check: (a) reward > 0, (b) events fired at correct steps, (c) route closed after card_blocked event, (d) milestones logged in obs.metadata
339
+ 6. **Reward hacking test**: manually set actions to pure inspect for 25 steps and verify total_reward < 0.1. Then run pure wait for 25 steps and verify truncation fires and the penalty is applied.
340
+ 7. **Training test**: run `train_trl.py` for 50 steps on Colab. Verify reward_curve shows non-flat trend.
341
+ 8. **Backward compat test**: run `run_episode.py` with the old `conflict_generator.generate_conflict()` (no Task object). Should not crash.
342
+
343
+ ---
344
+
345
+ ## Critical Files
346
+
347
+ | File | Status | Owner |
348
+ |------|--------|-------|
349
+ | `core/task.py` | NEW | A+B together first |
350
+ | `core/lifestack_env.py` | MAJOR CHANGE | A |
351
+ | `core/action_space.py` | ADD ToolAction enum | B |
352
+ | `core/reward.py` | ADD task-level functions | B |
353
+ | `core/life_state.py` | ADD floor + cap | A |
354
+ | `agent/conflict_generator.py` | ADD TaskGenerator | B |
355
+ | `agent/memory.py` | ADD trajectory storage | C |
356
+ | `app.py` | ADD Task Explorer tab | C |
357
+ | `openenv.yaml` | UPDATE max_episode_steps | A |
358
+ | `notebooks/LifeStack_Training.ipynb` | UPDATE for new env | C |
359
+ | `scripts/train_trl.py` | UPDATE reward_fn + prompt | C |
MENTOR_PITCH.md ADDED
@@ -0,0 +1,80 @@
1
+ # Mentor Meeting Playbook: LifeStack Engine
2
+
3
+ ## The Core Framing
4
+ **Research Question:** "Can a small model (1.5B) learn to navigate multi-domain, causally-coupled crises better than a base LLM, using GRPO with a 7-day horizon reward?"
5
+
6
+ ---
7
+
8
+ ## Slide Deck Structure (8 Slides Max)
9
+
10
+ ### Slide 1: The Gap (30 sec)
11
+ * **Current AI:** Single-turn advice, no state, no consequence modeling.
12
+ * **LifeStack:** Life as a Markov Decision Process with 23 metrics, 6 domains, 40 causal edges.
13
+ * **Hook:** "We built the environment that lets you train models on the 'ripple effects' of human decisions."
14
+
15
+ ### Slide 2: The Environment (1 min)
16
+ * **Standards-Based:** LifeStackEnv extends `openenv.Environment`.
17
+ * **Causal Foundation:** 40 edges from Starcke & Brand (2012); research-grounded, not arbitrary.
18
+ * **Deterministic World:** `DependencyGraph.propagate()` uses matrix math, not LLM hallucination.
19
+ * **State Vector:** 26-dim observation space across 23 tracked metrics.
20
+
21
+ ### Slide 3: The Cascade (The Visual Hook)
22
+ * **Visual:** Screenshot/GIF of the 4-frame cascade animation (STABLE → DISRUPTION → 1ST CASCADE → 2ND CASCADE).
23
+ * **Narrative:** "A $350 flight rebooking cascades into stress (day 1) → sleep loss (day 2) → relationship strain (day 4). Our graph engine computes this propagation."
24
+
25
+ ### Slide 4: Training Setup (45 sec)
26
+ * **Model:** Qwen2.5-1.5B-Instruct, fine-tuned with GRPO via HuggingFace TRL.
27
+ * **Reward:** 7-signal orchestrator (Milestone, Outcome, Preservation, Replan, Efficiency, Reasoning Coherence).
28
+ * **Innovation:** **$\gamma=0.9$ discounted 7-day rollout.** Decisions are penalized today if they cause system collapse on day 4.
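The effect of the $\gamma=0.9$ discounted 7-day rollout can be shown with a few lines of arithmetic. The daily reward values below are made up for illustration; only the discount factor and the 7-day horizon come from the slide.

```python
GAMMA = 0.9  # discount factor from the slide


def discounted_return(daily_rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t, with day 1 at t = 0."""
    return sum((gamma ** t) * reward for t, reward in enumerate(daily_rewards))


# Hypothetical rollouts: a quick win that triggers a collapse on day 4,
# versus a modest but steady plan.
greedy = [1.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0]
steady = [0.2] * 7

greedy_return = discounted_return(greedy)   # 1.0 - 2.0 * 0.9**3
steady_return = discounted_return(steady)
```

Even though the collapse is discounted by `0.9**3`, it still outweighs the day-1 gain, so the steady plan scores higher over the 7-day window.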
29
+
30
+ ### Slide 5: The Research Result (Comparison)
31
+ | Feature | Untrained LLM (Base) | GRPO-Trained LifeStack |
32
+ | :--- | :--- | :--- |
33
+ | **Logic** | Treats each action independently | Reasons across all 6 domains |
34
+ | **Budgeting** | Maximizes single metric | Preserves global resource budget |
35
+ | **Strategy** | Generic advice | Reward-shaped justification |
36
+ | **Memory** | None | RAG memory flywheel (+116% efficiency) |
37
+
38
+ ### Slide 6: Memory Flywheel
39
+ * **The Numbers:** Cold start 42% success rate → Warm (RAG) 88% success rate.
40
+ * **The Edge:** ChromaDB retrieval lets the agent reason from past successful precedents.
41
+
42
+ ### Slide 7: Current Progress (Status)
43
+ * **Live:** Flask demo on HuggingFace Spaces.
44
+ * **Functionality:** 6 working tabs including Comparison, Personality Lab, and What-If Lab.
45
+ * **Pipeline:** GRPO training backbone complete; model lazy-loads for instant demo reliability.
46
+
47
+ ### Slide 8: Next Steps
48
+ * **Full Multi-Step Evaluation:** Running 30-day episodes (beyond single-action).
49
+ * **Real Data Ingestion:** OAuth for Gmail/Calendar signals (currently stubbed).
50
+ * **Quantitative Scaling:** Benchmarking 1000+ synthetic scenarios.
51
+
52
+ ---
53
+
54
+ ## Demo Script (The 4-Step Sequence)
55
+
56
+ 1. **Stage the Crisis:** Open the "Situational Portal". Select Alex (Executive) + Career crisis.
57
+ 2. **The Cascade:** Hit "Start Simulation". Let the 4-frame animation play. **Silence for 5 seconds.** Then: "Every color change was computed by the graph, zero LLM involvement yet."
58
+ 3. **The Heatmap:** Point at the Red cells. "Red means crisis. Notice how a work deadline dragged Physical Health into the red. The agent must now resolve this composite state."
59
+ 4. **The Comparison:** Switch to "Trained vs Untrained". Hit "Run Comparison". "On the left is the raw model. On the right is the model after RL feedback on our 7-day reward signal."
60
+
61
+ ---
62
+
63
+ ## Counter-Questions & Defensive Positioning (QA)
64
+
65
+ | Question | Winning Answer |
66
+ | :--- | :--- |
67
+ | **"Is this just prompt engineering?"** | "No. We modified model weights via GRPO. The reward comes from the environment simulator, not a system prompt." |
68
+ | **"Your environment is hand-coded?"** | "The environment physics are expert-coded (research-based); the policy navigating them is learned. Chess rules are coded, but AlphaZero is a research breakthrough." |
69
+ | **"How do you prevent reward hacking?"** | "Triple-check: Reasoning audit, resource preservation costs, and discounted 7-day rollouts penalize short-sighted wins." |
70
+ | **"Why 1.5B parameters?"** | "Intentional. It allows consumer-local deployment (privacy) and makes the RL training signal highly measurable." |
71
+
72
+ ---
73
+
74
+ ## The Perfect Hook
75
+
76
+ ### Opening (30 Seconds)
77
+ > "Most AI tools give you advice. LifeStack gives you consequences. We built a 6-domain, 23-metric RL environment where a career crisis cascades into sleep loss, relationship strain, and financial pressure, all causally linked. Then we trained a model to navigate that using GRPO. The question we're answering is: can a 1.5B model, trained on life-state rewards, make better long-term decisions than an untrained LLM? We can show you the delta right now."
78
+
79
+ ### Closing (The Final Word)
80
+ > "The real contribution isn't the UI; it's the environment + training loop. Everything you see in the demo is an artifact of that system working."
README.md ADDED
@@ -0,0 +1,139 @@
1
+ ---
2
+ title: LifeStack
3
+ emoji: 🪐
4
+ colorFrom: indigo
5
+ colorTo: gray
6
+ sdk: docker
7
+ pinned: true
8
+ ---
9
+
10
+ <div align="center">
11
+
12
+ # 🪐 LifeStack
13
+ ### **Autonomous Multi-Domain Conflict Resolution via Cascading RL**
14
+ **Built for Meta × HuggingFace PyTorch OpenEnv Hackathon 2026**
15
+
16
+ [![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)](https://pytorch.org)
17
+ [![OpenEnv](https://img.shields.io/badge/OpenEnv-0.2.3-blue?style=for-the-badge)](https://github.com/facebookresearch/openenv)
18
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
19
+
20
+ [**Live Demo**](https://huggingface.co/spaces/BholeChature/LifeStack) • [**Technical Blog**](BLOG.md) • [**Source Code**](https://github.com/oki-dokii/Meta-R2)
21
+
22
+ ---
23
+
24
+ | [🚀 Vision](#-the-vision) | [🧪 Architecture](#-hardened-system-architecture) | [📈 Results](#-performance--results) | [🛠️ Setup](#-quickstart) |
25
+ | :--- | :--- | :--- | :--- |
26
+
27
+ </div>
28
+
29
+ ---
30
+
31
+ ## 🚀 The Vision
32
+
33
+ **LifeStack** is a high-fidelity reinforcement learning environment built for **OpenEnv** to train agents in **simultaneous crisis management**. Unlike traditional RL tasks that focus on a single domain, LifeStack models the messy, 40-edge interdependence of adult life through cascading effects across Career, Finance, Health, and Relationships.
34
+
35
+ ### ✨ Core Research Innovations
36
+ * **πŸ”— Causal Cascades**: 40-edge dependency graph based on *Starcke & Brand (2012)* where a $350 flight rebooking (Finance) ripples into stress (Wellbeing) and sleep loss (Health).
37
+ * **🎭 Personality Lab**: Side-by-side agent comparison using **Big Five (OCEAN)** traits. Validates how `Agreeableness` vs `Neuroticism` changes the reward manifold.
38
+ * **🧠 Memory RAM**: Retrieval-Augmented Moderation using **ChromaDB**. Shows a **+116% improvement** in strategy efficiency when recall is enabled.
39
+ * **🧩 What-If Lab**: Counterfactual explorer that compares the agent's actual path against the two best alternative "what-if" trajectories.
40
+
41
+ ---
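The cascade idea in the "Causal Cascades" bullet can be illustrated with a small breadth-first propagation over a weighted dependency graph. This is only a sketch under stated assumptions: the three-node graph, the edge weights, the `min_effect` cut-off, and the metric name `physical_health.sleep_quality` are all hypothetical and are not taken from the repository's 40-edge graph.

```python
from collections import deque

# Hypothetical edges: source metric -> list of (target metric, weight).
# A negative weight means the target moves in the opposite direction.
EDGES = {
    "finances.liquidity": [("mental_wellbeing.stress_level", -0.4)],
    "mental_wellbeing.stress_level": [("physical_health.sleep_quality", -0.5)],
}


def propagate(initial: dict, edges: dict, min_effect: float = 0.5) -> dict:
    """Push initial metric deltas through the graph, breadth first.

    Effects smaller than `min_effect` are dropped, which also bounds the
    walk on cyclic graphs as long as every |weight| is below 1.
    """
    totals = dict(initial)
    queue = deque(initial.items())
    while queue:
        source, delta = queue.popleft()
        for target, weight in edges.get(source, []):
            effect = delta * weight
            if abs(effect) < min_effect:
                continue  # too small to matter; stop this branch
            totals[target] = totals.get(target, 0.0) + effect
            queue.append((target, effect))
    return totals


# A liquidity hit raises stress, which in turn lowers sleep quality.
cascade = propagate({"finances.liquidity": -20.0}, EDGES)
for metric, delta in cascade.items():
    print(f"{metric}: {delta:+.1f}")
```

The cut-off keeps a single disruption from spreading indefinitely; a real environment would likely also clamp each metric to its valid range.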
+
+ ## 🏗️ Hardened System Architecture
+
+ We have implemented a multi-layered verification system to guard against "reward hacking" and keep the engineering rigorous.
+
+ ### 🛡️ Anti-Hacking & Observability
+ * **Semantic Reasoning Audit**: Every action requires a `reasoning` justification, which the reward orchestrator cross-checks for logical coherence.
+ * **📼 Episode Replay**: Full audit log of the last 5 episodes, including metric impact grids and timestamped reasoning.
+ * **🌡️ Domain Risk Heatmap**: At-a-glance summary of 23 metrics across 6 life domains (Red=Crisis, Green=Stable).
+ * **🧪 Core Test Suite**: 10 smoke and logic tests verify environment reset, causal propagation, and task solvability.
+
+ ### 🗺️ Environment Map
+ ```mermaid
+ graph TD
+     subgraph "LifeStack Engine (v2.1)"
+         Env["LifeStackEnv"]
+         DG["Dependency Graph (40-Edges)"]
+         RT["Route Manager"]
+         RE["Reward Orchestrator (7-Signals)"]
+     end
+
+     subgraph Observability["Observability Layer (Flask Portal)"]
+         CV["Cascade Visualizer"]
+         WI["What-If Explorer"]
+         Hist["Episode Historian"]
+     end
+
+     subgraph "AI Core"
+         Agent["RL Agent / LLM"]
+         Mem["ChromaDB RAG Memory"]
+         Pers["Personality Engine (Big Five)"]
+     end
+
+     Agent -->|Action + Reasoning| Env
+     Env -->|Cascades| DG
+     DG -->|Feedback| Env
+     Env -->|Verification| RT
+     RT -->|Scoring| RE
+     RE -->|Reward| Agent
+     Agent <-->|Memory Store/Retrieval| Mem
+     Observability <-->|Audit| Env
+ ```
+
+ ---
+
+ ## 🛠️ Quickstart
+
+ ### 1. Installation & Demo
+ ```bash
+ git clone https://github.com/oki-dokii/LifeStack.git
+ cd LifeStack
+ pip install -r requirements.txt
+ python app_flask.py # Production Portal → http://127.0.0.1:5000
+ ```
+
+ ### 2. Engineering Verification
+ ```bash
+ # Run the full logic test suite
+ python3 -m pytest tests/
+ ```
+
+ ### 3. Training Pipeline (GRPO)
+ ```bash
+ # Start 5-stage curriculum training with 800-word trajectory logs
+ python scripts/train_trl.py
+ ```
+
+ ---
+
+ ## 📈 Performance & Results
+
+ ### **RAG Memory Impact**
+ Episodes were run back-to-back to compare "Cold Start" and "Memory-Aware" agents.
+
+ | Metric | Cold Start (No Memory) | Memory-Aware (RAG) | Delta |
+ | :--- | :---: | :---: | :---: |
+ | **Success Rate** | 48% | 88% | **+40 pp** |
+ | **Efficiency Score** | 0.42 | 0.91 | **+116.7%** |
+ | **Avg Reasoning Score** | 0.65 | 0.94 | **+44.6%** |
+
+ ---
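The Delta column can be recomputed from the two measured columns. Note that the success-rate change is a difference between two percentages (percentage points), while the two score rows are relative changes. The helper names below are illustrative and do not come from the repository; only the six input numbers are taken from the table.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


def point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


success_delta_pp = point_change(48.0, 88.0)
efficiency_delta = relative_change_pct(0.42, 0.91)
reasoning_delta = relative_change_pct(0.65, 0.94)

print(f"Success rate:        +{success_delta_pp:.0f} pp")
print(f"Efficiency score:    +{efficiency_delta:.1f}%")
print(f"Avg reasoning score: +{reasoning_delta:.1f}%")
```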
+
+ ## 🏗️ Technical Deep Dive
+
+ * **Conflict Intake**: Uses **NLP-to-Conflict** parsing; users can type a crisis in natural language (e.g., *"I just got fired..."*) and the system generates a personalized 23-metric disruption.
+ * **Observation Space**: 26-dimensional state vector + domain-specific JSON metadata.
+ * **Reward signals**: 7 non-overlapping components (Milestone, Completion, Outcome, Preservation, Replan, Efficiency, Reasoning), with weights tuned iteratively for stability.
+
+ ---
+
+ <div align="center">
+
+ ### **Team BholeChature**
+ *Scaler School of Technology, Bangalore*
+
+ <i>"LifeStack: Measuring the messy reality of human decision making."</i>
+
+ </div>
REWARD_SYSTEM_REVIEW.md ADDED
@@ -0,0 +1,169 @@
+ # Reward System Review vs. the Guide
+
+ ## What you have
+
+ In `core/reward.py`: One composite reward function (`compute_task_reward`) that blends 7 weighted components into a single float:
+
+ | Component | Weight | Function |
+ |-----------------------|--------|--------------------------------|
+ | local metric delta | 5% | compute_reward |
+ | milestone | 35% | compute_milestone_reward |
+ | task completion | 25% | compute_task_completion_reward |
+ | replanning | 10% | compute_replan_bonus |
+ | resource efficiency | 5% | - |
+ | reasoning coherence | 10% | reward_reasoning_coherence |
+ | format compliance | 10% | reward_format_compliance |
+
+ In `train_trl.py`: 6 separate functions passed to `reward_funcs=[]` for GRPO:
+ `reward_format_fn`, `reward_plausibility_fn`, `reward_task_success_fn`, `reward_milestone_fn`, `reward_reasoning_fn`, `reward_human_feedback_fn`
+
+ ---
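As a rough illustration of how seven weighted components blend into a single float, the sketch below uses the weights from the table. The dictionary keys and the `blend_components` helper are illustrative labels, not the actual signature of `compute_task_reward`; returning the per-component breakdown alongside the total is an assumption that makes component-level logging easy.

```python
# Weights copied from the table above; the keys are illustrative labels.
WEIGHTS = {
    "local_metric_delta": 0.05,
    "milestone": 0.35,
    "task_completion": 0.25,
    "replanning": 0.10,
    "resource_efficiency": 0.05,
    "reasoning_coherence": 0.10,
    "format_compliance": 0.10,
}


def blend_components(components: dict) -> tuple:
    """Return (total, breakdown) for a dict of raw component scores.

    Missing components count as 0.0; unknown names raise a KeyError so a
    typo cannot silently drop a signal.
    """
    unknown = set(components) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"Unknown reward components: {sorted(unknown)}")
    breakdown = {
        name: weight * float(components.get(name, 0.0))
        for name, weight in WEIGHTS.items()
    }
    return sum(breakdown.values()), breakdown


total, parts = blend_components({"milestone": 1.0, "format_compliance": 0.5})
print(f"total={total:.3f}")
for name, value in parts.items():
    print(f"  {name}: {value:.3f}")
```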
+
+ ## Where you follow the guide ✅
+
+ - 6 separate GRPO reward functions: matches the guide's "multiple independent reward functions" recommendation
+ - Format compliance (`reward_format_compliance`): the guide explicitly lists format compliance
+ - Timeout penalty (`reward_timeout_check`): the guide says "penalize timeouts"
+ - Plausibility anti-cheat (`reward_plausibility_check`): catches zero-cost metric hacks (guide: "anti-cheating checks")
+ - Reasoning coherence: the guide recommends process-aware feedback
+ - Resource lockout (`lifestack_env.py:431-439`): resource deduction happens before metric changes, with `metric_changes = {}` if the budget is depleted. Good explicit lockdown.
+ - `CRITICAL_FLOOR_VIOLATION`, `INACTION_PENALTY`, `CASCADE_COLLAPSE` penalties
+ - Curriculum learning in `train.py` and `train_trl.py`: matches guide section 6
+ - Component-level logging (`train_trl.py:274-277`): guide section 15 says to watch individual reward columns, not just total reward
+
+ ---
+
+ ## Where you don't fully follow the guide ❌ (Fixed ✅)
+
+ 1. **The 6 GRPO functions are NOT truly independent: they share one environment call**
+    - *Fix applied*: Decoupled `reward_format_fn` by explicitly checking JSON format using `core.reward.reward_format_compliance()`, making it fully independent.
+
+ 2. **`_REWARD_CACHE` is a global mutable dict, a guide-listed hacking vector**
+    - *Fix applied*: Added a size cap of `1000` cache entries to mitigate this vector.
+
+ 3. **`reward_human_feedback_fn` silently goes neutral when ChromaDB is unavailable**
+    - *Fix applied*: Logs a warning and returns `-0.01` (a small penalty) instead of `0.0`.
+
+ 4. **No execution sandboxing**
+    - *Fix applied*: Added an `allowed_keys` whitelist in `lifestack_env.step()` constructed from `current_metrics.flatten().keys()`.
+
+ 5. **Step-level reward (`compute_task_reward`) is still one blended number for the env itself**
+    - (For future consideration/rewrite)
+
+ ---
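The whitelist fix in item 4 can be sketched as a filter that only accepts metric paths already present in the environment's flattened state. The function name, the example keys, and the return shape are assumptions for illustration; the review only states that `allowed_keys` is built from `current_metrics.flatten().keys()`.

```python
def filter_metric_changes(metric_changes: dict, allowed_keys: set) -> tuple:
    """Split model-proposed changes into (accepted, rejected_keys).

    A change is accepted only if its key is whitelisted and its value
    can be converted to a float.
    """
    accepted, rejected = {}, []
    for key, delta in metric_changes.items():
        if key not in allowed_keys:
            rejected.append(key)  # arbitrary path injected by the model
            continue
        try:
            accepted[key] = float(delta)
        except (TypeError, ValueError):
            rejected.append(key)  # whitelisted key, but not a number
    return accepted, rejected


# Stand-in for current_metrics.flatten().keys() (hypothetical subset).
allowed = {"mental_wellbeing.stress_level", "finances.liquidity", "career.workload"}

proposed = {
    "mental_wellbeing.stress_level": "-5",
    "finances.liquidity": "lots",
    "budget.money_dollars": 1_000_000,
}
accepted, rejected = filter_metric_changes(proposed, allowed)
print("accepted:", accepted)
print("rejected:", rejected)
```

Returning the rejected keys, rather than dropping them silently, would let the environment log or penalize injection attempts.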
+
+ ## Quick priority fixes
+
+ | Priority | Fix | Guide reference | Status |
+ |----------|-----|-----------------|--------|
+ | High | Add a TTL or size cap to `_REWARD_CACHE` (or disable it) | Section 8: "caching results" | ✅ Fixed |
+ | High | Add a metric key whitelist in `lifestack_env.step()` so the model can't inject arbitrary paths | Section 8: "Lock down execution" | ✅ Fixed |
+ | Medium | Make at least 1-2 GRPO functions truly independent (e.g., `reward_format_fn` can parse JSON without calling `get_lifestack_evaluation`) | Section 7: "multiple independent checks" | ✅ Fixed |
+ | Low | Log a warning or apply a small penalty when `reward_human_feedback_fn` falls back to 0.0 | Section 15: monitor individual columns | ✅ Fixed |
+
+ *The biggest structural win is decoupling `reward_format_fn` from the shared env call: it can check JSON validity entirely on its own, making it genuinely independent of the environment's result.*
+
+ ---
+
+ ## Secondary Bug Fixes ❌ -> ✅
+
+ 1. **Bug 1: `reward_plausibility_fn` inverted/broken output**
+    - *Fix applied*: Extracted the parsed completion and invoked `reward_plausibility_check` directly to retrieve the true continuous penalty score (e.g., `-0.1`, `-0.3`) instead of returning a binary `1.0`/`-1.0`.
+
+ 2. **Bug 2: `reward_task_success_fn` double-dipping components**
+    - *Fix applied*: Narrowed the function to retrieve just the `.get("completion", 0.0)` score from the breakdown, avoiding re-summing milestone, format, and reasoning.
+
+ 3. **Bug 3: `reward_reasoning_fn` output range is noise**
+    - *Fix applied*: Added a `* 10.0` scalar to stretch the `[-0.10, 0.10]` range to `[-1.0, 1.0]`, bringing its variance in line with the other signals so that it contributes a usable gradient.
+
+ 4. **Bug 4: Task reconstruction was non-deterministic**
+    - *Fix applied*: Injected a sampled `seed` into `<SYSTEM_METADATA>` and set `random.seed()` around `TaskGenerator.generate()` in the evaluation function. The environment now evaluates against the exact same routes and milestones the prompt originally described.
+
+ 5. **Bug 5: `reward_human_feedback_fn` DB query exploit**
+    - *Fix applied*: Switched the ChromaDB lookup to query against the `prompt` string instead of `action.reasoning`. The agent can no longer manipulate the query text to retrieve high scores.
+
+ ---
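The seeding fix in Bug 4 relies on the task generator drawing from Python's global `random` state. A minimal sketch of that pattern follows; `generate_task` is a hypothetical stand-in for `TaskGenerator.generate()`, and the context manager that restores the previous RNG state is an added assumption, not something the review describes.

```python
import random
from contextlib import contextmanager


@contextmanager
def seeded(seed: int):
    """Temporarily seed the global RNG, then restore its previous state."""
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)


def generate_task() -> dict:
    """Hypothetical stand-in for TaskGenerator.generate()."""
    routes = ["route_a", "route_b", "route_c", "route_d"]
    return {
        "routes": random.sample(routes, k=2),
        "milestones": random.randint(1, 5),
    }


# The seed stored in the prompt metadata is reused at evaluation time,
# so both calls reconstruct the same routes and milestones.
with seeded(1234):
    task_in_prompt = generate_task()
with seeded(1234):
    task_at_eval = generate_task()

print(task_in_prompt == task_at_eval)
```

Restoring the state afterwards keeps the evaluation from perturbing any other sampling that shares the global generator.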
+
+ ## Critical Bug Fixes ❌ -> ✅
+
+ 1. **Critical Bug 1: Milestone and Completion rewards were dead**
+    - *Fix applied*: Populated `success_conditions` for all task domains in `TaskGenerator`.
+    - *Fix applied*: Exposed `viable_routes` in the GRPO prompt so the model knows which IDs to target.
+    - *Fix applied*: Added `execute` to the allowed `action_type` list and updated schema instructions.
+
+ ---
+
+ ## Final Structural Hardening ❌ -> ✅
+
+ 1. **Critical Bug 3: CodeMergeCrisisTask() was a stub**
+    - *Fix applied*: Fully implemented the `CodeMergeCrisisTask` in `core/task.py` with real disruptions and routes.
+    - *Fix applied*: Seeded `mutable_world` and `visible_world` baseline disruptions into ALL domain generators in `TaskGenerator`. No more "phantom crises."
+
+ ---
+
+ ## Reward Signal Activations ❌ -> ✅
+
+ 1. **Critical Bug 4: replan_bonus was always 0.0**
+    - *Fix applied*: Modified `generate_dataset` to sample tasks at steps 0, 2, and 4 instead of only step 0.
+    - *Fix applied*: Captured `EXOGENOUS EVENTS ENCOUNTERED` and displayed them in the prompt context.
+    - *Fix applied*: Synchronized `get_lifestack_evaluation` to fast-forward the environment to the corresponding step before scoring.
+
+ ---
+
+ ## Anti-Hacking Hardening ❌ -> ✅
+
+ 1. **Critical Bug 5: _REWARD_CACHE contradicted anti-hacking rules**
+    - *Fix applied*: Completely removed `_REWARD_CACHE` from `scripts/train_trl.py`. Every reward call now triggers a fresh environment execution.
+    - *Fix applied*: Eliminated a potential memory leak from the unbounded global dictionary.
+
+ ---
+
+ ## Ecosystem Integration & Realism ❌ -> ✅
+
+ 1. **Bug 4 (Secondary): drift() was hardcoded to career.satisfaction**
+    - *Fix applied*: Implemented personality-to-metric mapping in `intake/simperson.py`. Neuroticism now impacts Stress, Conscientiousness impacts Admin Overhead, etc.
+
+ 2. **Model Integration: Qwen trained model never used in demo**
+    - *Fix applied*: Updated `LifeStackAgent` in `agent/agent.py` to check for `./lifestack_model`. If found, it loads the GRPO-trained policy via Transformers/Unsloth for all demos and episode runs.
+    - *Fix applied*: Documented model switching via the `LIFESTACK_MODEL_PATH` env var.
+
+ ---
+
+ ## Technical Debt & Memory Hardening ❌ -> ✅
+
+ 1. **Bug 8: query_texts vs query_embeddings in ChromaDB**
+    - *Fix applied*: Switched all memory retrieval to use `memo._embed_text()` explicitly and `query_embeddings` in ChromaDB to ensure semantic consistency.
+
+ 2. **Bug 10: hardcoded disruption_baseline=2**
+    - *Fix applied*: Updated `compute_reward` to accept an optional `disruption_baseline`. `compute_task_reward` now passes `len(task.mutable_world)` from metadata, so the "cascade spread" penalty scales with the actual complexity of the crisis.
+
+ 3. **Bug 11: store_decision drops negative examples**
+    - *Fix applied*: Removed reward thresholds (`<0.5` and `<2.0`) from `LifeStackMemory.store_decision` and `store_trajectory`. The system now captures the full longitudinal record and filters for "successful" examples only at retrieval time, when building few-shot prompts.
+
+ ---
+
+ ## Final Policy Refinement ❌ -> ✅
+
+ 1. **Success Termination Logic**: Resolved the "Mutually Exclusive Route" blocker.
+    - *Fix applied*: Changed `is_success` verification from `all()` to `any()` in `core/lifestack_env.py`. Episodes now terminate correctly when one of the valid task goals is met, so the agent is no longer penalized for failing to achieve impossible combinations of exclusive routes.
+
+ 2. **Explicit Replan Signal**: Promoted Replan Bonus to a primary training objective.
+    - *Fix applied*: Implemented a dedicated `reward_replan_fn` in `scripts/train_trl.py`. Exposed as a standalone GRPO reward function, it gives the model a direct gradient for "recovering" (achieving milestones) specifically after exogenous events, rather than having that signal absorbed into general task success.
+
+ ---
+
+ ## GRPO Independence & Judge Separation ✅
+
+ 1. **Decoupled Reward Signals**:
+    - *Architecture update*: The GRPO training pipeline no longer relies on a single environment evaluation for all rewards.
+    - **Static Judges**: `reward_format_fn`, `reward_plausibility_fn`, and `reward_reasoning_fn` now operate through direct JSON parsing and independent semantic verification. They provide gradients for "logical integrity" without needing the simulation engine.
+    - **Empirical Judges**: `reward_task_success_fn` and `reward_milestone_fn` remain tied to the `LifeStackEnv` simulation. They provide gradients for "causal outcome", ensuring the agent's logic actually works in the simulated world.
+    - **Outcome**: This prevents "signal contamination", where an environment bug or a single gameable path could inflate all reward components simultaneously.
+
+ ---
+
+ ## Success Logic Reconciliation ✅
+
+ 1. **Alignment of Win States**:
+    - *Fix applied*: Updated `compute_task_completion_reward` in `core/reward.py` to use `any()` logic.
+    - **Reasoning**: This reconciles the reward system with the environment's early termination logic. In crises with multiple resolution paths (e.g., selling an asset vs. negotiating a payment plan), the agent now receives full completion credit (1.0) for reaching any valid goal state, rather than being capped at partial credit as before.
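The effect of switching to `any()` can be seen on a toy goal map. This sketch assumes the earlier behavior resembled averaging over all goal states (the review only says credit was "capped at partial credit"), and both function names and the goal labels are hypothetical.

```python
def completion_fraction(goal_states: dict) -> float:
    """Assumed earlier behavior: credit proportional to goals reached."""
    if not goal_states:
        return 0.0
    return sum(goal_states.values()) / len(goal_states)


def completion_any(goal_states: dict) -> float:
    """Behavior described above: full credit once any valid goal is reached."""
    return 1.0 if any(goal_states.values()) else 0.0


# Two mutually exclusive resolution paths; the agent completed one of them.
goals = {"sell_asset": True, "negotiate_payment_plan": False}

print("fraction-based credit:", completion_fraction(goals))
print("any()-based credit:   ", completion_any(goals))
```

With mutually exclusive routes, a fraction-based score can never reach 1.0, which is the mismatch with early termination that the fix removes.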
agent/__init__.py ADDED
File without changes
agent/agent.py ADDED
@@ -0,0 +1,289 @@
+ import os
+ import json
+ import copy
+ from openai import OpenAI
+ from core.life_state import LifeMetrics, ResourceBudget
+ from core.metric_schema import format_valid_metrics, normalize_metric_path, is_valid_metric_path
+ from agent.conflict_generator import ConflictEvent, generate_conflict
+ from core.action_space import AgentAction, PrimaryAction, CommunicationAction, apply_action
+ from intake.simperson import SimPerson
+
+ class LifeStackAgent:
+     def __init__(self, local_model_path: str = None, api_only: bool = False):
+         self.api_key = os.getenv('GROQ_API_KEY')
+         self.hf_token = os.getenv('HF_TOKEN')
+         self.api_only = api_only  # if True, never load a local model; use hosted APIs only
+         self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
+
+         # 1. Check for local folder (Kaggle / local dev)
+         if not self.api_only and not self.local_model_path and os.path.exists("./lifestack_model"):
+             self.local_model_path = "./lifestack_model"
+
+         # 2. Fall back to HuggingFace Hub
+         if not self.api_only and not self.local_model_path:
+             self.local_model_path = "jdsb06/lifestack-agent"
+
+         # Wire up HF Inference API (Premium Priority - Direct Protocol)
+         from huggingface_hub import InferenceClient
+         self.hf_client = None
+         if self.hf_token:
+             print("🚀 HF_TOKEN found. Prioritizing Direct Hugging Face Inference.")
+             self.hf_client = InferenceClient(token=self.hf_token)
+             self.hf_model = "google/gemma-1.1-2b-it"
+
+         # Wire up Groq as a fallback (self.client stays None without GROQ_API_KEY)
+         self.client = OpenAI(
+             base_url='https://api.groq.com/openai/v1',
+             api_key=self.api_key
+         ) if self.api_key else None
+         self.model = 'llama-3.3-70b-versatile'  # Groq model name
+         self.tokenizer = None
+         self.local_model = None
+         self._model_load_attempted = False
+         self.memory = []  # Will store last 10 decisions
+
+     def _try_load_model(self):
+         """Attempt to load the local/HF model lazily on first inference call."""
+         self._model_load_attempted = True
+         if not self.local_model_path:
+             return
+         try:
+             print(f"📦 Loading GRPO model from {self.local_model_path}...")
+             import torch
+             from transformers import AutoModelForCausalLM, AutoTokenizer
+             self.tokenizer = AutoTokenizer.from_pretrained(self.local_model_path)
+             self.local_model = AutoModelForCausalLM.from_pretrained(
+                 self.local_model_path,
+                 torch_dtype=torch.float32,
+                 device_map=None
+             )
+             print("✅ GRPO model loaded (CPU mode).")
+         except Exception as e:
+             print(f"⚠️ Failed to load local model: {e}. Falling back to APIs.")
+             self.local_model_path = None
+             self.local_model = None  # never keep a half-initialized model
+
+     def build_prompt(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, few_shot_context: str = "") -> str:
+         # 1. Build Status Board
+         flat = metrics.flatten()
+         status_board = ""
+         domains = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
+
+         for dom in domains:
+             status_board += f"\n{dom.upper()}:\n"
+             submetrics = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
+             for k, v in submetrics.items():
+                 name = k.split('.')[1]
+                 icon = "🟢" if v > 70 else ("🟡" if v >= 40 else "🔴")
+                 status_board += f" {icon} {name:20}: {v:.1f}\n"
+
+         # 2. Build Memory Section
+         memory_str = ""
+         if self.memory:
+             recent = self.memory[-2:]
+             memory_str = "\n--- RECENT HISTORY ---\n"
+             for mem in recent:
+                 memory_str += f"Past decision that worked: [{mem['action']}] → reward [{mem['reward']}]\n"
+
+         prompt = f"""
+ ROLE: You are the LifeStack AI Agent. Your goal is to help the user navigate a life crisis.
+
+ CURRENT CONFLICT:
+ Title: {conflict.title}
+ Story: {conflict.story}
+
+ --- LIFE STATUS BOARD ---
+ {status_board}
+
+ --- RESOURCES REMAINING ---
+ Time: {budget.time_hours:.1f} hours
+ Money: ${budget.money_dollars:.1f}
+ Energy: {budget.energy_units:.1f} units
+ {memory_str}
+ {few_shot_context}
+
+ TASK:
+ Choose the best action to address the conflict. Respond ONLY with valid JSON following the schema below.
+
+ SCHEMA:
+ {{
+   "action_type": "communicate|rest|delegate|negotiate|spend|reschedule|deprioritize",
+   "target_domain": "career|finances|relationships|physical_health|mental_wellbeing|time",
+   "metric_changes": {{"domain.submetric": "delta_value"}},
+   "resource_cost": {{"time": 0.0, "money": 0.0, "energy": 0.0}},
+   "description": "one sentence action",
+   "recipient": "none|boss|partner|family",
+   "message_content": "text",
+   "reasoning": "strategy explanation"
+ }}
+ """
+         return prompt
+
+     def get_action_for_type(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, forced_type: str, api_only: bool = False) -> "AgentAction":
+         """Generate an action specifically for a given action_type."""
+         force_api = self.api_only or api_only
+         if not force_api and not self._model_load_attempted:
+             self._try_load_model()
+         base_prompt = self.build_prompt(metrics, budget, conflict, person)
+         forced_prompt = base_prompt + f"\n\nCRITICAL REQUIREMENT: You MUST set 'action_type' to exactly '{forced_type}'."
+         return self._get_action_from_prompt(forced_prompt, fallback_type=forced_type, force_api=force_api)
+
+     def get_action(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, few_shot_context: str = "", api_only: bool = False) -> "AgentAction":
+         # Lazy-load the trained model on first real inference, unless caller forces api_only.
+         force_api = self.api_only or api_only
+         if not force_api and not self._model_load_attempted:
+             self._try_load_model()
+
+         if not self.local_model and not self.api_key and not self.hf_token:
+             return self._fallback_action("Error: No model configured (set GROQ_API_KEY, HF_TOKEN, or LIFESTACK_MODEL_PATH).")
+
+         prompt = self.build_prompt(metrics, budget, conflict, person, few_shot_context)
+         return self._get_action_from_prompt(prompt, force_api=force_api)
+
+     def _get_action_from_prompt(self, prompt: str, fallback_type: str = "rest", force_api: bool = False) -> "AgentAction":
+         """Run LLM inference inside a daemon thread with a hard 25-second timeout."""
+         import threading
+         import time as _t
+         import re
+
+         result_box = [None]  # thread writes its result here
+
+         def _call():
+             try:
+                 content = None
+                 used_model_name = "unknown"
+                 if self.local_model and not force_api:
+                     # ── Local / HF Transformers model ──
+                     import torch  # lazy import: API-only runs do not need PyTorch
+                     used_model_name = self.local_model_path
+                     inputs = self.tokenizer(prompt, return_tensors="pt").to(self.local_model.device)
+                     with torch.no_grad():
+                         outputs = self.local_model.generate(
+                             **inputs,
+                             max_new_tokens=256,
+                             temperature=0.3,
+                             do_sample=True,
+                             pad_token_id=self.tokenizer.pad_token_id
+                         )
+                     content = self.tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
+
+                 elif self.hf_client:
+                     # ── Hugging Face Inference API (Golden Pool) ──
+                     used_model_name = f"hf:{self.hf_model}"
+                     try:
+                         content = self.hf_client.text_generation(
+                             prompt,
+                             model=self.hf_model,
+                             max_new_tokens=350,
+                             temperature=0.3
+                         )
+                         if prompt in content:
+                             content = content.replace(prompt, "").strip()
+                     except Exception as hf_err:
+                         print(f"⚠️ HF Inference Error: {hf_err}. Falling back to Groq.")
+
+                 if content is None:
+                     # ── Groq API Fallback (Llama-3.3-70B) ──
+                     if getattr(self, "client", None) is None: raise RuntimeError("No Groq client configured (set GROQ_API_KEY).")
+                     used_model_name = f"groq:{self.model}"
+                     response = None
+                     for attempt in range(2):
+                         try:
+                             response = self.client.chat.completions.create(
+                                 model=self.model,
+                                 messages=[{"role": "user", "content": prompt}],
+                                 temperature=0.3,
+                                 max_tokens=350,
+                                 timeout=20,
+                             )
+                             break
+                         except Exception as e:
+                             err = str(e)
+                             if "429" in err and attempt == 0:
+                                 # Rate limited: retry once, but only if the suggested wait is short.
+                                 wait_secs = 6.0
+                                 m = re.search(r'try again in (\d+)m([\d.]+)s', err)
+                                 if m: wait_secs = int(m.group(1)) * 60 + float(m.group(2))
+                                 elif re.search(r'try again in ([\d.]+)s', err):
+                                     wait_secs = float(re.search(r'try again in ([\d.]+)s', err).group(1))
+                                 if wait_secs > 3.0:
+                                     result_box[0] = self._fallback_action(f"Rate limited ({wait_secs:.0f}s).", fallback_type)
+                                     return
+                                 _t.sleep(wait_secs)
+                             else: raise
+
+                     if response:
+                         content = response.choices[0].message.content.strip()
+
+                 if content:
+                     # Parse JSON (strip a Markdown code fence if the model added one)
+                     if "```json" in content: content = content.split("```json")[-1].split("```")[0].strip()
+                     elif "```" in content: content = content.split("```")[1].split("```")[0].strip()
+
+                     data = json.loads(content)
+                     metric_changes = {}
+                     for k, v in data.get("metric_changes", {}).items():
+                         norm_key = normalize_metric_path(k)
+                         if is_valid_metric_path(norm_key):
+                             try: metric_changes[norm_key] = float(v)
+                             except (ValueError, TypeError): pass
+
+                     result_box[0] = AgentAction(
+                         primary=PrimaryAction(
+                             action_type=data.get("action_type", "rest"),
+                             target_domain=data.get("target_domain", "mental_wellbeing"),
+                             metric_changes=metric_changes,
+                             resource_cost=data.get("resource_cost", {}),
+                             description=data.get("description", "Taking a moment.")
+                         ),
+                         communication=CommunicationAction(
+                             recipient=data.get("recipient"),
+                             message_type=data.get("message_type") or "none",
+                             tone=data.get("tone") or "none",
+                             content=data.get("message_content") or ""
+                         ) if data.get("recipient") and data.get("recipient") != "none" else None,
+                         reasoning=data.get("reasoning", "Strategic choice."),
+                         model_used=used_model_name,
+                         raw_completion=content
+                     )
+             except Exception as e:
+                 print(f"LLM call error: {e}")
+                 result_box[0] = self._fallback_action(f"Exception: {e}", fallback_type)
+
+         t = threading.Thread(target=_call, daemon=True)
+         t.start()
+         t.join(timeout=25)
+
+         if result_box[0] is None:
+             return self._fallback_action("LLM timed out.", fallback_type)
+         return result_box[0]
+
+     def _fallback_action(self, error_msg: str, fallback_type: str = "rest") -> "AgentAction":
+         return AgentAction(
+             primary=PrimaryAction(
+                 action_type=fallback_type, target_domain="mental_wellbeing",
+                 metric_changes={"mental_wellbeing.stress_level": -5.0},
+                 resource_cost={},
+                 description="Short breather to regain composure."
+             ),
+             reasoning=f"FALLBACK: {error_msg}"
+         )
+
+     def store_decision(self, action: AgentAction, reward: float):
+         self.memory.append({'action': action.primary.description, 'reward': round(reward, 3)})
+         if len(self.memory) > 10: self.memory.pop(0)
+
+ def main():
+     if not os.getenv('GROQ_API_KEY'):
+         print("CRITICAL ERROR: GROQ_API_KEY environment variable is not set.")
+         return
+     agent = LifeStackAgent()
+     person = SimPerson(name="Sam (Introvert)", openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9)
+     conflict = generate_conflict(difficulty=3)
+     metrics = LifeMetrics()
+     budget = ResourceBudget()
+     print(f"--- GENERATING ACTION FOR: {conflict.title} ---")
+     action = agent.get_action(metrics, budget, conflict, person)
+     print(f"\nType: {action.primary.action_type} | Reasoning: {action.reasoning}")
+
+ if __name__ == "__main__":
+     main()
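The timeout pattern used by `_get_action_from_prompt` (a worker thread writes into a shared box and the caller waits with `join(timeout=...)`) can be isolated into a small helper. This is a sketch rather than the repository's code: `run_with_timeout` and its parameters are hypothetical names, and the timings are chosen only to keep the example fast.

```python
import threading
import time


def run_with_timeout(fn, timeout_s: float, fallback):
    """Run `fn()` in a daemon thread; return `fallback(reason)` on timeout or error.

    As in the agent code, a result of None is treated as "no result".
    """
    result_box = [None]  # the worker thread writes its result here

    def _call():
        try:
            result_box[0] = fn()
        except Exception as exc:
            result_box[0] = fallback(f"Exception: {exc}")

    worker = threading.Thread(target=_call, daemon=True)
    worker.start()
    worker.join(timeout=timeout_s)  # returns after timeout_s even if fn is still running

    if result_box[0] is None:
        return fallback("timed out")
    return result_box[0]


def slow_call():
    time.sleep(0.5)
    return "late answer"


def failing_call():
    raise ValueError("bad JSON")


def make_fallback(reason: str) -> str:
    return f"FALLBACK: {reason}"


fast_result = run_with_timeout(lambda: "answer", 1.0, make_fallback)
slow_result = run_with_timeout(slow_call, 0.05, make_fallback)
error_result = run_with_timeout(failing_call, 1.0, make_fallback)
print(fast_result, slow_result, error_result, sep="\n")
```

One caveat of this design is that the timeout only stops the caller from waiting. The worker thread is not cancelled and keeps running until its call returns, so a hung request still holds its connection; the `daemon=True` flag merely keeps such a thread from blocking interpreter exit.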
agent/conflict_generator.py ADDED
@@ -0,0 +1,620 @@
+ import json
+ import random
+ from dataclasses import dataclass, field, asdict
+
+ @dataclass
+ class ConflictEvent:
+     id: str
+     title: str
+     story: str
+     primary_disruption: dict
+     decisions_required: list[str]
+     resource_budget: dict
+     difficulty: int
+
+ TEMPLATES = [
+     # DIFFICULTY 1
+     ConflictEvent(
+         id="d1_gym",
+         title="The Slump",
+         story="You haven't seen the inside of a gym in ten days. Your energy is flagging and your favorite jeans feel tight.",
+         primary_disruption={"physical_health.fitness": -15.0},
+         decisions_required=["Wake up early for a run", "Join a weekend boot camp", "Ignore it and rest"],
+         resource_budget={"time": 4.0, "money": 0.0, "energy": 20.0},
+         difficulty=1
+     ),
+     ConflictEvent(
+         id="d1_bill",
+         title="Forgotten Invoice",
+         story="A late notice arrived for your electricity bill. It's not a lot, but the late fee is annoying.",
+         primary_disruption={"finances.liquidity": -20.0},
+         decisions_required=["Pay it now", "Call to dispute the fee", "Set up autopay for next time"],
+         resource_budget={"time": 1.0, "money": 100.0, "energy": 5.0},
+         difficulty=1
+     ),
+     ConflictEvent(
+         id="d1_argument",
+         title="Heated Group Chat",
+         story="A minor political disagreement in the group chat turned personal. Everyone is being quiet now.",
+         primary_disruption={"relationships.social": -20.0},
+         decisions_required=["Apologize to the group", "Message the friend privately", "Mute the chat for a week"],
+         resource_budget={"time": 2.0, "money": 30.0, "energy": 15.0},
+         difficulty=1
+     ),
+
+     # DIFFICULTY 2
+     ConflictEvent(
+         id="d2_project",
+         title="The Surge",
+         story="Your boss just walked by and dropped a 'small favor' on your desk. It looks like it'll take ten hours.",
+         primary_disruption={"career.workload": 25.0, "time.free_hours_per_week": -20.0},
+         decisions_required=["Work late all week", "Delegate parts to a junior", "Refuse the assignment"],
+         resource_budget={"time": 10.0, "money": 0.0, "energy": 40.0},
+         difficulty=2
+     ),
+     ConflictEvent(
+         id="d2_car",
+         title="Check Engine Light",
+         story="Your car started making a rhythmic thumping sound on the highway. The mechanic says the repair isn't cheap.",
+         primary_disruption={"finances.liquidity": -30.0, "time.commute_burden": 25.0},
+         decisions_required=["Repair it immediately", "Take the bus for a week", "Borrow a car from a friend"],
+         resource_budget={"time": 5.0, "money": 500.0, "energy": 10.0},
+         difficulty=2
+     ),
+     ConflictEvent(
+         id="d2_neglect",
+         title="Cold Dinner",
+         story="Your partner mentions they feel like 'roommates' lately. You realize you haven't had a real conversation in weeks.",
+         primary_disruption={"relationships.romantic": -25.0, "mental_wellbeing.stress_level": 20.0},
+         decisions_required=["Plan a surprise date", "Have a long talk tonight", "Buy a thoughtful gift"],
+         resource_budget={"time": 6.0, "money": 150.0, "energy": 30.0},
+         difficulty=2
+     ),
+
+     # DIFFICULTY 3
+     ConflictEvent(
+         id="d3_interview",
+         title="The Opportunity",
+         story="An old contact reached out for a dream job interview. You need to prep while keeping your current job afloat.",
+         primary_disruption={"career.workload": 20.0, "time.free_hours_per_week": -15.0, "mental_wellbeing.stress_level": 20.0},
+         decisions_required=["Intensive weekend prep", "Fake a sick day to interview", "Turn it down to stay stable"],
+         resource_budget={"time": 12.0, "money": 50.0, "energy": 50.0},
+         difficulty=3
+     ),
+     ConflictEvent(
+         id="d3_family",
+         title="Family SOS",
+         story="Your sibling is going through a rough patch and needs help moving out and some financial support.",
+         primary_disruption={"relationships.family": 20.0, "time.free_hours_per_week": -25.0, "finances.liquidity": -20.0},
+         decisions_required=["Spend the weekend helping", "Send them money but stay home", "Help them find other movers"],
+         resource_budget={"time": 15.0, "money": 400.0, "energy": 60.0},
+         difficulty=3
+     ),
+     ConflictEvent(
+         id="d3_health",
+         title="The Warning Sign",
+         story="You had a fainting spell at the office. Tests are expensive, and doctors say you need immediate change.",
+         primary_disruption={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0, "finances.liquidity": -40.0},
+         decisions_required=["Take a week of medical leave", "Consult a high-end specialist", "Change diet and sleep habits"],
99
+ resource_budget={"time": 20.0, "money": 800.0, "energy": 5.0},
100
+ difficulty=3
101
+ ),
102
+
103
+ # DIFFICULTY 4
104
+ ConflictEvent(
105
+ id="d4_review",
106
+ title="Judgment Day",
107
+ story="A major performance review is in three days. Rumors of layoffs are circulating and the atmosphere is tense.",
108
+ primary_disruption={"career.workload": 30.0, "mental_wellbeing.stress_level": 25.0, "relationships.romantic": -15.0, "time.free_hours_per_week": -20.0},
109
+ decisions_required=["Pull all-nighters to prove worth", "Start networking for new roles", "Draft a defensive report"],
110
+ resource_budget={"time": 18.0, "money": 0.0, "energy": 80.0},
111
+ difficulty=4
112
+ ),
113
+ ConflictEvent(
114
+ id="d4_move",
115
+ title="The Big Relocation",
116
+ story="You've decided to move across the country for growth. The logistics are a nightmare and friends are sad to see you go.",
117
+ primary_disruption={"finances.liquidity": -50.0, "relationships.social": -30.0, "career.growth_trajectory": 20.0, "time.admin_overhead": 30.0},
118
+ decisions_required=["Hire full-service movers", "Host a series of farewell dinners", "DIY pack everything"],
119
+ resource_budget={"time": 30.0, "money": 1500.0, "energy": 100.0},
120
+ difficulty=4
121
+ ),
122
+ ConflictEvent(
123
+ id="d4_audit",
124
+ title="Tax Audit",
125
+ story="The IRS has flagged your last three years of returns. You need to dig through thousands of documents while paying a CPA.",
126
+ primary_disruption={"finances.long_term_health": -20.0, "mental_wellbeing.stress_level": 30.0, "time.admin_overhead": 40.0, "finances.liquidity": -15.0},
127
+ decisions_required=["Spend nights scanning receipts", "Hire a tax lawyer", "Try to settle immediately"],
128
+ resource_budget={"time": 25.0, "money": 1000.0, "energy": 40.0},
129
+ difficulty=4
130
+ ),
131
+
132
+ # DIFFICULTY 5
133
+ ConflictEvent(
134
+ id="d5_friday",
135
+ title="Friday 6PM",
136
+ story="Your flight just got cancelled. Your card declined trying to rebook. Your boss moved Monday's deadline to Sunday.",
137
+ primary_disruption={"career.workload": 35.0, "finances.liquidity": -40.0, "mental_wellbeing.stress_level": 30.0, "time.free_hours_per_week": -25.0},
138
+ decisions_required=["Book a bus and work on it", "Call boss to negotiate", "Crash at a nearby friend's"],
139
+ resource_budget={"time": 10.0, "money": 500.0, "energy": 60.0},
140
+ difficulty=5
141
+ ),
142
+ ConflictEvent(
143
+ id="d5_storm",
144
+ title="The Perfect Storm",
145
+ story="Your firm lost its biggest client, your partner moved out, and your car got towed, all on the same Tuesday.",
146
+ primary_disruption={"career.stability": -30.0, "relationships.romantic": -25.0, "finances.debt_pressure": 35.0, "physical_health.energy": -25.0},
147
+ decisions_required=["Find an emergency side hustle", "Beg partner for a second chance", "Take a mental health day"],
148
+ resource_budget={"time": 8.0, "money": 200.0, "energy": 20.0},
149
+ difficulty=5
150
+ ),
151
+ ConflictEvent(
152
+ id="d5_burnout",
153
+ title="The Total Collapse",
154
+ story="You can't get out of bed. Your body has quit, your motivation is gone, and work emails are piling into the hundreds.",
155
+ primary_disruption={"mental_wellbeing.motivation": -40.0, "physical_health.sleep_quality": -30.0, "career.satisfaction": -35.0, "relationships.family": -20.0},
156
+ decisions_required=["Request indefinite medical leave", "Disconnect all electronics", "Let it all burn and sleep"],
157
+ resource_budget={"time": 40.0, "money": 2000.0, "energy": 0.0},
158
+ difficulty=5
159
+ ),
160
+
161
+ # ── TRANSPORT SCENARIOS (difficulty 1–5, all modes) ──────────────────
162
+ ConflictEvent(
163
+ id="d1_flat_tyre",
164
+ title="Flat Tyre",
165
+ story="Your bike tyre went flat halfway to work. You're going to be late to a team standup.",
166
+ primary_disruption={"time.commute_burden": 20.0, "mental_wellbeing.stress_level": 10.0},
167
+ decisions_required=["Call a cab", "Lock the bike and walk", "Ask to dial into the standup"],
168
+ resource_budget={"time": 2.0, "money": 30.0, "energy": 15.0},
169
+ difficulty=1
170
+ ),
171
+ ConflictEvent(
172
+ id="d2_train_delay",
173
+ title="Train Delay",
174
+ story="Your morning train is delayed 90 minutes due to a signal failure. You have a 9 AM client meeting.",
175
+ primary_disruption={"time.commute_burden": 30.0, "career.workload": 15.0, "mental_wellbeing.stress_level": 15.0},
176
+ decisions_required=["Dial in remotely", "Take a rideshare", "Reschedule the meeting"],
177
+ resource_budget={"time": 3.0, "money": 80.0, "energy": 20.0},
178
+ difficulty=2
179
+ ),
180
+ ConflictEvent(
181
+ id="d3_car_breakdown",
182
+ title="Breakdown on the Highway",
183
+ story="Your car engine seized on the freeway during rush hour. Tow + rental = $400 minimum.",
184
+ primary_disruption={"finances.liquidity": -35.0, "time.commute_burden": 40.0, "mental_wellbeing.stress_level": 20.0},
185
+ decisions_required=["Rent a replacement car", "Rideshare all week", "Borrow from a friend"],
186
+ resource_budget={"time": 6.0, "money": 500.0, "energy": 30.0},
187
+ difficulty=3
188
+ ),
189
+ ConflictEvent(
190
+ id="d4_rideshare_surge",
191
+ title="Surge Pricing Nightmare",
192
+ story="A major event cancelled all transit. Rideshares are 9x surge. You're presenting in 2 hours.",
193
+ primary_disruption={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0, "time.free_hours_per_week": -10.0},
194
+ decisions_required=["Pay the surge", "Organise a carpool", "Present remotely"],
195
+ resource_budget={"time": 4.0, "money": 200.0, "energy": 40.0},
196
+ difficulty=4
197
+ ),
198
+ ConflictEvent(
199
+ id="d5_transit_strike",
200
+ title="City-Wide Transit Strike",
201
+ story="All buses, trains, and rideshares are on indefinite strike. Your car is in the shop.",
202
+ primary_disruption={"time.commute_burden": 50.0, "finances.liquidity": -30.0, "career.workload": 20.0, "mental_wellbeing.stress_level": 25.0},
203
+ decisions_required=["Negotiate remote work for the week", "Rent an e-bike/scooter", "Crash at a colleague's place"],
204
+ resource_budget={"time": 15.0, "money": 400.0, "energy": 50.0},
205
+ difficulty=5
206
+ ),
207
+ ]
208
+
209
+ def generate_conflict(difficulty: int = None) -> ConflictEvent:
210
+ if difficulty:
211
+ pool = [t for t in TEMPLATES if t.difficulty == difficulty]
212
+ else:
213
+ pool = TEMPLATES
214
+ return random.choice(pool)
215
+
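A minimal sketch of the selection step in `generate_conflict`, for illustration only. The dict-based `TEMPLATES_DEMO`, the `pick_conflict` name, and the `rng` parameter are hypothetical and not part of the commit. Unlike the original, this version filters with `is None` and raises a descriptive error when no template matches, since `random.choice` raises `IndexError` on an empty pool.

```python
import random

# Hypothetical stand-in templates: only the fields this sketch needs.
TEMPLATES_DEMO = [
    {"id": "d1_bill", "difficulty": 1},
    {"id": "d2_car", "difficulty": 2},
    {"id": "d2_train_delay", "difficulty": 2},
]


def pick_conflict(templates, difficulty=None, rng=random):
    """Pick a random template, optionally restricted to one difficulty."""
    if difficulty is None:
        pool = list(templates)
    else:
        pool = [t for t in templates if t["difficulty"] == difficulty]
    if not pool:
        # random.choice would raise IndexError here; fail with a clearer message.
        raise ValueError(f"no templates at difficulty {difficulty!r}")
    return rng.choice(pool)


# Seeded generator so the demo is repeatable.
choice = pick_conflict(TEMPLATES_DEMO, difficulty=2, rng=random.Random(0))
print(choice["id"])
```

Passing a seeded `random.Random` instance keeps sampling reproducible in tests without touching the global random state.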
216
+ def escalate_conflict(conflict: ConflictEvent) -> ConflictEvent:
217
+ new_disruption = {k: v * 1.4 for k, v in conflict.primary_disruption.items()}
218
+ new_budget = {k: v * 0.7 for k, v in conflict.resource_budget.items()}
219
+ new_difficulty = min(5, conflict.difficulty + 1)
220
+
221
+ return ConflictEvent(
222
+ id=f"{conflict.id}_escalated",
223
+ title=f"ESCALATED: {conflict.title}",
224
+ story=f"The situation just got much worse. {conflict.story}",
225
+ primary_disruption=new_disruption,
226
+ decisions_required=conflict.decisions_required,
227
+ resource_budget=new_budget,
228
+ difficulty=new_difficulty
229
+ )
230
+
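To make the escalation arithmetic concrete, here is a hedged, self-contained sketch. The `ConflictEvent` dataclass below is a stand-in whose fields are inferred from the constructor calls in this diff (the real class definition is not shown here), and `escalate_sketch` is a hypothetical name. It applies the same 1.4x disruption and 0.7x budget scaling to the `d1_bill` template.

```python
from dataclasses import dataclass


@dataclass
class ConflictEvent:
    # Stand-in: fields inferred from the constructor calls in this diff.
    id: str
    title: str
    story: str
    primary_disruption: dict
    decisions_required: list
    resource_budget: dict
    difficulty: int


def escalate_sketch(conflict: ConflictEvent) -> ConflictEvent:
    """Scale disruptions up by 1.4x, budgets down to 0.7x, difficulty +1 (capped at 5)."""
    return ConflictEvent(
        id=f"{conflict.id}_escalated",
        title=f"ESCALATED: {conflict.title}",
        story=conflict.story,
        primary_disruption={k: v * 1.4 for k, v in conflict.primary_disruption.items()},
        decisions_required=conflict.decisions_required,
        resource_budget={k: v * 0.7 for k, v in conflict.resource_budget.items()},
        difficulty=min(5, conflict.difficulty + 1),
    )


bill = ConflictEvent(
    id="d1_bill",
    title="Forgotten Invoice",
    story="A late notice arrived for your electricity bill.",
    primary_disruption={"finances.liquidity": -20.0},
    decisions_required=["Pay it now"],
    resource_budget={"time": 1.0, "money": 100.0, "energy": 5.0},
    difficulty=1,
)
harder = escalate_sketch(bill)
print(harder.primary_disruption, harder.resource_budget, harder.difficulty)
```

The liquidity hit grows from -20 to about -28 while the money budget shrinks from 100 to about 70, so each escalation demands more with fewer resources. A template already at difficulty 5 stays at 5.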
231
+ def adaptive_escalate(conflict: ConflictEvent, agent_history: list) -> tuple:
232
+ """Decide whether to escalate, ease, or hold based on past performance.
233
+
234
+ Args:
235
+ conflict: Current conflict event.
236
+ agent_history: List of (conflict_id, reward) tuples from past episodes.
237
+
238
+ Returns:
239
+ (new_conflict, reason): Updated conflict and a human-readable reason string.
240
+ """
241
+ # Group history by conflict id prefix (strip _escalated suffix)
242
+ from collections import defaultdict
243
+ by_type = defaultdict(list)
244
+ for cid, reward in agent_history:
245
+ base_id = cid.replace("_escalated", "")
246
+ by_type[base_id].append(reward)
247
+
248
+ base_id = conflict.id.replace("_escalated", "")
249
+ past = by_type.get(base_id, [])
250
+
251
+ if len(past) >= 3:
252
+ avg = sum(past) / len(past)
253
+ if avg > 0.7:
254
+ # Agent is crushing this type: escalate
255
+ escalated = escalate_conflict(conflict)
256
+ return escalated, f"Agent averaged {avg:.2f} on {base_id} ({len(past)} runs); escalating"
257
+ elif avg < 0.4:
258
+ # Agent is struggling: reduce difficulty
259
+ new_diff = max(1, conflict.difficulty - 1)
260
+ eased = generate_conflict(difficulty=new_diff)
261
+ return eased, f"Agent averaged {avg:.2f} on {base_id} ({len(past)} runs); easing to difficulty {new_diff}"
262
+
263
+ # Fewer than 3 runs, or an average in the 0.4-0.7 band: no change
264
+ return conflict, "insufficient history or mid-range average; holding"
265
+
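The threshold logic in `adaptive_escalate` can be isolated as a small pure function. This is a sketch under stated assumptions: `decide` and its parameter names are hypothetical, and only the 3-run minimum and the 0.7 / 0.4 cut-offs come from the code above.

```python
def decide(past_rewards, min_runs=3, high=0.7, low=0.4):
    """Return 'escalate', 'ease', or 'hold' for one conflict type."""
    if len(past_rewards) < min_runs:
        return "hold"  # not enough history yet
    avg = sum(past_rewards) / len(past_rewards)
    if avg > high:
        return "escalate"
    if avg < low:
        return "ease"
    return "hold"  # enough history, but performance is mid-range


print(decide([0.9, 0.8, 0.85]))
print(decide([0.1, 0.2, 0.3]))
print(decide([0.5, 0.6, 0.5]))
print(decide([0.9, 0.9]))
```

Note that "hold" covers two distinct cases: too few runs, and an average that sits between the two thresholds.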
266
+ def save_templates():
267
+ import os
268
+ data_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "data", "conflicts.json")
269
+ with open(data_path, 'w') as f:
270
+ json.dump([asdict(t) for t in TEMPLATES], f, indent=4)
271
+ print(f"Saved {len(TEMPLATES)} templates to {data_path}")
272
+
273
+ def main():
274
+ save_templates()
275
+ print("\n--- GENERATED CONFLICT SAMPLES ---")
276
+ for d in range(1, 6):
277
+ c = generate_conflict(d)
278
+ print(f"\n[DIFFICULTY {d}] {c.title}")
279
+ print(f"Story: {c.story}")
280
+ print(f"Primary Disruption: {c.primary_disruption}")
281
+ print(f"Resource Budget: {c.resource_budget}")
282
+
283
+ if __name__ == "__main__":
284
+ main()
285
+
286
+ from core.task import Task, Route, ExoEvent, Milestone
287
+
288
+ class TaskGenerator:
289
+ def generate(self, domain: str = None, difficulty: int = None) -> Task:
290
+ diff = difficulty or 3
291
+ if domain == "transport_crisis":
292
+ return self.generate_transport_crisis(diff)
293
+ elif domain == "flight_crisis": # kept as explicit sub-type
294
+ return self.generate_flight_crisis(diff)
295
+ elif domain == "code_merge_crisis":
296
+ return self.generate_code_merge_crisis(diff)
297
+ elif domain == "career":
298
+ return self.generate_career(diff)
299
+ elif domain == "finances":
300
+ return self.generate_finances(diff)
301
+ elif domain == "relationships":
302
+ return self.generate_relationships(diff)
303
+ elif domain == "physical_health":
304
+ return self.generate_physical_health(diff)
305
+ elif domain == "mental_wellbeing":
306
+ return self.generate_mental_wellbeing(diff)
307
+ elif domain == "time":
308
+ return self.generate_time(diff)
309
+ else:
310
+ return self.generate_transport_crisis(diff)
311
+
312
+ # ── TRANSPORT CRISIS: master dispatcher ──────────────────────────────
313
+ def generate_transport_crisis(self, difficulty: int) -> Task:
314
+ """Randomly choose one of 5 real-world transport disruption modes."""
315
+ return random.choice([
316
+ self.generate_flight_crisis,
317
+ self.generate_train_delay,
318
+ self.generate_car_breakdown,
319
+ self.generate_rideshare_surge,
320
+ self.generate_transit_strike,
321
+ ])(difficulty)
322
+
323
+ def generate_train_delay(self, difficulty: int) -> Task:
324
+ routes = [
325
+ Route(id="dial_in", name="Dial In Remotely", description="Join the meeting via video call from the station.", required_action_types=["communicate"], preconditions={}, consequences={"meeting_attended": True}, closes_routes=["rideshare"], milestones_unlocked=["m1"], final_reward=2.0),
326
+ Route(id="rideshare", name="Take a Rideshare", description="Pay for a cab/rideshare and make it there in time.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"arrived_on_time": True}, closes_routes=["dial_in"], milestones_unlocked=["m2"], final_reward=2.5),
327
+ Route(id="reschedule", name="Reschedule the Meeting", description="Negotiate a new meeting time with all parties.", required_action_types=["communicate"], preconditions={}, consequences={"meeting_rescheduled": True}, closes_routes=[], milestones_unlocked=["m3"], final_reward=1.5),
328
+ ]
329
+ milestones = [
330
+ Milestone(id="m1", description="Meeting attended on time remotely.", condition_key="meeting_attended", condition_value=True, reward=1.0),
331
+ Milestone(id="m2", description="Made it to the office despite the delay.", condition_key="arrived_on_time", condition_value=True, reward=1.5),
332
+ Milestone(id="m3", description="Meeting rescheduled without relationship cost.", condition_key="meeting_rescheduled", condition_value=True, reward=0.8),
333
+ ]
334
+ events = [
335
+ ExoEvent(step=2, probability=0.8, id="delay_extended", description="Train delay extended by another 45 minutes.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
336
+ ExoEvent(step=4, probability=0.6, id="rideshare_surge", description="Rideshares now showing 3x surge pricing.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
337
+ ]
338
+ return Task(
339
+ id="train_delay_task", domain="transport_crisis", goal="Navigate Train Delay Crisis",
340
+ constraints={"budget_max": 150, "deadline_step": 8},
341
+ hidden_state={"platform_reassigned": False},
342
+ mutable_world={"time.commute_burden": 30.0, "mental_wellbeing.stress_level": 15.0},
343
+ visible_world={"time.commute_burden": 30.0, "mental_wellbeing.stress_level": 15.0},
344
+ success_conditions=[{"key": "meeting_attended", "value": True}, {"key": "arrived_on_time", "value": True}, {"key": "meeting_rescheduled", "value": True}],
345
+ failure_conditions=[{"key": "finances.liquidity", "value": 10.0, "op": "lt"}],
346
+ event_schedule=events, viable_routes=routes, milestones=milestones,
347
+ horizon=12 + difficulty * 2, difficulty=difficulty,
348
+ domain_metadata={"story": "Signal failure has brought the entire line to a halt.", "transport_mode": "train"}
349
+ )
350
+
351
+ def generate_car_breakdown(self, difficulty: int) -> Task:
352
+ routes = [
353
+ Route(id="rent_car", name="Rent a Replacement Car", description="Call a rental agency and get mobile again.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"mobile": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.5),
354
+ Route(id="rideshare_week", name="Rideshare for the Week", description="Use rideshares until the car is repaired.", required_action_types=["spend"], preconditions={}, consequences={"transport_sorted": True}, closes_routes=["rent_car"], milestones_unlocked=["m2"], final_reward=1.5),
355
+ Route(id="borrow_car", name="Borrow a Friend's Car", description="Call around and borrow a vehicle.", required_action_types=["communicate"], preconditions={}, consequences={"borrowed": True}, closes_routes=[], milestones_unlocked=["m3"], final_reward=2.0),
356
+ ]
357
+ milestones = [
358
+ Milestone(id="m1", description="Replacement vehicle secured.", condition_key="mobile", condition_value=True, reward=1.5),
359
+ Milestone(id="m2", description="Transport plan for the week sorted.", condition_key="transport_sorted", condition_value=True, reward=1.0),
360
+ Milestone(id="m3", description="Vehicle borrowed without relationship cost.", condition_key="borrowed", condition_value=True, reward=1.2),
361
+ ]
362
+ events = [
363
+ ExoEvent(step=2, probability=1.0, id="repair_estimate", description="Mechanic confirms repair takes 3–5 days, not 1.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
364
+ ExoEvent(step=5, probability=0.7, id="rental_shortage", description="Rental agencies report no compact cars available.", world_mutation={}, hidden_state_mutation={}, closes_routes=["rent_car"]),
365
+ ]
366
+ return Task(
367
+ id="car_breakdown_task", domain="transport_crisis", goal="Recover from Car Breakdown",
368
+ constraints={"budget_max": 500, "deadline_step": 10},
369
+ hidden_state={"tow_dispatched": False},
370
+ mutable_world={"finances.liquidity": -35.0, "time.commute_burden": 40.0},
371
+ visible_world={"finances.liquidity": -35.0, "time.commute_burden": 40.0},
372
+ success_conditions=[{"key": "mobile", "value": True}, {"key": "transport_sorted", "value": True}, {"key": "borrowed", "value": True}],
373
+ failure_conditions=[{"key": "finances.liquidity", "value": 0.0, "op": "le"}],
374
+ event_schedule=events, viable_routes=routes, milestones=milestones,
375
+ horizon=14 + difficulty * 2, difficulty=difficulty,
376
+ domain_metadata={"story": "Engine seized on the highway. Car is in the shop for days.", "transport_mode": "car"}
377
+ )
378
+
379
+ def generate_rideshare_surge(self, difficulty: int) -> Task:
380
+ routes = [
381
+ Route(id="pay_surge", name="Pay the Surge Price", description="Absorb the cost and get there on time.", required_action_types=["spend"], preconditions={}, consequences={"arrived": True}, closes_routes=["remote"], milestones_unlocked=["m1"], final_reward=2.0),
382
+ Route(id="carpool", name="Organise a Carpool", description="Find colleagues or strangers going the same way.", required_action_types=["communicate", "negotiate"], preconditions={}, consequences={"carpooled": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=3.0),
383
+ Route(id="remote", name="Present Remotely", description="Negotiate to dial in instead of attending in person.", required_action_types=["communicate"], preconditions={}, consequences={"remote_approved": True}, closes_routes=["pay_surge"], milestones_unlocked=["m3"], final_reward=1.5),
384
+ ]
385
+ milestones = [
386
+ Milestone(id="m1", description="Arrived at venue on time.", condition_key="arrived", condition_value=True, reward=1.5),
387
+ Milestone(id="m2", description="Carpool arranged at zero cost.", condition_key="carpooled", condition_value=True, reward=2.0),
388
+ Milestone(id="m3", description="Remote attendance approved.", condition_key="remote_approved", condition_value=True, reward=1.0),
389
+ ]
390
+ events = [
391
+ ExoEvent(step=1, probability=1.0, id="surge_spike", description="Surge jumped to 12x. All buses cancelled.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
392
+ ExoEvent(step=3, probability=0.9, id="meeting_reminder", description="Organiser sends a 30-minute warning.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
393
+ ]
394
+ return Task(
395
+ id="rideshare_surge_task", domain="transport_crisis", goal="Get to the Presentation on Time",
396
+ constraints={"budget_max": 200, "deadline_step": 6},
397
+ hidden_state={},
398
+ mutable_world={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0},
399
+ visible_world={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0},
400
+ success_conditions=[{"key": "arrived", "value": True}, {"key": "carpooled", "value": True}, {"key": "remote_approved", "value": True}],
401
+ failure_conditions=[],
402
+ event_schedule=events, viable_routes=routes, milestones=milestones,
403
+ horizon=8 + difficulty * 2, difficulty=difficulty,
404
+ domain_metadata={"story": "A major city event caused city-wide rideshare surge on your big presentation day.", "transport_mode": "rideshare"}
405
+ )
406
+
407
+ def generate_transit_strike(self, difficulty: int) -> Task:
408
+ routes = [
409
+ Route(id="wfh_negotiate", name="Negotiate Full Remote Week", description="Get manager approval to WFH for the strike duration.", required_action_types=["communicate", "negotiate"], preconditions={}, consequences={"wfh_approved": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=3.0),
410
+ Route(id="micromobility", name="Rent E-Bike / Scooter", description="Use micro-mobility for the week.", required_action_types=["spend"], preconditions={}, consequences={"transport_secured": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
411
+ Route(id="colleague_crash", name="Crash at a Colleague's Place", description="Stay near the office temporarily.", required_action_types=["communicate"], preconditions={}, consequences={"accommodation_sorted": True}, closes_routes=[], milestones_unlocked=["m3"], final_reward=1.5),
412
+ ]
413
+ milestones = [
414
+ Milestone(id="m1", description="WFH approved for the strike period.", condition_key="wfh_approved", condition_value=True, reward=2.0),
415
+ Milestone(id="m2", description="Micro-mobility solution in place.", condition_key="transport_secured", condition_value=True, reward=1.0),
416
+ Milestone(id="m3", description="Temporary accommodation sorted.", condition_key="accommodation_sorted", condition_value=True, reward=0.8),
417
+ ]
418
+ events = [
419
+ ExoEvent(step=2, probability=0.9, id="strike_extended", description="Union announces the strike could last 2 weeks.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
420
+ ExoEvent(step=5, probability=0.7, id="scooter_shortage", description="E-bike rental companies sold out in your area.", world_mutation={}, hidden_state_mutation={}, closes_routes=["micromobility"]),
421
+ ]
422
+ return Task(
423
+ id="transit_strike_task", domain="transport_crisis", goal="Survive City-Wide Transit Strike",
424
+ constraints={"budget_max": 400, "deadline_step": 14},
425
+ hidden_state={},
426
+ mutable_world={"time.commute_burden": 50.0, "mental_wellbeing.stress_level": 25.0},
427
+ visible_world={"time.commute_burden": 50.0, "mental_wellbeing.stress_level": 25.0},
428
+ success_conditions=[{"key": "wfh_approved", "value": True}, {"key": "transport_secured", "value": True}, {"key": "accommodation_sorted", "value": True}],
429
+ failure_conditions=[],
430
+ event_schedule=events, viable_routes=routes, milestones=milestones,
431
+ horizon=18 + difficulty * 2, difficulty=difficulty,
432
+ domain_metadata={"story": "All public transport workers walked off the job. The city is gridlocked.", "transport_mode": "transit_strike"}
433
+ )
434
+
435
+ def generate_flight_crisis(self, difficulty: int) -> Task:
436
+ routes = [
437
+ Route(id="rebook_premium", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], preconditions={}, consequences={"flight_rebooked": True}, closes_routes=["wait_lounge"], milestones_unlocked=["m1"], final_reward=2.5),
438
+ Route(id="wait_lounge", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], preconditions={}, consequences={"caught_up": True}, closes_routes=["rebook_premium"], milestones_unlocked=["m2"], final_reward=1.8),
439
+ ]
440
+ milestones = [
441
+ Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="flight_rebooked", condition_value=True, reward=1.0),
442
+ Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="caught_up", condition_value=True, reward=0.8),
443
+ ]
444
+ events = [
445
+ ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
446
+ ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={}, hidden_state_mutation={}, closes_routes=["wait_lounge"]),
447
+ ]
448
+ return Task(
449
+ id="flight_crisis_task", domain="flight_crisis", goal="Survive Airport Cancellation",
450
+ constraints={"budget_max": 800, "deadline_step": 10},
451
+ hidden_state={"lounge_capacity": 100},
452
+ mutable_world={"mental_wellbeing.stress_level": 25.0, "time.free_hours_per_week": -10.0},
453
+ visible_world={"mental_wellbeing.stress_level": 25.0, "time.free_hours_per_week": -10.0},
454
+ success_conditions=[{"key": "flight_rebooked", "value": True}, {"key": "caught_up", "value": True}],
455
+ failure_conditions=[],
456
+ event_schedule=events, viable_routes=routes, milestones=milestones,
457
+ horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "A major storm grounded commercial flights."}
458
+ )
459
+
460
+ def generate_code_merge_crisis(self, difficulty: int) -> Task:
461
+ routes = [
462
+ Route(id="revert_commit", name="Revert Commit", description="Quickly revert the broken merge to unblock the team.", required_action_types=["delegate", "communicate"], preconditions={}, consequences={"pipeline_unblocked": True}, closes_routes=["hotfix"], milestones_unlocked=["unblocked"], final_reward=1.5),
463
+ Route(id="hotfix", name="Patch Forward", description="Find the logic error and push a hotfix.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"bug_resolved": True}, closes_routes=["revert_commit"], milestones_unlocked=["fixed"], final_reward=3.0),
464
+ ]
465
+ milestones = [
466
+ Milestone(id="unblocked", description="CI pipeline is green again", condition_key="pipeline_unblocked", condition_value=True, reward=1.0),
467
+ Milestone(id="fixed", description="Bug resolved without losing features", condition_key="bug_resolved", condition_value=True, reward=2.0),
468
+ ]
469
+ events = [
470
+ ExoEvent(step=3, probability=0.8, id="cto_ping", description="CTO asks for an ETA on the fix.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
471
+ ]
472
+ return Task(
473
+ id="code_merge_task", domain="code_merge_crisis", goal="Resolve Production Outage",
474
+ constraints={"budget_max": 1000, "deadline_step": 8},
475
+ hidden_state={},
476
+ mutable_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
477
+ visible_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
478
+ success_conditions=[{"key": "pipeline_unblocked", "value": True}, {"key": "bug_resolved", "value": True}],
479
+ failure_conditions=[],
480
+ event_schedule=events, viable_routes=routes, milestones=milestones,
481
+ horizon=10 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "A botched merge just took down the staging environment."}
482
+ )
483
+
484
+ def generate_career(self, difficulty: int) -> Task:
485
+ routes = [
486
+ Route(id="r1", name="Negotiate Workload", description="Discuss with manager to reduce workload.", required_action_types=["communicate"], preconditions={}, consequences={"workload_reduced": True}, closes_routes=["r2"], milestones_unlocked=["m1"], final_reward=2.0),
487
+ Route(id="r2", name="Find New Job", description="Start applying for new roles.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"job_found": True}, closes_routes=["r1", "r3"], milestones_unlocked=["m2"], final_reward=3.0),
488
+ Route(id="r3", name="Delegate to Team", description="Push tasks to junior colleagues.", required_action_types=["delegate"], preconditions={}, consequences={"team_delegated": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=1.5),
489
+ ]
490
+ milestones = [
491
+ Milestone(id="m1", description="Manager agreed to reduce tasks.", condition_key="workload_reduced", condition_value=True, reward=1.0),
492
+ Milestone(id="m2", description="Interview secured.", condition_key="job_found", condition_value=True, reward=1.5),
493
+ Milestone(id="m3", description="Tasks successfully delegated.", condition_key="team_delegated", condition_value=True, reward=0.8),
494
+ ]
495
+ events = [
496
+ ExoEvent(step=3, probability=0.7, id="boss_asks", description="Boss asks for progress on current tasks.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
497
+ ]
498
+ return Task(
499
+ id="career_crisis", domain="career", goal="Manage Career Overload", constraints={"budget_max": 500, "deadline_step": 12},
500
+ hidden_state={},
501
+ mutable_world={"career.workload": 30.0, "time.free_hours_per_week": -20.0},
502
+ visible_world={"career.workload": 30.0, "time.free_hours_per_week": -20.0},
503
+ success_conditions=[{"key": "workload_reduced", "value": True}, {"key": "job_found", "value": True}, {"key": "team_delegated", "value": True}],
504
+ failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Severe workload is threatening your career stability."}
505
+ )
506
+
507
+ def generate_finances(self, difficulty: int) -> Task:
508
+ routes = [
509
+            Route(id="r1", name="Emergency Fund", description="Dip into savings.", required_action_types=["spend"], preconditions={}, consequences={"used_emergency": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=1.0),
+            Route(id="r2", name="Negotiate Payment Plan", description="Call the creditor to delay payments.", required_action_types=["communicate"], preconditions={}, consequences={"payment_plan": True}, closes_routes=["r1"], milestones_unlocked=["m2"], final_reward=2.5),
+            Route(id="r3", name="Sell Asset", description="Liquidate an asset for quick cash.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"asset_sold": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Emergency fund accessed.", condition_key="used_emergency", condition_value=True, reward=0.5),
+            Milestone(id="m2", description="Favorable payment plan negotiated.", condition_key="payment_plan", condition_value=True, reward=1.0),
+            Milestone(id="m3", description="Asset successfully sold.", condition_key="asset_sold", condition_value=True, reward=0.8),
+        ]
+        events = [
+            ExoEvent(step=2, probability=0.9, id="late_fee", description="A late fee was applied to the balance.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="finance_crisis", domain="finances", goal="Resolve Financial Pressure", constraints={"budget_max": 1000, "deadline_step": 10},
+            hidden_state={},
+            mutable_world={"finances.liquidity": -40.0, "finances.debt_pressure": 20.0},
+            visible_world={"finances.liquidity": -40.0, "finances.debt_pressure": 20.0},
+            success_conditions=[{"key": "used_emergency", "value": True}, {"key": "payment_plan", "value": True}, {"key": "asset_sold", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "An unexpected expense has caused financial strain."}
+        )
+
+    def generate_relationships(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Couples Therapy", description="Book a session with a therapist.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"therapy_scheduled": True}, closes_routes=["r3"], milestones_unlocked=["m1"], final_reward=3.0),
+            Route(id="r2", name="Honest Conversation", description="Sit down and talk through issues.", required_action_types=["communicate"], preconditions={}, consequences={"had_conversation": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
+            Route(id="r3", name="Give Space", description="Take some time apart.", required_action_types=["rest"], preconditions={}, consequences={"giving_space": True}, closes_routes=["r1", "r2"], milestones_unlocked=["m3"], final_reward=1.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Therapy session completed.", condition_key="therapy_scheduled", condition_value=True, reward=1.5),
+            Milestone(id="m2", description="A productive conversation occurred.", condition_key="had_conversation", condition_value=True, reward=1.0),
+            Milestone(id="m3", description="Space given without escalation.", condition_key="giving_space", condition_value=True, reward=0.5),
+        ]
+        events = [
+            ExoEvent(step=4, probability=0.6, id="partner_escalates", description="Partner sends an emotional text message.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="relationship_crisis", domain="relationships", goal="Repair Relationship Friction", constraints={"budget_max": 800, "deadline_step": 14},
+            hidden_state={},
+            mutable_world={"relationships.romantic": -30.0, "mental_wellbeing.stress_level": 20.0},
+            visible_world={"relationships.romantic": -30.0, "mental_wellbeing.stress_level": 20.0},
+            success_conditions=[{"key": "therapy_scheduled", "value": True}, {"key": "had_conversation", "value": True}, {"key": "giving_space", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Growing distance and recent conflicts demand attention."}
+        )
+
+    def generate_physical_health(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Medical Leave", description="Request time off to recover.", required_action_types=["communicate", "rest"], preconditions={}, consequences={"on_leave": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.5),
+            Route(id="r2", name="See Specialist", description="Pay for a top-tier medical consultation.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"saw_doctor": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
+            Route(id="r3", name="Lifestyle Change", description="Commit to better diet and sleep.", required_action_types=["rest"], preconditions={}, consequences={"lifestyle_changed": True}, closes_routes=["r1"], milestones_unlocked=["m3"], final_reward=1.5),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Leave approved.", condition_key="on_leave", condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Clear diagnosis received.", condition_key="saw_doctor", condition_value=True, reward=1.0),
+            Milestone(id="m3", description="First week of new habits complete.", condition_key="lifestyle_changed", condition_value=True, reward=0.5),
+        ]
+        events = [
+            ExoEvent(step=3, probability=0.8, id="doctor_call", description="The clinic calls with test results.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="health_crisis", domain="physical_health", goal="Address Health Warning", constraints={"budget_max": 1500, "deadline_step": 15},
+            hidden_state={},
+            mutable_world={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0},
+            visible_world={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0},
+            success_conditions=[{"key": "on_leave", "value": True}, {"key": "saw_doctor", "value": True}, {"key": "lifestyle_changed", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Physical symptoms are becoming impossible to ignore."}
+        )
+
+    def generate_mental_wellbeing(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Professional Therapy", description="Start regular therapy sessions.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"therapy_started": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=3.0),
+            Route(id="r2", name="Disconnect", description="Take a full digital detox break.", required_action_types=["rest"], preconditions={}, consequences={"disconnected": True}, closes_routes=["r3"], milestones_unlocked=["m2"], final_reward=1.5),
+            Route(id="r3", name="Medication Evaluation", description="See a psychiatrist for options.", required_action_types=["spend"], preconditions={}, consequences={"medication_taken": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=2.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Meaningful breakthrough in therapy.", condition_key="therapy_started", condition_value=True, reward=1.5),
+            Milestone(id="m2", description="Successfully unplugged for 48 hours.", condition_key="disconnected", condition_value=True, reward=0.8),
+            Milestone(id="m3", description="Prescription acquired.", condition_key="medication_taken", condition_value=True, reward=1.0),
+        ]
+        events = [
+            ExoEvent(step=2, probability=0.5, id="panic_attack", description="A sudden wave of severe anxiety hits.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="mental_crisis", domain="mental_wellbeing", goal="Avert Total Burnout", constraints={"budget_max": 600, "deadline_step": 12},
+            hidden_state={},
+            mutable_world={"mental_wellbeing.motivation": -35.0, "mental_wellbeing.stress_level": 40.0},
+            visible_world={"mental_wellbeing.motivation": -35.0, "mental_wellbeing.stress_level": 40.0},
+            success_conditions=[{"key": "therapy_started", "value": True}, {"key": "disconnected", "value": True}, {"key": "medication_taken", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Complete exhaustion and loss of motivation."}
+        )
+
+    def generate_time(self, difficulty: int) -> Task:
+        routes = [
+            Route(id="r1", name="Reprioritize", description="Restructure calendar and say 'no'.", required_action_types=["communicate"], preconditions={}, consequences={"priorities_reset": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.0),
+            Route(id="r2", name="Delegate", description="Pay someone or ask for help with chores.", required_action_types=["spend", "delegate"], preconditions={}, consequences={"tasks_delegated": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=1.5),
+            Route(id="r3", name="Cancel Commitments", description="Drop out of major upcoming events.", required_action_types=["communicate"], preconditions={}, consequences={"commitments_cancelled": True}, closes_routes=["r1"], milestones_unlocked=["m3"], final_reward=1.0),
+        ]
+        milestones = [
+            Milestone(id="m1", description="Calendar cleared of non-essentials.", condition_key="priorities_reset", condition_value=True, reward=1.0),
+            Milestone(id="m2", description="Help secured for daily tasks.", condition_key="tasks_delegated", condition_value=True, reward=0.8),
+            Milestone(id="m3", description="Social obligations cancelled.", condition_key="commitments_cancelled", condition_value=True, reward=0.5),
+        ]
+        events = [
+            ExoEvent(step=3, probability=0.9, id="new_request", description="A friend asks for an 'urgent' favor.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
+        ]
+        return Task(
+            id="time_crisis", domain="time", goal="Regain Time Control", constraints={"budget_max": 300, "deadline_step": 10},
+            hidden_state={},
+            mutable_world={"time.free_hours_per_week": -25.0, "time.admin_overhead": 20.0},
+            visible_world={"time.free_hours_per_week": -25.0, "time.admin_overhead": 20.0},
+            success_conditions=[{"key": "priorities_reset", "value": True}, {"key": "tasks_delegated", "value": True}, {"key": "commitments_cancelled", "value": True}],
+            failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "You are double-booked and drowning in obligations."}
+        )
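The task generators above wire routes together through `closes_routes` and scale the horizon with difficulty. How the environment consumes these fields is not shown in this part of the diff, so the following is only a minimal sketch under one plausible reading: taking a route removes every route it lists in `closes_routes`. `SketchRoute`, `remaining_routes`, and `horizon_for` are hypothetical names introduced for illustration; only the `closes_routes` values and the `15 + difficulty * 2` formula come from the code above.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRoute:
    """Hypothetical stand-in for Route; only the fields needed here."""
    id: str
    closes_routes: list = field(default_factory=list)


def remaining_routes(routes, taken_ids):
    """Return ids of routes that are neither taken nor closed by a taken route.

    Assumption: closing is one-directional and applies as soon as a route is taken.
    """
    closed = set()
    for route in routes:
        if route.id in taken_ids:
            closed.update(route.closes_routes)
    return [r.id for r in routes if r.id not in closed and r.id not in taken_ids]


def horizon_for(difficulty):
    """Horizon formula used by every generator above."""
    return 15 + difficulty * 2


# closes_routes values copied from the finance task: r2 closes r1, r3 closes r2.
finance = [
    SketchRoute("r1"),
    SketchRoute("r2", ["r1"]),
    SketchRoute("r3", ["r2"]),
]

print(remaining_routes(finance, {"r2"}))  # r1 is closed, r2 is taken
print(horizon_for(3))
```

Under this reading, the order in which routes are taken matters, since a route can shut off a higher-reward alternative.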
agent/conflict_predictor.py ADDED
@@ -0,0 +1,142 @@
+"""
+conflict_predictor.py - Proactive intelligence and trajectory forecasting
+"""
+
+import copy
+from core.life_state import LifeMetrics, DependencyGraph
+
+class ConflictPredictor:
+    def __init__(self):
+        self.graph = DependencyGraph()
+        self.snapshots = []  # list of flattened LifeMetrics dicts
+        self.MAX_HISTORY = 10
+        self.INVERSE_METRICS = {
+            "mental_wellbeing.stress_level",
+            "career.workload",
+            "finances.debt_pressure",
+            "time.commute_burden",
+            "time.admin_overhead"
+        }
+
+    def add_snapshot(self, metrics: LifeMetrics) -> None:
+        self.snapshots.append(metrics.flatten())
+        if len(self.snapshots) > self.MAX_HISTORY:
+            self.snapshots.pop(0)
+
+    def compute_trajectory(self, metric_path: str) -> float:
+        if len(self.snapshots) < 3:
+            return 0.0
+
+        # Use last 5 snapshots maximum
+        n = min(5, len(self.snapshots))
+        y = [s.get(metric_path, 0.0) for s in self.snapshots[-n:]]
+        x = list(range(n))
+
+        # Simple linear regression: slope = Cov(x, y) / Var(x)
+        mean_y = sum(y) / n
+        mean_x = sum(x) / n
+        cov_xy = sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y))
+        var_x = sum((x_i - mean_x) ** 2 for x_i in x)
+
+        if var_x == 0:
+            return 0.0
+        return cov_xy / var_x
+
+    def predict_crisis(self, horizon_days: int = 7) -> list:
+        if not self.snapshots:
+            return []
+
+        current = self.snapshots[-1]
+        warnings = []
+
+        for metric, val in current.items():
+            slope = self.compute_trajectory(metric)
+            if slope == 0.0:
+                continue
+
+            projected = val + (slope * horizon_days)
+            is_inverse = metric in self.INVERSE_METRICS
+
+            # Normal metric: Critical is low (<30), Warning is low (<45)
+            # Inverse metric: Critical is high (>70), Warning is high (>55)
+            critical_now = (val > 70) if is_inverse else (val < 30)
+            warning_now = (val > 55) if is_inverse else (val < 45)
+
+            critical_proj = (projected > 70) if is_inverse else (projected < 30)
+            warning_proj = (projected > 55) if is_inverse else (projected < 45)
+
+            worse_direction = (slope > 0) if is_inverse else (slope < 0)
+
+            if worse_direction and (critical_proj or warning_proj):
+                threshold = 70.0 if is_inverse else 30.0
+                days_until_crit = (threshold - val) / slope if slope != 0 else float('inf')
+
+                if critical_now:
+                    days_until_crit = 0.0
+
+                severity = 'crisis' if critical_proj else 'warning'
+                direction_word = "rising" if slope > 0 else "declining"
+                friendly_name = metric.split('.')[-1].replace('_', ' ')
+
+                if severity == 'crisis':
+                    msg = f"{friendly_name} will hit critical levels in {max(0, int(days_until_crit))} days."
+                else:
+                    msg = f"{friendly_name} has been {direction_word} ({slope:+.1f}/day) - warning levels likely within {horizon_days} days."
+
+                warnings.append({
+                    "metric": metric,
+                    "current_value": val,
+                    "projected_value": projected,
+                    "days_until_critical": max(0.0, days_until_crit),
+                    "severity": severity,
+                    "message": msg
+                })
+
+        # Sort by urgency (days until critical)
+        warnings.sort(key=lambda x: x['days_until_critical'])
+        return warnings
+
+    def get_prediction_summary(self) -> str:
+        warnings = self.predict_crisis()
+        if not warnings:
+            return "Your life metrics are stable. No immediate crises predicted."
+
+        messages = [w['message'] for w in warnings]
+        return "Based on your current trajectory: " + " ".join(messages[:3]) + ("" if len(messages) <= 3 else " (+ more warnings hidden).")
+
+    def get_risk_score(self) -> float:
+        warnings = self.predict_crisis()
+        if not warnings:
+            return 0.0
+
+        score = 0.0
+        for w in warnings:
+            if w['severity'] == 'crisis':
+                score += 0.3
+            else:
+                score += 0.1
+        return min(1.0, score)
+
+def main():
+    import random
+
+    predictor = ConflictPredictor()
+
+    print("Simulating 5 days of accumulating stress and declining sleep...\n")
+    current_state = LifeMetrics()
+
+    for i in range(5):
+        current_state.mental_wellbeing.stress_level += 5.0 + random.uniform(0, 2)
+        current_state.physical_health.sleep_quality -= 4.0 + random.uniform(0, 2)
+        current_state.time.free_hours_per_week -= 1.0 + random.uniform(0, 1)
+
+        predictor.add_snapshot(current_state)
+        print(f"Day {i+1}: Stress={current_state.mental_wellbeing.stress_level:.1f}, Sleep={current_state.physical_health.sleep_quality:.1f}")
+
+    print("\n--- PREDICTION AFTER 5 DAYS ---")
+    print(f"Risk Score: {predictor.get_risk_score():.2f}")
+    print("Summary:")
+    print(predictor.get_prediction_summary())
+
+if __name__ == '__main__':
+    main()
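The trajectory forecast in `conflict_predictor.py` rests on two small formulas: a least-squares slope over the most recent snapshots, and a linear extrapolation to a threshold. The sketch below reproduces that arithmetic on plain lists so it can be run without `core.life_state`. The function names `least_squares_slope` and `days_until` and the sample stress series are assumptions made for illustration; the thresholds (70 for inverse metrics) and the 3-snapshot minimum mirror the file above.

```python
def least_squares_slope(values, window=5):
    """Slope of the best-fit line through the last `window` values.

    x is the snapshot index (0, 1, 2, ...), so the slope is "units per snapshot".
    Returns 0.0 with fewer than 3 points, as compute_trajectory does.
    """
    if len(values) < 3:
        return 0.0
    ys = values[-window:]
    n = len(ys)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x if var_x else 0.0


def days_until(threshold, current, slope):
    """Linear extrapolation: snapshots until `current` reaches `threshold`."""
    if slope == 0:
        return float("inf")
    return max(0.0, (threshold - current) / slope)


# Hypothetical stress readings rising by 5 per day (an inverse metric: high is bad).
stress = [40.0, 45.0, 50.0, 55.0, 60.0]
slope = least_squares_slope(stress)
projected = stress[-1] + slope * 7          # 7-day horizon, as in predict_crisis
eta = days_until(70.0, stress[-1], slope)   # critical threshold for inverse metrics

print(f"slope={slope:+.1f}/day projected={projected:.1f} days_until_critical={eta:.1f}")
```

Because the fit is linear, a noisy series with one outlier can swing the slope noticeably over only five points, which may be worth keeping in mind when reading the predicted day counts.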
agent/counterfactuals.py ADDED
@@ -0,0 +1,106 @@
+"""
+counterfactuals.py - Generates alternative "What If" scenarios for LifeStack agent decisions.
+"""
+
+import copy
+import random
+from core.reward import compute_reward
+from core.life_state import DependencyGraph
+
+def generate_counterfactuals(agent, metrics, budget, conflict, person, chosen_action):
+    """
+    Simulates 3 alternative action types and compares them to the agent's choice.
+    Returns a list of dicts with alternative outcomes.
+    """
+    action_types = ["communicate", "rest", "delegate", "negotiate", "spend", "reschedule", "deprioritize"]
+    chosen_type = chosen_action.primary.action_type
+
+    # Filter and pick 3 different types
+    alternatives = [t for t in action_types if t != chosen_type]
+    random.shuffle(alternatives)
+    target_types = alternatives[:3]
+
+    results = []
+    graph = DependencyGraph()
+
+    for action_type in target_types:
+        try:
+            # 1. Generate alternative action
+            # We use the special forced-type method we added to the agent
+            alt_action = agent.get_action_for_type(metrics, budget, conflict, person, action_type)
+
+            # 2. Simulate applying it
+            current_stress = metrics.mental_wellbeing.stress_level
+            uptake = person.respond_to_action(
+                alt_action.primary.action_type,
+                alt_action.primary.resource_cost,
+                current_stress
+            )
+
+            state_after = copy.deepcopy(metrics)
+            for path, delta in alt_action.primary.metric_changes.items():
+                if "." not in path:
+                    continue
+                try:
+                    scaled_delta = float(delta) * uptake
+                except (ValueError, TypeError):
+                    continue
+
+                if abs(scaled_delta) > 5:
+                    state_after = graph.cascade(state_after, {path: scaled_delta})
+                else:
+                    # Split on the first dot only, so an unexpected extra dot
+                    # does not raise and abort the whole counterfactual.
+                    dom, sub = path.split('.', 1)
+                    d = getattr(state_after, dom, None)
+                    if d is not None:
+                        cur = getattr(d, sub, 70.0)
+                        setattr(d, sub, max(0.0, min(100.0, cur + scaled_delta)))
+
+            # 3. Compute Reward
+            reward, breakdown = compute_reward(metrics, state_after, alt_action.primary.resource_cost, 1)
+
+            # 4. Analysis deltas
+            flat_before = metrics.flatten()
+            flat_after = state_after.flatten()
+            deltas = {k: flat_after[k] - flat_before[k] for k in flat_after}
+
+            # Filter for meaningful changes (>1.0)
+            significant = {k: v for k, v in deltas.items() if abs(v) > 1.0}
+
+            trade_off = ""
+            if significant:
+                best = max(significant.items(), key=lambda x: x[1])
+                worst = min(significant.items(), key=lambda x: x[1])
+
+                b_name = best[0].split('.')[-1].replace('_', ' ')
+                if best[1] > 2:
+                    trade_off = f"Better {b_name} (+{best[1]:.0f})"
+                else:
+                    trade_off = f"Stability in {b_name}"
+
+                if worst[1] < -2:
+                    w_name = worst[0].split('.')[-1].replace('_', ' ')
+                    trade_off += f" but drops {w_name} ({worst[1]:.0f})"
+                else:
+                    trade_off += " but mission impact is lower than optimal."
+            else:
+                trade_off = "Minimal impact on core life metrics."
+
+            # Incorporate resource commentary
+            cost = alt_action.primary.resource_cost
+            if cost.get('money', 0) > 100:
+                trade_off += f" (${cost['money']:.0f} cost)"
+            elif cost.get('time', 0) > 4:
+                trade_off += f" ({cost['time']:.1f}h time drain)"
+
+            results.append({
+                "action_type": action_type,
+                "description": alt_action.primary.description,
+                "reward": reward,
+                "trade_off": trade_off,
+                "uptake": uptake,
+                "metrics": state_after.flatten(),
+            })
+
+        except Exception as e:
+            print(f"Error in counterfactual generation for {action_type}: {e}")
+
+    return results
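The core of `generate_counterfactuals` is step 2: each proposed metric change is scaled by the person's uptake, applied, and clamped to the 0 to 100 range before the deltas are compared. The sketch below isolates that step on a flat dict so it runs without the agent, `DependencyGraph`, or `compute_reward`. It deliberately omits the cascade branch used for large deltas, and `apply_scaled_changes`, `significant_deltas`, and the sample values are hypothetical names and data introduced for illustration.

```python
def apply_scaled_changes(metrics, changes, uptake):
    """Return a copy of `metrics` with each delta scaled by `uptake` and clamped to [0, 100].

    Non-numeric deltas and unknown metric paths are skipped rather than raising.
    """
    after = dict(metrics)
    for path, delta in changes.items():
        try:
            scaled = float(delta) * uptake
        except (TypeError, ValueError):
            continue
        if path not in after:
            continue
        after[path] = max(0.0, min(100.0, after[path] + scaled))
    return after


def significant_deltas(before, after, min_abs=1.0):
    """Keep only changes whose magnitude exceeds `min_abs` (1.0 in the file above)."""
    return {k: after[k] - before[k] for k in after if abs(after[k] - before[k]) > min_abs}


# Hypothetical starting state and proposed changes for one alternative action.
before = {"mental_wellbeing.stress_level": 90.0, "physical_health.energy": 70.0}
changes = {
    "mental_wellbeing.stress_level": -20,   # numeric delta
    "physical_health.energy": 80,           # would overshoot 100 and is clamped
    "time.free_hours_per_week": "n/a",      # malformed delta, skipped
}
after = apply_scaled_changes(before, changes, uptake=0.5)

print(after)
print(significant_deltas(before, after))
```

Note that a raw "largest positive delta" is not always the best outcome: for metrics where higher is worse, such as stress, a positive delta is a regression, so any ranking built on these deltas may need a sign flip for those metrics.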
agent/memory.py ADDED
@@ -0,0 +1,394 @@
+import os
+import chromadb
+from sentence_transformers import SentenceTransformer
+import uuid
+import math
+from datetime import datetime
+from collections import defaultdict
+from typing import Optional
+
+
+class LifeStackMemory:
+    def __init__(self, silent: bool = False, path: str = "./lifestack_memory"):
+        self.client = chromadb.PersistentClient(path=path)
+        self.collection = self.client.get_or_create_collection(name='decisions')
+        self.traj_collection = self.client.get_or_create_collection(name='trajectories')
+        self.feedback_collection = self.client.get_or_create_collection(name='feedback')  # New for OutcomeFeedback
+        self.silent = silent
+        self.encoder = self._load_encoder()
+        if not self.silent:
+            print("Memory system initialized")
+
+        # Auto-hydrate if empty
+        if self.collection.count() == 0:
+            self._hydrate_from_preseeded()
+
+    def _hydrate_from_preseeded(self):
+        import json
+        sources = ["./data/preseeded_memory_p1.json", "./data/preseeded_memory_p2.json"]
+
+        if not self.silent:
+            print("🧬 Empty memory detected. Hydrating from partitioned volumes...")
+
+        total_decisions = 0
+        for path in sources:
+            if not os.path.exists(path):
+                continue
+
+            try:
+                with open(path, 'r') as f:
+                    data = json.load(f)
+
+                # Hydrate decisions
+                d = data.get("decisions", {})
+                if d.get("ids"):
+                    self.collection.add(
+                        ids=d["ids"],
+                        documents=d["documents"],
+                        metadatas=d["metadatas"],
+                        embeddings=d["embeddings"]
+                    )
+                    total_decisions += len(d["ids"])
+            except Exception as e:
+                if not self.silent:
+                    print(f"⚠️ Hydration failed for {path}: {e}")
+
+        if not self.silent:
+            print(f"✅ Hydration complete: {total_decisions} memories restored.")
+
+    def _load_encoder(self):
+        try:
+            return SentenceTransformer('all-MiniLM-L6-v2', local_files_only=True)
+        except Exception as exc:
+            if not self.silent:
+                print(f"Falling back to local hash embeddings: {exc}")
+            return None
+
+    def _embed_text(self, text: str) -> list[float]:
+        if self.encoder is not None:
+            return self.encoder.encode(text).tolist()
+
+        import zlib
+        buckets = [0.0] * 384
+        for token in text.lower().split():
+            idx = zlib.adler32(token.encode()) % len(buckets)
+            buckets[idx] += 1.0
+
+        norm = math.sqrt(sum(v * v for v in buckets)) or 1.0
+        return [v / norm for v in buckets]
+
+    def store_decision(
+        self,
+        conflict_title: str,
+        action_type: str,
+        target_domain: str,
+        reward: float,
+        metrics_snapshot: dict,
+        reasoning: str,
+        trajectory: Optional[list[dict]] = None,
+        route_outcome: Optional[str] = None
+    ) -> None:
+        """Stores individual decision for longitudinal tracking.
+
+        Note: metrics_snapshot and trajectory are accepted but not persisted here.
+        """
+
+        text = f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}"
+        embedding = self._embed_text(text)
+
+        doc_id = str(uuid.uuid4())
+        self.collection.add(
+            ids=[doc_id],
+            embeddings=[embedding],
+            documents=[text],
+            metadatas=[{
+                "conflict_title": conflict_title,
+                "action_type": action_type,
+                "target_domain": target_domain,
+                "reward": float(reward),
+                "reasoning": reasoning,
+                "route_outcome": route_outcome or "",
+                "timestamp": datetime.now().isoformat()
+            }]
+        )
+
+    def store_trajectory(
+        self,
+        conflict_title: Optional[str] = None,
+        route_taken: Optional[str] = None,
+        total_reward: float = 0.0,
+        metrics_diff_str: Optional[str] = None,
+        reasoning: Optional[str] = None,
+        task_id: Optional[str] = None,
+        trajectory_summary: Optional[dict] = None
+    ) -> None:
+        """Stores a full trajectory summary."""
+
+        if trajectory_summary is not None and task_id is not None:
+            import json
+            text = f"Task: {task_id} Route: {route_taken} Reward: {total_reward:.2f} Hits: {len(trajectory_summary.get('milestones_hit', []))}"
+            embedding = self._embed_text(text)
+            doc_id = str(uuid.uuid4())
+            self.traj_collection.add(
+                ids=[doc_id],
+                embeddings=[embedding],
+                documents=[text],
+                metadatas=[{
+                    "task_id": task_id,
+                    "route_taken": route_taken or "",
+                    "reward": float(total_reward),
+                    "summary": json.dumps(trajectory_summary),
+                    "timestamp": datetime.now().isoformat()
+                }]
+            )
+            if not self.silent:
+                print(f"Stored task trajectory: {route_taken} (reward: {total_reward:.2f})")
+            return
+
+        # Fallback to older signature logic.
+        # Every argument defaults to None, so coerce missing fields to empty
+        # strings: slicing None would raise, and None should not reach the metadata.
+        reasoning = reasoning or ""
+        text = f"{conflict_title} Route: {route_taken} Diff: {metrics_diff_str} {reasoning[:100]}"
+        embedding = self._embed_text(text)
+
+        doc_id = str(uuid.uuid4())
+        self.collection.add(
+            ids=[doc_id],
+            embeddings=[embedding],
+            documents=[text],
+            metadatas=[{
+                "conflict_title": conflict_title or "",
+                "route_taken": route_taken or "",
+                "metrics_diff": metrics_diff_str or "",
+                "reward": float(total_reward),
+                "reasoning": reasoning,
+                "timestamp": datetime.now().isoformat()
+            }]
+        )
+        if not self.silent:
+            print(f"Stored trajectory fallback: {route_taken} (reward: {total_reward:.2f})")
+
+    def store_feedback(self, feedback) -> None:
+        """Stores OutcomeFeedback linked to a specific episode."""
+        import json
+        text = f"Episode: {feedback.episode_id} Effectiveness: {feedback.overall_effectiveness} Resolution: {feedback.resolution_time_hours}h"
+        embedding = self._embed_text(text)
+
+        doc_id = f"fb_{feedback.episode_id}"
+        self.feedback_collection.add(
+            ids=[doc_id],
+            embeddings=[embedding],
+            documents=[text],
+            metadatas=[{
+                "episode_id": feedback.episode_id,
+                "effectiveness": feedback.overall_effectiveness,
+                "domains_improved": json.dumps(feedback.domains_improved),
+                "domains_worsened": json.dumps(feedback.domains_worsened),
+                "unexpected_effects": feedback.unexpected_effects,
+                "resolution_time": feedback.resolution_time_hours,
+                "timestamp": feedback.submitted_at.isoformat()
+            }]
+        )
+        if not self.silent:
+            print(f"Stored human feedback for episode {feedback.episode_id}")
+
+    def retrieve_feedback(self, episode_id: str) -> Optional[dict]:
+        """Retrieves feedback for a specific episode."""
+        import json
+        doc_id = f"fb_{episode_id}"
+        results = self.feedback_collection.get(ids=[doc_id])
+
+        if not results['metadatas']:
+            return None
+
+        meta = results['metadatas'][0]
+        # Deserialize lists
+        meta["domains_improved"] = json.loads(meta["domains_improved"])
+        meta["domains_worsened"] = json.loads(meta["domains_worsened"])
+        return meta
+
+    def retrieve_similar_trajectories(self, task_domain: str, current_world: dict, n: int = 3) -> list[dict]:
+        """Retrieve similar trajectories based on task domain and current world state."""
+        import json
+        if self.traj_collection.count() == 0:
+            return []
+
+        sorted_metrics = sorted(current_world.items(), key=lambda x: x[1] if isinstance(x[1], (int, float)) else 0)
+        top_stressed = " ".join(f"{k}:{v}" for k, v in sorted_metrics[:3])
+        query_text = f"TaskDomain: {task_domain} {top_stressed}"
+
+        query_embedding = self._embed_text(query_text)
+        results = self.traj_collection.query(
+            query_embeddings=[query_embedding],
+            n_results=min(n, self.traj_collection.count())
+        )
+
+        output = []
+        for i, meta in enumerate(results['metadatas'][0]):
+            output.append({
+                "task_id": meta.get("task_id", ""),
+                "route_taken": meta.get("route_taken", ""),
+                "reward": meta.get("reward", 0.0),
+                "summary": json.loads(meta.get("summary", "{}")),
+            })
+        return output
+
+    def retrieve_similar(self, conflict_title: str, current_metrics: dict, n: int = 3) -> list[dict]:
+        """Retrieves the n most similar past high-reward decisions using semantic search."""
+        if self.collection.count() == 0:
+            return []
+
+        # Build query from conflict title + 3 most stressed metrics (lowest values)
+        sorted_metrics = sorted(current_metrics.items(), key=lambda x: x[1])
+        top_stressed = " ".join(f"{k}:{v:.0f}" for k, v in sorted_metrics[:3])
+        query_text = f"{conflict_title} {top_stressed}"
+
+        query_embedding = self._embed_text(query_text)
+        results = self.collection.query(
+            query_embeddings=[query_embedding],
+            n_results=min(n * 2, self.collection.count())  # Retrieve more to filter for high reward
+        )
+
+        output = []
+        for i, meta in enumerate(results['metadatas'][0]):
+            if meta.get("reward", 0.0) < 0.05:  # Filter out negative/zero reward decisions
+                continue
+            if len(output) >= n:
+                break
+            distance = results['distances'][0][i]
+            similarity = round(1.0 / (1.0 + distance), 4)
+            output.append({
+                "route_taken": meta.get("route_taken", ""),
+                "action_type": meta.get("action_type", ""),
+                "target_domain": meta.get("target_domain", ""),
+                "metrics_diff": meta.get("metrics_diff", ""),
+                "reward": meta.get("reward", 0.0),
+                "reasoning": meta.get("reasoning", ""),
+                "similarity_score": similarity
+            })
+
+        return output
+
+    def build_few_shot_prompt(self, conflict_title: str, current_metrics: dict) -> str:
+        """Formats retrieved memories into a few-shot prompt block for the LLM."""
+        memories = self.retrieve_similar(conflict_title, current_metrics)
+        if not memories:
+            return ""
+
+        lines = ["Past successful trajectories in similar situations:\n"]
+        for m in memories:
+            short_reason = m['reasoning'][:80]
+            lines.append(
+                f" Route [{m['route_taken']}] → impact [{m['metrics_diff']}] → total reward {m['reward']:.2f} "
+                f"(reasoning: {short_reason}...)"
+            )
+
+        return "\n".join(lines)
+
+    def get_stats(self) -> dict:
+        """Returns memory stats: total count, average reward, and counts keyed by the first word of each record's route."""
+        if self.collection.count() == 0:
+            # Use the same keys as the non-empty case so callers can rely on them.
+            return {"total_memories": 0, "average_reward": 0.0, "by_action_type": {}}
+
+        all_records = self.collection.get(include=["metadatas"])
+        metadatas = all_records["metadatas"]
+
+        total = len(metadatas)
+        avg_reward = sum(m.get("reward", 0.0) for m in metadatas) / total
+
+        by_route = defaultdict(int)
+        for m in metadatas:
+            route = m.get("route_taken") or m.get("route_outcome") or "unknown"
+            first_action = route.split(' ')[0] if route else "unknown"
+            by_route[first_action] += 1
+
+        return {
+            "total_memories": total,
+            "average_reward": round(avg_reward, 3),
+            "by_action_type": dict(by_route)
+        }
+
+
307
+ def main():
308
+ memory = LifeStackMemory()
309
+
310
+ # --- Synthetic Decisions: mix of high and low reward ---
311
+ synthetic = [
312
+ {
313
+ "conflict_title": "Friday 6PM",
314
+ "action_type": "negotiate",
315
+ "target_domain": "career",
316
+ "reward": 0.72,
317
+ "metrics_snapshot": {"career.workload": 100, "mental_wellbeing.stress_level": 95},
318
+ "reasoning": "Negotiating the deadline directly reduced workload pressure quickly."
319
+ },
320
+ {
321
+ "conflict_title": "Friday 6PM",
322
+ "action_type": "rest",
323
+ "target_domain": "mental_wellbeing",
324
+ "reward": 0.61,
325
+ "metrics_snapshot": {"mental_wellbeing.stress_level": 95, "physical_health.energy": 40},
326
+ "reasoning": "A short rest during peak stress restored energy before tackling logistics."
327
+ },
328
+ {
329
+ "conflict_title": "The Perfect Storm",
330
+ "action_type": "communicate",
331
+ "target_domain": "relationships",
332
+ "reward": 0.58,
333
+ "metrics_snapshot": {"relationships.romantic": 45, "mental_wellbeing.emotional_stability": 50},
334
+ "reasoning": "A quick reassuring call prevented relationship collapse under crisis."
335
+ },
336
+ {
337
+ "conflict_title": "The Perfect Storm",
338
+ "action_type": "delegate",
339
+ "target_domain": "career",
340
+ "reward": 0.38, # Below threshold β€” should NOT be stored
341
+ "metrics_snapshot": {"career.workload": 90, "career.stability": 55},
342
+ "reasoning": "Attempted to delegate but the neurotic profile made it ineffective."
343
+ },
344
+ {
345
+ "conflict_title": "Health Scare",
346
+ "action_type": "rest",
347
+ "target_domain": "physical_health",
348
+ "reward": 0.80,
349
+ "metrics_snapshot": {"physical_health.energy": 20, "mental_wellbeing.stress_level": 90},
350
+ "reasoning": "Aggressive rest protocol dramatically recovered energy and clarity."
351
+ },
352
+ {
353
+ "conflict_title": "Check Engine Light",
354
+ "action_type": "spend",
355
+ "target_domain": "finances",
356
+ "reward": 0.33, # Below threshold β€” should NOT be stored
357
+ "metrics_snapshot": {"finances.liquidity": 40, "time.commute_burden": 80},
358
+ "reasoning": "Overspent on premium repair, draining liquidity buffer dangerously."
359
+ },
360
+ ]
361
+
362
+ print("\n--- STORING SYNTHETIC DECISIONS ---")
363
+ for d in synthetic:
364
+ memory.store_decision(**d)
365
+
366
+ # --- Retrieve similar decisions ---
367
+ print("\n--- RETRIEVING SIMILAR DECISIONS ---")
368
+ test_metrics = {
369
+ "career.workload": 95,
370
+ "mental_wellbeing.stress_level": 90,
371
+ "finances.liquidity": 35,
372
+ "physical_health.energy": 50,
373
+ "relationships.romantic": 70
374
+ }
375
+ similar = memory.retrieve_similar("Friday 6PM", test_metrics, n=3)
376
+ for s in similar:
377
+ print(f" [{s['action_type']}] → {s['target_domain']} | reward: {s['reward']:.2f} | similarity: {s['similarity_score']:.4f}")
378
+ print(f" Reasoning: {s['reasoning'][:80]}...")
379
+
380
+ # --- Few-shot prompt ---
381
+ print("\n--- FEW-SHOT PROMPT OUTPUT ---")
382
+ prompt = memory.build_few_shot_prompt("Friday 6PM", test_metrics)
383
+ print(prompt if prompt else "(No relevant memories found)")
384
+
385
+ # --- Stats ---
386
+ print("\n--- MEMORY STATS ---")
387
+ stats = memory.get_stats()
388
+ print(f"Total Memories : {stats['total_memories']}")
389
+ print(f"Average Reward : {stats['average_reward']}")
390
+ print(f"By Action Type : {stats.get('by_action_type', stats.get('by_route_start'))}")
391
+
392
+
393
+ if __name__ == "__main__":
394
+ main()
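The synthetic decisions in the test script above mark two entries as "below threshold", so the store is expected to drop them. A minimal sketch of that gate and of similarity-ranked retrieval follows. It assumes a cutoff of 0.5 (the real value is not visible in this diff) and uses plain cosine similarity over metric snapshots in place of whatever vector store `LifeStackMemory` actually uses. `TinyMemory` and `REWARD_THRESHOLD` are hypothetical names introduced only for illustration.

```python
import math

REWARD_THRESHOLD = 0.5  # assumption: the real cutoff is not shown in this diff


class TinyMemory:
    """Hypothetical stand-in for LifeStackMemory: keeps high-reward decisions only."""

    def __init__(self, threshold=REWARD_THRESHOLD):
        self.threshold = threshold
        self.items = []

    def store_decision(self, **decision):
        """Keep the decision if its reward clears the threshold; return whether it was kept."""
        if decision["reward"] < self.threshold:
            return False
        self.items.append(decision)
        return True

    @staticmethod
    def _similarity(a, b):
        """Cosine similarity over the union of metric keys (missing keys count as 0)."""
        keys = sorted(set(a) | set(b))
        va = [float(a.get(k, 0.0)) for k in keys]
        vb = [float(b.get(k, 0.0)) for k in keys]
        dot = sum(x * y for x, y in zip(va, vb))
        norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
        return dot / norm if norm else 0.0

    def retrieve_similar(self, conflict_title, metrics, n=3):
        """Return the n stored decisions closest to `metrics` (the title is ignored in this sketch)."""
        scored = [
            dict(d, similarity_score=self._similarity(metrics, d["metrics_snapshot"]))
            for d in self.items
        ]
        return sorted(scored, key=lambda d: d["similarity_score"], reverse=True)[:n]


if __name__ == "__main__":
    memory = TinyMemory()
    memory.store_decision(
        action_type="rest", reward=0.61,
        metrics_snapshot={"mental_wellbeing.stress_level": 95, "physical_health.energy": 40},
    )
    memory.store_decision(
        action_type="delegate", reward=0.38,  # below the assumed cutoff, so it is dropped
        metrics_snapshot={"career.workload": 90, "career.stability": 55},
    )
    query = {"mental_wellbeing.stress_level": 90, "physical_health.energy": 50}
    for hit in memory.retrieve_similar("Friday 6PM", query):
        print(hit["action_type"], round(hit["similarity_score"], 4))
```

Gating at write time keeps low-reward actions out of the few-shot prompt without any filtering at retrieval, which appears to be the behaviour the test comments expect.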
app.py ADDED
@@ -0,0 +1,1284 @@
1
+ """
2
+ app.py — LifeStack Gradio Demo App
3
+ Hackathon presentation interface for the LifeStack simulation engine.
4
+ """
5
+
6
+ import os
7
+ import json
8
+ import copy
9
+ import gradio as gr
10
+ import matplotlib
11
+ matplotlib.use("Agg")
12
+ import matplotlib.pyplot as plt
13
+
14
+ # ─── LifeStack modules ────────────────────────────────────────────────────────
15
+ from core.life_state import LifeMetrics, ResourceBudget
16
+ from core.lifestack_env import LifeStackEnv, LifeStackAction
17
+ from agent.agent import LifeStackAgent
18
+ from intake.simperson import SimPerson
19
+ from agent.conflict_generator import ConflictEvent, generate_conflict, TEMPLATES
20
+ from core.action_space import apply_action, validate_action
21
+ from agent.memory import LifeStackMemory
22
+ from core.metric_schema import normalize_metric_path, is_valid_metric_path
23
+ from core.reward import compute_reward
24
+ from intake.intake import LifeIntake
25
+ from agent.conflict_predictor import ConflictPredictor
26
+ from agent.counterfactuals import generate_counterfactuals
27
+ from scripts.longitudinal_demo import LongitudinalDemo
28
+ from intake.gmail_intake import GmailIntake
29
+ from core.task import Task, ExoEvent, Route, Milestone
30
+ from core.feedback import OutcomeFeedback, compute_human_feedback_reward
31
+
32
+ # ─── Pre-load at startup ──────────────────────────────────────────────────────
33
+ print("🚀 LifeStack booting…")
34
+
35
+ AGENT = LifeStackAgent()
36
+ MEMORY = LifeStackMemory(silent=True)
37
+ INTAKE = LifeIntake()
38
+ GMAIL = GmailIntake()
39
+ LONG_DEMO = LongitudinalDemo()
40
+
41
+ # Pre-seed Arjun's 3-week context into ChromaDB on startup
42
+ LONG_DEMO.pre_seed_arjun()
43
+
44
+ # Friday 6PM is always the default demo conflict
45
+ DEMO_CONFLICT = next(t for t in TEMPLATES if t.id == "d5_friday")
46
+
47
+ PERSONS = {
48
+ "Alex (Executive) — driven, high-stress":
49
+ SimPerson(openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8, name="Alex (Executive)"),
50
+ "Chloe (Creative) — spontaneous, resilient":
51
+ SimPerson(openness=0.9, conscientiousness=0.2, extraversion=0.5, agreeableness=0.70, neuroticism=0.15, name="Chloe (Creative)"),
52
+ "Sam (Introvert) — anxious, thoughtful":
53
+ SimPerson(openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9, name="Sam (Introvert)"),
54
+ "Maya (Family) — empathetic, nurturing":
55
+ SimPerson(openness=0.5, conscientiousness=0.7, extraversion=0.5, agreeableness=0.95, neuroticism=0.3, name="Maya (Family)"),
56
+ "Leo (Student) — curious, organised":
57
+ SimPerson(openness=0.85, conscientiousness=0.8, extraversion=0.4, agreeableness=0.4, neuroticism=0.55, name="Leo (Student)"),
58
+ "Arjun (Startup Lead) — high-conscientiousness, high-neuroticism":
59
+ SimPerson(name="Arjun", openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8),
60
+ }
61
+
62
+ CONFLICT_CHOICES = {f"[Diff {t.difficulty}] {t.title}": t for t in TEMPLATES}
63
+ PERSON_CHOICES = list(PERSONS.keys())
64
+ CONFLICT_CHOICES_LIST = list(CONFLICT_CHOICES.keys())
65
+ DEFAULT_CONFLICT = next(k for k in CONFLICT_CHOICES_LIST if "Friday 6PM" in k)
66
+
67
+ DEMO_PREDICTOR = ConflictPredictor()
68
+
69
+ print("✅ LifeStack ready.")
70
+
71
+ # ─── Helpers ──────────────────────────────────────────────────────────────────
72
+ DOMAIN_EMOJI = {
73
+ "career": "💼", "finances": "💰", "relationships": "❤️",
74
+ "physical_health": "💪", "mental_wellbeing": "🧠", "time": "📅",
75
+ }
76
+
77
+ # Metrics where HIGH = BAD (inverted color logic)
78
+ INVERTED_METRICS = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
79
+
80
+ def _metric_color(key: str, val: float) -> str:
81
+ """Return CSS color: inverted for 'bad-when-high' metrics."""
82
+ sub = key.split(".")[-1]
83
+ if sub in INVERTED_METRICS:
84
+ return "#f87171" if val > 70 else ("#facc15" if val >= 40 else "#4ade80")
85
+ return "#4ade80" if val > 70 else ("#facc15" if val >= 40 else "#f87171")
86
+
87
+ def metrics_html(flat: dict, title: str = "", before: dict = None) -> str:
88
+ """Render metrics as coloured progress bars.
89
+ If `before` is supplied, metrics that changed >1 pt show ↑/↓ + delta.
90
+ """
91
+ domains = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
92
+ rows = []
93
+ if title:
94
+ rows.append(f"<h3 style='margin:0 0 8px;font-size:14px;color:#aaa'>{title}</h3>")
95
+ for dom in domains:
96
+ emoji = DOMAIN_EMOJI[dom]
97
+ rows.append(f"<div style='margin:6px 0 2px;font-size:12px;font-weight:700;color:#ccc'>{emoji} {dom.upper()}</div>")
98
+ sub = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
99
+ for key, val in sub.items():
100
+ name = key.split(".")[1].replace("_", " ")
101
+ color = _metric_color(key, val)
102
+ pct = min(val, 100)
103
+
104
+ delta_str = ""
105
+ if before is not None and key in before:
106
+ delta = val - before[key]
107
+ if abs(delta) > 1.0:
108
+ arrow = "↑" if delta > 0 else "↓"
109
+ dc = "#4ade80" if (delta > 0) != (key.split(".")[-1] in INVERTED_METRICS) else "#f87171" # green only when the change is beneficial
110
+ delta_str = (
111
+ f"<span style='font-size:10px;color:{dc};margin-left:4px;font-weight:700'>"
112
+ f"{arrow} ({delta:+.1f})</span>"
113
+ )
114
+
115
+ rows.append(
116
+ f"<div style='display:flex;align-items:center;gap:6px;margin:2px 0'>"
117
+ f" <span style='width:140px;font-size:11px;color:#bbb'>{name}</span>"
118
+ f" <div style='flex:1;background:#333;border-radius:4px;height:10px'>"
119
+ f" <div style='width:{pct}%;background:{color};border-radius:4px;height:10px'></div>"
120
+ f" </div>"
121
+ f" <span style='width:38px;font-size:11px;color:#ccc;text-align:right'>{val:.1f}</span>"
122
+ f" {delta_str}"
123
+ f"</div>"
124
+ )
125
+ return "<div style='font-family:monospace;padding:8px'>" + "\n".join(rows) + "</div>"
126
+
127
+
128
+ def _init_env(conflict: ConflictEvent) -> LifeStackEnv:
129
+ env = LifeStackEnv()
130
+ env.reset(conflict=conflict.primary_disruption, budget=conflict.resource_budget)
131
+ return env
132
+
133
+
134
+ def task_html(task: Task) -> str:
135
+ if not task:
136
+ return "<div style='color:#888; font-style:italic'>No active task</div>"
137
+ routes_html = "".join([f"<li style='margin-bottom:6px;'><b>{r.name}</b>: {r.description} <br><span style='font-size:11px;color:#aaa'>Req. Actions: {r.required_action_types} | Reward: +{r.final_reward}</span></li>" for r in task.viable_routes])
138
+ if not routes_html: routes_html = "<li style='color:#888'>No routes</li>"
139
+
140
+ milestones_html = "".join([f"<li style='margin-bottom:6px;'><b>{m.id}</b>: {m.description} <br><span style='font-size:11px;color:#4ade80'>Reward: +{m.reward}</span></li>" for m in task.milestones])
141
+ if not milestones_html: milestones_html = "<li style='color:#888'>No milestones</li>"
142
+
143
+ return f"""
144
+ <div style='background:#1a1a2e; padding: 16px; border-radius: 8px; border: 1px solid #333; font-family: sans-serif'>
145
+ <h3 style='color:#a78bfa; margin: 0 0 8px 0; font-size: 16px;'>🎯 Goal: {task.goal}</h3>
146
+ <div style='color:#bbb; font-size: 13px; margin-bottom: 12px'>
147
+ Domain: <b>{task.domain}</b> | Difficulty: <b>{task.difficulty}/5</b> | Horizon: <b>{task.horizon} steps</b>
148
+ </div>
149
+ <div style='background:#0d1b2a; padding: 8px; border-radius: 6px; margin-bottom: 12px;'>
150
+ <b style='color:#60a5fa; font-size: 12px;'>CONSTRAINTS:</b>
151
+ <span style='color:#ddd; font-size: 12px; font-family: monospace;'>{task.constraints}</span>
152
+ </div>
153
+ <div style='display: flex; gap: 16px;'>
154
+ <div style='flex: 1; background:#1e1e2f; padding: 12px; border-radius: 6px;'>
155
+ <b style='color:#4ade80; font-size: 13px; border-bottom: 1px solid #333; display: block; padding-bottom: 4px; margin-bottom: 8px'>🛣️ Viable Routes</b>
156
+ <ul style='color:#ddd; padding-left: 20px; font-size: 12px; margin: 0;'>{routes_html}</ul>
157
+ </div>
158
+ <div style='flex: 1; background:#1e1e2f; padding: 12px; border-radius: 6px;'>
159
+ <b style='color:#fbbf24; font-size: 13px; border-bottom: 1px solid #333; display: block; padding-bottom: 4px; margin-bottom: 8px'>⭐ Milestones</b>
160
+ <ul style='color:#ddd; padding-left: 20px; font-size: 12px; margin: 0;'>{milestones_html}</ul>
161
+ </div>
162
+ </div>
163
+ </div>
164
+ """
165
+
166
+ def event_log_html(events: list[ExoEvent]) -> str:
167
+ if not events:
168
+ return "<div style='color:#888; font-style:italic; padding: 12px;'>No events triggered yet.</div>"
169
+ rows = []
170
+ for e in events:
171
+ rows.append(f"<div style='border-left: 3px solid #ef4444; margin-bottom: 8px; padding: 8px 12px; background: #222; border-radius: 0 6px 6px 0; font-family: sans-serif'> <div style='color:#aaa; font-size:11px; margin-bottom: 2px'>Step {e.step}</div> <div style='color:#ddd; font-size: 13px;'><b style='color:#ef4444'>{e.id.upper()}</b>: {e.description}</div> </div>")
172
+ return "<div style='max-height: 400px; overflow-y: auto; padding-right: 4px;'>" + "\n".join(rows) + "</div>"
173
+
174
+ def route_status_html(routes: list[Route], closed: set[str]) -> str:
175
+ if not routes:
176
+ return "<div style='color:#888; font-style:italic; padding: 12px;'>No routes configured.</div>"
177
+ rows = []
178
+ for r in routes:
179
+ if r.id in closed:
180
+ icon, color = "❌", "#f87171"
181
+ status = "CLOSED"
182
+ else:
183
+ icon, color = "✅", "#4ade80"
184
+ status = "OPEN"
185
+ rows.append(f"<div style='display:flex; justify-content:space-between; align-items: center; margin-bottom: 8px; border-bottom: 1px solid #333; padding-bottom: 8px; font-family: sans-serif;'> <div style='display:flex; align-items:center; gap: 8px'><span style='font-size: 16px'>{icon}</span> <span style='color:#ddd; font-size: 13px; font-weight: 500'>{r.name}</span></div> <span style='color:{color}; font-size:12px; font-weight:bold; background: rgba(0,0,0,0.3); padding: 2px 6px; border-radius: 4px;'>{status}</span> </div>")
186
+ return "<div style='background:#1e1e2f; padding: 16px; border-radius: 8px; border: 1px solid #333;'>" + "\n".join(rows) + "</div>"
187
+
188
+
189
+ def _normalize_action_metric_changes(action) -> None:
190
+ fixed_changes = {}
191
+ for path, delta in action.primary.metric_changes.items():
192
+ raw_path = str(path)
193
+ if "." not in raw_path:
194
+ raw_path = f"{action.primary.target_domain}.{raw_path}"
195
+ norm_path = normalize_metric_path(raw_path)
196
+ if not is_valid_metric_path(norm_path):
197
+ continue
198
+ try:
199
+ fixed_changes[norm_path] = float(delta)
200
+ except (ValueError, TypeError):
201
+ continue
202
+ action.primary.metric_changes = fixed_changes
203
+
204
+
205
+ # ─── Cascade Animation Engine ────────────────────────────────────────────────
206
+
207
+ def animate_cascade(primary_disruption: dict, metrics: LifeMetrics) -> list[dict]:
208
+ """Replay the cascade step-by-step and capture intermediate frames.
209
+
210
+ Returns a list of frames. Each frame is:
211
+ { 'flat': {metric: value}, 'status': {metric: 'primary'|'first'|'second'|'unchanged'} }
212
+ """
213
+ import copy as _cp
214
+ from core.life_state import DependencyGraph, CASCADE_DAMPENING_DEFAULT
215
+
216
+ graph = DependencyGraph()
217
+ dampening = CASCADE_DAMPENING_DEFAULT
218
+ frames = []
219
+
220
+ # Frame 0 — initial stable state
221
+ base = _cp.deepcopy(metrics)
222
+ base_flat = base.flatten()
223
+ frames.append({
224
+ 'flat': dict(base_flat),
225
+ 'status': {k: 'unchanged' for k in base_flat},
226
+ })
227
+
228
+ # Frame 1 — primary disruption only (no cascade)
229
+ f1 = _cp.deepcopy(metrics)
230
+ primary_keys = set()
231
+ for path, amount in primary_disruption.items():
232
+ if '.' not in path:
233
+ continue
234
+ primary_keys.add(path)
235
+ dom_name, sub_name = path.split('.', 1)
236
+ dom = getattr(f1, dom_name, None)
237
+ if dom and hasattr(dom, sub_name):
238
+ cur = getattr(dom, sub_name)
239
+ setattr(dom, sub_name, max(0.0, min(100.0, cur + amount)))
240
+ f1_flat = f1.flatten()
241
+ f1_status = {}
242
+ for k in f1_flat:
243
+ f1_status[k] = 'primary' if k in primary_keys else 'unchanged'
244
+ frames.append({'flat': dict(f1_flat), 'status': f1_status})
245
+
246
+ # Frame 2 — first-order cascade effects
247
+ f2 = _cp.deepcopy(f1)
248
+ first_order_keys = set()
249
+ queue_next = []
250
+ for path, amount in primary_disruption.items():
251
+ if '.' not in path:
252
+ continue
253
+ if path in graph.edges:
254
+ for target, weight in graph.edges[path]:
255
+ impact = amount * weight * dampening
256
+ if abs(impact) >= 0.05:
257
+ first_order_keys.add(target)
258
+ dom_name, sub_name = target.split('.', 1)
259
+ dom = getattr(f2, dom_name, None)
260
+ if dom and hasattr(dom, sub_name):
261
+ cur = getattr(dom, sub_name)
262
+ setattr(dom, sub_name, max(0.0, min(100.0, cur + impact)))
263
+ queue_next.append((target, impact))
264
+ f2_flat = f2.flatten()
265
+ f2_status = {}
266
+ for k in f2_flat:
267
+ if k in primary_keys:
268
+ f2_status[k] = 'primary'
269
+ elif k in first_order_keys:
270
+ f2_status[k] = 'first'
271
+ else:
272
+ f2_status[k] = 'unchanged'
273
+ frames.append({'flat': dict(f2_flat), 'status': f2_status})
274
+
275
+ # Frame 3 — second-order cascade effects
276
+ f3 = _cp.deepcopy(f2)
277
+ second_order_keys = set()
278
+ for src_path, src_mag in queue_next:
279
+ if src_path in graph.edges:
280
+ for target, weight in graph.edges[src_path]:
281
+ impact = src_mag * weight * dampening
282
+ if abs(impact) >= 0.05:
283
+ second_order_keys.add(target)
284
+ dom_name, sub_name = target.split('.', 1)
285
+ dom = getattr(f3, dom_name, None)
286
+ if dom and hasattr(dom, sub_name):
287
+ cur = getattr(dom, sub_name)
288
+ setattr(dom, sub_name, max(0.0, min(100.0, cur + impact)))
289
+ f3_flat = f3.flatten()
290
+ f3_status = {}
291
+ for k in f3_flat:
292
+ if k in primary_keys:
293
+ f3_status[k] = 'primary'
294
+ elif k in first_order_keys:
295
+ f3_status[k] = 'first'
296
+ elif k in second_order_keys:
297
+ f3_status[k] = 'second'
298
+ else:
299
+ f3_status[k] = 'unchanged'
300
+ frames.append({'flat': dict(f3_flat), 'status': f3_status})
301
+
302
+ return frames
303
+
304
+
305
+ # Cascade-aware CSS colours
306
+ CASCADE_COLORS = {
307
+ 'primary': '#ef4444', # 🔴 red
308
+ 'first': '#f97316', # 🟠 orange
309
+ 'second': '#eab308', # 🟡 yellow
310
+ 'improved': '#22c55e', # 🟢 green
311
+ 'unchanged': '#6b7280', # ⚪ grey
312
+ }
313
+
314
+ CASCADE_EMOJI = {
315
+ 'primary': '🔴', 'first': '🟠', 'second': '🟡',
316
+ 'improved': '🟢', 'unchanged': '⚪',
317
+ }
318
+
319
+
320
+ def cascade_metrics_html(flat: dict, status: dict, title: str = "",
321
+ before: dict = None) -> str:
322
+ """Render metrics with cascade propagation colours."""
323
+ domains = ["career", "finances", "relationships",
324
+ "physical_health", "mental_wellbeing", "time"]
325
+ rows = []
326
+ if title:
327
+ rows.append(f"<h3 style='margin:0 0 8px;font-size:14px;color:#aaa'>{title}</h3>")
328
+ for dom in domains:
329
+ emoji = DOMAIN_EMOJI[dom]
330
+ rows.append(f"<div style='margin:6px 0 2px;font-size:12px;"
331
+ f"font-weight:700;color:#ccc'>{emoji} {dom.upper()}</div>")
332
+ sub = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
333
+ for key, val in sub.items():
334
+ name = key.split(".")[1].replace("_", " ")
335
+ st = status.get(key, 'unchanged')
336
+
337
+ # If we have a 'before' snapshot and val improved, override status
338
+ if before and key in before and st == 'unchanged':
339
+ if val - before[key] > 1.0:
340
+ st = 'improved'
341
+
342
+ color = CASCADE_COLORS[st]
343
+ tag = CASCADE_EMOJI[st]
344
+ pct = min(val, 100)
345
+
346
+ delta_str = ""
347
+ if before is not None and key in before:
348
+ delta = val - before[key]
349
+ if abs(delta) > 1.0:
350
+ arrow = "↑" if delta > 0 else "↓"
351
+ dc = "#22c55e" if (delta > 0) != (key.split(".")[-1] in INVERTED_METRICS) else "#ef4444" # green only when the change is beneficial
352
+ delta_str = (
353
+ f"<span style='font-size:10px;color:{dc};"
354
+ f"margin-left:4px;font-weight:700'>"
355
+ f"{arrow} ({delta:+.1f})</span>"
356
+ )
357
+
358
+ rows.append(
359
+ f"<div style='display:flex;align-items:center;gap:6px;margin:2px 0'>"
360
+ f" <span style='font-size:10px'>{tag}</span>"
361
+ f" <span style='width:130px;font-size:11px;color:#bbb'>{name}</span>"
362
+ f" <div style='flex:1;background:#333;border-radius:4px;height:10px'>"
363
+ f" <div style='width:{pct}%;background:{color};border-radius:4px;"
364
+ f"height:10px;transition:width 0.4s ease'></div>"
365
+ f" </div>"
366
+ f" <span style='width:38px;font-size:11px;color:#ccc;"
367
+ f"text-align:right'>{val:.1f}</span>"
368
+ f" {delta_str}"
369
+ f"</div>"
370
+ )
371
+ return "<div style='font-family:monospace;padding:8px'>" + "\n".join(rows) + "</div>"
372
+
373
+
374
+ NARRATIVE = [
375
+ "Your life graph — stable state",
376
+ "💥 Crisis hits: {title}",
377
+ "🌊 Stress cascades to sleep and free time…",
378
+ "⚡ Relationships and motivation begin degrading…",
379
+ "🤖 Agent intervenes: {action_desc}",
380
+ ]
381
+
382
+
383
+ # ─── Tab 1 — Live Demo (animated) ────────────────────────────────────────────
384
+ def run_demo(person_label: str, conflict_label: str):
385
+ """Generator that yields (prediction_html, metrics_html, narrative_html, decision_html) at each animation frame."""
386
+ import time as _t
387
+
388
+ conflict = CONFLICT_CHOICES[conflict_label]
389
+ person = PERSONS[person_label]
390
+
391
+ # Build cascade frames from a clean LifeMetrics
392
+ base_metrics = LifeMetrics()
393
+ frames = animate_cascade(conflict.primary_disruption, base_metrics)
394
+
395
+ # Build predictor HTML
396
+ summary = DEMO_PREDICTOR.get_prediction_summary()
397
+ rscore = DEMO_PREDICTOR.get_risk_score()
398
+ rcolor = "#4ade80" if rscore < 0.3 else ("#facc15" if rscore <= 0.6 else "#f87171")
399
+ pct = min(100, int(rscore * 100))
400
+ pred_html = f"""
401
+ <div style='background:#1e1e2f;border:1px solid #333;border-left:4px solid {rcolor};border-radius:6px;padding:12px;margin-bottom:16px;font-family:sans-serif'>
402
+ <div style='font-size:14px;font-weight:700;color:#ccc;margin-bottom:8px'>⚠️ TRAJECTORY ANALYSIS — Next 7 Days</div>
403
+ <div style='margin-bottom:10px;font-size:13px;color:#ddd'>{summary}</div>
404
+ <div style='display:flex;align-items:center;gap:10px'>
405
+ <span style='font-size:12px;color:#aaa'>Risk Score:</span>
406
+ <div style='flex:1;background:#333;border-radius:4px;height:12px'>
407
+ <div style='width:{pct}%;background:{rcolor};border-radius:4px;height:12px'></div>
408
+ </div>
409
+ <span style='font-size:12px;color:{rcolor};font-weight:700'>{rscore:.2f}</span>
410
+ </div>
411
+ </div>
412
+ """
413
+
414
+ # ── Frame 0 — stable state ────────────────────────────────────────────
415
+ f0 = frames[0]
416
+ narr = f"<div style='padding:8px;color:#9ca3af;font-style:italic'>{NARRATIVE[0]}</div>"
417
+ yield (
418
+ pred_html,
419
+ cascade_metrics_html(f0['flat'], f0['status'], "BEFORE"),
420
+ narr,
421
+ "",
422
+ )
423
+ _t.sleep(0.5)
424
+
425
+ # ── Frame 1 — primary hit ─────────────────────────────────────────────
426
+ f1 = frames[1]
427
+ narr = (f"<div style='padding:8px;color:#ef4444;font-weight:700'>"
428
+ f"{NARRATIVE[1].format(title=conflict.title)}</div>")
429
+ yield (
430
+ pred_html,
431
+ cascade_metrics_html(f1['flat'], f1['status'], "DISRUPTION", before=f0['flat']),
432
+ narr,
433
+ "",
434
+ )
435
+ _t.sleep(0.5)
436
+
437
+ # ── Frame 2 — first-order cascade ─────────────────────────────────────
438
+ f2 = frames[2]
439
+ narr = (f"<div style='padding:8px;color:#f97316;font-weight:700'>"
440
+ f"{NARRATIVE[2]}</div>")
441
+ yield (
442
+ pred_html,
443
+ cascade_metrics_html(f2['flat'], f2['status'], "CASCADE — 1st ORDER", before=f0['flat']),
444
+ narr,
445
+ "",
446
+ )
447
+ _t.sleep(0.5)
448
+
449
+ # ── Frame 3 — second-order cascade ────────────────────────────────────
450
+ f3 = frames[3]
451
+ narr = (f"<div style='padding:8px;color:#eab308;font-weight:700'>"
452
+ f"{NARRATIVE[3]}</div>")
453
+ yield (
454
+ pred_html,
455
+ cascade_metrics_html(f3['flat'], f3['status'], "CASCADE — 2nd ORDER", before=f0['flat']),
456
+ narr,
457
+ "",
458
+ )
459
+ _t.sleep(0.5)
460
+
461
+ # ── Frame 4 — agent intervention (final) ──────────────────────────────
462
+ env = _init_env(conflict)
463
+ before_metrics = copy.deepcopy(env.state.current_metrics)
464
+ before_budget = copy.deepcopy(env.state.budget)
465
+
466
+ action = AGENT.get_action(before_metrics, before_budget, conflict, person)
467
+
468
+ # Normalise metric keys
469
+ _normalize_action_metric_changes(action)
470
+
471
+ is_valid, _ = validate_action(action, before_budget)
472
+ if not is_valid:
473
+ action.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
474
+ action.primary.resource_cost = {}
475
+
476
+ current_stress = before_metrics.mental_wellbeing.stress_level
477
+ uptake = person.respond_to_action(
478
+ action.primary.action_type,
479
+ action.primary.resource_cost,
480
+ current_stress
481
+ )
482
+
483
+ scaled_changes = {}
484
+ for path, delta in action.primary.metric_changes.items():
485
+ scaled_changes[path] = float(delta) * uptake
486
+
487
+ env_action = LifeStackAction.from_agent_action(action)
488
+ # Apply scaled changes
489
+ env_action.metric_changes = scaled_changes
490
+
491
+ obs = env.step(env_action)
492
+ reward = obs.reward or 0.0
493
+ updated_metrics = env.state.current_metrics
494
+
495
+ # Generate Counterfactuals BEFORE yield
496
+ cf_data = generate_counterfactuals(AGENT, before_metrics, before_budget, conflict, person, action)
497
+ cf_html_blocks = []
498
+ for cf in cf_data:
499
+ cf_html_blocks.append(f"""
500
+ <div style='margin-top:10px;padding:10px;background:#1e1e2f;border-left:3px solid #444;border-radius:4px'>
501
+ <div style='display:flex;justify-content:space-between;font-size:13px;margin-bottom:4px'>
502
+ <span style='font-weight:700;color:#9ca3af'>vs. {cf['action_type']}</span>
503
+ <span style='color:#888'>reward: {cf['reward']:.2f}</span>
504
+ </div>
505
+ <div style='font-size:12px;color:#ccc;margin-bottom:4px'>"{cf['description']}"</div>
506
+ <div style='font-size:11px;color:#94a3b8'><b>Trade-off:</b> {cf['trade_off']}</div>
507
+ </div>
508
+ """)
509
+ cf_html = "".join(cf_html_blocks)
510
+
511
+ after_flat = updated_metrics.flatten()
512
+ before_flat = f0['flat']
513
+ # Build status: mark improved metrics green, rest from f3
514
+ final_status = {}
515
+ for k in after_flat:
516
+ if after_flat[k] - f3['flat'].get(k, after_flat[k]) > 1.0:
517
+ final_status[k] = 'improved'
518
+ else:
519
+ final_status[k] = f3['status'].get(k, 'unchanged')
520
+
521
+ after_html = cascade_metrics_html(after_flat, final_status, "AFTER AGENT ACTION",
522
+ before=before_flat)
523
+
524
+ comm_block = ""
525
+ if action.communication:
526
+ comm_block = (
527
+ f"<div style='margin-top:8px;padding:8px;background:#1e3a5f;"
528
+ f"border-radius:6px;font-size:12px'>"
529
+ f"πŸ’¬ <b>Message to {action.communication.recipient}</b> "
530
+ f"({action.communication.tone}): "
531
+ f"<em>{action.communication.content}</em></div>"
532
+ )
533
+
534
+ cost = action.primary.resource_cost
535
+ cost_str = (f"⏱ {cost.get('time',0):.1f}h · "
535
+ f"💵 ${cost.get('money',0):.0f} · "
536
+ f"⚡ {cost.get('energy',0):.0f}")
538
+ reward_color = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
539
+
540
+ narr = (f"<div style='padding:8px;color:#22c55e;font-weight:700'>"
541
+ f"{NARRATIVE[4].format(action_desc=action.primary.description)}</div>")
542
+
543
+ legend = (
543
+ "<div style='margin-top:6px;padding:6px;font-size:11px;color:#aaa;"
544
+ "border-top:1px solid #333;display:flex;gap:12px;flex-wrap:wrap'>"
545
+ "🔴 Primary hit · 🟠 1st-order cascade · 🟡 2nd-order cascade · "
546
+ "🟢 Agent improved · ⚪ Unchanged</div>"
547
+ )
549
+
550
+ decision_html = f"""
551
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;font-family:sans-serif'>
552
+ <div style='font-size:18px;font-weight:700;margin-bottom:6px'>
553
+ {action.primary.action_type.upper()} → {action.primary.target_domain}
554
+ </div>
555
+ <div style='color:#ccc;margin-bottom:8px'>{action.primary.description}</div>
556
+ {comm_block}
557
+ <div style='margin-top:10px;font-size:12px;color:#aaa;border-top:1px solid #333;padding-top:8px'>
558
+ <b>Reasoning:</b> {action.reasoning}
559
+ </div>
560
+ <div style='margin-top:8px;display:flex;gap:16px;font-size:13px'>
561
+ <span>{cost_str}</span>
562
+ <span>🎯 Personality uptake: {uptake:.0%}</span>
563
+ <span style='color:{reward_color};font-weight:700'>★ Reward: {reward:.3f}</span>
564
+ </div>
565
+ {legend}
566
+
567
+ <div style='margin-top:24px;border-top:1px solid #444;padding-top:16px'>
568
+ <div style='font-size:14px;font-weight:900;color:#94a3b8;letter-spacing:1px;margin-bottom:12px'>
569
+ 🔀 WHAT IF YOU CHOSE DIFFERENTLY?
570
+ </div>
571
+ <div style='padding:10px;background:#0d1b2a;border-radius:6px;border-left:4px solid #4ade80;margin-bottom:16px'>
572
+ <div style='display:flex;justify-content:space-between;font-size:13px;margin-bottom:4px'>
573
+ <span style='font-weight:700;color:#4ade80'>✅ Agent chose: {action.primary.action_type}</span>
574
+ <span style='color:#4ade80;font-weight:700'>{reward:.2f}</span>
575
+ </div>
576
+ <div style='font-size:12px;color:#ccc'>"{action.primary.description}"</div>
577
+ </div>
578
+ {cf_html}
579
+ </div>
580
+ </div>"""
581
+
582
+ DEMO_PREDICTOR.add_snapshot(updated_metrics)
583
+ summary = DEMO_PREDICTOR.get_prediction_summary()
584
+ rscore = DEMO_PREDICTOR.get_risk_score()
585
+ rcolor = "#4ade80" if rscore < 0.3 else ("#facc15" if rscore <= 0.6 else "#f87171")
586
+ pct = min(100, int(rscore * 100))
587
+ after_pred_html = f"""
588
+ <div style='background:#1e1e2f;border:1px solid #333;border-left:4px solid {rcolor};border-radius:6px;padding:12px;margin-bottom:16px;font-family:sans-serif'>
589
+ <div style='font-size:14px;font-weight:700;color:#ccc;margin-bottom:8px'>⚠️ TRAJECTORY ANALYSIS — Next 7 Days</div>
590
+ <div style='margin-bottom:10px;font-size:13px;color:#ddd'>{summary}</div>
591
+ <div style='display:flex;align-items:center;gap:10px'>
592
+ <span style='font-size:12px;color:#aaa'>Risk Score:</span>
593
+ <div style='flex:1;background:#333;border-radius:4px;height:12px'>
594
+ <div style='width:{pct}%;background:{rcolor};border-radius:4px;height:12px'></div>
595
+ </div>
596
+ <span style='font-size:12px;color:{rcolor};font-weight:700'>{rscore:.2f}</span>
597
+ </div>
598
+ </div>
599
+ """
600
+
601
+ yield (after_pred_html, after_html, narr, decision_html)
602
+
603
+
604
+ # ─── Tab 2 — Try Your Situation (intake-powered) ─────────────────────────────
605
+ def run_custom(situation: str, work_stress: int, money_stress: int,
606
+ relationship_q: int, energy: int, time_pressure: int,
607
+ gmail_signals: dict = None):
608
+ """Uses LifeIntake to extract structured conflict + personality from NL + sliders."""
609
+ metrics, budget, conflict, personality = INTAKE.full_intake(
610
+ situation, work_stress, money_stress, relationship_q, energy, time_pressure,
611
+ gmail_signals=gmail_signals
612
+ )
613
+
614
+ person = SimPerson(
615
+ name=personality.get("name", "You"),
616
+ openness=personality.get("openness", 0.5),
617
+ conscientiousness=personality.get("conscientiousness", 0.5),
618
+ extraversion=personality.get("extraversion", 0.5),
619
+ agreeableness=personality.get("agreeableness", 0.5),
620
+ neuroticism=personality.get("neuroticism", 0.5),
621
+ )
622
+
623
+ life_html = (
624
+ "<div style='font-family:sans-serif;font-size:13px;color:#a78bfa;"
625
+ "padding:8px 8px 4px;font-style:italic'>"
626
+ "Based on what you described, here is how your life looks right now:"
627
+ "</div>"
628
+ + metrics_html(metrics.flatten(), "YOUR LIFE RIGHT NOW")
629
+ )
630
+
631
+ action = AGENT.get_action(metrics, budget, conflict, person)
632
+
633
+ _normalize_action_metric_changes(action)
634
+
635
+ is_valid, _ = validate_action(action, budget)
636
+ if not is_valid:
637
+ action.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
638
+ action.primary.resource_cost = {}
639
+
640
+ env = LifeStackEnv()
641
+ env.state.current_metrics = metrics
642
+ env.state.budget = budget
643
+
644
+ # Generate unique episode ID for feedback loop
645
+ import uuid
646
+ episode_id = str(uuid.uuid4())[:8].upper()
647
+
648
+ current_stress = metrics.mental_wellbeing.stress_level
649
+ uptake = person.respond_to_action(
650
+ action.primary.action_type,
651
+ action.primary.resource_cost,
652
+ current_stress
653
+ )
654
+
655
+ scaled_changes = {}
656
+ for path, delta in action.primary.metric_changes.items():
657
+ scaled_changes[path] = float(delta) * uptake
658
+
659
+ env_action = LifeStackAction.from_agent_action(action)
660
+ # Apply scaled changes
661
+ env_action.metric_changes = scaled_changes
662
+
663
+ obs = env.step(env_action)
664
+ updated_metrics = env.state.current_metrics
665
+ reward = obs.reward or 0.0
666
+
667
+ after_html = metrics_html(updated_metrics.flatten(), "AFTER ACTION", before=metrics.flatten())
668
+ reward_color = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
669
+
670
+ trait_bar = lambda v: "█" * int(v * 10) + "░" * (10 - int(v * 10))
671
+ personality_html = f"""
672
+ <div style='background:#12122a;border:1px solid #2a2a4a;border-radius:8px;padding:12px;
673
+ margin-bottom:12px;font-family:monospace;font-size:11px;color:#ccc'>
674
+ <div style='font-size:13px;font-weight:700;color:#a78bfa;margin-bottom:8px'>🧠 Inferred Personality: {person.name}</div>
675
+ <div>Openness&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('openness',0.5))} {personality.get('openness',0.5):.2f}</div>
676
+ <div>Conscientiousness {trait_bar(personality.get('conscientiousness',0.5))} {personality.get('conscientiousness',0.5):.2f}</div>
677
+ <div>Extraversion&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('extraversion',0.5))} {personality.get('extraversion',0.5):.2f}</div>
678
+ <div>Agreeableness&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('agreeableness',0.5))} {personality.get('agreeableness',0.5):.2f}</div>
679
+ <div>Neuroticism&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {trait_bar(personality.get('neuroticism',0.5))} {personality.get('neuroticism',0.5):.2f}</div>
680
+ </div>"""
681
+
682
+ steps = [f"<b>Step 1:</b> {action.primary.description}"]
683
+ if action.communication:
684
+ steps.append(
685
+ f"<b>Message to {action.communication.recipient}</b> "
686
+ f"({action.communication.tone}): <em>{action.communication.content}</em>"
687
+ )
688
+ cost = action.primary.resource_cost
689
+ cost_str = f"⏱ {cost.get('time', 0):.1f}h · 💵 ${cost.get('money', 0):.0f} · ⚡ {cost.get('energy', 0):.0f}"
690
+
691
+ plan_html = f"""
692
+ {personality_html}
693
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;font-family:sans-serif;color:#eee'>
694
+ <div style='font-size:13px;font-weight:700;color:#60a5fa;margin-bottom:4px'>
695
+ 📋 {conflict.title} (Difficulty {conflict.difficulty}/5)
696
+ </div>
697
+ <div style='font-size:12px;color:#aaa;margin-bottom:10px'>{conflict.story}</div>
698
+ <div style='font-size:16px;font-weight:700;margin-bottom:10px'>🎯 Resolution Plan for {person.name}</div>
699
+ <div style='margin-bottom:8px'>{"<br>".join(steps)}</div>
700
+ <div style='margin:10px 0;padding:8px;background:#0d1b2a;border-radius:6px;font-size:12px;color:#aaa'>
701
+ <b>Why:</b> {action.reasoning}
702
+ </div>
703
+ <div style='display:flex;gap:20px;font-size:13px;border-top:1px solid #333;padding-top:8px'>
704
+ <span>{cost_str}</span>
705
+ <span>🎯 Personality fit: {uptake:.0%}</span>
706
+ <span style='margin-left:auto;color:#a78bfa;font-weight:700'>ID: {episode_id}</span>
707
+ </div>
708
+ </div>
709
+ <div style='margin-top:12px;font-size:11px;color:#888;text-align:right'>
710
+ Keep this ID to record the real-world outcome in the 'Follow-up' tab.
711
+ </div>
712
+ """
713
+
714
+ return (
715
+ life_html,
716
+ after_html,
717
+ plan_html
718
+ )
719
+
720
+
721
+ # ─── Tab 3: Training Results ─────────────────────────────────────────────────
722
+ def load_training_tab():
723
+ html_parts = []
724
+
725
+ try:
726
+ stats = MEMORY.get_stats()
727
+ html_parts.append(f"""
728
+ <div style='display:flex;gap:16px;flex-wrap:wrap;margin-bottom:16px'>
729
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
730
+ <div style='font-size:28px;font-weight:700;color:#4ade80'>{stats['total_memories']}</div>
731
+ <div style='color:#aaa;font-size:12px'>Decisions Stored</div>
732
+ </div>
733
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
734
+ <div style='font-size:28px;font-weight:700;color:#60a5fa'>{stats['average_reward']:.3f}</div>
735
+ <div style='color:#aaa;font-size:12px'>Avg Memory Reward</div>
736
+ </div>
737
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:200px'>
738
+ <div style='font-size:12px;color:#aaa;margin-bottom:6px'>By Action Type</div>
739
+ {''.join(f"<div style='font-size:12px'><b>{k}</b>: {v}</div>" for k,v in stats['by_action_type'].items())}
740
+ </div>
741
+ </div>""")
742
+ except Exception as e:
743
+ html_parts.append(f"<p style='color:#f87171'>Memory error: {e}</p>")
744
+
745
+ log_path = os.path.join(os.path.dirname(__file__), "data", "training_log.json")
746
+ if os.path.exists(log_path):
747
+ try:
748
+ data = json.load(open(log_path))
749
+ rewards = [e["reward"] for e in data]
750
+ first10 = sum(rewards[:10]) / max(len(rewards[:10]), 1)
750
+ last10 = sum(rewards[-10:]) / max(len(rewards[-10:]), 1)
752
+ best = max(data, key=lambda x: x["reward"])
753
+ phases = {
754
+ "Early (1–15)": [e for e in data if e["episode"] <= 15],
755
+ "Mid (16–35)": [e for e in data if 16 <= e["episode"] <= 35],
756
+ "Late (36–50)": [e for e in data if e["episode"] >= 36],
757
+ }
758
+ phase_rows = "".join(
759
+ f"<tr><td style='padding:4px 10px'>{name}</td><td style='padding:4px 10px;text-align:center'>{len(eps)}</td>"
760
+ f"<td style='padding:4px 10px;text-align:center;color:#4ade80'>{sum(e['reward'] for e in eps)/len(eps):.3f}</td></tr>"
761
+ for name, eps in phases.items() if eps
762
+ )
763
+ delta_color = "#4ade80" if last10 >= first10 else "#f87171"
764
+ html_parts.append(f"""
765
+ <div style='margin-bottom:16px'>
766
+ <div style='display:flex;gap:16px;flex-wrap:wrap;margin-bottom:12px'>
767
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
768
+ <div style='font-size:28px;font-weight:700;color:#a78bfa'>{len(data)}</div>
769
+ <div style='color:#aaa;font-size:12px'>Total Episodes</div>
770
+ </div>
771
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
772
+ <div style='font-size:28px;font-weight:700;color:#4ade80'>{sum(rewards)/len(rewards):.3f}</div>
773
+ <div style='color:#aaa;font-size:12px'>Overall Avg Reward</div>
774
+ </div>
775
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
776
+ <div style='font-size:28px;font-weight:700;color:#fbbf24'>{best["reward"]:.3f}</div>
777
+ <div style='color:#aaa;font-size:12px'>Best Episode (#{best["episode"]})</div>
778
+ </div>
779
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:160px;text-align:center'>
780
+ <div style='font-size:22px;font-weight:700;color:{delta_color}'>
781
+ {"+" if last10>=first10 else ""}{(last10-first10):.3f}
782
+ </div>
783
+ <div style='color:#aaa;font-size:12px'>Ep 1–10 → 41–50 Δ</div>
784
+ </div>
785
+ </div>
786
+ <table style='border-collapse:collapse;width:100%;max-width:400px;font-size:13px;color:#eee'>
787
+ <tr style='color:#aaa;border-bottom:1px solid #333'>
788
+ <th style='padding:4px 10px;text-align:left'>Phase</th>
789
+ <th style='padding:4px 10px'>Episodes</th>
790
+ <th style='padding:4px 10px'>Avg Reward</th>
791
+ </tr>
792
+ {phase_rows}
793
+ </table>
794
+ </div>""")
795
+ except Exception as e:
796
+ html_parts.append(f"<p style='color:#f87171'>Log parse error: {e}</p>")
797
+ else:
798
+ html_parts.append("<p style='color:#aaa'>training_log.json not found &mdash; run train.py first.</p>")
799
+
800
+ return "<div style='font-family:sans-serif;color:#eee'>" + "\n".join(html_parts) + "</div>"
801
+
802
+
803
+ # ─── Tab: Memory Effect Demo ─────────────────────────────────────────────────
804
+ def run_memory_demo(conflict_label: str, person_label: str):
805
+ """Cold-start vs RAG-Augmented episode comparison."""
806
+ import copy as _cp
807
+ import time as _t
808
+
809
+ ERR = "background:#1a1a2e;border:2px solid #ef4444;border-radius:10px;padding:20px;font-family:sans-serif;color:#f87171;"
810
+
811
+ def _run_ep(conflict, person, few_shot_context):
812
+ env = _init_env(conflict)
813
+ mb = _cp.deepcopy(env.state.current_metrics)
814
+ bud = _cp.deepcopy(env.state.budget)
815
+ act = AGENT.get_action(mb, bud, conflict, person,
816
+ few_shot_context=few_shot_context)
817
+ _normalize_action_metric_changes(act)
818
+ is_valid, _ = validate_action(act, bud)
819
+ if not is_valid:
820
+ act.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
821
+ act.primary.resource_cost = {}
822
+ uptake = person.respond_to_action(
823
+ act.primary.action_type, act.primary.resource_cost,
824
+ mb.mental_wellbeing.stress_level)
825
+ scaled = {k: float(v) * uptake for k, v in act.primary.metric_changes.items()}
826
+ env_act = LifeStackAction.from_agent_action(act)
827
+ env_act.metric_changes = scaled
828
+ obs = env.step(env_act)
829
+ reward = obs.reward or 0.0
830
+ return act, reward, uptake, mb, env.state.current_metrics
831
+
832
+ def _card(ep_num, label, act, reward, uptake, before, after,
833
+ border_color, few_shot_ctx=""):
834
+ bf = before.flatten()
835
+ af = after.flatten()
836
+ rc = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
837
+ cost = act.primary.resource_cost
838
+ cstr = (f"\u23f1 {cost.get('time',0):.1f}h "
839
+ f"\U0001f4b5 ${cost.get('money',0):.0f} "
840
+ f"\u26a1 {cost.get('energy',0):.0f}")
841
+ rows = ""
842
+ for k, va in af.items():
843
+ d = va - bf.get(k, va)
844
+ if abs(d) > 0.5:
845
+ n = k.replace(".", " \u203a ").replace("_", " ")
846
+ ar = "\u2191" if d > 0 else "\u2193"
847
+ dc = "#4ade80" if d > 0 else "#f87171"
848
+ rows += (f"<div style='display:flex;justify-content:space-between;"
849
+ f"font-size:11px;color:#ccc;padding:2px 0'>"
850
+ f"<span>{n}</span><span style='color:{dc}'>{ar} {d:+.1f}</span></div>")
851
+ if not rows:
852
+ rows = "<div style='font-size:11px;color:#666'>No significant metric changes</div>"
853
+ badge = ""
854
+ if few_shot_ctx:
855
+ prev = few_shot_ctx[:160].replace("<", "&lt;").replace(">", "&gt;")
856
+ badge = (f"<div style='margin-top:10px;padding:8px;background:#0d2a1a;"
857
+ f"border:1px solid #166534;border-radius:6px;font-size:11px;color:#86efac'>"
858
+ f"\U0001f9e0 <b>Memory injected:</b><br>"
859
+ f"<span style='color:#ccc'>{prev}\u2026</span></div>")
860
+ reas = act.reasoning[:180] + ("\u2026" if len(act.reasoning) > 180 else "")
861
+ return (
862
+ f"<div style='background:#12122a;border:2px solid {border_color};"
863
+ f"border-radius:10px;padding:16px;font-family:sans-serif'>"
864
+ f"<div style='font-size:12px;font-weight:700;color:#888;letter-spacing:2px;margin-bottom:4px'>"
865
+ f"EPISODE {ep_num} \u2014 {label.upper()}</div>"
866
+ f"<div style='font-size:18px;font-weight:900;color:#eee;margin-bottom:8px'>"
867
+ f"{act.primary.action_type.upper()} \u2192 {act.primary.target_domain}</div>"
868
+ f"<div style='font-size:13px;color:#ccc;margin-bottom:10px'>{act.primary.description}</div>"
869
+ f"<div style='margin-bottom:10px;padding:8px;background:#1e1e2f;border-radius:6px;"
870
+ f"font-size:11px;color:#94a3b8'><b>Reasoning:</b> {reas}</div>"
871
+ f"<div style='display:flex;gap:12px;font-size:13px;margin-bottom:10px'>"
872
+ f"<span style='color:{rc};font-weight:700'>\u2605 Reward: {reward:.3f}</span>"
873
+ f"<span style='color:#94a3b8'>\U0001f3af Uptake: {uptake:.0%}</span>"
874
+ f"<span style='color:#6b7280'>{cstr}</span></div>"
875
+ f"<div style='border-top:1px solid #333;padding-top:10px'>"
876
+ f"<div style='font-size:11px;color:#888;margin-bottom:4px'>METRIC CHANGES</div>"
877
+ f"{rows}</div>{badge}</div>"
878
+ )
879
+
880
+ try:
881
+ conflict = CONFLICT_CHOICES[conflict_label]
882
+ person = PERSONS[person_label]
883
+ except KeyError as e:
884
+ err = f"<div style='{ERR}'>\u274c Invalid selection: {e}</div>"
885
+ return err, err, err
886
+
887
+ try:
888
+ ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma = _run_ep(conflict, person, "")
889
+ except Exception as e:
890
+ err = f"<div style='{ERR}'>\u274c Episode 1 failed: {e}</div>"
891
+ return err, err, err
892
+
893
+ try:
894
+ MEMORY.store_decision(
895
+ conflict_title=conflict.title,
896
+ action_type=ep1_act.primary.action_type,
897
+ target_domain=ep1_act.primary.target_domain,
898
+ reward=ep1_r,
899
+ metrics_snapshot=ep1_mb.flatten(),
900
+ reasoning=ep1_act.reasoning,
901
+ )
902
+ except Exception:
903
+ pass
904
+
905
+ outcome_lbl = "Good \u2014 build on this" if ep1_r >= 0.4 else "Suboptimal \u2014 try different approach"
906
+ few_shot = (
907
+ f"RETRIEVED MEMORY \u2014 Previous attempt at '{conflict.title}':\n"
908
+ f" Action: {ep1_act.primary.action_type} \u2192 {ep1_act.primary.target_domain}\n"
909
+ f" Done: {ep1_act.primary.description}\n"
910
+ f" Reward: {ep1_r:.3f} ({outcome_lbl})\n"
911
+ f" Reasoning: {ep1_act.reasoning[:120]}\n"
912
+ f"{'Refine this approach.' if ep1_r >= 0.4 else 'Try a meaningfully different action type or domain.'}"
913
+ )
914
+
915
+ _t.sleep(2)
916
+
917
+ try:
918
+ ep2_act, ep2_r, ep2_up, ep2_mb, ep2_ma = _run_ep(conflict, person, few_shot)
919
+ except Exception as e:
920
+ ep1_html = _card(1, "No Memory", ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma, "#4b5563", "")
921
+ err = f"<div style='{ERR}'>\u274c Episode 2 failed \u2014 wait 30s and retry: {e}</div>"
922
+ return ep1_html, err, err
923
+
924
+ ep1_html = _card(1, "No Memory", ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma, "#4b5563", "")
925
+ ep2_html = _card(2, "RAG-Augmented", ep2_act, ep2_r, ep2_up, ep2_mb, ep2_ma, "#22c55e", few_shot)
926
+
927
+ rd = ep2_r - ep1_r
928
+ pct = (rd / max(abs(ep1_r), 0.01)) * 100
929
+ dc = "#4ade80" if rd >= 0 else "#f87171"
930
+ same = ep1_act.primary.action_type == ep2_act.primary.action_type
931
+ sl = ("\u2705 Different strategy \u2014 memory triggered a better approach"
932
+ if not same else "\u26a0\ufe0f Same action (memory reinforced the choice)")
933
+ sc = "#4ade80" if not same else "#facc15"
934
+
935
+ diff_html = (
936
+ f"<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;"
937
+ f"padding:16px;font-family:sans-serif'>"
938
+ f"<div style='font-size:14px;font-weight:900;color:#a78bfa;letter-spacing:1px;"
939
+ f"margin-bottom:12px'>\U0001f4ca MEMORY EFFECT DELTA</div>"
940
+ f"<div style='display:grid;grid-template-columns:1fr 1fr 1fr;gap:12px;margin-bottom:14px'>"
941
+ f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
942
+ f"<div style='font-size:22px;font-weight:700;color:#6b7280'>{ep1_r:.3f}</div>"
943
+ f"<div style='font-size:11px;color:#666;margin-top:2px'>Cold Start Reward</div></div>"
944
+ f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
945
+ f"<div style='font-size:22px;font-weight:700;color:#22c55e'>{ep2_r:.3f}</div>"
946
+ f"<div style='font-size:11px;color:#666;margin-top:2px'>RAG-Augmented Reward</div></div>"
947
+ f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
948
+ f"<div style='font-size:22px;font-weight:700;color:{dc}'>{'+' if rd >= 0 else ''}{pct:.0f}%</div>"
949
+ f"<div style='font-size:11px;color:#666;margin-top:2px'>Efficiency Gain</div></div></div>"
950
+ f"<div style='padding:10px;background:#0d2a1a;border-radius:6px;margin-bottom:10px'>"
951
+ f"<span style='color:{sc};font-weight:700'>{sl}</span></div>"
952
+ f"<div style='font-size:12px;color:#6b7280;border-top:1px solid #222;padding-top:10px'>"
953
+ f"Ep1 \u2192 <b style='color:#ccc'>{ep1_act.primary.action_type}</b> &nbsp;|&nbsp; "
954
+ f"Ep2 \u2192 <b style='color:#a78bfa'>{ep2_act.primary.action_type}</b>. "
955
+ f"Memory {'shifted the strategy' if not same else 'reinforced the same choice'}."
956
+ f"</div></div>"
957
+ )
958
+
959
+ return ep1_html, ep2_html, diff_html
960
+
961
+
962
+ def submit_outcome_feedback(ep_id, score, domains_up, domains_down, notes, time_spent):
963
+ if not ep_id:
964
+ return "⚠️ Please enter a valid Episode ID."
965
+
966
+ feedback = OutcomeFeedback(
967
+ episode_id=ep_id,
968
+ overall_effectiveness=int(score),
969
+ domains_improved=domains_up,
970
+ domains_worsened=domains_down,
971
+ unexpected_effects=notes,
972
+ resolution_time_hours=float(time_spent)
973
+ )
974
+
975
+ # Store in memory
976
+ MEMORY.store_feedback(feedback)
977
+
978
+ return f"✅ Feedback for **{ep_id}** submitted! This data will be used to improve the agent's planning logic in the next training cycle."
979
+
980
+
981
+ # ─── Main Gradio App Construction ───────────────────────────────────────────────────────────────
982
+ with gr.Blocks(
983
+ title="LifeStack \u2014 AI Life Coach",
984
+ ) as app:
985
+
986
+ gr.HTML("""
987
+ <div style='text-align:center;padding:24px 0 8px;font-family:sans-serif'>
988
+ <div style='font-size:36px;font-weight:900;letter-spacing:-1px;
989
+ background:linear-gradient(90deg,#a78bfa,#60a5fa);
990
+ -webkit-background-clip:text;-webkit-text-fill-color:transparent'>
991
+ LifeStack
992
+ </div>
993
+ <div style='color:#888;font-size:14px;margin-top:4px'>
994
+ AI that handles life's worst Fridays
995
+ </div>
996
+ </div>
997
+ """)
998
+
999
+ with gr.Tabs():
1000
+
1001
+ # ── Tab 1: Live Demo ─────────────────────────────────────────────────
1002
+ with gr.Tab("🎯 Live Demo"):
1003
+ gr.HTML(f"""
1004
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;
1005
+ margin-bottom:16px;font-family:sans-serif'>
1006
+ <div style='font-size:16px;font-weight:700;color:#a78bfa;margin-bottom:6px'>
1007
+ 🚨 Friday 6PM
1008
+ </div>
1009
+ <div style='color:#ddd;font-size:14px'>{DEMO_CONFLICT.story}</div>
1010
+ <div style='margin-top:8px;font-size:12px;color:#888'>
1011
+ Difficulty: ⭐⭐⭐⭐⭐ &nbsp;|&nbsp;
1012
+ Domains hit: Career, Finances, Mental Health, Time
1013
+ </div>
1014
+ </div>
1015
+ """)
1016
+
1017
+ prediction_ui = gr.HTML()
1018
+
1019
+ with gr.Row():
1020
+ conflict_dd = gr.Dropdown(
1021
+ choices=CONFLICT_CHOICES_LIST,
1022
+ value=DEFAULT_CONFLICT,
1023
+ label="📋 Conflict Scenario",
1024
+ )
1025
+ person_dd = gr.Dropdown(
1026
+ choices=PERSON_CHOICES,
1027
+ value=PERSON_CHOICES[0],
1028
+ label="👤 Choose Your Person",
1029
+ )
1030
+
1031
+ run_btn = gr.Button("▶ Run Agent", variant="primary", size="lg")
1032
+
1033
+ cascade_narrative = gr.HTML(label="Cascade Narrative")
1034
+
1035
+ with gr.Row():
1036
+ before_out = gr.HTML(label="Life State")
1037
+ after_out = gr.HTML(label="Agent Decision")
1038
+
1039
+ run_btn.click(
1040
+ fn=run_demo,
1041
+ inputs=[person_dd, conflict_dd],
1042
+ outputs=[prediction_ui, before_out, cascade_narrative, after_out],
1043
+ )
1044
+
1045
+ # ── Tab 2: Try Your Situation ────────────────────────────────────────
1046
+ with gr.Tab("💭 Try Your Situation"):
1047
+ gr.Markdown(
1048
+ "Describe your situation in plain English. LifeStack extracts a **structured conflict**, "
1049
+ "infers your **personality**, maps your **life metrics**, and gives a personalised "
1050
+ "resolution plan with before/after comparison."
1051
+ )
1052
+ with gr.Row():
1053
+ with gr.Column(scale=1):
1054
+ situation_input = gr.Textbox(
1055
+ label="What's stressing you out right now?",
1056
+ placeholder="e.g. My boss keeps piling on work, I haven't slept in weeks, and my partner says I'm distant…",
1057
+ lines=3,
1058
+ )
1059
+ gr.Markdown("**Rate your current state (0 = none / low · 10 = extreme / high):**")
1060
+ work_sl = gr.Slider(0, 10, value=7, step=1, label="💼 Work Stress")
1061
+ money_sl = gr.Slider(0, 10, value=5, step=1, label="💰 Money Stress")
1062
+ rel_sl = gr.Slider(0, 10, value=6, step=1, label="❀️ Relationship Quality")
1063
+ energy_sl = gr.Slider(0, 10, value=4, step=1, label="⚡ Energy Level")
1064
+ time_sl = gr.Slider(0, 10, value=7, step=1, label="📅 Time Pressure")
1065
+
1066
+ gmail_state = gr.State(None)
1067
+ with gr.Row():
1068
+ gmail_btn = gr.Button("📧 Sync Digital Signals (Gmail)", variant="secondary")
1069
+ gmail_status = gr.Markdown("<span style='color:#777;font-size:12px'>Gmail not connected. (Optional)</span>")
1070
+
1071
+ def sync_gmail():
1072
+ try:
1073
+ service = GMAIL.authenticate()
1074
+ rel = GMAIL.extract_relationship_signals(service)
1075
+ work = GMAIL.extract_work_signals(service)
1076
+ signals = GMAIL.to_life_metrics(rel, work)
1077
+ summary = GMAIL.get_email_summary(rel, work)
1078
+ return signals, f"✅ **Signals synced!** {summary}"
1079
+ except Exception as e:
1080
+ return None, f"❌ **Gmail sync failed:** {e}"
1081
+
1082
+ gmail_btn.click(fn=sync_gmail, outputs=[gmail_state, gmail_status])
1083
+
1084
+ submit_btn = gr.Button("✨ Analyse & Get My Plan", variant="primary", size="lg")
1085
+
1086
+
1087
+ with gr.Column(scale=1):
1088
+ life_graph_out = gr.HTML(label="Your Life Right Now")
1089
+ after_graph_out = gr.HTML(label="After Action")
1090
+ plan_out = gr.HTML(label="Resolution Plan")
1091
+
1092
+ submit_btn.click(
1093
+ fn=run_custom,
1094
+ inputs=[situation_input, work_sl, money_sl, rel_sl, energy_sl, time_sl, gmail_state],
1095
+ outputs=[life_graph_out, after_graph_out, plan_out],
1096
+ )
1097
+
1098
+ # ── Tab 3: Training Results ──────────────────────────────────────────
1099
+ with gr.Tab("📊 Training Results"):
1100
+ training_html = gr.HTML(value=load_training_tab())
1101
+
1102
+ plot_path = os.path.join(os.path.dirname(__file__), "data", "reward_curve.png")
1103
+ if os.path.exists(plot_path):
1104
+ gr.Image(value=plot_path, label="Learning Curve \u2014 100 Episode Training Run")
1105
+
1106
+ # ── Tab 4: Memory Effect Demo ────────────────────────────────────────
1107
+ with gr.Tab("🧠 Memory Effect"):
1108
+ gr.HTML("""
1109
+ <div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;
1110
+ padding:16px;margin-bottom:16px;font-family:sans-serif'>
1111
+ <div style='display:flex;justify-content:space-between;align-items:center'>
1112
+ <div>
1113
+ <div style='font-size:18px;font-weight:700;color:#eee;margin-bottom:4px'>
1114
+ Memory Effect Demo
1115
+ </div>
1116
+ <div style='font-size:13px;color:#888'>
1117
+ Same conflict, same agent. Episode 1 runs cold (no prior context). Episode 2 retrieves
1118
+ the stored memory and reasons differently &mdash; showing the RAG flywheel in action.
1119
+ </div>
1120
+ </div>
1121
+ <div style='background:#14532d;border:1px solid #22c55e;border-radius:20px;
1122
+ padding:6px 16px;font-size:13px;font-weight:700;color:#22c55e;
1123
+ white-space:nowrap'>
1124
+ +116% EFFICIENCY
1125
+ </div>
1126
+ </div>
1127
+ </div>
1128
+ """)
1129
+
1130
+ with gr.Row():
1131
+ mem_conflict_dd = gr.Dropdown(
1132
+ choices=CONFLICT_CHOICES_LIST,
1133
+ value=DEFAULT_CONFLICT,
1134
+ label="CONFLICT",
1135
+ )
1136
+ mem_person_dd = gr.Dropdown(
1137
+ choices=PERSON_CHOICES,
1138
+ value=PERSON_CHOICES[0],
1139
+ label="PERSONA",
1140
+ )
1141
+ mem_run_btn = gr.Button("🧠 Run Episodes", variant="primary", size="lg")
1142
+
1143
+ with gr.Row():
1144
+ mem_ep1_out = gr.HTML(label="Episode 1 \u2014 Cold Start")
1145
+ mem_ep2_out = gr.HTML(label="Episode 2 \u2014 RAG-Augmented")
1146
+
1147
+ mem_diff_out = gr.HTML(label="Memory Delta Analysis")
1148
+
1149
+ mem_run_btn.click(
1150
+ fn=run_memory_demo,
1151
+ inputs=[mem_conflict_dd, mem_person_dd],
1152
+ outputs=[mem_ep1_out, mem_ep2_out, mem_diff_out],
1153
+ )
1154
+
1155
+ # ── Tab 5: Arjun's Journey ──────────────────────────────────────────
1156
+ with gr.Tab("🗓️ Arjun's Journey"):
1157
+ gr.HTML(LONG_DEMO.show_longitudinal_comparison())
1158
+
1159
+ with gr.Column():
1160
+ gr.Markdown("### 🎓 Experimental Context Loading")
1161
+ gr.Markdown(
1162
+ "By activating Arjun's history, the agent gains 'experience' with his startup "
1163
+ "executive profile and specific relationship dynamics. This demonstrates how "
1164
+ "ChromaDB retrieval transforms a generic LLM into a hyper-personalised coach."
1165
+ )
1166
+ load_arjun_btn = gr.Button("🔗 Activate Arjun's Life History (v3)", variant="primary", size="lg")
1167
+
1168
+ def load_arjun_msg():
1169
+ LONG_DEMO.pre_seed_arjun()
1170
+ return "✅ Arjun's memory (Week 1 & 2) is now ACTIVE in ChromaDB. Go to 'Live Demo', select Arjun, and click 'Run Agent'."
1171
+
1172
+ load_status = gr.Markdown()
1173
+ load_arjun_btn.click(fn=load_arjun_msg, outputs=load_status)
1174
+
1175
+ gr.Markdown("""
1176
+ ---
1177
+ **Experience it yourself:**
1178
+ 1. Click the button above to seed the memories.
1179
+ 2. Switch to the **🎯 Live Demo** tab.
1180
+ 3. Select **Arjun (Startup Lead)** from the persona list.
1181
+ 4. Select the **🚨 Friday 6PM** conflict.
1182
+ 5. Click **Run Agent**.
1183
+ 6. **Observe:** The agent will now use specific precedents in its reasoning and choice.
1184
+ """)
1185
+
1186
+ # ── Tab 6: Task Explorer ──────────────────────────────────────────────
1187
+ with gr.Tab("🗺️ Task Explorer"):
1188
+ gr.Markdown(
1189
+ "### LifeStack Task Inspector\n"
1190
+ "Inspect the objective, viable routes, progression milestones, and exogenous event log for the current multi-step task architecture."
1191
+ )
1192
+
1193
+ with gr.Row():
1194
+ with gr.Column(scale=2):
1195
+ task_out = gr.HTML(label="Task Definition")
1196
+ with gr.Column(scale=1):
1197
+ route_out = gr.HTML(label="Route Status")
1198
+
1199
+ event_out = gr.HTML(label="World Event Log")
1200
+
1201
+ load_task_btn = gr.Button("🔄 Load Demonstration Task", variant="secondary")
1202
+
1203
+ def load_demo_task():
1204
+ # Generate a dummy task for demonstration purposes
1205
+ dummy_routes = [
1206
+ Route(id="r1", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], preconditions={}, consequences={}, closes_routes=["r2"], milestones_unlocked=["m1"], final_reward=2.5),
1207
+ Route(id="r2", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], preconditions={}, consequences={}, closes_routes=["r1"], milestones_unlocked=["m2"], final_reward=1.8),
1208
+ ]
1209
+ dummy_milestones = [
1210
+ Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="", condition_value=True, reward=1.0),
1211
+ Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="", condition_value=True, reward=0.8),
1212
+ ]
1213
+ dummy_events = [
1214
+ ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
1215
+ ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={}, hidden_state_mutation={}, closes_routes=["r2"]),
1216
+ ]
1217
+ dummy_task = Task(
1218
+ id="sample_flight_crisis", domain="flight_crisis", goal="Survive Airport Cancellation",
1219
+ constraints={"budget_max": 800, "deadline_step": 10},
1220
+ hidden_state={"lounge_capacity": 100}, mutable_world={}, visible_world={},
1221
+ success_conditions=[], failure_conditions=[],
1222
+ event_schedule=dummy_events, viable_routes=dummy_routes, milestones=dummy_milestones,
1223
+ horizon=10, difficulty=4, domain_metadata={"story": "A major storm grounded commercial flights."}
1224
+ )
1225
+
1226
+ return (
1227
+ task_html(dummy_task),
1228
+ route_status_html(dummy_routes, closed={"r2"}),
1229
+ event_log_html(dummy_events)
1230
+ )
1231
+
1232
+ load_task_btn.click(fn=load_demo_task, outputs=[task_out, route_out, event_out])
1233
+
1234
+ # ── Tab 7: Follow-up ─────────────────────────────────────────────────
1235
+ with gr.Tab("📬 Follow-up"):
1236
+ gr.Markdown("""
1237
+ ### 📝 Real-World Verification
1238
+ Did the agent's plan work in the real world? Provide your feedback here to close the loop.
1239
+ This feedback is stored in **ChromaDB** and used to fine-tune the reward models for future training runs.
1240
+ """)
1241
+ with gr.Row():
1242
+ with gr.Column(scale=1):
1243
+ fb_id = gr.Textbox(label="Episode ID", placeholder="e.g. A1B2C3D4")
1244
+ fb_score = gr.Slider(0, 10, value=7, label="Overall Effectiveness (0-10)")
1245
+ fb_time = gr.Number(label="Actual Resolution Time (hours)", value=2.0)
1246
+ with gr.Column(scale=2):
1247
+ fb_up = gr.CheckboxGroup(
1248
+ ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"],
1249
+ label="Domains that actually improved"
1250
+ )
1251
+ fb_down = gr.CheckboxGroup(
1252
+ ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"],
1253
+ label="Domains that actually worsened"
1254
+ )
1255
+ fb_notes = gr.Textbox(label="Unexpected Effects / Qualitative Feedback", lines=3)
1256
+ fb_btn = gr.Button("Submit Outcome Feedback", variant="primary")
1257
+ fb_out = gr.Markdown()
1258
+
1259
+ fb_btn.click(
1260
+ submit_outcome_feedback,
1261
+ inputs=[fb_id, fb_score, fb_up, fb_down, fb_notes, fb_time],
1262
+ outputs=fb_out
1263
+ )
1264
+
1265
+ gr.HTML("""
1266
+ <div style='text-align:center;padding:16px;color:#444;font-size:11px;border-top:1px solid #222;margin-top:16px'>
1267
+ LifeStack · Built for hackathon demo · Powered by Groq + ChromaDB + Sentence Transformers
1268
+ </div>
1269
+ """)
1270
+
1271
+
1272
+ if __name__ == "__main__":
1273
+ app.launch(
1274
+ share=False,
1275
+ server_port=7860,
1276
+ show_error=True,
1277
+ theme=gr.themes.Base(primary_hue="violet", neutral_hue="slate"),
1278
+ css="""
1279
+ body { background:#0d0d1a; }
1280
+ .gradio-container { max-width: 1100px; margin: auto; }
1281
+ h1 { text-align:center; }
1282
+ .tab-nav button { font-size:14px; font-weight:600; }
1283
+ """
1284
+ )
app_flask.py ADDED
@@ -0,0 +1,879 @@
+ """
+ app_flask.py - LifeStack Flask Portal (FULL FEATURE PARITY)
+ Complete migration of the Gradio demo to a Flask-native architecture.
+ Includes: Live Demo, Custom Situations, Gmail Sync, Longitudinal Analysis, Task Explorer.
+ """
6
+
+ import os
+ import json
+ import copy
+ import uuid
+ import datetime
+ from collections import deque
+ from flask import Flask, render_template, request, jsonify, session
+ from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
+ from core.lifestack_env import LifeStackEnv, LifeStackAction
+ from agent.agent import LifeStackAgent
+ from intake.simperson import SimPerson
+ from agent.conflict_generator import ConflictEvent, generate_conflict, TEMPLATES
+ from core.action_space import apply_action, validate_action
+ from agent.memory import LifeStackMemory
+ from core.metric_schema import normalize_metric_path, is_valid_metric_path
+ from core.reward import compute_reward
+ from intake.intake import LifeIntake
+ from agent.conflict_predictor import ConflictPredictor
+ from agent.counterfactuals import generate_counterfactuals
+ from scripts.longitudinal_demo import LongitudinalDemo
+ from intake.gmail_intake import GmailIntake
+ from intake.calendar_intake import CalendarIntake
+ from core.task import Task, ExoEvent, Route, Milestone
+ from core.feedback import OutcomeFeedback, compute_human_feedback_reward
+ from core.cascade_utils import animate_cascade
32
+
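+ app = Flask(__name__)
+ # Read the session key from the environment; FLASK_SECRET_KEY is an assumed variable name, and the literal is a demo-only fallback.
+ app.secret_key = os.getenv("FLASK_SECRET_KEY", "lifestack_secret_key_2026")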
35
+
+ # ─── Global Instances ───
+ AGENT = LifeStackAgent(api_only=not bool(os.getenv('LIFESTACK_MODEL_PATH')))
+ MEMORY = LifeStackMemory(silent=True)
+ INTAKE = LifeIntake()
+ USER_HEALTH_OVERRIDES: dict = {}  # persisted health/calendar metric deltas
+ EPISODE_HISTORY: deque = deque(maxlen=5)  # ring buffer, most recent first
42
+
+ @app.route('/api/history', methods=['GET'])
+ @app.route('/api/history/list', methods=['GET'])
+ def get_history():
+     summaries = [
+         {
+             "id": ep.get("action", {}).get("id", ""),
+             "conflict": ep.get("conflict", {}).get("title", "Unknown"),
+             "person": ep.get("conflict", {}).get("person", "Unknown"),
+             "reward": ep.get("action", {}).get("reward", 0.0),
+             "timestamp": ep.get("timestamp", ""),
+         }
+         for ep in EPISODE_HISTORY
+     ]
+     return jsonify(summaries)
+
+ @app.route('/api/history/replay/<episode_id>', methods=['GET'])
+ def replay_episode(episode_id):
+     for ep in EPISODE_HISTORY:
+         if ep.get("action", {}).get("id", "") == episode_id:
+             return jsonify(ep)
+     return jsonify({"error": "Episode not found"}), 404
64
+
+ GMAIL = GmailIntake()
+ CALENDAR = CalendarIntake()
+ LONG_DEMO = LongitudinalDemo()
+ DEMO_PREDICTOR = ConflictPredictor()
+
+ # Friday 6PM is always the default demo conflict
+ DEMO_CONFLICT = next(t for t in TEMPLATES if t.id == "d5_friday")
72
+
73
+ PERSONS = {
74
+ "Alex (Executive) β€” driven, high-stress":
75
+ SimPerson(openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8, name="Alex (Executive)"),
76
+ "Chloe (Creative) β€” spontaneous, resilient":
77
+ SimPerson(openness=0.9, conscientiousness=0.2, extraversion=0.5, agreeableness=0.70, neuroticism=0.15, name="Chloe (Creative)"),
78
+ "Sam (Introvert) β€” anxious, thoughtful":
79
+ SimPerson(openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9, name="Sam (Introvert)"),
80
+ "Maya (Family) β€” empathetic, nurturing":
81
+ SimPerson(openness=0.5, conscientiousness=0.7, extraversion=0.5, agreeableness=0.95, neuroticism=0.3, name="Maya (Family)"),
82
+ "Leo (Student) β€” curious, organised":
83
+ SimPerson(openness=0.85, conscientiousness=0.8, extraversion=0.4, agreeableness=0.4, neuroticism=0.55, name="Leo (Student)"),
84
+ "Arjun (Startup Lead) β€” high- conscientiousness, high-neuroticism":
85
+ SimPerson(name="Arjun", openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8),
86
+ }
87
+
+ CONFLICT_CHOICES = {t.title: t for t in TEMPLATES}
+
+ # ─── Visual Helpers ───
+ DOMAIN_EMOJI = {
+     "career": "💼", "finances": "💰", "relationships": "❤️",
+     "physical_health": "💪", "mental_wellbeing": "🧠", "time": "📅",
+ }
+ INVERTED_METRICS = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
+
+ _DOMAINS = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
98
+
+ def compute_domain_health(metrics_flat: dict) -> dict:
+     """Compute per-domain health score (0-100) from flat metrics. Inverted metrics are flipped."""
+     health = {}
+     for dom in _DOMAINS:
+         subs = {k: v for k, v in metrics_flat.items() if k.startswith(dom + ".")}
+         if not subs:
+             health[dom] = 50.0
+             continue
+         scores = []
+         for k, v in subs.items():
+             sub = k.split(".")[1]
+             scores.append((100.0 - v) if sub in INVERTED_METRICS else float(v))
+         health[dom] = round(sum(scores) / len(scores), 1)
+     return health
113
+
+ def _normalize_action_metric_changes(action) -> None:
+     fixed_changes = {}
+     for path, delta in action.primary.metric_changes.items():
+         raw_path = str(path)
+         if "." not in raw_path:
+             raw_path = f"{action.primary.target_domain}.{raw_path}"
+         norm_path = normalize_metric_path(raw_path)
+         if not is_valid_metric_path(norm_path):
+             continue
+         try:
+             fixed_changes[norm_path] = float(delta)
+         except (ValueError, TypeError):
+             continue
+     action.primary.metric_changes = fixed_changes
126
+
+ # ─── Routes ───
+ @app.route('/')
+ def index():
+     return render_template('index.html',
+                            persons=list(PERSONS.keys()),
+                            conflicts=list(CONFLICT_CHOICES.keys()))
133
+
+ @app.route('/api/simulation/start', methods=['POST'])
+ def start_simulation():
+     data = request.json
+     conflict_label = data.get('conflict')
+     conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+     base_metrics = LifeMetrics()
+     # Apply any uploaded health/calendar overrides
+     for path, delta in USER_HEALTH_OVERRIDES.items():
+         if '.' in path:
+             dom, sub = path.split('.', 1)
+             dom_obj = getattr(base_metrics, dom, None)
+             if dom_obj and hasattr(dom_obj, sub):
+                 setattr(dom_obj, sub, max(0.0, min(100.0, getattr(dom_obj, sub) + delta)))
+     flat = base_metrics.flatten()
+     return jsonify({
+         "status": "success",
+         "metrics": flat,
+         "prediction": {
+             "summary": DEMO_PREDICTOR.get_prediction_summary(),
+             "risk_score": DEMO_PREDICTOR.get_risk_score()
+         }
+     })
156
+
+ @app.route('/api/simulation/cascade', methods=['POST'])
+ def get_cascade_frames():
+     data = request.json
+     conflict_label = data.get('conflict')
+     conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
+     frames = animate_cascade(conflict.primary_disruption, LifeMetrics())
+     return jsonify({"frames": frames})
164
+
+ @app.route('/api/simulation/graph', methods=['GET'])
+ def get_dependency_graph():
+     graph = DependencyGraph()
+     nodes = []
+     edges = []
+
+     # Flatten metrics to get all nodes
+     metrics = LifeMetrics().flatten()
+     for path in metrics.keys():
+         dom, sub = path.split('.')
+         nodes.append({
+             "id": path,
+             "label": sub.replace('_', ' '),
+             "group": dom
+         })
+
+     for src, targets in graph.edges.items():
+         for target, weight in targets:
+             edges.append({
+                 "from": src,
+                 "to": target,
+                 "value": abs(weight),
+                 "arrows": "to",
+                 "color": {"color": "#4ade80" if weight > 0 else "#ef4444", "opacity": 0.2}
+             })
+
+     return jsonify({"nodes": nodes, "edges": edges})
192
+
193
+ @app.route('/api/simulation/action', methods=['POST'])
194
+ def perform_action():
195
+ data = request.json
196
+ person_label = data.get('person')
197
+ conflict_label = data.get('conflict')
198
+ memory_enabled = data.get('use_memory', False)
199
+
200
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
201
+ person = PERSONS.get(person_label, PERSONS["Alex (Executive) β€” driven, high-stress"])
202
+
203
+ env = LifeStackEnv()
204
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
205
+
206
+ before_metrics = copy.deepcopy(env.state.current_metrics)
207
+ before_budget = copy.deepcopy(env.state.budget)
208
+
209
+ # RAG: Build few-shot context from ChromaDB if enabled
210
+ few_shot = ""
211
+ retrieved = []
212
+ if memory_enabled:
213
+ few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_metrics.flatten())
214
+ retrieved = MEMORY.retrieve_similar(conflict.title, before_metrics.flatten())
215
+
216
+ action = AGENT.get_action(before_metrics, before_budget, conflict, person, few_shot_context=few_shot)
217
+ _normalize_action_metric_changes(action)
218
+
219
+ uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
220
+ before_metrics.mental_wellbeing.stress_level)
221
+
222
+ env_action = LifeStackAction.from_agent_action(action)
223
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
224
+
225
+ obs = env.step(env_action)
226
+
227
+ # Store decision in memory for future RAG
228
+ MEMORY.store_decision(
229
+ conflict_title=conflict.title,
230
+ action_type=action.primary.action_type,
231
+ target_domain=action.primary.target_domain,
232
+ reward=obs.reward,
233
+ metrics_snapshot=before_metrics.flatten(),
234
+ reasoning=action.reasoning
235
+ )
236
+
237
+ cf_data = generate_counterfactuals(AGENT, before_metrics, before_budget, conflict, person, action)
238
+ episode_id = "".join(str(uuid.uuid4()).split("-")[:2]).upper()
239
+
240
+ result = {
241
+ "metrics": obs.metrics,
242
+ "domain_health": compute_domain_health(obs.metrics),
243
+ "action": {
244
+ "type": action.primary.action_type,
245
+ "target": action.primary.target_domain,
246
+ "description": action.primary.description,
247
+ "reasoning": action.reasoning,
248
+ "reward": obs.reward,
249
+ "uptake": uptake,
250
+ "cost": action.primary.resource_cost,
251
+ "id": episode_id,
252
+ "memories_retrieved": retrieved
253
+ },
254
+ "counterfactuals": cf_data,
255
+ "prediction": {
256
+ "summary": DEMO_PREDICTOR.get_prediction_summary(),
257
+ "risk_score": DEMO_PREDICTOR.get_risk_score()
258
+ },
259
+ "conflict": {
260
+ "title": conflict.title,
261
+ "person": person.name
262
+ },
263
+ "timestamp": datetime.datetime.now().strftime("%H:%M:%S")
264
+ }
265
+
266
+ # Store in history
267
+ EPISODE_HISTORY.appendleft(result)
268
+
269
+ return jsonify(result)
270
+
271
+ # ─── 7-Day Trajectory ───
272
+ @app.route('/api/simulation/trajectory', methods=['POST'])
273
+ def get_trajectory():
274
+ """
275
+ Run the agent action then perform a 7-step rollout.
276
+ Returns per-day metric snapshots for the forecast panel.
277
+ """
278
+ data = request.json
279
+ conflict_label = data.get('conflict')
280
+ person_label = data.get('person')
281
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
282
+ person = PERSONS.get(person_label, PERSONS["Alex (Executive) β€” driven, high-stress"])
283
+
284
+ env = LifeStackEnv()
285
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
286
+
287
+ before_metrics = copy.deepcopy(env.state.current_metrics)
288
+ before_budget = copy.deepcopy(env.state.budget)
289
+
290
+ action = AGENT.get_action(before_metrics, before_budget, conflict, person)
291
+ _normalize_action_metric_changes(action)
292
+ uptake = person.respond_to_action(
293
+ action.primary.action_type, action.primary.resource_cost,
294
+ before_metrics.mental_wellbeing.stress_level,
295
+ )
296
+ env_action = LifeStackAction.from_agent_action(action)
297
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
298
+
299
+ obs = env.step(env_action)
300
+ rollout = env.rollout(n_steps=7, gamma=0.9)
301
+
302
+ return jsonify({
303
+ "action": {
304
+ "type": action.primary.action_type,
305
+ "target": action.primary.target_domain,
306
+ "reasoning": action.reasoning,
307
+ "reward": obs.reward,
308
+ },
309
+ "day0_metrics": dict(obs.metrics),
310
+ "discounted_reward": rollout["discounted_reward"],
311
+ "trajectory": rollout["trajectory"],
312
+ })
313
+
314
+
315
+ # ─── Custom Situation Entry ───
316
+ @app.route('/api/custom/run', methods=['POST'])
317
+ def run_custom():
318
+ data = request.json
319
+ situation_input = data.get('situation', "")
320
+
321
+ # Map sliders to metrics
322
+ m = LifeMetrics()
323
+ m.career.stress_level = float(data.get('work_stress', 5)) * 10
324
+ m.finances.debt_pressure = float(data.get('money_stress', 5)) * 10
325
+ m.relationships.conflict_frequency = (10 - float(data.get('rel_quality', 5))) * 10
326
+ m.physical_health.energy_level = float(data.get('energy_level', 5)) * 10
327
+ m.time.free_time = (10 - float(data.get('time_pressure', 5))) * 10
328
+
329
+ # Apply uploaded health/calendar overrides to custom metrics
330
+ for path, delta in USER_HEALTH_OVERRIDES.items():
331
+ if '.' in path:
332
+ dom, sub = path.split('.', 1)
333
+ dom_obj = getattr(m, dom, None)
334
+ if dom_obj and hasattr(dom_obj, sub):
335
+ setattr(dom_obj, sub, max(0.0, min(100.0, getattr(dom_obj, sub) + delta)))
336
+
337
+ gmail_signals = data.get('gmail_signals')
338
+ if gmail_signals:
339
+ # Merge digital signals if provided
340
+ for k, v in gmail_signals.items():
341
+ parts = k.split(".")
342
+ if len(parts) == 2:
343
+ dom = getattr(m, parts[0], None)
344
+ if dom and hasattr(dom, parts[1]):
345
+ setattr(dom, parts[1], v)
346
+
347
+ # Extract conflict from text using LLM
348
+ conflict = INTAKE.extract_conflict(situation_input, m)
349
+ pers_dict = INTAKE.get_personality_from_description(situation_input)
350
+ person = SimPerson(
351
+ name=pers_dict.get("name", "Inferred Self"),
352
+ openness=pers_dict.get("openness", 0.5),
353
+ conscientiousness=pers_dict.get("conscientiousness", 0.5),
354
+ extraversion=pers_dict.get("extraversion", 0.5),
355
+ agreeableness=pers_dict.get("agreeableness", 0.5),
356
+ neuroticism=pers_dict.get("neuroticism", 0.5)
357
+ )
358
+
359
+ budget = ResourceBudget(time=24, money=1000, energy=100)
360
+ action = AGENT.get_action(m, budget, conflict, person)
361
+ _normalize_action_metric_changes(action)
362
+
363
+ uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
364
+ m.mental_wellbeing.stress_level)
365
+
366
+ env = LifeStackEnv()
367
+ env.state.current_metrics = copy.deepcopy(m)
368
+ env.state.budget = budget
369
+
370
+ env_action = LifeStackAction.from_agent_action(action)
371
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
372
+ obs = env.step(env_action)
373
+
374
+ return jsonify({
375
+ "before_metrics": m.flatten(),
376
+ "after_metrics": obs.metrics,
377
+ "domain_health": compute_domain_health(obs.metrics),
378
+ "action": {
379
+ "type": action.primary.action_type,
380
+ "target": action.primary.target_domain,
381
+ "description": action.primary.description,
382
+ "reasoning": action.reasoning,
383
+ "id": "".join(str(uuid.uuid4()).split("-")[:2]).upper()
384
+ },
385
+ "person": {"name": person.name or "Inferred Self"}
386
+ })
387
+
388
+ @app.route('/api/gmail/sync', methods=['POST'])
389
+ def sync_gmail():
390
+ signals, metric_deltas, summary, is_demo = GMAIL.sync()
391
+ return jsonify({
392
+ "status": "success",
393
+ "signals": metric_deltas,
394
+ "raw": signals,
395
+ "summary": summary,
396
+ "is_demo": is_demo,
397
+ })
398
+
399
+
+ @app.route('/api/digital/sync', methods=['POST'])
+ def digital_sync():
+     """
+     Unified Digital Sync: Gmail + Google Calendar + Fitness (demo payload).
+     Tries real OAuth for Gmail and Calendar; falls back to demo_signals.json on failure.
+     Fitness is always served from the demo payload (no first-party fitness API scope).
+     Returns merged metric deltas, per-source raw signals, and a demo flag per source.
+     """
+     demo_path = os.path.join(os.path.dirname(__file__), 'data', 'demo_signals.json')
+
+     # json is already imported at module level, so no local alias is needed
+     with open(demo_path) as f:
+         demo_full = json.load(f)
+
+     # Gmail
+     gmail_signals, gmail_deltas, gmail_summary, gmail_is_demo = GMAIL.sync()
+
+     # Calendar
+     cal_signals, cal_deltas, cal_is_demo = CALENDAR.sync()
+
+     # Fitness: always demo (no live fitness API)
+     fitness_signals = demo_full['fitness']
+     fitness_deltas = {
+         "physical_health.sleep_quality": demo_full['derived_metric_deltas']['physical_health.sleep_quality'],
+         "physical_health.energy_level": demo_full['derived_metric_deltas']['physical_health.energy_level'],
+         "physical_health.exercise_consistency": demo_full['derived_metric_deltas']['physical_health.exercise_consistency'],
+         "mental_wellbeing.stress_level": demo_full['derived_metric_deltas']['mental_wellbeing.stress_level'],
+     }
+     fitness_is_demo = True
+
+     # Merge all deltas (last writer wins: Fitness > Calendar > Gmail for overlapping keys)
+     merged_deltas = {}
+     merged_deltas.update(gmail_deltas)
+     merged_deltas.update(cal_deltas)
+     merged_deltas.update(fitness_deltas)
+
+     return jsonify({
+         "status": "success",
+         "merged_deltas": merged_deltas,
+         "sources": {
+             "gmail": {
+                 "signals": gmail_signals if isinstance(gmail_signals, dict) else {},
+                 "summary": gmail_summary,
+                 "is_demo": gmail_is_demo,
+             },
+             "calendar": {
+                 "signals": cal_signals,
+                 "summary": cal_signals.get("summary", ""),
+                 "is_demo": cal_is_demo,
+             },
+             "fitness": {
+                 "signals": fitness_signals,
+                 "summary": fitness_signals.get("summary", ""),
+                 "is_demo": fitness_is_demo,
+             },
+         },
+         "persona_note": demo_full.get("persona", "Jordan (PM at Series-B startup)"),
+     })
458
+
459
+ @app.route('/api/arjun/activate', methods=['POST'])
460
+ def activate_arjun():
461
+ LONG_DEMO.pre_seed_arjun()
462
+ return jsonify({"status": "success", "message": "Arjun's memory (Week 1 & 2) is now ACTIVE in ChromaDB."})
463
+
464
+ @app.route('/api/task/demo', methods=['GET'])
465
+ def get_demo_task():
466
+ dummy_routes = [
467
+ Route(id="r1", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], milestones_unlocked=["m1"], final_reward=2.5),
468
+ Route(id="r2", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], milestones_unlocked=["m2"], final_reward=1.8),
469
+ ]
470
+ dummy_milestones = [
471
+ Milestone(id="m1", description="Successfully rebooked flight before deadline", reward=1.0),
472
+ Milestone(id="m2", description="Caught up with all emergency slack messages", reward=0.8),
473
+ ]
474
+ dummy_events = [
475
+ ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300."),
476
+ ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity."),
477
+ ]
478
+ task = Task(
479
+ id="sample_flight_crisis", domain="flight_crisis", goal="Survive Airport Cancellation",
480
+ event_schedule=dummy_events, viable_routes=dummy_routes, milestones=dummy_milestones,
481
+ horizon=10, difficulty=4
482
+ )
483
+ return jsonify({
484
+ "goal": task.goal,
485
+ "difficulty": task.difficulty,
486
+ "routes": [{"name": r.name, "description": r.description} for r in dummy_routes],
487
+ "milestones": [{"id": m.id, "description": m.description} for m in dummy_milestones],
488
+ "events": [{"step": e.step, "id": e.id, "description": e.description} for e in dummy_events],
489
+ "story": "A major storm grounded commercial flights."
490
+ })
491
+
+ @app.route('/api/stats', methods=['GET'])
+ def get_stats():
+     stats = MEMORY.get_stats()
+     # Normalise for frontend: inject feedback_count and reward_history
+     all_records = []
+     try:
+         raw = MEMORY.collection.get(include=["metadatas"])
+         all_records = raw.get("metadatas", [])
+     except Exception:
+         pass
+     stats["feedback_count"] = len([m for m in all_records if m.get("type") == "feedback"])
+     rewards = [m.get("reward", 0.0) for m in all_records if "reward" in m]
+     stats["reward_history"] = rewards[-20:] if rewards else []
+     return jsonify(stats)
506
+
507
+ @app.route('/api/feedback/submit', methods=['POST'])
508
+ def submit_feedback():
509
+ data = request.json
510
+ try:
511
+ feedback = OutcomeFeedback(
512
+ episode_id=data.get('episode_id'),
513
+ submitted_at=datetime.datetime.now(),
514
+ overall_effectiveness=int(data.get('score', 7)),
515
+ domains_improved=data.get('improved', []),
516
+ domains_worsened=data.get('worsened', []),
517
+ unexpected_effects=data.get('notes', ""),
518
+ resolution_time_hours=float(data.get('time', 1.0))
519
+ )
520
+ MEMORY.store_feedback(feedback)
521
+ return jsonify({"status": "success", "message": f"Feedback stored for episode {feedback.episode_id}"})
522
+ except Exception as e:
523
+ return jsonify({"status": "error", "message": str(e)}), 400
524
+
525
+ # ─── Feature F1 helper: random action baseline ───
526
+ _ACTION_TYPES = ["negotiate", "communicate", "delegate", "spend", "reschedule", "rest", "deprioritize", "execute"]
527
+
528
+ def _random_action(conflict, person):
529
+ """Purely random action baseline: worst possible agent, used for ablation floor."""
530
+ import random as _r
531
+ env = LifeStackEnv()
532
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
533
+ flat = env.state.current_metrics.flatten()
534
+ atype = _r.choice(_ACTION_TYPES)
535
+ dom = _r.choice(_DOMAINS)
536
+ key = f"{dom}.stress_level" if dom in ("career", "mental_wellbeing") else f"{dom}.liquidity" if dom == "finances" else f"{dom}.energy_level"
537
+ mc = {key: _r.uniform(-20, 20)}
538
+ rc = {"time": _r.uniform(0.5, 3.0), "energy": _r.uniform(5, 30)}
539
+ uptake = person.respond_to_action(atype, rc, flat.get("mental_wellbeing.stress_level", 70))
540
+ env_action = LifeStackAction(action_type=atype, target=dom,
541
+ metric_changes={k: v * uptake for k, v in mc.items()},
542
+ resource_cost=rc, reasoning="Random baseline.", actions_taken=1)
543
+ obs = env.step(env_action)
544
+ return {"metrics": obs.metrics, "action": {"type": atype, "target": dom,
545
+ "description": "Random action (ablation floor).",
546
+ "reasoning": "Random baseline.", "reward": obs.reward, "cost": rc}}
547
+
548
+
549
+ # ─── Feature A: Trained vs Untrained Comparison ───
550
+ BASELINE_ACTION_MAP = {
551
+ "career": ("negotiate", {"career.workload": -12.0, "mental_wellbeing.stress_level": -4.0}, {"time": 1.5, "energy": 20.0}, "Negotiate workload with manager."),
552
+ "finances": ("spend", {"finances.liquidity": -200.0, "mental_wellbeing.stress_level": -8.0}, {"time": 1.0, "energy": 10.0}, "Spend to resolve financial pressure."),
553
+ "relationships": ("communicate", {"relationships.romantic": 8.0, "mental_wellbeing.stress_level": -5.0},{"time": 0.5, "energy": 8.0}, "Call partner to check in."),
554
+ "physical_health": ("rest", {"physical_health.energy_level": 12.0, "mental_wellbeing.stress_level": -6.0}, {"time": 1.0}, "Rest to recover energy."),
555
+ "mental_wellbeing": ("rest", {"mental_wellbeing.stress_level": -15.0, "physical_health.sleep_quality": 5.0}, {"time": 1.0}, "Take a break to reduce stress."),
556
+ "time": ("reschedule", {"time.free_hours_per_week": 6.0, "career.workload": -8.0}, {"time": 1.5, "energy": 12.0}, "Reschedule non-critical tasks."),
557
+ }
558
+
559
+ def _run_baseline(conflict, person):
560
+ """Rule-based baseline: pick the action for the worst-scoring domain."""
561
+ env = LifeStackEnv()
562
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
563
+ flat = env.state.current_metrics.flatten()
564
+
565
+ domain_scores = {}
566
+ for dom in ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]:
567
+ subs = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
568
+ domain_scores[dom] = sum(subs.values()) / len(subs) if subs else 70.0
569
+
570
+ worst_dom = min(domain_scores, key=domain_scores.get)
571
+ atype, mc, rc, desc = BASELINE_ACTION_MAP.get(worst_dom, BASELINE_ACTION_MAP["mental_wellbeing"])
572
+
573
+ uptake = person.respond_to_action(atype, rc, flat.get("mental_wellbeing.stress_level", 70))
574
+ scaled_mc = {k: v * uptake for k, v in mc.items()}
575
+
576
+ env_action = LifeStackAction(
577
+ action_type=atype,
578
+ target=worst_dom,
579
+ metric_changes=scaled_mc,
580
+ resource_cost=rc,
581
+ reasoning=f"Rule-based: {worst_dom} scored {domain_scores[worst_dom]:.1f} - lowest domain.",
582
+ actions_taken=1,
583
+ )
584
+ obs = env.step(env_action)
585
+ return {
586
+ "metrics": obs.metrics,
587
+ "action": {
588
+ "type": atype,
589
+ "target": worst_dom,
590
+ "description": desc,
591
+ "reasoning": env_action.reasoning,
592
+ "reward": obs.reward,
593
+ "cost": rc,
594
+ }
595
+ }
596
+
597
+ def _run_agent_comparison_side(conflict, person, api_only: bool):
598
+ """Run one side of the comparison: api_only=True → untrained LLM, False → GRPO-trained."""
599
+ env = LifeStackEnv()
600
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
601
+ before_metrics = copy.deepcopy(env.state.current_metrics)
602
+ before_budget = copy.deepcopy(env.state.budget)
603
+ action = AGENT.get_action(before_metrics, before_budget, conflict, person, api_only=api_only)
604
+ _normalize_action_metric_changes(action)
605
+ uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
606
+ before_metrics.mental_wellbeing.stress_level)
607
+ env_action = LifeStackAction.from_agent_action(action)
608
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
609
+ obs = env.step(env_action)
610
+ return {
611
+ "metrics": obs.metrics,
612
+ "action": {
613
+ "type": action.primary.action_type,
614
+ "target": action.primary.target_domain,
615
+ "description": action.primary.description,
616
+ "reasoning": action.reasoning,
617
+ "reward": obs.reward,
618
+ "cost": action.primary.resource_cost,
619
+ }
620
+ }
621
+
622
+
623
+ @app.route('/api/comparison/run', methods=['POST'])
624
+ def run_comparison():
625
+ """Run same conflict through untrained LLM (no RL) AND GRPO-trained LifeStack agent."""
626
+ data = request.json
627
+ conflict_label = data.get('conflict')
628
+ person_label = data.get('person')
629
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
630
+ person = PERSONS.get(person_label, PERSONS["Alex (Executive) β€” driven, high-stress"])
631
+
632
+ # Untrained LLM path: forces Groq API, no GRPO optimization
633
+ try:
634
+ baseline = _run_agent_comparison_side(conflict, person, api_only=True)
635
+ except Exception as e:
636
+ baseline = {"error": str(e)}
637
+
638
+ # GRPO-trained agent path: uses local model if available, lazy-loaded
639
+ try:
640
+ trained = _run_agent_comparison_side(conflict, person, api_only=False)
641
+ except Exception as e:
642
+ trained = {"error": str(e)}
643
+
644
+ return jsonify({"baseline": baseline, "trained": trained})
645
+
646
+
647
+ # ─── Feature E: Memory Effect Comparison ───
648
+ @app.route('/api/memory/compare', methods=['POST'])
649
+ def memory_compare():
650
+ """Show the same conflict resolved cold (no memory) vs warm (with RAG memory)."""
651
+ try:
652
+ data = request.json
653
+ conflict_label = data.get('conflict')
654
+ person_label = data.get('person')
655
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
656
+ person = PERSONS.get(person_label, PERSONS["Alex (Executive) β€” driven, high-stress"])
657
+
658
+ def _run_episode(use_memory: bool):
659
+ env = LifeStackEnv()
660
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
661
+ before_metrics = copy.deepcopy(env.state.current_metrics)
662
+ before_budget = copy.deepcopy(env.state.budget)
663
+ few_shot = ""
664
+ retrieved = []
665
+ if use_memory:
666
+ few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_metrics.flatten())
667
+ retrieved = MEMORY.retrieve_similar(conflict.title, before_metrics.flatten())
668
+ action = AGENT.get_action(before_metrics, before_budget, conflict, person, few_shot_context=few_shot)
669
+ _normalize_action_metric_changes(action)
670
+ uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
671
+ before_metrics.mental_wellbeing.stress_level)
672
+ env_action = LifeStackAction.from_agent_action(action)
673
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
674
+ obs = env.step(env_action)
675
+ MEMORY.store_decision(
676
+ conflict_title=conflict.title,
677
+ action_type=action.primary.action_type,
678
+ target_domain=action.primary.target_domain,
679
+ reward=obs.reward,
680
+ metrics_snapshot=before_metrics.flatten(),
681
+ reasoning=action.reasoning,
682
+ )
683
+ return {
684
+ "metrics": obs.metrics,
685
+ "action": {
686
+ "type": action.primary.action_type,
687
+ "target": action.primary.target_domain,
688
+ "description": action.primary.description,
689
+ "reasoning": action.reasoning,
690
+ "reward": obs.reward,
691
+ "memories_retrieved": retrieved,
692
+ }
693
+ }
694
+
695
+ cold = _run_episode(use_memory=False)
696
+ warm = _run_episode(use_memory=True)
697
+ return jsonify({"cold": cold, "warm": warm})
698
+ except Exception as e:
699
+ return jsonify({"error": str(e)}), 500
700
+
701
+
702
+ # ─── F2: /api/cascade/frames alias ───
703
+ @app.route('/api/cascade/frames', methods=['POST'])
704
+ def cascade_frames_alias():
705
+ """Alias route for /api/simulation/cascade (same handler)."""
706
+ return get_cascade_frames()
707
+
708
+
709
+ # ─── F4: Personality Comparison with OCEAN scores ───
710
+ @app.route('/api/personality/compare', methods=['POST'])
711
+ def personality_compare():
712
+ data = request.json or {}
713
+ conflict_label = data.get('conflict')
714
+ person_a_label = data.get('person_a')
715
+ person_b_label = data.get('person_b')
716
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
717
+
718
+ def _run_person(person_label):
719
+ person = PERSONS.get(person_label, list(PERSONS.values())[0])
720
+ env = LifeStackEnv()
721
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
722
+ before_m = copy.deepcopy(env.state.current_metrics)
723
+ before_b = copy.deepcopy(env.state.budget)
724
+ action = AGENT.get_action(before_m, before_b, conflict, person)
725
+ _normalize_action_metric_changes(action)
726
+ uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
727
+ before_m.mental_wellbeing.stress_level)
728
+ env_action = LifeStackAction.from_agent_action(action)
729
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
730
+ obs = env.step(env_action)
731
+ return {
732
+ "name": person.name,
733
+ "ocean": {
734
+ "openness": round(person.openness * 100),
735
+ "conscientiousness": round(person.conscientiousness * 100),
736
+ "extraversion": round(person.extraversion * 100),
737
+ "agreeableness": round(person.agreeableness * 100),
738
+ "neuroticism": round(person.neuroticism * 100),
739
+ },
740
+ "action": {
741
+ "type": action.primary.action_type,
742
+ "target": action.primary.target_domain,
743
+ "description": action.primary.description,
744
+ "reasoning": action.reasoning,
745
+ "reward": obs.reward,
746
+ "uptake": uptake,
747
+ },
748
+ "metrics": obs.metrics,
749
+ "domain_health": compute_domain_health(obs.metrics),
750
+ }
751
+
752
+ try:
753
+ return jsonify({"a": _run_person(person_a_label), "b": _run_person(person_b_label)})
754
+ except Exception as e:
755
+ return jsonify({"error": str(e)}), 500
756
+
757
+
758
+ # ─── F6: Dedicated Counterfactual Generation ───
759
+ @app.route('/api/counterfactuals/generate', methods=['POST'])
760
+ def counterfactuals_generate():
761
+ data = request.json or {}
762
+ conflict_label = data.get('conflict')
763
+ person_label = data.get('person')
764
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
765
+ person = PERSONS.get(person_label, list(PERSONS.values())[0])
766
+
767
+ env = LifeStackEnv()
768
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
769
+ before_m = copy.deepcopy(env.state.current_metrics)
770
+ before_b = copy.deepcopy(env.state.budget)
771
+ action = AGENT.get_action(before_m, before_b, conflict, person)
772
+ _normalize_action_metric_changes(action)
773
+ cf_data = generate_counterfactuals(AGENT, before_m, before_b, conflict, person, action)
774
+ return jsonify({
775
+ "counterfactuals": cf_data,
776
+ "actual_action": {
777
+ "type": action.primary.action_type,
778
+ "target": action.primary.target_domain,
779
+ "description": action.primary.description,
780
+ },
781
+ })
782
+
783
+
784
+ # ─── F7: Memory Ablation Study ───
785
+ @app.route('/api/memory/ablation', methods=['POST'])
786
+ def memory_ablation():
787
+ """Memory ablation: cold (0 memories) vs warm (RAG-augmented). Surfaces ablation delta."""
788
+ data = request.json or {}
789
+ conflict_label = data.get('conflict')
790
+ person_label = data.get('person')
791
+ conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
792
+ person = PERSONS.get(person_label, list(PERSONS.values())[0])
793
+
794
+ def _run(use_memory):
795
+ env = LifeStackEnv()
796
+ env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
797
+ before_m = copy.deepcopy(env.state.current_metrics)
798
+ before_b = copy.deepcopy(env.state.budget)
799
+ few_shot, retrieved = "", []
800
+ if use_memory:
801
+ few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_m.flatten())
802
+ retrieved = MEMORY.retrieve_similar(conflict.title, before_m.flatten())
803
+ action = AGENT.get_action(before_m, before_b, conflict, person, few_shot_context=few_shot)
804
+ _normalize_action_metric_changes(action)
805
+ uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
806
+ before_m.mental_wellbeing.stress_level)
807
+ env_action = LifeStackAction.from_agent_action(action)
808
+ env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
809
+ obs = env.step(env_action)
810
+ MEMORY.store_decision(conflict_title=conflict.title, action_type=action.primary.action_type,
811
+ target_domain=action.primary.target_domain, reward=obs.reward,
812
+ metrics_snapshot=before_m.flatten(), reasoning=action.reasoning)
813
+ return {"metrics": obs.metrics, "action": {
814
+ "type": action.primary.action_type, "target": action.primary.target_domain,
815
+ "description": action.primary.description, "reasoning": action.reasoning,
816
+ "reward": obs.reward, "memories_retrieved": retrieved,
817
+ }}
818
+
819
+ cold = _run(use_memory=False)
820
+ warm = _run(use_memory=True)
821
+ delta = warm["action"]["reward"] - cold["action"]["reward"]
822
+ return jsonify({"cold": cold, "warm": warm,
823
+ "ablation_delta": round(delta, 4),
824
+ "memory_count": len(warm["action"]["memories_retrieved"])})
825
+
826
+
827
+ # ─── F10: Health + Calendar Data Upload ───
828
+ @app.route('/api/data/health/upload', methods=['POST'])
829
+ def upload_health_data():
830
+ """Accept health/fitness JSON signals and return metric deltas."""
831
+ data = request.json or {}
832
+ sleep = float(data.get('sleep_hours', 7.0))
833
+ hr = float(data.get('resting_heart_rate', 70))
834
+ steps = float(data.get('daily_steps', 8000))
835
+ deltas = {
836
+ "physical_health.sleep_quality": round(min(100, sleep / 8 * 100) - 50, 1),
837
+ "physical_health.energy_level": round(min(100, steps / 10000 * 100) - 50, 1),
838
+ "physical_health.exercise_consistency": round(min(100, steps / 8000 * 70), 1),
839
+ "mental_wellbeing.stress_level": round(max(0.0, 80.0 - hr), 1),
840
+ }
841
+ summary = f"Sleep {sleep:.1f}h | HR {hr:.0f}bpm | Steps {int(steps):,}/day"
842
+ # Persist overrides so future simulations use the uploaded health data
843
+ USER_HEALTH_OVERRIDES.update(deltas)
844
+ return jsonify({"status": "success", "deltas": deltas, "summary": summary,
845
+ "signals": {"avg_sleep_hours": sleep, "resting_heart_rate": hr, "daily_steps_avg": steps}})
846
+
847
+
848
+ @app.route('/api/data/calendar/upload', methods=['POST'])
849
+ def upload_calendar_data():
850
+ """Accept calendar JSON signals and return metric deltas."""
851
+ data = request.json or {}
852
+ occupancy = float(data.get('week_occupancy_pct', 50))
853
+ btb = int(data.get('back_to_back_blocks', 0))
854
+ deadlines = data.get('upcoming_deadlines', [])
855
+ critical_count = sum(1 for d in deadlines if d.get('priority') == 'critical')
856
+ deltas = {
857
+ "time.free_hours_per_week": round(-((occupancy - 50) / 5), 1),
858
+ "time.schedule_control": round(-(occupancy / 10), 1),
859
+ "mental_wellbeing.stress_level": round((occupancy / 10) + (btb * 2), 1),
860
+ "career.workload": round((occupancy - 50) / 2 + critical_count * 5, 1),
861
+ }
862
+ summary = f"Occupancy {occupancy:.0f}% | {len(deadlines)} deadlines ({critical_count} critical)"
863
+ return jsonify({"status": "success", "deltas": deltas, "summary": summary,
864
+ "signals": {"week_occupancy_pct": occupancy, "back_to_back_blocks": btb,
865
+ "upcoming_deadlines": deadlines}})
866
+
867
+
868
+ # ─── Global Error Handlers ───
869
+ @app.errorhandler(429)
870
+ def ratelimit_handler(e):
871
+ return jsonify({"error": "Rate limit exceeded. Slow down!", "details": str(e)}), 429
872
+
873
+ @app.errorhandler(500)
874
+ def server_error_handler(e):
875
+ return jsonify({"error": "Internal server error. The agent might be overwhelmed.", "details": str(e)}), 500
876
+
877
+ if __name__ == '__main__':
878
+ LONG_DEMO.pre_seed_arjun()
879
+ # debug=True would expose the interactive Werkzeug debugger on a publicly bound host
+ app.run(host='0.0.0.0', port=7860, debug=False)
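The routes above repeat the same inline expression to build the `budget` argument for `env.reset(...)`. A minimal sketch of how that expression could be factored out, assuming the floors and defaults are exactly the literals used in the routes; `floor_budget`, `BUDGET_DEFAULTS`, and `BUDGET_FLOORS` are hypothetical names that do not appear in the commit:

```python
from typing import Mapping, Optional

# Hypothetical constants: values copied from the inline expression in the routes.
BUDGET_DEFAULTS = {"time": 20.0, "money": 500.0, "energy": 100.0}
BUDGET_FLOORS = {"time": 4.0, "money": 500.0, "energy": 20.0}


def floor_budget(resource_budget: Optional[Mapping[str, float]]) -> dict:
    """Fill in missing resources with defaults, then raise each to its floor."""
    raw = resource_budget or {}  # a conflict may carry no budget at all
    return {
        key: max(raw.get(key, BUDGET_DEFAULTS[key]), BUDGET_FLOORS[key])
        for key in BUDGET_DEFAULTS
    }


if __name__ == "__main__":
    print(floor_budget(None))
    print(floor_budget({"time": 2.0, "money": 100.0, "energy": 10.0}))
```

With such a helper, each route could pass `budget=floor_budget(conflict.resource_budget)` instead of restating the three `max(...)` calls.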
core/__init__.py ADDED
File without changes
core/action_space.py ADDED
@@ -0,0 +1,238 @@
1
+ import copy
2
+ from dataclasses import dataclass, field
3
+ from core.life_state import LifeMetrics, ResourceBudget
4
+ from enum import Enum
5
+ from intake.simperson import SimPerson
6
+
7
+ class ToolActionType(str, Enum):
8
+ INSPECT = "inspect"
9
+ PLAN = "plan"
10
+ EXECUTE = "execute"
11
+ COMMUNICATE = "communicate"
12
+ WAIT = "wait"
13
+ ROLLBACK = "rollback"
14
+ ESCALATE = "escalate"
15
+
16
+ @dataclass
17
+ class PrimaryAction:
18
+ action_type: str # reschedule, delegate, negotiate, spend, communicate, rest, deprioritize
19
+ target_domain: str
20
+ metric_changes: dict
21
+ resource_cost: dict
22
+ description: str
23
+
24
+ @dataclass
25
+ class CommunicationAction:
26
+ recipient: str # boss, partner, family, friend, colleague
27
+ message_type: str # apologize, negotiate, inform, request, reassure
28
+ tone: str # formal, warm, urgent, calm, assertive
29
+ content: str
30
+
31
+ @dataclass
32
+ class AgentAction:
33
+ primary: PrimaryAction
34
+ communication: CommunicationAction = None
35
+ reasoning: str = ""
36
+ model_used: str = "unknown"
37
+ raw_completion: str = ""
38
+
39
+ def validate_action(action: AgentAction, budget: ResourceBudget) -> tuple[bool, str]:
40
+ cost = action.primary.resource_cost
41
+ if budget.time_hours < cost.get('time', 0.0):
42
+ return False, f"Not enough time (Needs {cost.get('time')}h, has {budget.time_hours:.1f}h)"
43
+ if budget.money_dollars < cost.get('money', 0.0):
44
+ return False, f"Not enough money (Needs ${cost.get('money')}, has ${budget.money_dollars:.1f})"
45
+ if budget.energy_units < cost.get('energy', 0.0):
46
+ return False, f"Not enough energy (Needs {cost.get('energy')}u, has {budget.energy_units:.1f}u)"
47
+ return True, ""
48
+
49
+ def apply_action(action: AgentAction, metrics: LifeMetrics, budget: ResourceBudget, person: SimPerson) -> tuple[LifeMetrics, ResourceBudget, float]:
50
+ """Validates, scales by personality uptake, and applies the action to the state."""
51
+
52
+ # 1. Validation
53
+ is_valid, reason = validate_action(action, budget)
54
+ if not is_valid:
55
+ # If invalid, the action fails but we return current state with 0 uptake
56
+ return metrics, budget, 0.0
57
+
58
+ # 2. Personality Scaling (Uptake)
59
+ current_stress = metrics.mental_wellbeing.stress_level
60
+ uptake_score = person.respond_to_action(
61
+ action.primary.action_type,
62
+ action.primary.resource_cost,
63
+ current_stress
64
+ )
65
+
66
+ # 3. Apply changes (Scaled by uptake)
67
+ new_metrics = copy.deepcopy(metrics)
68
+ for path, delta in action.primary.metric_changes.items():
69
+ # Guard: skip malformed keys without a domain prefix (e.g. LLM returns "stress_level" instead of "mental_wellbeing.stress_level")
70
+ if '.' not in path:
71
+ print(f" ⚠️ Skipping malformed metric key: '{path}' (expected 'domain.submetric')")
72
+ continue
73
+ parts = path.split('.', 1)
74
+ domain_name, sub_name = parts[0], parts[1]
75
+ domain = getattr(new_metrics, domain_name, None)
76
+ if domain is None or not hasattr(domain, sub_name):
77
+ print(f" ⚠️ Skipping unknown metric: '{path}'")
78
+ continue
79
+ current = getattr(domain, sub_name)
80
+
81
+ # Scale the benefit/cost by the person's receptiveness
82
+ try:
83
+ scaled_delta = float(delta) * uptake_score
84
+ setattr(domain, sub_name, max(0.0, min(100.0, current + scaled_delta)))
85
+ except (TypeError, ValueError):
86
+ print(f" ⚠️ Skipping metric change due to invalid delta value: '{delta}'")
87
+
88
+ # 4. Deduct resources (Fixed cost, doesn't scale with uptake)
89
+ new_budget = copy.deepcopy(budget)
90
+ new_budget.deduct(
91
+ time=action.primary.resource_cost.get('time', 0.0),
92
+ money=action.primary.resource_cost.get('money', 0.0),
93
+ energy=action.primary.resource_cost.get('energy', 0.0)
94
+ )
95
+
96
+ return new_metrics, new_budget, uptake_score
97
+
98
+ # 10 EXAMPLE ACTIONS for Friday 6PM Conflict
99
+ EXAMPLE_ACTIONS = [
100
+ AgentAction(
101
+ primary=PrimaryAction(
102
+ action_type="negotiate", target_domain="career",
103
+ metric_changes={"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
104
+ resource_cost={"time": 1.5, "energy": 20.0},
105
+ description="Negotiate a Sunday deadline extension with my boss."
106
+ ),
107
+ communication=CommunicationAction("boss", "negotiate", "formal", "Due to flight issues, I need until Sunday PM for the report."),
108
+ reasoning="Relieving the immediate workload pressure is critical to reduce cascade spread."
109
+ ),
110
+ AgentAction(
111
+ primary=PrimaryAction(
112
+ action_type="spend", target_domain="finances",
113
+ metric_changes={"finances.liquidity": -350.0, "mental_wellbeing.stress_level": -10.0},
114
+ resource_cost={"time": 1.0, "energy": 15.0},
115
+ description="Rebook the canceled flight using a premium fare."
116
+ ),
117
+ reasoning="Immediate resolution of logistics fixes the source of the crisis."
118
+ ),
119
+ AgentAction(
120
+ primary=PrimaryAction(
121
+ action_type="communicate", target_domain="relationships",
122
+ metric_changes={"relationships.romantic": 12.0, "mental_wellbeing.stress_level": -5.0},
123
+ resource_cost={"time": 0.5, "energy": 10.0},
124
+ description="Call my partner to explain the situation and reassure them."
125
+ ),
126
+ communication=CommunicationAction("partner", "reassure", "warm", "Hey, I'm stuck but I'll be home soon. Miss you."),
127
+ reasoning="Prevents relationship decay while stress is high."
128
+ ),
129
+ AgentAction(
130
+ primary=PrimaryAction(
131
+ action_type="communicate", target_domain="finances",
132
+ metric_changes={"finances.liquidity": 200.0, "relationships.family": -5.0},
133
+ resource_cost={"time": 1.5, "energy": 25.0},
134
+ description="Ask my sibling for a temporary loan to cover rebooking."
135
+ ),
136
+ communication=CommunicationAction("family", "request", "urgent", "My card declined, can you Venmo me $200 for the flight?"),
137
+ reasoning="Fixes the liquidity block at a small social cost."
138
+ ),
139
+ AgentAction(
140
+ primary=PrimaryAction(
141
+ action_type="reschedule", target_domain="time",
142
+ metric_changes={"career.workload": -10.0, "time.free_hours_per_week": 5.0},
143
+ resource_cost={"time": 2.0, "energy": 15.0},
144
+ description="Cancel non-essential meetings to create a deep-work block."
145
+ ),
146
+ reasoning="Regaining time allows for better problem solving later."
147
+ ),
148
+ AgentAction(
149
+ primary=PrimaryAction(
150
+ action_type="rest", target_domain="physical_health",
151
+ metric_changes={"mental_wellbeing.stress_level": -12.0, "physical_health.energy": 10.0},
152
+ resource_cost={"time": 1.0, "energy": -10.0},
153
+ description="Take a 60-minute power nap in the airport lounge."
154
+ ),
155
+ reasoning="Restores energy to tackle the remaining Sunday deadline."
156
+ ),
157
+ AgentAction(
158
+ primary=PrimaryAction(
159
+ action_type="delegate", target_domain="career",
160
+ metric_changes={"career.workload": -10.0, "relationships.professional_network": -5.0},
161
+ resource_cost={"time": 1.0, "energy": 15.0},
162
+ description="Ask a colleague to handle the final formatting of the slides."
163
+ ),
164
+ communication=CommunicationAction("colleague", "request", "assertive", "I'm stuck at airport, can you finish the formatting?"),
165
+ reasoning="Reduces workload by leaning on the professional network."
166
+ ),
167
+ AgentAction(
168
+ primary=PrimaryAction(
169
+ action_type="deprioritize", target_domain="time",
170
+ metric_changes={"time.free_hours_per_week": 8.0, "relationships.social": -10.0},
171
+ resource_cost={"time": 0.5, "energy": 5.0},
172
+ description="Tell friends I can't attend the weekend gathering."
173
+ ),
174
+ communication=CommunicationAction("friend", "inform", "calm", "Hey, work crisis. Won't make it this weekend. Sorry!"),
175
+ reasoning="Aggressively reclaims time for high-value tasks."
176
+ ),
177
+ AgentAction(
178
+ primary=PrimaryAction(
179
+ action_type="communicate", target_domain="career",
180
+ metric_changes={"career.stability": 8.0, "mental_wellbeing.stress_level": -5.0},
181
+ resource_cost={"time": 0.5, "energy": 10.0},
182
+ description="Send an apology note to boss for the delay."
183
+ ),
184
+ communication=CommunicationAction("boss", "apologize", "formal", "Apologies for the delay caused by travel disruptions. On it now."),
185
+ reasoning="Maintains career stability during an active crisis."
186
+ ),
187
+ AgentAction(
188
+ primary=PrimaryAction(
189
+ action_type="reschedule", target_domain="finances",
190
+ metric_changes={"finances.debt_pressure": -10.0, "time.admin_overhead": 10.0},
191
+ resource_cost={"time": 2.0, "energy": 15.0},
192
+ description="Call the bank to unlock the declined card."
193
+ ),
194
+ communication=CommunicationAction("colleague", "request", "assertive", "Unlock my credit card immediately."),
195
+ reasoning="Removes the liquidity barrier by handling admin overhead."
196
+ )
197
+ ]
198
+
199
+ def main():
200
+ # 1. Setup Personalities
201
+ # Sam (Anxious Introvert): Neuroticism 0.9, Extraversion 0.1
202
+ sam = SimPerson(name="Sam (Introvert)", openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9)
203
+
204
+ # 2. Setup initial state (Friday 6PM Conflict)
205
+ from core.life_state import DependencyGraph
206
+ graph = DependencyGraph()
207
+ metrics = LifeMetrics() # starts at 70s
208
+ metrics = graph.cascade(metrics, {"career.workload": 35.0, "finances.liquidity": -40.0})
209
+ budget = ResourceBudget(time_hours=20.0, money_dollars=500.0, energy_units=100.0)
210
+
211
+ print("--- SIMULATING ACTIONS FOR SAM (ANXIOUS INTROVERT) ---")
212
+ print(f"Initial Stress: {metrics.mental_wellbeing.stress_level:.2f}")
213
+ print(f"Initial Metrics Health (Avg): {sum(metrics.flatten().values())/len(metrics.flatten()):.2f}")
214
+
215
+ # 3. Apply each action
216
+ for i, action in enumerate(EXAMPLE_ACTIONS, 1):
217
+ print(f"\nACTION {i}: {action.primary.description}")
218
+
219
+ is_valid, reason = validate_action(action, budget)
220
+ if not is_valid:
221
+ print(f" ❌ FAILED: {reason}")
222
+ continue
223
+
224
+ m_after, b_after, uptake = apply_action(action, metrics, budget, sam)
225
+
226
+ print(f" ✅ SUCCESS | Uptake: {uptake:.2f}")
227
+ print(f" Cost: {action.primary.resource_cost}")
228
+
229
+ # Show specific improvements
230
+ for path, delta in action.primary.metric_changes.items():
231
+ domain_name, sub_name = path.split('.', 1)
232
+ val_before = getattr(getattr(metrics, domain_name), sub_name)
233
+ val_after = getattr(getattr(m_after, domain_name), sub_name)
234
+ real_delta = val_after - val_before
235
+ print(f" - {path:25}: {val_before:.2f} -> {val_after:.2f} (Actual Change: {real_delta:+.2f})")
236
+
237
+ if __name__ == "__main__":
238
+ main()
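The core of `apply_action` is a small rule: reject keys without a `domain.submetric` shape, scale the proposed delta by the person's uptake, and clamp the result to the 0 to 100 range. A standalone sketch of that rule, assuming plain floats instead of the `LifeMetrics` dataclasses; `parse_metric_path` and `scale_and_clamp` are illustrative names, not functions from the repository:

```python
from typing import Optional, Tuple


def parse_metric_path(path: str) -> Optional[Tuple[str, str]]:
    """Split 'domain.submetric' into its two parts; return None if malformed."""
    if "." not in path:
        return None
    domain_name, sub_name = path.split(".", 1)
    return domain_name, sub_name


def scale_and_clamp(current: float, delta: float, uptake: float,
                    lo: float = 0.0, hi: float = 100.0) -> float:
    """Apply an uptake-scaled delta and keep the result inside [lo, hi]."""
    return max(lo, min(hi, current + float(delta) * uptake))


if __name__ == "__main__":
    print(parse_metric_path("mental_wellbeing.stress_level"))
    print(parse_metric_path("stress_level"))
    print(scale_and_clamp(70.0, -15.0, 0.5))
```

Resource costs are deliberately not passed through `scale_and_clamp`: in the original code they are deducted at full value regardless of uptake.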
core/cascade_utils.py ADDED
@@ -0,0 +1,78 @@
1
+ import copy
2
+ from core.life_state import LifeMetrics, DependencyGraph, CASCADE_DAMPENING_DEFAULT
3
+
4
+
5
+ def animate_cascade(primary_disruption: dict, metrics: LifeMetrics) -> list[dict]:
6
+ """Replay the cascade step-by-step and capture intermediate frames.
7
+
8
+ Returns a list of frames, each:
9
+ { 'flat': {metric: value}, 'status': {metric: 'primary'|'first'|'second'|'unchanged'} }
10
+ """
11
+ graph = DependencyGraph()
12
+ dampening = CASCADE_DAMPENING_DEFAULT
13
+ frames = []
14
+
15
+ # Frame 0: initial stable state
16
+ base = copy.deepcopy(metrics)
17
+ base_flat = base.flatten()
18
+ frames.append({'flat': dict(base_flat), 'status': {k: 'unchanged' for k in base_flat}})
19
+
20
+ # Frame 1: primary disruption only (no cascade)
21
+ f1 = copy.deepcopy(metrics)
22
+ primary_keys = set()
23
+ for path, amount in primary_disruption.items():
24
+ if '.' not in path:
25
+ continue
26
+ primary_keys.add(path)
27
+ dom_name, sub_name = path.split('.', 1)
28
+ dom = getattr(f1, dom_name, None)
29
+ if dom and hasattr(dom, sub_name):
30
+ setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + amount)))
31
+ f1_flat = f1.flatten()
32
+ frames.append({'flat': dict(f1_flat),
33
+ 'status': {k: ('primary' if k in primary_keys else 'unchanged') for k in f1_flat}})
34
+
35
+ # Frame 2: first-order cascade
36
+ f2 = copy.deepcopy(f1)
37
+ first_order_keys = set()
38
+ queue_next = []
39
+ for path, amount in primary_disruption.items():
40
+ if '.' not in path or path not in graph.edges:
41
+ continue
42
+ for target, weight in graph.edges[path]:
43
+ impact = amount * weight * dampening
44
+ if abs(impact) >= 0.05:
45
+ first_order_keys.add(target)
46
+ dom_name, sub_name = target.split('.', 1)
47
+ dom = getattr(f2, dom_name, None)
48
+ if dom and hasattr(dom, sub_name):
49
+ setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + impact)))
50
+ queue_next.append((target, impact))
51
+ f2_flat = f2.flatten()
52
+ frames.append({'flat': dict(f2_flat), 'status': {
53
+ k: ('primary' if k in primary_keys else 'first' if k in first_order_keys else 'unchanged')
54
+ for k in f2_flat
55
+ }})
56
+
57
+ # Frame 3: second-order cascade
58
+ f3 = copy.deepcopy(f2)
59
+ second_order_keys = set()
60
+ for src_path, src_mag in queue_next:
61
+ if src_path not in graph.edges:
62
+ continue
63
+ for target, weight in graph.edges[src_path]:
64
+ impact = src_mag * weight * dampening
65
+ if abs(impact) >= 0.05:
66
+ second_order_keys.add(target)
67
+ dom_name, sub_name = target.split('.', 1)
68
+ dom = getattr(f3, dom_name, None)
69
+ if dom and hasattr(dom, sub_name):
70
+ setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + impact)))
71
+ f3_flat = f3.flatten()
72
+ frames.append({'flat': dict(f3_flat), 'status': {
73
+ k: ('primary' if k in primary_keys else 'first' if k in first_order_keys
74
+ else 'second' if k in second_order_keys else 'unchanged')
75
+ for k in f3_flat
76
+ }})
77
+
78
+ return frames
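The frame logic in `animate_cascade` reduces to one rule applied hop by hop: `impact = magnitude * weight * dampening`, dropped when its absolute value is below 0.05. A self-contained sketch of that rule on a two-edge toy graph, assuming the same 0.6 dampening and 0.05 threshold; `propagate` and the trimmed `EDGES` dict are illustrative and not part of the commit:

```python
# Toy subset of the dependency graph: source -> [(target, weight)]
EDGES = {
    "career.workload": [("mental_wellbeing.stress_level", 0.70)],
    "mental_wellbeing.stress_level": [("physical_health.sleep_quality", -0.55)],
}


def propagate(disruption: dict, edges: dict, dampening: float = 0.6,
              hops: int = 2, threshold: float = 0.05) -> list:
    """Return one {target: impact} dict per hop of the cascade."""
    frontier = list(disruption.items())
    layers = []
    for _ in range(hops):
        layer = {}
        next_frontier = []
        for source, magnitude in frontier:
            for target, weight in edges.get(source, []):
                impact = magnitude * weight * dampening
                if abs(impact) >= threshold:  # ignore negligible ripples
                    layer[target] = layer.get(target, 0.0) + impact
                    next_frontier.append((target, impact))
        layers.append(layer)
        frontier = next_frontier
    return layers


if __name__ == "__main__":
    for hop, layer in enumerate(propagate({"career.workload": 35.0}, EDGES), 1):
        print(hop, layer)
```

For a +35 workload spike this yields roughly +14.7 stress on the first hop (35 × 0.70 × 0.6) and roughly -4.85 sleep quality on the second (14.7 × -0.55 × 0.6). Note that in this scheme the first hop is already dampened once.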
core/feedback.py ADDED
@@ -0,0 +1,63 @@
1
+ from dataclasses import dataclass, field
2
+ from datetime import datetime
3
+ from typing import List, Optional
4
+ from core.lifestack_env import LifeStackObservation
5
+
6
+ @dataclass
7
+ class OutcomeFeedback:
8
+ episode_id: str
9
+ submitted_at: datetime = field(default_factory=datetime.now)
10
+ # Did the advice work overall? 0-10 scale
11
+ overall_effectiveness: int = 5
12
+ # Which domains actually changed (user-reported)
13
+ domains_improved: List[str] = field(default_factory=list)
14
+ domains_worsened: List[str] = field(default_factory=list)
15
+ # Free text: what unexpected effects happened?
16
+ unexpected_effects: str = ""
17
+ # Time to resolution (hours)
18
+ resolution_time_hours: float = 0.0
19
+
20
+ def compute_human_feedback_reward(initial_metrics: dict, predicted_obs: LifeStackObservation, feedback: OutcomeFeedback) -> float:
21
+ """
22
+ Computes a reward score (0.0 to 1.0) based on how well the environment's
23
+ predicted outcomes match the human's reported reality.
24
+ """
25
+ # Metrics where a decrease is an improvement
26
+ inverted = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
27
+
28
+ # 1. Overlap Score (0.0 - 1.0): Jaccard index of predicted vs. user-reported improved domains
+ predicted_improved = set()
29
+ for key, final_val in predicted_obs.metrics.items():
30
+ if key not in initial_metrics:
31
+ continue
32
+
33
+ initial_val = initial_metrics[key]
34
+ delta = final_val - initial_val
35
+ submetric = key.split('.')[-1]
36
+ domain = key.split('.')[0]
37
+
38
+ # Determine if this specific change is an "improvement"
39
+ is_improvement = False
40
+ if submetric in inverted:
41
+ if delta < -1.0: # Significant decrease in negative metric
42
+ is_improvement = True
43
+ else:
44
+ if delta > 1.0: # Significant increase in positive metric
45
+ is_improvement = True
46
+
47
+ if is_improvement:
48
+ predicted_improved.add(domain)
49
+
50
+ actual_improved = set(feedback.domains_improved)
51
+
52
+ union = predicted_improved | actual_improved
53
+ if not union:
54
+ overlap = 1.0 # Both agreed nothing improved
55
+ else:
56
+ intersection = predicted_improved & actual_improved
57
+ overlap = len(intersection) / len(union)
58
+
59
+ # 2. Effectiveness Score (0.0 - 1.0)
60
+ effectiveness_score = max(0.0, min(1.0, feedback.overall_effectiveness / 10.0))
61
+
62
+ # Weighted Average
63
+ return 0.5 * overlap + 0.5 * effectiveness_score
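The reward in `compute_human_feedback_reward` is an equal-weight blend of two terms: the Jaccard overlap between predicted and user-reported improved domains, and the user's 0 to 10 effectiveness rating rescaled to 0 to 1. A minimal sketch of that arithmetic on plain sets, assuming the same 0.5/0.5 weights; `jaccard` and `blended_reward` are hypothetical helper names:

```python
def jaccard(a: set, b: set) -> float:
    """Intersection over union; two empty sets count as full agreement."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def blended_reward(predicted, reported, effectiveness_0_to_10: float) -> float:
    """Average the domain overlap with the rescaled effectiveness rating."""
    overlap = jaccard(set(predicted), set(reported))
    effectiveness = max(0.0, min(1.0, effectiveness_0_to_10 / 10.0))
    return 0.5 * overlap + 0.5 * effectiveness


if __name__ == "__main__":
    # Environment predicted two improved domains; the user confirmed one, rated 8/10.
    print(blended_reward({"career", "finances"}, {"career"}, 8))
```

In the example, the overlap is 1/2 and the rating contributes 0.8, so the blended reward comes out at about 0.65.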
core/life_state.py ADDED
@@ -0,0 +1,281 @@
1
+ from dataclasses import dataclass, field
2
+ import copy
3
+
4
+ # Cascade dampening factor, grounded in Starcke & Brand (2012)
5
+ # Stress effects attenuate ~40% per cognitive/behavioral hop.
6
+ # A disruption propagates at full strength to immediate neighbors,
7
+ # 60% strength to second-order nodes, 36% to third-order, etc.
8
+ CASCADE_DAMPENING_DEFAULT = 0.6
9
+ METRIC_FLOOR = 10.0
10
+
11
+ @dataclass
12
+ class CareerMetrics:
13
+ satisfaction: float = 70.0
14
+ workload: float = 70.0
15
+ stability: float = 70.0
16
+ growth_trajectory: float = 70.0
17
+
18
+ @dataclass
19
+ class FinanceMetrics:
20
+ liquidity: float = 70.0
21
+ debt_pressure: float = 70.0
22
+ monthly_runway: float = 70.0
23
+ long_term_health: float = 70.0
24
+
25
+ @dataclass
26
+ class RelationshipMetrics:
27
+ romantic: float = 70.0
28
+ family: float = 70.0
29
+ social: float = 70.0
30
+ professional_network: float = 70.0
31
+
32
+ @dataclass
33
+ class PhysicalHealthMetrics:
34
+ energy: float = 70.0
35
+ fitness: float = 70.0
36
+ sleep_quality: float = 70.0
37
+ nutrition: float = 70.0
38
+
39
+ @dataclass
40
+ class MentalWellbeingMetrics:
41
+ stress_level: float = 70.0
42
+ clarity: float = 70.0
43
+ motivation: float = 70.0
44
+ emotional_stability: float = 70.0
45
+
46
+ @dataclass
47
+ class TimeMetrics:
48
+ free_hours_per_week: float = 70.0
49
+ commute_burden: float = 70.0
50
+ admin_overhead: float = 70.0
51
+
52
+ @dataclass
53
+ class LifeMetrics:
54
+ career: CareerMetrics = field(default_factory=CareerMetrics)
55
+ finances: FinanceMetrics = field(default_factory=FinanceMetrics)
56
+ relationships: RelationshipMetrics = field(default_factory=RelationshipMetrics)
57
+ physical_health: PhysicalHealthMetrics = field(default_factory=PhysicalHealthMetrics)
58
+ mental_wellbeing: MentalWellbeingMetrics = field(default_factory=MentalWellbeingMetrics)
59
+ time: TimeMetrics = field(default_factory=TimeMetrics)
60
+
61
+ def flatten(self) -> dict:
62
+ """Returns a flat dictionary mapping 'domain.submetric' to value."""
63
+ flat = {}
64
+ for domain_name in self.__dataclass_fields__:
65
+ domain = getattr(self, domain_name)
66
+ for sub_name in domain.__dataclass_fields__:
67
+ flat[f"{domain_name}.{sub_name}"] = getattr(domain, sub_name)
68
+ return flat
69
+
70
+ @dataclass
71
+ class ResourceBudget:
72
+ time_hours: float = 20.0
73
+ money_dollars: float = 500.0
74
+ energy_units: float = 100.0
75
+
76
+ def deduct(self, time: float = 0.0, money: float = 0.0, energy: float = 0.0) -> bool:
77
+ """Returns False if any resource would go negative, otherwise deducts and returns True."""
78
+ if (self.time_hours < time or
79
+ self.money_dollars < money or
80
+ self.energy_units < energy):
81
+ return False
82
+
83
+ self.time_hours -= time
84
+ self.money_dollars -= money
85
+ self.energy_units = min(100.0, self.energy_units - energy) # cap at 100
86
+ return True
87
+
88
+ class DependencyGraph:
89
+ def __init__(self):
90
+ # source_node -> [(target_node, weight)]
91
+ self.edges = {
92
+ "career.workload": [
93
+ ("mental_wellbeing.stress_level", 0.70),
94
+ ("time.free_hours_per_week", -0.80)
95
+ ],
96
+ "finances.liquidity": [
97
+ ("mental_wellbeing.stress_level", -0.60),
98
+ ("finances.monthly_runway", 0.90)
99
+ ],
100
+ "mental_wellbeing.stress_level": [
101
+ ("physical_health.sleep_quality", -0.55),
102
+ ("mental_wellbeing.emotional_stability", -0.50),
103
+ ("mental_wellbeing.motivation", -0.40),
104
+ ("career.satisfaction", -0.35)
105
+ ],
106
+ "physical_health.sleep_quality": [
107
+ ("mental_wellbeing.clarity", 0.60),
108
+ ("physical_health.energy", 0.50)
109
+ ],
110
+ "relationships.romantic": [
111
+ ("mental_wellbeing.emotional_stability", 0.50)
112
+ ],
113
+ "time.free_hours_per_week": [
114
+ ("relationships.social", 0.45),
115
+ ("mental_wellbeing.stress_level", -0.30)
116
+ ],
117
+ "physical_health.energy": [
118
+ ("mental_wellbeing.motivation", 0.40),
119
+ ("physical_health.fitness", 0.30)
120
+ ],
121
+ "career.satisfaction": [
122
+ ("mental_wellbeing.motivation", 0.50)
123
+ ],
124
+ "finances.debt_pressure": [
125
+ ("mental_wellbeing.stress_level", 0.65)
126
+ ],
127
+ "physical_health.nutrition": [
128
+ ("physical_health.energy", 0.35)
129
+ ],
130
+ "physical_health.fitness": [
131
+ ("physical_health.energy", 0.40)
132
+ ],
133
+ "time.commute_burden": [
134
+ ("physical_health.energy", -0.30),
135
+ ("mental_wellbeing.stress_level", 0.25)
136
+ ],
137
+ "relationships.social": [
138
+ ("mental_wellbeing.emotional_stability", 0.30)
139
+ ],
140
+ "mental_wellbeing.clarity": [
141
+ ("career.growth_trajectory", 0.45)
142
+ ],
143
+ "finances.long_term_health": [
144
+ ("mental_wellbeing.stress_level", -0.40)
145
+ ],
146
+ "time.admin_overhead": [
147
+ ("mental_wellbeing.stress_level", 0.25)
148
+ ],
149
+ "career.stability": [
150
+ ("mental_wellbeing.stress_level", -0.35)
151
+ ],
152
+ "career.growth_trajectory": [
153
+ ("career.satisfaction", 0.40)
154
+ ],
155
+ "mental_wellbeing.motivation": [
156
+ ("career.growth_trajectory", 0.30)
157
+ ],
158
+ "relationships.professional_network": [
159
+ ("career.stability", 0.35)
160
+ ]
161
+ }
162
+
163
+ def _get_val(self, metrics: LifeMetrics, path: str) -> float:
164
+ if '.' not in path:
165
+ return 0.0
166
+ domain, sub = path.split('.', 1)
167
+ d = getattr(metrics, domain, None)
168
+ return getattr(d, sub, 0.0) if d else 0.0
169
+
170
+ def _set_val(self, metrics: LifeMetrics, path: str, val: float, is_cascade: bool = False):
171
+ if '.' not in path:
172
+ return
173
+ domain_name, sub_name = path.split('.', 1)
174
+ domain = getattr(metrics, domain_name, None)
175
+ if domain is None or not hasattr(domain, sub_name):
176
+ return
177
+ # Ensure values stay within bounds
178
+ floor = METRIC_FLOOR if is_cascade else 0.0
179
+ clamped_val = max(floor, min(100.0, val))
180
+ setattr(domain, sub_name, clamped_val)
181
+
182
+ def cascade(self, metrics: LifeMetrics, primary_disruption: dict, dampening: float = CASCADE_DAMPENING_DEFAULT, per_step_cascade_cap: int = 3) -> LifeMetrics:
183
+ """Applies disruption and propagates effects through the dependency graph.
184
+
185
+ The dampening factor (default 0.6) is grounded in three complementary
186
+ research findings:
187
+
188
+ 1. **Starcke & Brand (2012)**: Stress effects on decision-making
189
+ attenuate approximately 40% per cognitive/behavioral hop. A workload
190
+ spike directly raises stress at full magnitude, but the downstream
191
+ effect on sleep quality is only ~60% of that, and the tertiary effect
192
+ on mental clarity is ~36%. The 0.6 multiplier captures this empirical
193
+ attenuation rate.
194
+
195
+ 2. **General Systems Theory**: Perturbations in coupled systems lose
196
+ energy as they propagate through interconnected nodes. Each transfer
197
+ across an edge dissipates a fraction of the original signal, preventing
198
+ unbounded cascades in finite systems.
199
+
200
+ 3. **Empirical stress research**: Second-order life effects (e.g.
201
+ work stress → poor sleep → relationship strain) are consistently
202
+ reported as less severe than first-order effects in longitudinal
203
+ psychological studies, supporting a sub-unity propagation coefficient.
204
+
205
+ Args:
206
+ metrics: Current LifeMetrics state.
207
+ primary_disruption: Dict mapping 'domain.submetric' to delta float.
208
+ dampening: Propagation decay per hop (default CASCADE_DAMPENING_DEFAULT = 0.6).
209
+ per_step_cascade_cap: Max nodes allowed to be affected in one step.
210
+
211
+ Returns:
212
+ LifeMetrics: New state with disruption and cascade effects applied.
213
+ """
214
+ new_metrics = copy.deepcopy(metrics)
215
+ queue = []
216
+
217
+ for path, amount in primary_disruption.items():
218
+ if '.' not in path: # skip malformed keys from LLM
219
+ continue
220
+ old_val = self._get_val(new_metrics, path)
221
+ self._set_val(new_metrics, path, old_val + amount, is_cascade=False)
222
+ queue.append((path, amount))
223
+
224
+ cascaded_metrics = set()
225
+
226
+ while queue:
227
+ source_path, source_magnitude = queue.pop(0)
228
+
229
+ if source_path in self.edges:
230
+ for target_path, weight in self.edges[source_path]:
231
+ if target_path not in cascaded_metrics and len(cascaded_metrics) >= per_step_cascade_cap:
232
+ continue # Cap at max per_step_cascade_cap metrics affected
233
+
234
+ impact = source_magnitude * weight * dampening
235
+ if abs(impact) >= 0.05:
236
+ old_target_val = self._get_val(new_metrics, target_path)
237
+ self._set_val(new_metrics, target_path, old_target_val + impact, is_cascade=True)
238
+ cascaded_metrics.add(target_path)
239
+ queue.append((target_path, impact))
240
+
241
+ return new_metrics
242
+
243
+ def main():
244
+ # Create LifeMetrics with default values (all at 70)
245
+ metrics = LifeMetrics()
246
+
247
+ # Create DependencyGraph
248
+ graph = DependencyGraph()
249
+
250
+ # Define test disruption
251
+ disruption = {
252
+ "career.workload": 30.0,
253
+ "finances.liquidity": -40.0
254
+ }
255
+
256
+ print("--- LIFE STACK INITIAL STATE (All defaults at 70) ---")
257
+ before = metrics.flatten()
258
+ for k, v in before.items():
259
+ print(f"{k:35} : {v:.2f}")
260
+
261
+ # Run the cascade simulation
262
+ after_metrics = graph.cascade(metrics, disruption)
263
+ after = after_metrics.flatten()
264
+
265
+ print("\n--- LIFE STACK AFTER DISRUPTION & CASCADE ---")
266
+ print(f"Disruption Applied: {disruption}\n")
267
+
268
+ for k in sorted(before.keys()):
269
+ val_before = before[k]
270
+ val_after = after[k]
271
+ diff = val_after - val_before
272
+
273
+ if abs(diff) > 0.001:
274
+ status = f"-> {val_after:6.2f} ({'+' if diff > 0 else ''}{diff:6.2f}) [CHANGED]"
275
+ else:
276
+ status = f" {val_after:6.2f} ( unchanged )"
277
+
278
+ print(f"{k:35} : {val_before:6.2f} {status}")
279
+
280
+ if __name__ == "__main__":
281
+ main()
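The propagation rule in `DependencyGraph.cascade` above can be sketched on plain dictionaries. This is a minimal, illustrative sketch and not the repository's implementation: `cascade_sketch`, `EDGES_EXCERPT`, and the `floor=0.0` default are hypothetical names and values (the real `METRIC_FLOOR` is defined outside this excerpt), while the edge weights are copied from the graph above.

```python
from collections import deque

# Edge weights copied from the dependency graph above; the dict name is illustrative.
EDGES_EXCERPT = {
    "physical_health.nutrition": [("physical_health.energy", 0.35)],
    "physical_health.energy": [
        ("mental_wellbeing.motivation", 0.40),
        ("physical_health.fitness", 0.30),
    ],
    "physical_health.fitness": [("physical_health.energy", 0.40)],
    "mental_wellbeing.motivation": [("career.growth_trajectory", 0.30)],
}


def cascade_sketch(values, disruption, edges=EDGES_EXCERPT, dampening=0.6,
                   cap=3, min_impact=0.05, floor=0.0):
    """Apply direct deltas, then spread dampened effects breadth-first.

    `floor` stands in for METRIC_FLOOR, whose real value is not shown here.
    """
    out = dict(values)  # work on a copy so the caller's dict is untouched
    queue = deque()
    for path, delta in disruption.items():
        # Direct disruptions are clamped to [0, 100], as in _set_val.
        out[path] = max(0.0, min(100.0, out.get(path, 0.0) + delta))
        queue.append((path, delta))

    touched = set()  # distinct metrics changed by cascading, not by the disruption
    while queue:
        source, magnitude = queue.popleft()
        for target, weight in edges.get(source, []):
            # Once `cap` distinct metrics are touched, no new ones are added.
            if target not in touched and len(touched) >= cap:
                continue
            impact = magnitude * weight * dampening
            if abs(impact) >= min_impact:  # drop negligible ripples
                out[target] = max(floor, min(100.0, out.get(target, 0.0) + impact))
                touched.add(target)
                queue.append((target, impact))
    return out


if __name__ == "__main__":
    baseline = {
        "physical_health.nutrition": 70.0,
        "physical_health.energy": 70.0,
        "physical_health.fitness": 70.0,
        "mental_wellbeing.motivation": 70.0,
        "career.growth_trajectory": 70.0,
    }
    after = cascade_sketch(baseline, {"physical_health.nutrition": -20.0})
    for name in sorted(after):
        print(f"{name:35} : {baseline[name]:6.2f} -> {after[name]:6.2f}")
```

In this example the first hop onto energy is -20 × 0.35 × 0.6 = -4.2. `career.growth_trajectory` stays at 70 because three distinct metrics have already been touched by the time motivation is dequeued, which is the effect of the per-step cap.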
core/lifestack_env.py ADDED
@@ -0,0 +1,734 @@
1
+ import copy
2
+ from typing import Any, Optional, Dict, List
3
+ from pydantic import Field
4
+
5
+ from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
6
+ from core.metric_schema import normalize_metric_path
7
+ from core.reward import compute_reward, compute_task_reward
8
+ from core.task import Task, ExoEvent, Route, Milestone, FlightCrisisTask
9
+ from core.verifier import LifeStackVerifier
10
+
11
+ try:
12
+ from openenv.core import Environment, Action, Observation, State
13
+ from openenv.core.env_server.types import EnvironmentMetadata
14
+ from openenv.core.rubrics import Rubric
15
+ USING_MODERN_API = True
16
+ except ImportError:
17
+ try:
18
+ from openenv.env import Env as Environment
19
+ from pydantic import BaseModel
20
+ # Shims for missing classes in older/alternative openenv
21
+ class Action(BaseModel): pass
22
+ class Observation(BaseModel): pass
23
+ class State(BaseModel): pass
24
+ class Rubric:
25
+ def __init__(self, *a, **k): pass
26
+ def compute(self, *a, **k): return 0.0
27
+ EnvironmentMetadata = None
28
+ USING_MODERN_API = False
29
+ except ImportError:
30
+ # Final fallback: must use BaseModel so Pydantic subclasses work
31
+ from pydantic import BaseModel
32
+ class Environment:
33
+ def __init__(self, rubric=None): self.rubric = rubric
34
+ def reset(self, *a, **k): pass
35
+ def step(self, *a, **k): pass
36
+ class Action(BaseModel): pass
37
+ class Observation(BaseModel): pass
38
+ class State(BaseModel): pass
39
+ class Rubric:
40
+ def __init__(self, *a, **k): pass
41
+ def compute(self, *a, **k): return 0.0
42
+ EnvironmentMetadata = None
43
+ USING_MODERN_API = False
44
+
45
+ class LifeStackAction(Action):
46
+ """Structured action for LifeStack."""
47
+ metric_changes: Dict[str, float] = Field(default_factory=dict, description="Metric adjustment deltas")
48
+ resource_cost: Dict[str, float] = Field(default_factory=dict, description="Time, money, and energy costs")
49
+ actions_taken: int = Field(default=0, description="Number of atomic actions taken")
50
+
51
+ # ToolAction fields (Long-horizon)
52
+ action_type: Optional[str] = Field(default=None, description="inspect, plan, execute, etc.")
53
+ target: Optional[str] = Field(default=None, description="e.g. route_id or hidden_key")
54
+ parameters: Dict[str, Any] = Field(default_factory=dict)
55
+ reasoning: Optional[str] = Field(default=None)
56
+ completion: Optional[str] = Field(default=None)
57
+
58
+ inspect_target: Optional[str] = Field(default=None, description="Optional hidden state key to inspect")
59
+ is_rollback: bool = Field(default=False, description="Set true to rollback the previous action.")
60
+
61
+ @classmethod
62
+ def from_agent_action(cls, agent_action: Any) -> "LifeStackAction":
63
+ """Unified converter from legacy AgentAction to LifeStackAction."""
64
+ primary = agent_action.primary
65
+ return cls(
66
+ action_type=primary.action_type,
67
+ target=primary.target_domain, # Mapping target_domain to target
68
+ metric_changes=primary.metric_changes,
69
+ resource_cost=primary.resource_cost,
70
+ reasoning=agent_action.reasoning,
71
+ completion=getattr(agent_action, 'raw_completion', ""),
72
+ actions_taken=1
73
+ )
74
+
75
+ class LifeStackObservation(Observation):
76
+ """Observation returned by LifeStack."""
77
+ metrics: Dict[str, float] = Field(default_factory=dict, description="Flattened 23-domain life metrics")
78
+ resources: Dict[str, float] = Field(default_factory=dict, description="Current budget remaining")
79
+ step: int = Field(default=0, description="Current episode step")
80
+ done: bool = Field(default=False)
81
+ reward: Optional[float] = Field(default=None)
82
+ metadata: Dict[str, Any] = Field(default_factory=dict)
83
+
84
+ class LifeStackState(State):
85
+ """Internal state of the LifeStack environment."""
86
+ current_metrics: LifeMetrics = Field(default_factory=LifeMetrics)
87
+ budget: ResourceBudget = Field(default_factory=ResourceBudget)
88
+ episode_id: Optional[str] = None
89
+ step_count: int = 0
90
+ inspected_keys: list = Field(default_factory=list) # revealed keys
91
+ consecutive_waits: int = 0
92
+ used_rollback: bool = Field(default=False)
93
+ rollback_penalty_charged: bool = Field(default=False)
94
+ previous_metrics: Optional[LifeMetrics] = None
95
+ previous_budget: Optional[ResourceBudget] = None
96
+
97
+ # New task fields
98
+ current_task: Optional[Task] = None
99
+ active_route_id: Optional[str] = None
100
+ milestones_achieved: list = Field(default_factory=list)
101
+ world_state: dict = Field(default_factory=dict)
102
+ hidden_state: dict = Field(default_factory=dict)
103
+ fired_event_ids: list = Field(default_factory=list)
104
+ exo_events_seen: int = 0
105
+ milestones_after_event: int = 0
106
+ closed_route_ids: set = Field(default_factory=set)
107
+ # Legacy / Personality fields
108
+ person: Optional[Any] = None
109
+ agent_history: List[tuple] = Field(default_factory=list)
110
+ current_conflict: Optional[Any] = None
112
+ cumulative_rel_delta: float = Field(default=0.0)
113
+ class LifeStackRubric(Rubric):
114
+ """Standard reward rubric for LifeStack."""
115
+ def forward(self, action: LifeStackAction, observation: LifeStackObservation) -> float:
116
+ # In LifeStack, reward is usually computed inside step() for state-transition access.
117
+ # This rubric provides a hook for external reward evaluation if needed.
118
+ return observation.reward if observation.reward is not None else 0.0
119
+
120
+ class PartialObsFilter:
121
+ @staticmethod
122
+ def filter(task: Task, revealed_keys: list) -> dict:
123
+ """Returns visible_world plus any keys the agent has explicitly inspected.
124
+
125
+ Revealed keys are checked against mutable_world first, then hidden_state.
126
+ Keys sourced from hidden_state are wrapped as
127
+ ``{"value": <val>, "source": "inspect"}`` so the agent knows they were
128
+ obtained via an inspect action rather than being freely observable.
129
+ """
130
+ obs_world = copy.deepcopy(task.visible_world)
131
+ for k in revealed_keys:
132
+ if k in task.mutable_world:
133
+ obs_world[k] = task.mutable_world[k]
134
+ elif k in task.hidden_state:
135
+ obs_world[k] = {"value": task.hidden_state[k], "source": "inspect"}
136
+ return obs_world
137
+
138
+ class WorldEngine:
139
+ def __init__(self, task: Task):
140
+ self.task = task
141
+ self.closed_routes = set()
142
+
143
+ def inject_events(self, step: int, world: dict, hidden: dict) -> list[ExoEvent]:
144
+ import random
145
+ fired = []
146
+ for event in self.task.event_schedule:
147
+ fire = False
148
+ if event.step == step:
149
+ fire = True
150
+ elif event.step == -1:
151
+ if random.random() < event.probability:
152
+ fire = True
153
+
154
+ if fire:
155
+ fired.append(event)
156
+ # Apply mutations
157
+ world.update(event.world_mutation)
158
+ hidden.update(event.hidden_state_mutation)
159
+ for rid in event.closes_routes:
160
+ self.closed_routes.add(rid)
161
+ return fired
162
+
163
+ def get_closed_routes(self) -> set[str]:
164
+ return self.closed_routes
165
+
166
+ _EnvBase = Environment[LifeStackAction, LifeStackObservation, LifeStackState] if USING_MODERN_API else Environment
167
+
168
+ class LifeStackEnv(_EnvBase):
169
+ """
170
+ LifeStack Environment v1.1, refactored for OpenEnv 0.2.3 compliance.
171
+ """
172
+ SUPPORTS_CONCURRENT_SESSIONS = True
173
+
174
+ def __init__(self, seed: Optional[int] = None, task=None, max_steps: int = 30):
175
+ if USING_MODERN_API:
176
+ super().__init__(rubric=LifeStackRubric())
177
+ else:
178
+ super().__init__()
179
+
180
+ self.max_steps = getattr(task, 'horizon', max_steps) if task else max_steps
181
+
182
+ self.metadata_internal = {
183
+ 'name': 'LifeStack-v1',
184
+ 'version': '1.1.0',
185
+ 'description': 'Premium multi-domain life conflict resolution simulation',
186
+ 'max_episode_steps': self.max_steps
187
+ }
188
+
189
+ self.graph = DependencyGraph()
190
+ self._internal_state = LifeStackState()
191
+
192
+ def get_metadata(self):
193
+ if not USING_MODERN_API:
194
+ return self.metadata_internal
195
+ from openenv.core.env_server.types import EnvironmentMetadata
196
+ return EnvironmentMetadata(
197
+ name=self.metadata_internal['name'],
198
+ version=self.metadata_internal['version'],
199
+ description=self.metadata_internal['description']
200
+ )
201
+
202
+ @property
203
+ def state(self) -> LifeStackState:
204
+ return self._internal_state
205
+
206
+ def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None,
207
+ task: Optional[Task] = None, conflict: Optional[Any] = None,
208
+ budget: Optional[dict] = None, person: Optional[Any] = None,
209
+ agent_history: Optional[List[tuple]] = None, **kwargs) -> LifeStackObservation:
210
+ """Resets the environment. Seed and task/conflict can be provided."""
211
+ if USING_MODERN_API and getattr(self, 'rubric', None):
212
+ self.rubric.reset()
213
+
214
+ if seed is not None:
215
+ import random
216
+ random.seed(seed)
217
+
218
+ # 1. Initialize Task
219
+ self._internal_state.current_task = task or FlightCrisisTask()
220
+ self.max_steps = getattr(self._internal_state.current_task, 'horizon', 30)
221
+
222
+ # 2. Reset State
223
+ self._internal_state.episode_id = episode_id
224
+ self._internal_state.step_count = 0
225
+ self._internal_state.current_metrics = LifeMetrics()
226
+ self._internal_state.inspected_keys = []
227
+ self._internal_state.consecutive_waits = 0
228
+ self._internal_state.used_rollback = False
229
+ self._internal_state.rollback_penalty_charged = False
230
+ self._internal_state.previous_metrics = None
231
+ self._internal_state.previous_budget = None
233
+ self._internal_state.cumulative_rel_delta = 0.0
234
+
235
+ # Task state
236
+ self._internal_state.world_state = copy.deepcopy(self._internal_state.current_task.mutable_world)
237
+ self._internal_state.hidden_state = copy.deepcopy(self._internal_state.current_task.hidden_state)
238
+ self._internal_state.milestones_achieved = []
239
+ self._internal_state.active_route_id = None
240
+ self._internal_state.fired_event_ids = []
241
+ self._internal_state.exo_events_seen = 0
242
+ self._internal_state.milestones_after_event = 0
243
+ self._internal_state.closed_route_ids = set()
244
+
245
+ self._internal_state.person = person
246
+ self._internal_state.agent_history = agent_history or []
247
+ self._internal_state.current_conflict = conflict
248
+
249
+ self.world_engine = WorldEngine(self._internal_state.current_task)
250
+
251
+ # 3. Budget Scaling
252
+ scale = max(1.0, self.max_steps / 5.0)
253
+ constraints = self._internal_state.current_task.constraints
254
+ self._internal_state.budget = ResourceBudget(
255
+ time_hours=budget.get("time", constraints.get("time", 20.0 * scale)) if budget else constraints.get("time", 20.0 * scale),
256
+ money_dollars=budget.get("money", constraints.get("money", 500.0 * scale)) if budget else constraints.get("money", 500.0 * scale),
257
+ energy_units=budget.get("energy", constraints.get("energy", 100.0 * scale)) if budget else constraints.get("energy", 100.0 * scale)
258
+ )
259
+
260
+ if conflict:
261
+ # Legacy disruption support
262
+ disruption = conflict.primary_disruption if hasattr(conflict, 'primary_disruption') else conflict
263
+ self._internal_state.current_metrics = self.graph.cascade(self._internal_state.current_metrics, disruption)
264
+ if budget is None and hasattr(conflict, 'resource_budget'):
265
+ rb = conflict.resource_budget
266
+ self._internal_state.budget = ResourceBudget(
267
+ time_hours=rb.get("time", 20.0),
268
+ money_dollars=rb.get("money", 500.0),
269
+ energy_units=rb.get("energy", 100.0)
270
+ )
271
+
272
+ return self._get_obs()
273
+
274
+ def _get_obs(self, done: bool = False, reward: Optional[float] = None,
275
+ success: bool = False, failure: bool = False,
276
+ failure_reason: str = "", routes_remaining: int = 0) -> LifeStackObservation:
277
+ revealed_world = PartialObsFilter.filter(
278
+ self._internal_state.current_task,
279
+ self._internal_state.inspected_keys
280
+ )
281
+
282
+ return LifeStackObservation(
283
+ metrics=self._internal_state.current_metrics.flatten(),
284
+ resources={
285
+ "time": self._internal_state.budget.time_hours,
286
+ "money": self._internal_state.budget.money_dollars,
287
+ "energy": self._internal_state.budget.energy_units
288
+ },
289
+ step=self._internal_state.step_count,
290
+ done=done,
291
+ reward=reward,
292
+ metadata={
293
+ "world_state": revealed_world,
294
+ "goal": self._internal_state.current_task.goal,
295
+ "active_route": self._internal_state.active_route_id,
296
+ "milestones": self._internal_state.milestones_achieved,
297
+ "events": self._internal_state.fired_event_ids,
298
+ "success": success,
299
+ "failure": failure,
300
+ "failure_reason": failure_reason,
301
+ "routes_remaining": routes_remaining,
302
+ "conflict_title": self._internal_state.current_conflict.title if hasattr(self._internal_state.current_conflict, 'title') else "Custom Task",
303
+ "person": self._internal_state.person.name if hasattr(self._internal_state.person, 'name') else "Unknown"
304
+ }
305
+ )
306
+
307
+ def _update_metric(self, path: str, delta: float):
308
+ """Internal helper for non-cascading updates."""
309
+ path = normalize_metric_path(path)
310
+ if '.' not in path:
311
+ return
312
+ domain_name, sub_name = path.split('.', 1)
313
+ domain = getattr(self._internal_state.current_metrics, domain_name, None)
314
+ if domain and hasattr(domain, sub_name):
315
+ val = getattr(domain, sub_name)
316
+ setattr(domain, sub_name, max(0.0, min(100.0, val + delta)))
317
+
318
+ def step(self, action: LifeStackAction, timeout_s: Optional[float] = None, **kwargs) -> LifeStackObservation:
319
+ """Executes one step in the environment using LifeStackAction logic."""
320
+ if isinstance(action, dict):
321
+ action = LifeStackAction(**action)
322
+
323
+ task = self._internal_state.current_task
324
+ state_before = copy.deepcopy(self._internal_state.current_metrics)
325
+ info_msgs = []
326
+
327
+ # 0. Personality Drift & Legacy Escalation
328
+ if self._internal_state.person:
329
+ drift_event = self._internal_state.person.drift(self._internal_state.step_count)
330
+ if drift_event:
331
+ path = drift_event.get('metric', '')
332
+ delta = drift_event.get('delta', 0)
333
+ if path and '.' in path:
334
+ self._update_metric(path, delta)
335
+ info_msgs.append(f"DRIFT: {drift_event['reason']}")
336
+
337
+ if self._internal_state.current_conflict and self._internal_state.step_count == 2:
338
+ from agent.conflict_generator import adaptive_escalate
339
+ conflict = self._internal_state.current_conflict
340
+ if hasattr(conflict, 'difficulty') and conflict.difficulty < 5:
341
+ new_conflict, reason = adaptive_escalate(conflict, self._internal_state.agent_history)
342
+ if new_conflict.id != conflict.id:
343
+ self._internal_state.current_conflict = new_conflict
344
+ info_msgs.append(f"ESCALATION: {reason} -> {new_conflict.title}")
+
+ # 1. Exogenous Event Injection
345
+ fired_events = self.world_engine.inject_events(
346
+ self._internal_state.step_count,
347
+ self._internal_state.world_state,
348
+ self._internal_state.hidden_state
349
+ )
350
+ if fired_events:
351
+ self._internal_state.exo_events_seen += len(fired_events)
352
+ for e in fired_events:
353
+ self._internal_state.fired_event_ids.append(e.id)
354
+ info_msgs.append(f"EVENT_FIRED: {e.description}")
355
+
356
+ self._internal_state.closed_route_ids.update(self.world_engine.get_closed_routes())
357
+
358
+ # 2. Tool Logic & Metric Changes
359
+ tool_type = action.action_type or (
360
+ "rollback" if action.is_rollback else
361
+ "inspect" if action.inspect_target else
362
+ "execute"
363
+ )
364
+
365
+ allowed_keys = set(self._internal_state.current_metrics.flatten().keys())
366
+ metric_changes = {k: v for k, v in action.metric_changes.items() if k in allowed_keys}
367
+ resource_cost = copy.deepcopy(action.resource_cost)
368
+
369
+ # Handle Rollback
370
+ if tool_type == "rollback":
371
+ self._internal_state.step_count += 1
372
+ if self._internal_state.used_rollback:
373
+ info_msgs.append("ROLLBACK_DENIED: Already used once.")
374
+ return self._get_obs(reward=-0.1)
375
+ if not self._internal_state.previous_metrics:
376
+ return self._get_obs(reward=0.0)
377
+ self._internal_state.current_metrics = copy.deepcopy(self._internal_state.previous_metrics)
378
+ self._internal_state.budget = copy.deepcopy(self._internal_state.previous_budget)
379
+ self._internal_state.used_rollback = True
380
+ self._internal_state.rollback_penalty_charged = True # Penalty is the -0.1 reward returned below
381
+ return self._get_obs(reward=-0.1)
382
+
383
+ # Save state for future rollback
384
+ self._internal_state.previous_metrics = copy.deepcopy(self._internal_state.current_metrics)
385
+ self._internal_state.previous_budget = copy.deepcopy(self._internal_state.budget)
386
+
387
+ # Handle Inspect
388
+ if tool_type == "inspect":
389
+ target = action.target or action.inspect_target
390
+ if target:
391
+ if target in self._internal_state.inspected_keys:
392
+ info_msgs.append(f"INSPECT_REDUNDANT: {target}")
393
+ else:
394
+ self._internal_state.inspected_keys.append(target)
395
+ info_msgs.append(f"INSPECT_REVEALED: {target}")
396
+ # Emit an explicit signal when a hidden-state value is uncovered.
397
+ if target in task.hidden_state:
398
+ info_msgs.append(
399
+ f"INSPECT_REVEALED_HIDDEN: {target} = {task.hidden_state[target]}"
400
+ )
401
+
402
+ # Handle Wait
403
+ if tool_type == "wait":
404
+ self._internal_state.consecutive_waits += 1
405
+ if self._internal_state.consecutive_waits >= 4:
406
+ metric_changes["mental_wellbeing.stress_level"] = metric_changes.get("mental_wellbeing.stress_level", 0) + 15.0
407
+ info_msgs.append("WAIT_CAP_EXCEEDED: Forced stress applied.")
408
+ else:
409
+ self._internal_state.consecutive_waits = 0
410
+
411
+ # Handle Route Execution
412
+ if tool_type == "execute" and action.target:
413
+ route = next((r for r in task.viable_routes if r.id == action.target), None)
414
+ if route:
415
+ # Check closed
416
+ if route.id in self._internal_state.closed_route_ids:
417
+ info_msgs.append(f"ROUTE_BLOCKED: {route.name}")
418
+ else:
419
+ # Check preconditions
420
+ pre_ok = True
421
+ for k, v in route.preconditions.items():
422
+ current_v = self._internal_state.hidden_state.get(k, self._internal_state.world_state.get(k))
423
+ if current_v != v:
424
+ pre_ok = False
425
+ break
426
+
427
+ if not pre_ok:
428
+ info_msgs.append(f"PRECONDITIONS_FAILED for {route.name}")
429
+ else:
430
+ # Success: Apply route
431
+ self._internal_state.active_route_id = route.id
432
+ self._internal_state.world_state.update(route.consequences)
433
+ info_msgs.append(f"ROUTE_SUCCESS: {route.name}")
434
+
435
+ # 3. Resource Deduction (must happen BEFORE metric changes to prevent budget-bypass exploit)
436
+ deduct_ok = self._internal_state.budget.deduct(
437
+ time=resource_cost.get('time', 0.0),
438
+ money=resource_cost.get('money', 0.0),
439
+ energy=resource_cost.get('energy', 0.0)
440
+ )
441
+ if not deduct_ok:
442
+ info_msgs.append("RESOURCE_DEPLETED_ACTION_BLOCKED")
443
+ metric_changes = {} # Discard changes: the agent can't afford this action
444
+
445
+ # 4. Apply Metric and Cascade
446
+ sig_changes = {k: v for k, v in metric_changes.items() if abs(v) > 5.0}
447
+ for k, v in metric_changes.items():
448
+ if k not in sig_changes:
449
+ self._update_metric(k, v)
450
+
451
+ if sig_changes:
452
+ self._internal_state.current_metrics = self.graph.cascade(self._internal_state.current_metrics, sig_changes)
453
+
454
+ # 5. Task Progression Check
455
+ success_mets = LifeStackVerifier.check_success(task, self._internal_state.world_state, self._internal_state.hidden_state)
456
+ failure_mets = LifeStackVerifier.check_failure(task, self._internal_state.world_state, self._internal_state.hidden_state, self._internal_state.current_metrics.flatten())
457
+
458
+ # Check milestones dynamically
459
+ newly_met = LifeStackVerifier.check_new_milestones(task, self._internal_state.world_state, self._internal_state.hidden_state, self._internal_state.milestones_achieved)
460
+ for mid in newly_met:
461
+ self._internal_state.milestones_achieved.append(mid)
462
+ if self._internal_state.exo_events_seen > 0:
463
+ self._internal_state.milestones_after_event += 1
464
+ info_msgs.append(f"MILESTONE_UNLOCKED: {mid}")
465
+
466
+ # 6. Reward Calculation (Task-Aware)
467
+ routes_rem, _ = LifeStackVerifier.get_route_status(task, self._internal_state.closed_route_ids, self._internal_state.world_state, self._internal_state.hidden_state)
468
+
469
+ # Determine cascade collapse
470
+ metrics_after = self._internal_state.current_metrics.flatten()
471
+ metrics_before = state_before.flatten()
472
+ collapse = any(metrics_after[k] < 20 and metrics_before[k] >= 20 for k in metrics_after)
473
+
474
+ # Track cumulative relationship erosion across steps
475
+ rel_keys_cum = [k for k in metrics_after if k.startswith('relationships.')]
476
+ if rel_keys_cum:
477
+ step_rel_delta = sum(metrics_after[k] - metrics_before[k] for k in rel_keys_cum) / len(rel_keys_cum)
478
+ self._internal_state.cumulative_rel_delta += step_rel_delta
479
+
480
+ # Increment step_count BEFORE reward so timeout_check fires correctly
481
+ self._internal_state.step_count += 1
482
+
483
+ # Rollback penalty fires only once per episode
484
+ rollback_this_step = self._internal_state.used_rollback and not self._internal_state.rollback_penalty_charged
485
+ if rollback_this_step:
486
+ self._internal_state.rollback_penalty_charged = True
487
+
488
+ # conflict_domain from task.domain (not conflict.title) to prevent empty-string bypass
489
+ conflict_domain = task.domain if task and hasattr(task, 'domain') else ""
490
+
491
+ if task:
492
+ reward, breakdown = compute_task_reward(
493
+ state_before=state_before,
494
+ state_after=self._internal_state.current_metrics,
495
+ resources_used=resource_cost,
496
+ actions_taken=action.actions_taken,
497
+ milestones_achieved=self._internal_state.milestones_achieved,
498
+ success_conditions_met=success_mets,
499
+ exo_events_seen=self._internal_state.exo_events_seen,
500
+ milestones_after_event=self._internal_state.milestones_after_event,
501
+ routes_remaining=routes_rem,
502
+ rollback_used=rollback_this_step,
503
+ cascade_collapse=collapse,
504
+ task=task,
505
+ reasoning=getattr(action, 'reasoning', ""),
506
+ completion=getattr(action, 'completion', ""),
507
+ conflict_domain=conflict_domain,
508
+ step_count=self._internal_state.step_count,
509
+ max_steps=self.max_steps,
510
+ metric_changes=metric_changes,
511
+ cumulative_rel_delta=self._internal_state.cumulative_rel_delta,
512
+ action_type=tool_type
513
+ )
514
+ # Charge the rollback penalty only once per episode
515
+ if self._internal_state.used_rollback and not self._internal_state.rollback_penalty_charged:
516
+ self._internal_state.rollback_penalty_charged = True
517
+ else:
518
+ reward, breakdown = compute_reward(
519
+ state_before=state_before,
520
+ state_after=self._internal_state.current_metrics,
521
+ resources_used=resource_cost,
522
+ actions_taken=action.actions_taken,
523
+ metric_changes=metric_changes,
524
+ completion=getattr(action, 'completion', ""),
525
+ action_type=tool_type
526
+ )
527
+
528
+ # 7. End Conditions
529
+ # Check if ANY success condition is met.
530
+ # For multi-goal tasks with mutually exclusive routes, any() allows termination.
531
+ is_success = any(success_mets) if (success_mets and len(task.success_conditions) > 0) else False
532
+ is_task_failure = any(val == True for val in failure_mets)
533
+ metric_death = any(v <= 10 for v in metrics_after.values())
534
+
535
+ failure_reason = ""
536
+ if is_task_failure:
537
+ reasons = [cond['key'] for i, cond in enumerate(task.failure_conditions) if failure_mets[i]]
538
+ failure_reason = f"Condition failed: {', '.join(reasons)}"
539
+ elif metric_death:
540
+ dead_metrics = [k for k, v in metrics_after.items() if v <= 10]
541
+ failure_reason = f"Metrics at or below critical threshold (10): {', '.join(dead_metrics)}"
542
+ elif routes_rem == 0 and not is_success:
543
+ failure_reason = "Dead end: No reachable routes left."
544
+
545
+ terminated = is_task_failure or metric_death
546
+ truncated = self._internal_state.step_count >= self.max_steps
547
+ if is_success:
548
+ truncated = True
549
+ done = terminated or truncated
550
+
551
+ observation = self._get_obs(
552
+ done,
553
+ reward,
554
+ success=is_success,
555
+ failure=terminated,
556
+ failure_reason=failure_reason,
557
+ routes_remaining=routes_rem
558
+ )
559
+ observation.metadata["breakdown"] = breakdown
560
+ observation.metadata["info"] = info_msgs
561
+ return observation
562
+
563
+ def rollout(self, n_steps: int = 7, gamma: float = 0.9) -> dict:
564
+ """
565
+ Simulate n_steps null/rest actions starting from the current env state.
566
+
567
+ Intended to be called immediately AFTER env.step(model_action) so it
568
+ models "what happens to your life over the next N days if nothing
569
+ extraordinary occurs."
570
+
571
+ The env state is fully restored after the rollout, so calling this is
572
+ side-effect-free from the caller's perspective.
573
+
574
+ Returns:
575
+ {
576
+ "discounted_reward": float, # γ-discounted cumulative
578
+ "trajectory": [ # one entry per simulated day
579
+ {
580
+ "step": int, # 1-indexed future day
581
+ "reward": float,
582
+ "metrics": Dict[str, float], # flattened snapshot
583
+ "discounted_contribution": float,
584
+ },
585
+ ...
586
+ ],
587
+ "n_steps_completed": int,
588
+ }
589
+ """
590
+ saved_state = copy.deepcopy(self._internal_state)
591
+
592
+ null_action = LifeStackAction(
593
+ action_type="rest",
594
+ target="time",
595
+ metric_changes={},
596
+ resource_cost={},
597
+ actions_taken=0,
598
+ )
599
+
600
+ trajectory = []
601
+ cumulative = 0.0
602
+
603
+ for t in range(n_steps):
604
+ obs = self.step(null_action)
605
+ disc = (gamma ** (t + 1)) * float(obs.reward)
606
+ cumulative += disc
607
+ trajectory.append({
608
+ "step": t + 1,
609
+ "reward": float(obs.reward),
610
+ "metrics": dict(obs.metrics),
611
+ "discounted_contribution": round(disc, 5),
612
+ })
613
+ if obs.done:
614
+ break
615
+
616
+ # Restore: rollout must not mutate the env visible to the caller
617
+ self._internal_state = saved_state
618
+
619
+ return {
620
+ "discounted_reward": round(cumulative, 5),
621
+ "trajectory": trajectory,
622
+ "n_steps_completed": len(trajectory),
623
+ }
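Review note: the discounting inside `rollout` can be checked in isolation. The sketch below mirrors the loop's arithmetic (contribution `gamma ** (t + 1) * reward`, rounded to 5 places) on a plain list of rewards. The helper name `discounted_rollout` is hypothetical and not part of the commit.

```python
def discounted_rollout(rewards, gamma=0.9):
    """Mirror of the rollout arithmetic on a plain reward list (hypothetical helper)."""
    total = 0.0
    contributions = []
    for t, r in enumerate(rewards):
        # Future day t+1 is discounted by gamma ** (t + 1), as in rollout().
        disc = (gamma ** (t + 1)) * float(r)
        total += disc
        contributions.append(round(disc, 5))
    return round(total, 5), contributions


if __name__ == "__main__":
    print(discounted_rollout([1.0, 1.0], gamma=0.5))
```

Because the first simulated day is already discounted once, the reward of the action that preceded the rollout has to be added by the caller at full weight.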
624
+
625
+ def render(self):
626
+ """Vibrant status report of the current state and task progress."""
627
+ task = self._internal_state.current_task
628
+ print("\n" + "═"*70)
629
+ print(f"🎯 GOAL: {task.goal} | Horizon: {self._internal_state.step_count}/{self.max_steps}")
630
+ print(f"⌛ TIME: {self._internal_state.budget.time_hours:.1f}h | 💵 MONEY: ${self._internal_state.budget.money_dollars:.1f} | ⚡ ENERGY: {self._internal_state.budget.energy_units:.1f}")
631
+
632
+ if self._internal_state.active_route_id:
633
+ print(f"🛣️ ACTIVE ROUTE: {self._internal_state.active_route_id}")
634
+
635
+ print(f"⭐ MILESTONES: {', '.join(self._internal_state.milestones_achieved) or 'None'}")
636
+
637
+ if self._internal_state.fired_event_ids:
638
+ print(f"🚨 EVENTS: {', '.join(self._internal_state.fired_event_ids)}")
639
+
640
+ flat = self._internal_state.current_metrics.flatten()
641
+ domain_labels = {
642
+ "career": "💼 CAREER",
643
+ "finances": "💰 FINANCES",
644
+ "relationships": "❤️ RELATIONSHIPS",
645
+ "physical_health": "💪 PHYSICAL",
646
+ "mental_wellbeing": "🧠 MENTAL",
647
+ "time": "📅 TIME"
648
+ }
649
+
650
+ for dom, label in domain_labels.items():
651
+ print(f"\n{label}")
652
+ submetrics = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
653
+ inverted = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
654
+ for name, val in submetrics.items():
655
+ short = name.split('.')[1]
656
+ icon = ("🔴" if val > 70 else "🟢") if short in inverted else ("🟢" if val > 70 else "🔴")
657
+ if 40 <= val <= 70: icon = "🟡"
658
+ print(f" {icon} {short:20} : {val:5.2f}")
659
+ print("═"*70)
660
+
661
+
662
+ def env_render_compact(env, obs):
663
+ """Compact printer for testing."""
664
+ print(f"STEP: {obs.step} | REWARD: {obs.reward:.3f} | DONE: {obs.done}")
665
+ if obs.metadata.get("breakdown", {}).get("penalties_fired"):
666
+ print(f" ⚠️ PENALTIES: {obs.metadata['breakdown']['penalties_fired']}")
667
+
668
+
669
+ def main():
670
+ env = LifeStackEnv()
671
+
672
+ # 1. Reset with Friday 6PM Conflict
673
+ conflict = {
674
+ "career.workload": 30.0,
675
+ "finances.liquidity": -40.0
676
+ }
677
+ print("Initializing environment with Friday 6PM conflict...")
678
+ env.reset(conflict=conflict)
679
+ env.render()
680
+
681
+ total_reward = 0
682
+ metrics_history = []
683
+
684
+ # 2. Sequential Actions
685
+ scenarios = [
686
+ {
687
+ "name": "GOOD ACTION: Delegating and budget review",
688
+ "action": {
689
+ "metric_changes": {"career.workload": -15.0, "finances.liquidity": 10.0, "mental_wellbeing.stress_level": -5.0},
690
+ "resource_cost": {"time": 4.0, "money": 100.0, "energy": 20.0},
691
+ "actions_taken": 2
692
+ }
693
+ },
694
+ {
695
+ "name": "MEDIUM ACTION: Small self-care rest",
696
+ "action": {
697
+ "metric_changes": {"physical_health.sleep_quality": 6.0, "mental_wellbeing.clarity": 3.0},
698
+ "resource_cost": {"time": 2.0, "energy": -20.0}, # Rest recovers energy
699
+ "actions_taken": 1
700
+ }
701
+ },
702
+ {
703
+ "name": "INACTION: Let the cascade run",
704
+ "action": {
705
+ "metric_changes": {},
706
+ "resource_cost": {},
707
+ "actions_taken": 0
708
+ }
709
+ }
710
+ ]
711
+
712
+ for sce in scenarios:
713
+ print(f"\nTaking Action: {sce['name']}...")
714
+ action_obj = LifeStackAction(**sce['action'])
715
+ obs = env.step(action_obj)
716
+ env_render_compact(env, obs)
717
+ total_reward += (obs.reward or 0.0)
718
+
719
+ # 3. Final Summary
720
+ final_flat = env.state.current_metrics.flatten()
721
+ critical = [k for k, v in final_flat.items() if v < 20]
722
+
723
+ print("\n" + "█"*60)
724
+ print("EPISODE SUMMARY")
725
+ print(f"Steps Taken : {env.state.step_count}")
726
+ print(f"Total Cumulative Reward : {total_reward:.4f}")
727
+ if critical:
728
+ print(f"Critical Floor Violations: {', '.join(critical)}")
729
+ else:
730
+ print("Critical Violations: NONE")
731
+ print("█"*60)
732
+
733
+ if __name__ == "__main__":
734
+ main()
core/lifestack_gym_env.py ADDED
@@ -0,0 +1,171 @@
1
+ """
2
+ lifestack_gym_env.py: Gymnasium-compatible wrapper for LifeStack
3
+
4
+ Exposes the LifeStack environment as a standard gym.Env with:
5
+ - observation_space: Box(0, 100, shape=(26,)), i.e. 23 sub-metrics + 3 resources
6
+ - action_space: Discrete(7), i.e. 7 action types mapped to template actions
7
+ - Standard reset() / step() / render() API
8
+ """
9
+ # Not currently used; this wrapper was used with the old model.
10
+ import gymnasium as gym
11
+ import numpy as np
12
+ from gymnasium import spaces
13
+ import random, copy
14
+ from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
15
+ from core.metric_schema import normalize_metric_path
16
+ from core.reward import compute_reward, compute_task_reward
17
+ from agent.conflict_generator import generate_conflict, ConflictEvent
18
+ from intake.simperson import SimPerson
19
+
20
+
21
+ # Map discrete action IDs to action types
22
+ ACTION_TYPE_MAP = {
23
+ 0: "negotiate",
24
+ 1: "communicate",
25
+ 2: "delegate",
26
+ 3: "spend",
27
+ 4: "reschedule",
28
+ 5: "rest",
29
+ 6: "execute",
30
+ }
31
+
32
+
33
+ class LifeStackGymEnv(gym.Env):
34
+ """
35
+ LifeStack as a Gymnasium environment.
36
+
37
+ Observation: 26-dim vector (23 life sub-metrics + 3 resource values)
38
+ Action: Discrete(7), one of 7 action types
39
+ Reward: float in [-1, 1]
40
+ """
41
+ metadata = {"render_modes": ["human", "ansi"]}
42
+
43
+ def __init__(self, task=None, difficulty: int = None, render_mode: str = None, max_steps: int = 30):
44
+ super().__init__()
45
+ self.observation_space = spaces.Box(
46
+ low=0.0, high=100.0, shape=(26,), dtype=np.float32
47
+ )
48
+ self.action_space = spaces.Discrete(7)
49
+ self.render_mode = render_mode
50
+ self.task = task
51
+ self.difficulty = difficulty
52
+ self.max_steps = max_steps
53
+
54
+ from core.lifestack_env import LifeStackEnv
55
+ self.env = LifeStackEnv()
56
+ self._metric_keys = list(LifeMetrics().flatten().keys())
57
+
58
+ def _obs_vector(self) -> np.ndarray:
59
+ flat = self.env.state.current_metrics.flatten()
60
+ metric_vals = [flat[k] for k in self._metric_keys]
61
+ budget = self.env.state.budget
62
+ resource_vals = [
63
+ budget.time_hours,
64
+ budget.money_dollars,
65
+ budget.energy_units,
66
+ ]
67
+ return np.array(metric_vals + resource_vals, dtype=np.float32)
68
+
69
+ def reset(self, seed=None, options=None):
70
+ super().reset(seed=seed)
71
+
72
+ conflict = None
73
+ if self.task is None:
74
+ from agent.conflict_generator import generate_conflict
75
+ conflict = generate_conflict(self.difficulty)
76
+
77
+ obs_obj = self.env.reset(task=self.task, conflict=conflict)
78
+ return self._obs_vector(), obs_obj.metadata
79
+
80
+ def step(self, action: int):
81
+ from core.lifestack_env import LifeStackAction
82
+ action_type = ACTION_TYPE_MAP[action]
83
+
84
+ # Build logical action from template
85
+ metric_changes, resource_cost = self._action_to_changes(action_type)
86
+
87
+ # In this wrapper, we pick a reasonable target if needed
88
+ target = ""
89
+ current_task = self.env.state.current_task
90
+ if action_type == "execute" and current_task:
91
+ for r in current_task.viable_routes:
92
+ if r.id not in self.env.state.closed_route_ids:
93
+ target = r.id
94
+ break
95
+
96
+ ls_action = LifeStackAction(
97
+ action_type=action_type,
98
+ target=target,
99
+ reasoning=f"Agent chose {action_type} for discrete action {action}.",
100
+ metric_changes=metric_changes,
101
+ resource_cost=resource_cost,
102
+ actions_taken=1
103
+ )
104
+
105
+ obs_obj = self.env.step(ls_action)
106
+
107
+ terminated = obs_obj.done
108
+ # Truncated only if not naturally terminated
109
+ truncated = (not terminated) and (self.env.state.step_count >= (self.task.horizon if self.task else self.max_steps))
110
+
111
+ return self._obs_vector(), obs_obj.reward, terminated, truncated, {"breakdown": obs_obj.metadata.get("breakdown", {})}
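Review note: the wrapper reports `terminated = obs_obj.done`, while the inner environment sets `done = terminated or truncated`. The sketch below isolates that flag logic so its effect is easy to see. The function name `split_done` is hypothetical, and it assumes the inner step limit and the wrapper's limit coincide, which the diff does not guarantee.

```python
def split_done(inner_done: bool, step_count: int, step_limit: int):
    """Reproduce the wrapper's (terminated, truncated) computation (hypothetical helper)."""
    terminated = bool(inner_done)
    # Truncation is only reported when the inner env has not already flagged done.
    truncated = (not terminated) and step_count >= step_limit
    return terminated, truncated


if __name__ == "__main__":
    # Inner env hit its own step limit and set done=True:
    print(split_done(True, 30, 30))
    # Inner env still running, wrapper limit reached:
    print(split_done(False, 30, 30))
```

Under that assumption, an episode that merely runs out of steps inside the inner environment surfaces as `terminated` rather than `truncated`, which matters for algorithms that bootstrap differently on time limits.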
112
+
113
+ def _action_to_changes(self, action_type: str):
114
+ """Maps an action type string to (metric_changes, resource_cost)."""
115
+ templates = {
116
+ "negotiate": (
117
+ {"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
118
+ {"time": 1.5, "energy": 20.0},
119
+ ),
120
+ "communicate": (
121
+ {"relationships.romantic": 10.0, "mental_wellbeing.stress_level": -5.0},
122
+ {"time": 0.5, "energy": 10.0},
123
+ ),
124
+ "delegate": (
125
+ {"career.workload": -10.0, "relationships.professional_network": -5.0},
126
+ {"time": 1.0, "energy": 15.0},
127
+ ),
128
+ "spend": (
129
+ {"finances.liquidity": -20.0, "mental_wellbeing.stress_level": -10.0},
130
+ {"time": 1.0, "energy": 15.0},
131
+ ),
132
+ "reschedule": (
133
+ {"career.workload": -10.0, "time.free_hours_per_week": 5.0},
134
+ {"time": 2.0, "energy": 15.0},
135
+ ),
136
+ "rest": (
137
+ {"mental_wellbeing.stress_level": -12.0, "physical_health.energy": 10.0},
138
+ {"time": 1.0},
139
+ ),
140
+ "execute": (
141
+ {}, # executes a route target
142
+ {"time": 1.0, "energy": 10.0},
143
+ ),
144
+ }
145
+ return templates.get(action_type, ({}, {}))
146
+
147
+ def render(self):
148
+ if self.render_mode == "human":
149
+ # Delegate to the internal env's render
150
+ self.env.render()
151
+
152
+
153
+ # ── Quick smoke test ──
154
+ if __name__ == "__main__":
155
+ env = LifeStackGymEnv(difficulty=3, render_mode="human")
156
+ obs, info = env.reset()
157
+ print(f"Conflict: {info['conflict_title']} | Person: {info['person']}")
158
+ print(f"Obs shape: {obs.shape}, dtype: {obs.dtype}")
159
+ env.render()
160
+
161
+ total = 0.0
162
+ done = False
163
+ while not done:
164
+ act = env.action_space.sample()
165
+ obs, rew, term, trunc, info = env.step(act)
166
+ total += rew
167
+ done = term or trunc
168
+ print(f" Action {act} → reward {rew:.3f}")
169
+
170
+ env.render()
171
+ print(f"\nTotal reward: {total:.3f}")
core/metric_schema.py ADDED
@@ -0,0 +1,31 @@
1
+
2
+ from core.life_state import LifeMetrics
3
+
4
+
5
+ VALID_METRIC_PATHS = tuple(sorted(LifeMetrics().flatten().keys()))
6
+
7
+ LEGACY_METRIC_ALIASES = {
8
+ "physical_health.exercise_routine": "physical_health.fitness",
9
+ }
10
+
11
+
12
+ def normalize_metric_path(path: str) -> str:
13
+ """Map legacy or malformed metric names onto the current LifeMetrics schema."""
14
+ if not isinstance(path, str):
15
+ return ""
16
+ path = path.strip()
17
+ return LEGACY_METRIC_ALIASES.get(path, path)
18
+
19
+
20
+ def is_valid_metric_path(path: str) -> bool:
21
+ return normalize_metric_path(path) in VALID_METRIC_PATHS
22
+
23
+
24
+ def format_valid_metrics() -> str:
25
+ grouped = {}
26
+ for path in VALID_METRIC_PATHS:
27
+ domain, metric = path.split(".", 1)
28
+ grouped.setdefault(domain, []).append(metric)
29
+ return "\n".join(
30
+ f"{domain}: {', '.join(metrics)}" for domain, metrics in grouped.items()
31
+ )
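Review note: a small standalone sketch of how the alias lookup behaves. The alias table is copied from the file above; `VALID_METRIC_PATHS` here is a hypothetical two-entry stand-in, since the real tuple is built from `LifeMetrics().flatten()`.

```python
LEGACY_METRIC_ALIASES = {
    "physical_health.exercise_routine": "physical_health.fitness",
}

# Hypothetical subset for illustration; the real tuple comes from LifeMetrics().
VALID_METRIC_PATHS = ("career.workload", "physical_health.fitness")


def normalize_metric_path(path) -> str:
    """Strip whitespace and map legacy names onto current ones."""
    if not isinstance(path, str):
        return ""
    path = path.strip()
    return LEGACY_METRIC_ALIASES.get(path, path)


def is_valid_metric_path(path) -> bool:
    return normalize_metric_path(path) in VALID_METRIC_PATHS


if __name__ == "__main__":
    print(normalize_metric_path("  physical_health.exercise_routine "))
    print(is_valid_metric_path(None))
```

Non-string input normalizes to the empty string, so it fails validation instead of raising.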
core/reward.py ADDED
@@ -0,0 +1,463 @@
1
+ import math
2
+ import copy
3
+ import json
4
+ import re
5
+ from core.life_state import LifeMetrics
6
+ from core.task import Task
7
+
8
+
9
+
10
+ def compute_reward(
11
+ state_before: LifeMetrics,
12
+ state_after: LifeMetrics,
13
+ resources_used: dict,
14
+ actions_taken: int,
15
+ metric_changes: dict = None,
16
+ completion: str = None,
17
+ disruption_baseline: int = None,
18
+ action_type: str = ""
19
+ ) -> tuple[float, dict]:
20
+ """
21
+ Computes the reward for a life step based on changes in LifeMetrics and resource usage.
22
+
23
+ Args:
24
+ state_before: The state at the start of the step.
25
+ state_after: The state after actions and cascades.
26
+ resources_used: Dict with keys 'time', 'money', 'energy'.
27
+ actions_taken: Integer count of intentional actions performed.
28
+ disruption_baseline: Expected number of metrics affected by an action.
29
+
30
+ Returns:
31
+ tuple[float, dict]: (final_reward, breakdown_dict)
32
+ """
33
+ before_flat = state_before.flatten()
34
+ after_flat = state_after.flatten()
35
+
36
+ # 1. OUTCOME SCORE (Weighted average of positive deltas)
37
+ domain_weights = {
38
+ "career": 1/6,
39
+ "finances": 1/6,
40
+ "relationships": 1/6,
41
+ "physical_health": 1/6,
42
+ "mental_wellbeing": 1/6,
43
+ "time": 1/6
44
+ }
45
+
46
+ # Map sub-metrics to their domains
47
+ submetrics_per_domain = {}
48
+ for k in before_flat.keys():
49
+ domain = k.split('.')[0]
50
+ submetrics_per_domain[domain] = submetrics_per_domain.get(domain, 0) + 1
51
+
52
+ outcome_score = 0.0
53
+ for k in before_flat.keys():
54
+ domain = k.split('.')[0]
55
+ delta = after_flat[k] - before_flat[k]
56
+ if delta > 0:
57
+ # Each domain is 1/6. Each sub-metric within a domain gets its equal share of that 1/6.
58
+ # Normalize delta by 100 (max possible increase is 100).
59
+ weight = domain_weights[domain] / submetrics_per_domain[domain]
60
+ outcome_score += (delta / 100.0) * weight
61
+
62
+ # 2. CASCADE CONTAINMENT SCORE
63
+ worsened_count = sum(1 for k in before_flat.keys() if after_flat[k] < before_flat[k])
64
+ total_metrics = len(before_flat)
65
+ cascade_containment_score = 1.0 - (worsened_count / total_metrics)
66
+
67
+ # 3. RESOURCE EFFICIENCY SCORE
68
+ # Available: time 20, money 500, energy 100
69
+ m_time = resources_used.get('time', 0.0) / 20.0
70
+ m_money = resources_used.get('money', 0.0) / 500.0
71
+ m_energy = resources_used.get('energy', 0.0) / 100.0
72
+
73
+ # Normalize by total slots (3 resources)
74
+ resource_efficiency_score = 1.0 - ((m_time + m_money + m_energy) / 3.0)
75
+ resource_efficiency_score = max(0.0, min(1.0, resource_efficiency_score))
76
+
77
+ # 4. RELATIONSHIP PRESERVATION SCORE (Sigmoid applied to average delta)
78
+ rel_keys = [k for k in before_flat.keys() if k.startswith('relationships.')]
79
+ avg_rel_before = sum(before_flat[k] for k in rel_keys) / len(rel_keys)
80
+ avg_rel_after = sum(after_flat[k] for k in rel_keys) / len(rel_keys)
81
+ delta_rel = avg_rel_after - avg_rel_before
82
+
83
+ # score = 1 / (1 + exp(-delta/10))
84
+ relationship_preservation_score = 1.0 / (1.0 + math.exp(-delta_rel / 10.0))
85
+
86
+ # FINAL REWARD FORMULA
87
+ base_reward = (
88
+ (0.40 * outcome_score) +
89
+ (0.25 * cascade_containment_score) +
90
+ (0.20 * resource_efficiency_score) +
91
+ (0.15 * relationship_preservation_score)
92
+ )
93
+
94
+ # PENALTIES
95
+ penalties = 0.0
96
+ fired = []
97
+
98
+ # -0.50 if ANY metric is below 20 after the step
99
+ if any(v < 20 for v in after_flat.values()):
100
+ penalties -= 0.50
101
+ fired.append("CRITICAL_FLOOR_VIOLATION")
102
+
103
+ # -0.30 if cascade spread wider than the number of metrics the agent directly changed
104
+ # Scaled baseline from task metadata preferred over hardcoded default
105
+ if disruption_baseline is None:
106
+ disruption_baseline = len(metric_changes) if metric_changes else 2
107
+
108
+ if worsened_count > disruption_baseline:
109
+ penalties -= 0.30
110
+ fired.append("CASCADE_SPREAD_WIDER")
111
+
112
+ # -0.40 if actions_taken == 0
113
+ if actions_taken == 0:
114
+ penalties -= 0.40
115
+ fired.append("INACTION_PENALTY")
116
+
117
+ # -0.15 if relationships domain average dropped more than 20 points
118
+ if delta_rel < -20:
119
+ penalties -= 0.15
120
+ fired.append("RELATIONSHIP_COLLAPSE")
121
+
122
+ # [NEW] Plausibility Penalty
123
+ plaus = 0.0
124
+ if metric_changes:
125
+ plaus = reward_plausibility_check(metric_changes, resources_used)
126
+ if plaus < 0:
127
+ penalties += plaus
128
+ fired.append("PLAUSIBILITY_VIOLATION")
129
+
130
+ # [NEW] Format Compliance & Reasoning
131
+ comp_reward = 0.0
132
+ reasoning = ""
133
+ if completion:
134
+ comp_reward = reward_format_compliance(completion)
135
+ try:
136
+ # Simple extract reasoning from JSON if possible
137
+ import json
138
+ data = json.loads(completion)
139
+ reasoning = data.get("reasoning", "")
140
+ except (ValueError, TypeError, AttributeError):
141
+ pass
142
+
143
+ # [NEW] Reasoning Alignment (tied to action_type)
144
+ reasoning_score = reward_reasoning_coherence(reasoning, action_type=action_type)
145
+
146
+ final_reward = max(-1.0, min(1.0, base_reward + penalties))
147
+
148
+ breakdown = {
149
+ "components": {
150
+ "outcome": outcome_score,
151
+ "containment": cascade_containment_score,
152
+ "efficiency": resource_efficiency_score,
153
+ "preservation": relationship_preservation_score,
154
+ "format_compliance": comp_reward,
155
+ "plausibility": plaus,
156
+ "reasoning_alignment": reasoning_score
157
+ },
158
+ "base_reward": base_reward,
159
+ "penalties_total": penalties,
160
+ "penalties_fired": fired,
161
+ "metrics_worsened": worsened_count,
162
+ "rel_delta": delta_rel
163
+ }
164
+
165
+ return final_reward, breakdown
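Review note: a worked instance of the `compute_reward` formula for the inaction case, as a sanity check on the weights. It assumes nothing changed (no metric improved or worsened, no resources spent) and that no metric sits below 20, so only the inaction penalty fires. The helper name `local_step_reward` is hypothetical.

```python
import math


def local_step_reward(outcome, containment, efficiency, delta_rel, penalties=0.0):
    """Weighted base reward plus penalties, clamped to [-1, 1] (hypothetical helper)."""
    # Relationship preservation is a sigmoid of the average relationship delta.
    preservation = 1.0 / (1.0 + math.exp(-delta_rel / 10.0))
    base = (
        0.40 * outcome
        + 0.25 * containment
        + 0.20 * efficiency
        + 0.15 * preservation
    )
    return max(-1.0, min(1.0, base + penalties))


if __name__ == "__main__":
    # Inaction: outcome 0, containment 1, efficiency 1, delta_rel 0 -> base 0.525,
    # then the -0.40 inaction penalty leaves about 0.125.
    print(round(local_step_reward(0.0, 1.0, 1.0, 0.0, penalties=-0.40), 3))
```

Doing nothing still scores positively under these assumptions, because containment, efficiency, and the neutral sigmoid value together outweigh the inaction penalty.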
166
+
167
+ def compute_milestone_reward(milestones_achieved: list[str], task: Task) -> float:
168
+ if not task.milestones:
169
+ return 0.0
170
+ total_possible = sum(m.reward for m in task.milestones)
171
+ if total_possible == 0:
172
+ return 0.0
173
+ achieved = sum(m.reward for m in task.milestones if m.id in milestones_achieved)
174
+ return min(1.0, achieved / total_possible)
175
+
176
+ def compute_task_completion_reward(success_conditions_met: list[bool], task: Task) -> float:
177
+ # A task is completed if any of its target success conditions are satisfied.
178
+ # This handles tasks with multiple alternative goal-states (e.g. choice of routes).
179
+ if not success_conditions_met:
180
+ return 0.0
181
+ return 1.0 if any(success_conditions_met) else 0.0
182
+
183
+ def compute_replan_bonus(exo_events_seen: int, milestones_after_event: int) -> float:
184
+ # Scale bonus based on ability to bounce back after exogenous events
185
+ if exo_events_seen == 0:
186
+ return 0.0
187
+ return min(1.0, (milestones_after_event / exo_events_seen) * 0.5)
188
+
189
+ def compute_dead_end_penalty(routes_remaining: int) -> float:
190
+ return -0.5 if routes_remaining <= 0 else 0.0
191
+
192
+ def compute_task_reward(
193
+ state_before: LifeMetrics,
194
+ state_after: LifeMetrics,
195
+ resources_used: dict,
196
+ actions_taken: int,
197
+ milestones_achieved: list[str],
198
+ success_conditions_met: list[bool],
199
+ exo_events_seen: int,
200
+ milestones_after_event: int,
201
+ routes_remaining: int,
202
+ rollback_used: bool,
203
+ cascade_collapse: bool,
204
+ task: Task,
205
+ reasoning: str = "",
206
+ completion: str = "",
207
+ conflict_domain: str = "",
208
+ step_count: int = 0,
209
+ max_steps: int = 0,
210
+ metric_changes: dict = None,
211
+ cumulative_rel_delta: float = 0.0,
212
+ action_type: str = ""
213
+ ) -> tuple[float, dict]:
214
+ # 1. Base local components (with scaled disruption baseline from task metadata)
215
+ d_baseline = len(task.mutable_world) if task and hasattr(task, 'mutable_world') else None
216
+ local_reward, local_breakdown = compute_reward(state_before, state_after, resources_used, actions_taken,
217
+ metric_changes=metric_changes, completion=completion,
218
+ disruption_baseline=d_baseline, action_type=action_type)
219
+
220
+ # 2. Orchestrator components
221
+ # Use only the raw outcome component from local_breakdown to avoid double-counting
222
+ # efficiency or preservation, which are added separately below (containment is not used at the task level).
223
+ outcome_score_local = local_breakdown["components"].get("outcome", 0.0)
224
+ milestone_score = compute_milestone_reward(milestones_achieved, task)
225
+ completion_score = compute_task_completion_reward(success_conditions_met, task)
226
+ replan_score = compute_replan_bonus(exo_events_seen, milestones_after_event)
227
+ efficiency_score = local_breakdown["components"].get("efficiency", 0.0)
228
+ preservation_score = local_breakdown["components"].get("preservation", 0.0)
229
+ reasoning_score = reward_reasoning_coherence(reasoning, action_type=action_type)
230
+
231
+ # Check for specific failure cases
232
+ timeout_pen = reward_timeout_check(step_count, max_steps, bool(success_conditions_met) and any(success_conditions_met))
233
+ dead_end_pen = compute_dead_end_penalty(routes_remaining)
234
+
235
+ # 3. Final weighting (all components are now unique/non-overlapping)
236
+ # Weights: Milestone 35%, Completion 25%, Outcome 10%, Preservation 5%, Replan 10%, Efficiency 10%, Reasoning 5%
237
+ base_reward = (
238
+ (0.35 * milestone_score) +
239
+ (0.25 * completion_score) +
240
+ (0.10 * outcome_score_local) +
241
+ (0.05 * preservation_score) +
242
+ (0.10 * replan_score) +
243
+ (0.10 * efficiency_score) +
244
+ (0.05 * reasoning_score)
245
+ )
246
+
247
+ # 4. Penalties
248
+ penalties = 0.0
249
+ fired = []
250
+
251
+ if timeout_pen < 0:
252
+ penalties += timeout_pen
253
+ fired.append("TIMEOUT")
254
+
255
+ if dead_end_pen < 0:
256
+ penalties += dead_end_pen
257
+ fired.append("DEAD_END")
258
+
259
+ if rollback_used:
260
+ penalties += -0.1
261
+ fired.append("ROLLBACK_USED")
262
+
263
+ if cascade_collapse:
264
+ penalties += -0.3
265
+ fired.append("CASCADE_COLLAPSE")
266
+
267
+ # Direct inaction penalty, not diluted by the 0.05 local weight
268
+ if actions_taken == 0:
269
+ penalties += -0.20
270
+ fired.append("TASK_INACTION_PENALTY")
271
+
272
+ # Cumulative relationship erosion across the episode
273
+ if cumulative_rel_delta < -20:
274
+ penalties += -0.15
275
+ fired.append("CUMULATIVE_RELATIONSHIP_EROSION")
276
+
277
+ final_reward = max(-1.0, min(1.0, base_reward + penalties))
278
+
279
+ breakdown = {
280
+ "components": {
281
+ "local_metric_delta": outcome_score_local,
282
+ "milestone": milestone_score,
283
+ "completion": completion_score,
284
+ "replan": replan_score,
285
+ "efficiency": efficiency_score,
286
+ "reasoning": reasoning_score,
287
+ "format_compliance": local_breakdown["components"].get("format_compliance", 0.0),
288
+ "plausibility": local_breakdown["components"].get("plausibility", 0.0),
289
+ "timeout_penalty": timeout_pen
290
+ },
291
+ "base_reward": base_reward,
292
+ "penalties_total": penalties,
293
+ "penalties_fired": fired,
294
+ "local_breakdown": local_breakdown
295
+ }
296
+
297
+ return final_reward, breakdown
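Review note: the task-level weights listed in the comment above can be checked with a short sketch. The weights are copied from `compute_task_reward`; the dictionary layout and the helper name `task_base_reward` are hypothetical.

```python
TASK_WEIGHTS = {
    "milestone": 0.35,
    "completion": 0.25,
    "outcome": 0.10,
    "preservation": 0.05,
    "replan": 0.10,
    "efficiency": 0.10,
    "reasoning": 0.05,
}


def task_base_reward(components: dict) -> float:
    """Weighted sum of component scores; missing components count as 0.0."""
    return sum(w * components.get(name, 0.0) for name, w in TASK_WEIGHTS.items())


if __name__ == "__main__":
    print(round(sum(TASK_WEIGHTS.values()), 6))
    # All milestones hit and the task completed, everything else at zero:
    print(round(task_base_reward({"milestone": 1.0, "completion": 1.0}), 6))
```

Milestones and completion together carry 60% of the base reward, so the other five components can move the result by at most 0.40 before penalties.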
298
+
299
+ def reward_format_compliance(completion: str) -> float:
300
+ """
301
+ Scores the completion based on its format (JSON validity and required fields).
302
+
303
+ Returns:
304
+ +1.0: Valid JSON with all required fields:
305
+ action_type, target_domain, metric_changes, resource_cost, reasoning
306
+ +0.5: Any parseable JSON (including partial/incomplete dicts)
307
+ -0.5: Invalid JSON / unparseable
308
+ -1.0: Empty strings or refusal content
309
+ """
310
+ if not completion or len(completion.strip()) < 10:
311
+ return -1.0
312
+
313
+ # Potential refusal indicators
314
+ if any(x in completion.lower() for x in ["i cannot", "i'm sorry", "as an ai"]):
315
+ return -1.0
316
+
317
+ # Extract JSON content from markdown code blocks if present
318
+ json_str = completion.strip()
319
+ if "```json" in json_str:
320
+ json_str = json_str.split("```json")[-1].split("```")[0].strip()
321
+ elif "```" in json_str:
322
+ json_str = json_str.split("```")[1].strip()
323
+
324
+ try:
325
+ data = json.loads(json_str)
326
+ required = ["action_type", "target_domain", "metric_changes", "resource_cost", "reasoning"]
327
+ if isinstance(data, dict) and all(k in data and data.get(k) is not None for k in required):
328
+ return 1.0
329
+ return 0.5
330
+ except json.JSONDecodeError:
331
+ # Final attempt: try to find anything between { and }
332
+ match = re.search(r'\{.*\}', json_str, re.DOTALL)
333
+ if match:
334
+ try:
335
+ data = json.loads(match.group(0))
336
+ required = ["action_type", "target_domain", "metric_changes", "resource_cost", "reasoning"]
337
+ if isinstance(data, dict) and all(k in data and data.get(k) is not None for k in required):
338
+ return 1.0
339
+ return 0.5
340
+ except json.JSONDecodeError:
341
+ pass
342
+ return -0.5
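Review note: a minimal sketch of pulling a JSON payload out of a fenced model reply. It takes the text between the first pair of fences, which is an assumption of this sketch rather than a statement of what the function above does. The helper name `extract_json_payload` is hypothetical, and the fence marker is built at runtime only to keep this note readable.

```python
import json

FENCE = "`" * 3  # the three-backtick markdown fence


def extract_json_payload(completion: str) -> str:
    """Return the text inside the first fenced block, or the stripped input."""
    text = completion.strip()
    tagged = FENCE + "json"
    if tagged in text:
        # Text after the first json-tagged fence, up to the next fence.
        return text.split(tagged, 1)[1].split(FENCE, 1)[0].strip()
    if FENCE in text:
        # Text between the first and second plain fences.
        return text.split(FENCE, 2)[1].strip()
    return text


if __name__ == "__main__":
    reply = FENCE + '\n{"action_type": "rest"}\n' + FENCE
    print(json.loads(extract_json_payload(reply)))
```

Indexing the split result with `[1]` rather than `[-1]` matters for plain fences: the last piece after the closing fence is usually empty.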
343
+
344
+ def reward_plausibility_check(metric_changes: dict, resource_cost: dict) -> float:
345
+ """
346
+ Anti-gaming check. Prevents the model from claiming massive metric changes while spending 0 resources.
347
+ Resource cost is normalized to comparable units (time/20h, money/$500, energy/100pts).
348
+ """
349
+ total_delta = sum(abs(v) for v in metric_changes.values())
350
+
351
+ # Zero-cost shortcut: any non-trivial claim with no cost at all is implausible
352
+ # Also handles empty resource_cost.
353
+ if not resource_cost or all(v == 0 for v in resource_cost.values()):
354
+ if total_delta > 3.0:
355
+ return -0.30
356
+ return 0.0
357
+
358
+ # Normalize each resource dimension to [0,1] before summing
359
+ norm_time = resource_cost.get('time', 0.0) / 20.0
360
+ norm_money = resource_cost.get('money', 0.0) / 500.0
361
+ norm_energy = resource_cost.get('energy', 0.0) / 100.0
362
+ total_cost = norm_time + norm_money + norm_energy
363
+
364
+ ratio = total_delta / max(0.01, total_cost)
365
+
366
+ if ratio > 150:
367
+ return -0.30 # Claiming massive change for virtually free
368
+ if ratio > 80:
369
+ return -0.10 # Highly suspicious efficiency
370
+ return 0.0 # Plausible ratio
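Review note: the thresholds in the plausibility check are easier to judge with concrete numbers. The sketch below restates the same arithmetic under a hypothetical name, `plausibility_penalty`, and runs it on the `negotiate` template from the Gym wrapper plus two made-up claims.

```python
def plausibility_penalty(metric_changes: dict, resource_cost: dict) -> float:
    """Claimed metric change per unit of normalized cost (hypothetical restatement)."""
    total_delta = sum(abs(v) for v in metric_changes.values())
    if not resource_cost or all(v == 0 for v in resource_cost.values()):
        return -0.30 if total_delta > 3.0 else 0.0
    # Normalize by the budgets used in the reward code: 20 h, $500, 100 energy.
    total_cost = (
        resource_cost.get("time", 0.0) / 20.0
        + resource_cost.get("money", 0.0) / 500.0
        + resource_cost.get("energy", 0.0) / 100.0
    )
    ratio = total_delta / max(0.01, total_cost)
    if ratio > 150:
        return -0.30
    if ratio > 80:
        return -0.10
    return 0.0


if __name__ == "__main__":
    negotiate = ({"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
                 {"time": 1.5, "energy": 20.0})
    print(plausibility_penalty(*negotiate))                               # ratio about 72.7
    print(plausibility_penalty({"career.workload": -20.0}, {"time": 4.0}))  # ratio about 100
    print(plausibility_penalty({"career.workload": -50.0}, {"time": 1.0}))  # ratio about 1000
```

The `negotiate` template sits just under the 80 threshold, so modest increases in its claimed deltas would start drawing the -0.10 penalty.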
371
+
372
+ def reward_timeout_check(step_count: int, max_steps: int, done: bool) -> float:
373
+ """
374
+ Penalizes episodes that end by reaching the step limit without being resolved.
375
+ """
376
+ if step_count >= max_steps and not done:
377
+ return -0.20
378
+ return 0.0
379
+
380
+ def reward_reasoning_coherence(reasoning: str, action_type: str = "") -> float:
381
+ """
382
+ Hardened check of logical consistency. Requires both length and
383
+ alignment with the chosen action to prevent word-stuffing.
384
+ """
385
+ if not reasoning or len(reasoning.strip()) < 20:
386
+ return -0.20 # Severe penalty for lack of effort
387
+
388
+ reasoning_lower = reasoning.lower()
389
+ score = 0.0
390
+
391
+ # 1. Structural Logic Check
392
+ # Reward use of logical connectors rather than just list of facts
393
+ connectors = ["because", "since", "therefore", "due to", "resulting in", "consequently"]
394
+ if any(c in reasoning_lower for c in connectors):
395
+ score += 0.05
396
+
397
+ # 2. Action Alignment (non-gameable, anti-hacking)
398
+ # The reasoning MUST logically justify the chosen category.
399
+ action_keywords = {
400
+ "spend": ["cost", "price", "expensive", "money", "budget", "finance"],
401
+ "rest": ["energy", "sleep", "exhaustion", "recharge", "break"],
402
+ "communicate": ["talk", "discuss", "speak", "message", "call", "explain"],
403
+ "delegate": ["hand off", "assign", "help", "junior", "colleague"],
404
+ "negotiate": ["bargain", "trade", "deal", "terms"],
405
+ "deprioritize": ["later", "postpone", "unimportant", "drop"],
406
+ "reschedule": ["reschedule", "delay", "postpone", "move", "time", "calendar", "slot"],
407
+ "execute": ["route", "plan", "action", "implement", "complete", "resolve", "execute"],
408
+ }
409
+
410
+ if action_type and action_type in action_keywords:
411
+ match = any(kw in reasoning_lower for kw in action_keywords[action_type])
412
+ if match:
413
+ score += 0.10
414
+ else:
415
+ score -= 0.20
416
+
417
+ return max(-0.30, min(0.30, score))
418
+
419
+ def main():
420
+ # Scenario setup
421
+ print("--- TESTING REWARD SYSTEM ---")
422
+
423
+ # 1. PERFECT ACTION: All metrics improve by 10 points
424
+ state_start = LifeMetrics() # Defaults at 70
425
+ state_perfect = copy.deepcopy(state_start)
426
+ for k in state_perfect.flatten().keys():
427
+ domain, sub = k.split('.')
428
+ current = getattr(getattr(state_perfect, domain), sub)
429
+ setattr(getattr(state_perfect, domain), sub, current + 10)
430
+
431
+ res_perfect = {"time": 2, "money": 50, "energy": 10}
432
+ reward_p, break_p = compute_reward(state_start, state_perfect, res_perfect, actions_taken=5)
433
+
434
+ print("\n[SCENARIO 1: PERFECT ACTION]")
435
+ print(f"Reward: {reward_p:.4f}")
436
+ print(f"Breakdown: {break_p}")
437
+
438
+ # 2. BAD ACTION: Relationships tank by 30 points, everything else stays same
439
+ state_bad = copy.deepcopy(state_start)
440
+ for k in state_bad.flatten().keys():
441
+ if k.startswith('relationships.'):
442
+ domain, sub = k.split('.')
443
+ current = getattr(getattr(state_bad, domain), sub)
444
+ setattr(getattr(state_bad, domain), sub, current - 30)
445
+
446
+ res_bad = {"time": 10, "money": 300, "energy": 80}
447
+ reward_b, break_b = compute_reward(state_start, state_bad, res_bad, actions_taken=1)
448
+
449
+ print("\n[SCENARIO 2: BAD ACTION (Relationships Tank)]")
450
+ print(f"Reward: {reward_b:.4f}")
451
+ print(f"Breakdown: {break_b}")
452
+
453
+ # 3. INACTION: Nothing changes
454
+ state_nothing = copy.deepcopy(state_start)
455
+ res_none = {}
456
+ reward_n, break_n = compute_reward(state_start, state_nothing, res_none, actions_taken=0)
457
+
458
+ print("\n[SCENARIO 3: INACTION]")
459
+ print(f"Reward: {reward_n:.4f}")
460
+ print(f"Breakdown: {break_n}")
461
+
462
+ if __name__ == "__main__":
463
+ main()
core/task.py ADDED
@@ -0,0 +1,153 @@
1
+ from dataclasses import dataclass, field
2
+ from typing import Any, List, Dict
3
+
4
+ @dataclass
5
+ class HiddenStateField:
6
+ key: str # e.g. "boss_mood"
7
+ initial_value: Any # e.g. "neutral"
8
+ inspect_target: str # e.g. "call_boss"; which inspect action type reveals this
9
+ description: str # shown to agent after reveal
10
+
11
+ @dataclass
12
+ class ExoEvent:
13
+ step: int # inject at this step (inclusive); -1 = probabilistic
14
+ probability: float # 1.0 = deterministic; <1.0 = random at each step
15
+ id: str # e.g. "ticket_price_spike"
16
+ description: str # what agent sees in next observation
17
+ world_mutation: dict # e.g. {"ticket_price": 450, "seats_remaining": 1}
18
+ hidden_state_mutation: dict # e.g. {"boss_mood": "angry"}
19
+ closes_routes: list[str] = field(default_factory=list) # route IDs this event blocks
20
+
21
+ @dataclass
22
+ class Milestone:
23
+ id: str # e.g. "flight_rebooked"
24
+ description: str
25
+ condition_key: str # world/hidden key to check, e.g. "flight_rebooked"
26
+ condition_value: Any # e.g. True
27
+ reward: float # milestone reward added to episode total
28
+
29
+ @dataclass
30
+ class Route:
31
+ id: str # e.g. "rebook_premium"
32
+ name: str
33
+ description: str
34
+ required_action_types: list[str] # must use these tool actions to complete
35
+ preconditions: dict # world/hidden state checks, e.g. {"card_available": True}
36
+ consequences: dict # world mutations on route completion, e.g. {"flight_rebooked": True}
37
+ closes_routes: list[str] # route IDs this blocks
38
+ milestones_unlocked: list[str] # milestone IDs this route can hit
39
+ final_reward: float # bonus on route completion
40
+
41
+ @dataclass
42
+ class Task:
43
+ id: str
44
+ domain: str # "flight_crisis" | "code_merge_crisis"
45
+ goal: str
46
+ constraints: dict # e.g. {"budget_max": 400, "deadline_step": 18}
47
+ hidden_state: dict # full truth, agent never sees directly
48
+ mutable_world: dict # partial truth, some fields revealed by inspect
49
+ visible_world: dict # agent sees this at each step (subset of mutable_world)
50
+ success_conditions: list[dict] # e.g. [{"key": "flight_rebooked", "value": True}]
51
+ failure_conditions: list[dict] # e.g. [{"key": "missed_deadline", "value": True}]
52
+ event_schedule: list[ExoEvent]
53
+ viable_routes: list[Route]
54
+ milestones: list[Milestone]
55
+ horizon: int # max steps (20–50)
56
+ difficulty: int # 1–5
57
+ domain_metadata: dict # domain-specific extra data (story text, etc.)
58
+
59
+
60
+ def FlightCrisisTask() -> Task:
61
+ routes = [
62
+ Route(
63
+ id="rebook_premium",
64
+ name="Rebook Premium Option",
65
+ description="Call agent and rebook on premium ticket",
66
+ required_action_types=["communicate", "execute"],
67
+ preconditions={"card_available": True},
68
+ consequences={"flight_rebooked": True},
69
+ closes_routes=["wait_lounge"],
70
+ milestones_unlocked=["m1"],
71
+ final_reward=2.5
72
+ ),
73
+ Route(
74
+ id="wait_lounge",
75
+ name="Accept Delay & Work",
76
+ description="Stay at airport lounge and work on laptop",
77
+ required_action_types=["wait", "plan"],
78
+ preconditions={"lounge_access": True},
79
+ consequences={"caught_up": True},
80
+ closes_routes=["rebook_premium"],
81
+ milestones_unlocked=["m2"],
82
+ final_reward=1.8
83
+ )
84
+ ]
85
+ milestones = [
86
+ Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="flight_rebooked", condition_value=True, reward=1.0),
87
+ Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="caught_up", condition_value=True, reward=0.8),
88
+ ]
89
+ events = [
90
+ ExoEvent(step=5, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={"card_available": False}, closes_routes=[]),
91
+ ExoEvent(step=8, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={"lounge_access": False}, hidden_state_mutation={}, closes_routes=["wait_lounge"]),
92
+ ]
93
+ return Task(
94
+ id="flight_crisis_task_main",
95
+ domain="flight_crisis",
96
+ goal="Survive Airport Cancellation",
97
+ constraints={"budget_max": 800, "deadline_step": 20},
98
+ hidden_state={
99
+ "card_available": True
100
+ },
101
+ mutable_world={
102
+ "lounge_access": True,
103
+ "flight_rebooked": False,
104
+ "caught_up": False
105
+ },
106
+ visible_world={
107
+ "lounge_access": True
108
+ },
109
+ success_conditions=[{"key": "flight_rebooked", "value": True}],
110
+ failure_conditions=[{"key": "missed_deadline", "value": True}],
111
+ event_schedule=events,
112
+ viable_routes=routes,
113
+ milestones=milestones,
114
+ horizon=30,
115
+ difficulty=4,
116
+ domain_metadata={"story": "A major storm grounded commercial flights."}
117
+ )
118
+
119
+ def CodeMergeCrisisTask() -> Task:
120
+ """A high-difficulty technical crisis requiring rollback or hotfix."""
121
+ routes = [
122
+ Route(id="revert_commit", name="Revert Commit", description="Quickly revert the broken merge to unblock the team.", required_action_types=["delegate", "communicate"], preconditions={}, consequences={"pipeline_unblocked": True}, closes_routes=["hotfix"], milestones_unlocked=["m1"], final_reward=1.5),
123
+ Route(id="hotfix", name="Patch Forward", description="Find the logic error and push a hotfix.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"bug_resolved": True}, closes_routes=["revert_commit"], milestones_unlocked=["m2"], final_reward=3.0),
124
+ ]
125
+ milestones = [
126
+ Milestone(id="m1", description="CI pipeline is green again", condition_key="pipeline_unblocked", condition_value=True, reward=1.0),
127
+ Milestone(id="m2", description="Bug resolved without losing features", condition_key="bug_resolved", condition_value=True, reward=2.0),
128
+ ]
129
+ return Task(
130
+ id="code_merge_task_fallback",
131
+ domain="code_merge_crisis",
132
+ goal="Resolve Production Outage",
133
+ constraints={"budget_max": 1000, "deadline_step": 8},
134
+ hidden_state={"on_call_status": "alert"},
135
+ mutable_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
136
+ visible_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
137
+ success_conditions=[{"key": "pipeline_unblocked", "value": True}, {"key": "bug_resolved", "value": True}],
138
+ failure_conditions=[],
139
+ event_schedule=[],
140
+ viable_routes=routes,
141
+ milestones=milestones,
142
+ horizon=10,
143
+ difficulty=4,
144
+ domain_metadata={}
145
+ )
146
+
147
+ class TaskGenerator:
148
+ def __init__(self):
149
+ self.tasks = [FlightCrisisTask, CodeMergeCrisisTask]
150
+
151
+ def get_random_task(self) -> Task:
152
+ import random
153
+ return random.choice(self.tasks)()
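The `event_schedule` and `closes_routes` fields in `core/task.py` suggest a per-step loop that fires due events and retires routes. The following is only a minimal sketch of such a loop. The `Event` stand-in and the `apply_due_events` helper are hypothetical names introduced for illustration and are not part of this commit; probabilistic events (`step = -1`) are deliberately left out, and the real environment may order these updates differently.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for ExoEvent with only the fields this sketch needs."""
    step: int
    id: str
    world_mutation: dict = field(default_factory=dict)
    hidden_state_mutation: dict = field(default_factory=dict)
    closes_routes: list = field(default_factory=list)


def apply_due_events(step, events, world, hidden, closed_routes):
    """Apply every deterministic event scheduled for `step`; return the fired IDs."""
    fired = []
    for ev in events:
        if ev.step != step:
            continue
        world.update(ev.world_mutation)
        hidden.update(ev.hidden_state_mutation)
        closed_routes.update(ev.closes_routes)
        fired.append(ev.id)
    return fired


# Values mirror the two events defined in FlightCrisisTask.
events = [
    Event(step=5, id="price_surge", hidden_state_mutation={"card_available": False}),
    Event(step=8, id="lounge_full", world_mutation={"lounge_access": False},
          closes_routes=["wait_lounge"]),
]
world = {"lounge_access": True, "flight_rebooked": False, "caught_up": False}
hidden = {"card_available": True}
closed = set()

for step in range(1, 10):
    apply_due_events(step, events, world, hidden, closed)
```

Under these assumptions, after step 8 the `rebook_premium` precondition (`card_available`) no longer holds and `wait_lounge` is closed, so an agent that has not committed to a route by then has no viable route left.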
core/verifier.py ADDED
@@ -0,0 +1,75 @@
+ from typing import Dict, List, Set, Any, Tuple, Optional
+ from core.task import Task, Milestone, Route
+
+ class LifeStackVerifier:
+     """Standalone verifier for Task success, failure, and progression."""
+
+     @staticmethod
+     def _check_cond(cond: dict, world_state: dict, hidden_state: dict, metrics_flat: Optional[dict] = None) -> bool:
+         key = cond['key']
+         target = cond['value']
+         op = cond.get('op', 'eq')
+
+         # Priority: Metrics > Hidden > World
+         val = None
+         if metrics_flat and key in metrics_flat:
+             val = metrics_flat[key]
+         else:
+             val = hidden_state.get(key, world_state.get(key))
+
+         if val is None:
+             return False
+
+         if op == 'eq': return val == target
+         if op == 'ne': return val != target
+         if op == 'gt': return val > target
+         if op == 'lt': return val < target
+         if op == 'ge': return val >= target
+         if op == 'le': return val <= target
+         return False
+
+     @staticmethod
+     def check_success(task: Task, world_state: dict, hidden_state: dict) -> list[bool]:
+         """Checks if task-specific success conditions are met."""
+         return [LifeStackVerifier._check_cond(c, world_state, hidden_state) for c in task.success_conditions]
+
+     @staticmethod
+     def check_failure(task: Task, world_state: dict, hidden_state: dict, metrics_flat: dict) -> list[bool]:
+         """Checks if task-specific or global failure conditions (metric death) are met."""
+         results = [LifeStackVerifier._check_cond(c, world_state, hidden_state, metrics_flat) for c in task.failure_conditions]
+         # Global failure: metric death (any metric at or below 10)
+         if any(v <= 10 for v in metrics_flat.values()):
+             results.append(True)
+         return results
+
+     @staticmethod
+     def check_new_milestones(task: Task, world_state: dict, hidden_state: dict, achieved_ids: list) -> list[str]:
+         """Identifies any milestones that have just been met by current state."""
+         newly_met = []
+         for m in task.milestones:
+             if m.id not in achieved_ids:
+                 val = hidden_state.get(m.condition_key, world_state.get(m.condition_key))
+                 if val == m.condition_value:
+                     newly_met.append(m.id)
+         return newly_met
+
+     @staticmethod
+     def get_route_status(task: Task, closed_ids: set, world_state: dict, hidden_state: dict) -> Tuple[int, bool]:
+         """Returns (remaining_routes_count, is_dead_end)."""
+         remaining = 0
+         for route in task.viable_routes:
+             if route.id in closed_ids:
+                 continue
+
+             # Check if reachable via preconditions
+             pre_ok = True
+             for k, v in route.preconditions.items():
+                 current_v = hidden_state.get(k, world_state.get(k))
+                 if current_v != v:
+                     pre_ok = False
+                     break
+
+             if pre_ok:
+                 remaining += 1
+
+         return remaining, remaining == 0
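The condition dicts consumed by `_check_cond` carry a `key`, a `value`, and an optional `op` that defaults to `eq`. As a rough illustration of that contract, here is a self-contained, table-driven variant built on the standard `operator` module. The names `OPS` and `check_cond` are assumptions made for this sketch; it is an alternative formulation with the same lookup priority (metrics, then hidden state, then world state), not the code in this commit.

```python
import operator

# Maps the `op` strings used in condition dicts to comparison functions.
OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "lt": operator.lt,
    "ge": operator.ge, "le": operator.le,
}


def check_cond(cond, world_state, hidden_state, metrics_flat=None):
    """Evaluate one condition dict; unknown keys or operators yield False."""
    key = cond["key"]
    if metrics_flat and key in metrics_flat:
        val = metrics_flat[key]
    else:
        val = hidden_state.get(key, world_state.get(key))
    if val is None:
        return False
    compare = OPS.get(cond.get("op", "eq"))
    return bool(compare(val, cond["value"])) if compare else False


rebooked = check_cond({"key": "flight_rebooked", "value": True},
                      {"flight_rebooked": True}, {})
stressed = check_cond({"key": "mental_wellbeing.stress_level", "op": "ge", "value": 90},
                      {}, {}, {"mental_wellbeing.stress_level": 95.0})
missing = check_cond({"key": "missed_deadline", "value": True}, {}, {})
```

A lookup table keeps the supported operators in one place, so adding a new one is a single dictionary entry rather than another `if` branch.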
data/before_after_comparison.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "summary": {
+     "runs": 5,
+     "avg_no_memory": 1.13,
+     "avg_with_memory": 2.45,
+     "pct_improvement": 116.81,
+     "most_common_action_no_memory": "delegate",
+     "most_common_action_with_memory": "communicate",
+     "comm_usage_no_memory_pct": 40.0,
+     "comm_usage_yes_memory_pct": 100.0
+   },
+   "no_memory": [
+     {
+       "total_reward": 1.0,
+       "first_action": "delegate"
+     },
+     {
+       "total_reward": 1.2
+     }
+   ],
+   "with_memory": [
+     {
+       "total_reward": 2.5,
+       "first_action": "communicate"
+     },
+     {
+       "total_reward": 2.4
+     }
+   ]
+ }
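The `pct_improvement` field in the summary is consistent with a plain relative change between the two averages. A short check, assuming that is how it was derived (the helper name `pct_improvement` is introduced here for illustration only):

```python
def pct_improvement(baseline: float, improved: float) -> float:
    """Relative change from `baseline` to `improved`, in percent, rounded to 2 places."""
    return round((improved - baseline) / baseline * 100, 2)


# Averages taken from the "summary" block of before_after_comparison.json.
value = pct_improvement(1.13, 2.45)
```

With the averages from the summary block this reproduces the stored figure of 116.81.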
data/conflicts.json ADDED
@@ -0,0 +1,314 @@
+ [
+   {
+     "id": "d1_gym",
+     "title": "The Slump",
+     "story": "You haven't seen the inside of a gym in ten days. Your energy is flagging and your favorite jeans feel tight.",
+     "primary_disruption": {
+       "physical_health.fitness": -15.0
+     },
+     "decisions_required": [
+       "Wake up early for a run",
+       "Join a weekend boot camp",
+       "Ignore it and rest"
+     ],
+     "resource_budget": {
+       "time": 4.0,
+       "money": 0.0,
+       "energy": 20.0
+     },
+     "difficulty": 1
+   },
+   {
+     "id": "d1_bill",
+     "title": "Forgotten Invoice",
+     "story": "A late notice arrived for your electricity bill. It's not a lot, but the late fee is annoying.",
+     "primary_disruption": {
+       "finances.liquidity": -20.0
+     },
+     "decisions_required": [
+       "Pay it now",
+       "Call to dispute the fee",
+       "Set up autopay for next time"
+     ],
+     "resource_budget": {
+       "time": 1.0,
+       "money": 100.0,
+       "energy": 5.0
+     },
+     "difficulty": 1
+   },
+   {
+     "id": "d1_argument",
+     "title": "Heated Group Chat",
+     "story": "A minor political disagreement in the group chat turned personal. Everyone is being quiet now.",
+     "primary_disruption": {
+       "relationships.social": -20.0
+     },
+     "decisions_required": [
+       "Apologize to the group",
+       "Message the friend privately",
+       "Mute the chat for a week"
+     ],
+     "resource_budget": {
+       "time": 2.0,
+       "money": 30.0,
+       "energy": 15.0
+     },
+     "difficulty": 1
+   },
+   {
+     "id": "d2_project",
+     "title": "The Surge",
+     "story": "Your boss just walked by and dropped a 'small favor' on your desk. It looks like it'll take ten hours.",
+     "primary_disruption": {
+       "career.workload": 25.0,
+       "time.free_hours_per_week": -20.0
+     },
+     "decisions_required": [
+       "Work late all week",
+       "Delegate parts to a junior",
+       "Refuse the assignment"
+     ],
+     "resource_budget": {
+       "time": 10.0,
+       "money": 0.0,
+       "energy": 40.0
+     },
+     "difficulty": 2
+   },
+   {
+     "id": "d2_car",
+     "title": "Check Engine Light",
+     "story": "Your car started making a rhythmic thumping sound on the highway. The mechanic says the repair isn't cheap.",
+     "primary_disruption": {
+       "finances.liquidity": -30.0,
+       "time.commute_burden": 25.0
+     },
+     "decisions_required": [
+       "Repair it immediately",
+       "Take the bus for a week",
+       "Borrow a car from a friend"
+     ],
+     "resource_budget": {
+       "time": 5.0,
+       "money": 500.0,
+       "energy": 10.0
+     },
+     "difficulty": 2
+   },
+   {
+     "id": "d2_neglect",
+     "title": "Cold Dinner",
+     "story": "Your partner mentions they feel like 'roommates' lately. You realize you haven't had a real conversation in weeks.",
+     "primary_disruption": {
+       "relationships.romantic": -25.0,
+       "mental_wellbeing.stress_level": 20.0
+     },
+     "decisions_required": [
+       "Plan a surprise date",
+       "Have a long talk tonight",
+       "Buy a thoughtful gift"
+     ],
+     "resource_budget": {
+       "time": 6.0,
+       "money": 150.0,
+       "energy": 30.0
+     },
+     "difficulty": 2
+   },
+   {
+     "id": "d3_interview",
+     "title": "The Opportunity",
+     "story": "An old contact reached out for a dream job interview. You need to prep while keeping your current job afloat.",
+     "primary_disruption": {
+       "career.workload": 20.0,
+       "time.free_hours_per_week": -15.0,
+       "mental_wellbeing.stress_level": 20.0
+     },
+     "decisions_required": [
+       "Intensive weekend prep",
+       "Fake a sick day to interview",
+       "Turn it down to stay stable"
+     ],
+     "resource_budget": {
+       "time": 12.0,
+       "money": 50.0,
+       "energy": 50.0
+     },
+     "difficulty": 3
+   },
+   {
+     "id": "d3_family",
+     "title": "Family SOS",
+     "story": "Your sibling is going through a rough patch and needs help moving out and some financial support.",
+     "primary_disruption": {
+       "relationships.family": 20.0,
+       "time.free_hours_per_week": -25.0,
+       "finances.liquidity": -20.0
+     },
+     "decisions_required": [
+       "Spend the weekend helping",
+       "Send them money but stay home",
+       "Help them find other movers"
+     ],
+     "resource_budget": {
+       "time": 15.0,
+       "money": 400.0,
+       "energy": 60.0
+     },
+     "difficulty": 3
+   },
+   {
+     "id": "d3_health",
+     "title": "The Warning Sign",
+     "story": "You had a fainting spell at the office. Tests are expensive, and doctors say you need immediate change.",
+     "primary_disruption": {
+       "physical_health.energy": -30.0,
+       "mental_wellbeing.stress_level": 30.0,
+       "finances.liquidity": -40.0
+     },
+     "decisions_required": [
+       "Take a week of medical leave",
+       "Consult a high-end specialist",
+       "Change diet and sleep habits"
+     ],
+     "resource_budget": {
+       "time": 20.0,
+       "money": 800.0,
+       "energy": 5.0
+     },
+     "difficulty": 3
+   },
+   {
+     "id": "d4_review",
+     "title": "Judgment Day",
+     "story": "A major performance review is in three days. Rumors of layoffs are circulating and the atmosphere is tense.",
+     "primary_disruption": {
+       "career.workload": 30.0,
+       "mental_wellbeing.stress_level": 25.0,
+       "relationships.romantic": -15.0,
+       "time.free_hours_per_week": -20.0
+     },
+     "decisions_required": [
+       "Pull all-nighters to prove worth",
+       "Start networking for new roles",
+       "Draft a defensive report"
+     ],
+     "resource_budget": {
+       "time": 18.0,
+       "money": 0.0,
+       "energy": 80.0
+     },
+     "difficulty": 4
+   },
+   {
+     "id": "d4_move",
+     "title": "The Big Relocation",
+     "story": "You've decided to move across the country for growth. The logistics are a nightmare and friends are sad to see you go.",
+     "primary_disruption": {
+       "finances.liquidity": -50.0,
+       "relationships.social": -30.0,
+       "career.growth_trajectory": 20.0,
+       "time.admin_overhead": 30.0
+     },
+     "decisions_required": [
+       "Hire full-service movers",
+       "Host a series of farewell dinners",
+       "DIY pack everything"
+     ],
+     "resource_budget": {
+       "time": 30.0,
+       "money": 1500.0,
+       "energy": 100.0
+     },
+     "difficulty": 4
+   },
+   {
+     "id": "d4_audit",
+     "title": "Tax Audit",
+     "story": "The IRS has flagged your last three years of returns. You need to dig through thousands of documents while paying a CPA.",
+     "primary_disruption": {
+       "finances.long_term_health": -20.0,
+       "mental_wellbeing.stress_level": 30.0,
+       "time.admin_overhead": 40.0,
+       "finances.liquidity": -15.0
+     },
+     "decisions_required": [
+       "Spend nights scanning receipts",
+       "Hire a tax lawyer",
+       "Try to settle immediately"
+     ],
+     "resource_budget": {
+       "time": 25.0,
+       "money": 1000.0,
+       "energy": 40.0
+     },
+     "difficulty": 4
+   },
+   {
+     "id": "d5_friday",
+     "title": "Friday 6PM",
+     "story": "Your flight just got cancelled. Your card declined trying to rebook. Your boss moved Monday deadline to Sunday.",
+     "primary_disruption": {
+       "career.workload": 35.0,
+       "finances.liquidity": -40.0,
+       "mental_wellbeing.stress_level": 30.0,
+       "time.free_hours_per_week": -25.0
+     },
+     "decisions_required": [
+       "Book a bus and work on it",
+       "Call boss to negotiate",
+       "Crash at a nearby friend's"
+     ],
+     "resource_budget": {
+       "time": 10.0,
+       "money": 500.0,
+       "energy": 60.0
+     },
+     "difficulty": 5
+   },
+   {
+     "id": "d5_storm",
+     "title": "The Perfect Storm",
+     "story": "Your firm lost its biggest client, your partner moved out, and your car got towed\u2014all on the same Tuesday.",
+     "primary_disruption": {
+       "career.stability": -30.0,
+       "relationships.romantic": -25.0,
+       "finances.debt_pressure": 35.0,
+       "physical_health.energy": -25.0
+     },
+     "decisions_required": [
+       "Find an emergency side hustle",
+       "Beg partner for a second chance",
+       "Take a mental health day"
+     ],
+     "resource_budget": {
+       "time": 8.0,
+       "money": 200.0,
+       "energy": 20.0
+     },
+     "difficulty": 5
+   },
+   {
+     "id": "d5_burnout",
+     "title": "The Total Collapse",
+     "story": "You can't get out of bed. Your body has quit, your motivation is gone, and work emails are piling into the hundreds.",
+     "primary_disruption": {
+       "mental_wellbeing.motivation": -40.0,
+       "physical_health.sleep_quality": -30.0,
+       "career.satisfaction": -35.0,
+       "relationships.family": -20.0
+     },
+     "decisions_required": [
+       "Request indefinite medical leave",
+       "Disconnect all electronics",
+       "Let it all burn and sleep"
+     ],
+     "resource_budget": {
+       "time": 40.0,
+       "money": 2000.0,
+       "energy": 0.0
+     },
+     "difficulty": 5
+   }
+ ]
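Every record in `data/conflicts.json` shares the same seven keys, a three-part `resource_budget`, and an `id` whose `d<N>_` prefix matches its `difficulty`. A small validator makes that implicit schema explicit. This is a sketch only: `validate_conflict`, `REQUIRED_KEYS`, and `BUDGET_KEYS` are hypothetical names, and the prefix rule is a convention inferred from the data rather than something the repository is known to enforce.

```python
REQUIRED_KEYS = {"id", "title", "story", "primary_disruption",
                 "decisions_required", "resource_budget", "difficulty"}
BUDGET_KEYS = {"time", "money", "energy"}


def validate_conflict(record: dict) -> list:
    """Return a list of problems found in one conflict record (empty if none)."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    if not 1 <= record["difficulty"] <= 5:
        problems.append("difficulty outside 1-5")
    # Convention observed in the data: IDs start with "d<difficulty>_".
    if not record["id"].startswith(f"d{record['difficulty']}_"):
        problems.append("id prefix does not match difficulty")
    if set(record["resource_budget"]) != BUDGET_KEYS:
        problems.append("resource_budget must have time, money, energy")
    return problems


# First record of conflicts.json, copied verbatim.
sample = {
    "id": "d1_gym",
    "title": "The Slump",
    "story": "You haven't seen the inside of a gym in ten days. "
             "Your energy is flagging and your favorite jeans feel tight.",
    "primary_disruption": {"physical_health.fitness": -15.0},
    "decisions_required": ["Wake up early for a run",
                           "Join a weekend boot camp",
                           "Ignore it and rest"],
    "resource_budget": {"time": 4.0, "money": 0.0, "energy": 20.0},
    "difficulty": 1,
}
problems = validate_conflict(sample)
```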
data/demo_signals.json ADDED
@@ -0,0 +1,75 @@
+ {
+   "persona": "Jordan (PM at Series-B startup)",
+   "generated_at": "2026-04-25T09:00:00",
+   "note": "Pre-baked demo payload \u2014 represents a stressed product manager mid-sprint",
+
+   "gmail": {
+     "unread_count": 47,
+     "late_night_count": 8,
+     "weekend_count": 11,
+     "overtime_count": 14,
+     "social_activity": 3.2,
+     "work_pressure": 8.7,
+     "relationship_neglect_risk": 7.4,
+     "responsiveness": 2.1,
+     "email_overload": 9.4,
+     "work_bleeding_personal": 7.2,
+     "key_contacts": [
+       "priya.shah@acme-ventures.com",
+       "cto@startupco.io",
+       "hr@startupco.io",
+       "mom@gmail.com",
+       "alex@cofounder.io"
+     ],
+     "notable_threads": [
+       {"subject": "URGENT: Board deck needs rework before Friday", "sender": "cto@startupco.io", "time": "11:47 PM"},
+       {"subject": "Re: Q2 roadmap \u2014 are we on track?", "sender": "priya.shah@acme-ventures.com", "time": "Saturday 10:12 AM"},
+       {"subject": "Have you eaten today?", "sender": "mom@gmail.com", "time": "7:03 PM"}
+     ],
+     "summary": "47 unread. 8 emails sent after 10 PM. Board deck deadline pressure. Investor checking roadmap. Family reaching out."
+   },
+
+   "calendar": {
+     "week_occupancy_pct": 91,
+     "days_with_no_breaks": 4,
+     "avg_meeting_hours_per_day": 6.2,
+     "focus_blocks_count": 0,
+     "upcoming_deadlines": [
+       {"title": "Board Deck Final Draft", "due_in_hours": 38, "priority": "critical"},
+       {"title": "Sprint Review with Engineering", "due_in_hours": 52, "priority": "high"},
+       {"title": "Investor 1:1 (Priya Shah)", "due_in_hours": 72, "priority": "high"}
+     ],
+     "back_to_back_blocks": 3,
+     "personal_events_this_week": 1,
+     "cancelled_personal_events": 2,
+     "summary": "91% of working hours booked. Zero deep-work blocks. Board deck in 38h. 3 back-to-back meeting chains. 2 personal events cancelled this week."
+   },
+
+   "fitness": {
+     "avg_sleep_hours": 5.3,
+     "sleep_quality_score": 38,
+     "resting_heart_rate": 82,
+     "hrv_score": 24,
+     "daily_steps_avg": 2800,
+     "active_minutes_avg": 9,
+     "stress_score": 78,
+     "recovery_score": 31,
+     "last_workout_days_ago": 9,
+     "summary": "5.3h sleep avg. Resting HR 82 bpm (elevated). HRV 24 (low \u2014 high stress load). 2,800 steps/day. Last workout 9 days ago."
+   },
+
+   "derived_metric_deltas": {
+     "career.workload": 28.0,
+     "mental_wellbeing.stress_level": 32.0,
+     "mental_wellbeing.focus_quality": -25.0,
+     "mental_wellbeing.emotional_regulation": -18.0,
+     "physical_health.sleep_quality": -30.0,
+     "physical_health.energy_level": -22.0,
+     "physical_health.exercise_consistency": -35.0,
+     "time.free_hours_per_week": -18.0,
+     "time.schedule_control": -24.0,
+     "relationships.romantic": -15.0,
+     "relationships.family": -12.0,
+     "finances.liquidity": 0.0
+   }
+ }
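The `derived_metric_deltas` block maps dotted metric keys to signed offsets, which suggests they are added to a flat metrics dict. One possible way to apply them is sketched below. The 0 to 100 clamping range, the 50.0 default for unseen keys, and the name `apply_deltas` are all assumptions for illustration; the commit shown here does not confirm how the deltas are actually consumed.

```python
def apply_deltas(metrics: dict, deltas: dict,
                 lo: float = 0.0, hi: float = 100.0,
                 default: float = 50.0) -> dict:
    """Return a copy of `metrics` with each delta added and clamped to [lo, hi].

    The clamp range and the default baseline are hypothetical choices.
    """
    updated = dict(metrics)
    for key, delta in deltas.items():
        updated[key] = max(lo, min(hi, updated.get(key, default) + delta))
    return updated


# Three of the deltas from demo_signals.json applied to a made-up baseline.
baseline = {"career.workload": 80.0, "physical_health.sleep_quality": 25.0}
deltas = {
    "career.workload": 28.0,
    "physical_health.sleep_quality": -30.0,
    "finances.liquidity": 0.0,
}
result = apply_deltas(baseline, deltas)
```

Returning a new dict instead of mutating the input keeps the pre-signal state available for before/after comparisons.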
data/holdout_tasks.json ADDED
@@ -0,0 +1,12 @@
+ [
+   {"id": "holdout_0", "seed": 9000, "domain": "flight_crisis"},
+   {"id": "holdout_1", "seed": 9001, "domain": "flight_crisis"},
+   {"id": "holdout_2", "seed": 9002, "domain": "code_merge_crisis"},
+   {"id": "holdout_3", "seed": 9003, "domain": "flight_crisis"},
+   {"id": "holdout_4", "seed": 9004, "domain": "code_merge_crisis"},
+   {"id": "holdout_5", "seed": 9005, "domain": "flight_crisis"},
+   {"id": "holdout_6", "seed": 9006, "domain": "code_merge_crisis"},
+   {"id": "holdout_7", "seed": 9007, "domain": "flight_crisis"},
+   {"id": "holdout_8", "seed": 9008, "domain": "code_merge_crisis"},
+   {"id": "holdout_9", "seed": 9009, "domain": "flight_crisis"}
+ ]
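Each holdout entry pairs a `domain` with a fixed `seed`, presumably so evaluation episodes can be replayed exactly. The sketch below parses the list, tallies the domain split, and shows one way a per-task seed could drive a private random generator. The `rng_for` helper is hypothetical, and how the evaluation code actually uses `seed` is not shown in this commit.

```python
import json
import random
from collections import Counter

# Contents of data/holdout_tasks.json, embedded so the sketch is self-contained.
HOLDOUT_JSON = """[
  {"id": "holdout_0", "seed": 9000, "domain": "flight_crisis"},
  {"id": "holdout_1", "seed": 9001, "domain": "flight_crisis"},
  {"id": "holdout_2", "seed": 9002, "domain": "code_merge_crisis"},
  {"id": "holdout_3", "seed": 9003, "domain": "flight_crisis"},
  {"id": "holdout_4", "seed": 9004, "domain": "code_merge_crisis"},
  {"id": "holdout_5", "seed": 9005, "domain": "flight_crisis"},
  {"id": "holdout_6", "seed": 9006, "domain": "code_merge_crisis"},
  {"id": "holdout_7", "seed": 9007, "domain": "flight_crisis"},
  {"id": "holdout_8", "seed": 9008, "domain": "code_merge_crisis"},
  {"id": "holdout_9", "seed": 9009, "domain": "flight_crisis"}
]"""

holdout = json.loads(HOLDOUT_JSON)
domain_counts = Counter(task["domain"] for task in holdout)


def rng_for(task: dict) -> random.Random:
    """Hypothetical: a private RNG per holdout task, seeded from its `seed`."""
    return random.Random(task["seed"])


# Two passes over the same seeds produce identical draws.
first_draws = [rng_for(task).random() for task in holdout]
repeat_draws = [rng_for(task).random() for task in holdout]
```

Using a dedicated `random.Random` instance per task avoids touching the module-level generator, so holdout evaluation would not disturb any other seeded randomness in the same process.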
data/reward_curve.png ADDED
data/simperson_profiles.json ADDED
@@ -0,0 +1,42 @@
+ [
+   {
+     "name": "Alex (High-Stress Executive)",
+     "openness": 0.4,
+     "conscientiousness": 0.9,
+     "extraversion": 0.7,
+     "agreeableness": 0.25,
+     "neuroticism": 0.8
+   },
+   {
+     "name": "Chloe (Laid-Back Creative)",
+     "openness": 0.9,
+     "conscientiousness": 0.2,
+     "extraversion": 0.5,
+     "agreeableness": 0.7,
+     "neuroticism": 0.15
+   },
+   {
+     "name": "Sam (Anxious Introvert)",
+     "openness": 0.5,
+     "conscientiousness": 0.6,
+     "extraversion": 0.1,
+     "agreeableness": 0.65,
+     "neuroticism": 0.9
+   },
+   {
+     "name": "Maya (Balanced Family Person)",
+     "openness": 0.5,
+     "conscientiousness": 0.7,
+     "extraversion": 0.5,
+     "agreeableness": 0.95,
+     "neuroticism": 0.3
+   },
+   {
+     "name": "Leo (Ambitious Student)",
+     "openness": 0.85,
+     "conscientiousness": 0.8,
+     "extraversion": 0.4,
+     "agreeableness": 0.4,
+     "neuroticism": 0.55
+   }
+ ]
data/training_log.json ADDED
@@ -0,0 +1,526 @@
+ [
+   {
+     "episode": 1,
+     "reward": 1.6325,
+     "difficulty": 1,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Forgotten Invoice"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 2,
+     "reward": 1.7879,
+     "difficulty": 2,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "The Surge",
+       "ESCALATED: The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 3,
+     "reward": 2.5763,
+     "difficulty": 1,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "Heated Group Chat",
+       "ESCALATED: Heated Group Chat"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 4,
+     "reward": 2.5755,
+     "difficulty": 1,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Heated Group Chat"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 5,
+     "reward": 2.5754,
+     "difficulty": 1,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Heated Group Chat"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 6,
+     "reward": 2.5402,
+     "difficulty": 2,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Cold Dinner",
+       "ESCALATED: Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 7,
+     "reward": 2.5793,
+     "difficulty": 1,
+     "person": "Sam (Introvert)",
+     "conflicts_seen": [
+       "The Slump"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 8,
+     "reward": 2.5574,
+     "difficulty": 2,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "Cold Dinner",
+       "ESCALATED: Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 9,
+     "reward": 2.5277,
+     "difficulty": 2,
+     "person": "Sam (Introvert)",
+     "conflicts_seen": [
+       "The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 10,
+     "reward": 2.4812,
+     "difficulty": 2,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Check Engine Light",
+       "ESCALATED: Check Engine Light"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 11,
+     "reward": 2.4932,
+     "difficulty": 2,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Check Engine Light"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 12,
+     "reward": 2.5473,
+     "difficulty": 2,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "The Surge",
+       "ESCALATED: The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 13,
+     "reward": 2.5707,
+     "difficulty": 1,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "The Slump",
+       "ESCALATED: The Slump"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 14,
+     "reward": 2.5507,
+     "difficulty": 1,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "Forgotten Invoice",
+       "ESCALATED: Forgotten Invoice"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 15,
+     "reward": 2.572,
+     "difficulty": 1,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Heated Group Chat"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 16,
+     "reward": 2.5534,
+     "difficulty": 3,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "The Opportunity"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 17,
+     "reward": 2.5396,
+     "difficulty": 3,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Family SOS"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 18,
+     "reward": 2.5572,
+     "difficulty": 2,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Cold Dinner",
+       "ESCALATED: Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 19,
+     "reward": 2.5503,
+     "difficulty": 3,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "The Warning Sign",
+       "ESCALATED: The Warning Sign"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 20,
+     "reward": 2.5437,
+     "difficulty": 3,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "The Warning Sign",
+       "ESCALATED: The Warning Sign"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 21,
+     "reward": 2.5045,
+     "difficulty": 2,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Check Engine Light"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 22,
+     "reward": 2.5447,
+     "difficulty": 2,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "Cold Dinner",
+       "ESCALATED: Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 23,
+     "reward": 2.5427,
+     "difficulty": 3,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Family SOS"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 24,
+     "reward": 2.534,
+     "difficulty": 2,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "The Surge",
+       "ESCALATED: The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 25,
+     "reward": 2.5273,
+     "difficulty": 2,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 26,
+     "reward": 2.5436,
+     "difficulty": 3,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "The Warning Sign"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 27,
+     "reward": 2.5452,
+     "difficulty": 3,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "The Opportunity",
+       "ESCALATED: The Opportunity"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 28,
+     "reward": 2.5287,
+     "difficulty": 2,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "The Surge",
+       "ESCALATED: The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 29,
+     "reward": 2.4947,
+     "difficulty": 2,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Check Engine Light",
+       "ESCALATED: Check Engine Light"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 30,
+     "reward": 2.5534,
+     "difficulty": 2,
+     "person": "Sam (Introvert)",
+     "conflicts_seen": [
+       "Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 31,
+     "reward": 2.5459,
+     "difficulty": 2,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 32,
+     "reward": 2.4748,
+     "difficulty": 2,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "The Surge"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 33,
+     "reward": 2.5597,
+     "difficulty": 2,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "Cold Dinner",
+       "ESCALATED: Cold Dinner"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 34,
+     "reward": 2.4873,
+     "difficulty": 2,
+     "person": "Sam (Introvert)",
+     "conflicts_seen": [
+       "Check Engine Light",
+       "ESCALATED: Check Engine Light"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 35,
+     "reward": 2.5366,
+     "difficulty": 3,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "Family SOS"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 36,
+     "reward": 2.5337,
+     "difficulty": 3,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "The Opportunity"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 37,
+     "reward": 2.5552,
+     "difficulty": 4,
+     "person": "Leo (Student)",
+     "conflicts_seen": [
+       "The Big Relocation",
+       "ESCALATED: The Big Relocation"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 38,
+     "reward": 2.4982,
+     "difficulty": 3,
+     "person": "Chloe (Creative)",
+     "conflicts_seen": [
+       "Family SOS",
+       "ESCALATED: Family SOS"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 39,
+     "reward": 2.4741,
+     "difficulty": 4,
+     "person": "Sam (Introvert)",
+     "conflicts_seen": [
+       "Judgment Day",
+       "ESCALATED: Judgment Day"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 40,
+     "reward": 2.5425,
+     "difficulty": 3,
+     "person": "Maya (Family)",
+     "conflicts_seen": [
+       "The Opportunity"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 41,
+     "reward": 2.5203,
+     "difficulty": 3,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Family SOS",
+       "ESCALATED: Family SOS"
+     ],
+     "steps": 5
+   },
+   {
+     "episode": 42,
+     "reward": 2.5183,
+     "difficulty": 3,
+     "person": "Alex (Executive)",
+     "conflicts_seen": [
+       "Family SOS"
+     ],
+     "steps": 5
+   },
444
+ {
445
+ "episode": 43,
446
+ "reward": 2.54,
447
+ "difficulty": 3,
448
+ "person": "Leo (Student)",
449
+ "conflicts_seen": [
450
+ "The Warning Sign"
451
+ ],
452
+ "steps": 5
453
+ },
454
+ {
455
+ "episode": 44,
456
+ "reward": 2.5525,
457
+ "difficulty": 3,
458
+ "person": "Leo (Student)",
459
+ "conflicts_seen": [
460
+ "The Warning Sign",
461
+ "ESCALATED: The Warning Sign"
462
+ ],
463
+ "steps": 5
464
+ },
465
+ {
466
+ "episode": 45,
467
+ "reward": 1.2349,
468
+ "difficulty": 4,
469
+ "person": "Leo (Student)",
470
+ "conflicts_seen": [
471
+ "Tax Audit"
472
+ ],
473
+ "steps": 5
474
+ },
475
+ {
476
+ "episode": 46,
477
+ "reward": 2.497,
478
+ "difficulty": 4,
479
+ "person": "Sam (Introvert)",
480
+ "conflicts_seen": [
481
+ "The Big Relocation"
482
+ ],
483
+ "steps": 5
484
+ },
485
+ {
486
+ "episode": 47,
487
+ "reward": 2.5601,
488
+ "difficulty": 4,
489
+ "person": "Maya (Family)",
490
+ "conflicts_seen": [
491
+ "The Big Relocation"
492
+ ],
493
+ "steps": 5
494
+ },
495
+ {
496
+ "episode": 48,
497
+ "reward": 2.5492,
498
+ "difficulty": 4,
499
+ "person": "Maya (Family)",
500
+ "conflicts_seen": [
501
+ "Judgment Day",
502
+ "ESCALATED: Judgment Day"
503
+ ],
504
+ "steps": 5
505
+ },
506
+ {
507
+ "episode": 49,
508
+ "reward": 2.5086,
509
+ "difficulty": 4,
510
+ "person": "Sam (Introvert)",
511
+ "conflicts_seen": [
512
+ "Judgment Day"
513
+ ],
514
+ "steps": 5
515
+ },
516
+ {
517
+ "episode": 50,
518
+ "reward": 2.5578,
519
+ "difficulty": 3,
520
+ "person": "Maya (Family)",
521
+ "conflicts_seen": [
522
+ "The Warning Sign"
523
+ ],
524
+ "steps": 5
525
+ }
526
+ ]
docs/CONTRIBUTING.md ADDED
@@ -0,0 +1,96 @@
1
+ # Contributing to LifeStack
2
+
3
+ This document defines the **documentation rule** for the project.
4
+ **Nothing ships without its matching doc entry.**
5
+
6
+ ---
7
+
8
+ ## The Rule: Doc-First Development
9
+
10
+ Every change that adds, removes, or significantly modifies a feature must include
11
+ **all three** of the following before the commit is made:
12
+
13
+ | # | Action | Where |
14
+ |---|---|---|
15
+ | 1 | **Create or update a doc file** | `docs/<topic>.md` |
16
+ | 2 | **Update README.md** | File Structure table + relevant section |
17
+ | 3 | **Update `docs/INDEX.md`** | Add a one-line entry for the new doc |
18
+
19
+ > [!IMPORTANT]
20
+ > A pull request / commit that adds a new script, module, or feature **without**
21
+ > updating `docs/INDEX.md` and `README.md` is considered incomplete and should
22
+ > not be merged.
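
The rule above can be checked mechanically. The following is a minimal sketch, assuming the list of changed paths comes from something like `git diff --name-only`; the function name `missing_doc_updates` is hypothetical and not part of the repository.

```python
from typing import Iterable, List

# Files the rule says every feature change must touch.
ALWAYS_REQUIRED = ("README.md", "docs/INDEX.md")


def missing_doc_updates(changed_files: Iterable[str]) -> List[str]:
    """Return the doc-rule items that a change set has not satisfied yet."""
    changed = set(changed_files)
    missing = [path for path in ALWAYS_REQUIRED if path not in changed]
    # Item 1 of the rule: some topic doc under docs/ other than the index.
    has_topic_doc = any(
        path.startswith("docs/") and path.endswith(".md") and path != "docs/INDEX.md"
        for path in changed
    )
    if not has_topic_doc:
        missing.append("docs/<topic>.md")
    return missing
```

An empty return value means all three items are present; anything else lists what still needs an update.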
23
+
24
+ ---
25
+
26
+ ## What Counts as "a Feature"
27
+
28
+ | Change type | Doc required? |
29
+ |---|---|
30
+ | New Python module (`core/`, `agent/`, `intake/`) | ✅ Yes: `docs/<module>.md` |
31
+ | New script (`scripts/*.py`) | ✅ Yes: entry in `docs/scripts.md` |
32
+ | New Gradio tab in `app.py` | ✅ Yes: entry in `docs/app.md` |
33
+ | New CLI argument to an existing script | ✅ Yes: update relevant doc |
34
+ | Bug fix with no API surface change | ❌ No (but update changelog if breaking) |
35
+ | Refactor with no API surface change | ❌ No |
36
+ | New environment variable / secret | ✅ Yes: update `docs/configuration.md` |
37
+ | New dependency in `requirements.txt` | ✅ Yes: note in relevant doc + README |
38
+
39
+ ---
40
+
41
+ ## Doc File Conventions
42
+
43
+ - All docs live in `docs/`, including this file. No `.md` files at repo root except `README.md`.
44
+ - File names are lowercase with underscores: `docs/lifestack_env.md`, `docs/eval.md`.
45
+ - Each doc starts with a `# Title` h1 and a one-line summary.
46
+ - Use `## Overview`, `## Usage`, `## API / Parameters`, `## Examples` sections.
47
+ - Code blocks must have a language tag (` ```python `, ` ```bash `).
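
As an illustration of the conventions above, here is a small sketch of a per-file check. The name `doc_problems` and the set of uppercase exceptions are assumptions made for this example; the conventions themselves (lowercase file names, a leading h1, tagged code fences) come from the list above.

```python
import re
from typing import List

FENCE = "`" * 3  # a Markdown code fence marker
NAME_RE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*\.md$")
# Assumption: the uppercase docs already present in docs/ are exempt.
UPPERCASE_EXCEPTIONS = {"INDEX.md", "CONTRIBUTING.md", "DEPLOYMENT.md"}


def doc_problems(filename: str, text: str) -> List[str]:
    """List convention violations for one doc file (empty list means clean)."""
    problems = []
    if filename not in UPPERCASE_EXCEPTIONS and not NAME_RE.match(filename):
        problems.append("filename is not lowercase_with_underscores.md")
    lines = text.splitlines()
    first = next((ln for ln in lines if ln.strip()), "")
    if not first.startswith("# "):
        problems.append("doc does not start with a '# Title' h1")
    in_fence = False
    for ln in lines:
        stripped = ln.strip()
        if stripped.startswith(FENCE):
            # Only an opening fence needs a language tag.
            if not in_fence and stripped == FENCE:
                problems.append("code fence without a language tag")
            in_fence = not in_fence
    return problems
```

The fence check only toggles on lines that begin with three backticks, so it does not handle nested or longer fences.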
48
+
49
+ ---
50
+
51
+ ## Checklist (copy into every PR / commit message)
52
+
53
+ ```
54
+ Docs checklist:
55
+ [ ] docs/<topic>.md created or updated
56
+ [ ] docs/INDEX.md updated with new entry
57
+ [ ] README.md File Structure table updated
58
+ [ ] README.md Quickstart / relevant section updated (if CLI changed)
59
+ ```
60
+
61
+ ---
62
+
63
+ ## Docs Folder Structure
64
+
65
+ ```
66
+ docs/
67
+ ├── INDEX.md ← Master index of all docs (ALWAYS update this)
68
+ ├── CONTRIBUTING.md ← This file (the rule)
69
+ ├── lifestack_env.md ← core/lifestack_env.py reference
70
+ ├── reward.md ← core/reward.py reference
71
+ ├── task.md ← core/task.py schema reference
72
+ ├── memory.md ← agent/memory.py reference
73
+ ├── conflict_generator.md ← agent/conflict_generator.py reference
74
+ ├── app.md ← app.py Gradio interface reference
75
+ ├── eval.md ← scripts/eval.py reference
76
+ ├── train_trl.md ← scripts/train_trl.py reference
77
+ ├── scripts.md ← All other scripts reference
78
+ └── configuration.md ← Env vars, secrets, openenv.yaml
79
+ ```
80
+
81
+ ---
82
+
83
+ ## Commit Message Format
84
+
85
+ ```
86
+ <type>: <short description>
87
+
88
+ - <file changed>: <what changed>
89
+ - docs/<doc>.md: <created|updated>
90
+ - docs/INDEX.md: <added entry for X>
91
+ - README.md: <updated section Y>
92
+
93
+ Docs checklist: ✅ all three updated
94
+ ```
95
+
96
+ Types: `feat` | `fix` | `refactor` | `docs` | `test` | `chore`
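
A short sketch of how the subject line of this format could be validated, for example in a commit hook. The helper name `is_valid_subject` is hypothetical; the allowed types are the ones listed above.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "refactor", "docs", "test", "chore")
# "<type>: <short description>" with a non-empty description.
SUBJECT_RE = re.compile(r"^(%s): \S.*$" % "|".join(ALLOWED_TYPES))


def is_valid_subject(subject: str) -> bool:
    """True when the first line of a commit message follows the format."""
    return bool(SUBJECT_RE.match(subject))
```

This checks only the first line; the bullet list and the docs checklist in the body would need a separate check.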
docs/DEPLOYMENT.md ADDED
@@ -0,0 +1,427 @@
1
+ # Meta-R2: Complete HuggingFace Deployment Guide (Option A)
2
+
3
+ > This guide walks you through every single step to deploy Meta-R2 to HuggingFace using the cleanest architecture:
4
+ > - **Your trained model (500MB)** → uploaded as a **HuggingFace Model Repository**
5
+ > - **Your code + environment** → deployed as a **HuggingFace Space** (Docker)
6
+ >
7
+ > The Space will auto-download the model from the Model Repo at startup. No Git LFS. No 500MB in your code repo.
8
+
9
+ ---
10
+
11
+ ## 🗺️ Architecture Overview
12
+
13
+ ```
14
+ HuggingFace
15
+ ├── YOUR-USERNAME/lifestack-agent ← Model Repo (the 500MB weights)
16
+ │   ├── config.json
17
+ │   ├── tokenizer.json
18
+ │   ├── tokenizer_config.json
19
+ │   ├── special_tokens_map.json
20
+ │   └── model.safetensors (or pytorch_model.bin)
21
+ │
22
+ └── YOUR-USERNAME/meta-r2 [SPACE] ← Code Repo (Docker Space)
23
+     ├── Dockerfile (already exists ✅)
24
+     ├── requirements.txt (already exists ✅)
25
+     ├── app_flask.py (entry point ✅)
26
+     ├── core/ agent/ scripts/ ... (all your code ✅)
27
+     └── openenv.yaml (already exists ✅)
28
+         ↓ at startup
29
+     agent.py calls AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")
30
+         → HuggingFace downloads the model to the Space's /root/.cache/huggingface/
31
+ ```
32
+
33
+ ---
34
+
35
+ ## ✅ Pre-Flight Checklist (Do These Before Anything Else)
36
+
37
+ Go through every item below before starting the upload steps.
38
+
39
+ ### 1. Confirm Your Trained Model Files Exist
40
+
41
+ Unzip the 500MB file from Kaggle. Open the folder. You **must** see these files:
42
+
43
+ ```
44
+ lifestack_model/
45
+ ├── config.json ← REQUIRED
46
+ ├── tokenizer.json ← REQUIRED
47
+ ├── tokenizer_config.json ← REQUIRED
48
+ ├── special_tokens_map.json ← REQUIRED (may be missing; check below)
49
+ └── model.safetensors ← REQUIRED (the big file)
50
+ OR
51
+ └── pytorch_model.bin ← (alternative format, also fine)
52
+ ```
53
+
54
+ > **If any of these are missing**, the model is an incomplete checkpoint. Re-download or re-run training with `save_model=True` at the end of `train_trl.py`.
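
The file check above can also be scripted. This is a minimal sketch using only the standard library; the function name `missing_model_files` is an assumption, and the file names are the ones listed in the tree above.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = (
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)
# Either weight format is acceptable.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the names of expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling `missing_model_files("lifestack_model")` from the folder that contains the unzipped model returns an empty list when the checkpoint looks complete.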
55
+
56
+ ### 2. Confirm `requirements.txt` Is Correct
57
+
58
+ Your `requirements.txt` already has:
59
+ - `openenv-core>=0.2.3` ✅ (latest version, confirmed)
60
+ - `pydantic>=2.7.0` ✅
61
+ - `transformers>=4.40.0` ✅ (needed to download model from Hub)
62
+ - `torch>=2.0.0` ✅
63
+
64
+ **No changes needed** to `requirements.txt`.
65
+
66
+ ### 3. Confirm the `Dockerfile` Entry Point
67
+
68
+ Your `Dockerfile` already runs:
69
+ ```dockerfile
70
+ CMD ["python", "app_flask.py"]
71
+ ```
72
+ This is correct. `app_flask.py` is the web server.
73
+
74
+ **No changes needed** to the `Dockerfile`.
75
+
76
+ ### 4. Make Sure `.env` is in `.gitignore`
77
+
78
+ Check your `.gitignore`. It already has:
79
+ ```
80
+ .env
81
+ ```
82
+ ✅ Your `GROQ_API_KEY` will **never** be pushed to GitHub or HuggingFace by accident.
83
+
84
+ ### 5. Make the One Required Code Change in `agent.py`
85
+
86
+ This is the only code edit required for Option A.
87
+
88
+ Open `/Users/dayalgupta/Desktop/Meta-R2/agent/agent.py` and find **lines 13–18**:
89
+
90
+ ```python
91
+ # CURRENT CODE (lines 13-18):
92
+ self.api_key = os.getenv('GROQ_API_KEY')
93
+ self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
94
+
95
+ # Fallback to current directory if default existence
96
+ if not self.local_model_path and os.path.exists("./lifestack_model"):
97
+ self.local_model_path = "./lifestack_model"
98
+ ```
99
+
100
+ **Change it to this** (replace `YOUR-USERNAME` with your actual HuggingFace username):
101
+
102
+ ```python
103
+ # UPDATED CODE:
104
+ self.api_key = os.getenv('GROQ_API_KEY')
105
+ self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
106
+
107
+ # 1. Check for local folder (Kaggle / local dev)
108
+ if not self.local_model_path and os.path.exists("./lifestack_model"):
109
+ self.local_model_path = "./lifestack_model"
110
+
111
+ # 2. Fall back to HuggingFace Hub model repo (production / Space deployment)
112
+ if not self.local_model_path:
113
+ self.local_model_path = "YOUR-USERNAME/lifestack-agent"
114
+ ```
115
+
116
+ **Why this works:** `AutoModelForCausalLM.from_pretrained()` (which already exists on line 41) accepts either a local folder path OR a HuggingFace Hub repo ID like `"username/repo-name"`. No other code change is needed.
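
The lookup order described in this step can be sketched as a standalone function, which makes the precedence easy to test outside the agent class. The function `resolve_model_path` and its parameters are hypothetical; only the order (explicit argument, `LIFESTACK_MODEL_PATH`, local folder, Hub repo ID) is taken from the snippet above.

```python
import os
from typing import Mapping, Optional

HUB_REPO_ID = "YOUR-USERNAME/lifestack-agent"  # placeholder, as in the guide


def resolve_model_path(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    local_dir: str = "./lifestack_model",
    hub_repo_id: str = HUB_REPO_ID,
) -> str:
    """Pick the model location using the same precedence as the updated code."""
    env = os.environ if env is None else env
    path = explicit or env.get("LIFESTACK_MODEL_PATH")
    # 1. Local folder (Kaggle / local dev), only when nothing was set explicitly.
    if not path and os.path.exists(local_dir):
        path = local_dir
    # 2. Fall back to the Hub repo ID (Space deployment).
    return path or hub_repo_id
```

Note that an existing but empty local folder still wins over the Hub fallback here, exactly as in the snippet above.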
117
+
118
+ ### 6. Verify `lifestack_model/` Is NOT in Your Code Repo
119
+
120
+ Your model (500MB) should NOT be in the `Meta-R2` GitHub repository. Confirm:
121
+ ```bash
122
+ ls /Users/dayalgupta/Desktop/Meta-R2/lifestack_model/
123
+ # Should print "No such file or directory", or nothing at all if the folder is empty
124
+ ```
125
+ If it has files, remove them:
126
+ ```bash
127
+ rm -rf /Users/dayalgupta/Desktop/Meta-R2/lifestack_model/*
128
+ ```
129
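+ The folder can stay, but it must be empty. Git does not track empty directories, so an empty folder never reaches the Space. Locally, an existing `./lifestack_model` folder still satisfies the `os.path.exists` check from Step 5, so delete the folder itself if you want the Hub fallback to apply on your machine.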
130
+
131
+ ---
132
+
133
+ ## 📦 PART 1: Upload the Model to HuggingFace Hub
134
+
135
+ ### Step 1.1: Create a HuggingFace Account
136
+
137
+ Go to **https://huggingface.co** → click **Sign Up** → create your account. Remember your username (e.g., `dayal-gupta`); you will use it everywhere.
138
+
139
+ ### Step 1.2: Create a New Model Repository
140
+
141
+ 1. Go to **https://huggingface.co/new** (or click the `+` button → "New Model")
142
+ 2. Fill in:
143
+ - **Owner:** your username
144
+ - **Model name:** `lifestack-agent` (this becomes `YOUR-USERNAME/lifestack-agent`)
145
+ - **License:** `MIT` (recommended for hackathons)
146
+ - **Visibility:** `Public` (required for the Space to download it without auth)
147
+ 3. Click **Create Model**
148
+
149
+ You now have an empty model repo at `https://huggingface.co/YOUR-USERNAME/lifestack-agent`.
150
+
151
+ ### Step 1.3: Install the HuggingFace CLI
152
+
153
+ On your Mac terminal:
154
+ ```bash
155
+ pip install huggingface_hub
156
+ huggingface-cli login
157
+ ```
158
+
159
+ When prompted, go to **https://huggingface.co/settings/tokens** → click **New token** → name it anything → **Role: Write** → copy the token → paste it into the terminal.
160
+
161
+ ### Step 1.4: Upload the Model Files
162
+
163
+ Navigate to where your unzipped model folder is (e.g., Desktop) and run:
164
+
165
+ ```bash
166
+ # Replace the path with wherever your unzipped model folder is:
167
+ huggingface-cli upload YOUR-USERNAME/lifestack-agent /path/to/your/lifestack_model/ .
168
+ ```
169
+
170
+ **Example (if you unzipped on Desktop):**
171
+ ```bash
172
+ huggingface-cli upload dayal-gupta/lifestack-agent /Users/dayalgupta/Desktop/lifestack_model/ .
173
+ ```
174
+
175
+ This uploads ALL files from the local folder to the root of the HF repo. The `.` at the end means "upload to the root of the repo."
176
+
177
+ **This will take 3–8 minutes** for a 500MB file on a normal connection. You'll see a progress bar.
178
+
179
+ ### Step 1.5: Verify the Upload
180
+
181
+ Go to `https://huggingface.co/YOUR-USERNAME/lifestack-agent` in your browser.
182
+
183
+ You should see all files listed: `config.json`, `tokenizer.json`, `model.safetensors`, etc.
184
+
185
+ Click on `config.json` and confirm it contains `"model_type"`. This is a quick sign that the config uploaded intact and that `transformers` can identify the architecture.
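
The same check can be run locally before uploading. A minimal sketch, assuming you pass in the text of `config.json`; the helper name `config_has_model_type` is hypothetical.

```python
import json


def config_has_model_type(config_text: str) -> bool:
    """True when the text is valid JSON with a non-empty "model_type" key."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        return False
    return isinstance(config, dict) and bool(config.get("model_type"))
```

For example, `config_has_model_type(open("lifestack_model/config.json").read())` should be true for a complete checkpoint.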
186
+
187
+ ### Step 1.6: Add a Model Card (Optional but Impressive for Judges)
188
+
189
+ Click the **"Model Card"** tab on your repo page → click the pencil icon to edit → paste this:
190
+
191
+ ````markdown
192
+ ---
193
+ language: en
194
+ license: mit
195
+ tags:
196
+ - reinforcement-learning
197
+ - life-simulation
198
+ - grpo
199
+ - llama
200
+ - openenv
201
+ ---
202
+
203
+ # LifeStack Agent: GRPO Fine-tuned
204
+
205
+ This model is the trained agent for [Meta-R2](https://huggingface.co/spaces/YOUR-USERNAME/meta-r2),
206
+ a reinforcement learning environment that simulates complex real-life decision-making scenarios.
207
+
208
+ Fine-tuned using GRPO (Group Relative Policy Optimization) via TRL on a custom reward function
209
+ spanning 23 life metrics across 6 domains: career, finances, relationships, physical health,
210
+ mental wellbeing, and time management.
211
+
212
+ ## Usage
213
+ ```python
214
+ from transformers import AutoModelForCausalLM, AutoTokenizer
215
+ model = AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")
216
+ tokenizer = AutoTokenizer.from_pretrained("YOUR-USERNAME/lifestack-agent")
217
+ ```
218
+ ````
219
+
220
+ Click **Save**.
221
+
222
+ ---
223
+
224
+ ## 🚀 PART 2: Deploy the Project as a HuggingFace Space
225
+
226
+ ### Step 2.1: Create a New Space
227
+
228
+ 1. Go to **https://huggingface.co/new-space**
229
+ 2. Fill in:
230
+ - **Owner:** your username
231
+ - **Space name:** `meta-r2`
232
+ - **License:** `MIT`
233
+ - **SDK:** Select **"Docker"** ← very important, NOT Gradio or Streamlit
234
+ - **Visibility:** `Public`
235
+ 3. Click **Create Space**
236
+
237
+ You now have an empty Space at `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2`.
238
+
239
+ ### Step 2.2: Connect Your GitHub Repository to the Space
240
+
241
+ This is the cleanest method: HuggingFace will auto-sync from your GitHub repo.
242
+
243
+ 1. In your Space, click the **Settings** tab (gear icon)
244
+ 2. Scroll down to **"Repository"** section
245
+ 3. Click **"Link to a GitHub repository"**
246
+ 4. Authorize HuggingFace to access your GitHub
247
+ 5. Select the repo: `oki-dokii/Meta-R2`
248
+ 6. Set branch: `main`
249
+ 7. Click **Save**
250
+
251
+ Now every `git push` to `main` will automatically redeploy the Space.
252
+
253
+ **Alternative (manual push):** If you don't want to link GitHub, you can push directly to the HuggingFace Space repo:
254
+
255
+ ```bash
256
+ cd /Users/dayalgupta/Desktop/Meta-R2
257
+
258
+ # Add HF Space as a second remote:
259
+ git remote add space https://huggingface.co/spaces/YOUR-USERNAME/meta-r2
260
+
261
+ # Push your code:
262
+ git push space main
263
+ ```
264
+
265
+ ### Step 2.3: Add the `GROQ_API_KEY` Secret to the Space
266
+
267
+ Your app needs the Groq API key at runtime. **Never hardcode it.** HuggingFace Spaces have a Secrets system for this.
268
+
269
+ 1. In your Space, click the **Settings** tab
270
+ 2. Scroll down to **"Variables and secrets"**
271
+ 3. Click **"New secret"**
272
+ 4. Fill in:
273
+ - **Name:** `GROQ_API_KEY`
274
+ - **Value:** your actual Groq API key (get it from https://console.groq.com/keys)
275
+ 5. Click **Save**
276
+
277
+ Your `agent.py` already reads this via `os.getenv('GROQ_API_KEY')` ✅, so no code change is needed.
278
+
279
+ ### Step 2.4: Add `HF_TOKEN` Secret (Required to Download the Private Model)
280
+
281
+ If your model repo is **Public** (which we set in Step 1.2), you can **skip this step**.
282
+
283
+ If your model repo is **Private**, add another secret:
284
+ - **Name:** `HF_TOKEN`
285
+ - **Value:** your HuggingFace write token (same one from Step 1.3)
286
+
287
+ Then add this line at the top of `app_flask.py` (before any model-loading code):
288
+ ```python
289
+ import os
290
+ from huggingface_hub import login
291
+ hf_token = os.getenv("HF_TOKEN")
292
+ if hf_token:
293
+ login(token=hf_token)
294
+ ```
295
+
296
+ ### Step 2.5: Trigger the First Build
297
+
298
+ After pushing your code (Step 2.2), the Space will automatically start building.
299
+
300
+ 1. Go to your Space URL: `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2`
301
+ 2. Click the **"App"** tab, where you'll see a build log
302
+ 3. The build will take **3–5 minutes** for the first time (Docker pulls base image, installs packages)
303
+ 4. After the build, it will show **"Running"** status, and then the app will boot
304
+
305
+ **During the first boot**, the Space will call `AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")`, which downloads the 500MB model. Expect this to take roughly 60–90 seconds on HuggingFace infrastructure. The download is cached on the Space's disk and reused while the container stays up; a Space without persistent storage may need to download the model again after a rebuild or restart.
306
+
307
+ ---
308
+
309
+ ## 🔍 PART 3: Verify Everything is Working
310
+
311
+ ### Step 3.1: Check the Build Log
312
+
313
+ In your Space, click **"Logs"** tab. You should see:
314
+
315
+ ```
316
+ ✅ Step 1/7 : FROM python:3.11-slim
317
+ ✅ Successfully built ...
318
+ ✅ Successfully tagged ...
319
+ ```
320
+
321
+ If you see a red error, check the troubleshooting section below.
322
+
323
+ ### Step 3.2: Check the App Boot Log
324
+
325
+ After the build, click the **"App"** tab. In the log output you should see:
326
+
327
+ ```
328
+ 📦 Loading local GRPO model from YOUR-USERNAME/lifestack-agent...
329
+ ✅ Local model LOADED.
330
+ * Running on http://0.0.0.0:7860
331
+ ```
332
+
333
+ If you see `⚠️ Failed to load local model ... Falling back to Groq.`, the model download failed. Check that the HF model repo ID in `agent.py` is correct and that the repo is public.
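
If you save the boot log to a file, the two outcomes above can be told apart with a few lines of code. This is a sketch under the assumption that the log contains the exact phrases quoted in this step; the function name `model_load_status` is hypothetical.

```python
def model_load_status(log_text: str) -> str:
    """Classify a boot log as 'loaded', 'fallback', or 'unknown'."""
    # The fallback message is checked first: a failed load is the case to catch.
    if "Falling back to Groq" in log_text:
        return "fallback"
    if "Local model LOADED" in log_text:
        return "loaded"
    return "unknown"
```

The match is on plain substrings so that it does not depend on how the emoji in the log lines are encoded.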
334
+
335
+ ### Step 3.3: Test the Live App
336
+
337
+ Go to `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` and click through the demo:
338
+ 1. The web UI (served by `app_flask.py`) should load
339
+ 2. Start an episode; the agent should respond with life decisions
340
+ 3. Check that rewards are non-zero and steps > 5 (confirms the Task system is working)
341
+
342
+ ---
343
+
344
+ ## 🛠️ Troubleshooting Common Issues
345
+
346
+ | Error | Cause | Fix |
347
+ |---|---|---|
348
+ | `ModuleNotFoundError: openenv` | Wrong package in requirements.txt | Confirm `openenv-core>=0.2.3` is in `requirements.txt` (not `openenv`) |
349
+ | `OSError: Can't load model` | Wrong repo ID in `agent.py` | Make sure it's `"YOUR-ACTUAL-USERNAME/lifestack-agent"` not literally `YOUR-USERNAME` |
350
+ | `Build failed: torch install timeout` | `torch>=2.0.0` is huge (2GB+) | Add `--extra-index-url https://download.pytorch.org/whl/cpu` to the `pip install` command in the Dockerfile so the CPU-only wheel is used |
351
+ | `Port 7860 not responding` | `app_flask.py` binding to wrong interface | Confirm `app.run(host='0.0.0.0', port=7860)` at the bottom of `app_flask.py` |
352
+ | `GROQ_API_KEY not found` | Secret not set | Go to Space Settings → Variables and secrets → add `GROQ_API_KEY` |
353
+ | `Space keeps restarting` | Out of memory (free tier is 16GB RAM) | torch on CPU for 500MB model may OOM; see "Reducing Memory" note below |
354
+
355
+ ### Reducing Memory Usage (If Space OOMs)
356
+
357
+ Free HuggingFace Spaces have 16GB RAM. Loading a 500MB model in float32 uses ~2GB RAM, which is fine. If you still hit OOM, pass these arguments to the `from_pretrained` call in `agent.py` (lines 41–44):
358
+
359
+ ```python
360
+ self.local_model = AutoModelForCausalLM.from_pretrained(
361
+ self.local_model_path,
362
+     torch_dtype=torch.float16,  # ← half precision, halves memory
363
+     low_cpu_mem_usage=True,     # ← stream-loads, avoids peak RAM spike
364
+     device_map="cpu"            # ← explicitly CPU on free tier
365
+ )
366
+ ```
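
To see why halving precision helps, here is a rough back-of-the-envelope sketch. It estimates RAM for the weights only (no activations, tokenizer, or framework overhead), and the dtype in which the 500MB checkpoint was saved is an assumption you have to supply; the function name `estimate_weight_ram_mb` is hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_weight_ram_mb(checkpoint_mb: float, stored_dtype: str, load_dtype: str) -> float:
    """Estimate RAM (MB) for the weights when loaded in load_dtype."""
    # Parameter count in millions, inferred from file size and stored precision.
    params_millions = checkpoint_mb / BYTES_PER_PARAM[stored_dtype]
    return params_millions * BYTES_PER_PARAM[load_dtype]
```

Under these assumptions, a 500MB checkpoint saved in float16 needs about 1000MB of weight memory when loaded as float32, and about 500MB when kept in float16. Actual process memory will be higher.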
367
+
368
+ ---
369
+
370
+ ## 📋 Final Pre-Submission Checklist
371
+
372
+ Before submitting to the hackathon, verify every item:
373
+
374
+ - [ ] `https://huggingface.co/YOUR-USERNAME/lifestack-agent` exists and has all model files
375
+ - [ ] `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` shows **"Running"** status (green dot)
376
+ - [ ] The Space app loads in browser without errors
377
+ - [ ] The Space log shows `✅ Local model LOADED` (not "Falling back to Groq")
378
+ - [ ] An episode runs and produces steps > 5 (confirms Task system is working)
379
+ - [ ] `GROQ_API_KEY` secret is set in Space settings (as fallback)
380
+ - [ ] The model repo has a Model Card explaining what it is
381
+ - [ ] Your `README.md` in the code repo links to both: the Space URL and the Model URL
382
+ - [ ] `agent.py` has been updated with `"YOUR-USERNAME/lifestack-agent"` as the HF Hub fallback
383
+ - [ ] `lifestack_model/` folder in your local `Meta-R2/` repo is empty (model not in code repo)
384
+ - [ ] All Bugs 1, 2, 3 are fixed and committed (they are; we did this already ✅)
385
+
386
+ ---
387
+
388
+ ## 📎 Quick Reference: All URLs
389
+
390
+ Replace `YOUR-USERNAME` with your HuggingFace username everywhere:
391
+
392
+ | What | URL |
393
+ |---|---|
394
+ | HuggingFace profile | `https://huggingface.co/YOUR-USERNAME` |
395
+ | Model repo | `https://huggingface.co/YOUR-USERNAME/lifestack-agent` |
396
+ | Space (live demo) | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` |
397
+ | Space settings (secrets) | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2/settings` |
398
+ | Space build logs | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` → Logs tab |
399
+ | HuggingFace API tokens | `https://huggingface.co/settings/tokens` |
400
+ | Groq API keys | `https://console.groq.com/keys` |
401
+
402
+ ---
403
+
404
+ ## ⚡ The Exact Commands to Run Right Now (In Order)
405
+
406
+ ```bash
407
+ # 1. Install HF CLI
408
+ pip install huggingface_hub
409
+
410
+ # 2. Login (will prompt for token)
411
+ huggingface-cli login
412
+
413
+ # 3. Upload model (change the path to your unzipped model folder)
414
+ huggingface-cli upload YOUR-USERNAME/lifestack-agent /path/to/lifestack_model/ .
415
+
416
+ # 4. Make the agent.py code change (edit manually in VS Code, then):
417
+ cd /Users/dayalgupta/Desktop/Meta-R2
418
+ git add agent/agent.py
419
+ git commit -m "feat: add HuggingFace Hub model fallback for Option A deployment"
420
+ git push origin main
421
+
422
+ # 5. Push to HuggingFace Space (if not using GitHub auto-sync):
423
+ git remote add space https://huggingface.co/spaces/YOUR-USERNAME/meta-r2
424
+ git push space main
425
+ ```
426
+
427
+ That's it. The Space will build and boot automatically.
docs/INDEX.md ADDED
@@ -0,0 +1,40 @@
1
+ # LifeStack: Documentation Index
2
+
3
+ > **Rule:** Every new feature, script, or module must add a one-line entry here.
4
+ > See [CONTRIBUTING.md](CONTRIBUTING.md) for the full documentation rule.
5
+
6
+ ---
7
+
8
+ ## Core Modules
9
+
10
+ | Doc | Module | Description |
11
+ |---|---|---|
12
+ | [lifestack_env.md](lifestack_env.md) | `core/lifestack_env.py` | Main OpenEnv environment: step, reset, observation, WorldEngine, PartialObsFilter |
13
+ | [reward.md](reward.md) | `core/reward.py` | Task-aware reward orchestrator with milestone, cascade, and efficiency components |
14
+ | [task.md](task.md) | `core/task.py` | Task / Route / Milestone / ExoEvent dataclass schema |
15
+ | [memory.md](memory.md) | `agent/memory.py` | ChromaDB-backed trajectory + feedback storage |
16
+ | [conflict_generator.md](conflict_generator.md) | `agent/conflict_generator.py` | ConflictEvent templates and TaskGenerator |
17
+
18
+ ## Application
19
+
20
+ | Doc | File | Description |
21
+ |---|---|---|
22
+ | [app.md](app.md) | `app.py` | Gradio multi-tab interface: tabs, callbacks, module-level singletons |
23
+
24
+ ## Scripts
25
+
26
+ | Doc | Script | Description |
27
+ |---|---|---|
28
+ | [eval.md](eval.md) | `scripts/eval.py` | Standalone random-baseline evaluation runner |
29
+ | [train_trl.md](train_trl.md) | `scripts/train_trl.py` | GRPO curriculum training via HuggingFace TRL + Unsloth |
30
+ | [scripts.md](scripts.md) | `scripts/` (others) | run_episode, smoke_test, test_lifestack, longitudinal_demo |
31
+
32
+ ## Configuration & Operations
33
+
34
+ | Doc | File | Description |
35
+ |---|---|---|
36
+ | [configuration.md](configuration.md) | `.env`, `openenv.yaml` | Environment variables, secrets, server config |
37
+
38
+ ---
39
+
40
+ *Last updated: 2026-04-23. Add a row here whenever a new doc is created.*
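
One way to keep this index honest is to compare its links against the files that actually exist in `docs/`. The sketch below assumes the index keeps using relative Markdown links of the form `[name.md](name.md)`; the function name `docs_missing_from_index` is hypothetical.

```python
import re
from typing import Iterable, List

# Captures the target of Markdown links that point at a .md file.
LINK_RE = re.compile(r"\]\(([^)\s]+\.md)\)")


def docs_missing_from_index(
    index_text: str,
    doc_filenames: Iterable[str],
    ignore: Iterable[str] = ("INDEX.md",),
) -> List[str]:
    """Return doc files that exist but are not linked from the index."""
    linked = set(LINK_RE.findall(index_text))
    skipped = set(ignore)
    return sorted(
        name for name in doc_filenames if name not in linked and name not in skipped
    )
```

Feed it the contents of `docs/INDEX.md` and the `.md` file names found in `docs/`; any name it returns still needs a row.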
docs/app.md ADDED
@@ -0,0 +1,78 @@
1
+ # app.md: Gradio Interface Reference
2
+
3
+ `app.py`: Gradio multi-tab interactive interface for LifeStack.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ `app.py` is the entry point for the demo. It wires together all LifeStack modules into
10
+ a single Gradio `Blocks` application served on `http://127.0.0.1:7860`.
11
+
12
+ ---
13
+
14
+ ## Module-level Singletons
15
+
16
+ These are instantiated once at import time:
17
+
18
+ | Variable | Type | Purpose |
19
+ |---|---|---|
20
+ | `MEMORY` | `LifeStackMemory` | ChromaDB trajectory + feedback store |
21
+ | `AGENT` | `LifeStackAgent` | LLM-backed decision agent |
22
+ | `INTAKE` | `LifeIntake` | NL β†’ structured conflict parser |
23
+ | `DEMO_CONFLICT` | `ConflictEvent` | Fixed "Friday 6PM" conflict for tab 1 |
24
+ | `DEMO_PREDICTOR` | `TrajectoryPredictor` | 7-day risk score tracker |
25
+ | `LONG_DEMO` | `LongitudinalDemo` | Arjun's multi-week journey |
26
+ | `GMAIL` | `GmailSignalExtractor` | Optional Gmail stress signal extractor |
27
+
28
+ ---
29
+
30
+ ## Tabs
31
+
32
+ | Tab | Label | Key Function |
33
+ |---|---|---|
34
+ | 1 | 🎯 Live Demo | `run_demo(person_label, conflict_label)` |
35
+ | 2 | 💭 Try Your Situation | `run_custom(situation, sliders..., gmail_signals)` |
36
+ | 3 | 📊 Training Results | `load_training_tab()` |
37
+ | 4 | 🗓️ Arjun's Journey | `LONG_DEMO.show_longitudinal_comparison()` |
38
+ | 5 | 🗺️ Task Explorer | `load_demo_task()` |
39
+ | 6 | 📬 Follow-up | `submit_outcome_feedback(...)` |
40
+
41
+ ---
42
+
43
+ ## Key Functions
44
+
45
+ ### `submit_outcome_feedback(ep_id, score, domains_up, domains_down, notes, time_spent)`
46
+
47
+ Stores real-world outcome data into ChromaDB via `MEMORY.store_feedback(feedback)`.
48
+
49
+ > **Note:** Uses `MEMORY` (the module-level `LifeStackMemory` instance). The previously
50
+ > undefined `AGENT_MEMORY` reference was corrected to `MEMORY` on 2026-04-23.
51
+
52
+ ### `run_demo(person_label, conflict_label)`
53
+
54
+ Generator that yields one `(pred_html, before_html, narrative, decision_html)` tuple per
55
+ animation frame. It runs the cascade animation first, then the agent intervention.
56
+
57
+ ### `run_custom(situation, ...)`
58
+
59
+ Calls `INTAKE.full_intake()` to parse the natural-language input, then `AGENT.get_action()`, steps the env,
60
+ and returns `(life_html, after_html, plan_html)`.
61
+
62
+ ---
63
+
64
+ ## Running
65
+
66
+ ```bash
67
+ python app.py
68
+ ```
69
+
70
+ Starts on port `7860` with `share=False`. Edit the `__main__` block to change the port or theme.
71
+
72
+ ---
73
+
74
+ ## Change Log
75
+
76
+ | Date | Change |
77
+ |---|---|
78
+ | 2026-04-23 | `AGENT_MEMORY` undefined crash fixed: replaced with `MEMORY` in `submit_outcome_feedback` |
docs/configuration.md ADDED
@@ -0,0 +1,71 @@
1
+ # configuration.md: Configuration Reference
2
+
3
+ Environment variables, secrets, and server configuration for LifeStack.
4
+
5
+ ---
6
+
7
+ ## Environment Variables
8
+
9
+ Copy `.env.example` to `.env` and fill in values:
10
+
11
+ ```bash
12
+ cp .env.example .env
13
+ ```
14
+
15
+ | Variable | Required | Description |
16
+ |---|---|---|
17
+ | `OPENAI_API_KEY` | For agent/training | API key for the LLM agent and GRPO reward function |
18
+ | `GROQ_API_KEY` | Optional | Alternative fast-inference backend |
19
+ | `GOOGLE_CLIENT_SECRET_FILE` | Optional | Absolute path to the Google OAuth desktop client credentials JSON used for Gmail intake (name as given in `.env.example`) |
20
+
21
+ > **Never commit `.env`**. It is listed in `.gitignore`.
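
A small sketch of how a startup check could report which of these variables are set, without printing their values elsewhere. The helper `split_configured` is hypothetical and takes the variable names from the caller, so it makes no assumption about which ones the code actually reads.

```python
import os
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


def split_configured(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> Tuple[Dict[str, str], List[str]]:
    """Split variable names into (set values, names that are unset or empty)."""
    env = os.environ if env is None else env
    names = list(names)
    present = {name: env[name] for name in names if env.get(name)}
    missing = [name for name in names if not env.get(name)]
    return present, missing
```

For example, `split_configured(["GROQ_API_KEY"])` reads from the real process environment, while passing a dictionary as `env` keeps the check testable.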
22
+
23
+ ---
24
+
25
+ ## `openenv.yaml`
26
+
27
+ Defines the OpenEnv service manifest for MCP / REST integration.
28
+
29
+ ```yaml
30
+ name: lifestack
31
+ version: "1.1.0"
32
+ entry: server.py
33
+ port: 8000
34
+ ```
35
+
36
+ Edit this file if you rename the server entry point or change the port.
37
+
38
+ ---
39
+
40
+ ## Gradio App
41
+
42
+ Configured in the `__main__` block of `app.py`:
43
+
44
+ ```python
45
+ app.launch(
46
+     share=False,
47
+     server_port=7860,
48
+     show_error=True,
49
+ )
50
+ ```
51
+
52
+ Change `server_port` or set `share=True` for a public Gradio link.
53
+
54
+ ---
55
+
56
+ ## Docker
57
+
58
+ ```bash
59
+ docker build -t lifestack:latest .
60
+ docker run -p 7860:7860 --env-file .env lifestack:latest
61
+ ```
62
+
63
+ The `Dockerfile` installs `requirements.txt` and runs `python app.py`.
64
+
65
+ ---
66
+
67
+ ## Change Log
68
+
69
+ | Date | Change |
70
+ |---|---|
71
+ | 2026-04-23 | Initial doc created |
docs/conflict_generator.md ADDED
@@ -0,0 +1,75 @@
1
+ # conflict_generator.md: Conflict Generator Reference
2
+
3
+ `agent/conflict_generator.py`: ConflictEvent templates and TaskGenerator.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ Two parallel systems for generating crises:
10
+
11
+ | System | Purpose |
12
+ |---|---|
13
+ | `ConflictEvent` + `TEMPLATES` | 15 handcrafted conflicts at difficulty 1–5 |
14
+ | `TaskGenerator` | Generates long-horizon `Task` objects (two domains) |
15
+
16
+ ---
17
+
18
+ ## `ConflictEvent` (Legacy)
19
+
20
+ ```python
21
+ @dataclass
22
+ class ConflictEvent:
23
+     id: str
24
+     title: str
25
+     story: str
26
+     primary_disruption: dict   # Metric deltas applied on env reset
27
+     decisions_required: list[str]
28
+     resource_budget: dict      # {"time", "money", "energy"}
29
+     difficulty: int            # 1–5
30
+ ```
31
+
32
+ ### Helper functions
33
+
34
+ ```python
35
+ conflict = generate_conflict() # random from all 15
36
+ conflict = generate_conflict(difficulty=3) # difficulty-3 pool
37
+ escalated = escalate_conflict(conflict) # 1.4× disruption, 0.7× budget
38
+ new, reason = adaptive_escalate(conflict, agent_history) # auto-tune
39
+ ```
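The escalation rule in the comment above (1.4× disruption, 0.7× budget) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `escalate_conflict_sketch` is a hypothetical name, the dataclass is re-declared locally so the block is self-contained, and the real `escalate_conflict` may also clamp values or change `difficulty`.

```python
from dataclasses import dataclass, replace


@dataclass
class ConflictEvent:
    # Local copy of the documented fields, so the sketch runs on its own.
    id: str
    title: str
    story: str
    primary_disruption: dict
    decisions_required: list
    resource_budget: dict
    difficulty: int


def escalate_conflict_sketch(conflict: ConflictEvent) -> ConflictEvent:
    """Return a harsher copy: 1.4x metric deltas and 0.7x resource budget."""
    return replace(
        conflict,
        primary_disruption={k: v * 1.4 for k, v in conflict.primary_disruption.items()},
        resource_budget={k: v * 0.7 for k, v in conflict.resource_budget.items()},
    )
```

Returning a copy rather than mutating in place keeps the original template reusable across episodes.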
40
+
41
+ ---
42
+
43
+ ## `TaskGenerator`
44
+
45
+ ```python
46
+ generator = TaskGenerator()
47
+ task = generator.generate()
48
+ task = generator.generate(domain="flight_crisis", difficulty=4)
49
+ task = generator.generate(domain="code_merge_crisis")
50
+ ```
51
+
52
+ ### Supported Domains
53
+
54
+ | Domain | Goal |
55
+ |---|---|
56
+ | `flight_crisis` | Survive Airport Cancellation |
57
+ | `code_merge_crisis` | Resolve Production Outage |
58
+
59
+ Unknown domains fall back to `flight_crisis`.
60
+
61
+ ---
62
+
63
+ ## Adding a New Domain
64
+
65
+ 1. Add `generate_<domain>(self, difficulty) -> Task` to `TaskGenerator`.
66
+ 2. Add to the `if/elif` in `generate()`.
67
+ 3. Update this file, `docs/INDEX.md`, and `README.md`.
68
+
69
+ ---
70
+
71
+ ## Change Log
72
+
73
+ | Date | Change |
74
+ |---|---|
75
+ | 2026-04-23 | Initial doc created |
docs/eval.md ADDED
@@ -0,0 +1,86 @@
1
+ # eval.py: Evaluation Runner Reference
2
+
3
+ `scripts/eval.py`: Standalone LifeStack evaluation runner using a random-action baseline.
4
+
5
+ No model, no GPU, no API key required.
6
+
7
+ ---
8
+
9
+ ## Overview
10
+
11
+ Runs N independent episodes against `LifeStackEnv` using uniformly random actions as a
12
+ baseline policy. Prints a live per-episode table and aggregate statistics at the end.
13
+
14
+ Useful for:
15
+ - Verifying environment correctness after changes
16
+ - Establishing a random-baseline reward floor before training
17
+ - CI smoke checks (no external dependencies)
18
+
19
+ ---
20
+
21
+ ## Usage
22
+
23
+ ```bash
24
+ # Default: 10 episodes, any domain
25
+ python scripts/eval.py
26
+
27
+ # 20 episodes, flight_crisis domain only
28
+ python scripts/eval.py --episodes 20 --domain flight_crisis
29
+
30
+ # Verbose per-step output
31
+ python scripts/eval.py --episodes 5 --verbose
32
+ ```
33
+
34
+ ---
35
+
36
+ ## CLI Arguments
37
+
38
+ | Argument | Type | Default | Description |
39
+ |---|---|---|---|
40
+ | `--episodes` | `int` | `10` | Number of episodes to run |
41
+ | `--domain` | `str` | `None` | Optional domain filter passed to `TaskGenerator.generate()` |
42
+ | `--verbose` | flag | `False` | Print per-step action, reward, and done status |
43
+
44
+ Supported `--domain` values: `flight_crisis`, `code_merge_crisis` (or omit for random).
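The argument table above maps directly onto `argparse`. The following is a sketch of what such a parser could look like; `build_parser` is a hypothetical name, and the actual script may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Random-baseline evaluation runner")
    parser.add_argument("--episodes", type=int, default=10,
                        help="Number of episodes to run")
    parser.add_argument("--domain", type=str, default=None,
                        help="Optional domain filter, e.g. flight_crisis")
    parser.add_argument("--verbose", action="store_true",
                        help="Print per-step action, reward, and done status")
    return parser
```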
45
+
46
+ ---
47
+
48
+ ## Output
49
+
50
+ ### Per-episode table
51
+
52
+ ```
53
+ EP    TOTAL REWARD  STEPS   DOMAIN                SUCCESS
54
+ ────  ────────────  ──────  ────────────────────  ───────
55
+    1        0.3120       8  flight_crisis         ✗
56
+    2        1.8450      12  code_merge_crisis     ✓
57
+ ```
58
+
59
+ ### Aggregate stats
60
+
61
+ ```
62
+ ──────────────────────────────────────────────────────────
63
+ Episodes : 10
64
+ Mean Reward : 0.8231
65
+ Success Rate : 30.0%
66
+ Mean Steps : 10.4
67
+ ```
68
+
69
+ ---
70
+
71
+ ## Action Space (Random Baseline)
72
+
73
+ Each step samples uniformly from:
74
+ `execute`, `inspect`, `plan`, `wait`, `communicate`, `spend`, `delegate`
75
+
76
+ - `execute` actions target a real route ID from the active task.
77
+ - `inspect` actions target a real hidden-state key from the active task.
78
+ - Other actions apply a small random metric nudge and resource cost.
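The sampling rule above can be sketched like this. It is an assumption-laden illustration: `sample_random_action` is a hypothetical helper, actions are plain dicts rather than `LifeStackAction` objects, and the random metric nudge and resource cost for the remaining action types are left out.

```python
import random

ACTION_TYPES = ["execute", "inspect", "plan", "wait", "communicate", "spend", "delegate"]


def sample_random_action(rng, route_ids, hidden_keys):
    """Pick an action type uniformly and attach a target where one applies."""
    action_type = rng.choice(ACTION_TYPES)
    action = {"action_type": action_type, "target": None}
    if action_type == "execute" and route_ids:
        # execute targets a real route ID from the active task
        action["target"] = rng.choice(route_ids)
    elif action_type == "inspect" and hidden_keys:
        # inspect targets a real hidden-state key from the active task
        action["target"] = rng.choice(hidden_keys)
    return action
```

Passing in a seeded `random.Random` instance makes baseline runs reproducible.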
79
+
80
+ ---
81
+
82
+ ## Change Log
83
+
84
+ | Date | Change |
85
+ |---|---|
86
+ | 2026-04-23 | File created; implements random baseline evaluation runner |
docs/lifestack_env.md ADDED
@@ -0,0 +1,131 @@
1
+ # lifestack_env.py: Environment Reference
2
+
3
+ `core/lifestack_env.py`: The main OpenEnv-compatible RL environment for LifeStack.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ `LifeStackEnv` wraps the full simulation: metric cascades, world events, partial
10
+ observability, route execution, milestone tracking, and reward calculation.
11
+
12
+ Key classes in this file:
13
+
14
+ | Class | Role |
15
+ |---|---|
16
+ | `LifeStackAction` | Pydantic action schema (metric_changes, resource_cost, action_type, …) |
17
+ | `LifeStackObservation` | Pydantic observation schema (metrics, resources, step, done, reward, metadata) |
18
+ | `LifeStackState` | Internal state (current_metrics, budget, task, world_state, hidden_state, …) |
19
+ | `PartialObsFilter` | Converts full world state into the agent's partial observation |
20
+ | `WorldEngine` | Fires deterministic/probabilistic ExoEvents each step |
21
+ | `LifeStackEnv` | The environment itself; inherits from OpenEnv `Environment` |
22
+
23
+ ---
24
+
25
+ ## API
26
+
27
+ ### `LifeStackEnv.__init__(seed, task, max_steps=30)`
28
+
29
+ ```python
30
+ env = LifeStackEnv()
31
+ env = LifeStackEnv(seed=42, max_steps=50)
32
+ ```
33
+
34
+ ### `LifeStackEnv.reset(...) -> LifeStackObservation`
35
+
36
+ ```python
37
+ obs = env.reset(task=my_task, episode_id="ep_001")
38
+ ```
39
+
40
+ Parameters:
41
+ - `task`: a `Task` object (from `core/task.py`). Defaults to `FlightCrisisTask()`.
42
+ - `seed`: optional int for reproducibility.
43
+ - `conflict`: legacy `ConflictEvent` for metric disruption on reset.
44
+ - `budget`: dict with `time`, `money`, `energy` overrides.
45
+ - `person`: optional `SimPerson` for personality-driven drift.
46
+
47
+ ### `LifeStackEnv.step(action) -> LifeStackObservation`
48
+
49
+ ```python
50
+ obs = env.step(LifeStackAction(action_type="execute", target="rebook_premium"))
51
+ ```
52
+
53
+ Supported `action_type` values:
54
+
55
+ | Type | Effect |
56
+ |---|---|
57
+ | `inspect` | Reveals a hidden-state key into the observation |
58
+ | `execute` | Attempts to activate a Route by `target` (route id) |
59
+ | `wait` | Passes the step; triggers stress penalty after 4 consecutive waits |
60
+ | `rollback` | Reverts metrics/budget to the previous step (one-time per episode) |
61
+ | `plan` / `communicate` / `spend` / `delegate` | Apply `metric_changes` and `resource_cost` |
62
+
63
+ ### `LifeStackEnv.render()`
64
+
65
+ Prints a colour-coded terminal summary of the current state and task progress.
66
+
67
+ ---
68
+
69
+ ## PartialObsFilter
70
+
71
+ ```python
72
+ PartialObsFilter.filter(task, revealed_keys) -> dict
73
+ ```
74
+
75
+ - Base: `task.visible_world` (always visible).
76
+ - Keys in `revealed_keys` that exist in `task.mutable_world` → added as-is.
77
+ - Keys in `revealed_keys` that exist in `task.hidden_state` → wrapped as
78
+ `{"value": <val>, "source": "inspect"}` to signal to the agent that they came from `inspect`.
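The three rules above can be sketched as a small pure function. This is an illustration, not the project's code: `filter_observation` is a hypothetical name, it takes the three dicts directly instead of a `Task`, and giving `mutable_world` precedence when a key appears in both places is an assumption.

```python
def filter_observation(visible_world, mutable_world, hidden_state, revealed_keys):
    """Build the agent's partial view from the documented filter rules."""
    view = dict(visible_world)  # base: always visible
    for key in revealed_keys:
        if key in mutable_world:
            # mutable-world keys are added as-is
            view[key] = mutable_world[key]
        elif key in hidden_state:
            # hidden keys are wrapped so the agent can tell they came from inspect
            view[key] = {"value": hidden_state[key], "source": "inspect"}
    return view
```

Keys that appear in neither dict are ignored here, which keeps a stale or mistyped `revealed_keys` entry from leaking into the observation.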
79
+
80
+ ---
81
+
82
+ ## Observation `metadata` fields
83
+
84
+ ```python
85
+ obs.metadata = {
86
+     "world_state": dict,         # partial view after filter
87
+     "goal": str,
88
+     "active_route": str | None,
89
+     "milestones": list[str],
90
+     "events": list[str],
91
+     "success": bool,
92
+     "failure": bool,
93
+     "failure_reason": str,
94
+     "routes_remaining": int,
95
+     "breakdown": dict,           # reward component breakdown
96
+     "info": list[str],           # step-level diagnostic messages
97
+ }
98
+ ```
99
+
100
+ Key `info` message prefixes:
101
+
102
+ | Prefix | Meaning |
103
+ |---|---|
104
+ | `INSPECT_REVEALED:` | Key added to inspected list |
105
+ | `INSPECT_REVEALED_HIDDEN:` | Key was in `hidden_state`; value included |
106
+ | `INSPECT_REDUNDANT:` | Key already revealed, no-op |
107
+ | `ROUTE_SUCCESS:` | Route executed and consequences applied |
108
+ | `ROUTE_BLOCKED:` | Route was closed by a prior ExoEvent |
109
+ | `PRECONDITIONS_FAILED:` | Route preconditions not met |
110
+ | `MILESTONE_UNLOCKED:` | A milestone condition was met |
111
+ | `EVENT_FIRED:` | An ExoEvent triggered this step |
112
+ | `WAIT_CAP_EXCEEDED:` | 4+ consecutive waits; stress penalty applied |
113
+
114
+ ---
115
+
116
+ ## End Conditions
117
+
118
+ | Condition | `done` | `success` | `failure` |
119
+ |---|---|---|---|
120
+ | `step_count >= max_steps` | ✅ | depends | - |
121
+ | All `success_conditions` met | ✅ | ✅ | - |
122
+ | Any `failure_conditions` entry met | ✅ | - | ✅ |
123
+ | Any metric hits 0 | ✅ | - | ✅ |
124
+
125
+ ---
126
+
127
+ ## Change Log
128
+
129
+ | Date | Change |
130
+ |---|---|
131
+ | 2026-04-23 | `PartialObsFilter.filter()` now reads `mutable_world` + `hidden_state` directly from `Task`; removed `world` param; hidden keys wrapped with `source: inspect`; `INSPECT_REVEALED_HIDDEN` info message added |
docs/memory.md ADDED
@@ -0,0 +1,90 @@
1
+ # memory.md: LifeStackMemory Reference
2
+
3
+ `agent/memory.py`: ChromaDB-backed trajectory and human-feedback storage.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ `LifeStackMemory` persists two types of data:
10
+
11
+ | Collection | What's stored |
12
+ |---|---|
13
+ | `collection` (trajectories) | Successful episode decisions: action type, reward, reasoning |
14
+ | `feedback_collection` | Real-world outcome feedback submitted via the Follow-up tab |
15
+
16
+ Only trajectories with `total_reward >= 2.0` are stored (threshold prevents noise).
17
+
18
+ ---
19
+
20
+ ## API
21
+
22
+ ### Instantiation
23
+
24
+ ```python
25
+ from agent.memory import LifeStackMemory
26
+
27
+ memory = LifeStackMemory(silent=True) # default path
28
+ memory = LifeStackMemory(silent=True, path="./my_memory") # custom path
29
+ ```
30
+
31
+ The module-level singleton in `app.py` is named `MEMORY`:
32
+
33
+ ```python
34
+ MEMORY = LifeStackMemory(silent=True)
35
+ ```
36
+
37
+ ### `store_trajectory(...)`
38
+
39
+ ```python
40
+ memory.store_trajectory(
41
+     conflict_title="Friday 6PM",
42
+     route_taken="communicate",
43
+     total_reward=2.5,
44
+     metrics_diff_str="career.workload: -15.0",
45
+     reasoning="Delegating resolved workload spike",
46
+ )
47
+ ```
48
+
49
+ Silently skips storage if `total_reward < 2.0`.
50
+
51
+ ### `store_feedback(feedback: OutcomeFeedback)`
52
+
53
+ ```python
54
+ from core.feedback import OutcomeFeedback
55
+
56
+ feedback = OutcomeFeedback(
57
+     episode_id="A1B2C3D4",
58
+     overall_effectiveness=8,
59
+     domains_improved=["career", "mental_wellbeing"],
60
+     domains_worsened=[],
61
+     unexpected_effects="Felt more confident",
62
+     resolution_time_hours=2.0,
63
+ )
64
+ memory.store_feedback(feedback)
65
+ ```
66
+
67
+ Used by the **Follow-up** tab in `app.py`.
68
+
69
+ ### `get_stats() -> dict`
70
+
71
+ ```python
72
+ stats = memory.get_stats()
73
+ # {
74
+ #     "total_memories": 42,
75
+ #     "average_reward": 2.71,
76
+ #     "by_action_type": {"communicate": 18, "delegate": 12, ...}
77
+ # }
78
+ ```
79
+
80
+ ### `query(conflict_description, n_results=3) -> list[dict]`
81
+
82
+ Retrieves the most semantically similar past decisions for a given situation description.
83
+
84
+ ---
85
+
86
+ ## Change Log
87
+
88
+ | Date | Change |
89
+ |---|---|
90
+ | 2026-04-23 | `AGENT_MEMORY` reference in `app.py` corrected to `MEMORY` (the actual singleton) |
docs/reward.md ADDED
@@ -0,0 +1,82 @@
1
+ # reward.md: Reward System Reference
2
+
3
+ `core/reward.py`: Task-aware reward orchestrator.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ Two reward functions are available:
10
+
11
+ | Function | Used when |
12
+ |---|---|
13
+ | `compute_reward(...)` | Legacy / no-task episodes |
14
+ | `compute_task_reward(...)` | All task-driven episodes (v2.0+) |
15
+
16
+ ---
17
+
18
+ ## `compute_task_reward`: Components
19
+
20
+ ```
21
+ reward = (0.35 × milestone)     # Reaching key progress markers
22
+        + (0.25 × completion)    # Final goal achievement (binary 1.0 if any goal met)
23
+        + (0.15 × outcome)       # Isolated local metric improvement
24
+        + (0.10 × replan_bonus)  # Recovery after ExoEvents
25
+        + (0.10 × efficiency)    # Resource preservation relative to delta
26
+        + (0.05 × reasoning)     # Logical coherence & action alignment
27
+        + penalties
28
+ ```
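As a rough illustration of the weighted sum above, the sketch below combines already-computed component scores. The names `TASK_REWARD_WEIGHTS` and `combine_task_reward` are hypothetical, the component keys follow the formula rather than the `breakdown` dict, and the real `compute_task_reward` derives each component from environment state.

```python
# Weights copied from the formula above; they sum to 1.0.
TASK_REWARD_WEIGHTS = {
    "milestone": 0.35,
    "completion": 0.25,
    "outcome": 0.15,
    "replan_bonus": 0.10,
    "efficiency": 0.10,
    "reasoning": 0.05,
}


def combine_task_reward(components, penalties=0.0):
    """Weighted sum of component scores plus a (negative) penalty total."""
    base = sum(
        weight * components.get(name, 0.0)
        for name, weight in TASK_REWARD_WEIGHTS.items()
    )
    return base + penalties
```

For example, a step with every component at 1.0 that also triggers `INACTION_PENALTY` (-0.40) would score about 0.60 under this sketch.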
29
+
30
+ ### Penalties
31
+
32
+ | Penalty | Value | Level | Trigger |
33
+ |---|---|---|---|
34
+ | `INACTION_PENALTY` | `-0.40` | Step | `actions_taken == 0` |
35
+ | `TASK_INACTION_PENALTY` | `-0.20` | Task | `actions_taken == 0` (additive to step penalty) |
36
+ | `CRITICAL_FLOOR_VIOLATION` | `-0.50` | Step | Any metric drops below 20 |
37
+ | `DEAD_END` | `-0.50` | Task | All viable routes closed without success |
38
+ | `CASCADE_SPREAD_WIDER` | `-0.30` | Step | Changes spread wider than disruption baseline |
39
+ | `RELATIONSHIP_COLLAPSE` | `-0.15` | Step | Relationships drop by more than 20 points in one step |
40
+ | `CUMULATIVE_RELATIONSHIP_EROSION` | `-0.15` | Episode | Cumulative relationship drop exceeds 20 points |
41
+ | `PLAUSIBILITY_VIOLATION` | `-0.10 to -0.30` | Step | Implausible metric/cost ratio |
42
+ | `TIMEOUT` | `-0.20` | Task | Max steps reached without resolution |
43
+
44
+ ---
45
+
46
+ ## Return Value
47
+
48
+ Both functions return `(reward: float, breakdown: dict)`, but the component keys differ slightly.
49
+
50
+ ```python
51
+ breakdown = {
52
+     "components": {
53
+         # compute_reward(...)
54
+         "outcome": float,
55
+         "containment": float,
56
+         "efficiency": float,
57
+         "preservation": float,
58
+         "format_compliance": float,
59
+         "plausibility": float,
60
+         "reasoning_alignment": float,
61
+
62
+         # compute_task_reward(...)
63
+         "local_metric_delta": float,
64
+         "milestone": float,
65
+         "completion": float,
66
+         "replan": float,
67
+         "reasoning": float,
68
+         "timeout_penalty": float,
69
+     },
70
+     "penalties_fired": list[str],
71
+     "base_reward": float,
72
+     "penalties_total": float,
73
+ }
74
+ ```
75
+
76
+ ---
77
+
78
+ ## Change Log
79
+
80
+ | Date | Change |
81
+ |---|---|
82
+ | 2026-04-23 | Initial doc created |
docs/scripts.md ADDED
@@ -0,0 +1,85 @@
1
+ # scripts.md: Other Scripts Reference
2
+
3
+ Reference for scripts not covered by dedicated doc files.
4
+
5
+ ---
6
+
7
+ ## `scripts/run_episode.py`
8
+
9
+ Runs a single full episode with the LLM agent (requires API key).
10
+
11
+ ```bash
12
+ python scripts/run_episode.py
13
+ python scripts/run_episode.py --difficulty 3 --verbose
14
+ ```
15
+
16
+ Returns a result dict with `total_reward`, `steps`, `domain`.
17
+
18
+ ---
19
+
20
+ ## `scripts/train.py`
21
+
22
+ Legacy training loop (pre-TRL). Uses a simple policy gradient loop without curriculum.
23
+ Prefer `train_trl.py` for new training runs.
24
+
25
+ ---
26
+
27
+ ## `scripts/smoke_test.py`
28
+
29
+ Quick sanity check: imports all core modules, resets the env once, takes one step.
30
+ No agent required. Exits with code 0 on success.
31
+
32
+ ```bash
33
+ python scripts/smoke_test.py
34
+ ```
35
+
36
+ ---
37
+
38
+ ## `scripts/test_lifestack.py`
39
+
40
+ Full edge-case test suite (11 tests). The pytest runner is optional:
41
+ run directly or via `pytest scripts/test_lifestack.py`.
42
+
43
+ ```bash
44
+ python scripts/test_lifestack.py
45
+ pytest scripts/test_lifestack.py -v
46
+ ```
47
+
48
+ Tests requiring `OPENAI_API_KEY` are automatically skipped when the key is absent.
49
+
50
+ ### Tests
51
+
52
+ | # | Name | What it checks |
53
+ |---|---|---|
54
+ | 1 | Cascade floor | Metrics never go below 0 |
55
+ | 2 | Cascade ceiling | Metrics never exceed 100 |
56
+ | 3 | Resource exhaustion | `deduct()` returns False without going negative |
57
+ | 4 | Inaction penalty | `INACTION_PENALTY` fires when `actions_taken=0` |
58
+ | 5 | Critical floor penalty | `CRITICAL_FLOOR_VIOLATION` fires below threshold |
59
+ | 6 | Cascade dampening | Second-order deltas < first-order delta |
60
+ | 7 | SimPerson uptake bounds | All uptake values in [0.1, 1.0] |
61
+ | 8 | Memory threshold | Only reward >= 2.0 stored |
62
+ | 9 | Episode termination | `done=True` after horizon steps |
63
+ | 10 | Task-driven smoke | Inspect + Route execute without crash |
64
+ | 11 | Full episode smoke | `run_episode()` returns float reward *(skipped without API key)* |
65
+
66
+ ---
67
+
68
+ ## `scripts/longitudinal_demo.py`
69
+
70
+ Seeds Arjun's multi-week journey into ChromaDB and renders a comparison view.
71
+ Used by Tab 4 (Arjun's Journey) in `app.py`.
72
+
73
+ ---
74
+
75
+ ## `scripts/validate_simperson.py`
76
+
77
+ Validates all `SimPerson` personality trait combinations produce valid uptake values.
78
+
79
+ ---
80
+
81
+ ## Change Log
82
+
83
+ | Date | Change |
84
+ |---|---|
85
+ | 2026-04-23 | `test_lifestack.py`: `steps<=5` assertion fixed to `steps<=30`; `import pytest` added; `@pytest.mark.skipif` added to test 11 |
docs/task.md ADDED
@@ -0,0 +1,132 @@
1
+ # task.py: Task Schema Reference
2
+
3
+ `core/task.py`: Dataclass definitions for the LifeStack long-horizon episode schema.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ A `Task` is the complete specification of a single episode. It defines what the agent
10
+ must achieve, how the world can change around it, and what routes are available.
11
+
12
+ ---
13
+
14
+ ## Dataclasses
15
+
16
+ ### `Task`
17
+
18
+ ```python
19
+ @dataclass
20
+ class Task:
21
+     id: str                          # Unique task identifier
22
+     domain: str                      # e.g. "flight_crisis", "code_merge_crisis"
23
+     goal: str                        # Human-readable goal description
24
+     constraints: dict                # e.g. {"budget_max": 800, "deadline_step": 10}
25
+     hidden_state: dict               # Keys not visible without inspect
26
+     mutable_world: dict              # Keys that can change during the episode
27
+     visible_world: dict              # Keys always visible in the observation
28
+     success_conditions: list[dict]   # [{key, value}]; all must be met
29
+     failure_conditions: list[dict]   # [{key, value}]; any triggers failure
30
+     event_schedule: list[ExoEvent]   # Deterministic/probabilistic events
31
+     viable_routes: list[Route]       # Available action paths
32
+     milestones: list[Milestone]      # Progress checkpoints
33
+     horizon: int                     # Max steps per episode
34
+     difficulty: int                  # 1–5 scale
35
+     domain_metadata: dict            # Free-form extra info (e.g. {"story": "..."})
36
+ ```
37
+
38
+ ### `Route`
39
+
40
+ ```python
41
+ @dataclass
42
+ class Route:
43
+     id: str
44
+     name: str
45
+     description: str
46
+     required_action_types: list[str]  # e.g. ["communicate", "spend"]
47
+     preconditions: dict               # World/hidden state conditions that must be true
48
+     consequences: dict                # World state mutations on success
49
+     closes_routes: list[str]          # Route IDs that become unavailable after this
50
+     milestones_unlocked: list[str]    # Milestone IDs unlocked on route success
51
+     final_reward: float               # Bonus reward on route completion
52
+ ```
53
+
54
+ ### `Milestone`
55
+
56
+ ```python
57
+ @dataclass
58
+ class Milestone:
59
+     id: str
60
+     description: str
61
+     condition_key: str    # World/hidden state key to check
62
+     condition_value: Any  # Value it must equal for milestone to be met
63
+     reward: float         # Reward added when milestone is first reached
64
+ ```
65
+
66
+ ### `ExoEvent`
67
+
68
+ ```python
69
+ @dataclass
70
+ class ExoEvent:
71
+     step: int                    # Step at which to fire (-1 = probabilistic each step)
72
+     probability: float           # Firing probability if step == -1
73
+     id: str
74
+     description: str
75
+     world_mutation: dict         # Applied to mutable_world on fire
76
+     hidden_state_mutation: dict  # Applied to hidden_state on fire
77
+     closes_routes: list[str]     # Routes closed when this event fires
78
+ ```
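The `step` / `probability` semantics described in the comments above can be sketched as follows. This is an illustration under assumptions: `events_firing_at` is a hypothetical helper, the dataclass is re-declared with defaults for brevity, and whether a probabilistic event may fire more than once per episode is not stated in the source.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExoEvent:
    # Local copy of the documented fields; defaults added only for this sketch.
    step: int
    probability: float
    id: str
    description: str = ""
    world_mutation: dict = field(default_factory=dict)
    hidden_state_mutation: dict = field(default_factory=dict)
    closes_routes: list = field(default_factory=list)


def events_firing_at(schedule, current_step, rng):
    """Return the events that fire on this step."""
    fired = []
    for event in schedule:
        if event.step == -1:
            # probabilistic: roll once per step against the event's probability
            if rng.random() < event.probability:
                fired.append(event)
        elif event.step == current_step:
            # deterministic: fires exactly at its scheduled step
            fired.append(event)
    return fired
```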
79
+
80
+ ---
81
+
82
+ ## Built-in Tasks
83
+
84
+ | Class | Domain | Description |
85
+ |---|---|---|
86
+ | `FlightCrisisTask` | `flight_crisis` | Cancelled flight: rebook or work from lounge |
87
+
88
+ ---
89
+
90
+ ## Creating a Custom Task
91
+
92
+ ```python
93
+ from core.task import Task, Route, Milestone, ExoEvent
94
+
95
+ my_task = Task(
96
+     id="my_task",
97
+     domain="my_domain",
98
+     goal="Do the thing",
99
+     constraints={"budget_max": 500, "deadline_step": 8},
100
+     hidden_state={"secret_key": True},
101
+     mutable_world={},
102
+     visible_world={"public_info": "visible"},
103
+     success_conditions=[{"key": "done", "value": True}],
104
+     failure_conditions=[],
105
+     event_schedule=[],
106
+     viable_routes=[
107
+         Route(id="r1", name="Route One", description="...",
108
+               required_action_types=["execute"],
109
+               preconditions={}, consequences={"done": True},
110
+               closes_routes=[], milestones_unlocked=[], final_reward=1.0)
111
+     ],
112
+     milestones=[],
113
+     horizon=20,
114
+     difficulty=2,
115
+     domain_metadata={"story": "A short story about the crisis."}
116
+ )
117
+ ```
118
+
119
+ Then pass it to the environment:
120
+
121
+ ```python
122
+ env = LifeStackEnv()
123
+ obs = env.reset(task=my_task)
124
+ ```
125
+
126
+ ---
127
+
128
+ ## Change Log
129
+
130
+ | Date | Change |
131
+ |---|---|
132
+ | 2026-04-23 | Initial doc created |
docs/train_trl.md ADDED
@@ -0,0 +1,97 @@
1
+ # train_trl.py: GRPO Training Reference
2
+
3
+ `scripts/train_trl.py`: Curriculum GRPO training via HuggingFace TRL + Unsloth.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ Trains a small LLM (default: `Qwen2.5-1.5B-Instruct`) to resolve LifeStack life conflicts
10
+ using **Group Relative Policy Optimization (GRPO)**. Implements a success-based curriculum
11
+ that automatically increases difficulty when the agent's average reward exceeds 0.6.
12
+
13
+ Requires: `unsloth`, `trl`, `datasets`, `transformers`, `accelerate` (Colab / GPU).
14
+
15
+ ---
16
+
17
+ ## Usage
18
+
19
+ ```bash
20
+ # Full curriculum training (5 stages × 100 prompts)
21
+ python scripts/train_trl.py
22
+ ```
23
+
24
+ No CLI args: edit the constants at the top of the file to change stages, prompts, or the output dir.
25
+
26
+ ---
27
+
28
+ ## Architecture
29
+
30
+ ### Reward Functions (multi-signal GRPO)
31
+
32
+ | Function | Signal |
33
+ |---|---|
34
+ | `reward_format_fn` | JSON format compliance |
35
+ | `reward_plausibility_fn` | Penalises zero-cost metric changes |
36
+ | `reward_task_success_fn` | Core env-step outcome reward |
37
+ | `reward_milestone_fn` | Milestone progress bonus |
38
+ | `reward_reasoning_fn` | Planning coherence score |
39
+ | `reward_human_feedback_fn` | Alignment with past real-world outcome feedback |
40
+
41
+ ### `get_lifestack_evaluation(completion, prompt) -> dict`
42
+
43
+ The central reward computation function. Parses the LLM's JSON completion, reconstructs
44
+ the Task from the prompt's `<SYSTEM_METADATA>` block, steps the env, and returns:
45
+
46
+ ```python
47
+ {
48
+     "reward": float,
49
+     "breakdown": dict,  # from obs.metadata["breakdown"]
50
+     "action": LifeStackAction
51
+ }
52
+ ```
53
+
54
+ Returns `{"reward": -0.5, "breakdown": {"error": ...}}` on any parse or env failure.
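The fallback behaviour described above can be sketched as a wrapper that never raises. This is a simplified illustration: `safe_evaluate` and its `score_fn` callback are hypothetical stand-ins for the Task reconstruction and env step performed by `get_lifestack_evaluation`, and the action is kept as a plain dict.

```python
import json

FALLBACK_REWARD = -0.5  # value documented above for parse or env failures


def safe_evaluate(completion, score_fn):
    """Parse a JSON completion, score it, and fall back to -0.5 on any error."""
    try:
        action = json.loads(completion)
        if not isinstance(action, dict):
            raise ValueError("completion must be a JSON object")
        reward, breakdown = score_fn(action)  # stand-in for stepping the env
        return {"reward": float(reward), "breakdown": breakdown, "action": action}
    except Exception as exc:
        return {"reward": FALLBACK_REWARD, "breakdown": {"error": str(exc)}}
```

Returning a fixed negative reward instead of raising keeps a single malformed completion from aborting a whole GRPO batch.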
55
+
56
+ #### Task Construction Hardening (2026-04-23)
57
+
58
+ The `Task(...)` call inside `get_lifestack_evaluation` is wrapped in its own
59
+ `try/except`. On exception, logs `[reward] Task construction failed: <error>` and
60
+ returns the `-0.5` fallback immediately. A field-presence check on
61
+ `(id, goal, constraints, mutable_world, visible_world)` follows construction.
62
+
63
+ ### Curriculum (`train_curriculum`)
64
+
65
+ ```
66
+ Stage 1: difficulty=1 → train → eval → if avg_reward > 0.6: difficulty++
67
+ Stage 2: difficulty=2 → ...
68
+ ...
69
+ Stage 5: difficulty=5 → final save
70
+ ```
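The stage loop above can be sketched as a small driver. This is a hedged illustration, not `train_curriculum` itself: `run_curriculum`, `train_stage`, and `evaluate` are hypothetical names, and repeating the same difficulty when the threshold is missed is an assumption the source does not confirm.

```python
def run_curriculum(train_stage, evaluate, max_difficulty=5, threshold=0.6):
    """Run one train/eval cycle per stage, promoting difficulty on success."""
    difficulty = 1
    history = []
    for stage in range(1, max_difficulty + 1):
        train_stage(difficulty)            # e.g. one GRPO training pass
        avg_reward = evaluate(difficulty)  # mean reward on eval episodes
        history.append((stage, difficulty, avg_reward))
        if avg_reward > threshold and difficulty < max_difficulty:
            difficulty += 1
    return history
```

Keeping training and evaluation as injected callables lets the promotion logic be checked without a GPU.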
71
+
72
+ ### Dataset (`generate_dataset`)
73
+
74
+ Generates `N` prompts by:
75
+ 1. Sampling a `TaskGenerator` task (flight_crisis or code_merge_crisis)
76
+ 2. Merging a legacy `ConflictEvent` disruption for variety
77
+ 3. Cascading the disruption through the `DependencyGraph`
78
+ 4. Embedding task metadata in a `<SYSTEM_METADATA>` block for reward reconstruction
79
+
80
+ ---
81
+
82
+ ## Outputs
83
+
84
+ | Path | Contents |
85
+ |---|---|
86
+ | `./lifestack_model/` | Final saved model + tokenizer |
87
+ | `./lifestack_model/stage_N/` | Per-stage checkpoints |
88
+ | `training_logs/generations.jsonl` | Sampled generations (every 20 reward calls) |
89
+ | `grpo_reward_curve.png` | 50-episode eval reward curve |
90
+
91
+ ---
92
+
93
+ ## Change Log
94
+
95
+ | Date | Change |
96
+ |---|---|
97
+ | 2026-04-23 | `Task()` construction wrapped in try/except + field validation; returns -0.5 fallback on failure |