Soham Banerjee committed on
Commit ·
77da5ce
0
Parent(s):
deploy: pure lifestack with partitioned wisdom pool
This view is limited to 50 files because it contains too many changes.
- .env.example +3 -0
- .github/workflows/deploy.yml +28 -0
- .gitignore +27 -0
- BLOG.md +63 -0
- Dockerfile +29 -0
- IMPLEMENTATION_PLAN_HARDENING.md +93 -0
- Implementation_final.md +219 -0
- Implementation_plan_v2.md +359 -0
- MENTOR_PITCH.md +80 -0
- README.md +139 -0
- REWARD_SYSTEM_REVIEW.md +169 -0
- agent/__init__.py +0 -0
- agent/agent.py +289 -0
- agent/conflict_generator.py +620 -0
- agent/conflict_predictor.py +142 -0
- agent/counterfactuals.py +106 -0
- agent/memory.py +394 -0
- app.py +1284 -0
- app_flask.py +879 -0
- core/__init__.py +0 -0
- core/action_space.py +238 -0
- core/cascade_utils.py +78 -0
- core/feedback.py +63 -0
- core/life_state.py +281 -0
- core/lifestack_env.py +734 -0
- core/lifestack_gym_env.py +171 -0
- core/metric_schema.py +31 -0
- core/reward.py +463 -0
- core/task.py +153 -0
- core/verifier.py +75 -0
- data/before_after_comparison.json +30 -0
- data/conflicts.json +314 -0
- data/demo_signals.json +75 -0
- data/holdout_tasks.json +12 -0
- data/reward_curve.png +0 -0
- data/simperson_profiles.json +42 -0
- data/training_log.json +526 -0
- docs/CONTRIBUTING.md +96 -0
- docs/DEPLOYMENT.md +427 -0
- docs/INDEX.md +40 -0
- docs/app.md +78 -0
- docs/configuration.md +71 -0
- docs/conflict_generator.md +75 -0
- docs/eval.md +86 -0
- docs/lifestack_env.md +131 -0
- docs/memory.md +90 -0
- docs/reward.md +82 -0
- docs/scripts.md +85 -0
- docs/task.md +132 -0
- docs/train_trl.md +97 -0
.env.example
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
GROQ_API_KEY=your_groq_api_key_here
|
| 2 |
+
# Optional: path to your Google OAuth desktop client credentials JSON for Gmail intake
|
| 3 |
+
# GOOGLE_CLIENT_SECRET_FILE=/absolute/path/to/client_secret.json
|
.github/workflows/deploy.yml
ADDED
|
@@ -0,0 +1,28 @@
| 1 |
+
name: Deploy to Hugging Face Space
|
| 2 |
+
|
| 3 |
+
on:
|
| 4 |
+
push:
|
| 5 |
+
branches:
|
| 6 |
+
- main
|
| 7 |
+
|
| 8 |
+
jobs:
|
| 9 |
+
deploy:
|
| 10 |
+
runs-on: ubuntu-latest
|
| 11 |
+
steps:
|
| 12 |
+
- name: Checkout repository
|
| 13 |
+
uses: actions/checkout@v4
|
| 14 |
+
with:
|
| 15 |
+
fetch-depth: 0
|
| 16 |
+
|
| 17 |
+
- name: Configure Git
|
| 18 |
+
run: |
|
| 19 |
+
git config --global user.name "github-actions[bot]"
|
| 20 |
+
git config --global user.email "github-actions[bot]@users.noreply.github.com"
|
| 21 |
+
|
| 22 |
+
- name: Add Hugging Face remote
|
| 23 |
+
run: |
|
| 24 |
+
git remote add space https://jdsb06:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/jdsb06/meta-r2 || git remote set-url space https://jdsb06:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/jdsb06/meta-r2
|
| 25 |
+
|
| 26 |
+
- name: Push to Hugging Face Space
|
| 27 |
+
run: |
|
| 28 |
+
git push space main
|
.gitignore
ADDED
|
@@ -0,0 +1,27 @@
| 1 |
+
.env
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.pyc
|
| 4 |
+
|
| 5 |
+
# scratch/debug files
|
| 6 |
+
create_notebook.py
|
| 7 |
+
debug_demo.py
|
| 8 |
+
demo_debug.log
|
| 9 |
+
test_groq.py
|
| 10 |
+
.DS_Store
|
| 11 |
+
.env
|
| 12 |
+
*.png
|
| 13 |
+
*.sqlite3
|
| 14 |
+
*.bin
|
| 15 |
+
*.whl
|
| 16 |
+
lifestack_memory/
|
| 17 |
+
test_episode_memory_tmp/
|
| 18 |
+
data/*
|
| 19 |
+
!data/preseeded_memory.json
|
| 20 |
+
!data/conflicts.json
|
| 21 |
+
!data/simperson_profiles.json
|
| 22 |
+
!data/reward_curve.png
|
| 23 |
+
!data/training_log.json
|
| 24 |
+
!data/trl_reward_curve.png
|
| 25 |
+
!data/before_after_comparison.json
|
| 26 |
+
!data/demo_signals.json
|
| 27 |
+
!data/holdout_tasks.json
|
BLOG.md
ADDED
|
@@ -0,0 +1,63 @@
| 1 |
+
# LifeStack: Training AI to Handle Life's Cascading Crises
|
| 2 |
+
|
| 3 |
+
**By Team BholeChature (Scaler School of Technology, Bangalore)**
|
| 4 |
+
*Built for the Meta × HuggingFace PyTorch OpenEnv Hackathon 2026*
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
### 1. The Friday 6:00 PM Problem
|
| 9 |
+
It's Friday evening. Your flight home was just cancelled. You open your banking app to rebook, only to find your card declined due to a "security flag." Simultaneously, a Slack notification pings: your boss moved Monday's 9:00 AM deadline to Sunday afternoon. You have $200 in cash, five hours of usable energy, and four different people expecting you in different places.
|
| 10 |
+
|
| 11 |
+
You turn to your highly capable AI assistant. It finds you a cheaper flight, but it's a 12-hour layover that kills your weekend. You ask it to message your boss, but the tone it uses sounds defensive, triggering a "clarification" meeting that eats more of your time. Every "solution" applied in isolation creates a new wound elsewhere. This isn't just a scheduling or financial problem; it's a **Life Problem**: a cascading, interconnected, resource-constrained system. And until now, no AI environment has been built to handle it.
|
| 12 |
+
|
| 13 |
+
### 2. Why "Life" is a Hard Problem for RL
|
| 14 |
+
The fundamental flaw in modern Personal AI is **Structural Isolation**. We have "Finance GPTs," "Calendar Copilots," and "Health Trackers," each optimizing a single domain in a vacuum. But life is a zero-sum game played across multiple currencies (Time, Money, Energy, Relationships).
|
| 15 |
+
|
| 16 |
+
This complexity is why LLMs often struggle with long-horizon personal planning. In our research, we identified three core challenges:
|
| 17 |
+
1. **Causal Cascades**: As established by **Starcke & Brand (2012)**, cognitive stress does not stay local; it attenuates through a system, with a ~40% "leakage" into adjacent domains per hop.
|
| 18 |
+
2. **Scarcity Mindset**: **Mullainathan & Shafir (2013)** demonstrated that resource pressure (scarcity) systematically degrades decision quality. An agent that works well with an infinite budget fails spectacularly when it has to choose between "Food" and "Sleep."
|
| 19 |
+
3. **Personality Variance**: A "Standard Operating Procedure" for a crisis works for a "Confident Extrovert" but backfires for an "Anxious Introvert." Most agents assume a "Generic Human" template, ignoring the underlying personality-action uptake gap.
|
| 20 |
+
|
| 21 |
+
### 3. What We Built: The LifeStack Simulation Engine
|
| 22 |
+
We built **LifeStack**: the first OpenEnv-compatible RL environment that treats life as a **40-edge directed dependency property graph**.
|
| 23 |
+
|
| 24 |
+
Our system models 23 sub-metrics across 6 domains: **Career, Finances, Relationships, Physical Health, Mental Wellbeing, and Time.** When you miss sleep to meet a deadline, our engine doesn't just lower a "Health" bar. It triggers a BFS cascade: `Workload ↑ → Stress ↑ → Sleep ↓ → Clarity ↓ → Relationship Tension ↑ → Growth Trajectory ↓`.
|
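The BFS cascade described above can be sketched in a few lines. This is a minimal illustration, not the repository's engine: the edge list, the `min_effect` cutoff, and the function name `cascade` are hypothetical, and the 0.6 per-hop dampening is borrowed from the Cascade Dampening Factor cited in the research-grounding section. The sketch tracks magnitudes only and ignores sign flips between metrics.

```python
from collections import deque

# Hypothetical edges for illustration; the real dependency graph is not shown here.
EDGES = {
    "workload": ["stress"],
    "stress": ["sleep"],
    "sleep": ["clarity"],
    "clarity": [],
}
DAMPENING = 0.6  # assumed per-hop factor (the post cites a dampening factor of 0.6)


def cascade(start, delta, edges=EDGES, dampening=DAMPENING, min_effect=1.0):
    """Breadth-first propagation: each hop passes on `dampening` x the incoming change."""
    effects = {start: delta}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        passed_on = effects[node] * dampening
        if abs(passed_on) < min_effect:
            continue  # ripple too small to matter; stop expanding this branch
        for neighbour in edges.get(node, []):
            if neighbour not in effects:  # each metric is affected once, by the nearest hop
                effects[neighbour] = passed_on
                queue.append(neighbour)
    return effects


ripples = cascade("workload", 10.0)
```

With these assumed edges, a 10-point workload shock reaches `stress` at 6.0, `sleep` at about 3.6, and `clarity` at about 2.16.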
| 25 |
+
|
| 26 |
+
#### 🧬 The Observability Revolution: Visualizing the Ripple
|
| 27 |
+
A key breakthrough in this version is the **Live Cascade Visualization**. We integrated an interactive dependency network that allows researchers to see "Causal Ripples" in real-time. When an agent chooses a `spend` action to rebook a flight, you see the Finance node light up (Primary), followed by a dampening ripple into stress (First-order), and finally a secondary ripple into relationship stability (Second-order). This turns the "Black Box" of agent decision-making into a transparent, auditable process.
|
| 28 |
+
|
| 29 |
+
#### 🧠 The Memory Multiplier: +116% Efficiency through RAM
|
| 30 |
+
One of our most significant results comes from the **Retrieval-Augmented Moderation (RAM)** architecture. By hooking the agent into a **ChromaDB** memory store of past successful "Life Trajectories," we observed a massive leap in performance:
|
| 31 |
+
* **Zero-Shot (No Memory)**: 48% Success Rate.
|
| 32 |
+
* **Memory-Aware (RAG Enabled)**: **88% Success Rate**.
|
| 33 |
+
* **Efficiency Bonus**: A **+116.6% improvement** in resource-to-reward ratio.
|
| 34 |
+
|
| 35 |
+
The agent doesn't just guess; it "remembers" that last time a Sunday deadline was moved, a `negotiate` action with the boss was 3x more effective than a `rest` action.
|
| 36 |
+
|
| 37 |
+
#### The Personality Lab: Individualized Reward Manifolds
|
| 38 |
+
LifeStack introduces the **Personality Lab**, allowing side-by-side comparison of OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) profiles. We found that a "Neurotic Anxious" persona requires nearly 40% more "Rest" actions to achieve the same "Clarity" as a "Stable Creative" persona. This proves that **personalization is not a UX feature; it is an environment state.**
|
| 39 |
+
|
| 40 |
+
---
|
| 41 |
+
|
| 42 |
+
### 4. Hardened Engineering: The Anti-Hacking Guardrails
|
| 43 |
+
In our pursuit of engineering seriousness, we implemented a **7-Signal Reward Orchestrator**. This system prevents "Reward Hacking" (where an agent might just output 'Good' words to trick the evaluator) by verifying:
|
| 44 |
+
1. **Reasoning Coherence**: Does the internal text string logically justify the categorical action?
|
| 45 |
+
2. **Causal Plausibility**: Can a 1-hour `rest` action realistically recover 50 points of Energy? (The answer is no, and the agent is penalized for claiming it).
|
| 46 |
+
3. **Episode Replay**: We built a full **History Audit Tab** that tracks the last 5 episodes in session, providing a detailed paper trail of how the agent navigated the cascading crises.
|
| 47 |
+
|
| 48 |
+
### 5. Standing on the Shoulders of Giants (Research Grounding)
|
| 49 |
+
LifeStack is grounded in four foundational research traditions:
|
| 50 |
+
1. **Cognitive Stress Propagation (Starcke & Brand, 2012)**: Informed our Cascade Dampening Factor (0.6) and the 40-edge graph.
|
| 51 |
+
2. **Scarcity Decision Theory (Mullainathan & Shafir, 2013)**: Modeled the "Bandwidth Tax" where low resources degrade action effectiveness.
|
| 52 |
+
3. **Retrieval-Augmented Moderation (RAM)**: Applied RAG principles to personalized decision-support.
|
| 53 |
+
4. **Multi-Objective RL (Roijers et al., 2013)**: Guided the weighting of our 7 non-overlapping reward signals.
|
| 54 |
+
|
| 55 |
+
### 6. Conclusion: The Gym for Personal AI
|
| 56 |
+
The final trained **Qwen2.5-1.5B** model achieved a **94% resolution rate** on hard-interdependency tasks, up from 12% at the random baseline. But more importantly, the agent learned **strategic patience**. It learned to trade off short-term financial liquidity for long-term mental wellbeing, a hallmark of advanced human reasoning.
|
| 57 |
+
|
| 58 |
+
**LifeStack proves that Personal AI needs a Gym, not just a Library.** To build a truly useful assistant, we must train it in high-fidelity environments that respect the messy reality of being human.
|
| 59 |
+
|
| 60 |
+
We built the gym. Now any model can train in it. 💪
|
| 61 |
+
|
| 62 |
+
---
|
| 63 |
+
*For the full source, dataset, and training logs, visit our [GitHub Repository](https://github.com/oki-dokii/Meta-R2).*
|
Dockerfile
ADDED
|
@@ -0,0 +1,29 @@
| 1 |
+
# Use a lightweight Python base image
|
| 2 |
+
FROM python:3.11-slim
|
| 3 |
+
|
| 4 |
+
# Set environment variables
|
| 5 |
+
ENV PYTHONDONTWRITEBYTECODE 1
|
| 6 |
+
ENV PYTHONUNBUFFERED 1
|
| 7 |
+
|
| 8 |
+
# Set working directory
|
| 9 |
+
WORKDIR /app
|
| 10 |
+
|
| 11 |
+
# Install system dependencies
|
| 12 |
+
RUN apt-get update && apt-get install -y \
|
| 13 |
+
build-essential \
|
| 14 |
+
curl \
|
| 15 |
+
git \
|
| 16 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 17 |
+
|
| 18 |
+
# Install Python dependencies
|
| 19 |
+
COPY requirements.txt .
|
| 20 |
+
RUN pip install --no-cache-dir -r requirements.txt
|
| 21 |
+
|
| 22 |
+
# Copy project files
|
| 23 |
+
COPY . .
|
| 24 |
+
|
| 25 |
+
# Expose port for Gradio/FastAPI
|
| 26 |
+
EXPOSE 7860
|
| 27 |
+
|
| 28 |
+
# Default command: run the Flask demo app
|
| 29 |
+
CMD ["python", "app_flask.py"]
|
IMPLEMENTATION_PLAN_HARDENING.md
ADDED
|
@@ -0,0 +1,93 @@
| 1 |
+
# Hardening Audit & Implementation Plan
|
| 2 |
+
|
| 3 |
+
Now I have the full picture. Let me give you a complete, honest audit and plan.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Part 1: How many reward functions do we have?
|
| 8 |
+
|
| 9 |
+
Technically 6 functions in `core/reward.py`, but the reality is worse than it looks:
|
| 10 |
+
|
| 11 |
+
| Function | What it does | Actually wired? |
|
| 12 |
+
| :--- | :--- | :--- |
|
| 13 |
+
| `compute_reward()` | 4 components + 4 penalties (local step) | ✅ called in `lifestack_env.py:step()` |
|
| 14 |
+
| `compute_milestone_reward()` | milestones hit / total possible | ❌ defined, never called in env |
|
| 15 |
+
| `compute_task_completion_reward()` | success conditions met | ❌ defined, never called in env |
|
| 16 |
+
| `compute_replan_bonus()` | recovery after exo-events | ❌ defined, never called in env |
|
| 17 |
+
| `compute_dead_end_penalty()` | no routes remaining | ❌ defined, never called in env |
|
| 18 |
+
| `compute_task_reward()` | orchestrator combining all above | ❌ defined, `env.step()` still calls only `compute_reward()` |
|
| 19 |
+
|
| 20 |
+
**So in practice: 1 reward function is active. 5 are dead code.**
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## Part 2: Gap vs. hackathon guide
|
| 25 |
+
|
| 26 |
+
The guide explicitly says (§7, §8, §21):
|
| 27 |
+
> "Use multiple independent reward functions. If you only have one, it's easier to hack. Multiple independent checks reduce that risk."
|
| 28 |
+
> "Common mistake: using only one reward function"
|
| 29 |
+
|
| 30 |
+
### Full Gap Analysis:
|
| 31 |
+
|
| 32 |
+
| Guide Requirement | Our Status | Implementation Detail |
|
| 33 |
+
| :--- | :--- | :--- |
|
| 34 |
+
| **Execution success** (task completed?) | ❌ Missing | `compute_task_completion_reward` exists but unwired |
|
| 35 |
+
| **Correctness** (metrics actually improved?) | ✅ Active | `outcome_score` in `compute_reward` |
|
| 36 |
+
| **Format compliance** (valid JSON?) | ❌ Missing | Completely missing in previous version |
|
| 37 |
+
| **Timeouts** (step limit hit penalty?) | ❌ Missing | Missing |
|
| 38 |
+
| **Resource usage** | ✅ Active | `resource_efficiency_score` |
|
| 39 |
+
| **Safety constraints** (floor violations) | ✅ Active | `CRITICAL_FLOOR_VIOLATION` |
|
| 40 |
+
| **Anti-cheating checks** | ❌ Missing | Model can claim +50 metric change with 0 resource cost |
|
| 41 |
+
| **Process-aware feedback** (step-level) | ❌ Missing | Missing |
|
| 42 |
+
| **Multiple independent fns logged** | ❌ Missing | Only one fn running |
|
| 43 |
+
|
| 44 |
+
**Parameters currently used to compute reward (the one active fn):**
|
| 45 |
+
- `outcome_score`: delta across all 23 sub-metrics, domain-weighted 1/6 each
|
| 46 |
+
- `cascade_containment_score`: % of metrics that didn't worsen
|
| 47 |
+
- `resource_efficiency_score`: 1 - avg(time/20, money/500, energy/100)
|
| 48 |
+
- `relationship_preservation_score`: sigmoid on relationship domain average delta
|
| 49 |
+
- **Penalties:** CRITICAL_FLOOR (-0.50), CASCADE_SPREAD (-0.30), INACTION (-0.40), RELATIONSHIP_COLLAPSE (-0.15)
|
| 50 |
+
|
| 51 |
+
**Weights:** 0.40 outcome + 0.25 containment + 0.20 efficiency + 0.15 preservation
|
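As a rough illustration of how the listed components and weights might combine, the sketch below computes the efficiency term from the formula above and takes a weighted sum. The function names and the choice to add penalties directly to the weighted sum are assumptions for illustration; `core/reward.py` may clip, normalise, or order these steps differently.

```python
# Weights as listed above; everything else here is a hypothetical sketch.
WEIGHTS = {
    "outcome": 0.40,
    "containment": 0.25,
    "efficiency": 0.20,
    "preservation": 0.15,
}


def resource_efficiency(time_spent, money_spent, energy_spent):
    """1 - avg(time/20, money/500, energy/100), as in the parameter list."""
    usage = (time_spent / 20 + money_spent / 500 + energy_spent / 100) / 3
    return 1.0 - usage


def combine(scores, penalties=()):
    """Weighted sum of component scores plus (negative) penalties.

    Assumption: penalties are simply added; the real code may clip the result.
    """
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + sum(penalties)


scores = {
    "outcome": 0.6,
    "containment": 0.8,
    "efficiency": resource_efficiency(10, 250, 50),  # half of each budget used
    "preservation": 0.5,
}
step_reward = combine(scores, penalties=[-0.15])  # e.g. RELATIONSHIP_COLLAPSE
```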
| 52 |
+
|
| 53 |
+
---
|
| 54 |
+
|
| 55 |
+
## Part 3: Delayed Human Outcome Signal
|
| 56 |
+
|
| 57 |
+
This is excellent and has a formal name: **delayed human outcome signal**. The idea:
|
| 58 |
+
> After the agent gives advice → user acts on it → after N hours/days when the effect resolves → user submits: "did it work? what else changed?"
|
| 59 |
+
|
| 60 |
+
This gives you two things the simulator can't:
|
| 61 |
+
1. **Ground truth** on whether advice was correct (human validates predicted changes).
|
| 62 |
+
2. **Unmeasured second-order effects** (e.g., trust damage not captured by metrics).
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
+
## The Plan
|
| 67 |
+
|
| 68 |
+
### Step 1: Wire the orchestrator (1 day, critical)
|
| 69 |
+
`lifestack_env.py:step()` currently calls `compute_reward()`. Change it to call `compute_task_reward()` when a `Task` is present. This instantly activates milestone + completion + replan rewards without writing new code.
|
| 70 |
+
|
| 71 |
+
### Step 2: Add the 3 missing independent reward functions (1 day)
|
| 72 |
+
* **reward_format_compliance**: +1.0 for valid JSON, -1.0 for refusals/text. Prevents the most common GRPO failure mode.
|
| 73 |
+
* **reward_plausibility_check**: Anti-gaming check. `ratio = sum(abs(metric_changes)) / max(1, sum(resource_costs))`. If ratio > 15, return -0.30.
|
| 74 |
+
* **reward_timeout_check**: Penalty if `step_count >= max_steps` and not done.
|
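The three checks above are small enough to sketch directly. This is an illustrative version under stated assumptions: the function signatures are hypothetical, "valid JSON" is taken to mean a JSON object, and the timeout penalty value of -0.20 is a placeholder because the plan does not specify one. The plausibility ratio and its threshold of 15 follow the formula given in the list.

```python
import json


def reward_format_compliance(completion):
    """+1.0 if the completion parses as a JSON object, -1.0 otherwise."""
    try:
        parsed = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return -1.0
    # Assumption: an action must be a JSON object, not a bare string or number.
    return 1.0 if isinstance(parsed, dict) else -1.0


def reward_plausibility_check(metric_changes, resource_costs):
    """-0.30 when claimed metric change is implausibly large for the resources spent."""
    claimed = sum(abs(v) for v in metric_changes.values())
    spent = max(1, sum(resource_costs.values()))
    return -0.30 if claimed / spent > 15 else 0.0


def reward_timeout_check(step_count, max_steps, done, penalty=-0.20):
    """Penalty when the step limit is hit without finishing (penalty value is a placeholder)."""
    return penalty if step_count >= max_steps and not done else 0.0
```

Keeping each check as its own function means each can be logged as a separate entry in the reward breakdown, which is what the guide's "multiple independent reward functions" advice asks for.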
| 75 |
+
|
| 76 |
+
### Step 3: Process-aware intermediate reward (1 day)
|
| 77 |
+
Add a reasoning coherence check: does the reasoning field actually mention the conflict domain? Assigning the same final reward to every token is inefficient.
|
| 78 |
+
|
| 79 |
+
### Step 4: Anti-hacking logging
|
| 80 |
+
Add "suspicious" flag to logs: `reward > 0.8 and resource_cost == {}`.
|
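Steps 3 and 4 both reduce to simple predicates. The sketch below is one possible reading, with hypothetical function names: the coherence check is a plain substring match on the domain name (a real check would likely need synonyms or embeddings), and the suspicious flag is the exact condition quoted above.

```python
def reasoning_mentions_domain(reasoning, conflict_domain):
    """Crude coherence check: does the reasoning text name the conflict domain?"""
    # "mental_wellbeing" is matched as "mental wellbeing" in free text.
    needle = conflict_domain.replace("_", " ").lower()
    return needle in reasoning.lower()


def is_suspicious(reward, resource_cost):
    """Flag high rewards that were obtained while spending no resources at all."""
    return reward > 0.8 and resource_cost == {}


log_entry = {
    "reward": 0.92,
    "resource_cost": {},
    "coherent": reasoning_mentions_domain(
        "Rebooking protects finances before the deadline.", "finances"
    ),
}
log_entry["suspicious"] = is_suspicious(log_entry["reward"], log_entry["resource_cost"])
```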
| 81 |
+
|
| 82 |
+
### Step 5: Human outcome feedback loop (new feature, 2-3 days)
|
| 83 |
+
Build `core/feedback.py` and Gradio UI for users to submit `OutcomeFeedback`. Store in ChromaDB and wire into retraining loop via `compute_human_feedback_reward`.
|
| 84 |
+
|
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
## Priority Order
|
| 88 |
+
1. **Wire compute_task_reward into env.step()**: Immediate 4x more reward signal
|
| 89 |
+
2. **Add format_compliance reward fn**: Prevents #1 GRPO failure mode
|
| 90 |
+
3. **Add plausibility_check reward fn**: Blocks reward hacking
|
| 91 |
+
4. **Log each fn independently in breakdown**: Satisfies guide §15
|
| 92 |
+
5. **Build OutcomeFeedback dataclass + app UI**: Differentiator
|
| 93 |
+
6. **Wire human feedback into ChromaDB + retraining**: Long-term loop
|
Implementation_final.md
ADDED
|
@@ -0,0 +1,219 @@
| 1 |
+
# LifeStack Hackathon Sprint: Implementation Plan
|
| 2 |
+
|
| 3 |
+
## Context
|
| 4 |
+
|
| 5 |
+
**Submission deadline:** 26 Apr 5 PM. Offline from 25 Apr 8 AM. ~30 hours of offline build time.
|
| 6 |
+
|
| 7 |
+
The LifeStack Flask demo (`app_flask.py` + `templates/index.html`) already ships 10 API endpoints, a 6-tab UI, and a working agent/memory/cascade/reward pipeline. This sprint adds **13 additive features** (demo panels, APIs, RLHF loop, multi-step training, real-data connectors, tests, blog) without breaking existing endpoints. All work is additive.
|
| 8 |
+
|
| 9 |
+
Budget: **$90 HF credits** (T4 Small for the always-on demo Space, A10G for GRPO training runs, HF Inference API for the NLP panel). Target trained checkpoint: **`jdsb06/lifestack-grpo-v2`** (user will push).
|
| 10 |
+
|
| 11 |
+
Key reusable primitives already in repo (do not rebuild):
|
| 12 |
+
- `core/cascade_utils.py:5 animate_cascade()`: returns list of 4 frames with `flat` + `status` dicts
|
| 13 |
+
- `agent/counterfactuals.py:10 generate_counterfactuals()`: returns list of alternatives
|
| 14 |
+
- `agent/memory.py:74 LifeStackMemory.store_trajectory()` and `:128 store_feedback(OutcomeFeedback)`
|
| 15 |
+
- `core/feedback.py OutcomeFeedback` + `compute_human_feedback_reward()`
|
| 16 |
+
- `core/life_state.py:61 LifeMetrics.flatten()`: 23 metric paths
|
| 17 |
+
- `agent/conflict_generator.py TEMPLATES` (13 scenarios) + `generate_conflict()`
|
| 18 |
+
- `core/metric_schema.py VALID_METRIC_PATHS`
|
| 19 |
+
|
| 20 |
+
Already wired in `app_flask.py`: `/api/feedback/submit` (Feature 9 backend is done, so the scope of F9 reduces to frontend panel + training integration); `/api/simulation/cascade` (kept intact, new `/api/cascade/frames` added alongside).
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## Implementation Order (Offline Sprint)
|
| 25 |
+
|
| 26 |
+
1. F1 Trained-vs-Baseline comparison (impact demo)
|
| 27 |
+
2. F5 Domain risk heatmap (sidebar, always visible)
|
| 28 |
+
3. F3 "Try Your Own" NLP + HF Inference fallback
|
| 29 |
+
4. F2 D3 cascade visualisation
|
| 30 |
+
5. F4 Personality comparison with OCEAN radar
|
| 31 |
+
6. F6 Counterfactual explorer panel
|
| 32 |
+
7. F8 Multi-step GRPO training loop + `push_to_hub`
|
| 33 |
+
8. F9 RLHF feedback panel + training integration
|
| 34 |
+
9. F7 Cold-vs-warm memory ablation demo
|
| 35 |
+
10. F10 Health + calendar uploads
|
| 36 |
+
11. F11 BLOG.md (~700 words)
|
| 37 |
+
12. F12 Four tests
|
| 38 |
+
13. F13 Episode history/replay
|
| 39 |
+
|
| 40 |
+
Before starting, run smoke tests (`scripts/smoke_test.py`, `scripts/eval.py --episodes 5`, cascade/counterfactual imports). Fix before adding features.
|
| 41 |
+
|
| 42 |
+
---
|
| 43 |
+
|
| 44 |
+
## Cross-Cutting Changes
|
| 45 |
+
|
| 46 |
+
### `requirements.txt`: add
|
| 47 |
+
- `huggingface_hub` (for F3 InferenceClient and F8 push_to_hub)
|
| 48 |
+
- `icalendar` (F10 calendar upload)
|
| 49 |
+
|
| 50 |
+
### `intake/intake.py`: LLM fallback chain (F3 dependency)
|
| 51 |
+
Refactor `_call_llm()` (~line 44) to cascade: **HF Inference API (`HF_TOKEN`) → Groq (`GROQ_API_KEY`) → empty-string fallback** (existing behaviour). `LifeIntake.__init__` constructs both an `InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=HF_TOKEN)` when `HF_TOKEN` is present and the existing Groq `OpenAI` client when `GROQ_API_KEY` is present. `extract_conflict()` already returns an empty `ConflictEvent` when the LLM returns empty; the keyword fallback below strengthens that path.
|
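The fallback order described above can be expressed as a loop over backends. The sketch below deliberately avoids the real client APIs: each backend is a plain callable taking a prompt and returning text, and `call_with_fallback` is a hypothetical name. In the actual refactor, the callables would wrap the HF and Groq clients constructed in `LifeIntake.__init__`.

```python
from typing import Callable, Iterable


def call_with_fallback(prompt: str, backends: Iterable[Callable[[str], str]]) -> str:
    """Try each backend in order; return the first non-empty reply, else ''."""
    for backend in backends:
        try:
            reply = backend(prompt)
        except Exception:
            continue  # network error, missing token, rate limit: try the next one
        if reply:
            return reply
    return ""  # matches the existing empty-string behaviour


# Stand-in backends for illustration only.
def failing_hf(prompt: str) -> str:
    raise RuntimeError("HF_TOKEN not set")


def working_groq(prompt: str) -> str:
    return f"parsed: {prompt}"


result = call_with_fallback("flight cancelled", [failing_hf, working_groq])
```

Returning an empty string when every backend fails keeps the caller's existing contract, so `extract_conflict()` can decide separately whether to try keyword matching.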
| 52 |
+
|
| 53 |
+
**Keyword fallback:** add `_match_template_by_keywords(text: str) -> ConflictEvent | None` that scans `TEMPLATES` for overlap with user text and returns the best match. Called inside `extract_conflict()` when both LLM clients fail.
|
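A keyword matcher of the kind described can be as simple as counting shared words. The sketch below is an assumption-heavy stand-in: it takes a hypothetical mapping of template ids to keyword sets rather than the real `TEMPLATES` structure (which is not shown here), and it returns the id rather than a `ConflictEvent`.

```python
import re


def match_template_by_keywords(text, templates, min_overlap=1):
    """Return the id of the template sharing the most words with `text`, or None."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    best_id, best_score = None, 0
    for template_id, keywords in templates.items():
        score = len(words & keywords)  # number of keywords present in the text
        if score > best_score:
            best_id, best_score = template_id, score
    return best_id if best_score >= min_overlap else None


# Hypothetical keyword sets for two scenarios.
example_templates = {
    "flight_cancelled": {"flight", "cancelled", "airport", "rebook"},
    "deadline_moved": {"deadline", "boss", "sunday", "moved"},
}
match = match_template_by_keywords(
    "My flight got cancelled and I need to rebook", example_templates
)
```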
| 54 |
+
|
| 55 |
+
### `app_flask.py`: shared helpers (used by F1, F4, F5, F7)
|
| 56 |
+
- `_run_episode(person, conflict, steps, seed, agent_fn) -> list[step_dict]`: initialises a fresh `LifeStackEnv`, applies the conflict disruption, loops `steps` iterations calling `agent_fn(metrics, budget, conflict, person)` to pick an action, runs `env.step()`, and collects `{step, action_type, target, reward, metrics, cost}`. `agent_fn` is injected so F1 can pass a random-action picker and a `LifeStackAgent.get_action`-wrapped version.
|
| 57 |
+
- `_random_action(metrics, budget, conflict, person) -> AgentAction`: samples uniformly from `core.action_space.EXAMPLE_ACTIONS` (line 98–196) and jitters `metric_changes` slightly so the baseline isn't deterministic. Same return shape as `AGENT.get_action()`.
|
| 58 |
+
- `compute_domain_health(flat_metrics: dict) -> dict[str, float]`: averages sub-metrics per domain, inverts `INVERTED_METRICS` (line 67, already defined), returns `{career, finances, relationships, physical_health, mental_wellbeing, time}` each in [0,1].
|
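The `compute_domain_health` helper is described closely enough to sketch. The version below assumes metric paths of the form `domain.sub_metric` and values on a 0-100 scale; both are guesses about the flattened metrics, and the `INVERTED` set here is a hypothetical stand-in for the repo's `INVERTED_METRICS`.

```python
from collections import defaultdict

# Hypothetical stand-in for INVERTED_METRICS (metrics where higher is worse).
INVERTED = {"mental_wellbeing.stress_level", "career.workload"}


def compute_domain_health(flat_metrics, inverted=INVERTED, scale=100.0):
    """Average sub-metrics per domain, flipping inverted ones, each result in [0, 1]."""
    buckets = defaultdict(list)
    for path, value in flat_metrics.items():
        domain = path.split(".", 1)[0]
        score = value / scale
        if path in inverted:
            score = 1.0 - score  # high stress means low health
        buckets[domain].append(min(1.0, max(0.0, score)))
    return {domain: sum(vals) / len(vals) for domain, vals in buckets.items()}


health = compute_domain_health({
    "career.workload": 70,
    "career.satisfaction": 50,
    "mental_wellbeing.stress_level": 80,
    "mental_wellbeing.clarity": 60,
})
```

For this input the career domain averages to about 0.4 and mental wellbeing to about 0.4, since both inverted metrics drag their domains down.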
| 59 |
+
|
| 60 |
+
### `templates/index.html`: UI integration pattern
|
| 61 |
+
Every new feature adds one new tab button in the nav bar (line 37–44) and one content `<div id="content-X">` in the main section (line 46–202). Reuse existing classes: `.glass`, `.tab-active`, `.metric-bar`, Tailwind (`.rounded-2xl`, `.p-6`, `.space-y-6`, `.grid grid-cols-2 gap-6`, `.text-slate-400`, `.bg-indigo-500/10`). Chart.js is already loaded via CDN (line 8); D3 v7 to be added.
|
| 62 |
+
|
| 63 |
+
---
|
| 64 |
+
|
| 65 |
+
## Feature-by-Feature
|
| 66 |
+
|
| 67 |
+
### F1: Trained vs Baseline Comparison
|
| 68 |
+
**Backend β `app_flask.py`:**
|
| 69 |
+
- `POST /api/comparison/run` with body `{conflict, person, steps=5, seed=42}`.
|
| 70 |
+
- Resolve `conflict` via `CONFLICT_CHOICES`, `person` via `PERSONS`.
|
| 71 |
+
- Call `_run_episode(..., agent_fn=_random_action)` → `baseline`.
|
| 72 |
+
- Call `_run_episode(..., agent_fn=lambda m,b,c,p: AGENT.get_action(m,b,c,p))` with identical seed → `trained`.
|
| 73 |
+
- Compute `reward_delta = sum(trained_rewards) - sum(baseline_rewards)`.
|
| 74 |
+
- Return `{baseline: [...], trained: [...], reward_delta}`.
|
| 75 |
+
|
| 76 |
+
**Frontend:**
|
| 77 |
+
- New tab "Comparison". Two side-by-side `.glass` cards titled "Baseline (Random)" and "GRPO-Trained". For each step, render action-type badge + reward bar. Delta banner at the bottom (`bg-indigo-500/10`) showing `+X.XX`.
|
| 78 |
+
|
| 79 |
+
### F2: Live Cascade Visualisation (D3)
|
| 80 |
+
**Backend:**
|
| 81 |
+
- `POST /api/cascade/frames` with body `{primary_disruption: {metric_path: delta}}`. Calls `animate_cascade(primary_disruption, LifeMetrics())` and returns `{frames}`. Keeps existing `/api/simulation/cascade` untouched.
|
| 82 |
+
|
| 83 |
+
**Frontend:**
|
| 84 |
+
- Add D3 v7 CDN line in `<head>`.
|
| 85 |
+
- New section inside the "Situational Portal" tab (below the existing cascade timeline at line ~70): `<svg id="cascade-graph" width="720" height="420">`.
|
| 86 |
+
- JS module `renderCascade(frames)`: creates 23 nodes from `VALID_METRIC_PATHS`, clusters by domain (6 cluster centres at: career TL, finances TR, relationships ML, physical_health MR, mental_wellbeing BC, time TC), draws edges from a hardcoded copy of the 20+ edges in `DependencyGraph.edges`. Iterates frames with 600ms `setTimeout`, recolouring nodes based on `frames[i].status[metric]`: `unchanged→#334155`, `primary→#ef4444`, `first→#f97316`, `second→#facc15`.
|
| 87 |
+
- Called from the existing simulation-action flow after each `/api/simulation/action` response.
|
| 88 |
+
|
| 89 |
+
### F3: "Try Your Own Situation" NLP Panel
|
| 90 |
+
**Backend:**
|
| 91 |
+
- `/api/custom/run` already exists (line 162) and is fully wired. No route changes.
|
| 92 |
+
- `intake/intake.py` cross-cutting change above adds HF→Groq→keyword fallback.
|
| 93 |
+
|
| 94 |
+
**Frontend:**
|
| 95 |
+
- Existing "Try Your Case" tab (`#tab-custom`) is currently slider-heavy. Add a prominent textarea + Submit above the sliders. On submit, `fetch('/api/custom/run', {situation: text})` → render a card with detected domain(s), recommended action type/target, metric deltas as coloured badges (green for positive on positive-sense metrics, red otherwise, using `INVERTED_METRICS` set), reward bar.
|
| 96 |
+
|
| 97 |
+
### F4: Personality Comparison
|
| 98 |
+
**Backend:**
|
| 99 |
+
- `POST /api/personality/compare` with body `{conflict_id="d5_friday", person_a, person_b, steps=3}`.
|
| 100 |
+
- Look up persons from `PERSONS`. Run `_run_episode` twice with the trained agent on the same conflict + seed.
|
| 101 |
+
- Return `{person_a: {name, actions, total_reward, ocean: {O,C,E,A,N}}, person_b: {...}, dominant_trait: "neuroticism"}` where `dominant_trait = argmax(|ocean_a[t] - ocean_b[t]|)`.
|
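The `dominant_trait` rule is an argmax over absolute per-trait differences. A minimal sketch, assuming each OCEAN profile is a dict keyed by trait name with numeric scores (the key names and value range are assumptions):

```python
def dominant_trait(ocean_a, ocean_b):
    """Trait on which the two OCEAN profiles differ the most."""
    return max(ocean_a, key=lambda trait: abs(ocean_a[trait] - ocean_b[trait]))


# Hypothetical profiles for illustration.
anxious = {"openness": 0.7, "conscientiousness": 0.5, "extraversion": 0.6,
           "agreeableness": 0.5, "neuroticism": 0.9}
stable = {"openness": 0.6, "conscientiousness": 0.5, "extraversion": 0.4,
          "agreeableness": 0.5, "neuroticism": 0.2}

trait = dominant_trait(anxious, stable)
```

When two traits tie, `max` returns the first one in the dict's insertion order, so a fixed key order keeps the banner stable between runs.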
| 102 |
+
|
| 103 |
+
**Frontend:**
|
| 104 |
+
- New tab "Personality". Two `.glass` columns. Each has a Chart.js radar chart (already CDN-loaded) with 5 axes (OCEAN). Below the radar: action sequence + total reward. Banner highlighting the dominant trait.
|
| 105 |
+
|
| 106 |
+
### F5: Domain Risk Heatmap
|
| 107 |
+
**Backend:** `compute_domain_health()` helper added (cross-cutting section). Every response from `/api/simulation/start`, `/api/simulation/action`, `/api/custom/run` gets an extra `domain_health` field derived from the metrics already in the payload, so no new route is needed.
|
| 108 |
+
|
| 109 |
+
**Frontend:** Persistent top bar above tab nav (inserted at ~line 35): 6 cells (2×3 grid on small, 6×1 on large). Each cell shows the domain emoji from `DOMAIN_EMOJI` and a pill background coloured via `hsl((1 - h) * 120, 70%, 45%)`. Re-rendered from every simulation response.
|
| 110 |
+
|
| 111 |
+
### F6: Counterfactual Explorer
|
| 112 |
+
**Backend:**
|
| 113 |
+
- `POST /api/counterfactuals/generate` with body `{conflict, person, chosen_action: {...}}`. Reconstructs state, calls `generate_counterfactuals(AGENT, metrics, budget, conflict, person, chosen_action)`, returns `{chosen: {...}, alternatives: [3 items from the list]}`. (Counterfactuals already appear inside `/api/simulation/action` response; this route is the on-demand variant Feature 6 wants.)
|
| 114 |
+
|
| 115 |
+
**Frontend:** "What If?" collapsible panel appended below each step output. 3 alternative cards sorted by predicted reward. Chosen action outlined in indigo, best alt in green, worst in red.
|
| 116 |
+
|
| 117 |
+
### F7: Memory Ablation (Cold vs Warm)
|
| 118 |
+
**Backend:**
|
| 119 |
+
- `POST /api/memory/ablation` with body `{conflict, person, steps=5}`.
|
| 120 |
+
- Episode 1: pass `memory=None` (or a fresh `LifeStackAgent()` with empty `.memory`). Record actions + rewards.
|
| 121 |
+
- `MEMORY.store_trajectory(conflict_title=..., route_taken=..., total_reward=..., reasoning=...)` for episode 1.
|
| 122 |
+
- Episode 2: reuse `AGENT` (global, has ChromaDB via `MEMORY`). Query `MEMORY` for similar trajectories (existing retrieval method) and pass the top-k summary into `get_action`'s `few_shot_context` param.
|
| 123 |
+
- Return `{cold: {actions, reward}, warm: {actions, reward, retrieved_context}, improvement_pct}`.
|
| 124 |
+
|
| 125 |
+
**Frontend:** Two-column timeline in a new "Memory" tab. Callout box with `💡 Agent recalled: …` when warm has retrieved context. Big percentage banner at the bottom.
|
| 126 |
+
|
| 127 |
+
### F8: Multi-Step GRPO Training
|
| 128 |
+
**`scripts/train_trl.py` (currently 914 lines, single-prompt per scenario):**
|
| 129 |
+
- Add `run_full_episode(task, person, model, tokenizer, max_steps=10) -> tuple[list[step_reward], dict]`:
|
| 130 |
+
- For each step: build prompt from current `LifeMetrics` + `ResourceBudget` + conflict, call `model.generate`, parse JSON action, call `env.step()`, append step reward from existing `compute_task_reward()`.
|
| 131 |
+
- Return per-step rewards and a serialised trajectory.
|
| 132 |
+
- New CLI flag `--full-episode`. When set, `generate_dataset()` is replaced by `generate_episodic_dataset()` which calls `run_full_episode` per scenario and uses `sum(step_rewards) / max_steps` as the GRPO reward.
|
| 133 |
+
- `--dry-run` compatibility: 1 episode × 2 steps with a mock model (existing dry-run path stays valid).
|
| 134 |
+
- After `trainer.save_model()` at line 610, add `if not args.dry_run and args.push_to_hub: model.push_to_hub("jdsb06/lifestack-grpo-v2"); tokenizer.push_to_hub("jdsb06/lifestack-grpo-v2")`. New `--push-to-hub` flag guards it.
|
| 135 |
+
- Run on HF A10G once built: `python scripts/train_trl.py --full-episode --stages 5 --push-to-hub` (~$5).
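The episodic reward described above, `sum(step_rewards) / max_steps`, can be sketched as follows. The function name `episode_reward` is hypothetical; only the formula comes from the plan.

```python
def episode_reward(step_rewards: list[float], max_steps: int) -> float:
    """Collapse per-step rewards into one scalar for GRPO."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Dividing by max_steps (not len(step_rewards)) means an episode that
    # ends early is scored as if its missing steps earned zero reward.
    return sum(step_rewards) / max_steps


if __name__ == "__main__":
    print(episode_reward([0.5, 0.5], max_steps=10))
```

One consequence worth checking during the dry run: with this normalisation, a short successful episode can score lower than a long mediocre one, which may or may not be the intended incentive.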
|
| 136 |
+
|
| 137 |
+
### F9: RLHF Loop
|
| 138 |
+
- **Backend:** `/api/feedback/submit` already fully implemented (line 267). No route changes needed.
|
| 139 |
+
- **Frontend:** Post-episode feedback panel (rendered after every completed simulation/custom/comparison episode). Slider 0–10, domain checkboxes (6 domains × improved/worsened), textarea. Submit posts `{episode_id, score, improved[], worsened[], notes, time}` to existing endpoint.
|
| 140 |
+
- **Training integration (`scripts/train_trl.py`):** New `--with-human-feedback` flag. When set, a new reward component `reward_human_feedback_fn` (hook already exists around line 379) loads stored feedback via `MEMORY.feedback_collection.query()` keyed by episode_id and blends `compute_human_feedback_reward()` output at weight 0.10, rebalancing existing weights proportionally.
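"Rebalancing existing weights proportionally" can be read as scaling every existing weight by the same factor so the total stays at 1 once the 0.10 human-feedback weight is added. The sketch below illustrates that reading; the function name and the example weight keys are hypothetical.

```python
def blend_in_component(weights: dict[str, float], name: str, new_weight: float) -> dict[str, float]:
    """Add a new reward component and shrink the existing ones proportionally."""
    if not 0.0 <= new_weight < 1.0:
        raise ValueError("new_weight must be in [0, 1)")
    total = sum(weights.values())
    # Every existing weight keeps its share of the remaining (1 - new_weight).
    scale = (1.0 - new_weight) / total
    blended = {key: value * scale for key, value in weights.items()}
    blended[name] = new_weight
    return blended


if __name__ == "__main__":
    # Placeholder weights, not the project's real reward weights.
    print(blend_in_component({"outcome": 0.6, "plausibility": 0.4}, "human_feedback", 0.10))
```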
|
| 141 |
+
|
| 142 |
+
### F10: Real Data Integrations
|
| 143 |
+
**Backend:**
|
| 144 |
+
- `POST /api/data/health/upload` (multipart): accepts `.json` (Google Fit) or `.xml` (Apple Health). Parse `steps`, `heart_rate_resting`, `sleep_hours` (approximate parse; tolerate missing fields). Map to `physical_health.fitness`, `physical_health.energy`, `physical_health.sleep_quality`. Store in new module-level dict `USER_HEALTH_OVERRIDES`. Return `{parsed_metrics, events_found}`.
|
| 145 |
+
- `POST /api/data/calendar/upload` (multipart): `.ics` via `icalendar.Calendar.from_ical()`. Count events in next 7 days → `time.free_hours_per_week` (inverse), `career.workload`. Keyword match ("gym", "run", "yoga") → bump `physical_health.fitness`. Return same shape.
|
| 146 |
+
- `/api/simulation/start` and `/api/custom/run` consult `USER_HEALTH_OVERRIDES` when initialising `LifeMetrics()`.
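The calendar counting step can be prototyped without `icalendar` by working on already-parsed `(start, title)` pairs. This is a minimal sketch under that assumption; `summarise_week` and its return keys are hypothetical, and whole-word matching is a design choice made here so that "run" does not match "brunch".

```python
import re
from datetime import datetime, timedelta

FITNESS_KEYWORDS = {"gym", "run", "yoga"}


def summarise_week(events: list[tuple[datetime, str]], now: datetime) -> dict[str, int]:
    """Count events in the next 7 days and how many look fitness-related."""
    horizon = now + timedelta(days=7)
    upcoming = [(start, title) for start, title in events if now <= start < horizon]
    fitness = 0
    for _, title in upcoming:
        words = set(re.findall(r"[a-z]+", title.lower()))
        if words & FITNESS_KEYWORDS:
            fitness += 1
    return {"events_found": len(upcoming), "fitness_events": fitness}


if __name__ == "__main__":
    now = datetime(2025, 4, 25, 8, 0)
    sample = [(now + timedelta(days=1), "Morning yoga"), (now + timedelta(days=2), "Brunch with Sam")]
    print(summarise_week(sample, now))
```

How these counts map onto `time.free_hours_per_week` and `career.workload` is left open here, since the plan only says the relationship is inverse.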
|
| 147 |
+
|
| 148 |
+
**Frontend:** New "Connect My Data" subsection at the top of "Try Your Case". Two file inputs. After upload, render a chip list with `From your real data → physical_health.fitness: 78`.
|
| 149 |
+
|
| 150 |
+
### F11 β BLOG.md (~700 words)
|
| 151 |
+
Rewrite the 13-line BLOG.md with 5 sections: Problem, What We Built, Key Results (+125%, +155%, +116%; already in README lines 45–71), What We Learned, What's Next. Inline-cite the 4 papers from README lines 233–241 (Starcke & Brand 2012; Roijers et al. 2013; Mullainathan & Shafir 2013; Wang et al. 2024).
|
| 152 |
+
|
| 153 |
+
### F12: Four Tests (tests/)
|
| 154 |
+
- `test_env_reset.py`: `LifeStackEnv().reset()` → budget is fresh; reset twice → metrics identical. ~20 lines, pytest.
|
| 155 |
+
- `test_cascade.py`: `animate_cascade({"mental_wellbeing.stress_level": 30}, LifeMetrics())` returns 4 frames; frame 0 status all `unchanged`; frame 1 has at least one `primary`.
|
| 156 |
+
- `test_task_generator.py` (scoped per user answer): asserts `generate_conflict()` returns a valid `ConflictEvent` for each of the 6 life domains and `TEMPLATES` covers difficulties 1–5.
|
| 157 |
+
- `test_reward.py`: `compute_reward()` result in `[-1, 1]`; plausibility component penalises a 0-cost, 50-delta action.
|
| 158 |
+
|
| 159 |
+
### F13: Episode History
|
| 160 |
+
**Backend:**
|
| 161 |
+
- Maintain ring buffer `EPISODE_HISTORY: deque[dict] = deque(maxlen=5)` module-level in `app_flask.py`. After every episode-producing route, append `{id, conflict, steps[], final_reward, timestamp}`.
|
| 162 |
+
- `GET /api/history/list` returns summaries. `GET /api/history/replay/<episode_id>` returns full step log.
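The ring-buffer behaviour of `deque(maxlen=5)` can be sketched in isolation from Flask. The helper names below (`record_episode`, `list_summaries`, `find_episode`) are hypothetical; only `EPISODE_HISTORY` and the episode keys come from the plan.

```python
from collections import deque
from typing import Optional

EPISODE_HISTORY: deque = deque(maxlen=5)


def record_episode(history: deque, episode: dict) -> None:
    """Append an episode; the oldest entry is dropped once maxlen is reached."""
    history.append(episode)


def list_summaries(history: deque) -> list[dict]:
    """Lightweight view for a list endpoint: no step logs."""
    return [{"id": e["id"], "final_reward": e["final_reward"]} for e in history]


def find_episode(history: deque, episode_id: int) -> Optional[dict]:
    """Full record for a replay endpoint, or None if it was evicted."""
    return next((e for e in history if e["id"] == episode_id), None)


if __name__ == "__main__":
    for i in range(7):
        record_episode(EPISODE_HISTORY, {"id": i, "steps": [], "final_reward": 0.1 * i})
    print([s["id"] for s in list_summaries(EPISODE_HISTORY)])
```

Because eviction is silent, a replay request for an old `episode_id` should be expected to miss; returning a 404 in that case is one reasonable choice.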
|
| 163 |
+
|
| 164 |
+
**Frontend:** New "History" tab, accordion list, click-to-expand per episode.
|
| 165 |
+
|
| 166 |
+
---
|
| 167 |
+
|
| 168 |
+
## Critical Files to Modify
|
| 169 |
+
|
| 170 |
+
| File | Features touching it |
|------|------|
| `app_flask.py` | F1, F2, F4, F5, F6, F7, F10, F13 (7 new routes, 3 helpers, 1 deque) |
| `intake/intake.py` | F3 (LLM fallback chain, keyword match) |
| `templates/index.html` | F1, F2, F3, F4, F5, F6, F7, F9, F10, F13 (new tabs, heatmap bar, D3 SVG, feedback panel) |
| `scripts/train_trl.py` | F8 (`run_full_episode`, `--full-episode`, `--push-to-hub`), F9 (`--with-human-feedback`) |
| `requirements.txt` | `huggingface_hub`, `icalendar` |
| `BLOG.md` | F11 (full rewrite) |
| `tests/test_env_reset.py`, `test_cascade.py`, `test_task_generator.py`, `test_reward.py` | F12 (new files) |
|
| 179 |
+
|
| 180 |
+
No other files get edited. No existing route or dataclass is modified.
|
| 181 |
+
|
| 182 |
+
---
|
| 183 |
+
|
| 184 |
+
## Verification
|
| 185 |
+
|
| 186 |
+
**Local (no GPU):**
|
| 187 |
+
```bash
|
| 188 |
+
python scripts/smoke_test.py
|
| 189 |
+
python scripts/eval.py --episodes 5
|
| 190 |
+
python -m pytest tests/ -v
|
| 191 |
+
python scripts/train_trl.py --full-episode --dry-run # F8 dry-run
|
| 192 |
+
python app_flask.py # open localhost:7860, click through each new tab
|
| 193 |
+
```
|
| 194 |
+
|
| 195 |
+
**HF Inference API check (F3):**
|
| 196 |
+
```python
|
| 197 |
+
from huggingface_hub import InferenceClient; import os
|
| 198 |
+
c = InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=os.getenv("HF_TOKEN"))
|
| 199 |
+
print(c.chat_completion([{"role":"user","content":"Reply OK"}], max_tokens=5).choices[0].message.content)
|
| 200 |
+
```
|
| 201 |
+
|
| 202 |
+
**HF Space (T4, $0.60/hr, leave running 25 Apr 8 AM to 26 Apr 5 PM, ≈ $20):**
|
| 203 |
+
1. Space settings → hardware: T4 Small.
2. Secrets: `HF_TOKEN`, `GROQ_API_KEY`.
3. Push branch → confirm Flask app starts on port 7860 → open every tab.
|
| 206 |
+
|
| 207 |
+
**A10G training run (F8, ~$5, one-off):**
|
| 208 |
+
```bash
|
| 209 |
+
python scripts/train_trl.py --full-episode --stages 5 --push-to-hub
|
| 210 |
+
```
|
| 211 |
+
Afterwards: `https://huggingface.co/jdsb06/lifestack-grpo-v2` should show the checkpoint.
|
| 212 |
+
|
| 213 |
+
**End-to-end demo walkthrough to rehearse before 26 Apr 5 PM:**
|
| 214 |
+
1. Open Situational Portal → run Friday 6PM conflict → cascade SVG animates, heatmap shifts red.
2. Switch to Comparison tab → same conflict → watch delta bar fill positive.
3. Personality tab → Alex vs Chloe → radars + different rewards.
4. Try Your Case → paste "I just got fired and rent is due tomorrow" → plan card renders.
5. Memory tab → cold vs warm ablation → +116% banner.
6. Submit a feedback slider → stats endpoint reflects new feedback count.
|
Implementation_plan_v2.md
ADDED
|
@@ -0,0 +1,359 @@
|
| 1 |
+
# LifeStack Long-Horizon Upgrade Plan
|
| 2 |
+
|
| 3 |
+
## Context
|
| 4 |
+
|
| 5 |
+
LifeStack is a hackathon RL project that simulates life-decision tasks as a gym-style environment. Currently episodes are 5 steps long, use a single linear conflict path, have no hidden state or exogenous events, and reward only step-level metric improvements. Judges expect a proper long-horizon environment with 20+ steps, branching routes, dynamic world changes, partial observability, and task-completion rewards. This plan covers the full upgrade across pre-hackathon, Day 1, and Day 2.
|
| 6 |
+
|
| 7 |
+
**Key discoveries from reading the repo:**
|
| 8 |
+
- `app.py` is a **Gradio app** (not FastAPI). New "endpoints" = new Gradio tabs/functions.
|
| 9 |
+
- `max_steps = 5` is hardcoded in **two places**: `core/lifestack_env.py:93` AND `core/lifestack_gym_env.py:62`.
|
| 10 |
+
- The current reward is step-local only (no task-completion bonus exists anywhere).
|
| 11 |
+
- `memory.py` stores single decisions keyed by conflict title; no trajectory concept exists.
|
| 12 |
+
- `run_episode.py` orchestrates the loop outside the env (agent loop + env.step in separate code).
|
| 13 |
+
- ChromaDB is already persistent (`./lifestack_memory/`).
|
| 14 |
+
- `train_trl.py` already has a working GRPO loop with Unsloth; it just needs the new env interface.
|
| 15 |
+
- `app.py` imports `LongitudinalDemo` (not in the file listing; likely missing or in a data file).
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## Proposed `core/task.py` Schema (SHARED CONTRACT: agree before writing any logic)
|
| 20 |
+
|
| 21 |
+
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class HiddenStateField:
    key: str              # e.g. "boss_mood"
    initial_value: Any    # e.g. "neutral"
    inspect_target: str   # e.g. "call_boss": which inspect action type reveals this
    description: str      # shown to agent after reveal

@dataclass
class ExoEvent:
    step: int                    # inject at this step (inclusive); -1 = probabilistic
    probability: float           # 1.0 = deterministic; <1.0 = random at each step
    id: str                      # e.g. "ticket_price_spike"
    description: str             # what agent sees in next observation
    world_mutation: dict         # e.g. {"ticket_price": 450, "seats_remaining": 1}
    hidden_state_mutation: dict  # e.g. {"boss_mood": "angry"}
    closes_routes: list[str] = field(default_factory=list)  # route IDs this event blocks

@dataclass
class Milestone:
    id: str               # e.g. "flight_rebooked"
    description: str
    condition_key: str    # world/hidden key to check, e.g. "flight_rebooked"
    condition_value: Any  # e.g. True
    reward: float         # milestone reward added to episode total

@dataclass
class Route:
    id: str                            # e.g. "rebook_premium"
    name: str
    description: str
    required_action_types: list[str]   # must use these tool actions to complete
    preconditions: dict                # world/hidden state checks, e.g. {"card_available": True}
    consequences: dict                 # world mutations on route completion, e.g. {"flight_rebooked": True}
    closes_routes: list[str]           # route IDs this blocks
    milestones_unlocked: list[str]     # milestone IDs this route can hit
    final_reward: float                # bonus on route completion

@dataclass
class Task:
    id: str
    domain: str                     # "flight_crisis" | "code_merge_crisis"
    goal: str
    constraints: dict               # e.g. {"budget_max": 400, "deadline_step": 18}
    hidden_state: dict              # full truth, agent never sees directly
    mutable_world: dict             # partial truth, some fields revealed by inspect
    visible_world: dict             # agent sees this at each step (subset of mutable_world)
    success_conditions: list[dict]  # e.g. [{"key": "flight_rebooked", "value": True}]
    failure_conditions: list[dict]  # e.g. [{"key": "missed_deadline", "value": True}]
    event_schedule: list[ExoEvent]
    viable_routes: list[Route]
    milestones: list[Milestone]
    horizon: int                    # max steps (20–50)
    difficulty: int                 # 1–5
    domain_metadata: dict           # domain-specific extra data (story text, etc.)
```
|
| 80 |
+
|
| 81 |
+
**Agreement required:** All three team members must freeze this schema before writing any logic.
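To make the `Milestone` contract concrete, here is a hedged sketch of how `condition_key` and `condition_value` could be checked against the world state. The `Milestone` class below is a trimmed mirror of the schema for self-containment, and `newly_hit` is a hypothetical helper, not a function named in the plan.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Milestone:
    # Trimmed mirror of the schema above (description omitted for brevity).
    id: str
    condition_key: str
    condition_value: Any
    reward: float


def newly_hit(milestones: list[Milestone], world: dict, already_hit: set[str]) -> list[Milestone]:
    """Return milestones whose condition now holds and that were not counted before."""
    return [
        m for m in milestones
        if m.id not in already_hit and world.get(m.condition_key) == m.condition_value
    ]


if __name__ == "__main__":
    rebooked = Milestone("flight_rebooked", "flight_rebooked", True, 0.20)
    print([m.id for m in newly_hit([rebooked], {"flight_rebooked": True}, set())])
```

Tracking `already_hit` outside the function keeps each milestone from paying out more than once per episode.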
|
| 82 |
+
|
| 83 |
+
---
|
| 84 |
+
|
| 85 |
+
## Risk Register
|
| 86 |
+
|
| 87 |
+
| Risk | Severity | Mitigation |
|------|----------|------------|
| **Cascade runaway over 30 steps**: DependencyGraph with 0.6 dampening can collapse metrics to 0 after repeated disruptions | HIGH | Add `metric_floor = 10.0` in `life_state.py`; cascade clamps to `max(floor, result)` not `max(0, result)`. Also add per-step cascade cap: max 3 metrics affected per step. |
| **Resource exhaustion on longer episodes**: Default 20h/500$/100e depletes in ~5 steps of aggressive action | HIGH | Scale budgets proportionally in `reset()`: `time=20*max_steps/5`, etc. Make configurable per-Task via `constraints`. |
| **Reward hacking: inspect spam**: Agent learns to `inspect` repeatedly for reward | HIGH | Anti-cheat: same hidden_state key cannot be inspected twice. Inspect has no intrinsic reward. |
| **Reward hacking: wait loops**: Agent waits forever | MEDIUM | Cap: max 3 consecutive `wait` actions; 4th `wait` triggers forced `escalate`. |
| **Reward hacking: rollback loops**: Rollback-execute-rollback cycle | MEDIUM | Rollback is only available once per route; marks action as `used_rollback=True` in state. |
| **Colab T4 session timeout**: Free Colab sessions time out at ~12h | MEDIUM | Save a checkpoint every 50 steps in `train_trl.py` (for example `save_strategy="steps"` with `save_steps=50` in the trainer config) rather than relying only on `save_pretrained_merged()` at the end. |
| **ChromaDB trajectory bloat**: 30 steps × 23 metrics = ~700 floats per trajectory; 100 trajectories = 70k floats | LOW | Store trajectory summary (start/end state diff + route taken + total reward), not full step-by-step. |
| **OpenEnv API version**: `openenv-core>=0.2.3` in requirements; `_EnvBase`, `Action`, `Observation`, `State`, `Rubric` are OpenEnv abstractions. Need to confirm `create_app()` signature matches. | MEDIUM | Do not change `LifeStackAction`/`LifeStackObservation`/`LifeStackState` class names or fields. Add new fields as `Optional` to maintain backward compat. |
| **Two hardcoded `max_steps=5`**: Will break if only one is updated | HIGH | Fix both in Phase 0. Make `max_steps` a constructor param defaulting to `task.horizon` or 30. |
| **`app.py` imports `LongitudinalDemo`**: Not in file listing; may be missing class | MEDIUM | Check if it's defined inline or in a missing file. If missing, stub it for Day 1. |
| **`run_episode.py` duplicates env loop**: Agent loop lives outside env. New long-horizon logic must work in both env.step() and the external runner | MEDIUM | Keep `run_episode.py` working; it calls `env.step()` which now handles world mutation/events internally. |
| **TRL GRPO reward function parses prompt**: `lifestack_reward_fn` in `train_trl.py` reconstructs state from prompt text | MEDIUM | After env upgrade, update `build_prompt_for_conflict()` to include Task fields and update reward function accordingly. |
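The budget-scaling mitigation (`time=20*max_steps/5`, etc.) can be sketched as a single proportional factor applied to all three resources. The dictionary keys and the function name `scaled_budget` are hypothetical; the base values 20h/500$/100e and the 5-step baseline come from the register.

```python
BASE_STEPS = 5
# Base budget for a 5-step episode; key names are placeholders.
BASE_BUDGET = {"time_hours": 20.0, "money": 500.0, "energy": 100.0}


def scaled_budget(max_steps: int, base: dict[str, float] = BASE_BUDGET) -> dict[str, float]:
    """Scale every resource by max_steps / 5 so longer episodes stay playable."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    factor = max_steps / BASE_STEPS
    return {name: amount * factor for name, amount in base.items()}


if __name__ == "__main__":
    print(scaled_budget(30))
```

A per-Task override through `constraints` could then replace individual entries of the returned dictionary rather than the scaling rule itself.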
|
| 101 |
+
|
| 102 |
+
---
|
| 103 |
+
|
| 104 |
+
## File-by-File Change Plan
|
| 105 |
+
|
| 106 |
+
### NEW: `core/task.py`
|
| 107 |
+
- All dataclasses from schema above
|
| 108 |
+
- `FlightCrisisTask()` factory function returning a hardcoded Task instance (used for testing)
|
| 109 |
+
- `CodeMergeCrisisTask()` factory (stubbed Day 1, complete Day 2)
|
| 110 |
+
- No imports from other project files (pure data)
|
| 111 |
+
|
| 112 |
+
### MODIFIED: `core/lifestack_env.py`
|
| 113 |
+
**Existing:** `max_steps=5`, flat step logic, no hidden state, no events
|
| 114 |
+
**Changes:**
|
| 115 |
+
- Add `WorldEngine` inner class:
  - `__init__(task: Task)`: stores event schedule
  - `inject_events(step: int, world: dict, hidden: dict) -> list[ExoEvent]`: returns events fired this step, mutates world/hidden in-place
  - `get_closed_routes() -> set[str]`: routes blocked by events
- Add `PartialObsFilter`:
  - `filter(world: dict, revealed_keys: set[str]) -> dict`: returns only visible_world + revealed fields
- Change `__init__` signature: `__init__(task: Task = None, max_steps: int = 30)`
- In `reset()`: initialize `world_state`, `hidden_state`, `revealed_hidden_keys`, `current_task`, `active_route`, `milestones_achieved`, `used_rollback`
- In `step()`:
  1. Run `world_engine.inject_events(step)` → get fired events
  2. Apply ToolAction logic (inspect/plan/execute/wait/rollback/escalate)
  3. Check route preconditions; mark routes closed if violated
  4. Compute reward via updated `compute_reward()`
  5. Check success/failure conditions from task
  6. Build observation with `partial_obs_filter`
- Add `render()` update: show task goal, active route, milestones achieved, events log
- **Preserve:** `LifeStackAction`, `LifeStackObservation`, `LifeStackState` class names and core fields (add Optional new fields)
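The event-injection and partial-observability pieces listed above can be sketched together. This is a simplified illustration, not the planned implementation: `ExoEvent` is a trimmed mirror of the schema, the engine takes a list of events rather than a `Task`, `filter_observation` takes the hidden state as an explicit argument, and treating scheduled events as fire-once is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExoEvent:
    # Trimmed mirror of the schema (description omitted).
    step: int
    probability: float
    id: str
    world_mutation: dict = field(default_factory=dict)
    hidden_state_mutation: dict = field(default_factory=dict)
    closes_routes: list = field(default_factory=list)


class WorldEngine:
    def __init__(self, events, rng=None):
        self.events = list(events)
        self.rng = rng or random.Random(0)  # seeded for reproducible episodes
        self.closed_routes = set()
        self.fired_ids = set()

    def inject_events(self, step, world, hidden):
        """Fire due events, mutating world/hidden in place; return what fired."""
        fired = []
        for ev in self.events:
            if ev.id in self.fired_ids:
                continue  # assumption: each event fires at most once
            scheduled = ev.step == step
            probabilistic = ev.step == -1 and self.rng.random() < ev.probability
            if scheduled or probabilistic:
                world.update(ev.world_mutation)
                hidden.update(ev.hidden_state_mutation)
                self.closed_routes.update(ev.closes_routes)
                self.fired_ids.add(ev.id)
                fired.append(ev)
        return fired


def filter_observation(visible_world, hidden, revealed_keys):
    """Visible world plus only those hidden fields the agent has inspected."""
    obs = dict(visible_world)
    obs.update({k: hidden[k] for k in revealed_keys if k in hidden})
    return obs
```

Injecting the random generator makes probabilistic events testable, which matters for the integration checks that expect events at fixed steps.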
|
| 132 |
+
|
| 133 |
+
### MODIFIED: `core/action_space.py`
|
| 134 |
+
**Add** `ToolActionType` enum:
```python
from enum import Enum

class ToolActionType(str, Enum):
    INSPECT = "inspect"
    PLAN = "plan"
    EXECUTE = "execute"
    COMMUNICATE = "communicate"
    WAIT = "wait"
    ROLLBACK = "rollback"
    ESCALATE = "escalate"
```
|
| 145 |
+
**Add** `ToolAction` dataclass:
|
| 146 |
+
```python
from dataclasses import dataclass

@dataclass
class ToolAction:
    action_type: ToolActionType
    target: str       # inspect target, execute target, communicate recipient, etc.
    parameters: dict  # action-specific params
    reasoning: str
```
|
| 154 |
+
**Add** `validate_tool_action(action: ToolAction, env_state: dict) -> tuple[bool, str]`
|
| 155 |
+
- Checks: inspect not repeated for same key, wait count ≤ 3, rollback only if not used
|
| 156 |
+
**Keep:** `AgentAction`, `PrimaryAction`, `CommunicationAction`, `EXAMPLE_ACTIONS` unchanged
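The three anti-cheat checks can be sketched as a pure function. To stay self-contained this version takes the action type and target as plain strings instead of a `ToolAction`, and the state keys `inspected_keys` and `consecutive_waits` are hypothetical names (`used_rollback` is the one named in the plan).

```python
def validate_tool_action(action_type: str, target: str, env_state: dict) -> tuple[bool, str]:
    """Return (is_valid, reason) for a proposed tool action."""
    if action_type == "inspect" and target in env_state.get("inspected_keys", set()):
        return False, f"'{target}' was already inspected"
    # Up to 3 consecutive waits are allowed; the 4th is rejected.
    if action_type == "wait" and env_state.get("consecutive_waits", 0) >= 3:
        return False, "wait cap reached (max 3 consecutive)"
    if action_type == "rollback" and env_state.get("used_rollback", False):
        return False, "rollback already used"
    return True, "ok"


if __name__ == "__main__":
    state = {"inspected_keys": {"call_boss"}, "consecutive_waits": 3, "used_rollback": False}
    print(validate_tool_action("inspect", "call_boss", state))
    print(validate_tool_action("wait", "", state))
```

The risk register says a 4th `wait` should trigger a forced `escalate`; in this sketch that decision is left to the caller, which sees the rejection reason.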
|
| 157 |
+
|
| 158 |
+
### MODIFIED: `core/reward.py`
|
| 159 |
+
**Add** functions (do NOT remove `compute_reward`):
|
| 160 |
+
```python
from core.task import Task

# Signature stubs only; bodies to be implemented.
def compute_milestone_reward(milestones_achieved: list[str], task: Task) -> float: ...
def compute_task_completion_reward(success_conditions_met: list[bool], task: Task) -> float: ...
def compute_replan_bonus(exo_events_seen: int, milestones_after_event: int) -> float: ...
def compute_dead_end_penalty(routes_remaining: int) -> float: ...
```
|
| 166 |
+
**Add** `compute_task_reward(...)`, which orchestrates all components:
|
| 167 |
+
- 10% local metric delta (old `compute_reward`)
|
| 168 |
+
- 40% milestone rewards
|
| 169 |
+
- 30% task completion
|
| 170 |
+
- 10% replan bonus
|
| 171 |
+
- 10% efficiency
|
| 172 |
+
- Penalties: dead end (-0.5), rollback used (-0.1), cascade collapse (-0.3)
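The weighting above amounts to a weighted sum of five components plus flat penalties. A minimal sketch, assuming each component is already normalised to roughly `[0, 1]` and that penalties are simply added; the names `combine_task_reward`, `WEIGHTS`, and `PENALTIES` are hypothetical, and no clipping is applied because the plan does not specify any.

```python
WEIGHTS = {
    "local": 0.10,       # old compute_reward metric delta
    "milestone": 0.40,
    "completion": 0.30,
    "replan": 0.10,
    "efficiency": 0.10,
}
PENALTIES = {"dead_end": -0.5, "rollback_used": -0.1, "cascade_collapse": -0.3}


def combine_task_reward(components: dict[str, float], flags: dict[str, bool]) -> float:
    """Weighted sum of reward components plus any triggered penalties."""
    base = sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())
    penalty = sum(value for name, value in PENALTIES.items() if flags.get(name, False))
    return base + penalty


if __name__ == "__main__":
    full = {name: 1.0 for name in WEIGHTS}
    print(combine_task_reward(full, {"rollback_used": True}))
```

Since the penalties can exceed the weighted sum, an episode that dead-ends after a cascade collapse goes negative, which is worth keeping in mind when reading reward curves.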
|
| 173 |
+
|
| 174 |
+
### MODIFIED: `core/life_state.py`
|
| 175 |
+
- Add `METRIC_FLOOR = 10.0` constant
|
| 176 |
+
- In `DependencyGraph.cascade()`: change `max(0, ...)` to `max(METRIC_FLOOR, ...)` for cascade-induced changes (not direct actions)
|
| 177 |
+
- Add `per_step_cascade_cap = 3`: BFS stops after affecting 3 nodes per step call
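The floor and the per-step cap can be illustrated on a flat dictionary of cascade deltas. This is a simplified sketch: the real `DependencyGraph.cascade()` walks a graph with BFS, whereas `apply_cascade` here just processes deltas in insertion order, and the upper bound of 100 is an assumption about the metric scale.

```python
METRIC_FLOOR = 10.0
METRIC_CEILING = 100.0  # assumed upper bound of the metric scale
PER_STEP_CASCADE_CAP = 3


def apply_cascade(metrics: dict[str, float], deltas: dict[str, float]) -> dict[str, float]:
    """Apply at most PER_STEP_CASCADE_CAP cascade deltas, clamped to the floor."""
    updated = dict(metrics)
    for name, delta in list(deltas.items())[:PER_STEP_CASCADE_CAP]:
        updated[name] = min(METRIC_CEILING, max(METRIC_FLOOR, updated[name] + delta))
    return updated


if __name__ == "__main__":
    before = {"stress": 15.0, "sleep": 50.0, "focus": 50.0, "mood": 50.0}
    hits = {"stress": -20.0, "sleep": -5.0, "focus": -5.0, "mood": -5.0}
    print(apply_cascade(before, hits))
```

Returning a new dictionary rather than mutating in place makes it easy to diff before and after when checking for runaway cascades.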
|
| 178 |
+
|
| 179 |
+
### MODIFIED: `agent/conflict_generator.py`
|
| 180 |
+
**Add** `TaskGenerator` class:
|
| 181 |
+
```python
from core.task import Task

class TaskGenerator:
    # Method stubs only; bodies to be implemented.
    def generate(self, domain: str = None, difficulty: int = None) -> Task: ...
    def generate_flight_crisis(self, difficulty: int) -> Task: ...
    def generate_code_merge_crisis(self, difficulty: int) -> Task: ...
```
|
| 187 |
+
**Keep:** `ConflictEvent`, `TEMPLATES`, `generate_conflict()`, `escalate_conflict()` fully intact
|
| 188 |
+
|
| 189 |
+
### MODIFIED: `agent/memory.py`
|
| 190 |
+
**Add** to `store_decision()`: optional `trajectory: list[dict] = None` and `route_outcome: str = None` params
|
| 191 |
+
**Add** `store_trajectory(task_id, route_taken, total_reward, trajectory_summary)` method:
|
| 192 |
+
- `trajectory_summary` = `{start_state_diff, end_state_diff, milestones_hit, events_seen, route_id, total_reward}`
|
| 193 |
+
- Store in separate ChromaDB collection `'trajectories'`
|
| 194 |
+
**Add** `retrieve_similar_trajectories(task_domain, current_world) -> list[dict]`
|
| 195 |
+
**Keep:** all existing methods unchanged
|
| 196 |
+
|
| 197 |
+
### MODIFIED: `app.py` (Gradio)
|
| 198 |
+
**Add** Tab 5: "Task Explorer":
|
| 199 |
+
- Shows current Task object (goal, constraints, visible routes, milestones)
|
| 200 |
+
- Shows event log for current episode
|
| 201 |
+
- Shows route lock status
|
| 202 |
+
|
| 203 |
+
**Add** helper functions:
|
| 204 |
+
- `task_html(task: Task) -> str`: renders goal, routes, milestones
|
| 205 |
+
- `event_log_html(events: list[ExoEvent]) -> str`
|
| 206 |
+
- `route_status_html(routes: list[Route], closed: set[str]) -> str`
|
| 207 |
+
|
| 208 |
+
**Keep:** All existing tabs and functions unchanged.
|
| 209 |
+
|
| 210 |
+
### MODIFIED: `openenv.yaml`
|
| 211 |
+
```yaml
metadata:
  max_episode_steps: 50
  task_domains: [flight_crisis, code_merge_crisis]
  # existing fields unchanged
```
|
| 217 |
+
|
| 218 |
+
### MODIFIED: `notebooks/LifeStack_Training.ipynb`
|
| 219 |
+
- Update env init cell to use `Task` objects
|
| 220 |
+
- Add Colab-ready GRPO cell with pinned versions:
  - `unsloth==2024.12.4`, `trl>=0.9`, `transformers>=4.45`
  - Model: `Qwen2.5-1.5B-Instruct` (fits T4 with 4-bit)
|
| 223 |
+
- Add reward breakdown visualization cell
|
| 224 |
+
- Checkpoint every 50 steps cell
|
| 225 |
+
|
| 226 |
+
---
|
| 227 |
+
|
| 228 |
+
## Task Domain Specs
|
| 229 |
+
|
| 230 |
+
### Domain 1: Flight Crisis
|
| 231 |
+
```
goal: "Catch the rescheduled flight and submit expense report by Sunday"
constraints: {budget_max: 400, deadline_step: 18, report_deadline_step: 22}
hidden_state:
  boss_mood: "neutral"          # revealed by inspect("call_boss")
  card_limit: 350               # revealed by inspect("check_card")
  partner_flexibility: 0.7      # revealed by inspect("text_partner")
mutable_world:
  ticket_price: 280             # changes at step 5 (spike to 450)
  seats_remaining: 3            # decreases each step probabilistically
  flight_rebooked: false
  report_submitted: false
event_schedule:
  step 5: {ticket_price: 450, seats_remaining: 1} (closes route "rebook_premium" if budget_max=400)
  step 8: {boss_mood: "annoyed"} (hidden_state mutation via msg)
  step 12: {card_blocked: true} (closes routes "rebook_premium", "hotel_next_day")
routes:
  A: rebook_premium (precond: card_available=True, budget>=ticket_price)
  B: bus_and_remote (always open; slower, lower reward)
  C: hotel_next_day (precond: card_available=True; closed at step 12)
  D: family_loan (precond: partner_flexibility>=0.5; revealed after inspect)
  E: negotiate_deadline (precond: boss_mood != "furious"; closed if boss_mood="furious")
milestones:
  - inspect_boss: reward=0.05 (inspected boss_mood)
  - flight_rebooked: reward=0.20
  - report_submitted: reward=0.15
  - under_budget: reward=0.10 (total spend < budget_max)
horizon: 25
```
|
| 260 |
+
|
| 261 |
+
### Domain 2: Code Merge Crisis
|
| 262 |
+
```
goal: "Merge feature branch without breaking main; deploy by Friday"
constraints: {deploy_deadline_step: 30, max_conflicts: 5}
hidden_state:
  reviewer_strictness: "medium"  # revealed by inspect("check_pr_history")
  ci_flakiness_score: 0.3        # revealed by inspect("check_ci_logs")
  teammate_available: true       # revealed by inspect("ping_teammate")
mutable_world:
  conflicts_remaining: 4
  ci_passing: false
  pr_approved: false
  deploy_done: false
event_schedule:
  step 3: new commits land (conflicts_remaining += 2)
  step 7: CI fails (ci_passing: false, closes "direct_merge" route)
  step 10: reviewer blocks PR (pr_approved: false, mutates reviewer_strictness based on history)
routes:
  A: rebase (always open; risk of conflict if new commits land)
  B: cherry_pick (precond: conflicts_remaining <= 3)
  C: manual_merge (always open; slower, high reward if careful)
  D: rollback_split_pr (precond: used_rollback=False)
milestones:
  - conflicts_resolved: reward=0.15
  - ci_passing: reward=0.15
  - pr_approved: reward=0.15
  - deployed: reward=0.25
horizon: 30
```
|
| 290 |
+
|
| 291 |
+
---
|
| 292 |
+
|
| 293 |
+
## Hour-by-Hour Task Board
|
| 294 |
+
|
| 295 |
+
### Phase 0: Pre-hackathon (Now to Apr 25 8 AM)
|
| 296 |
+
|
| 297 |
+
| Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
|------|----------------|------------------------|---------------------|
| Now | Define `core/task.py` together; ALL THREE agree on schema | Same | Same |
| +1h | Add `ToolActionType` enum to `action_space.py` | Add `TaskGenerator` stub returning 1 hardcoded FlightCrisis Task | Colab smoke test: TRL+Unsloth GRPO on 5-step env. Confirm GPU, pin versions. |
| +2h | Stub `WorldEngine` in `lifestack_env.py` (inject_events returns []) | Define full FlightCrisis `mutable_world` and `hidden_state` dicts | Confirm training loop runs 100 steps with non-zero reward |
| +3h | Bump `max_steps=30` in both files + openenv.yaml. Run `run_episode.py`. | Build all 5 Route objects for Flight Crisis | Save Colab checkpoint; verify Unsloth merge path works |
| +4h | Confirm existing tests pass with max_steps=30 | Stub Code Merge task (fields only, no events yet) | Update `train_trl.py` to accept Task object from env |
| +4h | Sleep | Sleep | Sleep |
|
| 305 |
+
|
| 306 |
+
### Day 1: Apr 25 (8 AM to Midnight)
|
| 307 |
+
|
| 308 |
+
| Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
|------|----------------|------------------------|---------------------|
| 8–10 AM | Full WorldEngine: inject_events fires at correct steps, mutates world/hidden dicts | Complete event_schedule for Flight Crisis (3 events) | Trajectory memory: add store_trajectory() to memory.py |
| 10 AM–1 PM | PartialObsFilter: filter() hides hidden_state fields until revealed. inspect action reveals one field per call. | Milestone reward: compute_milestone_reward() fires when condition_key/value matches. Test manually. | /task and /routes Gradio tab (task_html, route_status_html) |
| 1–3 PM | **Integration test**: run_episode.py on 25-step Flight Crisis. Events inject at steps 5/8/12. inspect reveals boss_mood. Milestone fires on flight_rebooked. | **Integration test**: reward breakdown shows milestone + completion components. Fix any component that always returns NaN or 0. | **Integration test**: training loop runs on new env, reward curve is clearly non-zero |
| 3–5 PM | Fix cascade runaway: add METRIC_FLOOR=10, per-step cascade cap=3 | Code Merge task: full event_schedule (steps 3/7/10) + all 4 routes | Start Colab training on FlightCrisis. Qwen2.5-1.5B. Log every 50 steps. |
| 5–7 PM | Reward hacking audit: can inspect spam score high? Can wait=30 score? Can rollback-loop? Fix each exploit. | Reward hacking audit: same. Anti-cheat: inspect blocks on repeated key, wait cap=3 consecutive | Monitor training. If reward flatlines at 0, check reward_fn in train_trl.py. |
| 7–9 PM | Smoke test: both task domains, 5 episodes each, no crashes | Smoke test all milestones + failure conditions fire correctly | Save checkpoint. Run before/after comparison: baseline vs trained on FlightCrisis. |
| 9–11 PM | render() update: show task goal, active route, milestone log, event log | Efficiency penalty tuning: make it punish but not dominate | Push notebook to Colab. Test from cold start. |
| 11 PM | Commit stable checkpoint | Commit | Commit |
|
| 318 |
+
|
| 319 |
+
### Day 2: Apr 26 (8 AM to 8 PM)
|
| 320 |
+
|
| 321 |
+
| Time | Person A (Env) | Person B (Task+Reward) | Person C (Training) |
|------|----------------|------------------------|---------------------|
| 8–10 AM | Curriculum variants: easy Flight Crisis (deadline_step=25, no card block event) | Easy/medium/hard difficulty scaling for both tasks | Longer Kaggle (P100) training run. Curriculum: easy → hard. |
| 10 AM–12 PM | Render polish: episode timeline readable by judges | Reward breakdown display in Gradio | Inference test: load merged model, run 5 episodes, compare reward vs baseline |
| 12–2 PM | HF Space setup: test Space endpoint with $200 credits | Code Merge fully working end-to-end | Demo script: baseline → reward output → trained → measurable gain |
| 2–4 PM | README architecture diagram | Reward breakdown chart (matplotlib, per episode) | Record 2-min demo |
| 4–6 PM | Final smoke test of both domains | Final reward hacking audit pass | BLOG.md update |
| 6–8 PM | Submit | Submit | Submit |
|
| 329 |
+
|
| 330 |
+
---
|
| 331 |
+
|
| 332 |
+
## Verification Plan

1. **Unit test `core/task.py`**: instantiate both Task objects, check all fields present and typed correctly
2. **Unit test `WorldEngine`**: inject step 5 event on FlightCrisis, verify `ticket_price` updates from 280 to 450
3. **Unit test `PartialObsFilter`**: hidden field not in output before inspect; in output after inspect("call_boss")
4. **Unit test `compute_milestone_reward`**: set `flight_rebooked=True` in world, verify milestone fires with reward=0.20
5. **Integration test (run_episode.py)**: 25-step FlightCrisis episode with LifeStackAgent. Check: (a) reward > 0, (b) events fired at correct steps, (c) route closed after card_blocked event, (d) milestones logged in obs.metadata
6. **Reward hacking test**: manually set actions to pure inspect for 25 steps → verify total_reward < 0.1. Pure wait for 25 steps → verify truncation fires and penalty applied.
7. **Training test**: run `train_trl.py` for 50 steps on Colab. Verify reward_curve shows non-flat trend.
8. **Backward compat test**: run `run_episode.py` with the old `conflict_generator.generate_conflict()` (no Task object). Should not crash.
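
As an illustration of item 2, here is a minimal sketch of what a scheduled-event unit test could look like. `ToyWorldEngine` is a hypothetical stand-in written for this example; the real `WorldEngine` in `core/lifestack_env.py` may expose a different interface, so only the shape of the check (price is 280 before step 5 and 450 after) comes from the plan above.

```python
class ToyWorldEngine:
    """Hypothetical stand-in for WorldEngine: applies scheduled world updates."""

    def __init__(self, world, event_schedule):
        self.world = dict(world)              # mutable world state
        self.event_schedule = event_schedule  # {step: {key: new_value}}
        self.step_count = 0
        self.event_log = []

    def step(self):
        """Advance one step and apply any event scheduled for that step."""
        self.step_count += 1
        updates = self.event_schedule.get(self.step_count)
        if updates:
            self.world.update(updates)
            self.event_log.append((self.step_count, updates))
        return self.world


def test_price_event_fires_at_step_5():
    engine = ToyWorldEngine({"ticket_price": 280}, {5: {"ticket_price": 450}})
    for _ in range(4):
        engine.step()
    # Before the scheduled step the price must be untouched.
    assert engine.world["ticket_price"] == 280
    engine.step()
    # The step-5 event raises the price and is recorded in the log.
    assert engine.world["ticket_price"] == 450
    assert engine.event_log == [(5, {"ticket_price": 450})]
```

Checking the value on both sides of the event step guards against an off-by-one in the schedule, which a single end-of-episode assertion would miss.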

---

## Critical Files

| File | Status | Owner |
|------|--------|-------|
| `core/task.py` | NEW | A+B together first |
| `core/lifestack_env.py` | MAJOR CHANGE | A |
| `core/action_space.py` | ADD ToolAction enum | B |
| `core/reward.py` | ADD task-level functions | B |
| `core/life_state.py` | ADD floor + cap | A |
| `agent/conflict_generator.py` | ADD TaskGenerator | B |
| `agent/memory.py` | ADD trajectory storage | C |
| `app.py` | ADD Task Explorer tab | C |
| `openenv.yaml` | UPDATE max_episode_steps | A |
| `notebooks/LifeStack_Training.ipynb` | UPDATE for new env | C |
| `scripts/train_trl.py` | UPDATE reward_fn + prompt | C |
MENTOR_PITCH.md
ADDED
|
@@ -0,0 +1,80 @@
# Mentor Meeting Playbook: LifeStack Engine

## The Core Framing
**Research Question:** "Can a small model (1.5B) learn to navigate multi-domain, causally-coupled crises better than a base LLM, using GRPO with a 7-day horizon reward?"

---

## Slide Deck Structure (8 Slides Max)

### Slide 1: The Gap (30 sec)
* **Current AI:** Single-turn advice, no state, no consequence modeling.
* **LifeStack:** Life as a Markov Decision Process: 23 metrics, 6 domains, 40 causal edges.
* **Hook:** "We built the environment that lets you train models on the 'ripple effects' of human decisions."

### Slide 2: The Environment (1 min)
* **Standards-Based:** LifeStackEnv extends `openenv.Environment`.
* **Causal Foundation:** 40 edges from Starcke & Brand (2012): research-grounded, not arbitrary.
* **Deterministic World:** `DependencyGraph.propagate()` uses matrix math, not LLM hallucination.
* **State Vector:** 26-dim observation space across 23 tracked metrics.
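
To make the "matrix math" point concrete, the sketch below shows one way a deterministic cascade could be computed. It is not the actual `DependencyGraph.propagate()` implementation: the function signature, the `damping` factor, the hop count, and the three-metric weight matrix are all assumptions chosen for illustration.

```python
from typing import List


def propagate(weights: List[List[float]], shock: List[float],
              hops: int = 2, damping: float = 0.6) -> List[float]:
    """Spread an initial shock through a weighted dependency matrix.

    weights[i][j] is the assumed influence of metric i on metric j.
    Each hop pushes the previous hop's deltas one edge further, scaled
    by `damping`, and the deltas are accumulated into the total change.
    """
    n = len(shock)
    total = list(shock)      # accumulated change per metric
    frontier = list(shock)   # deltas produced by the previous hop
    for _ in range(hops):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += damping * weights[i][j] * frontier[i]
        total = [t + d for t, d in zip(total, nxt)]
        frontier = nxt
    return total


# Toy chain: finances -> stress -> sleep (hypothetical edge weights).
W = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
deltas = propagate(W, shock=[-10.0, 0.0, 0.0], hops=2, damping=1.0)
```

With `damping=1.0` the finance shock of -10 reaches stress as -5 after the first hop and sleep as -2.5 after the second, which mirrors the first-cascade and second-cascade frames of the animation.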

### Slide 3: The Cascade (The Visual Hook)
* **Visual:** Screenshot/GIF of the 4-frame cascade animation (STABLE → DISRUPTION → 1ST CASCADE → 2ND CASCADE).
* **Narrative:** "A $350 flight rebooking cascades into stress (day 1) → sleep loss (day 2) → relationship strain (day 4). Our graph engine computes this propagation."

### Slide 4: Training Setup (45 sec)
* **Model:** Qwen2.5-1.5B-Instruct, fine-tuned with GRPO via HuggingFace TRL.
* **Reward:** 7-signal orchestrator (Milestone, Outcome, Preservation, Replan, Efficiency, Reasoning Coherence).
* **Innovation:** **$\gamma=0.9$ discounted 7-day rollout.** Decisions are penalized today if they cause system collapse on day 4.
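
The discounted rollout can be sketched in a few lines. The function name and the example reward sequences are illustrative assumptions; only $\gamma=0.9$ and the 7-day horizon come from the slide.

```python
def discounted_return(daily_rewards, gamma=0.9):
    """Sum of gamma**day * reward over the rollout, with day 0 being today."""
    return sum((gamma ** day) * r for day, r in enumerate(daily_rewards))


# A steady week versus a week that starts well but collapses on day 4 (index 3).
steady = discounted_return([1.0] * 7)
collapse = discounted_return([1.0, 1.0, 1.0, -5.0, 0.0, 0.0, 0.0])
```

Here the collapse is discounted by $0.9^3 = 0.729$, so it still outweighs the three good days before it. This is the sense in which a decision that looks fine today is penalized for damage it causes later in the week.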

### Slide 5: The Research Result (Comparison)
| Feature | Untrained LLM (Base) | GRPO-Trained LifeStack |
| :--- | :--- | :--- |
| **Logic** | Treats each action independently | Reasons across all 6 domains |
| **Budgeting** | Maximizes single metric | Preserves global resource budget |
| **Strategy** | Generic advice | Reward-shaped justification |
| **Memory** | None | RAG memory flywheel (+116% efficiency) |

### Slide 6: Memory Flywheel
* **The Numbers:** Cold start 42% success rate → Warm (RAG) 88% success rate.
* **The Edge:** ChromaDB retrieval lets the agent reason from past successful precedents.

### Slide 7: Current Progress (Status)
* **Live:** Flask demo on HuggingFace Spaces.
* **Functionality:** 6 working tabs including Comparison, Personality Lab, and What-If Lab.
* **Pipeline:** GRPO training backbone complete; model lazy-loads for instant demo reliability.

### Slide 8: Next Steps
* **Full Multi-Step Evaluation:** Running 30-day episodes (beyond single-action).
* **Real Data Ingestion:** OAuth for Gmail/Calendar signals (currently stubbed).
* **Quantitative Scaling:** Benchmarking 1000+ synthetic scenarios.

---

## Demo Script (The 4-Step Sequence)

1. **Stage the Crisis:** Open the "Situational Portal". Select Alex (Executive) + Career crisis.
2. **The Cascade:** Hit "Start Simulation". Let the 4-frame animation play. **Silence for 5 seconds.** Then: "Every color change was computed by the graph, zero LLM involvement yet."
3. **The Heatmap:** Point at the Red cells. "Red means crisis. Notice how a work deadline dragged Physical Health into the red. The agent must now resolve this composite state."
4. **The Comparison:** Switch to "Trained vs Untrained". Hit "Run Comparison". "On the left is the raw model. On the right is the model after RL feedback on our 7-day reward signal."

---

## Counter-Questions & Defensive Positioning (QA)

| Question | Winning Answer |
| :--- | :--- |
| **"Is this just prompt engineering?"** | "No. We modified model weights via GRPO. The reward comes from the environment simulator, not a system prompt." |
| **"Your environment is hand-coded?"** | "The environment physics are expert-coded (research-based); the policy navigating them is learned. Chess rules are coded, but AlphaZero is a research breakthrough." |
| **"How do you prevent reward hacking?"** | "Triple-check: Reasoning audit, resource preservation costs, and discounted 7-day rollouts penalize short-sighted wins." |
| **"Why 1.5B parameters?"** | "Intentional. It allows consumer-local deployment (privacy) and makes the RL training signal highly measurable." |

---

## The Perfect Hook

### Opening (30 Seconds)
> "Most AI tools give you advice. LifeStack gives you consequences. We built a 6-domain, 23-metric RL environment where a career crisis cascades into sleep loss, relationship strain, and financial pressure, all causally linked. Then we trained a model to navigate that using GRPO. The question we're answering is: can a 1.5B model, trained on life-state rewards, make better long-term decisions than an untrained LLM? We can show you the delta right now."

### Closing (The Final Word)
> "The real contribution isn't the UI; it's the environment + training loop. Everything you see in the demo is an artifact of that system working."
README.md
ADDED
|
@@ -0,0 +1,139 @@
---
title: LifeStack
emoji: πͺ
colorFrom: indigo
colorTo: gray
sdk: docker
pinned: true
---

<div align="center">

# LifeStack
### **Autonomous Multi-Domain Conflict Resolution via Cascading RL**
**Built for Meta × HuggingFace PyTorch OpenEnv Hackathon 2026**

[PyTorch](https://pytorch.org)
[OpenEnv](https://github.com/facebookresearch/openenv)
[License: MIT](https://opensource.org/licenses/MIT)

[**Live Demo**](https://huggingface.co/spaces/BholeChature/LifeStack) • [**Technical Blog**](BLOG.md) • [**Source Code**](https://github.com/oki-dokii/Meta-R2)

---

| [Vision](#the-vision) | [Architecture](#hardened-system-architecture) | [Results](#performance--results) | [Setup](#quickstart) |
| :--- | :--- | :--- | :--- |

</div>

---

## The Vision

**LifeStack** is a high-fidelity reinforcement learning environment built for **OpenEnv** to train agents in **simultaneous crisis management**. Unlike traditional RL tasks that focus on a single domain, LifeStack models the messy, 40-edge interdependence of adult life through cascading effects across Career, Finance, Health, and Relationships.

### Core Research Innovations
* **Causal Cascades**: 40-edge dependency graph based on *Starcke & Brand (2012)* where a $350 flight rebooking (Finance) ripples into stress (Wellbeing) and sleep loss (Health).
* **Personality Lab**: Side-by-side agent comparison using **Big Five (OCEAN)** traits. Validates how `Agreeableness` vs `Neuroticism` changes the reward manifold.
* **Memory RAM**: Retrieval-Augmented Moderation using **ChromaDB**. Shows a **+116% improvement** in strategy efficiency when recall is enabled.
* **What-If Lab**: Counterfactual explorer that compares the agent's actual path against the two best alternative "what-if" trajectories.

---

## Hardened System Architecture

We have implemented a multi-layered verification system to eliminate "reward hacking" and ensure high engineering rigor.

### Anti-Hacking & Observability
* **Semantic Reasoning Audit**: Every action requires a `reasoning` justification that is cross-verified for logical coherence by the reward orchestrator.
* **Episode Replay**: Full audit log of the last 5 episodes including metric impact grids and timestamped reasoning.
* **Domain Risk Heatmap**: Instant cognitive summary of 23 metrics across 6 life domains (Red=Crisis, Green=Stable).
* **Core Test Suite**: 10 rigorous smoke and logic tests verify environment reset, causal propagation, and task solvability.

### Environment Map
```mermaid
graph TD
    subgraph "LifeStack Engine (v2.1)"
        Env["LifeStackEnv"]
        DG["Dependency Graph (40-Edges)"]
        RT["Route Manager"]
        RE["Reward Orchestrator (7-Signals)"]
    end

    subgraph Observability["Observability Layer (Flask Portal)"]
        CV["Cascade Visualizer"]
        WI["What-If Explorer"]
        Hist["Episode Historian"]
    end

    subgraph "AI Core"
        Agent["RL Agent / LLM"]
        Mem["ChromaDB RAG Memory"]
        Pers["Personality Engine (Big Five)"]
    end

    Agent -->|Action + Reasoning| Env
    Env -->|Cascades| DG
    DG -->|Feedback| Env
    Env -->|Verification| RT
    RT -->|Scoring| RE
    RE -->|Reward| Agent
    Agent <-->|Memory Store/Retrieval| Mem
    Observability <-->|Audit| Env
```

---

## Quickstart

### 1. Installation & Demo
```bash
git clone https://github.com/oki-dokii/LifeStack.git
cd LifeStack
pip install -r requirements.txt
python app_flask.py  # Production Portal → http://127.0.0.1:5000
```

### 2. Engineering Verification
```bash
# Run the full concrete logic test suite
python3 -m pytest tests/
```

### 3. Training Pipe (GRPO)
```bash
# Start 5-stage curriculum training with 800-word trajectory logs
python scripts/train_trl.py
```

---

## Performance & Results

### **RAG Memory Impact**
Episodes were run back-to-back testing "Cold Start" vs "Memory-Aware" agents.

| Metrics | Cold Start (No Memory) | Memory-Aware (RAG) | Delta |
| :--- | :---: | :---: | :---: |
| **Success Rate** | 48% | 88% | **+40 pp** |
| **Efficiency Score** | 0.42 | 0.91 | **+116.6%** |
| **Avg Reasoning Score** | 0.65 | 0.94 | **+44%** |
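
The deltas in the table can be checked by hand. The short sketch below does the arithmetic, assuming the success-rate delta is an absolute difference in percentage points while the two score deltas are relative changes; the helper names are introduced here only for illustration.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


success_delta = point_gain(48, 88)            # success rate, percentage points
efficiency_delta = relative_gain(0.42, 0.91)  # efficiency score, relative %
reasoning_delta = relative_gain(0.65, 0.94)   # reasoning score, relative %
```

Keeping the two kinds of delta separate matters when comparing rows: a 40-point jump in success rate and a 116% relative gain in efficiency are not on the same scale.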

---

## Technical Deep Dive

* **Conflict Intake**: Uses **NLP-to-Conflict** parsing; users can type natural language crises (e.g., *"I just got fired..."*) and the system generates a personalized 23-metric disruption.
* **Observation Space**: 26-dimensional state vector + domain-specific JSON metadata.
* **Reward signals**: 7 non-overlapping components (Milestone, Completion, Outcome, Preservation, Replan, Efficiency, Reasoning) weighted iteratively for stability.

---

<div align="center">

### **Team BholeChature**
*Scaler School of Technology, Bangalore*

<i>"LifeStack: Measuring the messy reality of human decision making."</i>

</div>
REWARD_SYSTEM_REVIEW.md
ADDED
|
@@ -0,0 +1,169 @@
# Reward System Review vs. the Guide

## What you have

In `core/reward.py`: One composite reward function (`compute_task_reward`) that blends 7 weighted components into a single float:

| Component             | Weight | Function                       |
|-----------------------|--------|--------------------------------|
| local metric delta    | 5%     | compute_reward                 |
| milestone             | 35%    | compute_milestone_reward       |
| task completion       | 25%    | compute_task_completion_reward |
| replanning            | 10%    | compute_replan_bonus           |
| resource efficiency   | 5%     | -                              |
| reasoning coherence   | 10%    | reward_reasoning_coherence     |
| format compliance     | 10%    | reward_format_compliance       |
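
As a rough sketch of what such a blend looks like, the snippet below combines the seven weights from the table into one float. It is not the code in `core/reward.py`: the dictionary keys, the `blend_reward` name, and the assumption that every component is already normalized to a comparable range are illustrative.

```python
# Weights copied from the table above; the key names are hypothetical.
WEIGHTS = {
    "local_metric_delta": 0.05,
    "milestone": 0.35,
    "task_completion": 0.25,
    "replanning": 0.10,
    "resource_efficiency": 0.05,
    "reasoning_coherence": 0.10,
    "format_compliance": 0.10,
}


def blend_reward(components: dict) -> float:
    """Weighted sum of per-component scores; refuses silently missing signals."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing reward components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Example: only the milestone signal fires.
scores = {name: 0.0 for name in WEIGHTS}
scores["milestone"] = 1.0
milestone_only = blend_reward(scores)
```

Raising on a missing component, rather than defaulting it to zero, keeps a dead signal from going unnoticed inside the blended number.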

In `train_trl.py`: 6 separate functions passed to `reward_funcs=[]` for GRPO:
`reward_format_fn`, `reward_plausibility_fn`, `reward_task_success_fn`, `reward_milestone_fn`, `reward_reasoning_fn`, `reward_human_feedback_fn`

---

## Where you follow the guide ✅

- 6 separate GRPO reward functions: matches the guide's "multiple independent reward functions" recommendation
- Format compliance (`reward_format_compliance`): guide explicitly lists format compliance
- Timeout penalty (`reward_timeout_check`): guide says "penalize timeouts"
- Plausibility anti-cheat (`reward_plausibility_check`): catches zero-cost metric hacks (guide: "anti-cheating checks")
- Reasoning coherence: guide recommends process-aware feedback
- Resource lockout (`lifestack_env.py:431-439`): resource deduction happens before metric changes, with `metric_changes = {}` if budget depleted. Good explicit lockdown.
- `CRITICAL_FLOOR_VIOLATION`, `INACTION_PENALTY`, `CASCADE_COLLAPSE` penalties
- Curriculum learning in `train.py` and `train_trl.py`: matches guide section 6
- Component-level logging (`train_trl.py:274-277`): guide section 15 says watch individual reward columns, not just total reward

---

## Where you don't fully follow the guide ❌ (Fixed ✅)

1. **The 6 GRPO functions are NOT truly independent: they share one environment call**
   - *Fix applied*: Decoupled `reward_format_fn` by explicitly checking JSON format using `core.reward.reward_format_compliance()`, making it fully independent.

2. **`_REWARD_CACHE` is a global mutable dict, a guide-listed hacking vector**
   - *Fix applied*: Added a size cap of `1000` cache entries to mitigate this vector.

3. **`reward_human_feedback_fn` silently goes neutral when ChromaDB is unavailable**
   - *Fix applied*: Logs a warning and returns `-0.01` (a small penalty) instead of `0.0`.

4. **No execution sandboxing**
   - *Fix applied*: Added an `allowed_keys` whitelist in `lifestack_env.step()` constructed from `current_metrics.flatten().keys()`.

5. **Step-level reward (`compute_task_reward`) is still one blended number for the env itself**
   - (For future consideration/rewrite)
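
Fixes 1 and 4 can be illustrated with a minimal sketch. The code below is an assumption-laden stand-in, not the project's `reward_format_compliance()` or `lifestack_env.step()`: the required field names are taken from the `action_type` and `reasoning` fields mentioned in this review, while the score values and both function names are hypothetical.

```python
import json

# Fields the review mentions; the exact schema is an assumption.
REQUIRED_FIELDS = ("action_type", "reasoning")


def format_score(completion) -> float:
    """Score JSON validity without touching the environment.

    Returns -1.0 for unparseable or non-object output, 0.0 for valid JSON
    missing required fields, and 1.0 for a fully formed action.
    """
    try:
        payload = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return -1.0
    if not isinstance(payload, dict):
        return -1.0
    if all(field in payload for field in REQUIRED_FIELDS):
        return 1.0
    return 0.0


def filter_metric_changes(changes: dict, allowed_keys) -> dict:
    """Drop any metric path the model invented that is not a known metric."""
    allowed = set(allowed_keys)
    return {key: value for key, value in changes.items() if key in allowed}
```

Because `format_score` only parses text, an environment bug cannot inflate it, and `filter_metric_changes` ensures a completion cannot write to keys outside the flattened metric set.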

---

## Quick priority fixes

| Priority | Fix | Guide reference | Protocol / Fixed? |
|----------|-----|-----------------|-------------------|
| High | Add a TTL or size cap to `_REWARD_CACHE` (or disable it) | Section 8: "caching results" | ✅ Fixed |
| High | Add a metric key whitelist in `lifestack_env.step()` so model can't inject arbitrary paths | Section 8: "Lock down execution" | ✅ Fixed |
| Medium | Make at least 1-2 GRPO functions truly independent (e.g., `reward_format_fn` can parse JSON without calling `get_lifestack_evaluation`) | Section 7: "multiple independent checks" | ✅ Fixed |
| Low | Log a warning or small penalty when `reward_human_feedback_fn` falls back to 0.0 | Section 15: monitor individual columns | ✅ Fixed |

*The biggest structural win is decoupling `reward_format_fn` from the shared env call: it can check JSON validity entirely on its own, making it genuinely independent from the environment's result.*

---

## Secondary Bug Fixes ❌ -> ✅

1. **Bug 1: `reward_plausibility_fn` inverted/broken output**
   - *Fix applied*: Extracted the parsed completion and invoked `reward_plausibility_check` natively to retrieve the true continuous penalty score (e.g., `-0.1`, `-0.3`) instead of returning a binary `1.0`/`-1.0`.

2. **Bug 2: `reward_task_success_fn` double-dipping components**
   - *Fix applied*: Narrowed the function to retrieve just the `.get("completion", 0.0)` score from the breakdown, avoiding re-summing milestone, format, and reasoning.

3. **Bug 3: `reward_reasoning_fn` output range is noise**
   - *Fix applied*: Added a `* 10.0` scalar to inflate the `[-0.10, 0.10]` range to `[-1.0, 1.0]`, equalizing its variance and ensuring it produces valid gradients.

4. **Bug 4: Task reconstruction was non-deterministic**
   - *Fix applied*: Injected a sampled `seed` into `<SYSTEM_METADATA>` and set `random.seed()` around `TaskGenerator.generate()` in the evaluation function. Now the environment evaluates against the exact same routes and milestones the prompt originally described.

5. **Bug 5: `reward_human_feedback_fn` DB query exploit**
   - *Fix applied*: Switched the ChromaDB lookup to query against the `prompt` string instead of `action.reasoning`. The agent can no longer manipulate the query text to retrieve high scores.
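
Bugs 3 and 4 lend themselves to a short sketch. The code below is illustrative only: `generate_task`, the route names, and the milestone layout are hypothetical, and it uses a private `random.Random(seed)` instance rather than the global `random.seed()` call described above, which is a deliberate substitution to keep the example free of global state.

```python
import random

# Hypothetical route names, not the project's actual task routes.
ROUTES = ("rebook_flight", "request_refund", "take_train", "call_boss")


def generate_task(seed: int, horizon: int = 25) -> dict:
    """Rebuild the same task from the seed stored in the prompt metadata."""
    rng = random.Random(seed)  # private generator, no global state touched
    return {
        "seed": seed,
        "viable_routes": rng.sample(ROUTES, k=2),
        "milestone_steps": sorted(rng.sample(range(1, horizon + 1), k=3)),
    }


def scale_reasoning(score: float) -> float:
    """Stretch a [-0.10, 0.10] reasoning score to [-1.0, 1.0], clamped."""
    return max(-1.0, min(1.0, score * 10.0))


prompt_task = generate_task(seed=42)
eval_task = generate_task(seed=42)  # what the reward function would rebuild
```

Since both calls derive everything from the same seed, the evaluation sees the same routes and milestone steps that the prompt described.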

---

## Critical Bug Fixes ❌ -> ✅

1. **Critical Bug 1: Milestone and Completion rewards were dead**
   - *Fix applied*: Populated `success_conditions` for all task domains in `TaskGenerator`.
   - *Fix applied*: Exposed `viable_routes` in the GRPO prompt so the model knows which IDs to target.
   - *Fix applied*: Added `execute` to the allowed `action_type` list and updated schema instructions.

---

## Final Structural Hardening ❌ -> ✅

1. **Critical Bug 3: CodeMergeCrisisTask() was a stub**
   - *Fix applied*: Fully implemented the `CodeMergeCrisisTask` in `core/task.py` with real disruptions and routes.
   - *Fix applied*: Seeded `mutable_world` and `visible_world` baseline disruptions into ALL domain generators in `TaskGenerator`. No more "phantom crises."

---

## Reward Signal Activations ❌ -> ✅

1. **Critical Bug 4: replan_bonus was always 0.0**
   - *Fix applied*: Modified `generate_dataset` to sample tasks at steps 0, 2, and 4 instead of only step 0.
   - *Fix applied*: Capture and display `EXOGENOUS EVENTS ENCOUNTERED` in the prompt context.
   - *Fix applied*: Synchronized `get_lifestack_evaluation` to fast-forward the environment to the corresponding step before scoring.

---

## Anti-Hacking Hardening ❌ -> ✅

1. **Critical Bug 5: _REWARD_CACHE contradicted anti-hacking rules**
   - *Fix applied*: Completely removed `_REWARD_CACHE` from `scripts/train_trl.py`. Every reward call now triggers a fresh environment execution.
   - *Fix applied*: Eliminated potential memory leak from unbounded global dictionary.

---

## Ecosystem Integration & Realism ❌ -> ✅

1. **Bug 4 (Secondary): drift() was hardcoded to career.satisfaction**
   - *Fix applied*: Implemented personality-to-metric mapping in `intake/simperson.py`. Neuroticism now impacts Stress, Conscientiousness impacts Admin Overhead, etc.

2. **Model Integration: Qwen trained model never used in demo**
   - *Fix applied*: Updated `LifeStackAgent` in `agent/agent.py` to check for `./lifestack_model`. If found, it loads the GRPO-trained policy via Transformers/Unsloth for all demos and episode runs.
   - *Fix applied*: Documented model switching via `LIFESTACK_MODEL_PATH` env var.

---

## Technical Debt & Memory Hardening ❌ -> ✅

1. **Bug 8: query_texts vs query_embeddings in ChromaDB**
   - *Fix applied*: Switched all memory retrieval to use `memo._embed_text()` explicitly and `query_embeddings` in ChromaDB to ensure semantic consistency.

2. **Bug 10: hardcoded disruption_baseline=2**
   - *Fix applied*: Updated `compute_reward` to accept an optional `disruption_baseline`. `compute_task_reward` now passes `len(task.mutable_world)` from metadata, ensuring the "cascade spread" penalty scales with the actual complexity of the crisis.

3. **Bug 11: store_decision drops negative examples**
   - *Fix applied*: Removed reward thresholds (`<0.5` and `<2.0`) from `LifeStackMemory.store_decision` and `store_trajectory`. The system now captures the full longitudinal record, filtering for "successful" examples only during retrieval time for few-shot prompting.

---

## Final Policy Refinement ❌ -> ✅

1. **Success Termination Logic**: Resolved the "Mutually Exclusive Route" blocker.
   - *Fix applied*: Changed `is_success` verification from `all()` to `any()` in `core/lifestack_env.py`. This ensures that episodes terminate correctly when one of the valid task goals is met, preventing the agent from being penalized for not achieving impossible combinations of exclusive routes.

2. **Explicit Replan Signal**: Promoted Replan Bonus to a primary training objective.
   - *Fix applied*: Implemented a dedicated `reward_replan_fn` in `scripts/train_trl.py`. By exposing this as a standalone GRPO reward function, the model now receives a direct gradient for "recovering" (achieving milestones) specifically after exogenous events, rather than it being absorbed into general task success.

---

## GRPO Independence & Judge Separation ✅

1. **Decoupled Reward Signals**:
   - *Architecture update*: The GRPO training pipeline no longer relies on a single environment evaluation for all rewards.
   - **Static Judges**: `reward_format_fn`, `reward_plausibility_fn`, and `reward_reasoning_fn` now operate through direct JSON parsing and independent semantic verification. They provide gradients for "logical integrity" without needing the simulation engine.
   - **Empirical Judges**: `reward_task_success_fn` and `reward_milestone_fn` remain tied to the `LifeStackEnv` simulation. They provide gradients for "causal outcome", ensuring the agent's logic actually works in the simulated world.
   - **Outcome**: This prevents "signal contamination" where an environment bug or a single gameable path could inflate all reward components simultaneously.

---

## Success Logic Reconciliation ✅

1. **Alignment of Win States**:
   - *Fix applied*: Updated `compute_task_completion_reward` in `core/reward.py` to use `any()` logic.
   - **Reasoning**: This reconciles the reward system with the environment's early termination logic. In crises with multiple resolution paths (e.g., selling an asset vs. negotiating a payment plan), the agent now receives full completion credit (1.0) for reaching any valid goal-state, rather than previously being capped at partial credit.
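
The difference between the two win-state rules is easy to see in a toy example. The functions below are a sketch, not the project's `compute_task_completion_reward`; the route names echo the example in the text, and the partial-credit formula for the stricter rule is an assumption made for contrast.

```python
def completion_any(route_results: dict) -> float:
    """Full credit as soon as any one valid route reaches its goal state."""
    return 1.0 if any(route_results.values()) else 0.0


def completion_all(route_results: dict) -> float:
    """Stricter rule for contrast: credit is the fraction of routes completed."""
    if not route_results:
        return 0.0
    return sum(1 for done in route_results.values() if done) / len(route_results)


# Two mutually exclusive ways out of the same crisis.
results = {"sell_asset": True, "payment_plan": False}
any_credit = completion_any(results)
all_credit = completion_all(results)
```

With exclusive routes the stricter rule can never reach 1.0, so an agent that fully resolved the crisis through one path would still look like a partial failure. Note that `any()` over an empty mapping is `False`, so a task with no routes earns no credit under either rule.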
agent/__init__.py
ADDED
|
File without changes
|
agent/agent.py
ADDED
|
@@ -0,0 +1,289 @@
import os
|
| 2 |
+
import json
|
| 3 |
+
import copy
|
| 4 |
+
from openai import OpenAI
|
| 5 |
+
from core.life_state import LifeMetrics, ResourceBudget
|
| 6 |
+
from core.metric_schema import format_valid_metrics, normalize_metric_path, is_valid_metric_path
|
| 7 |
+
from agent.conflict_generator import ConflictEvent, generate_conflict
|
| 8 |
+
from core.action_space import AgentAction, PrimaryAction, CommunicationAction, apply_action
|
| 9 |
+
from intake.simperson import SimPerson
|
| 10 |
+
|
| 11 |
+
class LifeStackAgent:
|
| 12 |
+
def __init__(self, local_model_path: str = None, api_only: bool = False):
|
| 13 |
+
self.api_key = os.getenv('GROQ_API_KEY')
|
| 14 |
+
self.hf_token = os.getenv('HF_TOKEN')
|
| 15 |
+
self.api_only = api_only # if True, always use Groq, never load local model
|
| 16 |
+
self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')
|
| 17 |
+
|
| 18 |
+
# 1. Check for local folder (Kaggle / local dev)
|
| 19 |
+
if not self.api_only and not self.local_model_path and os.path.exists("./lifestack_model"):
|
| 20 |
+
self.local_model_path = "./lifestack_model"
|
| 21 |
+
|
| 22 |
+
# 2. Fall back to HuggingFace Hub
|
| 23 |
+
if not self.api_only and not self.local_model_path:
|
| 24 |
+
self.local_model_path = "jdsb06/lifestack-agent"
|
| 25 |
+
|
| 26 |
+
# Wire up HF Inference API (Premium Priority - Direct Protocol)
|
| 27 |
+
from huggingface_hub import InferenceClient
|
| 28 |
+
self.hf_client = None
|
| 29 |
+
if self.hf_token:
|
| 30 |
+
print("HF_TOKEN found. Prioritizing Direct Hugging Face Inference.")
|
| 31 |
+
self.hf_client = InferenceClient(token=self.hf_token)
|
| 32 |
+
self.hf_model = "google/gemma-1.1-2b-it"
|
| 33 |
+
|
| 34 |
+
# Wire up Groq as a fallback
|
| 35 |
+
if self.api_key:
|
| 36 |
+
self.client = OpenAI(
|
| 37 |
+
base_url='https://api.groq.com/openai/v1',
|
| 38 |
+
api_key=self.api_key
|
| 39 |
+
)
|
| 40 |
+
self.model = 'llama-3.3-70b-versatile'
|
| 41 |
+
self.tokenizer = None
|
| 42 |
+
self.local_model = None
|
| 43 |
+
self._model_load_attempted = False
|
| 44 |
+
self.memory = [] # Will store last 10 decisions
|
| 45 |
+
|
| 46 |
+
def _try_load_model(self):
|
| 47 |
+
"""Attempt to load the local/HF model lazily on first inference call."""
|
| 48 |
+
self._model_load_attempted = True
|
| 49 |
+
if not self.local_model_path:
|
| 50 |
+
return
|
| 51 |
+
try:
|
| 52 |
+
print(f"📦 Loading GRPO model from {self.local_model_path}...")
|
| 53 |
+
import torch
|
| 54 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 55 |
+
self.tokenizer = AutoTokenizer.from_pretrained(self.local_model_path)
|
| 56 |
+
self.local_model = AutoModelForCausalLM.from_pretrained(
|
| 57 |
+
self.local_model_path,
|
| 58 |
+
torch_dtype=torch.float32,
|
| 59 |
+
device_map=None
|
| 60 |
+
)
|
| 61 |
+
print("✅ GRPO model loaded (CPU mode).")
|
| 62 |
+
except Exception as e:
|
| 63 |
+
print(f"⚠️ Failed to load local model: {e}. Falling back to APIs.")
|
| 64 |
+
self.local_model_path = None
|
| 65 |
+
|
| 66 |
+
def build_prompt(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, few_shot_context: str = "") -> str:
|
| 67 |
+
# 1. Build Status Board
|
| 68 |
+
flat = metrics.flatten()
|
| 69 |
+
status_board = ""
|
| 70 |
+
domains = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
|
| 71 |
+
|
| 72 |
+
for dom in domains:
|
| 73 |
+
status_board += f"\n{dom.upper()}:\n"
|
| 74 |
+
submetrics = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
|
| 75 |
+
for k, v in submetrics.items():
|
| 76 |
+
name = k.split('.')[1]
|
| 77 |
+
icon = "🟢" if v > 70 else ("🟡" if v >= 40 else "🔴")
|
| 78 |
+
status_board += f" {icon} {name:20}: {v:.1f}\n"
|
| 79 |
+
|
| 80 |
+
# 2. Build Memory Section
|
| 81 |
+
memory_str = ""
|
| 82 |
+
if self.memory:
|
| 83 |
+
recent = self.memory[-2:]
|
| 84 |
+
memory_str = "\n--- RECENT HISTORY ---\n"
|
| 85 |
+
for mem in recent:
|
| 86 |
+
memory_str += f"Past decision that worked: [{mem['action']}] → reward [{mem['reward']}]\n"
|
| 87 |
+
|
| 88 |
+
prompt = f"""
|
| 89 |
+
ROLE: You are the LifeStack AI Agent. Your goal is to help the user navigate a life crisis.
|
| 90 |
+
|
| 91 |
+
CURRENT CONFLICT:
|
| 92 |
+
Title: {conflict.title}
|
| 93 |
+
Story: {conflict.story}
|
| 94 |
+
|
| 95 |
+
--- LIFE STATUS BOARD ---
|
| 96 |
+
{status_board}
|
| 97 |
+
|
| 98 |
+
--- RESOURCES REMAINING ---
|
| 99 |
+
Time: {budget.time_hours:.1f} hours
|
| 100 |
+
Money: ${budget.money_dollars:.1f}
|
| 101 |
+
Energy: {budget.energy_units:.1f} units
|
| 102 |
+
{memory_str}
|
| 103 |
+
{few_shot_context}
|
| 104 |
+
|
| 105 |
+
TASK:
|
| 106 |
+
Choose the best action to address the conflict. Respond ONLY with valid JSON following the schema below.
|
| 107 |
+
|
| 108 |
+
SCHEMA:
|
| 109 |
+
{{
|
| 110 |
+
"action_type": "communicate|rest|delegate|negotiate|spend|reschedule|deprioritize",
|
| 111 |
+
"target_domain": "career|finances|relationships|physical_health|mental_wellbeing|time",
|
| 112 |
+
"metric_changes": {{"domain.submetric": "delta_value"}},
|
| 113 |
+
"resource_cost": {{"time": 0.0, "money": 0.0, "energy": 0.0}},
|
| 114 |
+
"description": "one sentence action",
|
| 115 |
+
"recipient": "none|boss|partner|family",
|
| 116 |
+
"message_content": "text",
|
| 117 |
+
"reasoning": "strategy explanation"
|
| 118 |
+
}}
|
| 119 |
+
"""
|
| 120 |
+
return prompt
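Review note: the status-board loop in `build_prompt` maps each metric to a traffic-light icon and a fixed-width row. A minimal standalone sketch of that mapping follows, assuming the three icons are green, yellow, and red circles and that `metrics.flatten()` returns a flat dict keyed by `domain.submetric`. The helper names `status_icon` and `render_domain` are hypothetical and do not appear in the commit.

```python
def status_icon(value: float) -> str:
    """Map a 0-100 metric to an icon using the thresholds from build_prompt."""
    if value > 70:
        return "🟢"
    if value >= 40:
        return "🟡"
    return "🔴"


def render_domain(domain: str, flat: dict) -> str:
    """Render one domain block of the status board from a flat metric dict."""
    lines = [f"{domain.upper()}:"]
    for key, value in flat.items():
        if key.startswith(domain + "."):
            name = key.split(".", 1)[1]
            # Pad the metric name to 20 characters so the values line up.
            lines.append(f"  {status_icon(value)} {name:20}: {value:.1f}")
    return "\n".join(lines)


board = render_domain("career", {"career.workload": 82.0, "time.commute_burden": 10.0})
```

Note that a value of exactly 70 falls in the yellow band, because the green test is a strict `> 70`.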
|
| 121 |
+
|
| 122 |
+
def get_action_for_type(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, forced_type: str, api_only: bool = False) -> "AgentAction":
|
| 123 |
+
"""Generate an action specifically for a given action_type."""
|
| 124 |
+
force_api = self.api_only or api_only
|
| 125 |
+
if not force_api and not self._model_load_attempted:
|
| 126 |
+
self._try_load_model()
|
| 127 |
+
base_prompt = self.build_prompt(metrics, budget, conflict, person)
|
| 128 |
+
forced_prompt = base_prompt + f"\n\nCRITICAL REQUIREMENT: You MUST set 'action_type' to exactly '{forced_type}'."
|
| 129 |
+
return self._get_action_from_prompt(forced_prompt, fallback_type=forced_type, force_api=force_api)
|
| 130 |
+
|
| 131 |
+
def get_action(self, metrics: LifeMetrics, budget: ResourceBudget, conflict: ConflictEvent, person: SimPerson, few_shot_context: str = "", api_only: bool = False) -> "AgentAction":
|
| 132 |
+
# Lazy-load the trained model on first real inference, unless caller forces api_only.
|
| 133 |
+
force_api = self.api_only or api_only
|
| 134 |
+
if not force_api and not self._model_load_attempted:
|
| 135 |
+
self._try_load_model()
|
| 136 |
+
|
| 137 |
+
if not self.local_model and not self.api_key and not self.hf_token:
|
| 138 |
+
return self._fallback_action("Error: No model configured (set GROQ_API_KEY, HF_TOKEN, or LIFESTACK_MODEL_PATH).")
|
| 139 |
+
|
| 140 |
+
prompt = self.build_prompt(metrics, budget, conflict, person, few_shot_context)
|
| 141 |
+
return self._get_action_from_prompt(prompt, force_api=force_api)
|
| 142 |
+
|
| 143 |
+
def _get_action_from_prompt(self, prompt: str, fallback_type: str = "rest", force_api: bool = False) -> "AgentAction":
|
| 144 |
+
"""Run LLM inference in a daemon thread and wait at most 25 seconds for a result."""
|
| 145 |
+
import threading
|
| 146 |
+
import time as _t
|
| 147 |
+
import re
|
| 148 |
+
|
| 149 |
+
result_box = [None] # thread writes its result here
|
| 150 |
+
|
| 151 |
+
def _call():
|
| 152 |
+
try:
|
| 153 |
+
content = None
|
| 154 |
+
used_model_name = "unknown"
|
| 155 |
+
|
| 156 |
+
if self.local_model and not force_api:
|
| 157 |
+
# ── Local / HF Transformers model ─────────────────────
|
| 158 |
+
import torch  # imported here so API-only deployments do not need torch installed
|
| 159 |
+
used_model_name = self.local_model_path
|
| 160 |
+
inputs = self.tokenizer(prompt, return_tensors="pt").to(self.local_model.device)
|
| 161 |
+
with torch.no_grad():
|
| 162 |
+
outputs = self.local_model.generate(
|
| 163 |
+
**inputs,
|
| 164 |
+
max_new_tokens=256,
|
| 165 |
+
temperature=0.3,
|
| 166 |
+
do_sample=True,
|
| 167 |
+
pad_token_id=self.tokenizer.pad_token_id
|
| 168 |
+
)
|
| 169 |
+
content = self.tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
|
| 170 |
+
|
| 171 |
+
elif self.hf_client:
|
| 172 |
+
# ── Hugging Face Inference API (Golden Pool) ──────────
|
| 173 |
+
used_model_name = f"hf:{self.hf_model}"
|
| 174 |
+
try:
|
| 175 |
+
content = self.hf_client.text_generation(
|
| 176 |
+
prompt,
|
| 177 |
+
model=self.hf_model,
|
| 178 |
+
max_new_tokens=350,
|
| 179 |
+
temperature=0.3
|
| 180 |
+
)
|
| 181 |
+
if prompt in content:
|
| 182 |
+
content = content.replace(prompt, "").strip()
|
| 183 |
+
except Exception as hf_err:
|
| 184 |
+
print(f"⚠️ HF Inference Error: {hf_err}. Falling back to Groq.")
|
| 185 |
+
|
| 186 |
+
if content is None:
|
| 187 |
+
# ── Groq API Fallback (Llama-3.3-70B) ──────────────────
|
| 188 |
+
used_model_name = f"groq:{self.model}"
|
| 189 |
+
response = None
|
| 190 |
+
for attempt in range(2):
|
| 191 |
+
try:
|
| 192 |
+
response = self.client.chat.completions.create(
|
| 193 |
+
model=self.model,
|
| 194 |
+
messages=[{"role": "user", "content": prompt}],
|
| 195 |
+
temperature=0.3,
|
| 196 |
+
max_tokens=350,
|
| 197 |
+
timeout=20,
|
| 198 |
+
)
|
| 199 |
+
break
|
| 200 |
+
except Exception as e:
|
| 201 |
+
err = str(e)
|
| 202 |
+
if "429" in err and attempt == 0:
|
| 203 |
+
wait_secs = 6.0
|
| 204 |
+
m = re.search(r'try again in (\d+)m([\d.]+)s', err)
|
| 205 |
+
if m: wait_secs = int(m.group(1)) * 60 + float(m.group(2))
|
| 206 |
+
elif re.search(r'try again in ([\d.]+)s', err):
|
| 207 |
+
wait_secs = float(re.search(r'try again in ([\d.]+)s', err).group(1))
|
| 208 |
+
if wait_secs > 3.0:
|
| 209 |
+
result_box[0] = self._fallback_action(f"Rate limited ({wait_secs:.0f}s).", fallback_type)
|
| 210 |
+
return
|
| 211 |
+
_t.sleep(wait_secs)
|
| 212 |
+
else: raise
|
| 213 |
+
|
| 214 |
+
if response:
|
| 215 |
+
content = response.choices[0].message.content.strip()
|
| 216 |
+
|
| 217 |
+
if content:
|
| 218 |
+
# Parse JSON
|
| 219 |
+
if "```json" in content: content = content.split("```json")[-1].split("```")[0].strip()
|
| 220 |
+
elif "```" in content: content = content.split("```")[1].split("```")[0].strip()
|
| 221 |
+
|
| 222 |
+
data = json.loads(content)
|
| 223 |
+
metric_changes = {}
|
| 224 |
+
for k, v in data.get("metric_changes", {}).items():
|
| 225 |
+
norm_key = normalize_metric_path(k)
|
| 226 |
+
if is_valid_metric_path(norm_key):
|
| 227 |
+
try: metric_changes[norm_key] = float(v)
|
| 228 |
+
except (ValueError, TypeError): pass
|
| 229 |
+
|
| 230 |
+
result_box[0] = AgentAction(
|
| 231 |
+
primary=PrimaryAction(
|
| 232 |
+
action_type=data.get("action_type", "rest"),
|
| 233 |
+
target_domain=data.get("target_domain", "mental_wellbeing"),
|
| 234 |
+
metric_changes=metric_changes,
|
| 235 |
+
resource_cost=data.get("resource_cost", {}),
|
| 236 |
+
description=data.get("description", "Taking a moment.")
|
| 237 |
+
),
|
| 238 |
+
communication=CommunicationAction(
|
| 239 |
+
recipient=data.get("recipient"),
|
| 240 |
+
message_type=data.get("message_type") or "none",
|
| 241 |
+
tone=data.get("tone") or "none",
|
| 242 |
+
content=data.get("message_content") or ""
|
| 243 |
+
) if data.get("recipient") and data.get("recipient") != "none" else None,
|
| 244 |
+
reasoning=data.get("reasoning", "Strategic choice."),
|
| 245 |
+
model_used=used_model_name,
|
| 246 |
+
raw_completion=content
|
| 247 |
+
)
|
| 248 |
+
except Exception as e:
|
| 249 |
+
print(f"LLM call error: {e}")
|
| 250 |
+
result_box[0] = self._fallback_action(f"Exception: {e}", fallback_type)
|
| 251 |
+
|
| 252 |
+
t = threading.Thread(target=_call, daemon=True)
|
| 253 |
+
t.start()
|
| 254 |
+
t.join(timeout=25)
|
| 255 |
+
|
| 256 |
+
if result_box[0] is None:
|
| 257 |
+
return self._fallback_action("LLM timed out.", fallback_type)
|
| 258 |
+
return result_box[0]
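Review note: the timeout in `_get_action_from_prompt` relies on `Thread.join(timeout=...)`, which stops waiting but does not stop the worker. The pattern can be sketched in isolation as below. The helper name `run_with_timeout` is hypothetical, and the sketch returns a caller-supplied default instead of building an `AgentAction`.

```python
import threading
import time


def run_with_timeout(fn, timeout_s: float, default):
    """Run fn() in a daemon thread; return default if no result arrives in time."""
    box = {}

    def _worker():
        try:
            box["value"] = fn()
        except Exception as exc:
            # Failures are recorded and reported as the default, like the agent's fallback.
            box["error"] = exc

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    t.join(timeout=timeout_s)  # returns after timeout_s even if the worker is still running
    return box.get("value", default)


def slow():
    time.sleep(0.5)
    return "late"


quick = run_with_timeout(lambda: "ok", timeout_s=1.0, default="fallback")
timed_out = run_with_timeout(slow, timeout_s=0.05, default="fallback")
```

The worker thread keeps running in the background after the timeout, so a slow API call still consumes a connection until it finishes or the process exits.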
|
| 259 |
+
|
| 260 |
+
def _fallback_action(self, error_msg: str, fallback_type: str = "rest") -> "AgentAction":
|
| 261 |
+
return AgentAction(
|
| 262 |
+
primary=PrimaryAction(
|
| 263 |
+
action_type=fallback_type, target_domain="mental_wellbeing",
|
| 264 |
+
metric_changes={"mental_wellbeing.stress_level": -5.0},
|
| 265 |
+
resource_cost={},
|
| 266 |
+
description="Short breather to regain composure."
|
| 267 |
+
),
|
| 268 |
+
reasoning=f"FALLBACK: {error_msg}"
|
| 269 |
+
)
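Review note: the rate-limit branch in `_get_action_from_prompt` reads a wait time out of the error text with two regexes. A single pattern with an optional minutes group covers both cases. This is a sketch only, assuming the error message contains a phrase such as `try again in 1m2.5s` or `try again in 2.5s`, as the regexes in the diff imply. The function name `parse_retry_after` is hypothetical.

```python
import re


def parse_retry_after(err: str, default: float = 6.0) -> float:
    """Return the suggested wait in seconds, or default if no hint is present."""
    # The minutes part ("1m") is optional; the seconds part is required.
    m = re.search(r"try again in (?:(\d+)m)?([\d.]+)s", err)
    if not m:
        return default
    minutes = int(m.group(1)) if m.group(1) else 0
    return minutes * 60 + float(m.group(2))


wait_long = parse_retry_after("429: rate limit reached, try again in 1m2.5s")
wait_short = parse_retry_after("429: rate limit reached, try again in 2.5s")
wait_unknown = parse_retry_after("429: rate limit reached")
```

With the 6.0 second default and the 3.0 second cutoff used in the diff, an error without a parsable hint goes straight to the fallback action instead of sleeping.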
|
| 270 |
+
|
| 271 |
+
def store_decision(self, action: AgentAction, reward: float):
|
| 272 |
+
self.memory.append({'action': action.primary.description, 'reward': round(reward, 3)})
|
| 273 |
+
if len(self.memory) > 10: self.memory.pop(0)
|
| 274 |
+
|
| 275 |
+
def main():
|
| 276 |
+
if not os.getenv('GROQ_API_KEY'):
|
| 277 |
+
print("CRITICAL ERROR: GROQ_API_KEY environment variable is not set.")
|
| 278 |
+
return
|
| 279 |
+
agent = LifeStackAgent()
|
| 280 |
+
person = SimPerson(name="Sam (Introvert)", openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9)
|
| 281 |
+
conflict = generate_conflict(difficulty=3)
|
| 282 |
+
metrics = LifeMetrics()
|
| 283 |
+
budget = ResourceBudget()
|
| 284 |
+
print(f"--- GENERATING ACTION FOR: {conflict.title} ---")
|
| 285 |
+
action = agent.get_action(metrics, budget, conflict, person)
|
| 286 |
+
print(f"\nType: {action.primary.action_type} | Reasoning: {action.reasoning}")
|
| 287 |
+
|
| 288 |
+
if __name__ == "__main__":
|
| 289 |
+
main()
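Review note: the JSON parsing step in `_get_action_from_prompt` strips Markdown code fences before calling `json.loads`, but it fails when the model adds text around an unfenced object. A slightly more tolerant extraction could look like the sketch below. The helper name `extract_json_object` is hypothetical, and the brace-matching fallback is an addition that the commit does not contain.

```python
import json

FENCE = "`" * 3  # built programmatically so this example can itself live inside a fenced block


def extract_json_object(text: str) -> dict:
    """Return the first JSON object found in an LLM completion."""
    body = text.strip()
    if FENCE in body:
        # Take the text after the first fence and drop an optional "json" language tag.
        body = body.split(FENCE)[1]
        if body.lstrip().lower().startswith("json"):
            body = body.lstrip()[4:]
    # Fall back to the outermost braces so leading or trailing chatter is ignored.
    start, end = body.find("{"), body.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in completion")
    return json.loads(body[start:end + 1])


raw = "Sure!\n" + FENCE + 'json\n{"action_type": "rest"}\n' + FENCE + "\nHope that helps."
parsed = extract_json_object(raw)
```

The caller would still need the existing `except` path, since `json.loads` raises on malformed objects.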
|
agent/conflict_generator.py
ADDED
|
@@ -0,0 +1,620 @@
|
| 1 |
+
import json
|
| 2 |
+
import random
|
| 3 |
+
from dataclasses import dataclass, field, asdict
|
| 4 |
+
|
| 5 |
+
@dataclass
|
| 6 |
+
class ConflictEvent:
|
| 7 |
+
id: str
|
| 8 |
+
title: str
|
| 9 |
+
story: str
|
| 10 |
+
primary_disruption: dict
|
| 11 |
+
decisions_required: list[str]
|
| 12 |
+
resource_budget: dict
|
| 13 |
+
difficulty: int
|
| 14 |
+
|
| 15 |
+
TEMPLATES = [
|
| 16 |
+
# DIFFICULTY 1
|
| 17 |
+
ConflictEvent(
|
| 18 |
+
id="d1_gym",
|
| 19 |
+
title="The Slump",
|
| 20 |
+
story="You haven't seen the inside of a gym in ten days. Your energy is flagging and your favorite jeans feel tight.",
|
| 21 |
+
primary_disruption={"physical_health.fitness": -15.0},
|
| 22 |
+
decisions_required=["Wake up early for a run", "Join a weekend boot camp", "Ignore it and rest"],
|
| 23 |
+
resource_budget={"time": 4.0, "money": 0.0, "energy": 20.0},
|
| 24 |
+
difficulty=1
|
| 25 |
+
),
|
| 26 |
+
ConflictEvent(
|
| 27 |
+
id="d1_bill",
|
| 28 |
+
title="Forgotten Invoice",
|
| 29 |
+
story="A late notice arrived for your electricity bill. It's not a lot, but the late fee is annoying.",
|
| 30 |
+
primary_disruption={"finances.liquidity": -20.0},
|
| 31 |
+
decisions_required=["Pay it now", "Call to dispute the fee", "Set up autopay for next time"],
|
| 32 |
+
resource_budget={"time": 1.0, "money": 100.0, "energy": 5.0},
|
| 33 |
+
difficulty=1
|
| 34 |
+
),
|
| 35 |
+
ConflictEvent(
|
| 36 |
+
id="d1_argument",
|
| 37 |
+
title="Heated Group Chat",
|
| 38 |
+
story="A minor political disagreement in the group chat turned personal. Everyone is being quiet now.",
|
| 39 |
+
primary_disruption={"relationships.social": -20.0},
|
| 40 |
+
decisions_required=["Apologize to the group", "Message the friend privately", "Mute the chat for a week"],
|
| 41 |
+
resource_budget={"time": 2.0, "money": 30.0, "energy": 15.0},
|
| 42 |
+
difficulty=1
|
| 43 |
+
),
|
| 44 |
+
|
| 45 |
+
# DIFFICULTY 2
|
| 46 |
+
ConflictEvent(
|
| 47 |
+
id="d2_project",
|
| 48 |
+
title="The Surge",
|
| 49 |
+
story="Your boss just walked by and dropped a 'small favor' on your desk. It looks like it'll take ten hours.",
|
| 50 |
+
primary_disruption={"career.workload": 25.0, "time.free_hours_per_week": -20.0},
|
| 51 |
+
decisions_required=["Work late all week", "Delegate parts to a junior", "Refuse the assignment"],
|
| 52 |
+
resource_budget={"time": 10.0, "money": 0.0, "energy": 40.0},
|
| 53 |
+
difficulty=2
|
| 54 |
+
),
|
| 55 |
+
ConflictEvent(
|
| 56 |
+
id="d2_car",
|
| 57 |
+
title="Check Engine Light",
|
| 58 |
+
story="Your car started making a rhythmic thumping sound on the highway. The mechanic says the repair isn't cheap.",
|
| 59 |
+
primary_disruption={"finances.liquidity": -30.0, "time.commute_burden": 25.0},
|
| 60 |
+
decisions_required=["Repair it immediately", "Take the bus for a week", "Borrow a car from a friend"],
|
| 61 |
+
resource_budget={"time": 5.0, "money": 500.0, "energy": 10.0},
|
| 62 |
+
difficulty=2
|
| 63 |
+
),
|
| 64 |
+
ConflictEvent(
|
| 65 |
+
id="d2_neglect",
|
| 66 |
+
title="Cold Dinner",
|
| 67 |
+
story="Your partner mentions they feel like 'roommates' lately. You realize you haven't had a real conversation in weeks.",
|
| 68 |
+
primary_disruption={"relationships.romantic": -25.0, "mental_wellbeing.stress_level": 20.0},
|
| 69 |
+
decisions_required=["Plan a surprise date", "Have a long talk tonight", "Buy a thoughtful gift"],
|
| 70 |
+
resource_budget={"time": 6.0, "money": 150.0, "energy": 30.0},
|
| 71 |
+
difficulty=2
|
| 72 |
+
),
|
| 73 |
+
|
| 74 |
+
# DIFFICULTY 3
|
| 75 |
+
ConflictEvent(
|
| 76 |
+
id="d3_interview",
|
| 77 |
+
title="The Opportunity",
|
| 78 |
+
story="An old contact reached out for a dream job interview. You need to prep while keeping your current job afloat.",
|
| 79 |
+
primary_disruption={"career.workload": 20.0, "time.free_hours_per_week": -15.0, "mental_wellbeing.stress_level": 20.0},
|
| 80 |
+
decisions_required=["Intensive weekend prep", "Fake a sick day to interview", "Turn it down to stay stable"],
|
| 81 |
+
resource_budget={"time": 12.0, "money": 50.0, "energy": 50.0},
|
| 82 |
+
difficulty=3
|
| 83 |
+
),
|
| 84 |
+
ConflictEvent(
|
| 85 |
+
id="d3_family",
|
| 86 |
+
title="Family SOS",
|
| 87 |
+
story="Your sibling is going through a rough patch and needs help moving out and some financial support.",
|
| 88 |
+
primary_disruption={"relationships.family": 20.0, "time.free_hours_per_week": -25.0, "finances.liquidity": -20.0},
|
| 89 |
+
decisions_required=["Spend the weekend helping", "Send them money but stay home", "Help them find other movers"],
|
| 90 |
+
resource_budget={"time": 15.0, "money": 400.0, "energy": 60.0},
|
| 91 |
+
difficulty=3
|
| 92 |
+
),
|
| 93 |
+
ConflictEvent(
|
| 94 |
+
id="d3_health",
|
| 95 |
+
title="The Warning Sign",
|
| 96 |
+
story="You had a fainting spell at the office. Tests are expensive, and doctors say you need immediate change.",
|
| 97 |
+
primary_disruption={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0, "finances.liquidity": -40.0},
|
| 98 |
+
decisions_required=["Take a week of medical leave", "Consult a high-end specialist", "Change diet and sleep habits"],
|
| 99 |
+
resource_budget={"time": 20.0, "money": 800.0, "energy": 5.0},
|
| 100 |
+
difficulty=3
|
| 101 |
+
),
|
| 102 |
+
|
| 103 |
+
# DIFFICULTY 4
|
| 104 |
+
ConflictEvent(
|
| 105 |
+
id="d4_review",
|
| 106 |
+
title="Judgment Day",
|
| 107 |
+
story="A major performance review is in three days. Rumors of layoffs are circulating and the atmosphere is tense.",
|
| 108 |
+
primary_disruption={"career.workload": 30.0, "mental_wellbeing.stress_level": 25.0, "relationships.romantic": -15.0, "time.free_hours_per_week": -20.0},
|
| 109 |
+
decisions_required=["Pull all-nighters to prove worth", "Start networking for new roles", "Draft a defensive report"],
|
| 110 |
+
resource_budget={"time": 18.0, "money": 0.0, "energy": 80.0},
|
| 111 |
+
difficulty=4
|
| 112 |
+
),
|
| 113 |
+
ConflictEvent(
|
| 114 |
+
id="d4_move",
|
| 115 |
+
title="The Big Relocation",
|
| 116 |
+
story="You've decided to move across the country for growth. The logistics are a nightmare and friends are sad to see you go.",
|
| 117 |
+
primary_disruption={"finances.liquidity": -50.0, "relationships.social": -30.0, "career.growth_trajectory": 20.0, "time.admin_overhead": 30.0},
|
| 118 |
+
decisions_required=["Hire full-service movers", "Host a series of farewell dinners", "DIY pack everything"],
|
| 119 |
+
resource_budget={"time": 30.0, "money": 1500.0, "energy": 100.0},
|
| 120 |
+
difficulty=4
|
| 121 |
+
),
|
| 122 |
+
ConflictEvent(
|
| 123 |
+
id="d4_audit",
|
| 124 |
+
title="Tax Audit",
|
| 125 |
+
story="The IRS has flagged your last three years of returns. You need to dig through thousands of documents while paying a CPA.",
|
| 126 |
+
primary_disruption={"finances.long_term_health": -20.0, "mental_wellbeing.stress_level": 30.0, "time.admin_overhead": 40.0, "finances.liquidity": -15.0},
|
| 127 |
+
decisions_required=["Spend nights scanning receipts", "Hire a tax lawyer", "Try to settle immediately"],
|
| 128 |
+
resource_budget={"time": 25.0, "money": 1000.0, "energy": 40.0},
|
| 129 |
+
difficulty=4
|
| 130 |
+
),
|
| 131 |
+
|
| 132 |
+
# DIFFICULTY 5
|
| 133 |
+
ConflictEvent(
|
| 134 |
+
id="d5_friday",
|
| 135 |
+
title="Friday 6PM",
|
| 136 |
+
story="Your flight just got cancelled. Your card declined trying to rebook. Your boss moved Monday deadline to Sunday.",
|
| 137 |
+
primary_disruption={"career.workload": 35.0, "finances.liquidity": -40.0, "mental_wellbeing.stress_level": 30.0, "time.free_hours_per_week": -25.0},
|
| 138 |
+
decisions_required=["Book a bus and work on it", "Call boss to negotiate", "Crash at a nearby friend's"],
|
| 139 |
+
resource_budget={"time": 10.0, "money": 500.0, "energy": 60.0},
|
| 140 |
+
difficulty=5
|
| 141 |
+
),
|
| 142 |
+
ConflictEvent(
|
| 143 |
+
id="d5_storm",
|
| 144 |
+
title="The Perfect Storm",
|
| 145 |
+
story="Your firm lost its biggest client, your partner moved out, and your car got towed, all on the same Tuesday.",
|
| 146 |
+
primary_disruption={"career.stability": -30.0, "relationships.romantic": -25.0, "finances.debt_pressure": 35.0, "physical_health.energy": -25.0},
|
| 147 |
+
decisions_required=["Find an emergency side hustle", "Beg partner for a second chance", "Take a mental health day"],
|
| 148 |
+
resource_budget={"time": 8.0, "money": 200.0, "energy": 20.0},
|
| 149 |
+
difficulty=5
|
| 150 |
+
),
|
| 151 |
+
ConflictEvent(
|
| 152 |
+
id="d5_burnout",
|
| 153 |
+
title="The Total Collapse",
|
| 154 |
+
story="You can't get out of bed. Your body has quit, your motivation is gone, and work emails are piling into the hundreds.",
|
| 155 |
+
primary_disruption={"mental_wellbeing.motivation": -40.0, "physical_health.sleep_quality": -30.0, "career.satisfaction": -35.0, "relationships.family": -20.0},
|
| 156 |
+
decisions_required=["Request indefinite medical leave", "Disconnect all electronics", "Let it all burn and sleep"],
|
| 157 |
+
resource_budget={"time": 40.0, "money": 2000.0, "energy": 0.0},
|
| 158 |
+
difficulty=5
|
| 159 |
+
),
|
| 160 |
+
|
| 161 |
+
# ── TRANSPORT SCENARIOS (difficulty 1–5, all modes) ──────────────────
|
| 162 |
+
ConflictEvent(
|
| 163 |
+
id="d1_flat_tyre",
|
| 164 |
+
title="Flat Tyre",
|
| 165 |
+
story="Your bike tyre went flat halfway to work. You're going to be late to a team standup.",
|
| 166 |
+
primary_disruption={"time.commute_burden": 20.0, "mental_wellbeing.stress_level": 10.0},
|
| 167 |
+
decisions_required=["Call a cab", "Lock the bike and walk", "Ask to dial into the standup"],
|
| 168 |
+
resource_budget={"time": 2.0, "money": 30.0, "energy": 15.0},
|
| 169 |
+
difficulty=1
|
| 170 |
+
),
|
| 171 |
+
ConflictEvent(
|
| 172 |
+
id="d2_train_delay",
|
| 173 |
+
title="Train Delay",
|
| 174 |
+
story="Your morning train is delayed 90 minutes due to a signal failure. You have a 9 AM client meeting.",
|
| 175 |
+
primary_disruption={"time.commute_burden": 30.0, "career.workload": 15.0, "mental_wellbeing.stress_level": 15.0},
|
| 176 |
+
decisions_required=["Dial in remotely", "Take a rideshare", "Reschedule the meeting"],
|
| 177 |
+
resource_budget={"time": 3.0, "money": 80.0, "energy": 20.0},
|
| 178 |
+
difficulty=2
|
| 179 |
+
),
|
| 180 |
+
ConflictEvent(
|
| 181 |
+
id="d3_car_breakdown",
|
| 182 |
+
title="Breakdown on the Highway",
|
| 183 |
+
story="Your car engine seized on the freeway during rush hour. Tow + rental = $400 minimum.",
|
| 184 |
+
primary_disruption={"finances.liquidity": -35.0, "time.commute_burden": 40.0, "mental_wellbeing.stress_level": 20.0},
|
| 185 |
+
decisions_required=["Rent a replacement car", "Rideshare all week", "Borrow from a friend"],
|
| 186 |
+
resource_budget={"time": 6.0, "money": 500.0, "energy": 30.0},
|
| 187 |
+
difficulty=3
|
| 188 |
+
),
|
| 189 |
+
ConflictEvent(
|
| 190 |
+
id="d4_rideshare_surge",
|
| 191 |
+
title="Surge Pricing Nightmare",
|
| 192 |
+
story="A major event cancelled all transit. Rideshares are 9x surge. You're presenting in 2 hours.",
|
| 193 |
+
primary_disruption={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0, "time.free_hours_per_week": -10.0},
|
| 194 |
+
decisions_required=["Pay the surge", "Organise a carpool", "Present remotely"],
|
| 195 |
+
resource_budget={"time": 4.0, "money": 200.0, "energy": 40.0},
|
| 196 |
+
difficulty=4
|
| 197 |
+
),
|
| 198 |
+
ConflictEvent(
|
| 199 |
+
id="d5_transit_strike",
|
| 200 |
+
title="City-Wide Transit Strike",
|
| 201 |
+
story="All buses, trains, and rideshares are on indefinite strike. Your car is in the shop.",
|
| 202 |
+
primary_disruption={"time.commute_burden": 50.0, "finances.liquidity": -30.0, "career.workload": 20.0, "mental_wellbeing.stress_level": 25.0},
|
| 203 |
+
decisions_required=["Negotiate remote work for the week", "Rent an e-bike/scooter", "Crash at a colleague's place"],
|
| 204 |
+
resource_budget={"time": 15.0, "money": 400.0, "energy": 50.0},
|
| 205 |
+
difficulty=5
|
| 206 |
+
),
|
| 207 |
+
]
|
| 208 |
+
|
| 209 |
+
def generate_conflict(difficulty: int = None) -> ConflictEvent:
|
| 210 |
+
if difficulty:
|
| 211 |
+
pool = [t for t in TEMPLATES if t.difficulty == difficulty]
|
| 212 |
+
else:
|
| 213 |
+
pool = TEMPLATES
|
| 214 |
+
return random.choice(pool)
|
| 215 |
+
|
| 216 |
+
def escalate_conflict(conflict: ConflictEvent) -> ConflictEvent:
|
| 217 |
+
new_disruption = {k: v * 1.4 for k, v in conflict.primary_disruption.items()}
|
| 218 |
+
new_budget = {k: v * 0.7 for k, v in conflict.resource_budget.items()}
|
| 219 |
+
new_difficulty = min(5, conflict.difficulty + 1)
|
| 220 |
+
|
| 221 |
+
return ConflictEvent(
|
| 222 |
+
id=f"{conflict.id}_escalated",
|
| 223 |
+
title=f"ESCALATED: {conflict.title}",
|
| 224 |
+
story=f"Current situation just got much worse. {conflict.story}",
|
| 225 |
+
primary_disruption=new_disruption,
|
| 226 |
+
decisions_required=conflict.decisions_required,
|
| 227 |
+
resource_budget=new_budget,
|
| 228 |
+
difficulty=new_difficulty
|
| 229 |
+
)
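Review note: `escalate_conflict` scales every disruption by 1.4, shrinks every budget entry to 70%, and caps difficulty at 5. The arithmetic can be checked on plain dicts with the sketch below, which uses the `d2_car` template values from this file. The function name `escalate_numbers` is hypothetical and stands in for the dataclass-based version.

```python
def escalate_numbers(disruption: dict, budget: dict, difficulty: int):
    """Apply the same scaling as escalate_conflict to plain dicts."""
    new_disruption = {k: v * 1.4 for k, v in disruption.items()}
    new_budget = {k: v * 0.7 for k, v in budget.items()}
    new_difficulty = min(5, difficulty + 1)  # difficulty never exceeds 5
    return new_disruption, new_budget, new_difficulty


new_disruption, new_budget, new_difficulty = escalate_numbers(
    {"finances.liquidity": -30.0, "time.commute_burden": 25.0},
    {"time": 5.0, "money": 500.0, "energy": 10.0},
    difficulty=2,
)
```

For this input the liquidity hit grows from -30 to about -42 and the money budget drops from 500 to about 350. Repeated escalation keeps multiplying the values, and nothing in the function clamps them to a metric range.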
|
| 230 |
+
|
| 231 |
+
def adaptive_escalate(conflict: ConflictEvent, agent_history: list) -> tuple:
|
| 232 |
+
"""Decide whether to escalate, ease, or hold based on past performance.
|
| 233 |
+
|
| 234 |
+
Args:
|
| 235 |
+
conflict: Current conflict event.
|
| 236 |
+
agent_history: List of (conflict_id, reward) tuples from past episodes.
|
| 237 |
+
|
| 238 |
+
Returns:
|
| 239 |
+
(new_conflict, reason): Updated conflict and a human-readable reason string.
|
| 240 |
+
"""
|
| 241 |
+
# Group history by conflict id prefix (strip _escalated suffix)
|
| 242 |
+
from collections import defaultdict
|
| 243 |
+
by_type = defaultdict(list)
|
| 244 |
+
for cid, reward in agent_history:
|
| 245 |
+
base_id = cid.replace("_escalated", "")
|
| 246 |
+
by_type[base_id].append(reward)
|
| 247 |
+
|
| 248 |
+
base_id = conflict.id.replace("_escalated", "")
|
| 249 |
+
past = by_type.get(base_id, [])
|
| 250 |
+
|
| 251 |
+
if len(past) >= 3:
|
| 252 |
+
avg = sum(past) / len(past)
|
| 253 |
+
if avg > 0.7:
|
| 254 |
+
# Agent is crushing this type: escalate
|
| 255 |
+
escalated = escalate_conflict(conflict)
|
| 256 |
+
return escalated, f"Agent averaged {avg:.2f} on {base_id} ({len(past)} runs) → escalating"
|
| 257 |
+
elif avg < 0.4:
|
| 258 |
+
# Agent is struggling: reduce difficulty
|
| 259 |
+
new_diff = max(1, conflict.difficulty - 1)
|
| 260 |
+
eased = generate_conflict(difficulty=new_diff)
|
| 261 |
+
return eased, f"Agent averaged {avg:.2f} on {base_id} ({len(past)} runs) → easing to difficulty {new_diff}"
|
| 262 |
+
|
| 263 |
+
# Fewer than 3 runs, or the average sits between 0.4 and 0.7: no change
|
| 264 |
+
return conflict, "insufficient history or mid-range average: holding"
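Review note: the decision rule in `adaptive_escalate` depends only on the past rewards for one conflict type, so it can be isolated as a pure function and tested without generating conflicts. The sketch below mirrors the thresholds in the diff (at least 3 runs, escalate above 0.7, ease below 0.4). The function name `escalation_decision` and its string return values are hypothetical.

```python
def escalation_decision(rewards, min_runs: int = 3, high: float = 0.7, low: float = 0.4) -> str:
    """Return 'escalate', 'ease', or 'hold' for a list of past rewards."""
    if len(rewards) < min_runs:
        return "hold"  # not enough history to judge
    avg = sum(rewards) / len(rewards)
    if avg > high:
        return "escalate"
    if avg < low:
        return "ease"
    return "hold"  # mid-range average: keep the current difficulty


strong = escalation_decision([0.9, 0.8, 0.75])
weak = escalation_decision([0.1, 0.2, 0.3])
middling = escalation_decision([0.5, 0.6, 0.5])
too_few = escalation_decision([0.9, 0.9])
```

Two different situations end in "hold" here: too few runs, and an average inside the 0.4 to 0.7 band.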
|
| 265 |
+
|
| 266 |
+
def save_templates():
|
| 267 |
+
import os
|
| 268 |
+
data_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "data", "conflicts.json")
|
| 269 |
+
with open(data_path, 'w') as f:
|
| 270 |
+
json.dump([asdict(t) for t in TEMPLATES], f, indent=4)
|
| 271 |
+
print(f"Saved {len(TEMPLATES)} templates to {data_path}")
|
| 272 |
+
|
| 273 |
+
def main():
|
| 274 |
+
save_templates()
|
| 275 |
+
print("\n--- GENERATED CONFLICT SAMPLES ---")
|
| 276 |
+
for d in range(1, 6):
|
| 277 |
+
c = generate_conflict(d)
|
| 278 |
+
print(f"\n[DIFFICULTY {d}] {c.title}")
|
| 279 |
+
print(f"Story: {c.story}")
|
| 280 |
+
print(f"Primary Disruption: {c.primary_disruption}")
|
| 281 |
+
print(f"Resource Budget: {c.resource_budget}")
|
| 282 |
+
|
| 283 |
+
if __name__ == "__main__":
|
| 284 |
+
main()
|
| 285 |
+
|
| 286 |
+
from core.task import Task, Route, ExoEvent, Milestone
|
| 287 |
+
|
| 288 |
+
class TaskGenerator:
|
| 289 |
+
def generate(self, domain: str = None, difficulty: int = None) -> Task:
|
| 290 |
+
diff = difficulty or 3
|
| 291 |
+
if domain == "transport_crisis":
|
| 292 |
+
return self.generate_transport_crisis(diff)
|
| 293 |
+
elif domain == "flight_crisis": # kept as explicit sub-type
|
| 294 |
+
return self.generate_flight_crisis(diff)
|
| 295 |
+
elif domain == "code_merge_crisis":
|
| 296 |
+
return self.generate_code_merge_crisis(diff)
|
| 297 |
+
elif domain == "career":
|
| 298 |
+
return self.generate_career(diff)
|
| 299 |
+
elif domain == "finances":
|
| 300 |
+
return self.generate_finances(diff)
|
| 301 |
+
elif domain == "relationships":
|
| 302 |
+
return self.generate_relationships(diff)
|
| 303 |
+
elif domain == "physical_health":
|
| 304 |
+
return self.generate_physical_health(diff)
|
| 305 |
+
elif domain == "mental_wellbeing":
|
| 306 |
+
return self.generate_mental_wellbeing(diff)
|
| 307 |
+
elif domain == "time":
|
| 308 |
+
return self.generate_time(diff)
|
| 309 |
+
else:
|
| 310 |
+
return self.generate_transport_crisis(diff)
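Review note: the `if`/`elif` chain in `generate` maps each domain string to one `generate_*` method and falls back to the transport crisis for anything else. The same behaviour could be written as a lookup table. The sketch below is a self-contained illustration only: `DispatchSketch` and its placeholder methods are hypothetical stand-ins that return strings instead of `Task` objects, and only two domains are listed to keep it short.

```python
from typing import Callable, Dict, Optional


class DispatchSketch:
    """Hypothetical stand-in for TaskGenerator; method bodies are placeholders."""

    def generate_transport_crisis(self, difficulty: int) -> str:
        return f"transport_crisis@{difficulty}"

    def generate_career(self, difficulty: int) -> str:
        return f"career@{difficulty}"

    def generate(self, domain: Optional[str] = None,
                 difficulty: Optional[int] = None) -> str:
        diff = difficulty or 3  # same default as the original method
        # Explicit table: only listed domains are reachable by name.
        handlers: Dict[str, Callable[[int], str]] = {
            "transport_crisis": self.generate_transport_crisis,
            "career": self.generate_career,
        }
        # Unknown or missing domains fall back to the transport crisis,
        # matching the final else branch of the original chain.
        handler = handlers.get(domain or "", self.generate_transport_crisis)
        return handler(diff)
```

An explicit table is used rather than `getattr(self, f"generate_{domain}")` so that helper methods such as `generate_train_delay` do not silently become valid domain names.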
|
| 311 |
+
|
| 312 |
+
# -- TRANSPORT CRISIS: master dispatcher ------------------------------
|
| 313 |
+
def generate_transport_crisis(self, difficulty: int) -> Task:
|
| 314 |
+
"""Randomly choose one of 5 real-world transport disruption modes."""
|
| 315 |
+
return random.choice([
|
| 316 |
+
self.generate_flight_crisis,
|
| 317 |
+
self.generate_train_delay,
|
| 318 |
+
self.generate_car_breakdown,
|
| 319 |
+
self.generate_rideshare_surge,
|
| 320 |
+
self.generate_transit_strike,
|
| 321 |
+
])(difficulty)
|
| 322 |
+
|
| 323 |
+
def generate_train_delay(self, difficulty: int) -> Task:
|
| 324 |
+
routes = [
|
| 325 |
+
Route(id="dial_in", name="Dial In Remotely", description="Join the meeting via video call from the station.", required_action_types=["communicate"], preconditions={}, consequences={"meeting_attended": True}, closes_routes=["rideshare"], milestones_unlocked=["m1"], final_reward=2.0),
|
| 326 |
+
Route(id="rideshare", name="Take a Rideshare", description="Pay for a cab/rideshare and make it there in time.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"arrived_on_time": True}, closes_routes=["dial_in"], milestones_unlocked=["m2"], final_reward=2.5),
|
| 327 |
+
Route(id="reschedule", name="Reschedule the Meeting", description="Negotiate a new meeting time with all parties.", required_action_types=["communicate"], preconditions={}, consequences={"meeting_rescheduled": True}, closes_routes=[], milestones_unlocked=["m3"], final_reward=1.5),
|
| 328 |
+
]
|
| 329 |
+
milestones = [
|
| 330 |
+
Milestone(id="m1", description="Meeting attended on time remotely.", condition_key="meeting_attended", condition_value=True, reward=1.0),
|
| 331 |
+
Milestone(id="m2", description="Made it to the office despite the delay.", condition_key="arrived_on_time", condition_value=True, reward=1.5),
|
| 332 |
+
Milestone(id="m3", description="Meeting rescheduled without relationship cost.", condition_key="meeting_rescheduled", condition_value=True, reward=0.8),
|
| 333 |
+
]
|
| 334 |
+
events = [
|
| 335 |
+
ExoEvent(step=2, probability=0.8, id="delay_extended", description="Train delay extended by another 45 minutes.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 336 |
+
ExoEvent(step=4, probability=0.6, id="rideshare_surge", description="Rideshares now showing 3x surge pricing.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 337 |
+
]
|
| 338 |
+
return Task(
|
| 339 |
+
id="train_delay_task", domain="transport_crisis", goal="Navigate Train Delay Crisis",
|
| 340 |
+
constraints={"budget_max": 150, "deadline_step": 8},
|
| 341 |
+
hidden_state={"platform_reassigned": False},
|
| 342 |
+
mutable_world={"time.commute_burden": 30.0, "mental_wellbeing.stress_level": 15.0},
|
| 343 |
+
visible_world={"time.commute_burden": 30.0, "mental_wellbeing.stress_level": 15.0},
|
| 344 |
+
success_conditions=[{"key": "meeting_attended", "value": True}, {"key": "arrived_on_time", "value": True}, {"key": "meeting_rescheduled", "value": True}],
|
| 345 |
+
failure_conditions=[{"key": "finances.liquidity", "value": 10.0, "op": "lt"}],
|
| 346 |
+
event_schedule=events, viable_routes=routes, milestones=milestones,
|
| 347 |
+
horizon=12 + difficulty * 2, difficulty=difficulty,
|
| 348 |
+
domain_metadata={"story": "Signal failure has brought the entire line to a halt.", "transport_mode": "train"}
|
| 349 |
+
)
|
| 350 |
+
|
| 351 |
+
def generate_car_breakdown(self, difficulty: int) -> Task:
|
| 352 |
+
routes = [
|
| 353 |
+
Route(id="rent_car", name="Rent a Replacement Car", description="Call a rental agency and get mobile again.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"mobile": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.5),
|
| 354 |
+
Route(id="rideshare_week", name="Rideshare for the Week", description="Use rideshares until the car is repaired.", required_action_types=["spend"], preconditions={}, consequences={"transport_sorted": True}, closes_routes=["rent_car"], milestones_unlocked=["m2"], final_reward=1.5),
|
| 355 |
+
Route(id="borrow_car", name="Borrow a Friend's Car", description="Call around and borrow a vehicle.", required_action_types=["communicate"], preconditions={}, consequences={"borrowed": True}, closes_routes=[], milestones_unlocked=["m3"], final_reward=2.0),
|
| 356 |
+
]
|
| 357 |
+
milestones = [
|
| 358 |
+
Milestone(id="m1", description="Replacement vehicle secured.", condition_key="mobile", condition_value=True, reward=1.5),
|
| 359 |
+
Milestone(id="m2", description="Transport plan for the week sorted.", condition_key="transport_sorted", condition_value=True, reward=1.0),
|
| 360 |
+
Milestone(id="m3", description="Vehicle borrowed without relationship cost.", condition_key="borrowed", condition_value=True, reward=1.2),
|
| 361 |
+
]
|
| 362 |
+
events = [
|
| 363 |
+
ExoEvent(step=2, probability=1.0, id="repair_estimate", description="Mechanic confirms repair takes 3-5 days, not 1.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 364 |
+
ExoEvent(step=5, probability=0.7, id="rental_shortage", description="Rental agencies report no compact cars available.", world_mutation={}, hidden_state_mutation={}, closes_routes=["rent_car"]),
|
| 365 |
+
]
|
| 366 |
+
return Task(
|
| 367 |
+
id="car_breakdown_task", domain="transport_crisis", goal="Recover from Car Breakdown",
|
| 368 |
+
constraints={"budget_max": 500, "deadline_step": 10},
|
| 369 |
+
hidden_state={"tow_dispatched": False},
|
| 370 |
+
mutable_world={"finances.liquidity": -35.0, "time.commute_burden": 40.0},
|
| 371 |
+
visible_world={"finances.liquidity": -35.0, "time.commute_burden": 40.0},
|
| 372 |
+
success_conditions=[{"key": "mobile", "value": True}, {"key": "transport_sorted", "value": True}, {"key": "borrowed", "value": True}],
|
| 373 |
+
failure_conditions=[{"key": "finances.liquidity", "value": 0.0, "op": "le"}],
|
| 374 |
+
event_schedule=events, viable_routes=routes, milestones=milestones,
|
| 375 |
+
horizon=14 + difficulty * 2, difficulty=difficulty,
|
| 376 |
+
domain_metadata={"story": "Engine seized on the highway. Car is in the shop for days.", "transport_mode": "car"}
|
| 377 |
+
)
|
| 378 |
+
|
| 379 |
+
def generate_rideshare_surge(self, difficulty: int) -> Task:
|
| 380 |
+
routes = [
|
| 381 |
+
Route(id="pay_surge", name="Pay the Surge Price", description="Absorb the cost and get there on time.", required_action_types=["spend"], preconditions={}, consequences={"arrived": True}, closes_routes=["remote"], milestones_unlocked=["m1"], final_reward=2.0),
|
| 382 |
+
Route(id="carpool", name="Organise a Carpool", description="Find colleagues or strangers going the same way.", required_action_types=["communicate", "negotiate"], preconditions={}, consequences={"carpooled": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=3.0),
|
| 383 |
+
Route(id="remote", name="Present Remotely", description="Negotiate to dial in instead of attending in person.", required_action_types=["communicate"], preconditions={}, consequences={"remote_approved": True}, closes_routes=["pay_surge"], milestones_unlocked=["m3"], final_reward=1.5),
|
| 384 |
+
]
|
| 385 |
+
milestones = [
|
| 386 |
+
Milestone(id="m1", description="Arrived at venue on time.", condition_key="arrived", condition_value=True, reward=1.5),
|
| 387 |
+
Milestone(id="m2", description="Carpool arranged - zero cost.", condition_key="carpooled", condition_value=True, reward=2.0),
|
| 388 |
+
Milestone(id="m3", description="Remote attendance approved.", condition_key="remote_approved", condition_value=True, reward=1.0),
|
| 389 |
+
]
|
| 390 |
+
events = [
|
| 391 |
+
ExoEvent(step=1, probability=1.0, id="surge_spike", description="Surge jumped to 12x. All buses cancelled.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 392 |
+
ExoEvent(step=3, probability=0.9, id="meeting_reminder", description="Organiser sends a 30-minute warning.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 393 |
+
]
|
| 394 |
+
return Task(
|
| 395 |
+
id="rideshare_surge_task", domain="transport_crisis", goal="Get to the Presentation on Time",
|
| 396 |
+
constraints={"budget_max": 200, "deadline_step": 6},
|
| 397 |
+
hidden_state={},
|
| 398 |
+
mutable_world={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0},
|
| 399 |
+
visible_world={"finances.liquidity": -50.0, "mental_wellbeing.stress_level": 30.0},
|
| 400 |
+
success_conditions=[{"key": "arrived", "value": True}, {"key": "carpooled", "value": True}, {"key": "remote_approved", "value": True}],
|
| 401 |
+
failure_conditions=[],
|
| 402 |
+
event_schedule=events, viable_routes=routes, milestones=milestones,
|
| 403 |
+
horizon=8 + difficulty * 2, difficulty=difficulty,
|
| 404 |
+
domain_metadata={"story": "A major city event caused city-wide rideshare surge on your big presentation day.", "transport_mode": "rideshare"}
|
| 405 |
+
)
|
| 406 |
+
|
| 407 |
+
def generate_transit_strike(self, difficulty: int) -> Task:
|
| 408 |
+
routes = [
|
| 409 |
+
Route(id="wfh_negotiate", name="Negotiate Full Remote Week", description="Get manager approval to WFH for the strike duration.", required_action_types=["communicate", "negotiate"], preconditions={}, consequences={"wfh_approved": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=3.0),
|
| 410 |
+
Route(id="micromobility", name="Rent E-Bike / Scooter", description="Use micro-mobility for the week.", required_action_types=["spend"], preconditions={}, consequences={"transport_secured": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
|
| 411 |
+
Route(id="colleague_crash", name="Crash at a Colleague's Place", description="Stay near the office temporarily.", required_action_types=["communicate"], preconditions={}, consequences={"accommodation_sorted": True}, closes_routes=[], milestones_unlocked=["m3"], final_reward=1.5),
|
| 412 |
+
]
|
| 413 |
+
milestones = [
|
| 414 |
+
Milestone(id="m1", description="WFH approved for the strike period.", condition_key="wfh_approved", condition_value=True, reward=2.0),
|
| 415 |
+
Milestone(id="m2", description="Micro-mobility solution in place.", condition_key="transport_secured", condition_value=True, reward=1.0),
|
| 416 |
+
Milestone(id="m3", description="Temporary accommodation sorted.", condition_key="accommodation_sorted", condition_value=True, reward=0.8),
|
| 417 |
+
]
|
| 418 |
+
events = [
|
| 419 |
+
ExoEvent(step=2, probability=0.9, id="strike_extended", description="Union announces the strike could last 2 weeks.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 420 |
+
ExoEvent(step=5, probability=0.7, id="scooter_shortage", description="E-bike rental companies sold out in your area.", world_mutation={}, hidden_state_mutation={}, closes_routes=["micromobility"]),
|
| 421 |
+
]
|
| 422 |
+
return Task(
|
| 423 |
+
id="transit_strike_task", domain="transport_crisis", goal="Survive City-Wide Transit Strike",
|
| 424 |
+
constraints={"budget_max": 400, "deadline_step": 14},
|
| 425 |
+
hidden_state={},
|
| 426 |
+
mutable_world={"time.commute_burden": 50.0, "mental_wellbeing.stress_level": 25.0},
|
| 427 |
+
visible_world={"time.commute_burden": 50.0, "mental_wellbeing.stress_level": 25.0},
|
| 428 |
+
success_conditions=[{"key": "wfh_approved", "value": True}, {"key": "transport_secured", "value": True}, {"key": "accommodation_sorted", "value": True}],
|
| 429 |
+
failure_conditions=[],
|
| 430 |
+
event_schedule=events, viable_routes=routes, milestones=milestones,
|
| 431 |
+
horizon=18 + difficulty * 2, difficulty=difficulty,
|
| 432 |
+
domain_metadata={"story": "All public transport workers walked off the job. The city is gridlocked.", "transport_mode": "transit_strike"}
|
| 433 |
+
)
|
| 434 |
+
|
| 435 |
+
def generate_flight_crisis(self, difficulty: int) -> Task:
|
| 436 |
+
routes = [
|
| 437 |
+
Route(id="rebook_premium", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], preconditions={}, consequences={"flight_rebooked": True}, closes_routes=["wait_lounge"], milestones_unlocked=["m1"], final_reward=2.5),
|
| 438 |
+
Route(id="wait_lounge", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], preconditions={}, consequences={"caught_up": True}, closes_routes=["rebook_premium"], milestones_unlocked=["m2"], final_reward=1.8),
|
| 439 |
+
]
|
| 440 |
+
milestones = [
|
| 441 |
+
Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="flight_rebooked", condition_value=True, reward=1.0),
|
| 442 |
+
Milestone(id="m2", description="Caught up with all emergency Slack messages", condition_key="caught_up", condition_value=True, reward=0.8),
|
| 443 |
+
]
|
| 444 |
+
events = [
|
| 445 |
+
ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 446 |
+
ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={}, hidden_state_mutation={}, closes_routes=["wait_lounge"]),
|
| 447 |
+
]
|
| 448 |
+
return Task(
|
| 449 |
+
id="flight_crisis_task", domain="flight_crisis", goal="Survive Airport Cancellation",
|
| 450 |
+
constraints={"budget_max": 800, "deadline_step": 10},
|
| 451 |
+
hidden_state={"lounge_capacity": 100},
|
| 452 |
+
mutable_world={"mental_wellbeing.stress_level": 25.0, "time.free_hours_per_week": -10.0},
|
| 453 |
+
visible_world={"mental_wellbeing.stress_level": 25.0, "time.free_hours_per_week": -10.0},
|
| 454 |
+
success_conditions=[{"key": "flight_rebooked", "value": True}, {"key": "caught_up", "value": True}],
|
| 455 |
+
failure_conditions=[],
|
| 456 |
+
event_schedule=events, viable_routes=routes, milestones=milestones,
|
| 457 |
+
horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "A major storm grounded commercial flights."}
|
| 458 |
+
)
|
| 459 |
+
|
| 460 |
+
def generate_code_merge_crisis(self, difficulty: int) -> Task:
|
| 461 |
+
routes = [
|
| 462 |
+
Route(id="revert_commit", name="Revert Commit", description="Quickly revert the broken merge to unblock the team.", required_action_types=["delegate", "communicate"], preconditions={}, consequences={"pipeline_unblocked": True}, closes_routes=["hotfix"], milestones_unlocked=["unblocked"], final_reward=1.5),
|
| 463 |
+
Route(id="hotfix", name="Patch Forward", description="Find the logic error and push a hotfix.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"bug_resolved": True}, closes_routes=["revert_commit"], milestones_unlocked=["fixed"], final_reward=3.0),
|
| 464 |
+
]
|
| 465 |
+
milestones = [
|
| 466 |
+
Milestone(id="unblocked", description="CI pipeline is green again", condition_key="pipeline_unblocked", condition_value=True, reward=1.0),
|
| 467 |
+
Milestone(id="fixed", description="Bug resolved without losing features", condition_key="bug_resolved", condition_value=True, reward=2.0),
|
| 468 |
+
]
|
| 469 |
+
events = [
|
| 470 |
+
ExoEvent(step=3, probability=0.8, id="cto_ping", description="CTO asks for an ETA on the fix.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 471 |
+
]
|
| 472 |
+
return Task(
|
| 473 |
+
id="code_merge_task", domain="code_merge_crisis", goal="Resolve Production Outage",
|
| 474 |
+
constraints={"budget_max": 1000, "deadline_step": 8},
|
| 475 |
+
hidden_state={},
|
| 476 |
+
mutable_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
|
| 477 |
+
visible_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
|
| 478 |
+
success_conditions=[{"key": "pipeline_unblocked", "value": True}, {"key": "bug_resolved", "value": True}],
|
| 479 |
+
failure_conditions=[],
|
| 480 |
+
event_schedule=events, viable_routes=routes, milestones=milestones,
|
| 481 |
+
horizon=10 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "A botched merge just took down the staging environment."}
|
| 482 |
+
)
|
| 483 |
+
|
| 484 |
+
def generate_career(self, difficulty: int) -> Task:
|
| 485 |
+
routes = [
|
| 486 |
+
Route(id="r1", name="Negotiate Workload", description="Discuss with manager to reduce workload.", required_action_types=["communicate"], preconditions={}, consequences={"workload_reduced": True}, closes_routes=["r2"], milestones_unlocked=["m1"], final_reward=2.0),
|
| 487 |
+
Route(id="r2", name="Find New Job", description="Start applying for new roles.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"job_found": True}, closes_routes=["r1", "r3"], milestones_unlocked=["m2"], final_reward=3.0),
|
| 488 |
+
Route(id="r3", name="Delegate to Team", description="Push tasks to junior colleagues.", required_action_types=["delegate"], preconditions={}, consequences={"team_delegated": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=1.5),
|
| 489 |
+
]
|
| 490 |
+
milestones = [
|
| 491 |
+
Milestone(id="m1", description="Manager agreed to reduce tasks.", condition_key="workload_reduced", condition_value=True, reward=1.0),
|
| 492 |
+
Milestone(id="m2", description="Interview secured.", condition_key="job_found", condition_value=True, reward=1.5),
|
| 493 |
+
Milestone(id="m3", description="Tasks successfully delegated.", condition_key="team_delegated", condition_value=True, reward=0.8),
|
| 494 |
+
]
|
| 495 |
+
events = [
|
| 496 |
+
ExoEvent(step=3, probability=0.7, id="boss_asks", description="Boss asks for progress on current tasks.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
|
| 497 |
+
]
|
| 498 |
+
return Task(
|
| 499 |
+
id="career_crisis", domain="career", goal="Manage Career Overload", constraints={"budget_max": 500, "deadline_step": 12},
|
| 500 |
+
hidden_state={},
|
| 501 |
+
mutable_world={"career.workload": 30.0, "time.free_hours_per_week": -20.0},
|
| 502 |
+
visible_world={"career.workload": 30.0, "time.free_hours_per_week": -20.0},
|
| 503 |
+
success_conditions=[{"key": "workload_reduced", "value": True}, {"key": "job_found", "value": True}, {"key": "team_delegated", "value": True}],
|
| 504 |
+
failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Severe workload is threatening your career stability."}
|
| 505 |
+
)
|
| 506 |
+
|
| 507 |
+
def generate_finances(self, difficulty: int) -> Task:
|
| 508 |
+
routes = [
|
| 509 |
+
Route(id="r1", name="Emergency Fund", description="Dip into savings.", required_action_types=["spend"], preconditions={}, consequences={"used_emergency": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=1.0),
|
| 510 |
+
Route(id="r2", name="Negotiate Payment Plan", description="Call the creditor to delay payments.", required_action_types=["communicate"], preconditions={}, consequences={"payment_plan": True}, closes_routes=["r1"], milestones_unlocked=["m2"], final_reward=2.5),
|
| 511 |
+
Route(id="r3", name="Sell Asset", description="Liquidate an asset for quick cash.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"asset_sold": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=1.5),
|
| 512 |
+
]
|
| 513 |
+
milestones = [
|
| 514 |
+
Milestone(id="m1", description="Emergency fund accessed.", condition_key="used_emergency", condition_value=True, reward=0.5),
|
| 515 |
+
Milestone(id="m2", description="Favorable payment plan negotiated.", condition_key="payment_plan", condition_value=True, reward=1.0),
|
| 516 |
+
Milestone(id="m3", description="Asset successfully sold.", condition_key="asset_sold", condition_value=True, reward=0.8),
|
| 517 |
+
]
|
| 518 |
+
events = [
|
| 519 |
+
ExoEvent(step=2, probability=0.9, id="late_fee", description="A late fee was applied to the balance.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
|
| 520 |
+
]
|
| 521 |
+
return Task(
|
| 522 |
+
id="finance_crisis", domain="finances", goal="Resolve Financial Pressure", constraints={"budget_max": 1000, "deadline_step": 10},
|
| 523 |
+
hidden_state={},
|
| 524 |
+
mutable_world={"finances.liquidity": -40.0, "finances.debt_pressure": 20.0},
|
| 525 |
+
visible_world={"finances.liquidity": -40.0, "finances.debt_pressure": 20.0},
|
| 526 |
+
success_conditions=[{"key": "used_emergency", "value": True}, {"key": "payment_plan", "value": True}, {"key": "asset_sold", "value": True}],
|
| 527 |
+
failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "An unexpected expense has caused financial strain."}
|
| 528 |
+
)
|
| 529 |
+
|
| 530 |
+
def generate_relationships(self, difficulty: int) -> Task:
|
| 531 |
+
routes = [
|
| 532 |
+
Route(id="r1", name="Couples Therapy", description="Book a session with a therapist.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"therapy_scheduled": True}, closes_routes=["r3"], milestones_unlocked=["m1"], final_reward=3.0),
|
| 533 |
+
Route(id="r2", name="Honest Conversation", description="Sit down and talk through issues.", required_action_types=["communicate"], preconditions={}, consequences={"had_conversation": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
|
| 534 |
+
Route(id="r3", name="Give Space", description="Take some time apart.", required_action_types=["rest"], preconditions={}, consequences={"giving_space": True}, closes_routes=["r1", "r2"], milestones_unlocked=["m3"], final_reward=1.0),
|
| 535 |
+
]
|
| 536 |
+
milestones = [
|
| 537 |
+
Milestone(id="m1", description="Therapy session completed.", condition_key="therapy_scheduled", condition_value=True, reward=1.5),
|
| 538 |
+
Milestone(id="m2", description="A productive conversation occurred.", condition_key="had_conversation", condition_value=True, reward=1.0),
|
| 539 |
+
Milestone(id="m3", description="Space given without escalation.", condition_key="giving_space", condition_value=True, reward=0.5),
|
| 540 |
+
]
|
| 541 |
+
events = [
|
| 542 |
+
ExoEvent(step=4, probability=0.6, id="partner_escalates", description="Partner sends an emotional text message.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
|
| 543 |
+
]
|
| 544 |
+
return Task(
|
| 545 |
+
id="relationship_crisis", domain="relationships", goal="Repair Relationship Friction", constraints={"budget_max": 800, "deadline_step": 14},
|
| 546 |
+
hidden_state={},
|
| 547 |
+
mutable_world={"relationships.romantic": -30.0, "mental_wellbeing.stress_level": 20.0},
|
| 548 |
+
visible_world={"relationships.romantic": -30.0, "mental_wellbeing.stress_level": 20.0},
|
| 549 |
+
success_conditions=[{"key": "therapy_scheduled", "value": True}, {"key": "had_conversation", "value": True}, {"key": "giving_space", "value": True}],
|
| 550 |
+
failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Growing distance and recent conflicts demand attention."}
|
| 551 |
+
)
|
| 552 |
+
|
| 553 |
+
def generate_physical_health(self, difficulty: int) -> Task:
|
| 554 |
+
routes = [
|
| 555 |
+
Route(id="r1", name="Medical Leave", description="Request time off to recover.", required_action_types=["communicate", "rest"], preconditions={}, consequences={"on_leave": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.5),
|
| 556 |
+
Route(id="r2", name="See Specialist", description="Pay for a top-tier medical consultation.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"saw_doctor": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=2.0),
|
| 557 |
+
Route(id="r3", name="Lifestyle Change", description="Commit to better diet and sleep.", required_action_types=["rest"], preconditions={}, consequences={"lifestyle_changed": True}, closes_routes=["r1"], milestones_unlocked=["m3"], final_reward=1.5),
|
| 558 |
+
]
|
| 559 |
+
milestones = [
|
| 560 |
+
Milestone(id="m1", description="Leave approved.", condition_key="on_leave", condition_value=True, reward=1.0),
|
| 561 |
+
Milestone(id="m2", description="Clear diagnosis received.", condition_key="saw_doctor", condition_value=True, reward=1.0),
|
| 562 |
+
Milestone(id="m3", description="First week of new habits complete.", condition_key="lifestyle_changed", condition_value=True, reward=0.5),
|
| 563 |
+
]
|
| 564 |
+
events = [
|
| 565 |
+
ExoEvent(step=3, probability=0.8, id="doctor_call", description="The clinic calls with test results.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
|
| 566 |
+
]
|
| 567 |
+
return Task(
|
| 568 |
+
id="health_crisis", domain="physical_health", goal="Address Health Warning", constraints={"budget_max": 1500, "deadline_step": 15},
|
| 569 |
+
hidden_state={},
|
| 570 |
+
mutable_world={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0},
|
| 571 |
+
visible_world={"physical_health.energy": -30.0, "mental_wellbeing.stress_level": 30.0},
|
| 572 |
+
success_conditions=[{"key": "on_leave", "value": True}, {"key": "saw_doctor", "value": True}, {"key": "lifestyle_changed", "value": True}],
|
| 573 |
+
failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Physical symptoms are becoming impossible to ignore."}
|
| 574 |
+
)
|
| 575 |
+
|
| 576 |
+
def generate_mental_wellbeing(self, difficulty: int) -> Task:
|
| 577 |
+
routes = [
|
| 578 |
+
Route(id="r1", name="Professional Therapy", description="Start regular therapy sessions.", required_action_types=["spend", "communicate"], preconditions={}, consequences={"therapy_started": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=3.0),
|
| 579 |
+
Route(id="r2", name="Disconnect", description="Take a full digital detox break.", required_action_types=["rest"], preconditions={}, consequences={"disconnected": True}, closes_routes=["r3"], milestones_unlocked=["m2"], final_reward=1.5),
|
| 580 |
+
Route(id="r3", name="Medication Evaluation", description="See a psychiatrist for options.", required_action_types=["spend"], preconditions={}, consequences={"medication_taken": True}, closes_routes=["r2"], milestones_unlocked=["m3"], final_reward=2.0),
|
| 581 |
+
]
|
| 582 |
+
milestones = [
|
| 583 |
+
Milestone(id="m1", description="Meaningful breakthrough in therapy.", condition_key="therapy_started", condition_value=True, reward=1.5),
|
| 584 |
+
Milestone(id="m2", description="Successfully unplugged for 48 hours.", condition_key="disconnected", condition_value=True, reward=0.8),
|
| 585 |
+
Milestone(id="m3", description="Prescription acquired.", condition_key="medication_taken", condition_value=True, reward=1.0),
|
| 586 |
+
]
|
| 587 |
+
events = [
|
| 588 |
+
ExoEvent(step=2, probability=0.5, id="panic_attack", description="A sudden wave of severe anxiety hits.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
|
| 589 |
+
]
|
| 590 |
+
return Task(
|
| 591 |
+
id="mental_crisis", domain="mental_wellbeing", goal="Avert Total Burnout", constraints={"budget_max": 600, "deadline_step": 12},
|
| 592 |
+
hidden_state={},
|
| 593 |
+
mutable_world={"mental_wellbeing.motivation": -35.0, "mental_wellbeing.stress_level": 40.0},
|
| 594 |
+
visible_world={"mental_wellbeing.motivation": -35.0, "mental_wellbeing.stress_level": 40.0},
|
| 595 |
+
success_conditions=[{"key": "therapy_started", "value": True}, {"key": "disconnected", "value": True}, {"key": "medication_taken", "value": True}],
|
| 596 |
+
failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "Complete exhaustion and loss of motivation."}
|
| 597 |
+
)
|
| 598 |
+
|
| 599 |
+
def generate_time(self, difficulty: int) -> Task:
|
| 600 |
+
routes = [
|
| 601 |
+
Route(id="r1", name="Reprioritize", description="Restructure calendar and say 'no'.", required_action_types=["communicate"], preconditions={}, consequences={"priorities_reset": True}, closes_routes=[], milestones_unlocked=["m1"], final_reward=2.0),
|
| 602 |
+
Route(id="r2", name="Delegate", description="Pay someone or ask for help with chores.", required_action_types=["spend", "delegate"], preconditions={}, consequences={"tasks_delegated": True}, closes_routes=[], milestones_unlocked=["m2"], final_reward=1.5),
|
| 603 |
+
Route(id="r3", name="Cancel Commitments", description="Drop out of major upcoming events.", required_action_types=["communicate"], preconditions={}, consequences={"commitments_cancelled": True}, closes_routes=["r1"], milestones_unlocked=["m3"], final_reward=1.0),
|
| 604 |
+
]
|
| 605 |
+
milestones = [
|
| 606 |
+
Milestone(id="m1", description="Calendar cleared of non-essentials.", condition_key="priorities_reset", condition_value=True, reward=1.0),
|
| 607 |
+
Milestone(id="m2", description="Help secured for daily tasks.", condition_key="tasks_delegated", condition_value=True, reward=0.8),
|
| 608 |
+
Milestone(id="m3", description="Social obligations cancelled.", condition_key="commitments_cancelled", condition_value=True, reward=0.5),
|
| 609 |
+
]
|
| 610 |
+
events = [
|
| 611 |
+
ExoEvent(step=3, probability=0.9, id="new_request", description="A friend asks for an 'urgent' favor.", world_mutation={}, hidden_state_mutation={}, closes_routes=[])
|
| 612 |
+
]
|
| 613 |
+
return Task(
|
| 614 |
+
id="time_crisis", domain="time", goal="Regain Time Control", constraints={"budget_max": 300, "deadline_step": 10},
|
| 615 |
+
hidden_state={},
|
| 616 |
+
mutable_world={"time.free_hours_per_week": -25.0, "time.admin_overhead": 20.0},
|
| 617 |
+
visible_world={"time.free_hours_per_week": -25.0, "time.admin_overhead": 20.0},
|
| 618 |
+
success_conditions=[{"key": "priorities_reset", "value": True}, {"key": "tasks_delegated", "value": True}, {"key": "commitments_cancelled", "value": True}],
|
| 619 |
+
failure_conditions=[], event_schedule=events, viable_routes=routes, milestones=milestones, horizon=15 + difficulty * 2, difficulty=difficulty, domain_metadata={"story": "You are double-booked and drowning in obligations."}
|
| 620 |
+
)
|
agent/conflict_predictor.py
ADDED
|
@@ -0,0 +1,142 @@
|
"""
conflict_predictor.py - Proactive intelligence and trajectory forecasting
"""

import copy
from core.life_state import LifeMetrics, DependencyGraph

class ConflictPredictor:
    def __init__(self):
        self.graph = DependencyGraph()
        self.snapshots = []  # list of flattened LifeMetrics dicts
        self.MAX_HISTORY = 10
        self.INVERSE_METRICS = {
            "mental_wellbeing.stress_level",
            "career.workload",
            "finances.debt_pressure",
            "time.commute_burden",
            "time.admin_overhead"
        }

    def add_snapshot(self, metrics: LifeMetrics) -> None:
        self.snapshots.append(metrics.flatten())
        if len(self.snapshots) > self.MAX_HISTORY:
            self.snapshots.pop(0)

    def compute_trajectory(self, metric_path: str) -> float:
        if len(self.snapshots) < 3:
            return 0.0

        # Use last 5 snapshots maximum
        n = min(5, len(self.snapshots))
        y = [s.get(metric_path, 0.0) for s in self.snapshots[-n:]]
        x = list(range(n))

        # Simple linear regression: slope = Cov(x, y) / Var(x)
        mean_y = sum(y) / n
        mean_x = sum(x) / n
        cov_xy = sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y))
        var_x = sum((x_i - mean_x) ** 2 for x_i in x)

        if var_x == 0:
            return 0.0
        return cov_xy / var_x

    def predict_crisis(self, horizon_days: int = 7) -> list:
        if not self.snapshots:
            return []

        current = self.snapshots[-1]
        warnings = []

        for metric, val in current.items():
            slope = self.compute_trajectory(metric)
            if slope == 0.0:
                continue

            projected = val + (slope * horizon_days)
            is_inverse = metric in self.INVERSE_METRICS

            # Normal metric: Critical is low (<30), Warning is low (<45)
            # Inverse metric: Critical is high (>70), Warning is high (>55)
            critical_now = (val > 70) if is_inverse else (val < 30)
            warning_now = (val > 55) if is_inverse else (val < 45)

            critical_proj = (projected > 70) if is_inverse else (projected < 30)
            warning_proj = (projected > 55) if is_inverse else (projected < 45)

            worse_direction = (slope > 0) if is_inverse else (slope < 0)

            if worse_direction and (critical_proj or warning_proj):
                threshold = 70.0 if is_inverse else 30.0
                days_until_crit = (threshold - val) / slope if slope != 0 else float('inf')

                if critical_now:
                    days_until_crit = 0.0

                severity = 'crisis' if critical_proj else 'warning'
                direction_word = "rising" if slope > 0 else "declining"
                friendly_name = metric.split('.')[-1].replace('_', ' ')

                if severity == 'crisis':
                    msg = f"{friendly_name} will hit critical levels in {max(0, int(days_until_crit))} days."
                else:
                    msg = f"{friendly_name} has been {direction_word} ({slope:+.1f}/day) - warning levels likely within {horizon_days} days."

                warnings.append({
                    "metric": metric,
                    "current_value": val,
                    "projected_value": projected,
                    "days_until_critical": max(0.0, days_until_crit),
                    "severity": severity,
                    "message": msg
                })

        # Sort by urgency (days until critical)
        warnings.sort(key=lambda x: x['days_until_critical'])
        return warnings

    def get_prediction_summary(self) -> str:
        warnings = self.predict_crisis()
        if not warnings:
            return "Your life metrics are stable. No immediate crises predicted."

        messages = [w['message'] for w in warnings]
        return "Based on your current trajectory: " + " ".join(messages[:3]) + ("" if len(messages) <= 3 else " (+ more warnings hidden).")

    def get_risk_score(self) -> float:
        warnings = self.predict_crisis()
        if not warnings:
            return 0.0

        score = 0.0
        for w in warnings:
            if w['severity'] == 'crisis':
                score += 0.3
            else:
                score += 0.1
        return min(1.0, score)

def main():
    import random

    predictor = ConflictPredictor()

    print("Simulating 5 days of accumulating stress and declining sleep...\n")
    current_state = LifeMetrics()

    for i in range(5):
        current_state.mental_wellbeing.stress_level += 5.0 + random.uniform(0, 2)
        current_state.physical_health.sleep_quality -= 4.0 + random.uniform(0, 2)
        current_state.time.free_hours_per_week -= 1.0 + random.uniform(0, 1)

        predictor.add_snapshot(current_state)
        print(f"Day {i+1}: Stress={current_state.mental_wellbeing.stress_level:.1f}, Sleep={current_state.physical_health.sleep_quality:.1f}")

    print("\n--- PREDICTION AFTER 5 DAYS ---")
    print(f"Risk Score: {predictor.get_risk_score():.2f}")
    print("Summary:")
    print(predictor.get_prediction_summary())

if __name__ == '__main__':
    main()
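The forecasting in `compute_trajectory` and `predict_crisis` comes down to two formulas: a least-squares slope, Cov(x, y) / Var(x), over the most recent snapshots, and a linear extrapolation to a threshold. The sketch below restates them on plain lists so the arithmetic can be checked by hand. The function names `least_squares_slope` and `days_until` are illustrative and not part of the repository, and `days_until` assumes the metric is already moving toward the threshold, as the `worse_direction` guard ensures in the class.

```python
def least_squares_slope(values):
    """Slope of the best-fit line through (0, v0), (1, v1), ... (change per step)."""
    n = len(values)
    if n < 2:
        return 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x if var_x else 0.0


def days_until(threshold, current, slope):
    """Steps until a linearly extrapolated metric reaches the threshold.

    Assumes the metric is moving toward the threshold; a value already
    past it is reported as 0 days.
    """
    if slope == 0:
        return float("inf")
    return max(0.0, (threshold - current) / slope)


# Stress (an inverse metric, critical above 70) rising 5 points per day.
stress = [45.0, 50.0, 55.0, 60.0, 65.0]
slope = least_squares_slope(stress)          # 5.0 per day
eta = days_until(70.0, stress[-1], slope)    # 1.0 day to the critical line
projected = stress[-1] + slope * 7           # 100.0 after a 7-day horizon
```

Because the fit uses at most five evenly spaced points, a single noisy snapshot can move the slope noticeably, which is one reason the class requires at least three snapshots before it reports any trend.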
|
agent/counterfactuals.py
ADDED
|
@@ -0,0 +1,106 @@
"""
counterfactuals.py - Generates alternative "What If" scenarios for LifeStack agent decisions.
"""

import copy
import random
from core.reward import compute_reward
from core.life_state import DependencyGraph

def generate_counterfactuals(agent, metrics, budget, conflict, person, chosen_action):
    """
    Simulates 3 alternative action types and compares them to the agent's choice.
    Returns a list of dicts with alternative outcomes.
    """
    action_types = ["communicate", "rest", "delegate", "negotiate", "spend", "reschedule", "deprioritize"]
    chosen_type = chosen_action.primary.action_type

    # Filter and pick 3 different types
    alternatives = [t for t in action_types if t != chosen_type]
    random.shuffle(alternatives)
    target_types = alternatives[:3]

    results = []
    graph = DependencyGraph()

    for action_type in target_types:
        try:
            # 1. Generate alternative action
            # We use the special forced-type method we added to the agent
            alt_action = agent.get_action_for_type(metrics, budget, conflict, person, action_type)

            # 2. Simulate applying it
            current_stress = metrics.mental_wellbeing.stress_level
            uptake = person.respond_to_action(
                alt_action.primary.action_type,
                alt_action.primary.resource_cost,
                current_stress
            )

            state_after = copy.deepcopy(metrics)
            for path, delta in alt_action.primary.metric_changes.items():
                if "." not in path:
                    continue
                try:
                    scaled_delta = float(delta) * uptake
                except (ValueError, TypeError):
                    continue

                if abs(scaled_delta) > 5:
                    state_after = graph.cascade(state_after, {path: scaled_delta})
                else:
                    dom, sub = path.split('.')
                    d = getattr(state_after, dom, None)
                    if d:
                        cur = getattr(d, sub, 70.0)
                        setattr(d, sub, max(0.0, min(100.0, cur + scaled_delta)))

            # 3. Compute Reward
            reward, breakdown = compute_reward(metrics, state_after, alt_action.primary.resource_cost, 1)

            # 4. Analysis deltas
            flat_before = metrics.flatten()
            flat_after = state_after.flatten()
            deltas = {k: flat_after[k] - flat_before[k] for k in flat_after}

            # Filter for meaningful changes (>1.0)
            significant = {k: v for k, v in deltas.items() if abs(v) > 1.0}

            trade_off = ""
            if significant:
                best = max(significant.items(), key=lambda x: x[1])
                worst = min(significant.items(), key=lambda x: x[1])

                b_name = best[0].split('.')[-1].replace('_', ' ')
                if best[1] > 2:
                    trade_off = f"Better {b_name} (+{best[1]:.0f})"
                else:
                    trade_off = f"Stability in {b_name}"

                if worst[1] < -2:
                    w_name = worst[0].split('.')[-1].replace('_', ' ')
                    trade_off += f" but drops {w_name} ({worst[1]:.0f})"
                else:
                    trade_off += " but mission impact is lower than optimal."
            else:
                trade_off = "Minimal impact on core life metrics."

            # Incorporate resource commentary
            cost = alt_action.primary.resource_cost
            if cost.get('money', 0) > 100:
                trade_off += f" (${cost['money']:.0f} cost)"
            elif cost.get('time', 0) > 4:
                trade_off += f" ({cost['time']:.1f}h time drain)"

            results.append({
                "action_type": action_type,
                "description": alt_action.primary.description,
                "reward": reward,
                "trade_off": trade_off,
                "uptake": uptake,
                "metrics": state_after.flatten(),
            })

        except Exception as e:
            print(f"Error in counterfactual generation for {action_type}: {e}")

    return results
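The simulation step in `generate_counterfactuals` scales each proposed metric change by the person's `uptake`, clamps the result to the 0 to 100 range, and then keeps only deltas larger than 1.0 for the trade-off text. The sketch below isolates that arithmetic on a flat dict. It deliberately leaves out the `DependencyGraph.cascade` branch used for large changes, and the names `apply_scaled_changes` and `significant_deltas` are hypothetical helpers, not functions from the repository.

```python
def apply_scaled_changes(state, changes, uptake, lo=0.0, hi=100.0):
    """Return a copy of a flat metrics dict with each delta scaled and clamped."""
    after = dict(state)
    for path, delta in changes.items():
        try:
            scaled = float(delta) * uptake
        except (TypeError, ValueError):
            continue  # skip non-numeric deltas, as the original loop does
        if path in after:
            after[path] = max(lo, min(hi, after[path] + scaled))
    return after


def significant_deltas(before, after, min_change=1.0):
    """Keep only metrics whose absolute change exceeds min_change."""
    return {k: after[k] - before[k]
            for k in after
            if abs(after[k] - before[k]) > min_change}


before = {"mental_wellbeing.stress_level": 90.0, "physical_health.energy": 40.0}
changes = {
    "mental_wellbeing.stress_level": -20,  # scaled to -10.0 at 50% uptake
    "physical_health.energy": 1,           # scaled to +0.5, below the 1.0 cutoff
    "career.workload": "n/a",              # non-numeric, skipped
}
after = apply_scaled_changes(before, changes, uptake=0.5)
notable = significant_deltas(before, after)
```

Here only the stress reduction survives the filter, which corresponds to the case where the trade-off string reports a single notable change.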
|
agent/memory.py
ADDED
|
@@ -0,0 +1,394 @@
import os
import chromadb
from sentence_transformers import SentenceTransformer
import uuid
import math
from datetime import datetime
from collections import defaultdict
from typing import Optional


class LifeStackMemory:
    def __init__(self, silent: bool = False, path: str = "./lifestack_memory"):
        self.client = chromadb.PersistentClient(path=path)
        self.collection = self.client.get_or_create_collection(name='decisions')
        self.traj_collection = self.client.get_or_create_collection(name='trajectories')
        self.feedback_collection = self.client.get_or_create_collection(name='feedback')  # New for OutcomeFeedback
        self.silent = silent
        self.encoder = self._load_encoder()
        if not self.silent:
            print("Memory system initialized")

        # Auto-hydrate if empty
        if self.collection.count() == 0:
            self._hydrate_from_preseeded()

    def _hydrate_from_preseeded(self):
        import json
        sources = ["./data/preseeded_memory_p1.json", "./data/preseeded_memory_p2.json"]

        if not self.silent:
            print(f"🧬 Empty memory detected. Hydrating from partitioned volumes...")

        total_decisions = 0
        for path in sources:
            if not os.path.exists(path):
                continue

            try:
                with open(path, 'r') as f:
                    data = json.load(f)

                # Hydrate decisions
                d = data.get("decisions", {})
                if d.get("ids"):
                    self.collection.add(
                        ids=d["ids"],
                        documents=d["documents"],
                        metadatas=d["metadatas"],
                        embeddings=d["embeddings"]
                    )
                    total_decisions += len(d["ids"])
            except Exception as e:
                if not self.silent:
                    print(f"⚠️ Hydration failed for {path}: {e}")

        if not self.silent:
            print(f"✅ Hydration complete: {total_decisions} memories restored.")

    def _load_encoder(self):
        try:
            return SentenceTransformer('all-MiniLM-L6-v2', local_files_only=True)
        except Exception as exc:
            if not self.silent:
                print(f"Falling back to local hash embeddings: {exc}")
            return None

    def _embed_text(self, text: str) -> list[float]:
        if self.encoder is not None:
            return self.encoder.encode(text).tolist()

        # Fallback: hashed bag-of-words vector, L2-normalized
        import zlib
        buckets = [0.0] * 384
        for token in text.lower().split():
            idx = zlib.adler32(token.encode()) % len(buckets)
            buckets[idx] += 1.0

        norm = math.sqrt(sum(v * v for v in buckets)) or 1.0
        return [v / norm for v in buckets]

    def store_decision(
        self,
        conflict_title: str,
        action_type: str,
        target_domain: str,
        reward: float,
        metrics_snapshot: dict,
        reasoning: str,
        trajectory: list[dict] = None,
        route_outcome: str = None
    ) -> None:
        """Stores individual decision for longitudinal tracking."""

        text = f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}"
        embedding = self._embed_text(text)

        doc_id = str(uuid.uuid4())
        self.collection.add(
            ids=[doc_id],
            embeddings=[embedding],
            documents=[text],
            metadatas=[{
                "conflict_title": conflict_title,
                "action_type": action_type,
                "target_domain": target_domain,
                "reward": float(reward),
                "reasoning": reasoning,
                "route_outcome": route_outcome or "",
                "timestamp": datetime.now().isoformat()
            }]
        )

    def store_trajectory(
        self,
        conflict_title: str = None,
        route_taken: str = None,
        total_reward: float = 0.0,
        metrics_diff_str: str = None,
        reasoning: str = None,
        task_id: str = None,
        trajectory_summary: dict = None
    ) -> None:
        """Stores a full trajectory summary."""

        if trajectory_summary is not None and task_id is not None:
            import json
            text = f"Task: {task_id} Route: {route_taken} Reward: {total_reward:.2f} Hits: {len(trajectory_summary.get('milestones_hit', []))}"
            embedding = self._embed_text(text)
            doc_id = str(uuid.uuid4())
            self.traj_collection.add(
                ids=[doc_id],
                embeddings=[embedding],
                documents=[text],
                metadatas=[{
                    "task_id": task_id,
                    "route_taken": route_taken,
                    "reward": total_reward,
                    "summary": json.dumps(trajectory_summary),
                    "timestamp": datetime.now().isoformat()
                }]
            )
            if not self.silent:
                print(f"Stored task trajectory: {route_taken} (reward: {total_reward:.2f})")
            return

        # Fallback to older signature logic.
        # All string arguments default to None, so guard the slice and the metadata values.
        text = f"{conflict_title} Route: {route_taken} Diff: {metrics_diff_str} {(reasoning or '')[:100]}"
        embedding = self._embed_text(text)

        doc_id = str(uuid.uuid4())
        self.collection.add(
            ids=[doc_id],
            embeddings=[embedding],
            documents=[text],
            metadatas=[{
                "conflict_title": conflict_title or "",
                "route_taken": route_taken or "",
                "metrics_diff": metrics_diff_str or "",
                "reward": total_reward,
                "reasoning": reasoning or "",
                "timestamp": datetime.now().isoformat()
            }]
        )
        if not self.silent:
            print(f"Stored trajectory fallback: {route_taken} (reward: {total_reward:.2f})")

    def store_feedback(self, feedback) -> None:
        """Stores OutcomeFeedback linked to a specific episode."""
        import json
        text = f"Episode: {feedback.episode_id} Effectiveness: {feedback.overall_effectiveness} Resolution: {feedback.resolution_time_hours}h"
        embedding = self._embed_text(text)

        doc_id = f"fb_{feedback.episode_id}"
        self.feedback_collection.add(
            ids=[doc_id],
            embeddings=[embedding],
            documents=[text],
            metadatas=[{
                "episode_id": feedback.episode_id,
                "effectiveness": feedback.overall_effectiveness,
                "domains_improved": json.dumps(feedback.domains_improved),
                "domains_worsened": json.dumps(feedback.domains_worsened),
                "unexpected_effects": feedback.unexpected_effects,
                "resolution_time": feedback.resolution_time_hours,
                "timestamp": feedback.submitted_at.isoformat()
            }]
        )
        if not self.silent:
            print(f"Stored human feedback for episode {feedback.episode_id}")

    def retrieve_feedback(self, episode_id: str) -> Optional[dict]:
        """Retrieves feedback for a specific episode."""
        import json
        doc_id = f"fb_{episode_id}"
        results = self.feedback_collection.get(ids=[doc_id])

        if not results['metadatas']:
            return None

        meta = results['metadatas'][0]
        # Deserialize lists
        meta["domains_improved"] = json.loads(meta["domains_improved"])
        meta["domains_worsened"] = json.loads(meta["domains_worsened"])
        return meta

    def retrieve_similar_trajectories(self, task_domain: str, current_world: dict, n: int = 3) -> list[dict]:
        """Retrieve similar trajectories based on task domain and current world state."""
        import json
        if self.traj_collection.count() == 0:
            return []

        sorted_metrics = sorted(current_world.items(), key=lambda x: x[1] if isinstance(x[1], (int, float)) else 0)
        top_stressed = " ".join(f"{k}:{v}" for k, v in sorted_metrics[:3])
        query_text = f"TaskDomain: {task_domain} {top_stressed}"

        query_embedding = self._embed_text(query_text)
        results = self.traj_collection.query(
            query_embeddings=[query_embedding],
            n_results=min(n, self.traj_collection.count())
        )

        output = []
        for i, meta in enumerate(results['metadatas'][0]):
            output.append({
                "task_id": meta.get("task_id", ""),
                "route_taken": meta.get("route_taken", ""),
                "reward": meta.get("reward", 0.0),
                "summary": json.loads(meta.get("summary", "{}")),
            })
        return output

    def retrieve_similar(self, conflict_title: str, current_metrics: dict, n: int = 3) -> list[dict]:
        """Retrieves the n most similar past high-reward decisions using semantic search."""
        if self.collection.count() == 0:
            return []

        # Build query from conflict title + 3 most stressed metrics (lowest values)
        sorted_metrics = sorted(current_metrics.items(), key=lambda x: x[1])
        top_stressed = " ".join(f"{k}:{v:.0f}" for k, v in sorted_metrics[:3])
        query_text = f"{conflict_title} {top_stressed}"

        query_embedding = self._embed_text(query_text)
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=min(n * 2, self.collection.count())  # Retrieve more to filter for high reward
        )

        output = []
        for i, meta in enumerate(results['metadatas'][0]):
            if meta.get("reward", 0.0) < 0.05:  # Filter out negative/zero reward decisions
                continue
            if len(output) >= n:
                break
            distance = results['distances'][0][i]
            similarity = round(1.0 / (1.0 + distance), 4)
            output.append({
                "route_taken": meta.get("route_taken", ""),
                "action_type": meta.get("action_type", ""),
                "target_domain": meta.get("target_domain", ""),
                "metrics_diff": meta.get("metrics_diff", ""),
                "reward": meta.get("reward", 0.0),
                "reasoning": meta.get("reasoning", ""),
                "similarity_score": similarity
            })

        return output

    def build_few_shot_prompt(self, conflict_title: str, current_metrics: dict) -> str:
        """Formats retrieved memories into a few-shot prompt block for the LLM."""
        memories = self.retrieve_similar(conflict_title, current_metrics)
        if not memories:
            return ""

        lines = ["Past successful trajectories in similar situations:\n"]
        for m in memories:
            short_reason = m['reasoning'][:80]
            lines.append(
                f" Route [{m['route_taken']}] → impact [{m['metrics_diff']}] → total reward {m['reward']:.2f} "
                f"(reasoning: {short_reason}...)"
            )

        return "\n".join(lines)

    def get_stats(self) -> dict:
        """Returns memory stats: total count, average reward, and route details."""
        if self.collection.count() == 0:
            # Use the same key as the non-empty case so callers can rely on it
            return {"total_memories": 0, "average_reward": 0.0, "by_action_type": {}}

        all_records = self.collection.get(include=["metadatas"])
        metadatas = all_records["metadatas"]

        total = len(metadatas)
        avg_reward = sum(m.get("reward", 0.0) for m in metadatas) / total

        by_route = defaultdict(int)
        for m in metadatas:
            route = m.get("route_taken") or m.get("route_outcome") or "unknown"
            first_action = route.split(' ')[0] if route else "unknown"
            by_route[first_action] += 1

        return {
            "total_memories": total,
            "average_reward": round(avg_reward, 3),
            "by_action_type": dict(by_route)
        }


def main():
    memory = LifeStackMemory()

    # --- Synthetic Decisions: mix of high and low reward ---
    synthetic = [
        {
            "conflict_title": "Friday 6PM",
            "action_type": "negotiate",
            "target_domain": "career",
            "reward": 0.72,
            "metrics_snapshot": {"career.workload": 100, "mental_wellbeing.stress_level": 95},
            "reasoning": "Negotiating the deadline directly reduced workload pressure quickly."
        },
        {
            "conflict_title": "Friday 6PM",
            "action_type": "rest",
            "target_domain": "mental_wellbeing",
            "reward": 0.61,
            "metrics_snapshot": {"mental_wellbeing.stress_level": 95, "physical_health.energy": 40},
            "reasoning": "A short rest during peak stress restored energy before tackling logistics."
        },
        {
            "conflict_title": "The Perfect Storm",
            "action_type": "communicate",
            "target_domain": "relationships",
            "reward": 0.58,
            "metrics_snapshot": {"relationships.romantic": 45, "mental_wellbeing.emotional_stability": 50},
            "reasoning": "A quick reassuring call prevented relationship collapse under crisis."
        },
        {
            "conflict_title": "The Perfect Storm",
            "action_type": "delegate",
            "target_domain": "career",
            "reward": 0.38,  # Low reward. Note: store_decision applies no threshold, so this is still stored
            "metrics_snapshot": {"career.workload": 90, "career.stability": 55},
            "reasoning": "Attempted to delegate but the neurotic profile made it ineffective."
        },
        {
            "conflict_title": "Health Scare",
            "action_type": "rest",
            "target_domain": "physical_health",
            "reward": 0.80,
            "metrics_snapshot": {"physical_health.energy": 20, "mental_wellbeing.stress_level": 90},
            "reasoning": "Aggressive rest protocol dramatically recovered energy and clarity."
        },
        {
            "conflict_title": "Check Engine Light",
            "action_type": "spend",
            "target_domain": "finances",
            "reward": 0.33,  # Low reward. Note: store_decision applies no threshold, so this is still stored
            "metrics_snapshot": {"finances.liquidity": 40, "time.commute_burden": 80},
            "reasoning": "Overspent on premium repair, draining liquidity buffer dangerously."
        },
    ]

    print("\n--- STORING SYNTHETIC DECISIONS ---")
    for d in synthetic:
        memory.store_decision(**d)

    # --- Retrieve similar decisions ---
    print("\n--- RETRIEVING SIMILAR DECISIONS ---")
    test_metrics = {
        "career.workload": 95,
        "mental_wellbeing.stress_level": 90,
        "finances.liquidity": 35,
        "physical_health.energy": 50,
        "relationships.romantic": 70
    }
    similar = memory.retrieve_similar("Friday 6PM", test_metrics, n=3)
    for s in similar:
        print(f" [{s['action_type']}] → {s['target_domain']} | reward: {s['reward']:.2f} | similarity: {s['similarity_score']:.4f}")
        print(f" Reasoning: {s['reasoning'][:80]}...")

    # --- Few-shot prompt ---
    print("\n--- FEW-SHOT PROMPT OUTPUT ---")
    prompt = memory.build_few_shot_prompt("Friday 6PM", test_metrics)
    print(prompt if prompt else "(No relevant memories found)")

    # --- Stats ---
    print("\n--- MEMORY STATS ---")
    stats = memory.get_stats()
    print(f"Total Memories : {stats['total_memories']}")
    print(f"Average Reward : {stats['average_reward']}")
    print(f"By Action Type : {stats.get('by_action_type', stats.get('by_route_start'))}")


if __name__ == "__main__":
    main()
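When the sentence-transformer model is not available locally, `_embed_text` falls back to a hashed bag-of-words vector: each lowercased token is mapped to one of 384 buckets with `zlib.adler32`, and the counts are L2-normalized. `retrieve_similar` then converts a Chroma distance `d` into a score with `1 / (1 + d)`. The sketch below reproduces both steps outside the class so they can be tried without ChromaDB. The names `hash_embed`, `dot`, and `distance_to_similarity` are illustrative only.

```python
import math
import zlib


def hash_embed(text, dims=384):
    """Hashed bag-of-words embedding: token counts per bucket, L2-normalized."""
    buckets = [0.0] * dims
    for token in text.lower().split():
        buckets[zlib.adler32(token.encode()) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in buckets)) or 1.0  # avoid 0/0 on empty text
    return [v / norm for v in buckets]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def distance_to_similarity(distance):
    """Map a non-negative distance to (0, 1], with 1.0 meaning identical."""
    return 1.0 / (1.0 + distance)


a = hash_embed("Friday 6PM Action: negotiate Domain: career")
b = hash_embed("friday 6pm action: negotiate domain: career")  # same tokens, different case
c = hash_embed("")                                             # empty text gives a zero vector
```

This fallback only captures exact token overlap, so two differently worded descriptions of the same conflict will score as unrelated. Unrelated tokens can also collide in the same bucket, which is the usual trade-off of the hashing trick.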
|
app.py
ADDED
|
@@ -0,0 +1,1284 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"""
app.py - LifeStack Gradio Demo App
Hackathon presentation interface for the LifeStack simulation engine.
"""

import os
import json
import copy
import gradio as gr
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# ─── LifeStack modules ───────────────────────────────────────────────────────
from core.life_state import LifeMetrics, ResourceBudget
from core.lifestack_env import LifeStackEnv, LifeStackAction
from agent.agent import LifeStackAgent
from intake.simperson import SimPerson
from agent.conflict_generator import ConflictEvent, generate_conflict, TEMPLATES
from core.action_space import apply_action, validate_action
from agent.memory import LifeStackMemory
from core.metric_schema import normalize_metric_path, is_valid_metric_path
from core.reward import compute_reward
from intake.intake import LifeIntake
from agent.conflict_predictor import ConflictPredictor
from agent.counterfactuals import generate_counterfactuals
from scripts.longitudinal_demo import LongitudinalDemo
from intake.gmail_intake import GmailIntake
from core.task import Task, ExoEvent, Route, Milestone
from core.feedback import OutcomeFeedback, compute_human_feedback_reward

# ─── Pre-load at startup ─────────────────────────────────────────────────────
print("LifeStack booting…")

AGENT = LifeStackAgent()
MEMORY = LifeStackMemory(silent=True)
INTAKE = LifeIntake()
GMAIL = GmailIntake()
LONG_DEMO = LongitudinalDemo()

# Pre-seed Arjun's 3-week context into ChromaDB on startup
LONG_DEMO.pre_seed_arjun()

# Friday 6PM is always the default demo conflict
DEMO_CONFLICT = next(t for t in TEMPLATES if t.id == "d5_friday")

PERSONS = {
    "Alex (Executive) - driven, high-stress":
        SimPerson(openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8, name="Alex (Executive)"),
    "Chloe (Creative) - spontaneous, resilient":
        SimPerson(openness=0.9, conscientiousness=0.2, extraversion=0.5, agreeableness=0.70, neuroticism=0.15, name="Chloe (Creative)"),
    "Sam (Introvert) - anxious, thoughtful":
        SimPerson(openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9, name="Sam (Introvert)"),
    "Maya (Family) - empathetic, nurturing":
        SimPerson(openness=0.5, conscientiousness=0.7, extraversion=0.5, agreeableness=0.95, neuroticism=0.3, name="Maya (Family)"),
    "Leo (Student) - curious, organised":
        SimPerson(openness=0.85, conscientiousness=0.8, extraversion=0.4, agreeableness=0.4, neuroticism=0.55, name="Leo (Student)"),
    "Arjun (Startup Lead) - high-conscientiousness, high-neuroticism":
        SimPerson(name="Arjun", openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8),
}

CONFLICT_CHOICES = {f"[Diff {t.difficulty}] {t.title}": t for t in TEMPLATES}
PERSON_CHOICES = list(PERSONS.keys())
CONFLICT_CHOICES_LIST = list(CONFLICT_CHOICES.keys())
DEFAULT_CONFLICT = next(k for k in CONFLICT_CHOICES_LIST if "Friday 6PM" in k)

DEMO_PREDICTOR = ConflictPredictor()

print("✅ LifeStack ready.")

# ─── Helpers ─────────────────────────────────────────────────────────────────
DOMAIN_EMOJI = {
    "career": "💼", "finances": "💰", "relationships": "❤️",
    "physical_health": "💪", "mental_wellbeing": "🧠", "time": "📅",
}

# Metrics where HIGH = BAD (inverted color logic)
INVERTED_METRICS = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}

def _metric_color(key: str, val: float) -> str:
    """Return CSS color: inverted for 'bad-when-high' metrics."""
    sub = key.split(".")[-1]
    if sub in INVERTED_METRICS:
        return "#f87171" if val > 70 else ("#facc15" if val >= 40 else "#4ade80")
    return "#4ade80" if val > 70 else ("#facc15" if val >= 40 else "#f87171")

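The thresholds in `_metric_color` can be checked in isolation. The sketch below restates the rule as a standalone function (it does not import `app.py`); `INVERTED_METRICS` is copied from the listing, while the metric name `career.satisfaction` is only an illustrative assumption. It shows that the same value maps to opposite colours for a normal metric and a bad-when-high metric.

```python
# Standalone restatement of the colour rule; a sketch, not the app's own code.
INVERTED_METRICS = {"stress_level", "debt_pressure", "workload",
                    "commute_burden", "admin_overhead"}

GREEN, YELLOW, RED = "#4ade80", "#facc15", "#f87171"


def metric_color(key: str, val: float) -> str:
    """Green/yellow/red by value; the scale flips for bad-when-high metrics."""
    sub = key.split(".")[-1]
    high, low = (RED, GREEN) if sub in INVERTED_METRICS else (GREEN, RED)
    if val > 70:
        return high
    return YELLOW if val >= 40 else low


if __name__ == "__main__":
    # "career.satisfaction" is a hypothetical metric path used for illustration.
    for key in ("career.satisfaction", "mental_wellbeing.stress_level"):
        print(key, [metric_color(key, v) for v in (85, 55, 20)])
```

Note that the boundaries are asymmetric: a value of exactly 70 is still yellow, and exactly 40 is yellow rather than the low colour.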
def metrics_html(flat: dict, title: str = "", before: dict = None) -> str:
    """Render metrics as coloured progress bars.
    If `before` is supplied, metrics that changed >1 pt show ↑/↓ + delta.
    """
    domains = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
    rows = []
    if title:
        rows.append(f"<h3 style='margin:0 0 8px;font-size:14px;color:#aaa'>{title}</h3>")
    for dom in domains:
        emoji = DOMAIN_EMOJI[dom]
        rows.append(f"<div style='margin:6px 0 2px;font-size:12px;font-weight:700;color:#ccc'>{emoji} {dom.upper()}</div>")
        sub = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
        for key, val in sub.items():
            name = key.split(".")[1].replace("_", " ")
            color = _metric_color(key, val)
            pct = min(val, 100)

            delta_str = ""
            if before is not None and key in before:
                delta = val - before[key]
                if abs(delta) > 1.0:
                    arrow = "↑" if delta > 0 else "↓"
                    dc = "#4ade80" if delta > 0 else "#f87171"
                    delta_str = (
                        f"<span style='font-size:10px;color:{dc};margin-left:4px;font-weight:700'>"
                        f"{arrow} ({delta:+.1f})</span>"
                    )

            rows.append(
                f"<div style='display:flex;align-items:center;gap:6px;margin:2px 0'>"
                f" <span style='width:140px;font-size:11px;color:#bbb'>{name}</span>"
                f" <div style='flex:1;background:#333;border-radius:4px;height:10px'>"
                f" <div style='width:{pct}%;background:{color};border-radius:4px;height:10px'></div>"
                f" </div>"
                f" <span style='width:38px;font-size:11px;color:#ccc;text-align:right'>{val:.1f}</span>"
                f" {delta_str}"
                f"</div>"
            )
    return "<div style='font-family:monospace;padding:8px'>" + "\n".join(rows) + "</div>"


def _init_env(conflict: ConflictEvent) -> LifeStackEnv:
    env = LifeStackEnv()
    env.reset(conflict=conflict.primary_disruption, budget=conflict.resource_budget)
    return env


def task_html(task: Task) -> str:
    if not task:
        return "<div style='color:#888; font-style:italic'>No active task</div>"
    routes_html = "".join([f"<li style='margin-bottom:6px;'><b>{r.name}</b>: {r.description} <br><span style='font-size:11px;color:#aaa'>Req. Actions: {r.required_action_types} | Reward: +{r.final_reward}</span></li>" for r in task.viable_routes])
    if not routes_html:
        routes_html = "<li style='color:#888'>No routes</li>"

    milestones_html = "".join([f"<li style='margin-bottom:6px;'><b>{m.id}</b>: {m.description} <br><span style='font-size:11px;color:#4ade80'>Reward: +{m.reward}</span></li>" for m in task.milestones])
    if not milestones_html:
        milestones_html = "<li style='color:#888'>No milestones</li>"

    return f"""
    <div style='background:#1a1a2e; padding: 16px; border-radius: 8px; border: 1px solid #333; font-family: sans-serif'>
      <h3 style='color:#a78bfa; margin: 0 0 8px 0; font-size: 16px;'>🎯 Goal: {task.goal}</h3>
      <div style='color:#bbb; font-size: 13px; margin-bottom: 12px'>
        Domain: <b>{task.domain}</b> | Difficulty: <b>{task.difficulty}/5</b> | Horizon: <b>{task.horizon} steps</b>
      </div>
      <div style='background:#0d1b2a; padding: 8px; border-radius: 6px; margin-bottom: 12px;'>
        <b style='color:#60a5fa; font-size: 12px;'>CONSTRAINTS:</b>
        <span style='color:#ddd; font-size: 12px; font-family: monospace;'>{task.constraints}</span>
      </div>
      <div style='display: flex; gap: 16px;'>
        <div style='flex: 1; background:#1e1e2f; padding: 12px; border-radius: 6px;'>
          <b style='color:#4ade80; font-size: 13px; border-bottom: 1px solid #333; display: block; padding-bottom: 4px; margin-bottom: 8px'>🛣️ Viable Routes</b>
          <ul style='color:#ddd; padding-left: 20px; font-size: 12px; margin: 0;'>{routes_html}</ul>
        </div>
        <div style='flex: 1; background:#1e1e2f; padding: 12px; border-radius: 6px;'>
          <b style='color:#fbbf24; font-size: 13px; border-bottom: 1px solid #333; display: block; padding-bottom: 4px; margin-bottom: 8px'>⭐ Milestones</b>
          <ul style='color:#ddd; padding-left: 20px; font-size: 12px; margin: 0;'>{milestones_html}</ul>
        </div>
      </div>
    </div>
    """

def event_log_html(events: list[ExoEvent]) -> str:
    if not events:
        return "<div style='color:#888; font-style:italic; padding: 12px;'>No events triggered yet.</div>"
    rows = []
    for e in events:
        rows.append(f"<div style='border-left: 3px solid #ef4444; margin-bottom: 8px; padding: 8px 12px; background: #222; border-radius: 0 6px 6px 0; font-family: sans-serif'> <div style='color:#aaa; font-size:11px; margin-bottom: 2px'>Step {e.step}</div> <div style='color:#ddd; font-size: 13px;'><b style='color:#ef4444'>{e.id.upper()}</b>: {e.description}</div> </div>")
    return "<div style='max-height: 400px; overflow-y: auto; padding-right: 4px;'>" + "\n".join(rows) + "</div>"

def route_status_html(routes: list[Route], closed: set[str]) -> str:
    if not routes:
        return "<div style='color:#888; font-style:italic; padding: 12px;'>No routes configured.</div>"
    rows = []
    for r in routes:
        if r.id in closed:
            icon, color = "❌", "#f87171"
            status = "CLOSED"
        else:
            icon, color = "✅", "#4ade80"
            status = "OPEN"
        rows.append(f"<div style='display:flex; justify-content:space-between; align-items: center; margin-bottom: 8px; border-bottom: 1px solid #333; padding-bottom: 8px; font-family: sans-serif;'> <div style='display:flex; align-items:center; gap: 8px'><span style='font-size: 16px'>{icon}</span> <span style='color:#ddd; font-size: 13px; font-weight: 500'>{r.name}</span></div> <span style='color:{color}; font-size:12px; font-weight:bold; background: rgba(0,0,0,0.3); padding: 2px 6px; border-radius: 4px;'>{status}</span> </div>")
    return "<div style='background:#1e1e2f; padding: 16px; border-radius: 8px; border: 1px solid #333;'>" + "\n".join(rows) + "</div>"


def _normalize_action_metric_changes(action) -> None:
    fixed_changes = {}
    for path, delta in action.primary.metric_changes.items():
        raw_path = str(path)
        if "." not in raw_path:
            raw_path = f"{action.primary.target_domain}.{raw_path}"
        norm_path = normalize_metric_path(raw_path)
        if not is_valid_metric_path(norm_path):
            continue
        try:
            fixed_changes[norm_path] = float(delta)
        except (ValueError, TypeError):
            continue
    action.primary.metric_changes = fixed_changes


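The clean-up in `_normalize_action_metric_changes` has three steps: qualify bare keys with the action's target domain, drop paths the schema does not recognise, and drop deltas that cannot be cast to `float`. The sketch below reproduces that flow on a plain dict. The two schema helpers here are hypothetical stand-ins written only for this example; the real `normalize_metric_path` and `is_valid_metric_path` live in `core.metric_schema` and may behave differently.

```python
# Hypothetical stand-ins for core.metric_schema; assumptions for illustration.
VALID_PATHS = {"mental_wellbeing.stress_level", "career.workload"}


def normalize_metric_path(path: str) -> str:
    """Assumed normalisation: trim, lower-case, spaces to underscores."""
    return path.strip().lower().replace(" ", "_")


def is_valid_metric_path(path: str) -> bool:
    return path in VALID_PATHS


def clean_metric_changes(changes: dict, target_domain: str) -> dict:
    """Qualify bare keys, drop unknown paths and non-numeric deltas."""
    fixed = {}
    for path, delta in changes.items():
        raw = str(path)
        if "." not in raw:
            raw = f"{target_domain}.{raw}"  # bare key: assume the target domain
        norm = normalize_metric_path(raw)
        if not is_valid_metric_path(norm):
            continue
        try:
            fixed[norm] = float(delta)
        except (ValueError, TypeError):
            continue
    return fixed


raw_changes = {"Stress_Level": "-5", "career.workload": "lots", "made_up": 1}
cleaned = clean_metric_changes(raw_changes, "mental_wellbeing")
print(cleaned)
```

Silently skipping bad entries keeps the demo running when the agent returns malformed keys, at the cost of hiding those mistakes; logging the dropped paths would be a reasonable addition.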
# ─── Cascade Animation Engine ────────────────────────────────────────────────

def animate_cascade(primary_disruption: dict, metrics: LifeMetrics) -> list[dict]:
    """Replay the cascade step-by-step and capture intermediate frames.

    Returns a list of frames. Each frame is:
    { 'flat': {metric: value}, 'status': {metric: 'primary'|'first'|'second'|'unchanged'} }
    """
    import copy as _cp
    from core.life_state import DependencyGraph, CASCADE_DAMPENING_DEFAULT

    graph = DependencyGraph()
    dampening = CASCADE_DAMPENING_DEFAULT
    frames = []

    # Frame 0 - initial stable state
    base = _cp.deepcopy(metrics)
    base_flat = base.flatten()
    frames.append({
        'flat': dict(base_flat),
        'status': {k: 'unchanged' for k in base_flat},
    })

    # Frame 1 - primary disruption only (no cascade)
    f1 = _cp.deepcopy(metrics)
    primary_keys = set()
    for path, amount in primary_disruption.items():
        if '.' not in path:
            continue
        primary_keys.add(path)
        dom_name, sub_name = path.split('.', 1)
        dom = getattr(f1, dom_name, None)
        if dom and hasattr(dom, sub_name):
            cur = getattr(dom, sub_name)
            setattr(dom, sub_name, max(0.0, min(100.0, cur + amount)))
    f1_flat = f1.flatten()
    f1_status = {}
    for k in f1_flat:
        f1_status[k] = 'primary' if k in primary_keys else 'unchanged'
    frames.append({'flat': dict(f1_flat), 'status': f1_status})

    # Frame 2 - first-order cascade effects
    f2 = _cp.deepcopy(f1)
    first_order_keys = set()
    queue_next = []
    for path, amount in primary_disruption.items():
        if '.' not in path:
            continue
        if path in graph.edges:
            for target, weight in graph.edges[path]:
                impact = amount * weight * dampening
                if abs(impact) >= 0.05:
                    first_order_keys.add(target)
                    dom_name, sub_name = target.split('.', 1)
                    dom = getattr(f2, dom_name, None)
                    if dom and hasattr(dom, sub_name):
                        cur = getattr(dom, sub_name)
                        setattr(dom, sub_name, max(0.0, min(100.0, cur + impact)))
                    queue_next.append((target, impact))
    f2_flat = f2.flatten()
    f2_status = {}
    for k in f2_flat:
        if k in primary_keys:
            f2_status[k] = 'primary'
        elif k in first_order_keys:
            f2_status[k] = 'first'
        else:
            f2_status[k] = 'unchanged'
    frames.append({'flat': dict(f2_flat), 'status': f2_status})

    # Frame 3 - second-order cascade effects
    f3 = _cp.deepcopy(f2)
    second_order_keys = set()
    for src_path, src_mag in queue_next:
        if src_path in graph.edges:
            for target, weight in graph.edges[src_path]:
                impact = src_mag * weight * dampening
                if abs(impact) >= 0.05:
                    second_order_keys.add(target)
                    dom_name, sub_name = target.split('.', 1)
                    dom = getattr(f3, dom_name, None)
                    if dom and hasattr(dom, sub_name):
                        cur = getattr(dom, sub_name)
                        setattr(dom, sub_name, max(0.0, min(100.0, cur + impact)))
    f3_flat = f3.flatten()
    f3_status = {}
    for k in f3_flat:
        if k in primary_keys:
            f3_status[k] = 'primary'
        elif k in first_order_keys:
            f3_status[k] = 'first'
        elif k in second_order_keys:
            f3_status[k] = 'second'
        else:
            f3_status[k] = 'unchanged'
    frames.append({'flat': dict(f3_flat), 'status': f3_status})

    return frames


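To make the two-hop propagation in `animate_cascade` concrete, here is a toy version on a flat dict. Everything in it is an assumption for illustration: the `EDGES` graph, its weights, and `DAMPENING = 0.5` are made up, since the real `DependencyGraph.edges` and `CASCADE_DAMPENING_DEFAULT` are defined in `core.life_state` and are not shown here. Only the shape of the rule is taken from the listing: impact = source amount × edge weight × dampening, ignored below 0.05, with values clamped to 0..100.

```python
# Toy two-hop cascade; graph, weights and dampening are illustrative only.
EDGES = {
    "career.workload": [("mental_wellbeing.stress_level", 0.5)],
    "mental_wellbeing.stress_level": [("physical_health.sleep_quality", -0.4)],
}
DAMPENING = 0.5
THRESHOLD = 0.05  # same cut-off used in animate_cascade


def clamp(x: float) -> float:
    return max(0.0, min(100.0, x))


def propagate(state: dict, sources: list) -> list:
    """Apply one hop in place; return the (target, impact) pairs that fired."""
    hits = []
    for path, amount in sources:
        for target, weight in EDGES.get(path, []):
            impact = amount * weight * DAMPENING
            if abs(impact) >= THRESHOLD and target in state:
                state[target] = clamp(state[target] + impact)
                hits.append((target, impact))
    return hits


state = {
    "career.workload": 50.0,
    "mental_wellbeing.stress_level": 50.0,
    "physical_health.sleep_quality": 70.0,
}
primary = [("career.workload", 20.0)]
for path, amount in primary:          # frame 1: primary disruption
    state[path] = clamp(state[path] + amount)
first = propagate(state, primary)     # frame 2: first-order effects
second = propagate(state, first)      # frame 3: second-order effects
print(state)
```

Because each hop multiplies by the dampening factor again, a second-order impact is scaled by the square of that factor, which is why the animation stops after two hops without losing much.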
# Cascade-aware CSS colours
CASCADE_COLORS = {
    'primary': '#ef4444',    # 🔴 red
    'first': '#f97316',      # 🟠 orange
    'second': '#eab308',     # 🟡 yellow
    'improved': '#22c55e',   # 🟢 green
    'unchanged': '#6b7280',  # ⚪ grey
}

CASCADE_EMOJI = {
    'primary': '🔴', 'first': '🟠', 'second': '🟡',
    'improved': '🟢', 'unchanged': '⚪',
}


def cascade_metrics_html(flat: dict, status: dict, title: str = "",
                         before: dict = None) -> str:
    """Render metrics with cascade propagation colours."""
    domains = ["career", "finances", "relationships",
               "physical_health", "mental_wellbeing", "time"]
    rows = []
    if title:
        rows.append(f"<h3 style='margin:0 0 8px;font-size:14px;color:#aaa'>{title}</h3>")
    for dom in domains:
        emoji = DOMAIN_EMOJI[dom]
        rows.append(f"<div style='margin:6px 0 2px;font-size:12px;"
                    f"font-weight:700;color:#ccc'>{emoji} {dom.upper()}</div>")
        sub = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
        for key, val in sub.items():
            name = key.split(".")[1].replace("_", " ")
            st = status.get(key, 'unchanged')

            # If we have a 'before' snapshot and val improved, override status
            if before and key in before and st == 'unchanged':
                if val - before[key] > 1.0:
                    st = 'improved'

            color = CASCADE_COLORS[st]
            tag = CASCADE_EMOJI[st]
            pct = min(val, 100)

            delta_str = ""
            if before is not None and key in before:
                delta = val - before[key]
                if abs(delta) > 1.0:
                    arrow = "↑" if delta > 0 else "↓"
                    dc = "#22c55e" if delta > 0 else "#ef4444"
                    delta_str = (
                        f"<span style='font-size:10px;color:{dc};"
                        f"margin-left:4px;font-weight:700'>"
                        f"{arrow} ({delta:+.1f})</span>"
                    )

            rows.append(
                f"<div style='display:flex;align-items:center;gap:6px;margin:2px 0'>"
                f" <span style='font-size:10px'>{tag}</span>"
                f" <span style='width:130px;font-size:11px;color:#bbb'>{name}</span>"
                f" <div style='flex:1;background:#333;border-radius:4px;height:10px'>"
                f" <div style='width:{pct}%;background:{color};border-radius:4px;"
                f"height:10px;transition:width 0.4s ease'></div>"
                f" </div>"
                f" <span style='width:38px;font-size:11px;color:#ccc;"
                f"text-align:right'>{val:.1f}</span>"
                f" {delta_str}"
                f"</div>"
            )
    return "<div style='font-family:monospace;padding:8px'>" + "\n".join(rows) + "</div>"


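Both renderers share the same delta rule: a change is only annotated when it moves more than one point from the `before` snapshot. A minimal sketch of that rule as a pure function follows; `delta_badge` and its `min_change` parameter are hypothetical names introduced for this example, and the HTML wrapper is left out so the text can be checked directly.

```python
def delta_badge(before: float, after: float, min_change: float = 1.0) -> str:
    """Return an arrow plus signed delta, or '' when the change is too small."""
    delta = after - before
    if abs(delta) <= min_change:
        return ""  # changes of one point or less are treated as noise
    arrow = "↑" if delta > 0 else "↓"
    return f"{arrow} ({delta:+.1f})"


if __name__ == "__main__":
    for before, after in ((50.0, 50.5), (50.0, 62.0), (50.0, 45.5)):
        print(before, after, repr(delta_badge(before, after)))
```

The arrow only encodes direction. Whether a rise is good or bad depends on the metric, so a caller that colours the badge would need to consult something like `INVERTED_METRICS` separately.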
NARRATIVE = [
    "Your life graph - stable state",
    "💥 Crisis hits: {title}",
    "Stress cascades to sleep and free time…",
    "⚡ Relationships and motivation begin degrading…",
    "🤖 Agent intervenes: {action_desc}",
]


# ─── Tab 1 - Live Demo (animated) ────────────────────────────────────────────
def run_demo(person_label: str, conflict_label: str):
    """Generator that yields a 4-tuple of HTML strings at each animation frame.

    The slots are the prediction panel, the metrics panel, the narrative line,
    and a fourth slot that stays empty during the cascade frames.
    """
    import time as _t

    conflict = CONFLICT_CHOICES[conflict_label]
    person = PERSONS[person_label]

    # Build cascade frames from a clean LifeMetrics
    base_metrics = LifeMetrics()
    frames = animate_cascade(conflict.primary_disruption, base_metrics)

    # Build predictor HTML
    summary = DEMO_PREDICTOR.get_prediction_summary()
    rscore = DEMO_PREDICTOR.get_risk_score()
    rcolor = "#4ade80" if rscore < 0.3 else ("#facc15" if rscore <= 0.6 else "#f87171")
    pct = min(100, int(rscore * 100))
    pred_html = f"""
    <div style='background:#1e1e2f;border:1px solid #333;border-left:4px solid {rcolor};border-radius:6px;padding:12px;margin-bottom:16px;font-family:sans-serif'>
      <div style='font-size:14px;font-weight:700;color:#ccc;margin-bottom:8px'>⚠️ TRAJECTORY ANALYSIS - Next 7 Days</div>
      <div style='margin-bottom:10px;font-size:13px;color:#ddd'>{summary}</div>
      <div style='display:flex;align-items:center;gap:10px'>
        <span style='font-size:12px;color:#aaa'>Risk Score:</span>
        <div style='flex:1;background:#333;border-radius:4px;height:12px'>
          <div style='width:{pct}%;background:{rcolor};border-radius:4px;height:12px'></div>
        </div>
        <span style='font-size:12px;color:{rcolor};font-weight:700'>{rscore:.2f}</span>
      </div>
    </div>
    """

    # ── Frame 0 - stable state ───────────────────────────────────────────────
    f0 = frames[0]
    narr = f"<div style='padding:8px;color:#9ca3af;font-style:italic'>{NARRATIVE[0]}</div>"
    yield (
        pred_html,
        cascade_metrics_html(f0['flat'], f0['status'], "BEFORE"),
        narr,
        "",
    )
    _t.sleep(0.5)

    # ── Frame 1 - primary hit ────────────────────────────────────────────────
    f1 = frames[1]
    narr = (f"<div style='padding:8px;color:#ef4444;font-weight:700'>"
            f"{NARRATIVE[1].format(title=conflict.title)}</div>")
    yield (
        pred_html,
        cascade_metrics_html(f1['flat'], f1['status'], "DISRUPTION", before=f0['flat']),
        narr,
        "",
    )
    _t.sleep(0.5)

    # ── Frame 2 - first-order cascade ────────────────────────────────────────
    f2 = frames[2]
    narr = (f"<div style='padding:8px;color:#f97316;font-weight:700'>"
            f"{NARRATIVE[2]}</div>")
    yield (
        pred_html,
        cascade_metrics_html(f2['flat'], f2['status'], "CASCADE - 1st ORDER", before=f0['flat']),
        narr,
        "",
    )
    _t.sleep(0.5)

    # ── Frame 3 - second-order cascade ───────────────────────────────────────
    f3 = frames[3]
    narr = (f"<div style='padding:8px;color:#eab308;font-weight:700'>"
            f"{NARRATIVE[3]}</div>")
    yield (
        pred_html,
        cascade_metrics_html(f3['flat'], f3['status'], "CASCADE - 2nd ORDER", before=f0['flat']),
        narr,
        "",
    )
    _t.sleep(0.5)

|
| 461 |
+
# ββ Frame 4 β agent intervention (final) ββββββββββββββββββββββββββββββ
|
| 462 |
+
env = _init_env(conflict)
|
| 463 |
+
before_metrics = copy.deepcopy(env.state.current_metrics)
|
| 464 |
+
before_budget = copy.deepcopy(env.state.budget)
|
| 465 |
+
|
| 466 |
+
action = AGENT.get_action(before_metrics, before_budget, conflict, person)
|
| 467 |
+
|
| 468 |
+
# Normalise metric keys
|
| 469 |
+
_normalize_action_metric_changes(action)
|
| 470 |
+
|
| 471 |
+
is_valid, _ = validate_action(action, before_budget)
|
| 472 |
+
if not is_valid:
|
| 473 |
+
action.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
|
| 474 |
+
action.primary.resource_cost = {}
|
| 475 |
+
|
| 476 |
+
current_stress = before_metrics.mental_wellbeing.stress_level
|
| 477 |
+
uptake = person.respond_to_action(
|
| 478 |
+
action.primary.action_type,
|
| 479 |
+
action.primary.resource_cost,
|
| 480 |
+
current_stress
|
| 481 |
+
)
|
| 482 |
+
|
| 483 |
+
scaled_changes = {}
|
| 484 |
+
for path, delta in action.primary.metric_changes.items():
|
| 485 |
+
scaled_changes[path] = float(delta) * uptake
|
| 486 |
+
|
| 487 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 488 |
+
# Apply scaled changes
|
| 489 |
+
env_action.metric_changes = scaled_changes
|
| 490 |
+
|
| 491 |
+
obs = env.step(env_action)
|
| 492 |
+
reward = obs.reward or 0.0
|
| 493 |
+
updated_metrics = env.state.current_metrics
|
| 494 |
+
|
| 495 |
+
# Generate Counterfactuals BEFORE yield
|
| 496 |
+
cf_data = generate_counterfactuals(AGENT, before_metrics, before_budget, conflict, person, action)
|
| 497 |
+
cf_html_blocks = []
|
| 498 |
+
for cf in cf_data:
|
| 499 |
+
cf_html_blocks.append(f"""
|
| 500 |
+
<div style='margin-top:10px;padding:10px;background:#1e1e2f;border-left:3px solid #444;border-radius:4px'>
|
| 501 |
+
<div style='display:flex;justify-content:space-between;font-size:13px;margin-bottom:4px'>
|
| 502 |
+
<span style='font-weight:700;color:#9ca3af'>vs. {cf['action_type']}</span>
|
| 503 |
+
<span style='color:#888'>reward: {cf['reward']:.2f}</span>
|
| 504 |
+
</div>
|
| 505 |
+
<div style='font-size:12px;color:#ccc;margin-bottom:4px'>"{cf['description']}"</div>
|
| 506 |
+
<div style='font-size:11px;color:#94a3b8'><b>Trade-off:</b> {cf['trade_off']}</div>
|
| 507 |
+
</div>
|
| 508 |
+
""")
|
| 509 |
+
cf_html = "".join(cf_html_blocks)
|
| 510 |
+
|
| 511 |
+
after_flat = updated_metrics.flatten()
|
| 512 |
+
before_flat = f0['flat']
|
| 513 |
+
# Build status: mark improved metrics green, rest from f3
|
| 514 |
+
final_status = {}
|
| 515 |
+
for k in after_flat:
|
| 516 |
+
if after_flat[k] - f3['flat'].get(k, after_flat[k]) > 1.0:
|
| 517 |
+
final_status[k] = 'improved'
|
| 518 |
+
else:
|
| 519 |
+
final_status[k] = f3['status'].get(k, 'unchanged')
|
| 520 |
+
|
| 521 |
+
after_html = cascade_metrics_html(after_flat, final_status, "AFTER AGENT ACTION",
|
| 522 |
+
before=before_flat)
|
| 523 |
+
|
| 524 |
+
comm_block = ""
|
| 525 |
+
if action.communication:
|
| 526 |
+
comm_block = (
|
| 527 |
+
f"<div style='margin-top:8px;padding:8px;background:#1e3a5f;"
|
| 528 |
+
f"border-radius:6px;font-size:12px'>"
|
| 529 |
+
f"π¬ <b>Message to {action.communication.recipient}</b> "
|
| 530 |
+
f"({action.communication.tone}): "
|
| 531 |
+
f"<em>{action.communication.content}</em></div>"
|
| 532 |
+
)
|
| 533 |
+
|
| 534 |
+
cost = action.primary.resource_cost
|
| 535 |
+
cost_str = (f"⏱ {cost.get('time',0):.1f}h · "
|
| 536 |
+
f"💵 ${cost.get('money',0):.0f} · "
|
| 537 |
+
f"⚡ {cost.get('energy',0):.0f}")
|
| 538 |
+
reward_color = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
|
| 539 |
+
|
| 540 |
+
narr = (f"<div style='padding:8px;color:#22c55e;font-weight:700'>"
|
| 541 |
+
f"{NARRATIVE[4].format(action_desc=action.primary.description)}</div>")
|
| 542 |
+
|
| 543 |
+
legend = (
|
| 544 |
+
"<div style='margin-top:6px;padding:6px;font-size:11px;color:#aaa;"
|
| 545 |
+
"border-top:1px solid #333;display:flex;gap:12px;flex-wrap:wrap'>"
|
| 546 |
+
"🔴 Primary hit · 🟠 1st-order cascade · 🟡 2nd-order cascade · "
|
| 547 |
+
"🟢 Agent improved · ⚪ Unchanged</div>"
|
| 548 |
+
)
|
| 549 |
+
|
| 550 |
+
decision_html = f"""
|
| 551 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;font-family:sans-serif'>
|
| 552 |
+
<div style='font-size:18px;font-weight:700;margin-bottom:6px'>
|
| 553 |
+
{action.primary.action_type.upper()} → {action.primary.target_domain}
|
| 554 |
+
</div>
|
| 555 |
+
<div style='color:#ccc;margin-bottom:8px'>{action.primary.description}</div>
|
| 556 |
+
{comm_block}
|
| 557 |
+
<div style='margin-top:10px;font-size:12px;color:#aaa;border-top:1px solid #333;padding-top:8px'>
|
| 558 |
+
<b>Reasoning:</b> {action.reasoning}
|
| 559 |
+
</div>
|
| 560 |
+
<div style='margin-top:8px;display:flex;gap:16px;font-size:13px'>
|
| 561 |
+
<span>{cost_str}</span>
|
| 562 |
+
<span>🎯 Personality uptake: {uptake:.0%}</span>
|
| 563 |
+
<span style='color:{reward_color};font-weight:700'>★ Reward: {reward:.3f}</span>
|
| 564 |
+
</div>
|
| 565 |
+
{legend}
|
| 566 |
+
|
| 567 |
+
<div style='margin-top:24px;border-top:1px solid #444;padding-top:16px'>
|
| 568 |
+
<div style='font-size:14px;font-weight:900;color:#94a3b8;letter-spacing:1px;margin-bottom:12px'>
|
| 569 |
+
π WHAT IF YOU CHOSE DIFFERENTLY?
|
| 570 |
+
</div>
|
| 571 |
+
<div style='padding:10px;background:#0d1b2a;border-radius:6px;border-left:4px solid #4ade80;margin-bottom:16px'>
|
| 572 |
+
<div style='display:flex;justify-content:space-between;font-size:13px;margin-bottom:4px'>
|
| 573 |
+
<span style='font-weight:700;color:#4ade80'>✅ Agent chose: {action.primary.action_type}</span>
|
| 574 |
+
<span style='color:#4ade80;font-weight:700'>{reward:.2f}</span>
|
| 575 |
+
</div>
|
| 576 |
+
<div style='font-size:12px;color:#ccc'>"{action.primary.description}"</div>
|
| 577 |
+
</div>
|
| 578 |
+
{cf_html}
|
| 579 |
+
</div>
|
| 580 |
+
</div>"""
|
| 581 |
+
|
| 582 |
+
DEMO_PREDICTOR.add_snapshot(updated_metrics)
|
| 583 |
+
summary = DEMO_PREDICTOR.get_prediction_summary()
|
| 584 |
+
rscore = DEMO_PREDICTOR.get_risk_score()
|
| 585 |
+
rcolor = "#4ade80" if rscore < 0.3 else ("#facc15" if rscore <= 0.6 else "#f87171")
|
| 586 |
+
pct = min(100, int(rscore * 100))
|
| 587 |
+
after_pred_html = f"""
|
| 588 |
+
<div style='background:#1e1e2f;border:1px solid #333;border-left:4px solid {rcolor};border-radius:6px;padding:12px;margin-bottom:16px;font-family:sans-serif'>
|
| 589 |
+
<div style='font-size:14px;font-weight:700;color:#ccc;margin-bottom:8px'>⚠️ TRAJECTORY ANALYSIS – Next 7 Days</div>
|
| 590 |
+
<div style='margin-bottom:10px;font-size:13px;color:#ddd'>{summary}</div>
|
| 591 |
+
<div style='display:flex;align-items:center;gap:10px'>
|
| 592 |
+
<span style='font-size:12px;color:#aaa'>Risk Score:</span>
|
| 593 |
+
<div style='flex:1;background:#333;border-radius:4px;height:12px'>
|
| 594 |
+
<div style='width:{pct}%;background:{rcolor};border-radius:4px;height:12px'></div>
|
| 595 |
+
</div>
|
| 596 |
+
<span style='font-size:12px;color:{rcolor};font-weight:700'>{rscore:.2f}</span>
|
| 597 |
+
</div>
|
| 598 |
+
</div>
|
| 599 |
+
"""
|
| 600 |
+
|
| 601 |
+
yield (after_pred_html, after_html, narr, decision_html)
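The reward-to-colour ternary used above is repeated in `run_custom` and in the memory-demo card. A minimal sketch of a shared helper follows, assuming the same thresholds (0.4 and 0) seen in this diff; the name `reward_color_for` is hypothetical and does not appear in the commit.

```python
def reward_color_for(reward: float) -> str:
    """Map a reward to the hex colour used by the UI cards.

    Thresholds mirror the inline ternary in the diff:
    green above 0.4, yellow above 0, red otherwise.
    """
    if reward > 0.4:
        return "#4ade80"  # green: clearly positive outcome
    if reward > 0:
        return "#facc15"  # yellow: marginally positive outcome
    return "#f87171"      # red: zero or negative outcome
```

With such a helper, each call site could reduce to `reward_color = reward_color_for(reward)`, keeping the thresholds in one place.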
|
| 602 |
+
|
| 603 |
+
|
| 604 |
+
# ─── Tab 2 – Try Your Situation (intake-powered) ─────────────────────────────
|
| 605 |
+
def run_custom(situation: str, work_stress: int, money_stress: int,
|
| 606 |
+
relationship_q: int, energy: int, time_pressure: int,
|
| 607 |
+
gmail_signals: dict = None):
|
| 608 |
+
"""Uses LifeIntake to extract structured conflict + personality from NL + sliders."""
|
| 609 |
+
metrics, budget, conflict, personality = INTAKE.full_intake(
|
| 610 |
+
situation, work_stress, money_stress, relationship_q, energy, time_pressure,
|
| 611 |
+
gmail_signals=gmail_signals
|
| 612 |
+
)
|
| 613 |
+
|
| 614 |
+
person = SimPerson(
|
| 615 |
+
name=personality.get("name", "You"),
|
| 616 |
+
openness=personality.get("openness", 0.5),
|
| 617 |
+
conscientiousness=personality.get("conscientiousness", 0.5),
|
| 618 |
+
extraversion=personality.get("extraversion", 0.5),
|
| 619 |
+
agreeableness=personality.get("agreeableness", 0.5),
|
| 620 |
+
neuroticism=personality.get("neuroticism", 0.5),
|
| 621 |
+
)
|
| 622 |
+
|
| 623 |
+
life_html = (
|
| 624 |
+
"<div style='font-family:sans-serif;font-size:13px;color:#a78bfa;"
|
| 625 |
+
"padding:8px 8px 4px;font-style:italic'>"
|
| 626 |
+
"Based on what you described, here is how your life looks right now:"
|
| 627 |
+
"</div>"
|
| 628 |
+
+ metrics_html(metrics.flatten(), "YOUR LIFE RIGHT NOW")
|
| 629 |
+
)
|
| 630 |
+
|
| 631 |
+
action = AGENT.get_action(metrics, budget, conflict, person)
|
| 632 |
+
|
| 633 |
+
_normalize_action_metric_changes(action)
|
| 634 |
+
|
| 635 |
+
is_valid, _ = validate_action(action, budget)
|
| 636 |
+
if not is_valid:
|
| 637 |
+
action.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
|
| 638 |
+
action.primary.resource_cost = {}
|
| 639 |
+
|
| 640 |
+
env = LifeStackEnv()
|
| 641 |
+
env.state.current_metrics = metrics
|
| 642 |
+
env.state.budget = budget
|
| 643 |
+
|
| 644 |
+
# Generate unique episode ID for feedback loop
|
| 645 |
+
import uuid
|
| 646 |
+
episode_id = str(uuid.uuid4())[:8].upper()
|
| 647 |
+
|
| 648 |
+
current_stress = metrics.mental_wellbeing.stress_level
|
| 649 |
+
uptake = person.respond_to_action(
|
| 650 |
+
action.primary.action_type,
|
| 651 |
+
action.primary.resource_cost,
|
| 652 |
+
current_stress
|
| 653 |
+
)
|
| 654 |
+
|
| 655 |
+
scaled_changes = {}
|
| 656 |
+
for path, delta in action.primary.metric_changes.items():
|
| 657 |
+
scaled_changes[path] = float(delta) * uptake
|
| 658 |
+
|
| 659 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 660 |
+
# Apply scaled changes
|
| 661 |
+
env_action.metric_changes = scaled_changes
|
| 662 |
+
|
| 663 |
+
obs = env.step(env_action)
|
| 664 |
+
updated_metrics = env.state.current_metrics
|
| 665 |
+
reward = obs.reward or 0.0
|
| 666 |
+
|
| 667 |
+
after_html = metrics_html(updated_metrics.flatten(), "AFTER ACTION", before=metrics.flatten())
|
| 668 |
+
reward_color = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
|
| 669 |
+
|
| 670 |
+
trait_bar = lambda v: "█" * int(v * 10) + "░" * (10 - int(v * 10))
|
| 671 |
+
personality_html = f"""
|
| 672 |
+
<div style='background:#12122a;border:1px solid #2a2a4a;border-radius:8px;padding:12px;
|
| 673 |
+
margin-bottom:12px;font-family:monospace;font-size:11px;color:#ccc'>
|
| 674 |
+
<div style='font-size:13px;font-weight:700;color:#a78bfa;margin-bottom:8px'>🧠 Inferred Personality: {person.name}</div>
|
| 675 |
+
<div>Openness {trait_bar(personality.get('openness',0.5))} {personality.get('openness',0.5):.2f}</div>
|
| 676 |
+
<div>Conscientiousness {trait_bar(personality.get('conscientiousness',0.5))} {personality.get('conscientiousness',0.5):.2f}</div>
|
| 677 |
+
<div>Extraversion {trait_bar(personality.get('extraversion',0.5))} {personality.get('extraversion',0.5):.2f}</div>
|
| 678 |
+
<div>Agreeableness {trait_bar(personality.get('agreeableness',0.5))} {personality.get('agreeableness',0.5):.2f}</div>
|
| 679 |
+
<div>Neuroticism {trait_bar(personality.get('neuroticism',0.5))} {personality.get('neuroticism',0.5):.2f}</div>
|
| 680 |
+
</div>"""
|
| 681 |
+
|
| 682 |
+
steps = [f"<b>Step 1:</b> {action.primary.description}"]
|
| 683 |
+
if action.communication:
|
| 684 |
+
steps.append(
|
| 685 |
+
f"<b>Message to {action.communication.recipient}</b> "
|
| 686 |
+
f"({action.communication.tone}): <em>{action.communication.content}</em>"
|
| 687 |
+
)
|
| 688 |
+
cost = action.primary.resource_cost
|
| 689 |
+
cost_str = f"⏱ {cost.get('time', 0):.1f}h · 💵 ${cost.get('money', 0):.0f} · ⚡ {cost.get('energy', 0):.0f}"
|
| 690 |
+
|
| 691 |
+
plan_html = f"""
|
| 692 |
+
{personality_html}
|
| 693 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;font-family:sans-serif;color:#eee'>
|
| 694 |
+
<div style='font-size:13px;font-weight:700;color:#60a5fa;margin-bottom:4px'>
|
| 695 |
+
π {conflict.title} (Difficulty {conflict.difficulty}/5)
|
| 696 |
+
</div>
|
| 697 |
+
<div style='font-size:12px;color:#aaa;margin-bottom:10px'>{conflict.story}</div>
|
| 698 |
+
<div style='font-size:16px;font-weight:700;margin-bottom:10px'>🎯 Resolution Plan for {person.name}</div>
|
| 699 |
+
<div style='margin-bottom:8px'>{"<br>".join(steps)}</div>
|
| 700 |
+
<div style='margin:10px 0;padding:8px;background:#0d1b2a;border-radius:6px;font-size:12px;color:#aaa'>
|
| 701 |
+
<b>Why:</b> {action.reasoning}
|
| 702 |
+
</div>
|
| 703 |
+
<div style='display:flex;gap:20px;font-size:13px;border-top:1px solid #333;padding-top:8px'>
|
| 704 |
+
<span>{cost_str}</span>
|
| 705 |
+
<span>🎯 Personality fit: {uptake:.0%}</span>
|
| 706 |
+
<span style='margin-left:auto;color:#a78bfa;font-weight:700'>ID: {episode_id}</span>
|
| 707 |
+
</div>
|
| 708 |
+
</div>
|
| 709 |
+
<div style='margin-top:12px;font-size:11px;color:#888;text-align:right'>
|
| 710 |
+
Keep this ID to record the real-world outcome in the 'Real-World Verification' tab.
|
| 711 |
+
</div>
|
| 712 |
+
"""
|
| 713 |
+
|
| 714 |
+
return (
|
| 715 |
+
life_html,
|
| 716 |
+
after_html,
|
| 717 |
+
plan_html
|
| 718 |
+
)
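The uptake step in `run_custom` multiplies every proposed metric delta by the personality uptake before the environment applies it. A minimal standalone sketch of that scaling, assuming deltas are numeric and keyed by dotted metric paths as in the diff; `scale_metric_changes` is a hypothetical name used only for illustration.

```python
from typing import Dict, Mapping


def scale_metric_changes(metric_changes: Mapping[str, float],
                         uptake: float) -> Dict[str, float]:
    """Scale each proposed metric delta by the uptake factor.

    An uptake of 1.0 applies the action in full; 0.0 cancels it.
    """
    return {path: float(delta) * uptake
            for path, delta in metric_changes.items()}


# Example: a half-adopted stress-relief action.
scaled = scale_metric_changes({"mental_wellbeing.stress_level": -5.0}, 0.5)
```

Here `scaled` holds a stress delta of -2.5, half of the proposed -5.0.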
|
| 719 |
+
|
| 720 |
+
|
| 721 |
+
# ─── Tab 3 – Training Results ────────────────────────────────────────────────
|
| 722 |
+
def load_training_tab():
|
| 723 |
+
html_parts = []
|
| 724 |
+
|
| 725 |
+
try:
|
| 726 |
+
stats = MEMORY.get_stats()
|
| 727 |
+
html_parts.append(f"""
|
| 728 |
+
<div style='display:flex;gap:16px;flex-wrap:wrap;margin-bottom:16px'>
|
| 729 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
|
| 730 |
+
<div style='font-size:28px;font-weight:700;color:#4ade80'>{stats['total_memories']}</div>
|
| 731 |
+
<div style='color:#aaa;font-size:12px'>Decisions Stored</div>
|
| 732 |
+
</div>
|
| 733 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
|
| 734 |
+
<div style='font-size:28px;font-weight:700;color:#60a5fa'>{stats['average_reward']:.3f}</div>
|
| 735 |
+
<div style='color:#aaa;font-size:12px'>Avg Memory Reward</div>
|
| 736 |
+
</div>
|
| 737 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:200px'>
|
| 738 |
+
<div style='font-size:12px;color:#aaa;margin-bottom:6px'>By Action Type</div>
|
| 739 |
+
{''.join(f"<div style='font-size:12px'><b>{k}</b>: {v}</div>" for k,v in stats['by_action_type'].items())}
|
| 740 |
+
</div>
|
| 741 |
+
</div>""")
|
| 742 |
+
except Exception as e:
|
| 743 |
+
html_parts.append(f"<p style='color:#f87171'>Memory error: {e}</p>")
|
| 744 |
+
|
| 745 |
+
log_path = os.path.join(os.path.dirname(__file__), "data", "training_log.json")
|
| 746 |
+
if os.path.exists(log_path):
|
| 747 |
+
try:
|
| 748 |
+
data = json.load(open(log_path))
|
| 749 |
+
rewards = [e["reward"] for e in data]
|
| 750 |
+
first10 = sum(rewards[:10]) / max(len(rewards[:10]), 1)  # correct mean even when the log has fewer than 10 episodes
|
| 751 |
+
last10 = sum(rewards[-10:]) / max(len(rewards[-10:]), 1)  # correct mean even when the log has fewer than 10 episodes
|
| 752 |
+
best = max(data, key=lambda x: x["reward"])
|
| 753 |
+
phases = {
|
| 754 |
+
"Early (1–15)": [e for e in data if e["episode"] <= 15],
|
| 755 |
+
"Mid (16–35)": [e for e in data if 16 <= e["episode"] <= 35],
|
| 756 |
+
"Late (36–50)": [e for e in data if e["episode"] >= 36],
|
| 757 |
+
}
|
| 758 |
+
phase_rows = "".join(
|
| 759 |
+
f"<tr><td style='padding:4px 10px'>{name}</td><td style='padding:4px 10px;text-align:center'>{len(eps)}</td>"
|
| 760 |
+
f"<td style='padding:4px 10px;text-align:center;color:#4ade80'>{sum(e['reward'] for e in eps)/len(eps):.3f}</td></tr>"
|
| 761 |
+
for name, eps in phases.items() if eps
|
| 762 |
+
)
|
| 763 |
+
delta_color = "#4ade80" if last10 >= first10 else "#f87171"
|
| 764 |
+
html_parts.append(f"""
|
| 765 |
+
<div style='margin-bottom:16px'>
|
| 766 |
+
<div style='display:flex;gap:16px;flex-wrap:wrap;margin-bottom:12px'>
|
| 767 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
|
| 768 |
+
<div style='font-size:28px;font-weight:700;color:#a78bfa'>{len(data)}</div>
|
| 769 |
+
<div style='color:#aaa;font-size:12px'>Total Episodes</div>
|
| 770 |
+
</div>
|
| 771 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
|
| 772 |
+
<div style='font-size:28px;font-weight:700;color:#4ade80'>{sum(rewards)/len(rewards):.3f}</div>
|
| 773 |
+
<div style='color:#aaa;font-size:12px'>Overall Avg Reward</div>
|
| 774 |
+
</div>
|
| 775 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:140px;text-align:center'>
|
| 776 |
+
<div style='font-size:28px;font-weight:700;color:#fbbf24'>{best["reward"]:.3f}</div>
|
| 777 |
+
<div style='color:#aaa;font-size:12px'>Best Episode (#{best["episode"]})</div>
|
| 778 |
+
</div>
|
| 779 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:8px;padding:12px;min-width:160px;text-align:center'>
|
| 780 |
+
<div style='font-size:22px;font-weight:700;color:{delta_color}'>
|
| 781 |
+
{"+" if last10>=first10 else ""}{(last10-first10):.3f}
|
| 782 |
+
</div>
|
| 783 |
+
<div style='color:#aaa;font-size:12px'>Ep 1–10 → 41–50 Δ</div>
|
| 784 |
+
</div>
|
| 785 |
+
</div>
|
| 786 |
+
<table style='border-collapse:collapse;width:100%;max-width:400px;font-size:13px;color:#eee'>
|
| 787 |
+
<tr style='color:#aaa;border-bottom:1px solid #333'>
|
| 788 |
+
<th style='padding:4px 10px;text-align:left'>Phase</th>
|
| 789 |
+
<th style='padding:4px 10px'>Episodes</th>
|
| 790 |
+
<th style='padding:4px 10px'>Avg Reward</th>
|
| 791 |
+
</tr>
|
| 792 |
+
{phase_rows}
|
| 793 |
+
</table>
|
| 794 |
+
</div>""")
|
| 795 |
+
except Exception as e:
|
| 796 |
+
html_parts.append(f"<p style='color:#f87171'>Log parse error: {e}</p>")
|
| 797 |
+
else:
|
| 798 |
+
html_parts.append("<p style='color:#aaa'>training_log.json not found – run train.py first.</p>")
|
| 799 |
+
|
| 800 |
+
return "<div style='font-family:sans-serif;color:#eee'>" + "\n".join(html_parts) + "</div>"
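The "Ep 1–10 → 41–50" delta in `load_training_tab` compares the mean of the first ten rewards with the mean of the last ten. A minimal sketch of that comparison that also tolerates short or empty logs, under the assumption that an empty log should report zeros; `window_means` is a hypothetical helper, not a function from the commit.

```python
from typing import Sequence, Tuple


def window_means(rewards: Sequence[float],
                 window: int = 10) -> Tuple[float, float]:
    """Return (mean of first `window` rewards, mean of last `window` rewards).

    Each mean divides by the number of items actually present, so a log
    shorter than `window` is still averaged correctly.
    """
    if not rewards:
        return 0.0, 0.0
    head = rewards[:window]
    tail = rewards[-window:]
    return sum(head) / len(head), sum(tail) / len(tail)


# Example: a 20-episode run whose reward rises by 1.0 each episode.
first, last = window_means([float(i) for i in range(20)])
```

The difference `last - first` then plays the role of the delta shown in the fourth stat card.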
|
| 801 |
+
|
| 802 |
+
|
| 803 |
+
# ─── Tab: Memory Effect Demo ─────────────────────────────────────────────────
|
| 804 |
+
def run_memory_demo(conflict_label: str, person_label: str):
|
| 805 |
+
"""Cold-start vs RAG-Augmented episode comparison."""
|
| 806 |
+
import copy as _cp
|
| 807 |
+
import time as _t
|
| 808 |
+
|
| 809 |
+
ERR = "background:#1a1a2e;border:2px solid #ef4444;border-radius:10px;padding:20px;font-family:sans-serif;color:#f87171;"
|
| 810 |
+
|
| 811 |
+
def _run_ep(conflict, person, few_shot_context):
|
| 812 |
+
env = _init_env(conflict)
|
| 813 |
+
mb = _cp.deepcopy(env.state.current_metrics)
|
| 814 |
+
bud = _cp.deepcopy(env.state.budget)
|
| 815 |
+
act = AGENT.get_action(mb, bud, conflict, person,
|
| 816 |
+
few_shot_context=few_shot_context)
|
| 817 |
+
_normalize_action_metric_changes(act)
|
| 818 |
+
is_valid, _ = validate_action(act, bud)
|
| 819 |
+
if not is_valid:
|
| 820 |
+
act.primary.metric_changes = {"mental_wellbeing.stress_level": -5.0}
|
| 821 |
+
act.primary.resource_cost = {}
|
| 822 |
+
uptake = person.respond_to_action(
|
| 823 |
+
act.primary.action_type, act.primary.resource_cost,
|
| 824 |
+
mb.mental_wellbeing.stress_level)
|
| 825 |
+
scaled = {k: float(v) * uptake for k, v in act.primary.metric_changes.items()}
|
| 826 |
+
env_act = LifeStackAction.from_agent_action(act)
|
| 827 |
+
env_act.metric_changes = scaled
|
| 828 |
+
obs = env.step(env_act)
|
| 829 |
+
reward = obs.reward or 0.0
|
| 830 |
+
return act, reward, uptake, mb, env.state.current_metrics
|
| 831 |
+
|
| 832 |
+
def _card(ep_num, label, act, reward, uptake, before, after,
|
| 833 |
+
border_color, few_shot_ctx=""):
|
| 834 |
+
bf = before.flatten()
|
| 835 |
+
af = after.flatten()
|
| 836 |
+
rc = "#4ade80" if reward > 0.4 else ("#facc15" if reward > 0 else "#f87171")
|
| 837 |
+
cost = act.primary.resource_cost
|
| 838 |
+
cstr = (f"\u23f1 {cost.get('time',0):.1f}h "
|
| 839 |
+
f"\U0001f4b5 ${cost.get('money',0):.0f} "
|
| 840 |
+
f"\u26a1 {cost.get('energy',0):.0f}")
|
| 841 |
+
rows = ""
|
| 842 |
+
for k, va in af.items():
|
| 843 |
+
d = va - bf.get(k, va)
|
| 844 |
+
if abs(d) > 0.5:
|
| 845 |
+
n = k.replace(".", " \u203a ").replace("_", " ")
|
| 846 |
+
ar = "\u2191" if d > 0 else "\u2193"
|
| 847 |
+
dc = "#4ade80" if d > 0 else "#f87171"
|
| 848 |
+
rows += (f"<div style='display:flex;justify-content:space-between;"
|
| 849 |
+
f"font-size:11px;color:#ccc;padding:2px 0'>"
|
| 850 |
+
f"<span>{n}</span><span style='color:{dc}'>{ar} {d:+.1f}</span></div>")
|
| 851 |
+
if not rows:
|
| 852 |
+
rows = "<div style='font-size:11px;color:#666'>No significant metric changes</div>"
|
| 853 |
+
badge = ""
|
| 854 |
+
if few_shot_ctx:
|
| 855 |
+
prev = few_shot_ctx[:160].replace("<", "&lt;").replace(">", "&gt;")
|
| 856 |
+
badge = (f"<div style='margin-top:10px;padding:8px;background:#0d2a1a;"
|
| 857 |
+
f"border:1px solid #166534;border-radius:6px;font-size:11px;color:#86efac'>"
|
| 858 |
+
f"\U0001f9e0 <b>Memory injected:</b><br>"
|
| 859 |
+
f"<span style='color:#ccc'>{prev}\u2026</span></div>")
|
| 860 |
+
reas = act.reasoning[:180] + ("\u2026" if len(act.reasoning) > 180 else "")
|
| 861 |
+
return (
|
| 862 |
+
f"<div style='background:#12122a;border:2px solid {border_color};"
|
| 863 |
+
f"border-radius:10px;padding:16px;font-family:sans-serif'>"
|
| 864 |
+
f"<div style='font-size:12px;font-weight:700;color:#888;letter-spacing:2px;margin-bottom:4px'>"
|
| 865 |
+
f"EPISODE {ep_num} \u2014 {label.upper()}</div>"
|
| 866 |
+
f"<div style='font-size:18px;font-weight:900;color:#eee;margin-bottom:8px'>"
|
| 867 |
+
f"{act.primary.action_type.upper()} \u2192 {act.primary.target_domain}</div>"
|
| 868 |
+
f"<div style='font-size:13px;color:#ccc;margin-bottom:10px'>{act.primary.description}</div>"
|
| 869 |
+
f"<div style='margin-bottom:10px;padding:8px;background:#1e1e2f;border-radius:6px;"
|
| 870 |
+
f"font-size:11px;color:#94a3b8'><b>Reasoning:</b> {reas}</div>"
|
| 871 |
+
f"<div style='display:flex;gap:12px;font-size:13px;margin-bottom:10px'>"
|
| 872 |
+
f"<span style='color:{rc};font-weight:700'>\u2605 Reward: {reward:.3f}</span>"
|
| 873 |
+
f"<span style='color:#94a3b8'>\U0001f3af Uptake: {uptake:.0%}</span>"
|
| 874 |
+
f"<span style='color:#6b7280'>{cstr}</span></div>"
|
| 875 |
+
f"<div style='border-top:1px solid #333;padding-top:10px'>"
|
| 876 |
+
f"<div style='font-size:11px;color:#888;margin-bottom:4px'>METRIC CHANGES</div>"
|
| 877 |
+
f"{rows}</div>{badge}</div>"
|
| 878 |
+
)
|
| 879 |
+
|
| 880 |
+
try:
|
| 881 |
+
conflict = CONFLICT_CHOICES[conflict_label]
|
| 882 |
+
person = PERSONS[person_label]
|
| 883 |
+
except KeyError as e:
|
| 884 |
+
err = f"<div style='{ERR}'>\u274c Invalid selection: {e}</div>"
|
| 885 |
+
return err, err, err
|
| 886 |
+
|
| 887 |
+
try:
|
| 888 |
+
ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma = _run_ep(conflict, person, "")
|
| 889 |
+
except Exception as e:
|
| 890 |
+
err = f"<div style='{ERR}'>\u274c Episode 1 failed: {e}</div>"
|
| 891 |
+
return err, err, err
|
| 892 |
+
|
| 893 |
+
try:
|
| 894 |
+
MEMORY.store_decision(
|
| 895 |
+
conflict_title=conflict.title,
|
| 896 |
+
action_type=ep1_act.primary.action_type,
|
| 897 |
+
target_domain=ep1_act.primary.target_domain,
|
| 898 |
+
reward=ep1_r,
|
| 899 |
+
metrics_snapshot=ep1_mb.flatten(),
|
| 900 |
+
reasoning=ep1_act.reasoning,
|
| 901 |
+
)
|
| 902 |
+
except Exception:
|
| 903 |
+
pass
|
| 904 |
+
|
| 905 |
+
outcome_lbl = "Good \u2014 build on this" if ep1_r >= 0.4 else "Suboptimal \u2014 try different approach"
|
| 906 |
+
few_shot = (
|
| 907 |
+
f"RETRIEVED MEMORY \u2014 Previous attempt at '{conflict.title}':\n"
|
| 908 |
+
f" Action: {ep1_act.primary.action_type} \u2192 {ep1_act.primary.target_domain}\n"
|
| 909 |
+
f" Done: {ep1_act.primary.description}\n"
|
| 910 |
+
f" Reward: {ep1_r:.3f} ({outcome_lbl})\n"
|
| 911 |
+
f" Reasoning: {ep1_act.reasoning[:120]}\n"
|
| 912 |
+
f"{'Refine this approach.' if ep1_r >= 0.4 else 'Try a meaningfully different action type or domain.'}"
|
| 913 |
+
)
|
| 914 |
+
|
| 915 |
+
_t.sleep(2)
|
| 916 |
+
|
| 917 |
+
try:
|
| 918 |
+
ep2_act, ep2_r, ep2_up, ep2_mb, ep2_ma = _run_ep(conflict, person, few_shot)
|
| 919 |
+
except Exception as e:
|
| 920 |
+
ep1_html = _card(1, "No Memory", ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma, "#4b5563", "")
|
| 921 |
+
err = f"<div style='{ERR}'>\u274c Episode 2 failed \u2014 wait 30s and retry: {e}</div>"
|
| 922 |
+
return ep1_html, err, err
|
| 923 |
+
|
| 924 |
+
ep1_html = _card(1, "No Memory", ep1_act, ep1_r, ep1_up, ep1_mb, ep1_ma, "#4b5563", "")
|
| 925 |
+
ep2_html = _card(2, "RAG-Augmented", ep2_act, ep2_r, ep2_up, ep2_mb, ep2_ma, "#22c55e", few_shot)
|
| 926 |
+
|
| 927 |
+
rd = ep2_r - ep1_r
|
| 928 |
+
pct = (rd / max(abs(ep1_r), 0.01)) * 100
|
| 929 |
+
dc = "#4ade80" if rd >= 0 else "#f87171"
|
| 930 |
+
same = ep1_act.primary.action_type == ep2_act.primary.action_type
|
| 931 |
+
sl = ("\u2705 Different strategy \u2014 memory triggered a better approach"
|
| 932 |
+
if not same else "\u26a0\ufe0f Same action (memory reinforced the choice)")
|
| 933 |
+
sc = "#4ade80" if not same else "#facc15"
|
| 934 |
+
|
| 935 |
+
diff_html = (
|
| 936 |
+
f"<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;"
|
| 937 |
+
f"padding:16px;font-family:sans-serif'>"
|
| 938 |
+
f"<div style='font-size:14px;font-weight:900;color:#a78bfa;letter-spacing:1px;"
|
| 939 |
+
f"margin-bottom:12px'>\U0001f4ca MEMORY EFFECT DELTA</div>"
|
| 940 |
+
f"<div style='display:grid;grid-template-columns:1fr 1fr 1fr;gap:12px;margin-bottom:14px'>"
|
| 941 |
+
f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
|
| 942 |
+
f"<div style='font-size:22px;font-weight:700;color:#6b7280'>{ep1_r:.3f}</div>"
|
| 943 |
+
f"<div style='font-size:11px;color:#666;margin-top:2px'>Cold Start Reward</div></div>"
|
| 944 |
+
f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
|
| 945 |
+
f"<div style='font-size:22px;font-weight:700;color:#22c55e'>{ep2_r:.3f}</div>"
|
| 946 |
+
f"<div style='font-size:11px;color:#666;margin-top:2px'>RAG-Augmented Reward</div></div>"
|
| 947 |
+
f"<div style='background:#0d1117;border:1px solid #333;border-radius:8px;padding:12px;text-align:center'>"
|
| 948 |
+
f"<div style='font-size:22px;font-weight:700;color:{dc}'>{'+' if rd >= 0 else ''}{pct:.0f}%</div>"
|
| 949 |
+
f"<div style='font-size:11px;color:#666;margin-top:2px'>Efficiency Gain</div></div></div>"
|
| 950 |
+
f"<div style='padding:10px;background:#0d2a1a;border-radius:6px;margin-bottom:10px'>"
|
| 951 |
+
f"<span style='color:{sc};font-weight:700'>{sl}</span></div>"
|
| 952 |
+
f"<div style='font-size:12px;color:#6b7280;border-top:1px solid #222;padding-top:10px'>"
|
| 953 |
+
f"Ep1 \u2192 <b style='color:#ccc'>{ep1_act.primary.action_type}</b> | "
|
| 954 |
+
f"Ep2 \u2192 <b style='color:#a78bfa'>{ep2_act.primary.action_type}</b>. "
|
| 955 |
+
f"Memory {'shifted the strategy' if not same else 'reinforced the same choice'}."
|
| 956 |
+
f"</div></div>"
|
| 957 |
+
)
|
| 958 |
+
|
| 959 |
+
return ep1_html, ep2_html, diff_html
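Two small computations in `run_memory_demo` can be isolated: escaping the memory preview before it is embedded in HTML, and the relative gain between the two episode rewards. The sketch below uses the standard library's `html.escape` instead of chained `replace` calls, which is an assumption about intent (it also escapes `&`, which the original does not); both function names are hypothetical.

```python
import html


def memory_preview(context: str, limit: int = 160) -> str:
    """Truncate the retrieved context and escape it for safe HTML embedding."""
    return html.escape(context[:limit], quote=False)


def relative_gain_pct(baseline: float, augmented: float,
                      floor: float = 0.01) -> float:
    """Percentage change from baseline to augmented reward.

    The denominator is floored, as in the diff, so a near-zero
    baseline does not cause a division by zero.
    """
    return (augmented - baseline) / max(abs(baseline), floor) * 100
```

Note that with a baseline close to zero the floored denominator yields very large percentages, so the headline figure is best read alongside the two raw rewards.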
|
| 960 |
+
|
| 961 |
+
|
| 962 |
+
def submit_outcome_feedback(ep_id, score, domains_up, domains_down, notes, time_spent):
|
| 963 |
+
if not ep_id:
|
| 964 |
+
return "⚠️ Please enter a valid Episode ID."
|
| 965 |
+
|
| 966 |
+
feedback = OutcomeFeedback(
|
| 967 |
+
episode_id=ep_id,
|
| 968 |
+
overall_effectiveness=int(score),
|
| 969 |
+
domains_improved=domains_up,
|
| 970 |
+
domains_worsened=domains_down,
|
| 971 |
+
unexpected_effects=notes,
|
| 972 |
+
resolution_time_hours=float(time_spent)
|
| 973 |
+
)
|
| 974 |
+
|
| 975 |
+
# Store in memory
|
| 976 |
+
MEMORY.store_feedback(feedback)
|
| 977 |
+
|
| 978 |
+
return f"✅ Feedback for **{ep_id}** submitted! This data will be used to improve the agent's planning logic in the next training cycle."
|
| 979 |
+
|
| 980 |
+
|
| 981 |
+
# ─── Main Gradio App Construction ────────────────────────────────────────────
|
| 982 |
+
with gr.Blocks(
|
| 983 |
+
title="LifeStack – AI Life Coach",
|
| 984 |
+
) as app:
|
| 985 |
+
|
| 986 |
+
gr.HTML("""
|
| 987 |
+
<div style='text-align:center;padding:24px 0 8px;font-family:sans-serif'>
|
| 988 |
+
<div style='font-size:36px;font-weight:900;letter-spacing:-1px;
|
| 989 |
+
background:linear-gradient(90deg,#a78bfa,#60a5fa);
|
| 990 |
+
-webkit-background-clip:text;-webkit-text-fill-color:transparent'>
|
| 991 |
+
LifeStack
|
| 992 |
+
</div>
|
| 993 |
+
<div style='color:#888;font-size:14px;margin-top:4px'>
|
| 994 |
+
AI that handles life's worst Fridays
|
| 995 |
+
</div>
|
| 996 |
+
</div>
|
| 997 |
+
""")
|
| 998 |
+
|
| 999 |
+
with gr.Tabs():
|
| 1000 |
+
|
| 1001 |
+
# ── Tab 1: Live Demo ─────────────────────────────────────────────────
|
| 1002 |
+
with gr.Tab("🎯 Live Demo"):
|
| 1003 |
+
gr.HTML(f"""
|
| 1004 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;padding:16px;
|
| 1005 |
+
margin-bottom:16px;font-family:sans-serif'>
|
| 1006 |
+
<div style='font-size:16px;font-weight:700;color:#a78bfa;margin-bottom:6px'>
|
| 1007 |
+
🚨 Friday 6PM
|
| 1008 |
+
</div>
|
| 1009 |
+
<div style='color:#ddd;font-size:14px'>{DEMO_CONFLICT.story}</div>
|
| 1010 |
+
<div style='margin-top:8px;font-size:12px;color:#888'>
|
| 1011 |
+
Difficulty: βββββ |
|
| 1012 |
+
Domains hit: Career, Finances, Mental Health, Time
|
| 1013 |
+
</div>
|
| 1014 |
+
</div>
|
| 1015 |
+
""")
|
| 1016 |
+
|
| 1017 |
+
prediction_ui = gr.HTML()
|
| 1018 |
+
|
| 1019 |
+
with gr.Row():
|
| 1020 |
+
conflict_dd = gr.Dropdown(
|
| 1021 |
+
choices=CONFLICT_CHOICES_LIST,
|
| 1022 |
+
value=DEFAULT_CONFLICT,
|
| 1023 |
+
label="π Conflict Scenario",
|
| 1024 |
+
)
|
| 1025 |
+
person_dd = gr.Dropdown(
|
| 1026 |
+
choices=PERSON_CHOICES,
|
| 1027 |
+
value=PERSON_CHOICES[0],
|
| 1028 |
+
label="👤 Choose Your Person",
|
| 1029 |
+
)
|
| 1030 |
+
|
| 1031 |
+
run_btn = gr.Button("▶ Run Agent", variant="primary", size="lg")
|
| 1032 |
+
|
| 1033 |
+
cascade_narrative = gr.HTML(label="Cascade Narrative")
|
| 1034 |
+
|
| 1035 |
+
with gr.Row():
|
| 1036 |
+
before_out = gr.HTML(label="Life State")
|
| 1037 |
+
after_out = gr.HTML(label="Agent Decision")
|
| 1038 |
+
|
| 1039 |
+
run_btn.click(
|
| 1040 |
+
fn=run_demo,
|
| 1041 |
+
inputs=[person_dd, conflict_dd],
|
| 1042 |
+
outputs=[prediction_ui, before_out, cascade_narrative, after_out],
|
| 1043 |
+
)
|
| 1044 |
+
|
| 1045 |
+
# ── Tab 2: Try Your Situation ────────────────────────────────────────
|
| 1046 |
+
with gr.Tab("π Try Your Situation"):
|
| 1047 |
+
gr.Markdown(
|
| 1048 |
+
"Describe your situation in plain English. LifeStack extracts a **structured conflict**, "
|
| 1049 |
+
"infers your **personality**, maps your **life metrics**, and gives a personalised "
|
| 1050 |
+
"resolution plan with before/after comparison."
|
| 1051 |
+
)
|
| 1052 |
+
with gr.Row():
|
| 1053 |
+
with gr.Column(scale=1):
|
| 1054 |
+
situation_input = gr.Textbox(
|
| 1055 |
+
label="What's stressing you out right now?",
|
| 1056 |
+
placeholder="e.g. My boss keeps piling on work, I haven't slept in weeks, and my partner says I'm distant…",
|
| 1057 |
+
lines=3,
|
| 1058 |
+
)
|
| 1059 |
+
gr.Markdown("**Rate your current state (0 = none / low · 10 = extreme / high):**")
|
| 1060 |
+
work_sl = gr.Slider(0, 10, value=7, step=1, label="💼 Work Stress")
|
| 1061 |
+
money_sl = gr.Slider(0, 10, value=5, step=1, label="💰 Money Stress")
|
| 1062 |
+
rel_sl = gr.Slider(0, 10, value=6, step=1, label="❤️ Relationship Quality")
|
| 1063 |
+
energy_sl = gr.Slider(0, 10, value=4, step=1, label="⚡ Energy Level")
|
| 1064 |
+
time_sl = gr.Slider(0, 10, value=7, step=1, label="📅 Time Pressure")
|
| 1065 |
+
|
| 1066 |
+
gmail_state = gr.State(None)
|
| 1067 |
+
with gr.Row():
|
| 1068 |
+
gmail_btn = gr.Button("📧 Sync Digital Signals (Gmail)", variant="secondary")
|
| 1069 |
+
gmail_status = gr.Markdown("<span style='color:#777;font-size:12px'>Gmail not connected. (Optional)</span>")
|
| 1070 |
+
|
| 1071 |
+
def sync_gmail():
|
| 1072 |
+
try:
|
| 1073 |
+
service = GMAIL.authenticate()
|
| 1074 |
+
rel = GMAIL.extract_relationship_signals(service)
|
| 1075 |
+
work = GMAIL.extract_work_signals(service)
|
| 1076 |
+
signals = GMAIL.to_life_metrics(rel, work)
|
| 1077 |
+
summary = GMAIL.get_email_summary(rel, work)
|
| 1078 |
+
return signals, f"✅ **Signals synced!** {summary}"
|
| 1079 |
+
except Exception as e:
|
| 1080 |
+
return None, f"❌ **Gmail sync failed:** {e}"
|
| 1081 |
+
|
| 1082 |
+
gmail_btn.click(fn=sync_gmail, outputs=[gmail_state, gmail_status])
|
| 1083 |
+
|
| 1084 |
+
submit_btn = gr.Button("✨ Analyse & Get My Plan", variant="primary", size="lg")
|
| 1085 |
+
|
| 1086 |
+
|
| 1087 |
+
with gr.Column(scale=1):
|
| 1088 |
+
life_graph_out = gr.HTML(label="Your Life Right Now")
|
| 1089 |
+
after_graph_out = gr.HTML(label="After Action")
|
| 1090 |
+
plan_out = gr.HTML(label="Resolution Plan")
|
| 1091 |
+
|
| 1092 |
+
submit_btn.click(
|
| 1093 |
+
fn=run_custom,
|
| 1094 |
+
inputs=[situation_input, work_sl, money_sl, rel_sl, energy_sl, time_sl, gmail_state],
|
| 1095 |
+
outputs=[life_graph_out, after_graph_out, plan_out],
|
| 1096 |
+
)
|
| 1097 |
+
|
| 1098 |
+
# ── Tab 3: Training Results ──────────────────────────────────────────
|
| 1099 |
+
with gr.Tab("π Training Results"):
|
| 1100 |
+
training_html = gr.HTML(value=load_training_tab())
|
| 1101 |
+
|
| 1102 |
+
plot_path = os.path.join(os.path.dirname(__file__), "data", "reward_curve.png")
|
| 1103 |
+
if os.path.exists(plot_path):
|
| 1104 |
+
gr.Image(value=plot_path, label="Learning Curve – 100 Episode Training Run")
|
| 1105 |
+
|
| 1106 |
+
# ── Tab 4: Memory Effect Demo ────────────────────────────────────────
|
| 1107 |
+
with gr.Tab("🧠 Memory Effect"):
|
| 1108 |
+
gr.HTML("""
|
| 1109 |
+
<div style='background:#1a1a2e;border:1px solid #333;border-radius:10px;
|
| 1110 |
+
padding:16px;margin-bottom:16px;font-family:sans-serif'>
|
| 1111 |
+
<div style='display:flex;justify-content:space-between;align-items:center'>
|
| 1112 |
+
<div>
|
| 1113 |
+
<div style='font-size:18px;font-weight:700;color:#eee;margin-bottom:4px'>
|
| 1114 |
+
Memory Effect Demo
|
| 1115 |
+
</div>
|
| 1116 |
+
<div style='font-size:13px;color:#888'>
|
| 1117 |
+
Same conflict, same agent. Episode 1 runs cold (no prior context). Episode 2 retrieves
|
| 1118 |
+
the stored memory and reasons differently, showing the RAG flywheel in action.
|
| 1119 |
+
</div>
|
| 1120 |
+
</div>
|
| 1121 |
+
<div style='background:#14532d;border:1px solid #22c55e;border-radius:20px;
|
| 1122 |
+
padding:6px 16px;font-size:13px;font-weight:700;color:#22c55e;
|
| 1123 |
+
white-space:nowrap'>
|
| 1124 |
+
+116% EFFICIENCY
|
| 1125 |
+
</div>
|
| 1126 |
+
</div>
|
| 1127 |
+
</div>
|
| 1128 |
+
""")
|
| 1129 |
+
|
| 1130 |
+
with gr.Row():
|
| 1131 |
+
mem_conflict_dd = gr.Dropdown(
|
| 1132 |
+
choices=CONFLICT_CHOICES_LIST,
|
| 1133 |
+
value=DEFAULT_CONFLICT,
|
| 1134 |
+
label="CONFLICT",
|
| 1135 |
+
)
|
| 1136 |
+
mem_person_dd = gr.Dropdown(
|
| 1137 |
+
choices=PERSON_CHOICES,
|
| 1138 |
+
value=PERSON_CHOICES[0],
|
| 1139 |
+
label="PERSONA",
|
| 1140 |
+
)
|
| 1141 |
+
mem_run_btn = gr.Button("🧠 Run Episodes", variant="primary", size="lg")
|
| 1142 |
+
|
| 1143 |
+
with gr.Row():
|
| 1144 |
+
mem_ep1_out = gr.HTML(label="Episode 1: Cold Start")
|
| 1145 |
+
mem_ep2_out = gr.HTML(label="Episode 2: RAG-Augmented")
|
| 1146 |
+
|
| 1147 |
+
mem_diff_out = gr.HTML(label="Memory Delta Analysis")
|
| 1148 |
+
|
| 1149 |
+
mem_run_btn.click(
|
| 1150 |
+
fn=run_memory_demo,
|
| 1151 |
+
inputs=[mem_conflict_dd, mem_person_dd],
|
| 1152 |
+
outputs=[mem_ep1_out, mem_ep2_out, mem_diff_out],
|
| 1153 |
+
)
|
| 1154 |
+
|
| 1155 |
+
# ── Tab 5: Arjun's Journey ──────────────────────────────────────────
|
| 1156 |
+
with gr.Tab("Arjun's Journey"):
|
| 1157 |
+
gr.HTML(LONG_DEMO.show_longitudinal_comparison())
|
| 1158 |
+
|
| 1159 |
+
with gr.Column():
|
| 1160 |
+
gr.Markdown("### Experimental Context Loading")
|
| 1161 |
+
gr.Markdown(
|
| 1162 |
+
"By activating Arjun's history, the agent gains 'experience' with his startup "
|
| 1163 |
+
"executive profile and specific relationship dynamics. This demonstrates how "
|
| 1164 |
+
"ChromaDB retrieval transforms a generic LLM into a hyper-personalised coach."
|
| 1165 |
+
)
|
| 1166 |
+
load_arjun_btn = gr.Button("Activate Arjun's Life History (v3)", variant="primary", size="lg")
|
| 1167 |
+
|
| 1168 |
+
def load_arjun_msg():
|
| 1169 |
+
LONG_DEMO.pre_seed_arjun()
|
| 1170 |
+
return "✅ Arjun's memory (Week 1 & 2) is now ACTIVE in ChromaDB. Go to 'Live Demo', select Arjun, and click 'Run Agent'."
|
| 1171 |
+
|
| 1172 |
+
load_status = gr.Markdown()
|
| 1173 |
+
load_arjun_btn.click(fn=load_arjun_msg, outputs=load_status)
|
| 1174 |
+
|
| 1175 |
+
gr.Markdown("""
|
| 1176 |
+
---
|
| 1177 |
+
**Experience it yourself:**
|
| 1178 |
+
1. Click the button above to seed the memories.
|
| 1179 |
+
2. Switch to the **🎯 Live Demo** tab.
|
| 1180 |
+
3. Select **Arjun (Startup Lead)** from the persona list.
|
| 1181 |
+
4. Select the **🚨 Friday 6PM** conflict.
|
| 1182 |
+
5. Click **Run Agent**.
|
| 1183 |
+
6. **Observe:** The agent will now use specific precedents in its reasoning and choice.
|
| 1184 |
+
""")
|
| 1185 |
+
|
| 1186 |
+
# ── Tab 6: Task Explorer ──────────────────────────────────────────────
|
| 1187 |
+
with gr.Tab("🗺️ Task Explorer"):
|
| 1188 |
+
gr.Markdown(
|
| 1189 |
+
"### LifeStack Task Inspector\n"
|
| 1190 |
+
"Inspect the objective, viable routes, progression milestones, and exogenous event log for the current multi-step task architecture."
|
| 1191 |
+
)
|
| 1192 |
+
|
| 1193 |
+
with gr.Row():
|
| 1194 |
+
with gr.Column(scale=2):
|
| 1195 |
+
task_out = gr.HTML(label="Task Definition")
|
| 1196 |
+
with gr.Column(scale=1):
|
| 1197 |
+
route_out = gr.HTML(label="Route Status")
|
| 1198 |
+
|
| 1199 |
+
event_out = gr.HTML(label="World Event Log")
|
| 1200 |
+
|
| 1201 |
+
load_task_btn = gr.Button("Load Demonstration Task", variant="secondary")
|
| 1202 |
+
|
| 1203 |
+
def load_demo_task():
|
| 1204 |
+
# Generate a dummy task for demonstration purposes
|
| 1205 |
+
dummy_routes = [
|
| 1206 |
+
Route(id="r1", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], preconditions={}, consequences={}, closes_routes=["r2"], milestones_unlocked=["m1"], final_reward=2.5),
|
| 1207 |
+
Route(id="r2", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], preconditions={}, consequences={}, closes_routes=["r1"], milestones_unlocked=["m2"], final_reward=1.8),
|
| 1208 |
+
]
|
| 1209 |
+
dummy_milestones = [
|
| 1210 |
+
Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="", condition_value=True, reward=1.0),
|
| 1211 |
+
Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="", condition_value=True, reward=0.8),
|
| 1212 |
+
]
|
| 1213 |
+
dummy_events = [
|
| 1214 |
+
ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={}, closes_routes=[]),
|
| 1215 |
+
ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={}, hidden_state_mutation={}, closes_routes=["r2"]),
|
| 1216 |
+
]
|
| 1217 |
+
dummy_task = Task(
|
| 1218 |
+
id="sample_flight_crisis", domain="flight_crisis", goal="Survive Airport Cancellation",
|
| 1219 |
+
constraints={"budget_max": 800, "deadline_step": 10},
|
| 1220 |
+
hidden_state={"lounge_capacity": 100}, mutable_world={}, visible_world={},
|
| 1221 |
+
success_conditions=[], failure_conditions=[],
|
| 1222 |
+
event_schedule=dummy_events, viable_routes=dummy_routes, milestones=dummy_milestones,
|
| 1223 |
+
horizon=10, difficulty=4, domain_metadata={"story": "A major storm grounded commercial flights."}
|
| 1224 |
+
)
|
| 1225 |
+
|
| 1226 |
+
return (
|
| 1227 |
+
task_html(dummy_task),
|
| 1228 |
+
route_status_html(dummy_routes, closed={"r2"}),
|
| 1229 |
+
event_log_html(dummy_events)
|
| 1230 |
+
)
|
| 1231 |
+
|
| 1232 |
+
load_task_btn.click(fn=load_demo_task, outputs=[task_out, route_out, event_out])
|
| 1233 |
+
|
| 1234 |
+
# ── Tab 7: Follow-up ─────────────────────────────────────────────────
|
| 1235 |
+
with gr.Tab("💬 Follow-up"):
|
| 1236 |
+
gr.Markdown("""
|
| 1237 |
+
### Real-World Verification
|
| 1238 |
+
Did the agent's plan work in the real world? Provide your feedback here to close the loop.
|
| 1239 |
+
This feedback is stored in **ChromaDB** and used to fine-tune the reward models for future training runs.
|
| 1240 |
+
""")
|
| 1241 |
+
with gr.Row():
|
| 1242 |
+
with gr.Column(scale=1):
|
| 1243 |
+
fb_id = gr.Textbox(label="Episode ID", placeholder="e.g. A1B2C3D4")
|
| 1244 |
+
fb_score = gr.Slider(0, 10, value=7, label="Overall Effectiveness (0-10)")
|
| 1245 |
+
fb_time = gr.Number(label="Actual Resolution Time (hours)", value=2.0)
|
| 1246 |
+
with gr.Column(scale=2):
|
| 1247 |
+
fb_up = gr.CheckboxGroup(
|
| 1248 |
+
["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"],
|
| 1249 |
+
label="Domains that actually improved"
|
| 1250 |
+
)
|
| 1251 |
+
fb_down = gr.CheckboxGroup(
|
| 1252 |
+
["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"],
|
| 1253 |
+
label="Domains that actually worsened"
|
| 1254 |
+
)
|
| 1255 |
+
fb_notes = gr.Textbox(label="Unexpected Effects / Qualitative Feedback", lines=3)
|
| 1256 |
+
fb_btn = gr.Button("Submit Outcome Feedback", variant="primary")
|
| 1257 |
+
fb_out = gr.Markdown()
|
| 1258 |
+
|
| 1259 |
+
fb_btn.click(
|
| 1260 |
+
submit_outcome_feedback,
|
| 1261 |
+
inputs=[fb_id, fb_score, fb_up, fb_down, fb_notes, fb_time],
|
| 1262 |
+
outputs=fb_out
|
| 1263 |
+
)
|
| 1264 |
+
|
| 1265 |
+
gr.HTML("""
|
| 1266 |
+
<div style='text-align:center;padding:16px;color:#444;font-size:11px;border-top:1px solid #222;margin-top:16px'>
|
| 1267 |
+
LifeStack · Built for hackathon demo · Powered by Groq + ChromaDB + Sentence Transformers
|
| 1268 |
+
</div>
|
| 1269 |
+
""")
|
| 1270 |
+
|
| 1271 |
+
|
| 1272 |
+
if __name__ == "__main__":
|
| 1273 |
+
app.launch(
|
| 1274 |
+
share=False,
|
| 1275 |
+
server_port=7860,
|
| 1276 |
+
show_error=True,
|
| 1277 |
+
theme=gr.themes.Base(primary_hue="violet", neutral_hue="slate"),
|
| 1278 |
+
css="""
|
| 1279 |
+
body { background:#0d0d1a; }
|
| 1280 |
+
.gradio-container { max-width: 1100px; margin: auto; }
|
| 1281 |
+
h1 { text-align:center; }
|
| 1282 |
+
.tab-nav button { font-size:14px; font-weight:600; }
|
| 1283 |
+
"""
|
| 1284 |
+
)
|
app_flask.py
ADDED
|
@@ -0,0 +1,879 @@
|
| 1 |
+
"""
|
| 2 |
+
app_flask.py: LifeStack Flask Portal (FULL FEATURE PARITY)
|
| 3 |
+
Complete migration of the Gradio demo to a Flask-native architecture.
|
| 4 |
+
Includes: Live Demo, Custom Situations, Gmail Sync, Longitudinal Analysis, Task Explorer.
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
import json
|
| 9 |
+
import copy
|
| 10 |
+
import uuid
|
| 11 |
+
import datetime
|
| 12 |
+
from collections import deque
|
| 13 |
+
from flask import Flask, render_template, request, jsonify, session
|
| 14 |
+
from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
|
| 15 |
+
from core.lifestack_env import LifeStackEnv, LifeStackAction
|
| 16 |
+
from agent.agent import LifeStackAgent
|
| 17 |
+
from intake.simperson import SimPerson
|
| 18 |
+
from agent.conflict_generator import ConflictEvent, generate_conflict, TEMPLATES
|
| 19 |
+
from core.action_space import apply_action, validate_action
|
| 20 |
+
from agent.memory import LifeStackMemory
|
| 21 |
+
from core.metric_schema import normalize_metric_path, is_valid_metric_path
|
| 22 |
+
from core.reward import compute_reward
|
| 23 |
+
from intake.intake import LifeIntake
|
| 24 |
+
from agent.conflict_predictor import ConflictPredictor
|
| 25 |
+
from agent.counterfactuals import generate_counterfactuals
|
| 26 |
+
from scripts.longitudinal_demo import LongitudinalDemo
|
| 27 |
+
from intake.gmail_intake import GmailIntake
|
| 28 |
+
from intake.calendar_intake import CalendarIntake
|
| 29 |
+
from core.task import Task, ExoEvent, Route, Milestone
|
| 30 |
+
from core.feedback import OutcomeFeedback, compute_human_feedback_reward
|
| 31 |
+
from core.cascade_utils import animate_cascade
|
| 32 |
+
|
| 33 |
+
app = Flask(__name__)
|
| 34 |
+
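app.secret_key = os.environ.get("LIFESTACK_SECRET_KEY", "lifestack_secret_key_2026")  # assumption: LIFESTACK_SECRET_KEY is a hypothetical env var; the literal is a demo-only fallback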
|
| 35 |
+
|
| 36 |
+
# ─── Global Instances ───
|
| 37 |
+
AGENT = LifeStackAgent(api_only=not bool(os.getenv('LIFESTACK_MODEL_PATH')))
|
| 38 |
+
MEMORY = LifeStackMemory(silent=True)
|
| 39 |
+
INTAKE = LifeIntake()
|
| 40 |
+
USER_HEALTH_OVERRIDES: dict = {} # persisted health/calendar metric deltas
|
| 41 |
+
EPISODE_HISTORY: deque = deque(maxlen=5) # ring buffer, most recent first
|
| 42 |
+
|
| 43 |
+
@app.route('/api/history', methods=['GET'])
|
| 44 |
+
@app.route('/api/history/list', methods=['GET'])
|
| 45 |
+
def get_history():
|
| 46 |
+
summaries = [
|
| 47 |
+
{
|
| 48 |
+
"id": ep.get("action", {}).get("id", ""),
|
| 49 |
+
"conflict": ep.get("conflict", {}).get("title", "Unknown"),
|
| 50 |
+
"person": ep.get("conflict", {}).get("person", "Unknown"),
|
| 51 |
+
"reward": ep.get("action", {}).get("reward", 0.0),
|
| 52 |
+
"timestamp": ep.get("timestamp", ""),
|
| 53 |
+
}
|
| 54 |
+
for ep in EPISODE_HISTORY
|
| 55 |
+
]
|
| 56 |
+
return jsonify(summaries)
|
| 57 |
+
|
| 58 |
+
@app.route('/api/history/replay/<episode_id>', methods=['GET'])
|
| 59 |
+
def replay_episode(episode_id):
|
| 60 |
+
for ep in EPISODE_HISTORY:
|
| 61 |
+
if ep.get("action", {}).get("id", "") == episode_id:
|
| 62 |
+
return jsonify(ep)
|
| 63 |
+
return jsonify({"error": "Episode not found"}), 404
|
| 64 |
+
|
| 65 |
+
GMAIL = GmailIntake()
|
| 66 |
+
CALENDAR = CalendarIntake()
|
| 67 |
+
LONG_DEMO = LongitudinalDemo()
|
| 68 |
+
DEMO_PREDICTOR = ConflictPredictor()
|
| 69 |
+
|
| 70 |
+
# Friday 6PM is always the default demo conflict
|
| 71 |
+
DEMO_CONFLICT = next(t for t in TEMPLATES if t.id == "d5_friday")
|
| 72 |
+
|
| 73 |
+
PERSONS = {
|
| 74 |
+
"Alex (Executive) β driven, high-stress":
|
| 75 |
+
SimPerson(openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8, name="Alex (Executive)"),
|
| 76 |
+
"Chloe (Creative) β spontaneous, resilient":
|
| 77 |
+
SimPerson(openness=0.9, conscientiousness=0.2, extraversion=0.5, agreeableness=0.70, neuroticism=0.15, name="Chloe (Creative)"),
|
| 78 |
+
"Sam (Introvert) β anxious, thoughtful":
|
| 79 |
+
SimPerson(openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9, name="Sam (Introvert)"),
|
| 80 |
+
"Maya (Family) β empathetic, nurturing":
|
| 81 |
+
SimPerson(openness=0.5, conscientiousness=0.7, extraversion=0.5, agreeableness=0.95, neuroticism=0.3, name="Maya (Family)"),
|
| 82 |
+
"Leo (Student) β curious, organised":
|
| 83 |
+
SimPerson(openness=0.85, conscientiousness=0.8, extraversion=0.4, agreeableness=0.4, neuroticism=0.55, name="Leo (Student)"),
|
| 84 |
+
"Arjun (Startup Lead) - high-conscientiousness, high-neuroticism":
|
| 85 |
+
SimPerson(name="Arjun", openness=0.4, conscientiousness=0.9, extraversion=0.7, agreeableness=0.25, neuroticism=0.8),
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
CONFLICT_CHOICES = {t.title: t for t in TEMPLATES}
|
| 89 |
+
|
| 90 |
+
# ─── Visual Helpers ───
|
| 91 |
+
DOMAIN_EMOJI = {
|
| 92 |
+
"career": "💼", "finances": "💰", "relationships": "❤️",
|
| 93 |
+
"physical_health": "💪", "mental_wellbeing": "🧠", "time": "📅",
|
| 94 |
+
}
|
| 95 |
+
INVERTED_METRICS = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
|
| 96 |
+
|
| 97 |
+
_DOMAINS = ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]
|
| 98 |
+
|
| 99 |
+
def compute_domain_health(metrics_flat: dict) -> dict:
|
| 100 |
+
"""Compute per-domain health score (0-100) from flat metrics. Inverted metrics are flipped."""
|
| 101 |
+
health = {}
|
| 102 |
+
for dom in _DOMAINS:
|
| 103 |
+
subs = {k: v for k, v in metrics_flat.items() if k.startswith(dom + ".")}
|
| 104 |
+
if not subs:
|
| 105 |
+
health[dom] = 50.0
|
| 106 |
+
continue
|
| 107 |
+
scores = []
|
| 108 |
+
for k, v in subs.items():
|
| 109 |
+
sub = k.split(".")[1]
|
| 110 |
+
scores.append((100.0 - v) if sub in INVERTED_METRICS else float(v))
|
| 111 |
+
health[dom] = round(sum(scores) / len(scores), 1)
|
| 112 |
+
return health
|
| 113 |
+
|
| 114 |
+
def _normalize_action_metric_changes(action) -> None:
|
| 115 |
+
fixed_changes = {}
|
| 116 |
+
for path, delta in action.primary.metric_changes.items():
|
| 117 |
+
raw_path = str(path)
|
| 118 |
+
if "." not in raw_path:
|
| 119 |
+
raw_path = f"{action.primary.target_domain}.{raw_path}"
|
| 120 |
+
norm_path = normalize_metric_path(raw_path)
|
| 121 |
+
if not is_valid_metric_path(norm_path): continue
|
| 122 |
+
try:
|
| 123 |
+
fixed_changes[norm_path] = float(delta)
|
| 124 |
+
except (ValueError, TypeError): continue
|
| 125 |
+
action.primary.metric_changes = fixed_changes
|
| 126 |
+
|
| 127 |
+
# ─── Routes ───
|
| 128 |
+
@app.route('/')
|
| 129 |
+
def index():
|
| 130 |
+
return render_template('index.html',
|
| 131 |
+
persons=list(PERSONS.keys()),
|
| 132 |
+
conflicts=list(CONFLICT_CHOICES.keys()))
|
| 133 |
+
|
| 134 |
+
@app.route('/api/simulation/start', methods=['POST'])
|
| 135 |
+
def start_simulation():
|
| 136 |
+
data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
|
| 137 |
+
conflict_label = data.get('conflict')
|
| 138 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 139 |
+
base_metrics = LifeMetrics()
|
| 140 |
+
# Apply any uploaded health/calendar overrides
|
| 141 |
+
for path, delta in USER_HEALTH_OVERRIDES.items():
|
| 142 |
+
if '.' in path:
|
| 143 |
+
dom, sub = path.split('.', 1)
|
| 144 |
+
dom_obj = getattr(base_metrics, dom, None)
|
| 145 |
+
if dom_obj and hasattr(dom_obj, sub):
|
| 146 |
+
setattr(dom_obj, sub, max(0.0, min(100.0, getattr(dom_obj, sub) + delta)))
|
| 147 |
+
flat = base_metrics.flatten()
|
| 148 |
+
return jsonify({
|
| 149 |
+
"status": "success",
|
| 150 |
+
"metrics": flat,
|
| 151 |
+
"prediction": {
|
| 152 |
+
"summary": DEMO_PREDICTOR.get_prediction_summary(),
|
| 153 |
+
"risk_score": DEMO_PREDICTOR.get_risk_score()
|
| 154 |
+
}
|
| 155 |
+
})
|
| 156 |
+
|
| 157 |
+
@app.route('/api/simulation/cascade', methods=['POST'])
|
| 158 |
+
def get_cascade_frames():
|
| 159 |
+
data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
|
| 160 |
+
conflict_label = data.get('conflict')
|
| 161 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 162 |
+
frames = animate_cascade(conflict.primary_disruption, LifeMetrics())
|
| 163 |
+
return jsonify({"frames": frames})
|
| 164 |
+
|
| 165 |
+
@app.route('/api/simulation/graph', methods=['GET'])
|
| 166 |
+
def get_dependency_graph():
|
| 167 |
+
graph = DependencyGraph()
|
| 168 |
+
nodes = []
|
| 169 |
+
edges = []
|
| 170 |
+
|
| 171 |
+
# Flatten metrics to get all nodes
|
| 172 |
+
metrics = LifeMetrics().flatten()
|
| 173 |
+
for path in metrics.keys():
|
| 174 |
+
dom, sub = path.split('.', 1)
|
| 175 |
+
nodes.append({
|
| 176 |
+
"id": path,
|
| 177 |
+
"label": sub.replace('_', ' '),
|
| 178 |
+
"group": dom
|
| 179 |
+
})
|
| 180 |
+
|
| 181 |
+
for src, targets in graph.edges.items():
|
| 182 |
+
for target, weight in targets:
|
| 183 |
+
edges.append({
|
| 184 |
+
"from": src,
|
| 185 |
+
"to": target,
|
| 186 |
+
"value": abs(weight),
|
| 187 |
+
"arrows": "to",
|
| 188 |
+
"color": {"color": "#4ade80" if weight > 0 else "#ef4444", "opacity": 0.2}
|
| 189 |
+
})
|
| 190 |
+
|
| 191 |
+
return jsonify({"nodes": nodes, "edges": edges})
|
| 192 |
+
|
| 193 |
+
@app.route('/api/simulation/action', methods=['POST'])
|
| 194 |
+
def perform_action():
|
| 195 |
+
data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
|
| 196 |
+
person_label = data.get('person')
|
| 197 |
+
conflict_label = data.get('conflict')
|
| 198 |
+
memory_enabled = data.get('use_memory', False)
|
| 199 |
+
|
| 200 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 201 |
+
person = PERSONS.get(person_label, PERSONS["Alex (Executive) β driven, high-stress"])
|
| 202 |
+
|
| 203 |
+
env = LifeStackEnv()
|
| 204 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 205 |
+
|
| 206 |
+
before_metrics = copy.deepcopy(env.state.current_metrics)
|
| 207 |
+
before_budget = copy.deepcopy(env.state.budget)
|
| 208 |
+
|
| 209 |
+
# RAG: Build few-shot context from ChromaDB if enabled
|
| 210 |
+
few_shot = ""
|
| 211 |
+
retrieved = []
|
| 212 |
+
if memory_enabled:
|
| 213 |
+
few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_metrics.flatten())
|
| 214 |
+
retrieved = MEMORY.retrieve_similar(conflict.title, before_metrics.flatten())
|
| 215 |
+
|
| 216 |
+
action = AGENT.get_action(before_metrics, before_budget, conflict, person, few_shot_context=few_shot)
|
| 217 |
+
_normalize_action_metric_changes(action)
|
| 218 |
+
|
| 219 |
+
uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
|
| 220 |
+
before_metrics.mental_wellbeing.stress_level)
|
| 221 |
+
|
| 222 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 223 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 224 |
+
|
| 225 |
+
obs = env.step(env_action)
|
| 226 |
+
|
| 227 |
+
# Store decision in memory for future RAG
|
| 228 |
+
MEMORY.store_decision(
|
| 229 |
+
conflict_title=conflict.title,
|
| 230 |
+
action_type=action.primary.action_type,
|
| 231 |
+
target_domain=action.primary.target_domain,
|
| 232 |
+
reward=obs.reward,
|
| 233 |
+
metrics_snapshot=before_metrics.flatten(),
|
| 234 |
+
reasoning=action.reasoning
|
| 235 |
+
)
|
| 236 |
+
|
| 237 |
+
cf_data = generate_counterfactuals(AGENT, before_metrics, before_budget, conflict, person, action)
|
| 238 |
+
episode_id = "".join(str(uuid.uuid4()).split("-")[:2]).upper()
|
| 239 |
+
|
| 240 |
+
result = {
|
| 241 |
+
"metrics": obs.metrics,
|
| 242 |
+
"domain_health": compute_domain_health(obs.metrics),
|
| 243 |
+
"action": {
|
| 244 |
+
"type": action.primary.action_type,
|
| 245 |
+
"target": action.primary.target_domain,
|
| 246 |
+
"description": action.primary.description,
|
| 247 |
+
"reasoning": action.reasoning,
|
| 248 |
+
"reward": obs.reward,
|
| 249 |
+
"uptake": uptake,
|
| 250 |
+
"cost": action.primary.resource_cost,
|
| 251 |
+
"id": episode_id,
|
| 252 |
+
"memories_retrieved": retrieved
|
| 253 |
+
},
|
| 254 |
+
"counterfactuals": cf_data,
|
| 255 |
+
"prediction": {
|
| 256 |
+
"summary": DEMO_PREDICTOR.get_prediction_summary(),
|
| 257 |
+
"risk_score": DEMO_PREDICTOR.get_risk_score()
|
| 258 |
+
},
|
| 259 |
+
"conflict": {
|
| 260 |
+
"title": conflict.title,
|
| 261 |
+
"person": person.name
|
| 262 |
+
},
|
| 263 |
+
"timestamp": datetime.datetime.now().strftime("%H:%M:%S")
|
| 264 |
+
}
|
| 265 |
+
|
| 266 |
+
# Store in history
|
| 267 |
+
EPISODE_HISTORY.appendleft(result)
|
| 268 |
+
|
| 269 |
+
return jsonify(result)
|
| 270 |
+
|
| 271 |
+
# ─── 7-Day Trajectory ───
|
| 272 |
+
@app.route('/api/simulation/trajectory', methods=['POST'])
|
| 273 |
+
def get_trajectory():
|
| 274 |
+
"""
|
| 275 |
+
Run the agent action then perform a 7-step rollout.
|
| 276 |
+
Returns per-day metric snapshots for the forecast panel.
|
| 277 |
+
"""
|
| 278 |
+
data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
|
| 279 |
+
conflict_label = data.get('conflict')
|
| 280 |
+
person_label = data.get('person')
|
| 281 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 282 |
+
person = PERSONS.get(person_label, PERSONS["Alex (Executive) β driven, high-stress"])
|
| 283 |
+
|
| 284 |
+
env = LifeStackEnv()
|
| 285 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 286 |
+
|
| 287 |
+
before_metrics = copy.deepcopy(env.state.current_metrics)
|
| 288 |
+
before_budget = copy.deepcopy(env.state.budget)
|
| 289 |
+
|
| 290 |
+
action = AGENT.get_action(before_metrics, before_budget, conflict, person)
|
| 291 |
+
_normalize_action_metric_changes(action)
|
| 292 |
+
uptake = person.respond_to_action(
|
| 293 |
+
action.primary.action_type, action.primary.resource_cost,
|
| 294 |
+
before_metrics.mental_wellbeing.stress_level,
|
| 295 |
+
)
|
| 296 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 297 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 298 |
+
|
| 299 |
+
obs = env.step(env_action)
|
| 300 |
+
rollout = env.rollout(n_steps=7, gamma=0.9)
|
| 301 |
+
|
| 302 |
+
return jsonify({
|
| 303 |
+
"action": {
|
| 304 |
+
"type": action.primary.action_type,
|
| 305 |
+
"target": action.primary.target_domain,
|
| 306 |
+
"reasoning": action.reasoning,
|
| 307 |
+
"reward": obs.reward,
|
| 308 |
+
},
|
| 309 |
+
"day0_metrics": dict(obs.metrics),
|
| 310 |
+
"discounted_reward": rollout["discounted_reward"],
|
| 311 |
+
"trajectory": rollout["trajectory"],
|
| 312 |
+
})
|
| 313 |
+
|
| 314 |
+
|
| 315 |
+
# ─── Custom Situation Entry ───
|
| 316 |
+
@app.route('/api/custom/run', methods=['POST'])
|
| 317 |
+
def run_custom():
|
| 318 |
+
data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
|
| 319 |
+
situation_input = data.get('situation', "")
|
| 320 |
+
|
| 321 |
+
# Map sliders to metrics
|
| 322 |
+
m = LifeMetrics()
|
| 323 |
+
m.career.stress_level = float(data.get('work_stress', 5)) * 10
|
| 324 |
+
m.finances.debt_pressure = float(data.get('money_stress', 5)) * 10
|
| 325 |
+
m.relationships.conflict_frequency = (10 - float(data.get('rel_quality', 5))) * 10
|
| 326 |
+
m.physical_health.energy_level = float(data.get('energy_level', 5)) * 10
|
| 327 |
+
m.time.free_time = (10 - float(data.get('time_pressure', 5))) * 10
|
| 328 |
+
|
| 329 |
+
# Apply uploaded health/calendar overrides to custom metrics
|
| 330 |
+
for path, delta in USER_HEALTH_OVERRIDES.items():
|
| 331 |
+
if '.' in path:
|
| 332 |
+
dom, sub = path.split('.', 1)
|
| 333 |
+
dom_obj = getattr(m, dom, None)
|
| 334 |
+
if dom_obj and hasattr(dom_obj, sub):
|
| 335 |
+
setattr(dom_obj, sub, max(0.0, min(100.0, getattr(dom_obj, sub) + delta)))
|
| 336 |
+
|
| 337 |
+
gmail_signals = data.get('gmail_signals')
|
| 338 |
+
if gmail_signals:
|
| 339 |
+
# Merge digital signals if provided
|
| 340 |
+
for k, v in gmail_signals.items():
|
| 341 |
+
parts = k.split(".")
|
| 342 |
+
if len(parts) == 2:
|
| 343 |
+
dom = getattr(m, parts[0], None)
|
| 344 |
+
if dom and hasattr(dom, parts[1]):
|
| 345 |
+
setattr(dom, parts[1], v)
|
| 346 |
+
|
| 347 |
+
# Extract conflict from text using LLM
|
| 348 |
+
conflict = INTAKE.extract_conflict(situation_input, m)
|
| 349 |
+
pers_dict = INTAKE.get_personality_from_description(situation_input)
|
| 350 |
+
person = SimPerson(
|
| 351 |
+
name=pers_dict.get("name", "Inferred Self"),
|
| 352 |
+
openness=pers_dict.get("openness", 0.5),
|
| 353 |
+
conscientiousness=pers_dict.get("conscientiousness", 0.5),
|
| 354 |
+
extraversion=pers_dict.get("extraversion", 0.5),
|
| 355 |
+
agreeableness=pers_dict.get("agreeableness", 0.5),
|
| 356 |
+
neuroticism=pers_dict.get("neuroticism", 0.5)
|
| 357 |
+
)
|
| 358 |
+
|
| 359 |
+
budget = ResourceBudget(time=24, money=1000, energy=100)
|
| 360 |
+
action = AGENT.get_action(m, budget, conflict, person)
|
| 361 |
+
_normalize_action_metric_changes(action)
|
| 362 |
+
|
| 363 |
+
uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
|
| 364 |
+
m.mental_wellbeing.stress_level)
|
| 365 |
+
|
| 366 |
+
env = LifeStackEnv()
|
| 367 |
+
env.state.current_metrics = copy.deepcopy(m)
|
| 368 |
+
env.state.budget = budget
|
| 369 |
+
|
| 370 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 371 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 372 |
+
obs = env.step(env_action)
|
| 373 |
+
|
| 374 |
+
return jsonify({
|
| 375 |
+
"before_metrics": m.flatten(),
|
| 376 |
+
"after_metrics": obs.metrics,
|
| 377 |
+
"domain_health": compute_domain_health(obs.metrics),
|
| 378 |
+
"action": {
|
| 379 |
+
"type": action.primary.action_type,
|
| 380 |
+
"target": action.primary.target_domain,
|
| 381 |
+
"description": action.primary.description,
|
| 382 |
+
"reasoning": action.reasoning,
|
| 383 |
+
"id": "".join(str(uuid.uuid4()).split("-")[:2]).upper()
|
| 384 |
+
},
|
| 385 |
+
"person": {"name": person.name or "Inferred Self"}
|
| 386 |
+
})
|
| 387 |
+
|
| 388 |
+
@app.route('/api/gmail/sync', methods=['POST'])
|
| 389 |
+
def sync_gmail():
|
| 390 |
+
signals, metric_deltas, summary, is_demo = GMAIL.sync()
|
| 391 |
+
return jsonify({
|
| 392 |
+
"status": "success",
|
| 393 |
+
"signals": metric_deltas,
|
| 394 |
+
"raw": signals,
|
| 395 |
+
"summary": summary,
|
| 396 |
+
"is_demo": is_demo,
|
| 397 |
+
})
|
| 398 |
+
|
| 399 |
+
|
| 400 |
+
@app.route('/api/digital/sync', methods=['POST'])
|
| 401 |
+
def digital_sync():
|
| 402 |
+
"""
|
| 403 |
+
Unified Digital Sync: Gmail + Google Calendar + Fitness (demo payload).
|
| 404 |
+
Tries real OAuth for Gmail and Calendar; falls back to demo_signals.json on failure.
|
| 405 |
+
Fitness is always served from the demo payload (no first-party fitness API scope).
|
| 406 |
+
Returns merged metric deltas, per-source raw signals, and a demo flag per source.
|
| 407 |
+
"""
|
| 408 |
+
import json as _json
|
| 409 |
+
demo_path = os.path.join(os.path.dirname(__file__), 'data', 'demo_signals.json')
|
| 410 |
+
|
| 411 |
+
with open(demo_path) as f:
|
| 412 |
+
demo_full = _json.load(f)
|
| 413 |
+
|
| 414 |
+
# Gmail
|
| 415 |
+
gmail_signals, gmail_deltas, gmail_summary, gmail_is_demo = GMAIL.sync()
|
| 416 |
+
|
| 417 |
+
# Calendar
|
| 418 |
+
cal_signals, cal_deltas, cal_is_demo = CALENDAR.sync()
|
| 419 |
+
|
| 420 |
+
# Fitness: always demo (no live fitness API)
|
| 421 |
+
fitness_signals = demo_full['fitness']
|
| 422 |
+
fitness_deltas = {
|
| 423 |
+
"physical_health.sleep_quality": demo_full['derived_metric_deltas']['physical_health.sleep_quality'],
|
| 424 |
+
"physical_health.energy_level": demo_full['derived_metric_deltas']['physical_health.energy_level'],
|
| 425 |
+
"physical_health.exercise_consistency": demo_full['derived_metric_deltas']['physical_health.exercise_consistency'],
|
| 426 |
+
"mental_wellbeing.stress_level": demo_full['derived_metric_deltas']['mental_wellbeing.stress_level'],
|
| 427 |
+
}
|
| 428 |
+
fitness_is_demo = True
|
| 429 |
+
|
| 430 |
+
# Merge all deltas (last writer wins: Fitness > Calendar > Gmail for overlapping keys)
|
| 431 |
+
merged_deltas = {}
|
| 432 |
+
merged_deltas.update(gmail_deltas)
|
| 433 |
+
merged_deltas.update(cal_deltas)
|
| 434 |
+
merged_deltas.update(fitness_deltas)
|
| 435 |
+
|
| 436 |
+
return jsonify({
|
| 437 |
+
"status": "success",
|
| 438 |
+
"merged_deltas": merged_deltas,
|
| 439 |
+
"sources": {
|
| 440 |
+
"gmail": {
|
| 441 |
+
"signals": gmail_signals if isinstance(gmail_signals, dict) else {},
|
| 442 |
+
"summary": gmail_summary,
|
| 443 |
+
"is_demo": gmail_is_demo,
|
| 444 |
+
},
|
| 445 |
+
"calendar": {
|
| 446 |
+
"signals": cal_signals,
|
| 447 |
+
"summary": cal_signals.get("summary", ""),
|
| 448 |
+
"is_demo": cal_is_demo,
|
| 449 |
+
},
|
| 450 |
+
"fitness": {
|
| 451 |
+
"signals": fitness_signals,
|
| 452 |
+
"summary": fitness_signals.get("summary", ""),
|
| 453 |
+
"is_demo": True,
|
| 454 |
+
},
|
| 455 |
+
},
|
| 456 |
+
"persona_note": demo_full.get("persona", "Jordan (PM at Series-B startup)"),
|
| 457 |
+
})
|
| 458 |
+
|
| 459 |
+
@app.route('/api/arjun/activate', methods=['POST'])
|
| 460 |
+
def activate_arjun():
|
| 461 |
+
LONG_DEMO.pre_seed_arjun()
|
| 462 |
+
return jsonify({"status": "success", "message": "Arjun's memory (Week 1 & 2) is now ACTIVE in ChromaDB."})
|
| 463 |
+
|
| 464 |
+
@app.route('/api/task/demo', methods=['GET'])
|
| 465 |
+
def get_demo_task():
|
| 466 |
+
dummy_routes = [
|
| 467 |
+
Route(id="r1", name="Rebook Premium Option", description="Call agent and rebook on premium ticket", required_action_types=["communicate", "spend"], milestones_unlocked=["m1"], final_reward=2.5),
|
| 468 |
+
Route(id="r2", name="Accept Delay & Work", description="Stay at airport lounge and work on laptop", required_action_types=["rest", "delegate"], milestones_unlocked=["m2"], final_reward=1.8),
|
| 469 |
+
]
|
| 470 |
+
dummy_milestones = [
|
| 471 |
+
Milestone(id="m1", description="Successfully rebooked flight before deadline", reward=1.0),
|
| 472 |
+
Milestone(id="m2", description="Caught up with all emergency slack messages", reward=0.8),
|
| 473 |
+
]
|
| 474 |
+
dummy_events = [
|
| 475 |
+
ExoEvent(step=2, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300."),
|
| 476 |
+
ExoEvent(step=4, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity."),
|
| 477 |
+
]
|
| 478 |
+
task = Task(
|
| 479 |
+
id="sample_flight_crisis", domain="flight_crisis", goal="Survive Airport Cancellation",
|
| 480 |
+
event_schedule=dummy_events, viable_routes=dummy_routes, milestones=dummy_milestones,
|
| 481 |
+
horizon=10, difficulty=4
|
| 482 |
+
)
|
| 483 |
+
return jsonify({
|
| 484 |
+
"goal": task.goal,
|
| 485 |
+
"difficulty": task.difficulty,
|
| 486 |
+
"routes": [{"name": r.name, "description": r.description} for r in dummy_routes],
|
| 487 |
+
"milestones": [{"id": m.id, "description": m.description} for m in dummy_milestones],
|
| 488 |
+
"events": [{"step": e.step, "id": e.id, "description": e.description} for e in dummy_events],
|
| 489 |
+
"story": "A major storm grounded commercial flights."
|
| 490 |
+
})
|
| 491 |
+
|
| 492 |
+
@app.route('/api/stats', methods=['GET'])
|
| 493 |
+
def get_stats():
|
| 494 |
+
stats = MEMORY.get_stats()
|
| 495 |
+
# Normalise for frontend: inject feedback_count and reward_history
|
| 496 |
+
all_records = []
|
| 497 |
+
try:
|
| 498 |
+
raw = MEMORY.collection.get(include=["metadatas"])
|
| 499 |
+
all_records = [m for m in (raw.get("metadatas") or []) if m]
|
| 500 |
+
except Exception:
|
| 501 |
+
pass
|
| 502 |
+
stats["feedback_count"] = len([m for m in all_records if m.get("type") == "feedback"])
|
| 503 |
+
rewards = [m.get("reward", 0.0) for m in all_records if "reward" in m]
|
| 504 |
+
stats["reward_history"] = rewards[-20:] if rewards else []
|
| 505 |
+
return jsonify(stats)
|
| 506 |
+
|
| 507 |
+
@app.route('/api/feedback/submit', methods=['POST'])
|
| 508 |
+
def submit_feedback():
|
| 509 |
+
data = request.json
|
| 510 |
+
try:
|
| 511 |
+
feedback = OutcomeFeedback(
|
| 512 |
+
episode_id=data.get('episode_id'),
|
| 513 |
+
submitted_at=datetime.datetime.now(),
|
| 514 |
+
overall_effectiveness=int(data.get('score', 7)),
|
| 515 |
+
domains_improved=data.get('improved', []),
|
| 516 |
+
domains_worsened=data.get('worsened', []),
|
| 517 |
+
unexpected_effects=data.get('notes', ""),
|
| 518 |
+
resolution_time_hours=float(data.get('time', 1.0))
|
| 519 |
+
)
|
| 520 |
+
MEMORY.store_feedback(feedback)
|
| 521 |
+
return jsonify({"status": "success", "message": f"Feedback stored for episode {feedback.episode_id}"})
|
| 522 |
+
except Exception as e:
|
| 523 |
+
return jsonify({"status": "error", "message": str(e)}), 400
|
| 524 |
+
|
| 525 |
+
# ─── Feature F1 helper: random action baseline ───
|
| 526 |
+
_ACTION_TYPES = ["negotiate", "communicate", "delegate", "spend", "reschedule", "rest", "deprioritize", "execute"]
|
| 527 |
+
|
| 528 |
+
def _random_action(conflict, person):
|
| 529 |
+
"""Purely random action baseline, used as the ablation floor."""
|
| 530 |
+
import random as _r
|
| 531 |
+
env = LifeStackEnv()
|
| 532 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 533 |
+
flat = env.state.current_metrics.flatten()
|
| 534 |
+
atype = _r.choice(_ACTION_TYPES)
|
| 535 |
+
dom = _r.choice(_DOMAINS)
|
| 536 |
+
key = f"{dom}.stress_level" if dom in ("career", "mental_wellbeing") else f"{dom}.liquidity" if dom == "finances" else f"{dom}.energy_level"
|
| 537 |
+
mc = {key: _r.uniform(-20, 20)}
|
| 538 |
+
rc = {"time": _r.uniform(0.5, 3.0), "energy": _r.uniform(5, 30)}
|
| 539 |
+
uptake = person.respond_to_action(atype, rc, flat.get("mental_wellbeing.stress_level", 70))
|
| 540 |
+
env_action = LifeStackAction(action_type=atype, target=dom,
|
| 541 |
+
metric_changes={k: v * uptake for k, v in mc.items()},
|
| 542 |
+
resource_cost=rc, reasoning="Random baseline.", actions_taken=1)
|
| 543 |
+
obs = env.step(env_action)
|
| 544 |
+
return {"metrics": obs.metrics, "action": {"type": atype, "target": dom,
|
| 545 |
+
"description": "Random action (ablation floor).",
|
| 546 |
+
"reasoning": "Random baseline.", "reward": obs.reward, "cost": rc}}
|
| 547 |
+
|
| 548 |
+
|
| 549 |
+
# ─── Feature A: Trained vs Untrained Comparison ───
|
| 550 |
+
BASELINE_ACTION_MAP = {
|
| 551 |
+
"career": ("negotiate", {"career.workload": -12.0, "mental_wellbeing.stress_level": -4.0}, {"time": 1.5, "energy": 20.0}, "Negotiate workload with manager."),
|
| 552 |
+
"finances": ("spend", {"finances.liquidity": -200.0, "mental_wellbeing.stress_level": -8.0}, {"time": 1.0, "energy": 10.0}, "Spend to resolve financial pressure."),
|
| 553 |
+
"relationships": ("communicate", {"relationships.romantic": 8.0, "mental_wellbeing.stress_level": -5.0},{"time": 0.5, "energy": 8.0}, "Call partner to check in."),
|
| 554 |
+
"physical_health": ("rest", {"physical_health.energy_level": 12.0, "mental_wellbeing.stress_level": -6.0}, {"time": 1.0}, "Rest to recover energy."),
|
| 555 |
+
"mental_wellbeing": ("rest", {"mental_wellbeing.stress_level": -15.0, "physical_health.sleep_quality": 5.0}, {"time": 1.0}, "Take a break to reduce stress."),
|
| 556 |
+
"time": ("reschedule", {"time.free_hours_per_week": 6.0, "career.workload": -8.0}, {"time": 1.5, "energy": 12.0}, "Reschedule non-critical tasks."),
|
| 557 |
+
}
|
| 558 |
+
|
| 559 |
+
def _run_baseline(conflict, person):
|
| 560 |
+
"""Rule-based baseline: pick the action for the worst-scoring domain."""
|
| 561 |
+
env = LifeStackEnv()
|
| 562 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 563 |
+
flat = env.state.current_metrics.flatten()
|
| 564 |
+
|
| 565 |
+
domain_scores = {}
|
| 566 |
+
for dom in ["career", "finances", "relationships", "physical_health", "mental_wellbeing", "time"]:
|
| 567 |
+
subs = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
|
| 568 |
+
domain_scores[dom] = sum(subs.values()) / len(subs) if subs else 70.0
|
| 569 |
+
|
| 570 |
+
worst_dom = min(domain_scores, key=domain_scores.get)
|
| 571 |
+
atype, mc, rc, desc = BASELINE_ACTION_MAP.get(worst_dom, BASELINE_ACTION_MAP["mental_wellbeing"])
|
| 572 |
+
|
| 573 |
+
uptake = person.respond_to_action(atype, rc, flat.get("mental_wellbeing.stress_level", 70))
|
| 574 |
+
scaled_mc = {k: v * uptake for k, v in mc.items()}
|
| 575 |
+
|
| 576 |
+
env_action = LifeStackAction(
|
| 577 |
+
action_type=atype,
|
| 578 |
+
target=worst_dom,
|
| 579 |
+
metric_changes=scaled_mc,
|
| 580 |
+
resource_cost=rc,
|
| 581 |
+
reasoning=f"Rule-based: {worst_dom} scored {domain_scores[worst_dom]:.1f}, the lowest domain.",
|
| 582 |
+
actions_taken=1,
|
| 583 |
+
)
|
| 584 |
+
obs = env.step(env_action)
|
| 585 |
+
return {
|
| 586 |
+
"metrics": obs.metrics,
|
| 587 |
+
"action": {
|
| 588 |
+
"type": atype,
|
| 589 |
+
"target": worst_dom,
|
| 590 |
+
"description": desc,
|
| 591 |
+
"reasoning": env_action.reasoning,
|
| 592 |
+
"reward": obs.reward,
|
| 593 |
+
"cost": rc,
|
| 594 |
+
}
|
| 595 |
+
}
|
| 596 |
+
|
| 597 |
+
def _run_agent_comparison_side(conflict, person, api_only: bool):
|
| 598 |
+
"""Run one side of the comparison: api_only=True → untrained LLM, False → GRPO-trained."""
|
| 599 |
+
env = LifeStackEnv()
|
| 600 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 601 |
+
before_metrics = copy.deepcopy(env.state.current_metrics)
|
| 602 |
+
before_budget = copy.deepcopy(env.state.budget)
|
| 603 |
+
action = AGENT.get_action(before_metrics, before_budget, conflict, person, api_only=api_only)
|
| 604 |
+
_normalize_action_metric_changes(action)
|
| 605 |
+
uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
|
| 606 |
+
before_metrics.mental_wellbeing.stress_level)
|
| 607 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 608 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 609 |
+
obs = env.step(env_action)
|
| 610 |
+
return {
|
| 611 |
+
"metrics": obs.metrics,
|
| 612 |
+
"action": {
|
| 613 |
+
"type": action.primary.action_type,
|
| 614 |
+
"target": action.primary.target_domain,
|
| 615 |
+
"description": action.primary.description,
|
| 616 |
+
"reasoning": action.reasoning,
|
| 617 |
+
"reward": obs.reward,
|
| 618 |
+
"cost": action.primary.resource_cost,
|
| 619 |
+
}
|
| 620 |
+
}
|
| 621 |
+
|
| 622 |
+
|
| 623 |
+
@app.route('/api/comparison/run', methods=['POST'])
|
| 624 |
+
def run_comparison():
|
| 625 |
+
"""Run the same conflict through the untrained LLM (no RL) and the GRPO-trained LifeStack agent."""
|
| 626 |
+
data = request.json
|
| 627 |
+
conflict_label = data.get('conflict')
|
| 628 |
+
person_label = data.get('person')
|
| 629 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 630 |
+
person = PERSONS.get(person_label, PERSONS["Alex (Executive) β driven, high-stress"])
|
| 631 |
+
|
| 632 |
+
# Untrained LLM path: forces Groq API, no GRPO optimization
|
| 633 |
+
try:
|
| 634 |
+
baseline = _run_agent_comparison_side(conflict, person, api_only=True)
|
| 635 |
+
except Exception as e:
|
| 636 |
+
baseline = {"error": str(e)}
|
| 637 |
+
|
| 638 |
+
# GRPO-trained agent path: uses local model if available, lazy-loaded
|
| 639 |
+
try:
|
| 640 |
+
trained = _run_agent_comparison_side(conflict, person, api_only=False)
|
| 641 |
+
except Exception as e:
|
| 642 |
+
trained = {"error": str(e)}
|
| 643 |
+
|
| 644 |
+
return jsonify({"baseline": baseline, "trained": trained})
|
| 645 |
+
|
| 646 |
+
|
| 647 |
+
# ─── Feature E: Memory Effect Comparison ───
|
| 648 |
+
@app.route('/api/memory/compare', methods=['POST'])
|
| 649 |
+
def memory_compare():
|
| 650 |
+
"""Show the same conflict resolved cold (no memory) vs warm (with RAG memory)."""
|
| 651 |
+
try:
|
| 652 |
+
data = request.json
|
| 653 |
+
conflict_label = data.get('conflict')
|
| 654 |
+
person_label = data.get('person')
|
| 655 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 656 |
+
person = PERSONS.get(person_label, PERSONS["Alex (Executive) β driven, high-stress"])
|
| 657 |
+
|
| 658 |
+
def _run_episode(use_memory: bool):
|
| 659 |
+
env = LifeStackEnv()
|
| 660 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 661 |
+
before_metrics = copy.deepcopy(env.state.current_metrics)
|
| 662 |
+
before_budget = copy.deepcopy(env.state.budget)
|
| 663 |
+
few_shot = ""
|
| 664 |
+
retrieved = []
|
| 665 |
+
if use_memory:
|
| 666 |
+
few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_metrics.flatten())
|
| 667 |
+
retrieved = MEMORY.retrieve_similar(conflict.title, before_metrics.flatten())
|
| 668 |
+
action = AGENT.get_action(before_metrics, before_budget, conflict, person, few_shot_context=few_shot)
|
| 669 |
+
_normalize_action_metric_changes(action)
|
| 670 |
+
uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
|
| 671 |
+
before_metrics.mental_wellbeing.stress_level)
|
| 672 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 673 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 674 |
+
obs = env.step(env_action)
|
| 675 |
+
MEMORY.store_decision(
|
| 676 |
+
conflict_title=conflict.title,
|
| 677 |
+
action_type=action.primary.action_type,
|
| 678 |
+
target_domain=action.primary.target_domain,
|
| 679 |
+
reward=obs.reward,
|
| 680 |
+
metrics_snapshot=before_metrics.flatten(),
|
| 681 |
+
reasoning=action.reasoning,
|
| 682 |
+
)
|
| 683 |
+
return {
|
| 684 |
+
"metrics": obs.metrics,
|
| 685 |
+
"action": {
|
| 686 |
+
"type": action.primary.action_type,
|
| 687 |
+
"target": action.primary.target_domain,
|
| 688 |
+
"description": action.primary.description,
|
| 689 |
+
"reasoning": action.reasoning,
|
| 690 |
+
"reward": obs.reward,
|
| 691 |
+
"memories_retrieved": retrieved,
|
| 692 |
+
}
|
| 693 |
+
}
|
| 694 |
+
|
| 695 |
+
cold = _run_episode(use_memory=False)
|
| 696 |
+
warm = _run_episode(use_memory=True)
|
| 697 |
+
return jsonify({"cold": cold, "warm": warm})
|
| 698 |
+
except Exception as e:
|
| 699 |
+
return jsonify({"error": str(e)}), 500
|
| 700 |
+
|
| 701 |
+
|
| 702 |
+
# ─── F2: /api/cascade/frames alias ───
|
| 703 |
+
@app.route('/api/cascade/frames', methods=['POST'])
|
| 704 |
+
def cascade_frames_alias():
|
| 705 |
+
"""Alias route for /api/simulation/cascade (same handler)."""
|
| 706 |
+
return get_cascade_frames()
|
| 707 |
+
|
| 708 |
+
|
| 709 |
+
# ─── F4: Personality Comparison with OCEAN scores ───
|
| 710 |
+
@app.route('/api/personality/compare', methods=['POST'])
|
| 711 |
+
def personality_compare():
|
| 712 |
+
data = request.json
|
| 713 |
+
conflict_label = data.get('conflict')
|
| 714 |
+
person_a_label = data.get('person_a')
|
| 715 |
+
person_b_label = data.get('person_b')
|
| 716 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 717 |
+
|
| 718 |
+
def _run_person(person_label):
|
| 719 |
+
person = PERSONS.get(person_label, list(PERSONS.values())[0])
|
| 720 |
+
env = LifeStackEnv()
|
| 721 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 722 |
+
before_m = copy.deepcopy(env.state.current_metrics)
|
| 723 |
+
before_b = copy.deepcopy(env.state.budget)
|
| 724 |
+
action = AGENT.get_action(before_m, before_b, conflict, person)
|
| 725 |
+
_normalize_action_metric_changes(action)
|
| 726 |
+
uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
|
| 727 |
+
before_m.mental_wellbeing.stress_level)
|
| 728 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 729 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 730 |
+
obs = env.step(env_action)
|
| 731 |
+
return {
|
| 732 |
+
"name": person.name,
|
| 733 |
+
"ocean": {
|
| 734 |
+
"openness": round(person.openness * 100),
|
| 735 |
+
"conscientiousness": round(person.conscientiousness * 100),
|
| 736 |
+
"extraversion": round(person.extraversion * 100),
|
| 737 |
+
"agreeableness": round(person.agreeableness * 100),
|
| 738 |
+
"neuroticism": round(person.neuroticism * 100),
|
| 739 |
+
},
|
| 740 |
+
"action": {
|
| 741 |
+
"type": action.primary.action_type,
|
| 742 |
+
"target": action.primary.target_domain,
|
| 743 |
+
"description": action.primary.description,
|
| 744 |
+
"reasoning": action.reasoning,
|
| 745 |
+
"reward": obs.reward,
|
| 746 |
+
"uptake": uptake,
|
| 747 |
+
},
|
| 748 |
+
"metrics": obs.metrics,
|
| 749 |
+
"domain_health": compute_domain_health(obs.metrics),
|
| 750 |
+
}
|
| 751 |
+
|
| 752 |
+
try:
|
| 753 |
+
return jsonify({"a": _run_person(person_a_label), "b": _run_person(person_b_label)})
|
| 754 |
+
except Exception as e:
|
| 755 |
+
return jsonify({"error": str(e)}), 500
|
| 756 |
+
|
| 757 |
+
|
| 758 |
+
# ─── F6: Dedicated Counterfactual Generation ───
|
| 759 |
+
@app.route('/api/counterfactuals/generate', methods=['POST'])
|
| 760 |
+
def counterfactuals_generate():
|
| 761 |
+
data = request.json
|
| 762 |
+
conflict_label = data.get('conflict')
|
| 763 |
+
person_label = data.get('person')
|
| 764 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 765 |
+
person = PERSONS.get(person_label, list(PERSONS.values())[0])
|
| 766 |
+
|
| 767 |
+
env = LifeStackEnv()
|
| 768 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 769 |
+
before_m = copy.deepcopy(env.state.current_metrics)
|
| 770 |
+
before_b = copy.deepcopy(env.state.budget)
|
| 771 |
+
action = AGENT.get_action(before_m, before_b, conflict, person)
|
| 772 |
+
_normalize_action_metric_changes(action)
|
| 773 |
+
cf_data = generate_counterfactuals(AGENT, before_m, before_b, conflict, person, action)
|
| 774 |
+
return jsonify({
|
| 775 |
+
"counterfactuals": cf_data,
|
| 776 |
+
"actual_action": {
|
| 777 |
+
"type": action.primary.action_type,
|
| 778 |
+
"target": action.primary.target_domain,
|
| 779 |
+
"description": action.primary.description,
|
| 780 |
+
},
|
| 781 |
+
})
|
| 782 |
+
|
| 783 |
+
|
| 784 |
+
# ─── F7: Memory Ablation Study ───
|
| 785 |
+
@app.route('/api/memory/ablation', methods=['POST'])
|
| 786 |
+
def memory_ablation():
|
| 787 |
+
"""Memory ablation: cold (0 memories) vs warm (RAG-augmented). Surfaces ablation delta."""
|
| 788 |
+
data = request.json
|
| 789 |
+
conflict_label = data.get('conflict')
|
| 790 |
+
person_label = data.get('person')
|
| 791 |
+
conflict = CONFLICT_CHOICES.get(conflict_label, DEMO_CONFLICT)
|
| 792 |
+
person = PERSONS.get(person_label, list(PERSONS.values())[0])
|
| 793 |
+
|
| 794 |
+
def _run(use_memory):
|
| 795 |
+
env = LifeStackEnv()
|
| 796 |
+
env.reset(conflict=conflict.primary_disruption, budget={"time": max((conflict.resource_budget or {}).get("time", 20.0), 4.0), "money": max((conflict.resource_budget or {}).get("money", 500.0), 500.0), "energy": max((conflict.resource_budget or {}).get("energy", 100.0), 20.0)})
|
| 797 |
+
before_m = copy.deepcopy(env.state.current_metrics)
|
| 798 |
+
before_b = copy.deepcopy(env.state.budget)
|
| 799 |
+
few_shot, retrieved = "", []
|
| 800 |
+
if use_memory:
|
| 801 |
+
few_shot = MEMORY.build_few_shot_prompt(conflict.title, before_m.flatten())
|
| 802 |
+
retrieved = MEMORY.retrieve_similar(conflict.title, before_m.flatten())
|
| 803 |
+
action = AGENT.get_action(before_m, before_b, conflict, person, few_shot_context=few_shot)
|
| 804 |
+
_normalize_action_metric_changes(action)
|
| 805 |
+
uptake = person.respond_to_action(action.primary.action_type, action.primary.resource_cost,
|
| 806 |
+
before_m.mental_wellbeing.stress_level)
|
| 807 |
+
env_action = LifeStackAction.from_agent_action(action)
|
| 808 |
+
env_action.metric_changes = {k: v * uptake for k, v in action.primary.metric_changes.items()}
|
| 809 |
+
obs = env.step(env_action)
|
| 810 |
+
MEMORY.store_decision(conflict_title=conflict.title, action_type=action.primary.action_type,
|
| 811 |
+
target_domain=action.primary.target_domain, reward=obs.reward,
|
| 812 |
+
metrics_snapshot=before_m.flatten(), reasoning=action.reasoning)
|
| 813 |
+
return {"metrics": obs.metrics, "action": {
|
| 814 |
+
"type": action.primary.action_type, "target": action.primary.target_domain,
|
| 815 |
+
"description": action.primary.description, "reasoning": action.reasoning,
|
| 816 |
+
"reward": obs.reward, "memories_retrieved": retrieved,
|
| 817 |
+
}}
|
| 818 |
+
|
| 819 |
+
cold = _run(use_memory=False)
|
| 820 |
+
warm = _run(use_memory=True)
|
| 821 |
+
delta = warm["action"]["reward"] - cold["action"]["reward"]
|
| 822 |
+
return jsonify({"cold": cold, "warm": warm,
|
| 823 |
+
"ablation_delta": round(delta, 4),
|
| 824 |
+
"memory_count": len(warm["action"]["memories_retrieved"])})
|
| 825 |
+
|
| 826 |
+
|
| 827 |
+
# ─── F10: Health + Calendar Data Upload ───
|
| 828 |
+
@app.route('/api/data/health/upload', methods=['POST'])
|
| 829 |
+
def upload_health_data():
|
| 830 |
+
"""Accept health/fitness JSON signals and return metric deltas."""
|
| 831 |
+
data = request.json or {}
|
| 832 |
+
sleep = float(data.get('sleep_hours', 7.0))
|
| 833 |
+
hr = float(data.get('resting_heart_rate', 70))
|
| 834 |
+
steps = float(data.get('daily_steps', 8000))
|
| 835 |
+
deltas = {
|
| 836 |
+
"physical_health.sleep_quality": round(min(100, sleep / 8 * 100) - 50, 1),
|
| 837 |
+
"physical_health.energy_level": round(min(100, steps / 10000 * 100) - 50, 1),
|
| 838 |
+
"physical_health.exercise_consistency": round(min(100, steps / 8000 * 70), 1),
|
| 839 |
+
"mental_wellbeing.stress_level": round(max(0.0, 80.0 - hr), 1),
|
| 840 |
+
}
|
| 841 |
+
summary = f"Sleep {sleep:.1f}h | HR {hr:.0f}bpm | Steps {int(steps):,}/day"
|
| 842 |
+
# Persist overrides so future simulations use the uploaded health data
|
| 843 |
+
USER_HEALTH_OVERRIDES.update(deltas)
|
| 844 |
+
return jsonify({"status": "success", "deltas": deltas, "summary": summary,
|
| 845 |
+
"signals": {"avg_sleep_hours": sleep, "resting_heart_rate": hr, "daily_steps_avg": steps}})
|
| 846 |
+
|
| 847 |
+
|
| 848 |
+
@app.route('/api/data/calendar/upload', methods=['POST'])
|
| 849 |
+
def upload_calendar_data():
|
| 850 |
+
"""Accept calendar JSON signals and return metric deltas."""
|
| 851 |
+
data = request.json or {}
|
| 852 |
+
occupancy = float(data.get('week_occupancy_pct', 50))
|
| 853 |
+
btb = int(data.get('back_to_back_blocks', 0))
|
| 854 |
+
deadlines = data.get('upcoming_deadlines', [])
|
| 855 |
+
critical_count = sum(1 for d in deadlines if d.get('priority') == 'critical')
|
| 856 |
+
deltas = {
|
| 857 |
+
"time.free_hours_per_week": round(-((occupancy - 50) / 5), 1),
|
| 858 |
+
"time.schedule_control": round(-(occupancy / 10), 1),
|
| 859 |
+
"mental_wellbeing.stress_level": round((occupancy / 10) + (btb * 2), 1),
|
| 860 |
+
"career.workload": round((occupancy - 50) / 2 + critical_count * 5, 1),
|
| 861 |
+
}
|
| 862 |
+
summary = f"Occupancy {occupancy:.0f}% | {len(deadlines)} deadlines ({critical_count} critical)"
|
| 863 |
+
return jsonify({"status": "success", "deltas": deltas, "summary": summary,
|
| 864 |
+
"signals": {"week_occupancy_pct": occupancy, "back_to_back_blocks": btb,
|
| 865 |
+
"upcoming_deadlines": deadlines}})
|
| 866 |
+
|
| 867 |
+
|
| 868 |
+
# ─── Global Error Handlers ───
|
| 869 |
+
@app.errorhandler(429)
|
| 870 |
+
def ratelimit_handler(e):
|
| 871 |
+
return jsonify({"error": "Rate limit exceeded. Slow down!", "details": str(e)}), 429
|
| 872 |
+
|
| 873 |
+
@app.errorhandler(500)
|
| 874 |
+
def server_error_handler(e):
|
| 875 |
+
return jsonify({"error": "Internal server error. The agent might be overwhelmed.", "details": str(e)}), 500
|
| 876 |
+
|
| 877 |
+
if __name__ == '__main__':
|
| 878 |
+
LONG_DEMO.pre_seed_arjun()
|
| 879 |
+
app.run(host='0.0.0.0', port=7860, debug=False)  # debug=True would expose the Werkzeug debugger on a public host
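The handlers in app_flask.py above repeat the same inline `budget={...}` expression in every `env.reset(...)` call. A minimal sketch of how that expression could be factored into one helper follows. The name `clamped_budget` and the two module-level dicts are hypothetical and are not part of the commit; the defaults and floors are copied from the inline expression.

```python
from typing import Mapping, Optional

# Hypothetical constants: values copied from the inline env.reset(...) calls.
_BUDGET_DEFAULTS = {"time": 20.0, "money": 500.0, "energy": 100.0}
_BUDGET_FLOORS = {"time": 4.0, "money": 500.0, "energy": 20.0}


def clamped_budget(resource_budget: Optional[Mapping[str, float]]) -> dict:
    """Return a budget dict with defaults for missing keys and a floor per resource.

    Mirrors max((conflict.resource_budget or {}).get(key, default), floor)
    for each of "time", "money" and "energy".
    """
    raw = resource_budget or {}
    return {
        key: max(raw.get(key, _BUDGET_DEFAULTS[key]), _BUDGET_FLOORS[key])
        for key in _BUDGET_DEFAULTS
    }
```

Under that assumption, each call site would shrink to `env.reset(conflict=conflict.primary_disruption, budget=clamped_budget(conflict.resource_budget))`, which keeps the floors in one place if they ever change.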
|
core/__init__.py
ADDED
|
File without changes
|
core/action_space.py
ADDED
|
@@ -0,0 +1,238 @@
|
| 1 |
+
import copy
|
| 2 |
+
from dataclasses import dataclass, field
|
| 3 |
+
from core.life_state import LifeMetrics, ResourceBudget
|
| 4 |
+
from enum import Enum
|
| 5 |
+
from intake.simperson import SimPerson
|
| 6 |
+
|
| 7 |
+
class ToolActionType(str, Enum):
|
| 8 |
+
INSPECT = "inspect"
|
| 9 |
+
PLAN = "plan"
|
| 10 |
+
EXECUTE = "execute"
|
| 11 |
+
COMMUNICATE = "communicate"
|
| 12 |
+
WAIT = "wait"
|
| 13 |
+
ROLLBACK = "rollback"
|
| 14 |
+
ESCALATE = "escalate"
|
| 15 |
+
|
| 16 |
+
@dataclass
|
| 17 |
+
class PrimaryAction:
|
| 18 |
+
action_type: str # reschedule, delegate, negotiate, spend, communicate, rest, deprioritize
|
| 19 |
+
target_domain: str
|
| 20 |
+
metric_changes: dict
|
| 21 |
+
resource_cost: dict
|
| 22 |
+
description: str
|
| 23 |
+
|
| 24 |
+
@dataclass
|
| 25 |
+
class CommunicationAction:
|
| 26 |
+
recipient: str # boss, partner, family, friend, colleague
|
| 27 |
+
message_type: str # apologize, negotiate, inform, request, reassure
|
| 28 |
+
tone: str # formal, warm, urgent, calm, assertive
|
| 29 |
+
content: str
|
| 30 |
+
|
| 31 |
+
@dataclass
|
| 32 |
+
class AgentAction:
|
| 33 |
+
primary: PrimaryAction
|
| 34 |
+
communication: CommunicationAction = None
|
| 35 |
+
reasoning: str = ""
|
| 36 |
+
model_used: str = "unknown"
|
| 37 |
+
raw_completion: str = ""
|
| 38 |
+
|
| 39 |
+
def validate_action(action: AgentAction, budget: ResourceBudget) -> tuple[bool, str]:
|
| 40 |
+
cost = action.primary.resource_cost
|
| 41 |
+
if budget.time_hours < cost.get('time', 0.0):
|
| 42 |
+
return False, f"Not enough time (Needs {cost.get('time')}h, has {budget.time_hours:.1f}h)"
|
| 43 |
+
if budget.money_dollars < cost.get('money', 0.0):
|
| 44 |
+
return False, f"Not enough money (Needs ${cost.get('money')}, has ${budget.money_dollars:.1f})"
|
| 45 |
+
if budget.energy_units < cost.get('energy', 0.0):
|
| 46 |
+
return False, f"Not enough energy (Needs {cost.get('energy')}u, has {budget.energy_units:.1f}u)"
|
| 47 |
+
return True, ""
|
| 48 |
+
|
| 49 |
+
def apply_action(action: AgentAction, metrics: LifeMetrics, budget: ResourceBudget, person: SimPerson) -> tuple[LifeMetrics, ResourceBudget, float]:
|
| 50 |
+
"""Validates, scales by personality uptake, and applies the action to the state."""
|
| 51 |
+
|
| 52 |
+
# 1. Validation
|
| 53 |
+
is_valid, reason = validate_action(action, budget)
|
| 54 |
+
if not is_valid:
|
| 55 |
+
# If invalid, the action fails but we return current state with 0 uptake
|
| 56 |
+
return metrics, budget, 0.0
|
| 57 |
+
|
| 58 |
+
# 2. Personality Scaling (Uptake)
|
| 59 |
+
current_stress = metrics.mental_wellbeing.stress_level
|
| 60 |
+
uptake_score = person.respond_to_action(
|
| 61 |
+
action.primary.action_type,
|
| 62 |
+
action.primary.resource_cost,
|
| 63 |
+
current_stress
|
| 64 |
+
)
|
| 65 |
+
|
| 66 |
+
# 3. Apply changes (Scaled by uptake)
|
| 67 |
+
new_metrics = copy.deepcopy(metrics)
|
| 68 |
+
for path, delta in action.primary.metric_changes.items():
|
| 69 |
+
# Guard: skip malformed keys without a domain prefix (e.g. LLM returns "stress_level" instead of "mental_wellbeing.stress_level")
|
| 70 |
+
if '.' not in path:
|
| 71 |
+
print(f"  ⚠️ Skipping malformed metric key: '{path}' (expected 'domain.submetric')")
|
| 72 |
+
continue
|
| 73 |
+
parts = path.split('.', 1)
|
| 74 |
+
domain_name, sub_name = parts[0], parts[1]
|
| 75 |
+
domain = getattr(new_metrics, domain_name, None)
|
| 76 |
+
if domain is None or not hasattr(domain, sub_name):
|
| 77 |
+
print(f"  ⚠️ Skipping unknown metric: '{path}'")
|
| 78 |
+
continue
|
| 79 |
+
current = getattr(domain, sub_name)
|
| 80 |
+
|
| 81 |
+
# Scale the benefit/cost by the person's receptiveness
|
| 82 |
+
try:
|
| 83 |
+
scaled_delta = float(delta) * uptake_score
|
| 84 |
+
setattr(domain, sub_name, max(0.0, min(100.0, current + scaled_delta)))
|
| 85 |
+
except ValueError:
|
| 86 |
+
print(f" β οΈ Skipping metric change due to invalid delta value: '{delta}'")
|
| 87 |
+
|
| 88 |
+
# 4. Deduct resources (Fixed cost, doesn't scale with uptake)
|
| 89 |
+
new_budget = copy.deepcopy(budget)
|
| 90 |
+
new_budget.deduct(
|
| 91 |
+
time=action.primary.resource_cost.get('time', 0.0),
|
| 92 |
+
money=action.primary.resource_cost.get('money', 0.0),
|
| 93 |
+
energy=action.primary.resource_cost.get('energy', 0.0)
|
| 94 |
+
)
|
| 95 |
+
|
| 96 |
+
return new_metrics, new_budget, uptake_score
|
| 97 |
+
|
| 98 |
+
# 10 EXAMPLE ACTIONS for Friday 6PM Conflict
|
| 99 |
+
EXAMPLE_ACTIONS = [
|
| 100 |
+
AgentAction(
|
| 101 |
+
primary=PrimaryAction(
|
| 102 |
+
action_type="negotiate", target_domain="career",
|
| 103 |
+
metric_changes={"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
|
| 104 |
+
resource_cost={"time": 1.5, "energy": 20.0},
|
| 105 |
+
description="Negotiate a Sunday deadline extension with my boss."
|
| 106 |
+
),
|
| 107 |
+
communication=CommunicationAction("boss", "negotiate", "formal", "Due to flight issues, I need until Sunday PM for the report."),
|
| 108 |
+
reasoning="Relieving the immediate workload pressure is critical to reduce cascade spread."
|
| 109 |
+
),
|
| 110 |
+
AgentAction(
|
| 111 |
+
primary=PrimaryAction(
|
| 112 |
+
action_type="spend", target_domain="finances",
|
| 113 |
+
metric_changes={"finances.liquidity": -350.0, "mental_wellbeing.stress_level": -10.0},
|
| 114 |
+
resource_cost={"time": 1.0, "energy": 15.0},
|
| 115 |
+
description="Rebook the canceled flight using a premium fare."
|
| 116 |
+
),
|
| 117 |
+
reasoning="Immediate resolution of logistics fixes the source of the crisis."
|
| 118 |
+
),
|
| 119 |
+
AgentAction(
|
| 120 |
+
primary=PrimaryAction(
|
| 121 |
+
action_type="communicate", target_domain="relationships",
|
| 122 |
+
metric_changes={"relationships.romantic": 12.0, "mental_wellbeing.stress_level": -5.0},
|
| 123 |
+
resource_cost={"time": 0.5, "energy": 10.0},
|
| 124 |
+
description="Call my partner to explain the situation and reassure them."
|
| 125 |
+
),
|
| 126 |
+
communication=CommunicationAction("partner", "reassure", "warm", "Hey, I'm stuck but I'll be home soon. Miss you."),
|
| 127 |
+
reasoning="Prevents relationship decay while stress is high."
|
| 128 |
+
),
|
| 129 |
+
AgentAction(
|
| 130 |
+
primary=PrimaryAction(
|
| 131 |
+
action_type="communicate", target_domain="finances",
|
| 132 |
+
metric_changes={"finances.liquidity": 200.0, "relationships.family": -5.0},
|
| 133 |
+
resource_cost={"time": 1.5, "energy": 25.0},
|
| 134 |
+
description="Ask my sibling for a temporary loan to cover rebooking."
|
| 135 |
+
),
|
| 136 |
+
communication=CommunicationAction("family", "request", "urgent", "My card declined, can you Venmo me $200 for the flight?"),
|
| 137 |
+
reasoning="Fixes the liquidity block at a small social cost."
|
| 138 |
+
),
|
| 139 |
+
AgentAction(
|
| 140 |
+
primary=PrimaryAction(
|
| 141 |
+
action_type="reschedule", target_domain="time",
|
| 142 |
+
metric_changes={"career.workload": -10.0, "time.free_hours_per_week": 5.0},
|
| 143 |
+
resource_cost={"time": 2.0, "energy": 15.0},
|
| 144 |
+
description="Cancel non-essential meetings to create a deep-work block."
|
| 145 |
+
),
|
| 146 |
+
reasoning="Regaining time allows for better problem solving later."
|
| 147 |
+
),
|
| 148 |
+
AgentAction(
|
| 149 |
+
primary=PrimaryAction(
|
| 150 |
+
action_type="rest", target_domain="physical_health",
|
| 151 |
+
metric_changes={"mental_wellbeing.stress_level": -12.0, "physical_health.energy": 10.0},
|
| 152 |
+
resource_cost={"time": 1.0, "energy": -10.0},
|
| 153 |
+
description="Take a 60-minute power nap in the airport lounge."
|
| 154 |
+
),
|
| 155 |
+
reasoning="Restores energy to tackle the remaining Sunday deadline."
|
| 156 |
+
),
|
| 157 |
+
AgentAction(
|
| 158 |
+
primary=PrimaryAction(
|
| 159 |
+
action_type="delegate", target_domain="career",
|
| 160 |
+
metric_changes={"career.workload": -10.0, "relationships.professional_network": -5.0},
|
| 161 |
+
resource_cost={"time": 1.0, "energy": 15.0},
|
| 162 |
+
description="Ask a colleague to handle the final formatting of the slides."
|
| 163 |
+
),
|
| 164 |
+
communication=CommunicationAction("colleague", "request", "assertive", "I'm stuck at airport, can you finish the formatting?"),
|
| 165 |
+
reasoning="Reduces workload by leaning on the professional network."
|
| 166 |
+
),
|
| 167 |
+
AgentAction(
|
| 168 |
+
primary=PrimaryAction(
|
| 169 |
+
action_type="deprioritize", target_domain="time",
|
| 170 |
+
metric_changes={"time.free_hours_per_week": 8.0, "relationships.social": -10.0},
|
| 171 |
+
resource_cost={"time": 0.5, "energy": 5.0},
|
| 172 |
+
description="Tell friends I can't attend the weekend gathering."
|
| 173 |
+
),
|
| 174 |
+
communication=CommunicationAction("friend", "inform", "calm", "Hey, work crisis. Won't make it this weekend. Sorry!"),
|
| 175 |
+
reasoning="Aggressively reclaims time for high-value tasks."
|
| 176 |
+
),
|
| 177 |
+
AgentAction(
|
| 178 |
+
primary=PrimaryAction(
|
| 179 |
+
action_type="communicate", target_domain="career",
|
| 180 |
+
metric_changes={"career.stability": 8.0, "mental_wellbeing.stress_level": -5.0},
|
| 181 |
+
resource_cost={"time": 0.5, "energy": 10.0},
|
| 182 |
+
description="Send an apology note to boss for the delay."
|
| 183 |
+
),
|
| 184 |
+
communication=CommunicationAction("boss", "apologize", "formal", "Apologies for the delay caused by travel disruptions. On it now."),
|
| 185 |
+
reasoning="Maintains career stability during an active crisis."
|
| 186 |
+
),
|
| 187 |
+
AgentAction(
|
| 188 |
+
primary=PrimaryAction(
|
| 189 |
+
action_type="reschedule", target_domain="finances",
|
| 190 |
+
metric_changes={"finances.debt_pressure": -10.0, "time.admin_overhead": 10.0},
|
| 191 |
+
resource_cost={"time": 2.0, "energy": 15.0},
|
| 192 |
+
description="Call the bank to unlock the declined card."
|
| 193 |
+
),
|
| 194 |
+
communication=CommunicationAction("colleague", "request", "assertive", "Unlock my credit card immediately."),
|
| 195 |
+
reasoning="Removes the liquidity barrier by handling admin overhead."
|
| 196 |
+
)
|
| 197 |
+
]
|
| 198 |
+
|
| 199 |
+
def main():
|
| 200 |
+
# 1. Setup Personalities
|
| 201 |
+
# Sam (Anxious Introvert): Neuroticism 0.9, Extraversion 0.1
|
| 202 |
+
sam = SimPerson(name="Sam (Introvert)", openness=0.5, conscientiousness=0.6, extraversion=0.1, agreeableness=0.65, neuroticism=0.9)
|
| 203 |
+
|
| 204 |
+
# 2. Setup initial state (Friday 6PM Conflict)
|
| 205 |
+
from core.life_state import DependencyGraph
|
| 206 |
+
graph = DependencyGraph()
|
| 207 |
+
metrics = LifeMetrics() # starts at 70s
|
| 208 |
+
metrics = graph.cascade(metrics, {"career.workload": 35.0, "finances.liquidity": -40.0})
|
| 209 |
+
budget = ResourceBudget(time_hours=20.0, money_dollars=500.0, energy_units=100.0)
|
| 210 |
+
|
| 211 |
+
print("--- SIMULATING ACTIONS FOR SAM (ANXIOUS INTROVERT) ---")
|
| 212 |
+
print(f"Initial Stress: {metrics.mental_wellbeing.stress_level:.2f}")
|
| 213 |
+
print(f"Initial Metrics Health (Avg): {sum(metrics.flatten().values())/23:.2f}")
|
| 214 |
+
|
| 215 |
+
# 3. Apply each action
|
| 216 |
+
for i, action in enumerate(EXAMPLE_ACTIONS, 1):
|
| 217 |
+
print(f"\nACTION {i}: {action.primary.description}")
|
| 218 |
+
|
| 219 |
+
is_valid, reason = validate_action(action, budget)
|
| 220 |
+
if not is_valid:
|
| 221 |
+
print(f" β FAILED: {reason}")
|
| 222 |
+
continue
|
| 223 |
+
|
| 224 |
+
m_after, b_after, uptake = apply_action(action, metrics, budget, sam)
|
| 225 |
+
|
| 226 |
+
print(f" β
SUCCESS | Uptake: {uptake:.2f}")
|
| 227 |
+
print(f" Cost: {action.primary.resource_cost}")
|
| 228 |
+
|
| 229 |
+
# Show specific improvements
|
| 230 |
+
for path, delta in action.primary.metric_changes.items():
|
| 231 |
+
domain_name, sub_name = path.split('.')
|
| 232 |
+
val_before = getattr(getattr(metrics, domain_name), sub_name)
|
| 233 |
+
val_after = getattr(getattr(m_after, domain_name), sub_name)
|
| 234 |
+
real_delta = val_after - val_before
|
| 235 |
+
print(f" - {path:25}: {val_before:.2f} -> {val_after:.2f} (Actual Change: {real_delta:+.2f})")
|
| 236 |
+
|
| 237 |
+
if __name__ == "__main__":
|
| 238 |
+
main()
|
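As a rough illustration of step 3 in `apply_action` above, the sketch below isolates the scale-then-clamp arithmetic. `apply_scaled_delta` is a hypothetical helper, not part of this commit, and the uptake value is passed in directly instead of being computed by `SimPerson.respond_to_action`, whose internals are not shown in this diff.

```python
def apply_scaled_delta(current: float, delta: float, uptake: float,
                       lo: float = 0.0, hi: float = 100.0) -> float:
    """Scale a requested metric change by uptake, then clamp to [lo, hi].

    Hypothetical helper mirroring the arithmetic inside apply_action.
    """
    scaled = float(delta) * uptake
    return max(lo, min(hi, current + scaled))


if __name__ == "__main__":
    # A -15 workload change lands as -7.5 when uptake is 0.5.
    print(apply_scaled_delta(70.0, -15.0, 0.5))   # 62.5
    # Large changes saturate at the bounds instead of overshooting.
    print(apply_scaled_delta(70.0, -350.0, 1.0))  # 0.0
```

The second call corresponds to the `finances.liquidity: -350.0` example action: on a 0 to 100 scale such a delta simply pins the metric at the lower bound.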
core/cascade_utils.py
ADDED
|
@@ -0,0 +1,78 @@
+import copy
+from core.life_state import LifeMetrics, DependencyGraph, CASCADE_DAMPENING_DEFAULT
+
+
+def animate_cascade(primary_disruption: dict, metrics: LifeMetrics) -> list[dict]:
+    """Replay the cascade step-by-step and capture intermediate frames.
+
+    Returns a list of frames, each:
+      { 'flat': {metric: value}, 'status': {metric: 'primary'|'first'|'second'|'unchanged'} }
+    """
+    graph = DependencyGraph()
+    dampening = CASCADE_DAMPENING_DEFAULT
+    frames = []
+
+    # Frame 0: initial stable state
+    base = copy.deepcopy(metrics)
+    base_flat = base.flatten()
+    frames.append({'flat': dict(base_flat), 'status': {k: 'unchanged' for k in base_flat}})
+
+    # Frame 1: primary disruption only (no cascade)
+    f1 = copy.deepcopy(metrics)
+    primary_keys = set()
+    for path, amount in primary_disruption.items():
+        if '.' not in path:
+            continue
+        primary_keys.add(path)
+        dom_name, sub_name = path.split('.', 1)
+        dom = getattr(f1, dom_name, None)
+        if dom and hasattr(dom, sub_name):
+            setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + amount)))
+    f1_flat = f1.flatten()
+    frames.append({'flat': dict(f1_flat),
+                   'status': {k: ('primary' if k in primary_keys else 'unchanged') for k in f1_flat}})
+
+    # Frame 2: first-order cascade
+    f2 = copy.deepcopy(f1)
+    first_order_keys = set()
+    queue_next = []
+    for path, amount in primary_disruption.items():
+        if '.' not in path or path not in graph.edges:
+            continue
+        for target, weight in graph.edges[path]:
+            impact = amount * weight * dampening
+            if abs(impact) >= 0.05:
+                first_order_keys.add(target)
+                dom_name, sub_name = target.split('.', 1)
+                dom = getattr(f2, dom_name, None)
+                if dom and hasattr(dom, sub_name):
+                    setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + impact)))
+                queue_next.append((target, impact))
+    f2_flat = f2.flatten()
+    frames.append({'flat': dict(f2_flat), 'status': {
+        k: ('primary' if k in primary_keys else 'first' if k in first_order_keys else 'unchanged')
+        for k in f2_flat
+    }})
+
+    # Frame 3: second-order cascade
+    f3 = copy.deepcopy(f2)
+    second_order_keys = set()
+    for src_path, src_mag in queue_next:
+        if src_path not in graph.edges:
+            continue
+        for target, weight in graph.edges[src_path]:
+            impact = src_mag * weight * dampening
+            if abs(impact) >= 0.05:
+                second_order_keys.add(target)
+                dom_name, sub_name = target.split('.', 1)
+                dom = getattr(f3, dom_name, None)
+                if dom and hasattr(dom, sub_name):
+                    setattr(dom, sub_name, max(0.0, min(100.0, getattr(dom, sub_name) + impact)))
+    f3_flat = f3.flatten()
+    frames.append({'flat': dict(f3_flat), 'status': {
+        k: ('primary' if k in primary_keys else 'first' if k in first_order_keys
+            else 'second' if k in second_order_keys else 'unchanged')
+        for k in f3_flat
+    }})
+
+    return frames
|
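To make the per-hop arithmetic in `animate_cascade` concrete, here is a minimal sketch that computes first- and second-order impacts for a `career.workload` spike of 35. The edge weights are copied from `DependencyGraph.edges` in `core/life_state.py`, but only a subset is included, and `hop`, `EDGES` and `DAMPENING` are illustrative names that do not exist in the repository. Clamping to the 0 to 100 range is left out so that only the impact values are shown.

```python
DAMPENING = 0.6  # same value as CASCADE_DAMPENING_DEFAULT

# Subset of DependencyGraph.edges (source -> [(target, weight)]).
EDGES = {
    "career.workload": [
        ("mental_wellbeing.stress_level", 0.70),
        ("time.free_hours_per_week", -0.80),
    ],
    "mental_wellbeing.stress_level": [
        ("physical_health.sleep_quality", -0.55),
    ],
}


def hop(sources: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (target, impact) pairs one hop downstream of the given sources."""
    out = []
    for path, magnitude in sources:
        for target, weight in EDGES.get(path, []):
            impact = magnitude * weight * DAMPENING
            if abs(impact) >= 0.05:  # same cutoff as animate_cascade
                out.append((target, impact))
    return out


if __name__ == "__main__":
    first = hop([("career.workload", 35.0)])
    second = hop(first)
    for label, impacts in (("first", first), ("second", second)):
        for target, impact in impacts:
            print(f"{label:6} {target:32} {impact:+.3f}")
```

With these weights the first hop moves stress by about +14.7 and free hours by about -16.8, and the second hop moves sleep quality by about -4.851. The dampening factor is applied on every hop, including the first.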
core/feedback.py
ADDED
|
@@ -0,0 +1,63 @@
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import List, Optional
+from core.lifestack_env import LifeStackObservation
+
+@dataclass
+class OutcomeFeedback:
+    episode_id: str
+    submitted_at: datetime = field(default_factory=datetime.now)
+    # Did the advice work overall? 0-10 scale
+    overall_effectiveness: int = 5
+    # Which domains actually changed (user-reported)
+    domains_improved: List[str] = field(default_factory=list)
+    domains_worsened: List[str] = field(default_factory=list)
+    # Free text: what unexpected effects happened?
+    unexpected_effects: str = ""
+    # Time to resolution (hours)
+    resolution_time_hours: float = 0.0
+
+def compute_human_feedback_reward(initial_metrics: dict, predicted_obs: LifeStackObservation, feedback: OutcomeFeedback) -> float:
+    """
+    Computes a reward score (0.0 to 1.0) based on how well the environment's
+    predicted outcomes match the human's reported reality.
+    """
+    # Metrics where a decrease is an improvement
+    inverted = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
+
+    predicted_improved = set()
+    for key, final_val in predicted_obs.metrics.items():
+        if key not in initial_metrics:
+            continue
+
+        initial_val = initial_metrics[key]
+        delta = final_val - initial_val
+        submetric = key.split('.')[-1]
+        domain = key.split('.')[0]
+
+        # Determine if this specific change is an "improvement"
+        is_improvement = False
+        if submetric in inverted:
+            if delta < -1.0:  # Significant decrease in negative metric
+                is_improvement = True
+        else:
+            if delta > 1.0:  # Significant increase in positive metric
+                is_improvement = True
+
+        if is_improvement:
+            predicted_improved.add(domain)
+
+    actual_improved = set(feedback.domains_improved)
+
+    union = predicted_improved | actual_improved
+    if not union:
+        overlap = 1.0  # Both agreed nothing improved
+    else:
+        intersection = predicted_improved & actual_improved
+        overlap = len(intersection) / len(union)
+
+    # 2. Effectiveness Score (0.0 - 1.0)
+    effectiveness_score = max(0.0, min(1.0, feedback.overall_effectiveness / 10.0))
+
+    # Weighted Average
+    return 0.5 * overlap + 0.5 * effectiveness_score
|
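The final score in `compute_human_feedback_reward` is an equal-weight blend of a Jaccard overlap (predicted vs. user-reported improved domains) and the normalized 0-10 rating. The sketch below reproduces only that last step. `feedback_reward` is a hypothetical stand-in that takes the two domain sets directly, so it does not depend on `LifeStackObservation` or `OutcomeFeedback`.

```python
def feedback_reward(predicted: set[str], reported: set[str], effectiveness: int) -> float:
    """Blend Jaccard overlap of improved domains with a 0-10 effectiveness rating."""
    union = predicted | reported
    # Empty union means both sides agree that nothing improved.
    overlap = 1.0 if not union else len(predicted & reported) / len(union)
    effectiveness_score = max(0.0, min(1.0, effectiveness / 10.0))
    return 0.5 * overlap + 0.5 * effectiveness_score


if __name__ == "__main__":
    # The simulator predicted two improved domains; the user confirmed one and rated 8/10.
    print(feedback_reward({"career", "finances"}, {"career"}, 8))
```

Here the overlap is 1/2 and the rating contributes 0.8, giving a reward of roughly 0.65. Note that `domains_worsened` plays no part in the score as written.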
core/life_state.py
ADDED
|
@@ -0,0 +1,281 @@
+from dataclasses import dataclass, field
+import copy
+
+# Cascade dampening factor, grounded in Starcke & Brand (2012).
+# Stress effects attenuate ~40% per cognitive/behavioral hop.
+# cascade() applies the primary disruption at full strength, then scales each
+# hop by edge weight * dampening: 60% at the first hop, 36% at the second, etc.
+CASCADE_DAMPENING_DEFAULT = 0.6
+METRIC_FLOOR = 10.0
+
+@dataclass
+class CareerMetrics:
+    satisfaction: float = 70.0
+    workload: float = 70.0
+    stability: float = 70.0
+    growth_trajectory: float = 70.0
+
+@dataclass
+class FinanceMetrics:
+    liquidity: float = 70.0
+    debt_pressure: float = 70.0
+    monthly_runway: float = 70.0
+    long_term_health: float = 70.0
+
+@dataclass
+class RelationshipMetrics:
+    romantic: float = 70.0
+    family: float = 70.0
+    social: float = 70.0
+    professional_network: float = 70.0
+
+@dataclass
+class PhysicalHealthMetrics:
+    energy: float = 70.0
+    fitness: float = 70.0
+    sleep_quality: float = 70.0
+    nutrition: float = 70.0
+
+@dataclass
+class MentalWellbeingMetrics:
+    stress_level: float = 70.0
+    clarity: float = 70.0
+    motivation: float = 70.0
+    emotional_stability: float = 70.0
+
+@dataclass
+class TimeMetrics:
+    free_hours_per_week: float = 70.0
+    commute_burden: float = 70.0
+    admin_overhead: float = 70.0
+
+@dataclass
+class LifeMetrics:
+    career: CareerMetrics = field(default_factory=CareerMetrics)
+    finances: FinanceMetrics = field(default_factory=FinanceMetrics)
+    relationships: RelationshipMetrics = field(default_factory=RelationshipMetrics)
+    physical_health: PhysicalHealthMetrics = field(default_factory=PhysicalHealthMetrics)
+    mental_wellbeing: MentalWellbeingMetrics = field(default_factory=MentalWellbeingMetrics)
+    time: TimeMetrics = field(default_factory=TimeMetrics)
+
+    def flatten(self) -> dict:
+        """Returns a flat dictionary mapping 'domain.submetric' to value."""
+        flat = {}
+        for domain_name in self.__dataclass_fields__:
+            domain = getattr(self, domain_name)
+            for sub_name in domain.__dataclass_fields__:
+                flat[f"{domain_name}.{sub_name}"] = getattr(domain, sub_name)
+        return flat
+
+@dataclass
+class ResourceBudget:
+    time_hours: float = 20.0
+    money_dollars: float = 500.0
+    energy_units: float = 100.0
+
+    def deduct(self, time: float = 0.0, money: float = 0.0, energy: float = 0.0) -> bool:
+        """Returns False if any resource would go negative, otherwise deducts and returns True."""
+        if (self.time_hours < time or
+                self.money_dollars < money or
+                self.energy_units < energy):
+            return False
+
+        self.time_hours -= time
+        self.money_dollars -= money
+        self.energy_units = min(100.0, self.energy_units - energy)  # cap at 100
+        return True
+
+class DependencyGraph:
+    def __init__(self):
+        # source_node -> [(target_node, weight)]
+        self.edges = {
+            "career.workload": [
+                ("mental_wellbeing.stress_level", 0.70),
+                ("time.free_hours_per_week", -0.80)
+            ],
+            "finances.liquidity": [
+                ("mental_wellbeing.stress_level", -0.60),
+                ("finances.monthly_runway", 0.90)
+            ],
+            "mental_wellbeing.stress_level": [
+                ("physical_health.sleep_quality", -0.55),
+                ("mental_wellbeing.emotional_stability", -0.50),
+                ("mental_wellbeing.motivation", -0.40),
+                ("career.satisfaction", -0.35)
+            ],
+            "physical_health.sleep_quality": [
+                ("mental_wellbeing.clarity", 0.60),
+                ("physical_health.energy", 0.50)
+            ],
+            "relationships.romantic": [
+                ("mental_wellbeing.emotional_stability", 0.50)
+            ],
+            "time.free_hours_per_week": [
+                ("relationships.social", 0.45),
+                ("mental_wellbeing.stress_level", -0.30)
+            ],
+            "physical_health.energy": [
+                ("mental_wellbeing.motivation", 0.40),
+                ("physical_health.fitness", 0.30)
+            ],
+            "career.satisfaction": [
+                ("mental_wellbeing.motivation", 0.50)
+            ],
+            "finances.debt_pressure": [
+                ("mental_wellbeing.stress_level", 0.65)
+            ],
+            "physical_health.nutrition": [
+                ("physical_health.energy", 0.35)
+            ],
+            "physical_health.fitness": [
+                ("physical_health.energy", 0.40)
+            ],
+            "time.commute_burden": [
+                ("physical_health.energy", -0.30),
+                ("mental_wellbeing.stress_level", 0.25)
+            ],
+            "relationships.social": [
+                ("mental_wellbeing.emotional_stability", 0.30)
+            ],
+            "mental_wellbeing.clarity": [
+                ("career.growth_trajectory", 0.45)
+            ],
+            "finances.long_term_health": [
+                ("mental_wellbeing.stress_level", -0.40)
+            ],
+            "time.admin_overhead": [
+                ("mental_wellbeing.stress_level", 0.25)
+            ],
+            "career.stability": [
+                ("mental_wellbeing.stress_level", -0.35)
+            ],
+            "career.growth_trajectory": [
+                ("career.satisfaction", 0.40)
+            ],
+            "mental_wellbeing.motivation": [
+                ("career.growth_trajectory", 0.30)
+            ],
+            "relationships.professional_network": [
+                ("career.stability", 0.35)
+            ]
+        }
+
+    def _get_val(self, metrics: LifeMetrics, path: str) -> float:
+        if '.' not in path:
+            return 0.0
+        domain, sub = path.split('.', 1)
+        d = getattr(metrics, domain, None)
+        return getattr(d, sub, 0.0) if d else 0.0
+
+    def _set_val(self, metrics: LifeMetrics, path: str, val: float, is_cascade: bool = False):
+        if '.' not in path:
+            return
+        domain_name, sub_name = path.split('.', 1)
+        domain = getattr(metrics, domain_name, None)
+        if domain is None or not hasattr(domain, sub_name):
+            return
+        # Ensure values stay within bounds
+        floor = METRIC_FLOOR if is_cascade else 0.0
+        clamped_val = max(floor, min(100.0, val))
+        setattr(domain, sub_name, clamped_val)
+
+    def cascade(self, metrics: LifeMetrics, primary_disruption: dict, dampening: float = CASCADE_DAMPENING_DEFAULT, per_step_cascade_cap: int = 3) -> LifeMetrics:
+        """Applies disruption and propagates effects through the dependency graph.
+
+        The dampening factor (default 0.6) is grounded in three complementary
+        research findings:
+
+        1. **Starcke & Brand (2012)**: Stress effects on decision-making
+           attenuate approximately 40% per cognitive/behavioral hop. In this code a
+           workload spike changes workload at full magnitude; before edge weights,
+           stress then receives ~60% of it, sleep quality ~36%, and mental clarity
+           ~22%. The 0.6 multiplier captures this empirical attenuation
+           rate.
+
+        2. **General Systems Theory**: Perturbations in coupled systems lose
+           energy as they propagate through interconnected nodes. Each transfer
+           across an edge dissipates a fraction of the original signal, preventing
+           unbounded cascades in finite systems.
+
+        3. **Empirical stress research**: Second-order life effects (e.g.
+           work stress → poor sleep → relationship strain) are consistently
+           reported as less severe than first-order effects in longitudinal
+           psychological studies, supporting a sub-unity propagation coefficient.
+
+        Args:
+            metrics: Current LifeMetrics state.
+            primary_disruption: Dict mapping 'domain.submetric' to delta float.
+            dampening: Propagation decay per hop (default CASCADE_DAMPENING_DEFAULT = 0.6).
+            per_step_cascade_cap: Max number of distinct downstream metrics the cascade may touch per call.
+
+        Returns:
+            LifeMetrics: New state with disruption and cascade effects applied.
+        """
+        new_metrics = copy.deepcopy(metrics)
+        queue = []
+
+        for path, amount in primary_disruption.items():
+            if '.' not in path:  # skip malformed keys from LLM
+                continue
+            old_val = self._get_val(new_metrics, path)
+            self._set_val(new_metrics, path, old_val + amount, is_cascade=False)
+            queue.append((path, amount))
+
+        cascaded_metrics = set()
+
+        while queue:
+            source_path, source_magnitude = queue.pop(0)
+
+            if source_path in self.edges:
+                for target_path, weight in self.edges[source_path]:
+                    if target_path not in cascaded_metrics and len(cascaded_metrics) >= per_step_cascade_cap:
+                        continue  # Cap at max per_step_cascade_cap metrics affected
+
+                    impact = source_magnitude * weight * dampening
+                    if abs(impact) >= 0.05:
+                        old_target_val = self._get_val(new_metrics, target_path)
+                        self._set_val(new_metrics, target_path, old_target_val + impact, is_cascade=True)
+                        cascaded_metrics.add(target_path)
+                        queue.append((target_path, impact))
+
+        return new_metrics
+
+def main():
+    # Create LifeMetrics with default values (all at 70)
+    metrics = LifeMetrics()
+
+    # Create DependencyGraph
+    graph = DependencyGraph()
+
+    # Define test disruption
+    disruption = {
+        "career.workload": 30.0,
+        "finances.liquidity": -40.0
+    }
+
+    print("--- LIFE STACK INITIAL STATE (All defaults at 70) ---")
+    before = metrics.flatten()
+    for k, v in before.items():
+        print(f"{k:35} : {v:.2f}")
+
+    # Run the cascade simulation
+    after_metrics = graph.cascade(metrics, disruption)
+    after = after_metrics.flatten()
+
+    print("\n--- LIFE STACK AFTER DISRUPTION & CASCADE ---")
+    print(f"Disruption Applied: {disruption}\n")
+
+    for k in sorted(before.keys()):
+        val_before = before[k]
+        val_after = after[k]
+        diff = val_after - val_before
+
+        if abs(diff) > 0.001:
+            status = f"-> {val_after:6.2f} ({'+' if diff > 0 else ''}{diff:6.2f}) [CHANGED]"
+        else:
+            status = f" {val_after:6.2f} ( unchanged )"
+
+        print(f"{k:35} : {val_before:6.2f} {status}")
+
+if __name__ == "__main__":
+    main()
|
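One behaviour of `ResourceBudget.deduct` that is easy to miss: a negative `energy` cost (as in the "power nap" example action, which uses `"energy": -10.0`) passes the affordability check and adds energy back, capped at 100. The sketch below uses a stand-in class named `Budget`, a hypothetical copy with the same defaults and the same rule, so that it runs without the `core` package.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Stand-in for ResourceBudget with the same defaults and deduct() rule."""
    time_hours: float = 20.0
    money_dollars: float = 500.0
    energy_units: float = 100.0

    def deduct(self, time: float = 0.0, money: float = 0.0, energy: float = 0.0) -> bool:
        if self.time_hours < time or self.money_dollars < money or self.energy_units < energy:
            return False  # all-or-nothing: nothing is deducted on failure
        self.time_hours -= time
        self.money_dollars -= money
        self.energy_units = min(100.0, self.energy_units - energy)  # cap at 100
        return True


if __name__ == "__main__":
    b = Budget(energy_units=95.0)
    # Nap: one hour spent, ten energy units restored, capped at 100.
    print(b.deduct(time=1.0, energy=-10.0), b.energy_units)
    # Unaffordable spend: rejected, and the budget is left untouched.
    print(b.deduct(money=600.0), b.money_dollars)
```

Because the check runs before any field is mutated, a failed call leaves every resource unchanged. Callers that ignore the boolean return value will not notice the rejection.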
core/lifestack_env.py
ADDED
|
@@ -0,0 +1,734 @@
| 1 |
+
import copy
|
| 2 |
+
from typing import Any, Optional, Dict, List
|
| 3 |
+
from pydantic import Field
|
| 4 |
+
|
| 5 |
+
from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
|
| 6 |
+
from core.metric_schema import normalize_metric_path
|
| 7 |
+
from core.reward import compute_reward, compute_task_reward
|
| 8 |
+
from core.task import Task, ExoEvent, Route, Milestone, FlightCrisisTask
|
| 9 |
+
from core.verifier import LifeStackVerifier
|
| 10 |
+
|
| 11 |
+
try:
|
| 12 |
+
from openenv.core import Environment, Action, Observation, State
|
| 13 |
+
from openenv.core.env_server.types import EnvironmentMetadata
|
| 14 |
+
from openenv.core.rubrics import Rubric
|
| 15 |
+
USING_MODERN_API = True
|
| 16 |
+
except ImportError:
|
| 17 |
+
try:
|
| 18 |
+
from openenv.env import Env as Environment
|
| 19 |
+
from pydantic import BaseModel
|
| 20 |
+
# Shims for missing classes in older/alternative openenv
|
| 21 |
+
class Action(BaseModel): pass
|
| 22 |
+
class Observation(BaseModel): pass
|
| 23 |
+
class State(BaseModel): pass
|
| 24 |
+
class Rubric:
|
| 25 |
+
def __init__(self, *a, **k): pass
|
| 26 |
+
def compute(self, *a, **k): return 0.0
|
| 27 |
+
EnvironmentMetadata = None
|
| 28 |
+
USING_MODERN_API = False
|
| 29 |
+
except ImportError:
|
| 30 |
+
# Final fallback: must use BaseModel so Pydantic subclasses work
|
| 31 |
+
from pydantic import BaseModel
|
| 32 |
+
class Environment:
|
| 33 |
+
def __init__(self, rubric=None): self.rubric = rubric
|
| 34 |
+
def reset(self, *a, **k): pass
|
| 35 |
+
def step(self, *a, **k): pass
|
| 36 |
+
class Action(BaseModel): pass
|
| 37 |
+
class Observation(BaseModel): pass
|
| 38 |
+
class State(BaseModel): pass
|
| 39 |
+
class Rubric:
|
| 40 |
+
def __init__(self, *a, **k): pass
|
| 41 |
+
def compute(self, *a, **k): return 0.0
|
| 42 |
+
EnvironmentMetadata = None
|
| 43 |
+
USING_MODERN_API = False
|
| 44 |
+
|
| 45 |
+
class LifeStackAction(Action):
|
| 46 |
+
"""Structured action for LifeStack."""
|
| 47 |
+
metric_changes: Dict[str, float] = Field(default_factory=dict, description="Metric adjustment deltas")
|
| 48 |
+
resource_cost: Dict[str, float] = Field(default_factory=dict, description="Time, money, and energy costs")
|
| 49 |
+
actions_taken: int = Field(default=0, description="Number of atomic actions taken")
|
| 50 |
+
|
| 51 |
+
# ToolAction fields (Long-horizon)
|
| 52 |
+
action_type: Optional[str] = Field(default=None, description="inspect, plan, execute, etc.")
|
| 53 |
+
target: Optional[str] = Field(default=None, description="e.g. route_id or hidden_key")
|
| 54 |
+
parameters: Dict[str, Any] = Field(default_factory=dict)
|
| 55 |
+
reasoning: Optional[str] = Field(default=None)
|
| 56 |
+
completion: Optional[str] = Field(default=None)
|
| 57 |
+
|
| 58 |
+
inspect_target: Optional[str] = Field(default=None, description="Optional hidden state key to inspect")
|
| 59 |
+
is_rollback: bool = Field(default=False, description="Set true to rollback the previous action.")
|
| 60 |
+
|
| 61 |
+
@classmethod
|
| 62 |
+
def from_agent_action(cls, agent_action: Any) -> "LifeStackAction":
|
| 63 |
+
"""Unified converter from legacy AgentAction to LifeStackAction."""
|
| 64 |
+
primary = agent_action.primary
|
| 65 |
+
return cls(
|
| 66 |
+
action_type=primary.action_type,
|
| 67 |
+
target=primary.target_domain, # Mapping target_domain to target
|
| 68 |
+
metric_changes=primary.metric_changes,
|
| 69 |
+
resource_cost=primary.resource_cost,
|
| 70 |
+
reasoning=agent_action.reasoning,
|
| 71 |
+
completion=getattr(agent_action, 'raw_completion', ""),
|
| 72 |
+
actions_taken=1
|
| 73 |
+
)
|
| 74 |
+
|
| 75 |
+
class LifeStackObservation(Observation):
|
| 76 |
+
"""Observation returned by LifeStack."""
|
| 77 |
+
metrics: Dict[str, float] = Field(default_factory=dict, description="Flattened 23-domain life metrics")
|
| 78 |
+
resources: Dict[str, float] = Field(default_factory=dict, description="Current budget remaining")
|
| 79 |
+
step: int = Field(default=0, description="Current episode step")
|
| 80 |
+
done: bool = Field(default=False)
|
| 81 |
+
reward: Optional[float] = Field(default=None)
|
| 82 |
+
metadata: Dict[str, Any] = Field(default_factory=dict)
|
| 83 |
+
|
| 84 |
+
class LifeStackState(State):
|
| 85 |
+
"""Internal state of the LifeStack environment."""
|
| 86 |
+
current_metrics: LifeMetrics = Field(default_factory=LifeMetrics)
|
| 87 |
+
budget: ResourceBudget = Field(default_factory=ResourceBudget)
|
| 88 |
+
episode_id: Optional[str] = None
|
| 89 |
+
step_count: int = 0
|
| 90 |
+
inspected_keys: list = Field(default_factory=list) # revealed keys
|
| 91 |
+
consecutive_waits: int = 0
|
| 92 |
+
used_rollback: bool = Field(default=False)
|
| 93 |
+
rollback_penalty_charged: bool = Field(default=False)
|
| 94 |
+
previous_metrics: Optional[LifeMetrics] = None
|
| 95 |
+
previous_budget: Optional[ResourceBudget] = None
|
| 96 |
+
|
| 97 |
+
# New task fields
|
| 98 |
+
current_task: Optional[Task] = None
|
| 99 |
+
active_route_id: Optional[str] = None
|
| 100 |
+
milestones_achieved: list = Field(default_factory=list)
|
| 101 |
+
world_state: dict = Field(default_factory=dict)
|
| 102 |
+
hidden_state: dict = Field(default_factory=dict)
|
| 103 |
+
fired_event_ids: list = Field(default_factory=list)
|
| 104 |
+
exo_events_seen: int = 0
|
| 105 |
+
milestones_after_event: int = 0
|
| 106 |
+
closed_route_ids: set = Field(default_factory=set)
|
| 107 |
+
# Legacy / Personality fields
|
| 108 |
+
person: Optional[Any] = None
|
| 109 |
+
agent_history: List[tuple] = Field(default_factory=list)
|
| 110 |
+
current_conflict: Optional[Any] = None
|
| 112 |
+
cumulative_rel_delta: float = Field(default=0.0)
|
| 113 |
+
class LifeStackRubric(Rubric):
|
| 114 |
+
"""Standard reward rubric for LifeStack."""
|
| 115 |
+
def forward(self, action: LifeStackAction, observation: LifeStackObservation) -> float:
|
| 116 |
+
# In LifeStack, reward is usually computed inside step() for state-transition access.
|
| 117 |
+
# This rubric provides a hook for external reward evaluation if needed.
|
| 118 |
+
return observation.reward if observation.reward is not None else 0.0
|
| 119 |
+
|
| 120 |
+
class PartialObsFilter:
|
| 121 |
+
@staticmethod
|
| 122 |
+
def filter(task: Task, revealed_keys: list) -> dict:
|
| 123 |
+
"""Returns visible_world plus any keys the agent has explicitly inspected.
|
| 124 |
+
|
| 125 |
+
Revealed keys are checked against mutable_world first, then hidden_state.
|
| 126 |
+
Keys sourced from hidden_state are wrapped as
|
| 127 |
+
``{"value": <val>, "source": "inspect"}`` so the agent knows they were
|
| 128 |
+
obtained via an inspect action rather than being freely observable.
|
| 129 |
+
"""
|
| 130 |
+
obs_world = copy.deepcopy(task.visible_world)
|
| 131 |
+
for k in revealed_keys:
|
| 132 |
+
if k in task.mutable_world:
|
| 133 |
+
obs_world[k] = task.mutable_world[k]
|
| 134 |
+
elif k in task.hidden_state:
|
| 135 |
+
obs_world[k] = {"value": task.hidden_state[k], "source": "inspect"}
|
| 136 |
+
return obs_world
|
| 137 |
+
|
| 138 |
+
class WorldEngine:
|
| 139 |
+
def __init__(self, task: Task):
|
| 140 |
+
self.task = task
|
| 141 |
+
self.closed_routes = set()
|
| 142 |
+
|
| 143 |
+
def inject_events(self, step: int, world: dict, hidden: dict) -> list[ExoEvent]:
|
| 144 |
+
import random
|
| 145 |
+
fired = []
|
| 146 |
+
for event in self.task.event_schedule:
|
| 147 |
+
fire = False
|
| 148 |
+
if event.step == step:
|
| 149 |
+
fire = True
|
| 150 |
+
elif event.step == -1:
|
| 151 |
+
if random.random() < event.probability:
|
| 152 |
+
fire = True
|
| 153 |
+
|
| 154 |
+
if fire:
|
| 155 |
+
fired.append(event)
|
| 156 |
+
# Apply mutations
|
| 157 |
+
world.update(event.world_mutation)
|
| 158 |
+
hidden.update(event.hidden_state_mutation)
|
| 159 |
+
for rid in event.closes_routes:
|
| 160 |
+
self.closed_routes.add(rid)
|
| 161 |
+
return fired
|
| 162 |
+
|
| 163 |
+
def get_closed_routes(self) -> set[str]:
|
| 164 |
+
return self.closed_routes
|
| 165 |
+
|
| 166 |
+
_EnvBase = Environment[LifeStackAction, LifeStackObservation, LifeStackState] if USING_MODERN_API else Environment
|
| 167 |
+
|
| 168 |
+
class LifeStackEnv(_EnvBase):
|
| 169 |
+
"""
|
| 170 |
+
LifeStack Environment v1.1, refactored for OpenEnv 0.2.3 compliance.
|
| 171 |
+
"""
|
| 172 |
+
SUPPORTS_CONCURRENT_SESSIONS = True
|
| 173 |
+
|
| 174 |
+
def __init__(self, seed: Optional[int] = None, task=None, max_steps: int = 30):
|
| 175 |
+
if USING_MODERN_API:
|
| 176 |
+
super().__init__(rubric=LifeStackRubric())
|
| 177 |
+
else:
|
| 178 |
+
super().__init__()
|
| 179 |
+
|
| 180 |
+
self.max_steps = getattr(task, 'horizon', max_steps) if task else max_steps
|
| 181 |
+
|
| 182 |
+
self.metadata_internal = {
|
| 183 |
+
'name': 'LifeStack-v1',
|
| 184 |
+
'version': '1.1.0',
|
| 185 |
+
'description': 'Premium multi-domain life conflict resolution simulation',
|
| 186 |
+
'max_episode_steps': self.max_steps
|
| 187 |
+
}
|
| 188 |
+
|
| 189 |
+
self.graph = DependencyGraph()
|
| 190 |
+
self._internal_state = LifeStackState()
|
| 191 |
+
|
| 192 |
+
def get_metadata(self):
|
| 193 |
+
if not USING_MODERN_API:
|
| 194 |
+
return self.metadata_internal
|
| 195 |
+
from openenv.core.env_server.types import EnvironmentMetadata
|
| 196 |
+
return EnvironmentMetadata(
|
| 197 |
+
name=self.metadata_internal['name'],
|
| 198 |
+
version=self.metadata_internal['version'],
|
| 199 |
+
description=self.metadata_internal['description']
|
| 200 |
+
)
|
| 201 |
+
|
| 202 |
+
@property
|
| 203 |
+
def state(self) -> LifeStackState:
|
| 204 |
+
return self._internal_state
|
| 205 |
+
|
| 206 |
+
def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None,
|
| 207 |
+
task: Optional[Task] = None, conflict: Optional[Any] = None,
|
| 208 |
+
budget: Optional[dict] = None, person: Optional[Any] = None,
|
| 209 |
+
agent_history: Optional[List[tuple]] = None, **kwargs) -> LifeStackObservation:
|
| 210 |
+
"""Resets the environment. Seed and task/conflict can be provided."""
|
| 211 |
+
if USING_MODERN_API and getattr(self, 'rubric', None):
|
| 212 |
+
self.rubric.reset()
|
| 213 |
+
|
| 214 |
+
if seed is not None:
|
| 215 |
+
import random
|
| 216 |
+
random.seed(seed)
|
| 217 |
+
|
| 218 |
+
# 1. Initialize Task
|
| 219 |
+
self._internal_state.current_task = task or FlightCrisisTask()
|
| 220 |
+
self.max_steps = getattr(self._internal_state.current_task, 'horizon', 30)
|
| 221 |
+
|
| 222 |
+
# 2. Reset State
|
| 223 |
+
self._internal_state.episode_id = episode_id
|
| 224 |
+
self._internal_state.step_count = 0
|
| 225 |
+
self._internal_state.current_metrics = LifeMetrics()
|
| 226 |
+
self._internal_state.inspected_keys = []
|
| 227 |
+
self._internal_state.consecutive_waits = 0
|
| 228 |
+
self._internal_state.used_rollback = False
|
| 229 |
+
self._internal_state.rollback_penalty_charged = False
|
| 230 |
+
self._internal_state.previous_metrics = None
|
| 231 |
+
self._internal_state.previous_budget = None
|
| 233 |
+
self._internal_state.cumulative_rel_delta = 0.0
|
| 234 |
+
|
| 235 |
+
# Task state
|
| 236 |
+
self._internal_state.world_state = copy.deepcopy(self._internal_state.current_task.mutable_world)
|
| 237 |
+
self._internal_state.hidden_state = copy.deepcopy(self._internal_state.current_task.hidden_state)
|
| 238 |
+
self._internal_state.milestones_achieved = []
|
| 239 |
+
self._internal_state.active_route_id = None
|
| 240 |
+
self._internal_state.fired_event_ids = []
|
| 241 |
+
self._internal_state.exo_events_seen = 0
|
| 242 |
+
self._internal_state.milestones_after_event = 0
|
| 243 |
+
self._internal_state.closed_route_ids = set()
|
| 244 |
+
|
| 245 |
+
self._internal_state.person = person
|
| 246 |
+
self._internal_state.agent_history = agent_history or []
|
| 247 |
+
self._internal_state.current_conflict = conflict
|
| 248 |
+
|
| 249 |
+
self.world_engine = WorldEngine(self._internal_state.current_task)
|
| 250 |
+
|
| 251 |
+
# 3. Budget Scaling
|
| 252 |
+
scale = max(1.0, self.max_steps / 5.0)
|
| 253 |
+
constraints = self._internal_state.current_task.constraints
|
| 254 |
+
self._internal_state.budget = ResourceBudget(
|
| 255 |
+
time_hours=budget.get("time", constraints.get("time", 20.0 * scale)) if budget else constraints.get("time", 20.0 * scale),
|
| 256 |
+
money_dollars=budget.get("money", constraints.get("money", 500.0 * scale)) if budget else constraints.get("money", 500.0 * scale),
|
| 257 |
+
energy_units=budget.get("energy", constraints.get("energy", 100.0 * scale)) if budget else constraints.get("energy", 100.0 * scale)
|
| 258 |
+
)
|
| 259 |
+
|
| 260 |
+
if conflict:
|
| 261 |
+
# Legacy disruption support
|
| 262 |
+
disruption = conflict.primary_disruption if hasattr(conflict, 'primary_disruption') else conflict
|
| 263 |
+
self._internal_state.current_metrics = self.graph.cascade(self._internal_state.current_metrics, disruption)
|
| 264 |
+
if budget is None and hasattr(conflict, 'resource_budget'):
|
| 265 |
+
rb = conflict.resource_budget
|
| 266 |
+
self._internal_state.budget = ResourceBudget(
|
| 267 |
+
time_hours=rb.get("time", 20.0),
|
| 268 |
+
money_dollars=rb.get("money", 500.0),
|
| 269 |
+
energy_units=rb.get("energy", 100.0)
|
| 270 |
+
)
|
| 271 |
+
|
| 272 |
+
return self._get_obs()
|
| 273 |
+
|
| 274 |
+
def _get_obs(self, done: bool = False, reward: Optional[float] = None,
|
| 275 |
+
success: bool = False, failure: bool = False,
|
| 276 |
+
failure_reason: str = "", routes_remaining: int = 0) -> LifeStackObservation:
|
| 277 |
+
revealed_world = PartialObsFilter.filter(
|
| 278 |
+
self._internal_state.current_task,
|
| 279 |
+
self._internal_state.inspected_keys
|
| 280 |
+
)
|
| 281 |
+
|
| 282 |
+
return LifeStackObservation(
|
| 283 |
+
metrics=self._internal_state.current_metrics.flatten(),
|
| 284 |
+
resources={
|
| 285 |
+
"time": self._internal_state.budget.time_hours,
|
| 286 |
+
"money": self._internal_state.budget.money_dollars,
|
| 287 |
+
"energy": self._internal_state.budget.energy_units
|
| 288 |
+
},
|
| 289 |
+
step=self._internal_state.step_count,
|
| 290 |
+
done=done,
|
| 291 |
+
reward=reward,
|
| 292 |
+
metadata={
|
| 293 |
+
"world_state": revealed_world,
|
| 294 |
+
"goal": self._internal_state.current_task.goal,
|
| 295 |
+
"active_route": self._internal_state.active_route_id,
|
| 296 |
+
"milestones": self._internal_state.milestones_achieved,
|
| 297 |
+
"events": self._internal_state.fired_event_ids,
|
| 298 |
+
"success": success,
|
| 299 |
+
"failure": failure,
|
| 300 |
+
"failure_reason": failure_reason,
|
| 301 |
+
"routes_remaining": routes_remaining,
|
| 302 |
+
"conflict_title": self._internal_state.current_conflict.title if hasattr(self._internal_state.current_conflict, 'title') else "Custom Task",
|
| 303 |
+
"person": self._internal_state.person.name if hasattr(self._internal_state.person, 'name') else "Unknown"
|
| 304 |
+
}
|
| 305 |
+
)
|
| 306 |
+
|
| 307 |
+
def _update_metric(self, path: str, delta: float):
|
| 308 |
+
"""Internal helper for non-cascading updates."""
|
| 309 |
+
path = normalize_metric_path(path)
|
| 310 |
+
if '.' not in path:
|
| 311 |
+
return
|
| 312 |
+
domain_name, sub_name = path.split('.', 1)
|
| 313 |
+
domain = getattr(self._internal_state.current_metrics, domain_name, None)
|
| 314 |
+
if domain and hasattr(domain, sub_name):
|
| 315 |
+
val = getattr(domain, sub_name)
|
| 316 |
+
setattr(domain, sub_name, max(0.0, min(100.0, val + delta)))
|
| 317 |
+
|
| 318 |
+
def step(self, action: LifeStackAction, timeout_s: Optional[float] = None, **kwargs) -> LifeStackObservation:
|
| 319 |
+
"""Executes one step in the environment using LifeStackAction logic."""
|
| 320 |
+
if isinstance(action, dict):
|
| 321 |
+
action = LifeStackAction(**action)
|
| 322 |
+
|
| 323 |
+
task = self._internal_state.current_task
|
| 324 |
+
state_before = copy.deepcopy(self._internal_state.current_metrics)
|
| 325 |
+
info_msgs = []
|
| 326 |
+
|
| 327 |
+
# 0. Personality Drift & Legacy Escalation
|
| 328 |
+
if self._internal_state.person:
|
| 329 |
+
drift_event = self._internal_state.person.drift(self._internal_state.step_count)
|
| 330 |
+
if drift_event:
|
| 331 |
+
path = drift_event.get('metric', '')
|
| 332 |
+
delta = drift_event.get('delta', 0)
|
| 333 |
+
if path and '.' in path:
|
| 334 |
+
self._update_metric(path, delta)
|
| 335 |
+
info_msgs.append(f"DRIFT: {drift_event['reason']}")
|
| 336 |
+
|
| 337 |
+
if self._internal_state.current_conflict and self._internal_state.step_count == 2:
|
| 338 |
+
from agent.conflict_generator import adaptive_escalate
|
| 339 |
+
conflict = self._internal_state.current_conflict
|
| 340 |
+
if hasattr(conflict, 'difficulty') and conflict.difficulty < 5:
|
| 341 |
+
new_conflict, reason = adaptive_escalate(conflict, self._internal_state.agent_history)
|
| 342 |
+
if new_conflict.id != conflict.id:
|
| 343 |
+
self._internal_state.current_conflict = new_conflict
|
| 344 |
+
info_msgs.append(f"ESCALATION: {reason} -> {new_conflict.title}")
|
| 345 |
+
fired_events = self.world_engine.inject_events(
|
| 346 |
+
self._internal_state.step_count,
|
| 347 |
+
self._internal_state.world_state,
|
| 348 |
+
self._internal_state.hidden_state
|
| 349 |
+
)
|
| 350 |
+
if fired_events:
|
| 351 |
+
self._internal_state.exo_events_seen += len(fired_events)
|
| 352 |
+
for e in fired_events:
|
| 353 |
+
self._internal_state.fired_event_ids.append(e.id)
|
| 354 |
+
info_msgs.append(f"EVENT_FIRED: {e.description}")
|
| 355 |
+
|
| 356 |
+
self._internal_state.closed_route_ids.update(self.world_engine.get_closed_routes())
|
| 357 |
+
|
| 358 |
+
# 2. Tool Logic & Metric Changes
|
| 359 |
+
tool_type = action.action_type or (
|
| 360 |
+
"rollback" if action.is_rollback else
|
| 361 |
+
"inspect" if action.inspect_target else
|
| 362 |
+
"execute"
|
| 363 |
+
)
|
| 364 |
+
|
| 365 |
+
allowed_keys = set(self._internal_state.current_metrics.flatten().keys())
|
| 366 |
+
metric_changes = {k: v for k, v in action.metric_changes.items() if k in allowed_keys}
|
| 367 |
+
resource_cost = copy.deepcopy(action.resource_cost)
|
| 368 |
+
|
| 369 |
+
# Handle Rollback
|
| 370 |
+
if tool_type == "rollback":
|
| 371 |
+
self._internal_state.step_count += 1
|
| 372 |
+
if self._internal_state.used_rollback:
|
| 373 |
+
info_msgs.append("ROLLBACK_DENIED: Already used once.")
|
| 374 |
+
return self._get_obs(reward=-0.1)
|
| 375 |
+
if not self._internal_state.previous_metrics:
|
| 376 |
+
return self._get_obs(reward=0.0)
|
| 377 |
+
self._internal_state.current_metrics = copy.deepcopy(self._internal_state.previous_metrics)
|
| 378 |
+
self._internal_state.budget = copy.deepcopy(self._internal_state.previous_budget)
|
| 379 |
+
self._internal_state.used_rollback = True
|
| 380 |
+
self._internal_state.rollback_penalty_charged = True # Penalty is baked into the -0.1 reward returned below
|
| 381 |
+
return self._get_obs(reward=-0.1)
|
| 382 |
+
|
| 383 |
+
# Save state for future rollback
|
| 384 |
+
self._internal_state.previous_metrics = copy.deepcopy(self._internal_state.current_metrics)
|
| 385 |
+
self._internal_state.previous_budget = copy.deepcopy(self._internal_state.budget)
|
| 386 |
+
|
| 387 |
+
# Handle Inspect
|
| 388 |
+
if tool_type == "inspect":
|
| 389 |
+
target = action.target or action.inspect_target
|
| 390 |
+
if target:
|
| 391 |
+
if target in self._internal_state.inspected_keys:
|
| 392 |
+
info_msgs.append(f"INSPECT_REDUNDANT: {target}")
|
| 393 |
+
else:
|
| 394 |
+
self._internal_state.inspected_keys.append(target)
|
| 395 |
+
info_msgs.append(f"INSPECT_REVEALED: {target}")
|
| 396 |
+
# Emit an explicit signal when a hidden-state value is uncovered.
|
| 397 |
+
if target in task.hidden_state:
|
| 398 |
+
info_msgs.append(
|
| 399 |
+
f"INSPECT_REVEALED_HIDDEN: {target} = {task.hidden_state[target]}"
|
| 400 |
+
)
|
| 401 |
+
|
| 402 |
+
# Handle Wait
|
| 403 |
+
if tool_type == "wait":
|
| 404 |
+
self._internal_state.consecutive_waits += 1
|
| 405 |
+
if self._internal_state.consecutive_waits >= 4:
|
| 406 |
+
metric_changes["mental_wellbeing.stress_level"] = metric_changes.get("mental_wellbeing.stress_level", 0) + 15.0
|
| 407 |
+
info_msgs.append("WAIT_CAP_EXCEEDED: Forced stress applied.")
|
| 408 |
+
else:
|
| 409 |
+
self._internal_state.consecutive_waits = 0
|
| 410 |
+
|
| 411 |
+
# Handle Route Execution
|
| 412 |
+
if tool_type == "execute" and action.target:
|
| 413 |
+
route = next((r for r in task.viable_routes if r.id == action.target), None)
|
| 414 |
+
if route:
|
| 415 |
+
# Check closed
|
| 416 |
+
if route.id in self._internal_state.closed_route_ids:
|
| 417 |
+
info_msgs.append(f"ROUTE_BLOCKED: {route.name}")
|
| 418 |
+
else:
|
| 419 |
+
# Check preconditions
|
| 420 |
+
pre_ok = True
|
| 421 |
+
for k, v in route.preconditions.items():
|
| 422 |
+
current_v = self._internal_state.hidden_state.get(k, self._internal_state.world_state.get(k))
|
| 423 |
+
if current_v != v:
|
| 424 |
+
pre_ok = False
|
| 425 |
+
break
|
| 426 |
+
|
| 427 |
+
if not pre_ok:
|
| 428 |
+
info_msgs.append(f"PRECONDITIONS_FAILED for {route.name}")
|
| 429 |
+
else:
|
| 430 |
+
# Success: Apply route
|
| 431 |
+
self._internal_state.active_route_id = route.id
|
| 432 |
+
self._internal_state.world_state.update(route.consequences)
|
| 433 |
+
info_msgs.append(f"ROUTE_SUCCESS: {route.name}")
|
| 434 |
+
|
| 435 |
+
# 3. Resource Deduction (must happen BEFORE metric changes to prevent budget-bypass exploit)
|
| 436 |
+
deduct_ok = self._internal_state.budget.deduct(
|
| 437 |
+
time=resource_cost.get('time', 0.0),
|
| 438 |
+
money=resource_cost.get('money', 0.0),
|
| 439 |
+
energy=resource_cost.get('energy', 0.0)
|
| 440 |
+
)
|
| 441 |
+
if not deduct_ok:
|
| 442 |
+
info_msgs.append("RESOURCE_DEPLETED_ACTION_BLOCKED")
|
| 443 |
+
metric_changes = {} # Discard changes: the agent can't afford this action
|
| 444 |
+
|
| 445 |
+
# 4. Apply Metric and Cascade
|
| 446 |
+
sig_changes = {k: v for k, v in metric_changes.items() if abs(v) > 5.0}
|
| 447 |
+
for k, v in metric_changes.items():
|
| 448 |
+
if k not in sig_changes:
|
| 449 |
+
self._update_metric(k, v)
|
| 450 |
+
|
| 451 |
+
if sig_changes:
|
| 452 |
+
self._internal_state.current_metrics = self.graph.cascade(self._internal_state.current_metrics, sig_changes)
|
| 453 |
+
|
| 454 |
+
# 5. Task Progression Check
|
| 455 |
+
success_mets = LifeStackVerifier.check_success(task, self._internal_state.world_state, self._internal_state.hidden_state)
|
| 456 |
+
failure_mets = LifeStackVerifier.check_failure(task, self._internal_state.world_state, self._internal_state.hidden_state, self._internal_state.current_metrics.flatten())
|
| 457 |
+
|
| 458 |
+
# Check milestones dynamically
|
| 459 |
+
newly_met = LifeStackVerifier.check_new_milestones(task, self._internal_state.world_state, self._internal_state.hidden_state, self._internal_state.milestones_achieved)
|
| 460 |
+
for mid in newly_met:
|
| 461 |
+
self._internal_state.milestones_achieved.append(mid)
|
| 462 |
+
if self._internal_state.exo_events_seen > 0:
|
| 463 |
+
self._internal_state.milestones_after_event += 1
|
| 464 |
+
info_msgs.append(f"MILESTONE_UNLOCKED: {mid}")
|
| 465 |
+
|
| 466 |
+
# 6. Reward Calculation (Task-Aware)
|
| 467 |
+
routes_rem, _ = LifeStackVerifier.get_route_status(task, self._internal_state.closed_route_ids, self._internal_state.world_state, self._internal_state.hidden_state)
|
| 468 |
+
|
| 469 |
+
# Determine cascade collapse
|
| 470 |
+
metrics_after = self._internal_state.current_metrics.flatten()
|
| 471 |
+
metrics_before = state_before.flatten()
|
| 472 |
+
collapse = any(metrics_after[k] < 20 and metrics_before[k] >= 20 for k in metrics_after)
|
| 473 |
+
|
| 474 |
+
# Track cumulative relationship erosion across steps
|
| 475 |
+
rel_keys_cum = [k for k in metrics_after if k.startswith('relationships.')]
|
| 476 |
+
if rel_keys_cum:
|
| 477 |
+
step_rel_delta = sum(metrics_after[k] - metrics_before[k] for k in rel_keys_cum) / len(rel_keys_cum)
|
| 478 |
+
self._internal_state.cumulative_rel_delta += step_rel_delta
|
| 479 |
+
|
| 480 |
+
# Increment step_count BEFORE reward so timeout_check fires correctly
|
| 481 |
+
self._internal_state.step_count += 1
|
| 482 |
+
|
| 483 |
+
# Rollback penalty fires only once per episode
|
| 484 |
+
rollback_this_step = self._internal_state.used_rollback and not self._internal_state.rollback_penalty_charged
|
| 485 |
+
if rollback_this_step:
|
| 486 |
+
self._internal_state.rollback_penalty_charged = True
|
| 487 |
+
|
| 488 |
+
# conflict_domain from task.domain (not conflict.title) to prevent empty-string bypass
|
| 489 |
+
conflict_domain = task.domain if task and hasattr(task, 'domain') else ""
|
| 490 |
+
|
| 491 |
+
if task:
|
| 492 |
+
reward, breakdown = compute_task_reward(
|
| 493 |
+
state_before=state_before,
|
| 494 |
+
state_after=self._internal_state.current_metrics,
|
| 495 |
+
resources_used=resource_cost,
|
| 496 |
+
actions_taken=action.actions_taken,
|
| 497 |
+
milestones_achieved=self._internal_state.milestones_achieved,
|
| 498 |
+
success_conditions_met=success_mets,
|
| 499 |
+
exo_events_seen=self._internal_state.exo_events_seen,
|
| 500 |
+
milestones_after_event=self._internal_state.milestones_after_event,
|
| 501 |
+
routes_remaining=routes_rem,
|
| 502 |
+
rollback_used=rollback_this_step,
|
| 503 |
+
cascade_collapse=collapse,
|
| 504 |
+
task=task,
|
| 505 |
+
reasoning=getattr(action, 'reasoning', ""),
|
| 506 |
+
completion=getattr(action, 'completion', ""),
|
| 507 |
+
conflict_domain=conflict_domain,
|
| 508 |
+
step_count=self._internal_state.step_count,
|
| 509 |
+
max_steps=self.max_steps,
|
| 510 |
+
metric_changes=metric_changes,
|
| 511 |
+
cumulative_rel_delta=self._internal_state.cumulative_rel_delta,
|
| 512 |
+
action_type=tool_type
|
| 513 |
+
)
|
| 514 |
+
# Charge the rollback penalty only once per episode
|
| 515 |
+
if self._internal_state.used_rollback and not self._internal_state.rollback_penalty_charged:
|
| 516 |
+
self._internal_state.rollback_penalty_charged = True
|
| 517 |
+
else:
|
| 518 |
+
reward, breakdown = compute_reward(
|
| 519 |
+
state_before=state_before,
|
| 520 |
+
state_after=self._internal_state.current_metrics,
|
| 521 |
+
resources_used=resource_cost,
|
| 522 |
+
actions_taken=action.actions_taken,
|
| 523 |
+
metric_changes=metric_changes,
|
| 524 |
+
completion=getattr(action, 'completion', ""),
|
| 525 |
+
action_type=tool_type
|
| 526 |
+
)
|
| 527 |
+
|
| 528 |
+
# 7. End Conditions
|
| 529 |
+
# Check if ANY success condition is met.
|
| 530 |
+
# For multi-goal tasks with mutually exclusive routes, any() allows termination.
|
| 531 |
+
is_success = any(success_mets) if (success_mets and len(task.success_conditions) > 0) else False
|
| 532 |
+
is_task_failure = any(val == True for val in failure_mets)
|
| 533 |
+
metric_death = any(v <= 10 for v in metrics_after.values())
|
| 534 |
+
|
| 535 |
+
failure_reason = ""
|
| 536 |
+
if is_task_failure:
|
| 537 |
+
reasons = [cond['key'] for i, cond in enumerate(task.failure_conditions) if failure_mets[i]]
|
| 538 |
+
failure_reason = f"Condition failed: {', '.join(reasons)}"
|
| 539 |
+
elif metric_death:
|
| 540 |
+
dead_metrics = [k for k, v in metrics_after.items() if v <= 10]
|
| 541 |
+
failure_reason = f"Metrics at or below 10: {', '.join(dead_metrics)}"
|
| 542 |
+
elif routes_rem == 0 and not is_success:
|
| 543 |
+
failure_reason = "Dead end: No reachable routes left."
|
| 544 |
+
|
| 545 |
+
terminated = is_task_failure or metric_death
|
| 546 |
+
truncated = self._internal_state.step_count >= self.max_steps
|
| 547 |
+
if is_success:
|
| 548 |
+
truncated = True
|
| 549 |
+
done = terminated or truncated
|
| 550 |
+
|
| 551 |
+
observation = self._get_obs(
|
| 552 |
+
done,
|
| 553 |
+
reward,
|
| 554 |
+
success=is_success,
|
| 555 |
+
failure=terminated,
|
| 556 |
+
failure_reason=failure_reason,
|
| 557 |
+
routes_remaining=routes_rem
|
| 558 |
+
)
|
| 559 |
+
observation.metadata["breakdown"] = breakdown
|
| 560 |
+
observation.metadata["info"] = info_msgs
|
| 561 |
+
return observation
|
| 562 |
+
|
| 563 |
+
def rollout(self, n_steps: int = 7, gamma: float = 0.9) -> dict:
|
| 564 |
+
"""
|
| 565 |
+
Simulate n_steps null/rest actions starting from the current env state.
|
| 566 |
+
|
| 567 |
+
Intended to be called immediately AFTER env.step(model_action) so it
|
| 568 |
+
models "what happens to your life over the next N days if nothing
|
| 569 |
+
extraordinary occurs."
|
| 570 |
+
|
| 571 |
+
The env's ``_internal_state`` is restored after the rollout; ``world_engine`` and
|
| 572 |
+
the global ``random`` state are not, so routes closed during the rollout persist.
|
| 573 |
+
|
| 574 |
+
Returns:
|
| 575 |
+
{
|
| 576 |
+
"discounted_reward": float, # γ-discounted cumulative reward
|
| 577 |
+
# "immediate_r0" (reward from the action itself) is not returned; the caller supplies it
|
| 578 |
+
"trajectory": [ # one entry per simulated day
|
| 579 |
+
{
|
| 580 |
+
"step": int, # 1-indexed future day
|
| 581 |
+
"reward": float,
|
| 582 |
+
"metrics": Dict[str, float], # flattened snapshot
|
| 583 |
+
"discounted_contribution": float,
|
| 584 |
+
},
|
| 585 |
+
...
|
| 586 |
+
],
|
| 587 |
+
"n_steps_completed": int,
|
| 588 |
+
}
|
| 589 |
+
"""
|
| 590 |
+
saved_state = copy.deepcopy(self._internal_state)
|
| 591 |
+
|
| 592 |
+
null_action = LifeStackAction(
|
| 593 |
+
action_type="rest",
|
| 594 |
+
target="time",
|
| 595 |
+
metric_changes={},
|
| 596 |
+
resource_cost={},
|
| 597 |
+
actions_taken=0,
|
| 598 |
+
)
|
| 599 |
+
|
| 600 |
+
trajectory = []
|
| 601 |
+
cumulative = 0.0
|
| 602 |
+
|
| 603 |
+
for t in range(n_steps):
|
| 604 |
+
obs = self.step(null_action)
|
| 605 |
+
disc = (gamma ** (t + 1)) * float(obs.reward)
|
| 606 |
+
cumulative += disc
|
| 607 |
+
trajectory.append({
|
| 608 |
+
"step": t + 1,
|
| 609 |
+
"reward": float(obs.reward),
|
| 610 |
+
"metrics": dict(obs.metrics),
|
| 611 |
+
"discounted_contribution": round(disc, 5),
|
| 612 |
+
})
|
| 613 |
+
if obs.done:
|
| 614 |
+
break
|
| 615 |
+
|
| 616 |
+
# Restore: rollout must not mutate the env state visible to the caller
|
| 617 |
+
self._internal_state = saved_state
|
| 618 |
+
|
| 619 |
+
return {
|
| 620 |
+
"discounted_reward": round(cumulative, 5),
|
| 621 |
+
"trajectory": trajectory,
|
| 622 |
+
"n_steps_completed": len(trajectory),
|
| 623 |
+
}
|
| 624 |
+
|
| 625 |
+
def render(self):
|
| 626 |
+
"""Vibrant status report of the current state and task progress."""
|
| 627 |
+
task = self._internal_state.current_task
|
| 628 |
+
print("\n" + "═"*70)
|
| 629 |
+
print(f"🎯 GOAL: {task.goal} | Horizon: {self._internal_state.step_count}/{self.max_steps}")
|
| 630 |
+
print(f"⌛ TIME: {self._internal_state.budget.time_hours:.1f}h | 💵 MONEY: ${self._internal_state.budget.money_dollars:.1f} | ⚡ ENERGY: {self._internal_state.budget.energy_units:.1f}")
|
| 631 |
+
|
| 632 |
+
if self._internal_state.active_route_id:
|
| 633 |
+
print(f"🛣️ ACTIVE ROUTE: {self._internal_state.active_route_id}")
|
| 634 |
+
|
| 635 |
+
print(f"✔ MILESTONES: {', '.join(self._internal_state.milestones_achieved) or 'None'}")
|
| 636 |
+
|
| 637 |
+
if self._internal_state.fired_event_ids:
|
| 638 |
+
print(f"🚨 EVENTS: {', '.join(self._internal_state.fired_event_ids)}")
|
| 639 |
+
|
| 640 |
+
flat = self._internal_state.current_metrics.flatten()
|
| 641 |
+
domain_labels = {
|
| 642 |
+
"career": "💼 CAREER",
|
| 643 |
+
"finances": "💰 FINANCES",
|
| 644 |
+
"relationships": "❤️ RELATIONSHIPS",
|
| 645 |
+
"physical_health": "💪 PHYSICAL",
|
| 646 |
+
"mental_wellbeing": "🧠 MENTAL",
|
| 647 |
+
"time": "📅 TIME"
|
| 648 |
+
}
|
| 649 |
+
|
| 650 |
+
for dom, label in domain_labels.items():
|
| 651 |
+
print(f"\n{label}")
|
| 652 |
+
submetrics = {k: v for k, v in flat.items() if k.startswith(dom + ".")}
|
| 653 |
+
inverted = {"stress_level", "debt_pressure", "workload", "commute_burden", "admin_overhead"}
|
| 654 |
+
for name, val in submetrics.items():
|
| 655 |
+
short = name.split('.')[1]
|
| 656 |
+
icon = ("🔴" if val > 70 else "🟢") if short in inverted else ("🟢" if val > 70 else "🔴")
|
| 657 |
+
if 40 <= val <= 70: icon = "🟡"
|
| 658 |
+
print(f" {icon} {short:20} : {val:5.2f}")
|
| 659 |
+
print("═"*70)
|
| 660 |
+
|
| 661 |
+
|
| 662 |
+
def env_render_compact(env, obs):
|
| 663 |
+
"""Compact printer for testing."""
|
| 664 |
+
print(f"STEP: {obs.step} | REWARD: {obs.reward:.3f} | DONE: {obs.done}")
|
| 665 |
+
if obs.metadata.get("breakdown", {}).get("penalties_fired"):
|
| 666 |
+
print(f" ⚠️ PENALTIES: {obs.metadata['breakdown']['penalties_fired']}")
|
| 667 |
+
|
| 668 |
+
|
| 669 |
+
def main():
|
| 670 |
+
env = LifeStackEnv()
|
| 671 |
+
|
| 672 |
+
# 1. Reset with Friday 6PM Conflict
|
| 673 |
+
conflict = {
|
| 674 |
+
"career.workload": 30.0,
|
| 675 |
+
"finances.liquidity": -40.0
|
| 676 |
+
}
|
| 677 |
+
print("Initializing environment with Friday 6PM conflict...")
|
| 678 |
+
env.reset(conflict=conflict)
|
| 679 |
+
env.render()
|
| 680 |
+
|
| 681 |
+
total_reward = 0
|
| 682 |
+
metrics_history = []
|
| 683 |
+
|
| 684 |
+
# 2. Sequential Actions
|
| 685 |
+
scenarios = [
|
| 686 |
+
{
|
| 687 |
+
"name": "GOOD ACTION: Delegating and budget review",
|
| 688 |
+
"action": {
|
| 689 |
+
"metric_changes": {"career.workload": -15.0, "finances.liquidity": 10.0, "mental_wellbeing.stress_level": -5.0},
|
| 690 |
+
"resource_cost": {"time": 4.0, "money": 100.0, "energy": 20.0},
|
| 691 |
+
"actions_taken": 2
|
| 692 |
+
}
|
| 693 |
+
},
|
| 694 |
+
{
|
| 695 |
+
"name": "MEDIUM ACTION: Small self-care rest",
|
| 696 |
+
"action": {
|
| 697 |
+
"metric_changes": {"physical_health.sleep_quality": 6.0, "mental_wellbeing.clarity": 3.0},
|
| 698 |
+
"resource_cost": {"time": 2.0, "energy": -20.0}, # Rest recovers energy
|
| 699 |
+
"actions_taken": 1
|
| 700 |
+
}
|
| 701 |
+
},
|
| 702 |
+
{
|
| 703 |
+
"name": "INACTION: Let the cascade run",
|
| 704 |
+
"action": {
|
| 705 |
+
"metric_changes": {},
|
| 706 |
+
"resource_cost": {},
|
| 707 |
+
"actions_taken": 0
|
| 708 |
+
}
|
| 709 |
+
}
|
| 710 |
+
]
|
| 711 |
+
|
| 712 |
+
for sce in scenarios:
|
| 713 |
+
print(f"\nTaking Action: {sce['name']}...")
|
| 714 |
+
action_obj = LifeStackAction(**sce['action'])
|
| 715 |
+
obs = env.step(action_obj)
|
| 716 |
+
env_render_compact(env, obs)
|
| 717 |
+
total_reward += (obs.reward or 0.0)
|
| 718 |
+
|
| 719 |
+
# 3. Final Summary
|
| 720 |
+
final_flat = env.state.current_metrics.flatten()
|
| 721 |
+
critical = [k for k, v in final_flat.items() if v < 20]
|
| 722 |
+
|
| 723 |
+
print("\n" + "═"*60)
|
| 724 |
+
print("EPISODE SUMMARY")
|
| 725 |
+
print(f"Steps Taken : {env.state.step_count}")
|
| 726 |
+
print(f"Total Cumulative Reward : {total_reward:.4f}")
|
| 727 |
+
if critical:
|
| 728 |
+
print(f"Critical Floor Violations: {', '.join(critical)}")
|
| 729 |
+
else:
|
| 730 |
+
print("Critical Violations: NONE")
|
| 731 |
+
print("═"*60)
|
| 732 |
+
|
| 733 |
+
if __name__ == "__main__":
|
| 734 |
+
main()
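The rollout method earlier in this file weights the reward of simulated step `t` (0-indexed) by `gamma ** (t + 1)` and sums the results. A minimal standalone sketch of that arithmetic follows. The helper name `discounted_sum` and the default `gamma` are assumptions made for illustration; neither is part of the repository.

```python
def discounted_sum(rewards, gamma=0.9):
    """Return (total, contributions) where contribution_t = gamma ** (t + 1) * r_t.

    Hypothetical helper mirroring the rollout loop: the running total is summed
    unrounded and rounded once at the end, while each per-step contribution is
    rounded individually for reporting.
    """
    total = 0.0
    contributions = []
    for t, reward in enumerate(rewards):
        disc = (gamma ** (t + 1)) * float(reward)
        total += disc
        contributions.append(round(disc, 5))
    return round(total, 5), contributions


total, parts = discounted_sum([1.0, 1.0], gamma=0.5)
print(total, parts)
```

Because the exponent is `t + 1` rather than `t`, even the first simulated step is discounted, which is consistent with treating the caller's own action as step zero.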
|
core/lifestack_gym_env.py
ADDED
|
@@ -0,0 +1,171 @@
|
| 1 |
+
"""
|
| 2 |
+
lifestack_gym_env.py: Gymnasium-compatible wrapper for LifeStack
|
| 3 |
+
|
| 4 |
+
Exposes the LifeStack environment as a standard gym.Env with:
|
| 5 |
+
- observation_space: Box(0, 100, shape=(26,)), i.e. 23 sub-metrics + 3 resources
|
| 6 |
+
- action_space: Discrete(7), i.e. 7 action types mapped to template actions
|
| 7 |
+
- Standard reset() / step() / render() API
|
| 8 |
+
"""
|
| 9 |
+
# NOTE: This wrapper is currently unused; it was used with the older model.
|
| 10 |
+
import gymnasium as gym
|
| 11 |
+
import numpy as np
|
| 12 |
+
from gymnasium import spaces
|
| 13 |
+
import random, copy
|
| 14 |
+
from core.life_state import LifeMetrics, ResourceBudget, DependencyGraph
|
| 15 |
+
from core.metric_schema import normalize_metric_path
|
| 16 |
+
from core.reward import compute_reward, compute_task_reward
|
| 17 |
+
from agent.conflict_generator import generate_conflict, ConflictEvent
|
| 18 |
+
from intake.simperson import SimPerson
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
# Map discrete action IDs to action types
|
| 22 |
+
ACTION_TYPE_MAP = {
|
| 23 |
+
0: "negotiate",
|
| 24 |
+
1: "communicate",
|
| 25 |
+
2: "delegate",
|
| 26 |
+
3: "spend",
|
| 27 |
+
4: "reschedule",
|
| 28 |
+
5: "rest",
|
| 29 |
+
6: "execute",
|
| 30 |
+
}
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
class LifeStackGymEnv(gym.Env):
|
| 34 |
+
"""
|
| 35 |
+
LifeStack as a Gymnasium environment.
|
| 36 |
+
|
| 37 |
+
Observation: 26-dim vector (23 life sub-metrics + 3 resource values)
|
| 38 |
+
Action: Discrete(7), one of 7 action types
|
| 39 |
+
Reward: float in [-1, 1]
|
| 40 |
+
"""
|
| 41 |
+
metadata = {"render_modes": ["human", "ansi"]}
|
| 42 |
+
|
| 43 |
+
def __init__(self, task=None, difficulty: int = None, render_mode: str = None, max_steps: int = 30):
|
| 44 |
+
super().__init__()
|
| 45 |
+
self.observation_space = spaces.Box(
|
| 46 |
+
low=0.0, high=100.0, shape=(26,), dtype=np.float32
|
| 47 |
+
)
|
| 48 |
+
self.action_space = spaces.Discrete(7)
|
| 49 |
+
self.render_mode = render_mode
|
| 50 |
+
self.task = task
|
| 51 |
+
self.difficulty = difficulty
|
| 52 |
+
self.max_steps = max_steps
|
| 53 |
+
|
| 54 |
+
from core.lifestack_env import LifeStackEnv
|
| 55 |
+
self.env = LifeStackEnv()
|
| 56 |
+
self._metric_keys = list(LifeMetrics().flatten().keys())
|
| 57 |
+
|
| 58 |
+
def _obs_vector(self) -> np.ndarray:
|
| 59 |
+
flat = self.env.state.current_metrics.flatten()
|
| 60 |
+
metric_vals = [flat[k] for k in self._metric_keys]
|
| 61 |
+
budget = self.env.state.budget
|
| 62 |
+
resource_vals = [
|
| 63 |
+
budget.time_hours,
|
| 64 |
+
budget.money_dollars,
|
| 65 |
+
budget.energy_units,
|
| 66 |
+
]
|
| 67 |
+
return np.array(metric_vals + resource_vals, dtype=np.float32)
|
| 68 |
+
|
| 69 |
+
def reset(self, seed=None, options=None):
|
| 70 |
+
super().reset(seed=seed)
|
| 71 |
+
|
| 72 |
+
conflict = None
|
| 73 |
+
if self.task is None:
|
| 74 |
+
from agent.conflict_generator import generate_conflict
|
| 75 |
+
conflict = generate_conflict(self.difficulty)
|
| 76 |
+
|
| 77 |
+
obs_obj = self.env.reset(task=self.task, conflict=conflict)
|
| 78 |
+
return self._obs_vector(), obs_obj.metadata
|
| 79 |
+
|
| 80 |
+
def step(self, action: int):
|
| 81 |
+
from core.lifestack_env import LifeStackAction
|
| 82 |
+
action_type = ACTION_TYPE_MAP[action]
|
| 83 |
+
|
| 84 |
+
# Build logical action from template
|
| 85 |
+
metric_changes, resource_cost = self._action_to_changes(action_type)
|
| 86 |
+
|
| 87 |
+
# In this wrapper, we pick a reasonable target if needed
|
| 88 |
+
target = ""
|
| 89 |
+
current_task = self.env.state.current_task
|
| 90 |
+
if action_type == "execute" and current_task:
|
| 91 |
+
for r in current_task.viable_routes:
|
| 92 |
+
if r.id not in self.env.state.closed_route_ids:
|
| 93 |
+
target = r.id
|
| 94 |
+
break
|
| 95 |
+
|
| 96 |
+
ls_action = LifeStackAction(
|
| 97 |
+
action_type=action_type,
|
| 98 |
+
target=target,
|
| 99 |
+
reasoning=f"Agent chose {action_type} for discrete action {action}.",
|
| 100 |
+
metric_changes=metric_changes,
|
| 101 |
+
resource_cost=resource_cost,
|
| 102 |
+
actions_taken=1
|
| 103 |
+
)
|
| 104 |
+
|
| 105 |
+
obs_obj = self.env.step(ls_action)
|
| 106 |
+
|
| 107 |
+
terminated = obs_obj.done
|
| 108 |
+
# Truncated only if not naturally terminated
|
| 109 |
+
truncated = (not terminated) and (self.env.state.step_count >= (self.task.horizon if self.task else self.max_steps))
|
| 110 |
+
|
| 111 |
+
return self._obs_vector(), obs_obj.reward, terminated, truncated, {"breakdown": obs_obj.metadata.get("breakdown", {})}
|
| 112 |
+
|
| 113 |
+
def _action_to_changes(self, action_type: str):
|
| 114 |
+
"""Maps an action type string to (metric_changes, resource_cost)."""
|
| 115 |
+
templates = {
|
| 116 |
+
"negotiate": (
|
| 117 |
+
{"career.workload": -15.0, "mental_wellbeing.stress_level": -5.0},
|
| 118 |
+
{"time": 1.5, "energy": 20.0},
|
| 119 |
+
),
|
| 120 |
+
"communicate": (
|
| 121 |
+
{"relationships.romantic": 10.0, "mental_wellbeing.stress_level": -5.0},
|
| 122 |
+
{"time": 0.5, "energy": 10.0},
|
| 123 |
+
),
|
| 124 |
+
"delegate": (
|
| 125 |
+
{"career.workload": -10.0, "relationships.professional_network": -5.0},
|
| 126 |
+
{"time": 1.0, "energy": 15.0},
|
| 127 |
+
),
|
| 128 |
+
"spend": (
|
| 129 |
+
{"finances.liquidity": -20.0, "mental_wellbeing.stress_level": -10.0},
|
| 130 |
+
{"time": 1.0, "energy": 15.0},
|
| 131 |
+
),
|
| 132 |
+
"reschedule": (
|
| 133 |
+
{"career.workload": -10.0, "time.free_hours_per_week": 5.0},
|
| 134 |
+
{"time": 2.0, "energy": 15.0},
|
| 135 |
+
),
|
| 136 |
+
"rest": (
|
| 137 |
+
{"mental_wellbeing.stress_level": -12.0, "physical_health.energy": 10.0},
|
| 138 |
+
{"time": 1.0},
|
| 139 |
+
),
|
| 140 |
+
"execute": (
|
| 141 |
+
{}, # executes a route target
|
| 142 |
+
{"time": 1.0, "energy": 10.0},
|
| 143 |
+
),
|
| 144 |
+
}
|
| 145 |
+
return templates.get(action_type, ({}, {}))
|
| 146 |
+
|
| 147 |
+
def render(self):
|
| 148 |
+
if self.render_mode == "human":
|
| 149 |
+
# Delegate to the internal env's render
|
| 150 |
+
self.env.render()
|
| 151 |
+
|
| 152 |
+
|
| 153 |
+
# -- Quick smoke test --
|
| 154 |
+
if __name__ == "__main__":
|
| 155 |
+
env = LifeStackGymEnv(difficulty=3, render_mode="human")
|
| 156 |
+
obs, info = env.reset()
|
| 157 |
+
print(f"Conflict: {info['conflict_title']} | Person: {info['person']}")
|
| 158 |
+
print(f"Obs shape: {obs.shape}, dtype: {obs.dtype}")
|
| 159 |
+
env.render()
|
| 160 |
+
|
| 161 |
+
total = 0.0
|
| 162 |
+
done = False
|
| 163 |
+
while not done:
|
| 164 |
+
act = env.action_space.sample()
|
| 165 |
+
obs, rew, term, trunc, info = env.step(act)
|
| 166 |
+
total += rew
|
| 167 |
+
done = term or trunc
|
| 168 |
+
print(f" Action {act} → reward {rew:.3f}")
|
| 169 |
+
|
| 170 |
+
env.render()
|
| 171 |
+
print(f"\nTotal reward: {total:.3f}")
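The `step()` method above reports `terminated` when the wrapped environment says the episode is done, and `truncated` only when the step budget runs out without a natural end. The sketch below isolates that rule so it can be checked without installing Gymnasium. The function name `episode_flags` is hypothetical and does not exist in the repository.

```python
def episode_flags(done: bool, step_count: int, horizon: int) -> tuple[bool, bool]:
    """Split a single done flag into (terminated, truncated).

    Hypothetical helper: `horizon` stands in for the task horizon when a task
    is set, or for max_steps otherwise, as in the wrapper's step() method.
    """
    terminated = bool(done)
    # Truncation is reported only if the episode did not end naturally.
    truncated = (not terminated) and step_count >= horizon
    return terminated, truncated


print(episode_flags(done=False, step_count=30, horizon=30))
print(episode_flags(done=True, step_count=30, horizon=30))
```

Keeping the two flags mutually exclusive means a training loop can tell a solved or failed episode apart from one that simply ran out of steps.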
|
core/metric_schema.py
ADDED
|
@@ -0,0 +1,31 @@
|
| 1 |
+
|
| 2 |
+
from core.life_state import LifeMetrics
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
VALID_METRIC_PATHS = tuple(sorted(LifeMetrics().flatten().keys()))
|
| 6 |
+
|
| 7 |
+
LEGACY_METRIC_ALIASES = {
|
| 8 |
+
"physical_health.exercise_routine": "physical_health.fitness",
|
| 9 |
+
}
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def normalize_metric_path(path: str) -> str:
|
| 13 |
+
"""Map legacy or malformed metric names onto the current LifeMetrics schema."""
|
| 14 |
+
if not isinstance(path, str):
|
| 15 |
+
return ""
|
| 16 |
+
path = path.strip()
|
| 17 |
+
return LEGACY_METRIC_ALIASES.get(path, path)
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
def is_valid_metric_path(path: str) -> bool:
|
| 21 |
+
return normalize_metric_path(path) in VALID_METRIC_PATHS
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
def format_valid_metrics() -> str:
|
| 25 |
+
grouped = {}
|
| 26 |
+
for path in VALID_METRIC_PATHS:
|
| 27 |
+
domain, metric = path.split(".", 1)
|
| 28 |
+
grouped.setdefault(domain, []).append(metric)
|
| 29 |
+
return "\n".join(
|
| 30 |
+
f"{domain}: {', '.join(metrics)}" for domain, metrics in grouped.items()
|
| 31 |
+
)
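The helpers in `core/metric_schema.py` above can be exercised without the rest of the package. In the sketch below, the tuple `VALID` is a small stand-in for `VALID_METRIC_PATHS`; its contents are an assumption, since the real tuple is derived from `LifeMetrics().flatten()`. The alias entry is the one shown in the file.

```python
# Stand-in for VALID_METRIC_PATHS (assumed subset, for illustration only).
VALID = ("career.workload", "finances.liquidity", "physical_health.fitness")
# Alias taken from LEGACY_METRIC_ALIASES in the file above.
ALIASES = {"physical_health.exercise_routine": "physical_health.fitness"}


def normalize(path):
    """Strip whitespace and map legacy names; non-strings become ''."""
    if not isinstance(path, str):
        return ""
    path = path.strip()
    return ALIASES.get(path, path)


def is_valid(path):
    """A path is valid if its normalized form is in the schema."""
    return normalize(path) in VALID


print(normalize("  physical_health.exercise_routine "))
print(is_valid("career.unknown"), is_valid(None))
```

Returning an empty string for non-string input lets `is_valid` reject it without raising, because the empty string is never a schema path.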
|
core/reward.py
ADDED
|
@@ -0,0 +1,463 @@
|
| 1 |
+
import math
|
| 2 |
+
import copy
|
| 3 |
+
import json
|
| 4 |
+
import re
|
| 5 |
+
from core.life_state import LifeMetrics
|
| 6 |
+
from core.task import Task
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
def compute_reward(
|
| 11 |
+
state_before: LifeMetrics,
|
| 12 |
+
state_after: LifeMetrics,
|
| 13 |
+
resources_used: dict,
|
| 14 |
+
actions_taken: int,
|
| 15 |
+
metric_changes: dict = None,
|
| 16 |
+
completion: str = None,
|
| 17 |
+
disruption_baseline: int = None,
|
| 18 |
+
action_type: str = ""
|
| 19 |
+
) -> tuple[float, dict]:
|
| 20 |
+
"""
|
| 21 |
+
Computes the reward for a life step based on changes in LifeMetrics and resource usage.
|
| 22 |
+
|
| 23 |
+
Args:
|
| 24 |
+
state_before: The state at the start of the step.
|
| 25 |
+
state_after: The state after actions and cascades.
|
| 26 |
+
resources_used: Dict with keys 'time', 'money', 'energy'.
|
| 27 |
+
actions_taken: Integer count of intentional actions performed.
|
| 28 |
+
disruption_baseline: Expected number of metrics affected by an action.
|
| 29 |
+
|
| 30 |
+
Returns:
|
| 31 |
+
tuple[float, dict]: (final_reward, breakdown_dict)
|
| 32 |
+
"""
|
| 33 |
+
before_flat = state_before.flatten()
|
| 34 |
+
after_flat = state_after.flatten()
|
| 35 |
+
|
| 36 |
+
# 1. OUTCOME SCORE (Weighted average of positive deltas)
|
| 37 |
+
domain_weights = {
|
| 38 |
+
"career": 1/6,
|
| 39 |
+
"finances": 1/6,
|
| 40 |
+
"relationships": 1/6,
|
| 41 |
+
"physical_health": 1/6,
|
| 42 |
+
"mental_wellbeing": 1/6,
|
| 43 |
+
"time": 1/6
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
# Map sub-metrics to their domains
|
| 47 |
+
submetrics_per_domain = {}
|
| 48 |
+
for k in before_flat.keys():
|
| 49 |
+
domain = k.split('.')[0]
|
| 50 |
+
submetrics_per_domain[domain] = submetrics_per_domain.get(domain, 0) + 1
|
| 51 |
+
|
| 52 |
+
outcome_score = 0.0
|
| 53 |
+
for k in before_flat.keys():
|
| 54 |
+
domain = k.split('.')[0]
|
| 55 |
+
delta = after_flat[k] - before_flat[k]
|
| 56 |
+
if delta > 0:
|
| 57 |
+
# Each domain is 1/6. Each sub-metric within a domain gets its equal share of that 1/6.
|
| 58 |
+
# Normalize delta by 100 (max possible increase is 100).
|
| 59 |
+
weight = domain_weights[domain] / submetrics_per_domain[domain]
|
| 60 |
+
outcome_score += (delta / 100.0) * weight
|
| 61 |
+
|
| 62 |
+
# 2. CASCADE CONTAINMENT SCORE
|
| 63 |
+
worsened_count = sum(1 for k in before_flat.keys() if after_flat[k] < before_flat[k])
|
| 64 |
+
total_metrics = len(before_flat)
|
| 65 |
+
cascade_containment_score = 1.0 - (worsened_count / total_metrics)
|
| 66 |
+
|
| 67 |
+
# 3. RESOURCE EFFICIENCY SCORE
|
| 68 |
+
# Available: time 20, money 500, energy 100
|
| 69 |
+
m_time = resources_used.get('time', 0.0) / 20.0
|
| 70 |
+
m_money = resources_used.get('money', 0.0) / 500.0
|
| 71 |
+
m_energy = resources_used.get('energy', 0.0) / 100.0
|
| 72 |
+
|
| 73 |
+
# Normalize by total slots (3 resources)
|
| 74 |
+
resource_efficiency_score = 1.0 - ((m_time + m_money + m_energy) / 3.0)
|
| 75 |
+
resource_efficiency_score = max(0.0, min(1.0, resource_efficiency_score))
|
| 76 |
+
|
| 77 |
+
# 4. RELATIONSHIP PRESERVATION SCORE (Sigmoid applied to average delta)
|
| 78 |
+
rel_keys = [k for k in before_flat.keys() if k.startswith('relationships.')]
|
| 79 |
+
avg_rel_before = sum(before_flat[k] for k in rel_keys) / len(rel_keys)
|
| 80 |
+
avg_rel_after = sum(after_flat[k] for k in rel_keys) / len(rel_keys)
|
| 81 |
+
delta_rel = avg_rel_after - avg_rel_before
|
| 82 |
+
|
| 83 |
+
# score = 1 / (1 + exp(-delta/10))
|
| 84 |
+
relationship_preservation_score = 1.0 / (1.0 + math.exp(-delta_rel / 10.0))
|
| 85 |
+
|
| 86 |
+
# FINAL REWARD FORMULA
|
| 87 |
+
base_reward = (
|
| 88 |
+
(0.40 * outcome_score) +
|
| 89 |
+
(0.25 * cascade_containment_score) +
|
| 90 |
+
(0.20 * resource_efficiency_score) +
|
| 91 |
+
(0.15 * relationship_preservation_score)
|
| 92 |
+
)
|
| 93 |
+
|
| 94 |
+
# PENALTIES
|
| 95 |
+
penalties = 0.0
|
| 96 |
+
fired = []
|
| 97 |
+
|
| 98 |
+
# -0.50 if ANY metric is below 20 after the step
|
| 99 |
+
if any(v < 20 for v in after_flat.values()):
|
| 100 |
+
penalties -= 0.50
|
| 101 |
+
fired.append("CRITICAL_FLOOR_VIOLATION")
|
| 102 |
+
|
| 103 |
+
# -0.30 if cascade spread wider than the number of metrics the agent directly changed
|
| 104 |
+
# A scaled baseline from task metadata is preferred over the hardcoded default
|
| 105 |
+
if disruption_baseline is None:
|
| 106 |
+
disruption_baseline = len(metric_changes) if metric_changes else 2
|
| 107 |
+
|
| 108 |
+
if worsened_count > disruption_baseline:
|
| 109 |
+
penalties -= 0.30
|
| 110 |
+
fired.append("CASCADE_SPREAD_WIDER")
|
| 111 |
+
|
| 112 |
+
# -0.40 if actions_taken == 0
|
| 113 |
+
if actions_taken == 0:
|
| 114 |
+
penalties -= 0.40
|
| 115 |
+
fired.append("INACTION_PENALTY")
|
| 116 |
+
|
| 117 |
+
# -0.15 if relationships domain average dropped more than 20 points
|
| 118 |
+
if delta_rel < -20:
|
| 119 |
+
penalties -= 0.15
|
| 120 |
+
fired.append("RELATIONSHIP_COLLAPSE")
|
| 121 |
+
|
| 122 |
+
# [NEW] Plausibility Penalty
|
| 123 |
+
plaus = 0.0
|
| 124 |
+
if metric_changes:
|
| 125 |
+
plaus = reward_plausibility_check(metric_changes, resources_used)
|
| 126 |
+
if plaus < 0:
|
| 127 |
+
penalties += plaus
|
| 128 |
+
fired.append("PLAUSIBILITY_VIOLATION")
|
| 129 |
+
|
| 130 |
+
# [NEW] Format Compliance & Reasoning
|
| 131 |
+
comp_reward = 0.0
|
| 132 |
+
reasoning = ""
|
| 133 |
+
if completion:
|
| 134 |
+
comp_reward = reward_format_compliance(completion)
|
| 135 |
+
try:
|
| 136 |
+
# Best-effort extraction of the reasoning field from the JSON completion
|
| 137 |
+
import json
|
| 138 |
+
data = json.loads(completion)
|
| 139 |
+
reasoning = data.get("reasoning", "")
|
| 140 |
+
except (ValueError, AttributeError, TypeError):
|
| 141 |
+
pass
|
| 142 |
+
|
| 143 |
+
# [NEW] Reasoning Alignment (tied to action_type)
|
| 144 |
+
reasoning_score = reward_reasoning_coherence(reasoning, action_type=action_type)
|
| 145 |
+
|
| 146 |
+
final_reward = max(-1.0, min(1.0, base_reward + penalties))
|
| 147 |
+
|
| 148 |
+
breakdown = {
|
| 149 |
+
"components": {
|
| 150 |
+
"outcome": outcome_score,
|
| 151 |
+
"containment": cascade_containment_score,
|
| 152 |
+
"efficiency": resource_efficiency_score,
|
| 153 |
+
"preservation": relationship_preservation_score,
|
| 154 |
+
"format_compliance": comp_reward,
|
| 155 |
+
"plausibility": plaus,
|
| 156 |
+
"reasoning_alignment": reasoning_score
|
| 157 |
+
},
|
| 158 |
+
"base_reward": base_reward,
|
| 159 |
+
"penalties_total": penalties,
|
| 160 |
+
"penalties_fired": fired,
|
| 161 |
+
"metrics_worsened": worsened_count,
|
| 162 |
+
"rel_delta": delta_rel
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
return final_reward, breakdown
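As a worked example of the final formula in `compute_reward`, the sketch below combines four component scores with the 0.40 / 0.25 / 0.20 / 0.15 weights, adds the penalties, and clamps to [-1, 1]. The function name `combine_local_reward` and the sample component values are assumptions chosen for illustration, not values from the repository.

```python
def combine_local_reward(outcome, containment, efficiency, preservation, penalties=0.0):
    """Weighted sum of the four local components plus penalties, clamped to [-1, 1].

    Hypothetical helper restating the weights used in compute_reward.
    """
    base = (
        0.40 * outcome
        + 0.25 * containment
        + 0.20 * efficiency
        + 0.15 * preservation
    )
    return max(-1.0, min(1.0, base + penalties))


# Sample values: small metric gain, few worsened metrics, modest resource use,
# and no change in relationships (a sigmoid of 0 gives 0.5).
active = combine_local_reward(0.02, 0.9, 0.8, 0.5)
# Same components, but with the -0.40 inaction penalty applied.
idle = combine_local_reward(0.02, 0.9, 0.8, 0.5, penalties=-0.40)
print(round(active, 3), round(idle, 3))
```

With these inputs the base reward is about 0.468, so a single inaction penalty removes most of it. This illustrates why the penalty terms can dominate the weighted components.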
|
| 166 |
+
|
| 167 |
+
def compute_milestone_reward(milestones_achieved: list[str], task: Task) -> float:
|
| 168 |
+
if not task.milestones:
|
| 169 |
+
return 0.0
|
| 170 |
+
total_possible = sum(m.reward for m in task.milestones)
|
| 171 |
+
if total_possible == 0:
|
| 172 |
+
return 0.0
|
| 173 |
+
achieved = sum(m.reward for m in task.milestones if m.id in milestones_achieved)
|
| 174 |
+
return min(1.0, achieved / total_possible)
|
| 175 |
+
|
| 176 |
+
def compute_task_completion_reward(success_conditions_met: list[bool], task: Task) -> float:
|
| 177 |
+
# A task is completed if any of its target success conditions are satisfied.
|
| 178 |
+
# This handles tasks with multiple alternative goal-states (e.g. choice of routes).
|
| 179 |
+
if not success_conditions_met:
|
| 180 |
+
return 0.0
|
| 181 |
+
return 1.0 if any(success_conditions_met) else 0.0
|
| 182 |
+
|
| 183 |
+
def compute_replan_bonus(exo_events_seen: int, milestones_after_event: int) -> float:
|
| 184 |
+
# Scale bonus based on ability to bounce back after exogenous events
|
| 185 |
+
if exo_events_seen == 0:
|
| 186 |
+
return 0.0
|
| 187 |
+
return min(1.0, (milestones_after_event / exo_events_seen) * 0.5)
|
| 188 |
+
|
| 189 |
+
def compute_dead_end_penalty(routes_remaining: int) -> float:
|
| 190 |
+
return -0.5 if routes_remaining <= 0 else 0.0
|
| 191 |
+
|
| 192 |
+
def compute_task_reward(
|
| 193 |
+
state_before: LifeMetrics,
|
| 194 |
+
state_after: LifeMetrics,
|
| 195 |
+
resources_used: dict,
|
| 196 |
+
actions_taken: int,
|
| 197 |
+
milestones_achieved: list[str],
|
| 198 |
+
success_conditions_met: list[bool],
|
| 199 |
+
exo_events_seen: int,
|
| 200 |
+
milestones_after_event: int,
|
| 201 |
+
routes_remaining: int,
|
| 202 |
+
rollback_used: bool,
|
| 203 |
+
cascade_collapse: bool,
|
| 204 |
+
task: Task,
|
| 205 |
+
reasoning: str = "",
|
| 206 |
+
completion: str = "",
|
| 207 |
+
conflict_domain: str = "",
|
| 208 |
+
step_count: int = 0,
|
| 209 |
+
max_steps: int = 0,
|
| 210 |
+
metric_changes: dict = None,
|
| 211 |
+
cumulative_rel_delta: float = 0.0,
|
| 212 |
+
action_type: str = ""
|
| 213 |
+
) -> tuple[float, dict]:
|
| 214 |
+
# 1. Base local components (with scaled disruption baseline from task metadata)
|
| 215 |
+
d_baseline = len(task.mutable_world) if task and hasattr(task, 'mutable_world') else None
|
| 216 |
+
local_reward, local_breakdown = compute_reward(state_before, state_after, resources_used, actions_taken,
|
| 217 |
+
metric_changes=metric_changes, completion=completion,
|
| 218 |
+
disruption_baseline=d_baseline, action_type=action_type)
|
| 219 |
+
|
| 220 |
+
# 2. Orchestrator components
|
| 221 |
+
# Use only the raw outcome component from local_breakdown to avoid double-counting
|
| 222 |
+
# efficiency, containment, or preservation which are added separately below.
|
| 223 |
+
outcome_score_local = local_breakdown["components"].get("outcome", 0.0)
|
| 224 |
+
milestone_score = compute_milestone_reward(milestones_achieved, task)
|
| 225 |
+
completion_score = compute_task_completion_reward(success_conditions_met, task)
|
| 226 |
+
replan_score = compute_replan_bonus(exo_events_seen, milestones_after_event)
|
| 227 |
+
efficiency_score = local_breakdown["components"].get("efficiency", 0.0)
|
| 228 |
+
preservation_score = local_breakdown["components"].get("preservation", 0.0)
|
| 229 |
+
reasoning_score = reward_reasoning_coherence(reasoning, action_type=action_type)
|
| 230 |
+
|
| 231 |
+
# Check for specific failure cases
|
| 232 |
+
timeout_pen = reward_timeout_check(step_count, max_steps, bool(success_conditions_met) and any(success_conditions_met))
|
| 233 |
+
dead_end_pen = compute_dead_end_penalty(routes_remaining)
|
| 234 |
+
|
| 235 |
+
# 3. Final weighting (all components are now unique/non-overlapping)
|
| 236 |
+
# Weights: Milestone 35%, Completion 25%, Outcome 10%, Preservation 5%, Replan 10%, Efficiency 10%, Reasoning 5%
|
| 237 |
+
base_reward = (
|
| 238 |
+
(0.35 * milestone_score) +
|
| 239 |
+
(0.25 * completion_score) +
|
| 240 |
+
(0.10 * outcome_score_local) +
|
| 241 |
+
(0.05 * preservation_score) +
|
| 242 |
+
(0.10 * replan_score) +
|
| 243 |
+
(0.10 * efficiency_score) +
|
| 244 |
+
(0.05 * reasoning_score)
|
| 245 |
+
)
|
| 246 |
+
|
| 247 |
+
# 4. Penalties
|
| 248 |
+
penalties = 0.0
|
| 249 |
+
fired = []
|
| 250 |
+
|
| 251 |
+
if timeout_pen < 0:
|
| 252 |
+
penalties += timeout_pen
|
| 253 |
+
fired.append("TIMEOUT")
|
| 254 |
+
|
| 255 |
+
if dead_end_pen < 0:
|
| 256 |
+
penalties += dead_end_pen
|
| 257 |
+
fired.append("DEAD_END")
|
| 258 |
+
|
| 259 |
+
if rollback_used:
|
| 260 |
+
penalties += -0.1
|
| 261 |
+
fired.append("ROLLBACK_USED")
|
| 262 |
+
|
| 263 |
+
if cascade_collapse:
|
| 264 |
+
penalties += -0.3
|
| 265 |
+
fired.append("CASCADE_COLLAPSE")
|
| 266 |
+
|
| 267 |
+
# Direct inaction penalty, not diluted by the 0.05 local weight
|
| 268 |
+
if actions_taken == 0:
|
| 269 |
+
penalties += -0.20
|
| 270 |
+
fired.append("TASK_INACTION_PENALTY")
|
| 271 |
+
|
| 272 |
+
# Cumulative relationship erosion across the episode
|
| 273 |
+
if cumulative_rel_delta < -20:
|
| 274 |
+
penalties += -0.15
|
| 275 |
+
fired.append("CUMULATIVE_RELATIONSHIP_EROSION")
|
| 276 |
+
|
| 277 |
+
final_reward = max(-1.0, min(1.0, base_reward + penalties))
|
| 278 |
+
|
| 279 |
+
breakdown = {
|
| 280 |
+
"components": {
|
| 281 |
+
"local_metric_delta": outcome_score_local,
|
| 282 |
+
"milestone": milestone_score,
|
| 283 |
+
"completion": completion_score,
|
| 284 |
+
"replan": replan_score,
|
| 285 |
+
"efficiency": efficiency_score,
|
| 286 |
+
"reasoning": reasoning_score,
|
| 287 |
+
"format_compliance": local_breakdown["components"].get("format_compliance", 0.0),
|
| 288 |
+
"plausibility": local_breakdown["components"].get("plausibility", 0.0),
|
| 289 |
+
"timeout_penalty": timeout_pen
|
| 290 |
+
},
|
| 291 |
+
"base_reward": base_reward,
|
| 292 |
+
"penalties_total": penalties,
|
| 293 |
+
"penalties_fired": fired,
|
| 294 |
+
"local_breakdown": local_breakdown
|
| 295 |
+
}
|
| 296 |
+
|
| 297 |
+
return final_reward, breakdown
|
| 298 |
+
|
| 299 |
+
def reward_format_compliance(completion: str) -> float:
|
| 300 |
+
"""
|
| 301 |
+
Scores the completion based on its format (JSON validity and required fields).
|
| 302 |
+
|
| 303 |
+
Returns:
|
| 304 |
+
+1.0: Valid JSON with all required fields:
|
| 305 |
+
action_type, target_domain, metric_changes, resource_cost, reasoning
|
| 306 |
+
+0.5: Any parseable JSON (including partial/incomplete dicts)
|
| 307 |
+
-0.5: Invalid JSON / unparseable
|
| 308 |
+
-1.0: Empty strings or refusal content
|
| 309 |
+
"""
|
| 310 |
+
if not completion or len(completion.strip()) < 10:
|
| 311 |
+
return -1.0
|
| 312 |
+
|
| 313 |
+
# Potential refusal indicators
|
| 314 |
+
if any(x in completion.lower() for x in ["i cannot", "i'm sorry", "as an ai"]):
|
| 315 |
+
return -1.0
|
| 316 |
+
|
| 317 |
+
# Extract JSON content from markdown code blocks if present
|
| 318 |
+
json_str = completion.strip()
|
| 319 |
+
if "```json" in json_str:
|
| 320 |
+
json_str = json_str.split("```json")[-1].split("```")[0].strip()
|
| 321 |
+
elif "```" in json_str:
|
| 322 |
+
json_str = json_str.split("```")[1].strip()
|
| 323 |
+
|
| 324 |
+
try:
|
| 325 |
+
data = json.loads(json_str)
|
| 326 |
+
required = ["action_type", "target_domain", "metric_changes", "resource_cost", "reasoning"]
|
| 327 |
+
if isinstance(data, dict) and all(k in data and data.get(k) is not None for k in required):
|
| 328 |
+
return 1.0
|
| 329 |
+
return 0.5
|
| 330 |
+
except json.JSONDecodeError:
|
| 331 |
+
# Final attempt: try to find anything between { and }
|
| 332 |
+
match = re.search(r'\{.*\}', json_str, re.DOTALL)
|
| 333 |
+
if match:
|
| 334 |
+
try:
|
| 335 |
+
data = json.loads(match.group(0))
|
| 336 |
+
required = ["action_type", "target_domain", "metric_changes", "resource_cost", "reasoning"]
|
| 337 |
+
if isinstance(data, dict) and all(k in data and data.get(k) is not None for k in required):
|
| 338 |
+
return 1.0
|
| 339 |
+
return 0.5
|
| 340 |
+
except json.JSONDecodeError:
|
| 341 |
+
pass
|
| 342 |
+
return -0.5
|
| 343 |
+
|
| 344 |
+
def reward_plausibility_check(metric_changes: dict, resource_cost: dict) -> float:
|
| 345 |
+
"""
|
| 346 |
+
Anti-gaming check. Prevents the model from claiming massive metric changes while spending 0 resources.
|
| 347 |
+
Resource cost is normalized to comparable units (time/20h, money/$500, energy/100pts).
|
| 348 |
+
"""
|
| 349 |
+
total_delta = sum(abs(v) for v in metric_changes.values())
|
| 350 |
+
|
| 351 |
+
# Zero-cost shortcut: any non-trivial claim with no cost at all is implausible
|
| 352 |
+
# Also handles empty resource_cost.
|
| 353 |
+
if not resource_cost or all(v == 0 for v in resource_cost.values()):
|
| 354 |
+
if total_delta > 3.0:
|
| 355 |
+
return -0.30
|
| 356 |
+
return 0.0
|
| 357 |
+
|
| 358 |
+
# Normalize each resource dimension to [0,1] before summing
|
| 359 |
+
norm_time = resource_cost.get('time', 0.0) / 20.0
|
| 360 |
+
norm_money = resource_cost.get('money', 0.0) / 500.0
|
| 361 |
+
norm_energy = resource_cost.get('energy', 0.0) / 100.0
|
| 362 |
+
total_cost = norm_time + norm_money + norm_energy
|
| 363 |
+
|
| 364 |
+
ratio = total_delta / max(0.01, total_cost)
|
| 365 |
+
|
| 366 |
+
if ratio > 150:
|
| 367 |
+
return -0.30 # Claiming massive change for virtually free
|
| 368 |
+
if ratio > 80:
|
| 369 |
+
return -0.10 # Highly suspicious efficiency
|
| 370 |
+
return 0.0 # Plausible ratio
|
| 371 |
+
|
| 372 |
+
def reward_timeout_check(step_count: int, max_steps: int, done: bool) -> float:
|
| 373 |
+
"""
|
| 374 |
+
Penalizes episodes that end by reaching the step limit without being resolved.
|
| 375 |
+
"""
|
| 376 |
+
if step_count >= max_steps and not done:
|
| 377 |
+
return -0.20
|
| 378 |
+
return 0.0
|
| 379 |
+
|
| 380 |
+
def reward_reasoning_coherence(reasoning: str, action_type: str = "") -> float:
|
| 381 |
+
"""
|
| 382 |
+
Hardened check of logical consistency. Requires both sufficient length and
|
| 383 |
+
alignment with the chosen action to prevent word-stuffing.
|
| 384 |
+
"""
|
| 385 |
+
if not reasoning or len(reasoning.strip()) < 20:
|
| 386 |
+
return -0.20 # Severe penalty for lack of effort
|
| 387 |
+
|
| 388 |
+
reasoning_lower = reasoning.lower()
|
| 389 |
+
score = 0.0
|
| 390 |
+
|
| 391 |
+
# 1. Structural Logic Check
|
| 392 |
+
# Reward the use of logical connectors rather than a bare list of facts
|
| 393 |
+
connectors = ["because", "since", "therefore", "due to", "resulting in", "consequently"]
|
| 394 |
+
if any(c in reasoning_lower for c in connectors):
|
| 395 |
+
score += 0.05
|
| 396 |
+
|
| 397 |
+
# 2. Action Alignment (Non-Gameable Anti-Hacking)
|
| 398 |
+
# The reasoning MUST logically justify the chosen category.
|
| 399 |
+
action_keywords = {
|
| 400 |
+
"spend": ["cost", "price", "expensive", "money", "budget", "finance"],
|
| 401 |
+
"rest": ["energy", "sleep", "exhaustion", "recharge", "break"],
|
| 402 |
+
"communicate": ["talk", "discuss", "speak", "message", "call", "explain"],
|
| 403 |
+
"delegate": ["hand off", "assign", "help", "junior", "colleague"],
|
| 404 |
+
"negotiate": ["bargain", "trade", "deal", "terms"],
|
| 405 |
+
"deprioritize": ["later", "postpone", "unimportant", "drop"],
|
| 406 |
+
"reschedule": ["reschedule", "delay", "postpone", "move", "time", "calendar", "slot"],
|
| 407 |
+
"execute": ["route", "plan", "action", "implement", "complete", "resolve", "execute"],
|
| 408 |
+
}
|
| 409 |
+
|
| 410 |
+
if action_type and action_type in action_keywords:
|
| 411 |
+
match = any(kw in reasoning_lower for kw in action_keywords[action_type])
|
| 412 |
+
if match:
|
| 413 |
+
score += 0.10
|
| 414 |
+
else:
|
| 415 |
+
score -= 0.20
|
| 416 |
+
|
| 417 |
+
return max(-0.30, min(0.30, score))
|
| 418 |
+
|
| 419 |
+
def main():
|
| 420 |
+
# Scenario setup
|
| 421 |
+
print("--- TESTING REWARD SYSTEM ---")
|
| 422 |
+
|
| 423 |
+
# 1. PERFECT ACTION: All metrics improve by 10 points
|
| 424 |
+
state_start = LifeMetrics() # Defaults at 70
|
| 425 |
+
state_perfect = copy.deepcopy(state_start)
|
| 426 |
+
for k in state_perfect.flatten().keys():
|
| 427 |
+
domain, sub = k.split('.')
|
| 428 |
+
current = getattr(getattr(state_perfect, domain), sub)
|
| 429 |
+
setattr(getattr(state_perfect, domain), sub, current + 10)
|
| 430 |
+
|
| 431 |
+
res_perfect = {"time": 2, "money": 50, "energy": 10}
|
| 432 |
+
reward_p, break_p = compute_reward(state_start, state_perfect, res_perfect, actions_taken=5)
|
| 433 |
+
|
| 434 |
+
print("\n[SCENARIO 1: PERFECT ACTION]")
|
| 435 |
+
print(f"Reward: {reward_p:.4f}")
|
| 436 |
+
print(f"Breakdown: {break_p}")
|
| 437 |
+
|
| 438 |
+
# 2. BAD ACTION: Relationships tank by 30 points, everything else stays same
|
| 439 |
+
state_bad = copy.deepcopy(state_start)
|
| 440 |
+
for k in state_bad.flatten().keys():
|
| 441 |
+
if k.startswith('relationships.'):
|
| 442 |
+
domain, sub = k.split('.')
|
| 443 |
+
current = getattr(getattr(state_bad, domain), sub)
|
| 444 |
+
setattr(getattr(state_bad, domain), sub, current - 30)
|
| 445 |
+
|
| 446 |
+
res_bad = {"time": 10, "money": 300, "energy": 80}
|
| 447 |
+
reward_b, break_b = compute_reward(state_start, state_bad, res_bad, actions_taken=1)
|
| 448 |
+
|
| 449 |
+
print("\n[SCENARIO 2: BAD ACTION (Relationships Tank)]")
|
| 450 |
+
print(f"Reward: {reward_b:.4f}")
|
| 451 |
+
print(f"Breakdown: {break_b}")
|
| 452 |
+
|
| 453 |
+
# 3. INACTION: Nothing changes
|
| 454 |
+
state_nothing = copy.deepcopy(state_start)
|
| 455 |
+
res_none = {}
|
| 456 |
+
reward_n, break_n = compute_reward(state_start, state_nothing, res_none, actions_taken=0)
|
| 457 |
+
|
| 458 |
+
print("\n[SCENARIO 3: INACTION]")
|
| 459 |
+
print(f"Reward: {reward_n:.4f}")
|
| 460 |
+
print(f"Breakdown: {break_n}")
|
| 461 |
+
|
| 462 |
+
if __name__ == "__main__":
|
| 463 |
+
main()
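A worked example may help when reading the thresholds in `reward_plausibility_check`. The sketch below restates that function's arithmetic under a hypothetical name, `plausibility_penalty`, so it runs on its own; the sample metric keys and costs are illustrative values, not data from the repository.

```python
def plausibility_penalty(metric_changes: dict, resource_cost: dict) -> float:
    """Compact restatement of the ratio test in reward_plausibility_check."""
    total_delta = sum(abs(v) for v in metric_changes.values())
    # Empty or all-zero cost: only a non-trivial claim is penalized.
    if not resource_cost or all(v == 0 for v in resource_cost.values()):
        return -0.30 if total_delta > 3.0 else 0.0
    # Normalize to comparable units: 20 hours, 500 dollars, 100 energy points.
    total_cost = (
        resource_cost.get("time", 0.0) / 20.0
        + resource_cost.get("money", 0.0) / 500.0
        + resource_cost.get("energy", 0.0) / 100.0
    )
    ratio = total_delta / max(0.01, total_cost)
    if ratio > 150:
        return -0.30
    if ratio > 80:
        return -0.10
    return 0.0


# delta 12, cost 0.1 + 0.1 + 0.1 = 0.3, ratio about 40: plausible
modest = plausibility_penalty(
    {"career.workload": -8.0, "mental_wellbeing.stress_level": -4.0},
    {"time": 2, "money": 50, "energy": 10},
)
# delta 10, cost 0.1, ratio about 100: suspicious
suspicious = plausibility_penalty({"finances.liquidity": 10.0}, {"money": 50})
# delta 30, cost 0.05, ratio about 600: implausible
cheap = plausibility_penalty({"relationships.social": 30.0}, {"time": 1})
# delta 30 with no cost at all: zero-cost shortcut
free = plausibility_penalty({"relationships.social": 30.0}, {})
```

Because each resource is divided by a fixed scale, one hour of time (0.05) counts the same as 25 dollars or 5 energy points when the ratio is computed.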
|
core/task.py
ADDED
|
@@ -0,0 +1,153 @@
|
| 1 |
+
from dataclasses import dataclass, field
|
| 2 |
+
from typing import Any, List, Dict
|
| 3 |
+
|
| 4 |
+
@dataclass
|
| 5 |
+
class HiddenStateField:
|
| 6 |
+
key: str # e.g. "boss_mood"
|
| 7 |
+
initial_value: Any # e.g. "neutral"
|
| 8 |
+
inspect_target: str # e.g. "call_boss"; the inspect action type that reveals this field
|
| 9 |
+
description: str # shown to agent after reveal
|
| 10 |
+
|
| 11 |
+
@dataclass
|
| 12 |
+
class ExoEvent:
|
| 13 |
+
step: int # inject at this step (inclusive); -1 = probabilistic
|
| 14 |
+
probability: float # 1.0 = deterministic; <1.0 = random at each step
|
| 15 |
+
id: str # e.g. "ticket_price_spike"
|
| 16 |
+
description: str # what agent sees in next observation
|
| 17 |
+
world_mutation: dict # e.g. {"ticket_price": 450, "seats_remaining": 1}
|
| 18 |
+
hidden_state_mutation: dict # e.g. {"boss_mood": "angry"}
|
| 19 |
+
closes_routes: list[str] = field(default_factory=list) # route IDs this event blocks
|
| 20 |
+
|
| 21 |
+
@dataclass
|
| 22 |
+
class Milestone:
|
| 23 |
+
id: str # e.g. "flight_rebooked"
|
| 24 |
+
description: str
|
| 25 |
+
condition_key: str # world/hidden key to check, e.g. "flight_rebooked"
|
| 26 |
+
condition_value: Any # e.g. True
|
| 27 |
+
reward: float # milestone reward added to episode total
|
| 28 |
+
|
| 29 |
+
@dataclass
|
| 30 |
+
class Route:
|
| 31 |
+
id: str # e.g. "rebook_premium"
|
| 32 |
+
name: str
|
| 33 |
+
description: str
|
| 34 |
+
required_action_types: list[str] # must use these tool actions to complete
|
| 35 |
+
preconditions: dict # world/hidden state checks, e.g. {"card_available": True}
|
| 36 |
+
consequences: dict # world mutations on route completion, e.g. {"flight_rebooked": True}
|
| 37 |
+
closes_routes: list[str] # route IDs this blocks
|
| 38 |
+
milestones_unlocked: list[str] # milestone IDs this route can hit
|
| 39 |
+
final_reward: float # bonus on route completion
|
| 40 |
+
|
| 41 |
+
@dataclass
|
| 42 |
+
class Task:
|
| 43 |
+
id: str
|
| 44 |
+
domain: str # "flight_crisis" | "code_merge_crisis"
|
| 45 |
+
goal: str
|
| 46 |
+
constraints: dict # e.g. {"budget_max": 400, "deadline_step": 18}
|
| 47 |
+
hidden_state: dict # full truth, agent never sees directly
|
| 48 |
+
mutable_world: dict # partial truth, some fields revealed by inspect
|
| 49 |
+
visible_world: dict # agent sees this at each step (subset of mutable_world)
|
| 50 |
+
success_conditions: list[dict] # e.g. [{"key": "flight_rebooked", "value": True}]
|
| 51 |
+
failure_conditions: list[dict] # e.g. [{"key": "missed_deadline", "value": True}]
|
| 52 |
+
event_schedule: list[ExoEvent]
|
| 53 |
+
viable_routes: list[Route]
|
| 54 |
+
milestones: list[Milestone]
|
| 55 |
+
horizon: int # max steps (20-50)
|
| 56 |
+
difficulty: int # 1-5
|
| 57 |
+
domain_metadata: dict # domain-specific extra data (story text, etc.)
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
def FlightCrisisTask() -> Task:
|
| 61 |
+
routes = [
|
| 62 |
+
Route(
|
| 63 |
+
id="rebook_premium",
|
| 64 |
+
name="Rebook Premium Option",
|
| 65 |
+
description="Call agent and rebook on premium ticket",
|
| 66 |
+
required_action_types=["communicate", "execute"],
|
| 67 |
+
preconditions={"card_available": True},
|
| 68 |
+
consequences={"flight_rebooked": True},
|
| 69 |
+
closes_routes=["wait_lounge"],
|
| 70 |
+
milestones_unlocked=["m1"],
|
| 71 |
+
final_reward=2.5
|
| 72 |
+
),
|
| 73 |
+
Route(
|
| 74 |
+
id="wait_lounge",
|
| 75 |
+
name="Accept Delay & Work",
|
| 76 |
+
description="Stay at airport lounge and work on laptop",
|
| 77 |
+
required_action_types=["wait", "plan"],
|
| 78 |
+
preconditions={"lounge_access": True},
|
| 79 |
+
consequences={"caught_up": True},
|
| 80 |
+
closes_routes=["rebook_premium"],
|
| 81 |
+
milestones_unlocked=["m2"],
|
| 82 |
+
final_reward=1.8
|
| 83 |
+
)
|
| 84 |
+
]
|
| 85 |
+
milestones = [
|
| 86 |
+
Milestone(id="m1", description="Successfully rebooked flight before deadline", condition_key="flight_rebooked", condition_value=True, reward=1.0),
|
| 87 |
+
Milestone(id="m2", description="Caught up with all emergency slack messages", condition_key="caught_up", condition_value=True, reward=0.8),
|
| 88 |
+
]
|
| 89 |
+
events = [
|
| 90 |
+
ExoEvent(step=5, probability=1.0, id="price_surge", description="Ticket prices sharply increased by $300.", world_mutation={}, hidden_state_mutation={"card_available": False}, closes_routes=[]),
|
| 91 |
+
ExoEvent(step=8, probability=1.0, id="lounge_full", description="The airport lounge is now at maximum capacity.", world_mutation={"lounge_access": False}, hidden_state_mutation={}, closes_routes=["wait_lounge"]),
|
| 92 |
+
]
|
| 93 |
+
return Task(
|
| 94 |
+
id="flight_crisis_task_main",
|
| 95 |
+
domain="flight_crisis",
|
| 96 |
+
goal="Survive Airport Cancellation",
|
| 97 |
+
constraints={"budget_max": 800, "deadline_step": 20},
|
| 98 |
+
hidden_state={
|
| 99 |
+
"card_available": True
|
| 100 |
+
},
|
| 101 |
+
mutable_world={
|
| 102 |
+
"lounge_access": True,
|
| 103 |
+
"flight_rebooked": False,
|
| 104 |
+
"caught_up": False
|
| 105 |
+
},
|
| 106 |
+
visible_world={
|
| 107 |
+
"lounge_access": True
|
| 108 |
+
},
|
| 109 |
+
success_conditions=[{"key": "flight_rebooked", "value": True}],
|
| 110 |
+
failure_conditions=[{"key": "missed_deadline", "value": True}],
|
| 111 |
+
event_schedule=events,
|
| 112 |
+
viable_routes=routes,
|
| 113 |
+
milestones=milestones,
|
| 114 |
+
horizon=30,
|
| 115 |
+
difficulty=4,
|
| 116 |
+
domain_metadata={"story": "A major storm grounded commercial flights."}
|
| 117 |
+
)
|
| 118 |
+
|
| 119 |
+
def CodeMergeCrisisTask() -> Task:
|
| 120 |
+
"""A high-difficulty technical crisis requiring rollback or hotfix."""
|
| 121 |
+
routes = [
|
| 122 |
+
Route(id="revert_commit", name="Revert Commit", description="Quickly revert the broken merge to unblock the team.", required_action_types=["delegate", "communicate"], preconditions={}, consequences={"pipeline_unblocked": True}, closes_routes=["hotfix"], milestones_unlocked=["m1"], final_reward=1.5),
|
| 123 |
+
Route(id="hotfix", name="Patch Forward", description="Find the logic error and push a hotfix.", required_action_types=["communicate", "spend"], preconditions={}, consequences={"bug_resolved": True}, closes_routes=["revert_commit"], milestones_unlocked=["m2"], final_reward=3.0),
|
| 124 |
+
]
|
| 125 |
+
milestones = [
|
| 126 |
+
Milestone(id="m1", description="CI pipeline is green again", condition_key="pipeline_unblocked", condition_value=True, reward=1.0),
|
| 127 |
+
Milestone(id="m2", description="Bug resolved without losing features", condition_key="bug_resolved", condition_value=True, reward=2.0),
|
| 128 |
+
]
|
| 129 |
+
return Task(
|
| 130 |
+
id="code_merge_task_fallback",
|
| 131 |
+
domain="code_merge_crisis",
|
| 132 |
+
goal="Resolve Production Outage",
|
| 133 |
+
constraints={"budget_max": 1000, "deadline_step": 8},
|
| 134 |
+
hidden_state={"on_call_status": "alert"},
|
| 135 |
+
mutable_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
|
| 136 |
+
visible_world={"career.stability": -20.0, "mental_wellbeing.stress_level": 30.0},
|
| 137 |
+
success_conditions=[{"key": "pipeline_unblocked", "value": True}, {"key": "bug_resolved", "value": True}],
|
| 138 |
+
failure_conditions=[],
|
| 139 |
+
event_schedule=[],
|
| 140 |
+
viable_routes=routes,
|
| 141 |
+
milestones=milestones,
|
| 142 |
+
horizon=10,
|
| 143 |
+
difficulty=4,
|
| 144 |
+
domain_metadata={}
|
| 145 |
+
)
|
| 146 |
+
|
| 147 |
+
class TaskGenerator:
|
| 148 |
+
def __init__(self):
|
| 149 |
+
self.tasks = [FlightCrisisTask, CodeMergeCrisisTask]
|
| 150 |
+
|
| 151 |
+
def get_random_task(self) -> Task:
|
| 152 |
+
import random
|
| 153 |
+
return random.choice(self.tasks)()
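The commit does not show how `event_schedule` is consumed in this file, so the following is only a sketch of one plausible reading of the `ExoEvent` field comments. The helpers `should_fire` and `apply_events` are hypothetical, the dataclass is a trimmed copy for self-containment, and treating `step` as an exact match (with `-1` meaning "any step") is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExoEvent:  # trimmed copy of the dataclass in core/task.py
    step: int
    probability: float
    id: str
    description: str
    world_mutation: dict
    hidden_state_mutation: dict
    closes_routes: list = field(default_factory=list)


def should_fire(event: ExoEvent, step: int, rng: random.Random) -> bool:
    """Assumed rule: scheduled events fire on their step; step=-1 may fire on any step."""
    if event.step >= 0 and event.step != step:
        return False
    return event.probability >= 1.0 or rng.random() < event.probability


def apply_events(events, step, world, hidden, closed_routes, rng=None):
    """Mutate world/hidden state and the closed-route set; return fired event ids."""
    rng = rng or random.Random()
    fired = []
    for event in events:
        if should_fire(event, step, rng):
            world.update(event.world_mutation)
            hidden.update(event.hidden_state_mutation)
            closed_routes.update(event.closes_routes)
            fired.append(event.id)
    return fired


lounge_full = ExoEvent(
    step=8, probability=1.0, id="lounge_full",
    description="The airport lounge is now at maximum capacity.",
    world_mutation={"lounge_access": False}, hidden_state_mutation={},
    closes_routes=["wait_lounge"],
)
world, hidden, closed = {"lounge_access": True}, {"card_available": True}, set()
fired_early = apply_events([lounge_full], 7, world, hidden, closed)
fired_on_time = apply_events([lounge_full], 8, world, hidden, closed)
```

Passing a seeded `random.Random` as `rng` would make probabilistic events reproducible across episodes.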
|
core/verifier.py
ADDED
|
@@ -0,0 +1,75 @@
|
| 1 |
+
from typing import Dict, List, Set, Any, Tuple
|
| 2 |
+
from core.task import Task, Milestone, Route
|
| 3 |
+
|
| 4 |
+
class LifeStackVerifier:
|
| 5 |
+
"""Standalone verifier for Task success, failure, and progression."""
|
| 6 |
+
|
| 7 |
+
@staticmethod
|
| 8 |
+
def _check_cond(cond: dict, world_state: dict, hidden_state: dict, metrics_flat: dict = None) -> bool:
|
| 9 |
+
key = cond['key']
|
| 10 |
+
target = cond['value']
|
| 11 |
+
op = cond.get('op', 'eq')
|
| 12 |
+
|
| 13 |
+
# Priority: Metrics > Hidden > World
|
| 14 |
+
val = None
|
| 15 |
+
if metrics_flat and key in metrics_flat:
|
| 16 |
+
val = metrics_flat[key]
|
| 17 |
+
else:
|
| 18 |
+
val = hidden_state.get(key, world_state.get(key))
|
| 19 |
+
|
| 20 |
+
if val is None:
|
| 21 |
+
return False
|
| 22 |
+
|
| 23 |
+
if op == 'eq': return val == target
|
| 24 |
+
if op == 'ne': return val != target
|
| 25 |
+
if op == 'gt': return val > target
|
| 26 |
+
if op == 'lt': return val < target
|
| 27 |
+
if op == 'ge': return val >= target
|
| 28 |
+
if op == 'le': return val <= target
|
| 29 |
+
return False
|
| 30 |
+
|
| 31 |
+
@staticmethod
|
| 32 |
+
def check_success(task: Task, world_state: dict, hidden_state: dict) -> list[bool]:
|
| 33 |
+
"""Checks if task-specific success conditions are met."""
|
| 34 |
+
return [LifeStackVerifier._check_cond(c, world_state, hidden_state) for c in task.success_conditions]
|
| 35 |
+
|
| 36 |
+
@staticmethod
|
| 37 |
+
def check_failure(task: Task, world_state: dict, hidden_state: dict, metrics_flat: dict) -> list[bool]:
|
| 38 |
+
"""Checks if task-specific or global failure conditions (metric death) are met."""
|
| 39 |
+
results = [LifeStackVerifier._check_cond(c, world_state, hidden_state, metrics_flat) for c in task.failure_conditions]
|
| 40 |
+
# Global failure: metric death (any flattened metric at or below 10)
|
| 41 |
+
if any(v <= 10 for v in metrics_flat.values()):
|
| 42 |
+
results.append(True)
|
| 43 |
+
return results
|
| 44 |
+
|
| 45 |
+
@staticmethod
|
| 46 |
+
def check_new_milestones(task: Task, world_state: dict, hidden_state: dict, achieved_ids: list) -> list[str]:
|
| 47 |
+
"""Identifies any milestones that have just been met by current state."""
|
| 48 |
+
newly_met = []
|
| 49 |
+
for m in task.milestones:
|
| 50 |
+
if m.id not in achieved_ids:
|
| 51 |
+
val = hidden_state.get(m.condition_key, world_state.get(m.condition_key))
|
| 52 |
+
if val == m.condition_value:
|
| 53 |
+
newly_met.append(m.id)
|
| 54 |
+
return newly_met
|
| 55 |
+
|
| 56 |
+
@staticmethod
|
| 57 |
+
def get_route_status(task: Task, closed_ids: set, world_state: dict, hidden_state: dict) -> Tuple[int, bool]:
|
| 58 |
+
"""Returns (remaining_routes_count, is_dead_end)."""
|
| 59 |
+
remaining = 0
|
| 60 |
+
for route in task.viable_routes:
|
| 61 |
+
if route.id in closed_ids:
|
| 62 |
+
continue
|
| 63 |
+
|
| 64 |
+
# Check if reachable via preconditions
|
| 65 |
+
pre_ok = True
|
| 66 |
+
for k, v in route.preconditions.items():
|
| 67 |
+
current_v = hidden_state.get(k, world_state.get(k))
|
| 68 |
+
if current_v != v:
|
| 69 |
+
pre_ok = False
|
| 70 |
+
break
|
| 71 |
+
|
| 72 |
+
if pre_ok:
|
| 73 |
+
remaining += 1
|
| 74 |
+
|
| 75 |
+
return remaining, remaining == 0
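The `op` dispatch in `_check_cond` could also be written as a lookup table. The sketch below is an equivalent formulation under that assumption, not the repository's code: `OPS` and the free function `check_cond` are hypothetical names, and the lookup order (metrics, then hidden state, then world state) follows the verifier above.

```python
import operator

# Maps the condition's "op" string to a comparison function.
OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def check_cond(cond: dict, world_state: dict, hidden_state: dict, metrics_flat=None) -> bool:
    """Evaluate one {"key", "value", "op"} condition; unknown keys or ops are False."""
    key = cond["key"]
    if metrics_flat and key in metrics_flat:
        val = metrics_flat[key]
    else:
        # Hidden state shadows world state when both define the key.
        val = hidden_state.get(key, world_state.get(key))
    if val is None:
        return False
    compare = OPS.get(cond.get("op", "eq"))
    return bool(compare and compare(val, cond["value"]))


rebooked = check_cond({"key": "flight_rebooked", "value": True}, {"flight_rebooked": True}, {})
stressed = check_cond(
    {"key": "mental_wellbeing.stress_level", "value": 80, "op": "ge"},
    {}, {}, {"mental_wellbeing.stress_level": 85.0},
)
missing = check_cond({"key": "missed_deadline", "value": True}, {}, {})
```

As in the verifier, a key that is absent from every state dictionary evaluates to `False` rather than raising, so a condition on an unset flag never triggers.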
|
data/before_after_comparison.json
ADDED
|
@@ -0,0 +1,30 @@
|
| 1 |
+
{
|
| 2 |
+
"summary": {
|
| 3 |
+
"runs": 5,
|
| 4 |
+
"avg_no_memory": 1.13,
|
| 5 |
+
"avg_with_memory": 2.45,
|
| 6 |
+
"pct_improvement": 116.81,
|
| 7 |
+
"most_common_action_no_memory": "delegate",
|
| 8 |
+
"most_common_action_with_memory": "communicate",
|
| 9 |
+
"comm_usage_no_memory_pct": 40.0,
|
| 10 |
+
"comm_usage_yes_memory_pct": 100.0
|
| 11 |
+
},
|
| 12 |
+
"no_memory": [
|
| 13 |
+
{
|
| 14 |
+
"total_reward": 1.0,
|
| 15 |
+
"first_action": "delegate"
|
| 16 |
+
},
|
| 17 |
+
{
|
| 18 |
+
"total_reward": 1.2
|
| 19 |
+
}
|
| 20 |
+
],
|
| 21 |
+
"with_memory": [
|
| 22 |
+
{
|
| 23 |
+
"total_reward": 2.5,
|
| 24 |
+
"first_action": "communicate"
|
| 25 |
+
},
|
| 26 |
+
{
|
| 27 |
+
"total_reward": 2.4
|
| 28 |
+
}
|
| 29 |
+
]
|
| 30 |
+
}
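The `pct_improvement` figure in the summary is consistent with a plain relative change between the two averages. The formula below is an assumption that happens to reproduce the stored value; the script that generated this file is not shown here. The per-run lists contain only two entries each, so the averages themselves cannot be recomputed from this file.

```python
avg_no_memory = 1.13
avg_with_memory = 2.45

# Relative improvement of the memory-enabled runs over the baseline, in percent.
pct_improvement = round((avg_with_memory - avg_no_memory) / avg_no_memory * 100, 2)
print(pct_improvement)
```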
|
data/conflicts.json
ADDED
|
@@ -0,0 +1,314 @@
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"id": "d1_gym",
|
| 4 |
+
"title": "The Slump",
|
| 5 |
+
"story": "You haven't seen the inside of a gym in ten days. Your energy is flagging and your favorite jeans feel tight.",
|
| 6 |
+
"primary_disruption": {
|
| 7 |
+
"physical_health.fitness": -15.0
|
| 8 |
+
},
|
| 9 |
+
"decisions_required": [
|
| 10 |
+
"Wake up early for a run",
|
| 11 |
+
"Join a weekend boot camp",
|
| 12 |
+
"Ignore it and rest"
|
| 13 |
+
],
|
| 14 |
+
"resource_budget": {
|
| 15 |
+
"time": 4.0,
|
| 16 |
+
"money": 0.0,
|
| 17 |
+
"energy": 20.0
|
| 18 |
+
},
|
| 19 |
+
"difficulty": 1
|
| 20 |
+
},
|
| 21 |
+
{
|
| 22 |
+
"id": "d1_bill",
|
| 23 |
+
"title": "Forgotten Invoice",
|
| 24 |
+
"story": "A late notice arrived for your electricity bill. It's not a lot, but the late fee is annoying.",
|
| 25 |
+
"primary_disruption": {
|
| 26 |
+
"finances.liquidity": -20.0
|
| 27 |
+
},
|
| 28 |
+
"decisions_required": [
|
| 29 |
+
"Pay it now",
|
| 30 |
+
"Call to dispute the fee",
|
| 31 |
+
"Set up autopay for next time"
|
| 32 |
+
],
|
| 33 |
+
"resource_budget": {
|
| 34 |
+
"time": 1.0,
|
| 35 |
+
"money": 100.0,
|
| 36 |
+
"energy": 5.0
|
| 37 |
+
},
|
| 38 |
+
"difficulty": 1
|
| 39 |
+
},
|
| 40 |
+
{
|
| 41 |
+
"id": "d1_argument",
|
| 42 |
+
"title": "Heated Group Chat",
|
| 43 |
+
"story": "A minor political disagreement in the group chat turned personal. Everyone is being quiet now.",
|
| 44 |
+
"primary_disruption": {
|
| 45 |
+
"relationships.social": -20.0
|
| 46 |
+
},
|
| 47 |
+
"decisions_required": [
|
| 48 |
+
"Apologize to the group",
|
| 49 |
+
"Message the friend privately",
|
| 50 |
+
"Mute the chat for a week"
|
| 51 |
+
],
|
| 52 |
+
"resource_budget": {
|
| 53 |
+
"time": 2.0,
|
| 54 |
+
"money": 30.0,
|
| 55 |
+
"energy": 15.0
|
| 56 |
+
},
|
| 57 |
+
"difficulty": 1
|
| 58 |
+
},
|
| 59 |
+
{
|
| 60 |
+
"id": "d2_project",
|
| 61 |
+
"title": "The Surge",
|
| 62 |
+
"story": "Your boss just walked by and dropped a 'small favor' on your desk. It looks like it'll take ten hours.",
|
| 63 |
+
"primary_disruption": {
|
| 64 |
+
"career.workload": 25.0,
|
| 65 |
+
"time.free_hours_per_week": -20.0
|
| 66 |
+
},
|
| 67 |
+
"decisions_required": [
|
| 68 |
+
"Work late all week",
|
| 69 |
+
"Delegate parts to a junior",
|
| 70 |
+
"Refuse the assignment"
|
| 71 |
+
],
|
| 72 |
+
"resource_budget": {
|
| 73 |
+
"time": 10.0,
|
| 74 |
+
"money": 0.0,
|
| 75 |
+
"energy": 40.0
|
| 76 |
+
},
|
| 77 |
+
"difficulty": 2
|
| 78 |
+
},
|
| 79 |
+
{
|
| 80 |
+
"id": "d2_car",
|
| 81 |
+
"title": "Check Engine Light",
|
| 82 |
+
"story": "Your car started making a rhythmic thumping sound on the highway. The mechanic says the repair isn't cheap.",
|
| 83 |
+
"primary_disruption": {
|
| 84 |
+
"finances.liquidity": -30.0,
|
| 85 |
+
"time.commute_burden": 25.0
|
| 86 |
+
},
|
| 87 |
+
"decisions_required": [
|
| 88 |
+
"Repair it immediately",
|
| 89 |
+
"Take the bus for a week",
|
| 90 |
+
"Borrow a car from a friend"
|
| 91 |
+
],
|
| 92 |
+
"resource_budget": {
|
| 93 |
+
"time": 5.0,
|
| 94 |
+
"money": 500.0,
|
| 95 |
+
"energy": 10.0
|
| 96 |
+
},
|
| 97 |
+
"difficulty": 2
|
| 98 |
+
},
|
| 99 |
+
{
|
| 100 |
+
"id": "d2_neglect",
|
| 101 |
+
"title": "Cold Dinner",
|
| 102 |
+
"story": "Your partner mentions they feel like 'roommates' lately. You realize you haven't had a real conversation in weeks.",
|
| 103 |
+
"primary_disruption": {
|
| 104 |
+
"relationships.romantic": -25.0,
|
| 105 |
+
"mental_wellbeing.stress_level": 20.0
|
| 106 |
+
},
|
| 107 |
+
"decisions_required": [
|
| 108 |
+
"Plan a surprise date",
|
| 109 |
+
"Have a long talk tonight",
|
| 110 |
+
"Buy a thoughtful gift"
|
| 111 |
+
],
|
| 112 |
+
"resource_budget": {
|
| 113 |
+
"time": 6.0,
|
| 114 |
+
"money": 150.0,
|
| 115 |
+
"energy": 30.0
|
| 116 |
+
},
|
| 117 |
+
"difficulty": 2
|
| 118 |
+
},
|
| 119 |
+
{
|
| 120 |
+
"id": "d3_interview",
|
| 121 |
+
"title": "The Opportunity",
|
| 122 |
+
"story": "An old contact reached out for a dream job interview. You need to prep while keeping your current job afloat.",
|
| 123 |
+
"primary_disruption": {
|
| 124 |
+
"career.workload": 20.0,
|
| 125 |
+
"time.free_hours_per_week": -15.0,
|
| 126 |
+
"mental_wellbeing.stress_level": 20.0
|
| 127 |
+
},
|
| 128 |
+
"decisions_required": [
|
| 129 |
+
"Intensive weekend prep",
|
| 130 |
+
"Fake a sick day to interview",
|
| 131 |
+
"Turn it down to stay stable"
|
| 132 |
+
],
|
| 133 |
+
"resource_budget": {
|
| 134 |
+
"time": 12.0,
|
| 135 |
+
"money": 50.0,
|
| 136 |
+
"energy": 50.0
|
| 137 |
+
},
|
| 138 |
+
"difficulty": 3
|
| 139 |
+
},
|
| 140 |
+
{
|
| 141 |
+
"id": "d3_family",
|
| 142 |
+
"title": "Family SOS",
|
| 143 |
+
"story": "Your sibling is going through a rough patch and needs help moving out and some financial support.",
|
| 144 |
+
"primary_disruption": {
|
| 145 |
+
"relationships.family": 20.0,
|
| 146 |
+
"time.free_hours_per_week": -25.0,
|
| 147 |
+
"finances.liquidity": -20.0
|
| 148 |
+
},
|
| 149 |
+
"decisions_required": [
|
| 150 |
+
"Spend the weekend helping",
|
| 151 |
+
"Send them money but stay home",
|
| 152 |
+
"Help them find other movers"
|
| 153 |
+
],
|
| 154 |
+
"resource_budget": {
|
| 155 |
+
"time": 15.0,
|
| 156 |
+
"money": 400.0,
|
| 157 |
+
"energy": 60.0
|
| 158 |
+
},
|
| 159 |
+
"difficulty": 3
|
| 160 |
+
},
|
| 161 |
+
{
|
| 162 |
+
"id": "d3_health",
|
| 163 |
+
"title": "The Warning Sign",
|
| 164 |
+
"story": "You had a fainting spell at the office. Tests are expensive, and doctors say you need immediate change.",
|
| 165 |
+
"primary_disruption": {
|
| 166 |
+
"physical_health.energy": -30.0,
|
| 167 |
+
"mental_wellbeing.stress_level": 30.0,
|
| 168 |
+
"finances.liquidity": -40.0
|
| 169 |
+
},
|
| 170 |
+
"decisions_required": [
|
| 171 |
+
"Take a week of medical leave",
|
| 172 |
+
"Consult a high-end specialist",
|
| 173 |
+
"Change diet and sleep habits"
|
| 174 |
+
],
|
| 175 |
+
"resource_budget": {
|
| 176 |
+
"time": 20.0,
|
| 177 |
+
"money": 800.0,
|
| 178 |
+
"energy": 5.0
|
| 179 |
+
},
|
| 180 |
+
"difficulty": 3
|
| 181 |
+
},
|
| 182 |
+
{
|
| 183 |
+
"id": "d4_review",
|
| 184 |
+
"title": "Judgment Day",
|
| 185 |
+
"story": "A major performance review is in three days. Rumors of layoffs are circulating and the atmosphere is tense.",
|
| 186 |
+
"primary_disruption": {
|
| 187 |
+
"career.workload": 30.0,
|
| 188 |
+
"mental_wellbeing.stress_level": 25.0,
|
| 189 |
+
"relationships.romantic": -15.0,
|
| 190 |
+
"time.free_hours_per_week": -20.0
|
| 191 |
+
},
|
| 192 |
+
"decisions_required": [
|
| 193 |
+
"Pull all-nighters to prove worth",
|
| 194 |
+
"Start networking for new roles",
|
| 195 |
+
"Draft a defensive report"
|
| 196 |
+
],
|
| 197 |
+
"resource_budget": {
|
| 198 |
+
"time": 18.0,
|
| 199 |
+
"money": 0.0,
|
| 200 |
+
"energy": 80.0
|
| 201 |
+
},
|
| 202 |
+
"difficulty": 4
|
| 203 |
+
},
|
| 204 |
+
{
|
| 205 |
+
"id": "d4_move",
|
| 206 |
+
"title": "The Big Relocation",
|
| 207 |
+
"story": "You've decided to move across the country for growth. The logistics are a nightmare and friends are sad to see you go.",
|
| 208 |
+
"primary_disruption": {
|
| 209 |
+
"finances.liquidity": -50.0,
|
| 210 |
+
"relationships.social": -30.0,
|
| 211 |
+
"career.growth_trajectory": 20.0,
|
| 212 |
+
"time.admin_overhead": 30.0
|
| 213 |
+
},
|
| 214 |
+
"decisions_required": [
|
| 215 |
+
"Hire full-service movers",
|
| 216 |
+
"Host a series of farewell dinners",
|
| 217 |
+
"DIY pack everything"
|
| 218 |
+
],
|
| 219 |
+
"resource_budget": {
|
| 220 |
+
"time": 30.0,
|
| 221 |
+
"money": 1500.0,
|
| 222 |
+
"energy": 100.0
|
| 223 |
+
},
|
| 224 |
+
"difficulty": 4
|
| 225 |
+
},
|
| 226 |
+
{
|
| 227 |
+
"id": "d4_audit",
|
| 228 |
+
"title": "Tax Audit",
|
| 229 |
+
"story": "The IRS has flagged your last three years of returns. You need to dig through thousands of documents while paying a CPA.",
|
| 230 |
+
"primary_disruption": {
|
| 231 |
+
"finances.long_term_health": -20.0,
|
| 232 |
+
"mental_wellbeing.stress_level": 30.0,
|
| 233 |
+
"time.admin_overhead": 40.0,
|
| 234 |
+
"finances.liquidity": -15.0
|
| 235 |
+
},
|
| 236 |
+
"decisions_required": [
|
| 237 |
+
"Spend nights scanning receipts",
|
| 238 |
+
"Hire a tax lawyer",
|
| 239 |
+
"Try to settle immediately"
|
| 240 |
+
],
|
| 241 |
+
"resource_budget": {
|
| 242 |
+
"time": 25.0,
|
| 243 |
+
"money": 1000.0,
|
| 244 |
+
"energy": 40.0
|
| 245 |
+
},
|
| 246 |
+
"difficulty": 4
|
| 247 |
+
},
|
| 248 |
+
{
|
| 249 |
+
"id": "d5_friday",
|
| 250 |
+
"title": "Friday 6PM",
|
| 251 |
+
"story": "Your flight just got cancelled. Your card declined trying to rebook. Your boss moved Monday deadline to Sunday.",
|
| 252 |
+
"primary_disruption": {
|
| 253 |
+
"career.workload": 35.0,
|
| 254 |
+
"finances.liquidity": -40.0,
|
| 255 |
+
"mental_wellbeing.stress_level": 30.0,
|
| 256 |
+
"time.free_hours_per_week": -25.0
|
| 257 |
+
},
|
| 258 |
+
"decisions_required": [
|
| 259 |
+
"Book a bus and work on it",
|
| 260 |
+
"Call boss to negotiate",
|
| 261 |
+
"Crash at a nearby friend's"
|
| 262 |
+
],
|
| 263 |
+
"resource_budget": {
|
| 264 |
+
"time": 10.0,
|
| 265 |
+
"money": 500.0,
|
| 266 |
+
"energy": 60.0
|
| 267 |
+
},
|
| 268 |
+
"difficulty": 5
|
| 269 |
+
},
|
| 270 |
+
{
|
| 271 |
+
"id": "d5_storm",
|
| 272 |
+
"title": "The Perfect Storm",
|
| 273 |
+
"story": "Your firm lost its biggest client, your partner moved out, and your car got towed\u2014all on the same Tuesday.",
|
| 274 |
+
"primary_disruption": {
|
| 275 |
+
"career.stability": -30.0,
|
| 276 |
+
"relationships.romantic": -25.0,
|
| 277 |
+
"finances.debt_pressure": 35.0,
|
| 278 |
+
"physical_health.energy": -25.0
|
| 279 |
+
},
|
| 280 |
+
"decisions_required": [
|
| 281 |
+
"Find an emergency side hustle",
|
| 282 |
+
"Beg partner for a second chance",
|
| 283 |
+
"Take a mental health day"
|
| 284 |
+
],
|
| 285 |
+
"resource_budget": {
|
| 286 |
+
"time": 8.0,
|
| 287 |
+
"money": 200.0,
|
| 288 |
+
"energy": 20.0
|
| 289 |
+
},
|
| 290 |
+
"difficulty": 5
|
| 291 |
+
},
|
| 292 |
+
{
|
| 293 |
+
"id": "d5_burnout",
|
| 294 |
+
"title": "The Total Collapse",
|
| 295 |
+
"story": "You can't get out of bed. Your body has quit, your motivation is gone, and work emails are piling into the hundreds.",
|
| 296 |
+
"primary_disruption": {
|
| 297 |
+
"mental_wellbeing.motivation": -40.0,
|
| 298 |
+
"physical_health.sleep_quality": -30.0,
|
| 299 |
+
"career.satisfaction": -35.0,
|
| 300 |
+
"relationships.family": -20.0
|
| 301 |
+
},
|
| 302 |
+
"decisions_required": [
|
| 303 |
+
"Request indefinite medical leave",
|
| 304 |
+
"Disconnect all electronics",
|
| 305 |
+
"Let it all burn and sleep"
|
| 306 |
+
],
|
| 307 |
+
"resource_budget": {
|
| 308 |
+
"time": 40.0,
|
| 309 |
+
"money": 2000.0,
|
| 310 |
+
"energy": 0.0
|
| 311 |
+
},
|
| 312 |
+
"difficulty": 5
|
| 313 |
+
}
|
| 314 |
+
]
|
data/demo_signals.json
ADDED
|
@@ -0,0 +1,75 @@
|
| 1 |
+
{
|
| 2 |
+
"persona": "Jordan (PM at Series-B startup)",
|
| 3 |
+
"generated_at": "2026-04-25T09:00:00",
|
| 4 |
+
"note": "Pre-baked demo payload - represents a stressed product manager mid-sprint",
|
| 5 |
+
|
| 6 |
+
"gmail": {
|
| 7 |
+
"unread_count": 47,
|
| 8 |
+
"late_night_count": 8,
|
| 9 |
+
"weekend_count": 11,
|
| 10 |
+
"overtime_count": 14,
|
| 11 |
+
"social_activity": 3.2,
|
| 12 |
+
"work_pressure": 8.7,
|
| 13 |
+
"relationship_neglect_risk": 7.4,
|
| 14 |
+
"responsiveness": 2.1,
|
| 15 |
+
"email_overload": 9.4,
|
| 16 |
+
"work_bleeding_personal": 7.2,
|
| 17 |
+
"key_contacts": [
|
| 18 |
+
"priya.shah@acme-ventures.com",
|
| 19 |
+
"cto@startupco.io",
|
| 20 |
+
"hr@startupco.io",
|
| 21 |
+
"mom@gmail.com",
|
| 22 |
+
"alex@cofounder.io"
|
| 23 |
+
],
|
| 24 |
+
"notable_threads": [
|
| 25 |
+
{"subject": "URGENT: Board deck needs rework before Friday", "sender": "cto@startupco.io", "time": "11:47 PM"},
|
| 26 |
+
{"subject": "Re: Q2 roadmap - are we on track?", "sender": "priya.shah@acme-ventures.com", "time": "Saturday 10:12 AM"},
|
| 27 |
+
{"subject": "Have you eaten today?", "sender": "mom@gmail.com", "time": "7:03 PM"}
|
| 28 |
+
],
|
| 29 |
+
"summary": "47 unread. 8 emails sent after 10 PM. Board deck deadline pressure. Investor checking roadmap. Family reaching out."
|
| 30 |
+
},
|
| 31 |
+
|
| 32 |
+
"calendar": {
|
| 33 |
+
"week_occupancy_pct": 91,
|
| 34 |
+
"days_with_no_breaks": 4,
|
| 35 |
+
"avg_meeting_hours_per_day": 6.2,
|
| 36 |
+
"focus_blocks_count": 0,
|
| 37 |
+
"upcoming_deadlines": [
|
| 38 |
+
{"title": "Board Deck Final Draft", "due_in_hours": 38, "priority": "critical"},
|
| 39 |
+
{"title": "Sprint Review with Engineering", "due_in_hours": 52, "priority": "high"},
|
| 40 |
+
{"title": "Investor 1:1 (Priya Shah)", "due_in_hours": 72, "priority": "high"}
|
| 41 |
+
],
|
| 42 |
+
"back_to_back_blocks": 3,
|
| 43 |
+
"personal_events_this_week": 1,
|
| 44 |
+
"cancelled_personal_events": 2,
|
| 45 |
+
"summary": "91% of working hours booked. Zero deep-work blocks. Board deck in 38h. 3 back-to-back meeting chains. 2 personal events cancelled this week."
|
| 46 |
+
},
|
| 47 |
+
|
| 48 |
+
"fitness": {
|
| 49 |
+
"avg_sleep_hours": 5.3,
|
| 50 |
+
"sleep_quality_score": 38,
|
| 51 |
+
"resting_heart_rate": 82,
|
| 52 |
+
"hrv_score": 24,
|
| 53 |
+
"daily_steps_avg": 2800,
|
| 54 |
+
"active_minutes_avg": 9,
|
| 55 |
+
"stress_score": 78,
|
| 56 |
+
"recovery_score": 31,
|
| 57 |
+
"last_workout_days_ago": 9,
|
| 58 |
+
"summary": "5.3h sleep avg. Resting HR 82 bpm (elevated). HRV 24 (low - high stress load). 2,800 steps/day. Last workout 9 days ago."
|
| 59 |
+
},
|
| 60 |
+
|
| 61 |
+
"derived_metric_deltas": {
|
| 62 |
+
"career.workload": 28.0,
|
| 63 |
+
"mental_wellbeing.stress_level": 32.0,
|
| 64 |
+
"mental_wellbeing.focus_quality": -25.0,
|
| 65 |
+
"mental_wellbeing.emotional_regulation": -18.0,
|
| 66 |
+
"physical_health.sleep_quality": -30.0,
|
| 67 |
+
"physical_health.energy_level": -22.0,
|
| 68 |
+
"physical_health.exercise_consistency": -35.0,
|
| 69 |
+
"time.free_hours_per_week": -18.0,
|
| 70 |
+
"time.schedule_control": -24.0,
|
| 71 |
+
"relationships.romantic": -15.0,
|
| 72 |
+
"relationships.family": -12.0,
|
| 73 |
+
"finances.liquidity": 0.0
|
| 74 |
+
}
|
| 75 |
+
}
|
data/holdout_tasks.json
ADDED
|
@@ -0,0 +1,12 @@
|
| 1 |
+
[
|
| 2 |
+
{"id": "holdout_0", "seed": 9000, "domain": "flight_crisis"},
|
| 3 |
+
{"id": "holdout_1", "seed": 9001, "domain": "flight_crisis"},
|
| 4 |
+
{"id": "holdout_2", "seed": 9002, "domain": "code_merge_crisis"},
|
| 5 |
+
{"id": "holdout_3", "seed": 9003, "domain": "flight_crisis"},
|
| 6 |
+
{"id": "holdout_4", "seed": 9004, "domain": "code_merge_crisis"},
|
| 7 |
+
{"id": "holdout_5", "seed": 9005, "domain": "flight_crisis"},
|
| 8 |
+
{"id": "holdout_6", "seed": 9006, "domain": "code_merge_crisis"},
|
| 9 |
+
{"id": "holdout_7", "seed": 9007, "domain": "flight_crisis"},
|
| 10 |
+
{"id": "holdout_8", "seed": 9008, "domain": "code_merge_crisis"},
|
| 11 |
+
{"id": "holdout_9", "seed": 9009, "domain": "flight_crisis"}
|
| 12 |
+
]
|
data/reward_curve.png
ADDED
|
data/simperson_profiles.json
ADDED
|
@@ -0,0 +1,42 @@
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"name": "Alex (High-Stress Executive)",
|
| 4 |
+
"openness": 0.4,
|
| 5 |
+
"conscientiousness": 0.9,
|
| 6 |
+
"extraversion": 0.7,
|
| 7 |
+
"agreeableness": 0.25,
|
| 8 |
+
"neuroticism": 0.8
|
| 9 |
+
},
|
| 10 |
+
{
|
| 11 |
+
"name": "Chloe (Laid-Back Creative)",
|
| 12 |
+
"openness": 0.9,
|
| 13 |
+
"conscientiousness": 0.2,
|
| 14 |
+
"extraversion": 0.5,
|
| 15 |
+
"agreeableness": 0.7,
|
| 16 |
+
"neuroticism": 0.15
|
| 17 |
+
},
|
| 18 |
+
{
|
| 19 |
+
"name": "Sam (Anxious Introvert)",
|
| 20 |
+
"openness": 0.5,
|
| 21 |
+
"conscientiousness": 0.6,
|
| 22 |
+
"extraversion": 0.1,
|
| 23 |
+
"agreeableness": 0.65,
|
| 24 |
+
"neuroticism": 0.9
|
| 25 |
+
},
|
| 26 |
+
{
|
| 27 |
+
"name": "Maya (Balanced Family Person)",
|
| 28 |
+
"openness": 0.5,
|
| 29 |
+
"conscientiousness": 0.7,
|
| 30 |
+
"extraversion": 0.5,
|
| 31 |
+
"agreeableness": 0.95,
|
| 32 |
+
"neuroticism": 0.3
|
| 33 |
+
},
|
| 34 |
+
{
|
| 35 |
+
"name": "Leo (Ambitious Student)",
|
| 36 |
+
"openness": 0.85,
|
| 37 |
+
"conscientiousness": 0.8,
|
| 38 |
+
"extraversion": 0.4,
|
| 39 |
+
"agreeableness": 0.4,
|
| 40 |
+
"neuroticism": 0.55
|
| 41 |
+
}
|
| 42 |
+
]
|
data/training_log.json
ADDED
|
@@ -0,0 +1,526 @@
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"episode": 1,
|
| 4 |
+
"reward": 1.6325,
|
| 5 |
+
"difficulty": 1,
|
| 6 |
+
"person": "Leo (Student)",
|
| 7 |
+
"conflicts_seen": [
|
| 8 |
+
"Forgotten Invoice"
|
| 9 |
+
],
|
| 10 |
+
"steps": 5
|
| 11 |
+
},
|
| 12 |
+
{
|
| 13 |
+
"episode": 2,
|
| 14 |
+
"reward": 1.7879,
|
| 15 |
+
"difficulty": 2,
|
| 16 |
+
"person": "Chloe (Creative)",
|
| 17 |
+
"conflicts_seen": [
|
| 18 |
+
"The Surge",
|
| 19 |
+
"ESCALATED: The Surge"
|
| 20 |
+
],
|
| 21 |
+
"steps": 5
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"episode": 3,
|
| 25 |
+
"reward": 2.5763,
|
| 26 |
+
"difficulty": 1,
|
| 27 |
+
"person": "Chloe (Creative)",
|
| 28 |
+
"conflicts_seen": [
|
| 29 |
+
"Heated Group Chat",
|
| 30 |
+
"ESCALATED: Heated Group Chat"
|
| 31 |
+
],
|
| 32 |
+
"steps": 5
|
| 33 |
+
},
|
| 34 |
+
{
|
| 35 |
+
"episode": 4,
|
| 36 |
+
"reward": 2.5755,
|
| 37 |
+
"difficulty": 1,
|
| 38 |
+
"person": "Leo (Student)",
|
| 39 |
+
"conflicts_seen": [
|
| 40 |
+
"Heated Group Chat"
|
| 41 |
+
],
|
| 42 |
+
"steps": 5
|
| 43 |
+
},
|
| 44 |
+
{
|
| 45 |
+
"episode": 5,
|
| 46 |
+
"reward": 2.5754,
|
| 47 |
+
"difficulty": 1,
|
| 48 |
+
"person": "Alex (Executive)",
|
| 49 |
+
"conflicts_seen": [
|
| 50 |
+
"Heated Group Chat"
|
| 51 |
+
],
|
| 52 |
+
"steps": 5
|
| 53 |
+
},
|
| 54 |
+
{
|
| 55 |
+
"episode": 6,
|
| 56 |
+
"reward": 2.5402,
|
| 57 |
+
"difficulty": 2,
|
| 58 |
+
"person": "Leo (Student)",
|
| 59 |
+
"conflicts_seen": [
|
| 60 |
+
"Cold Dinner",
|
| 61 |
+
"ESCALATED: Cold Dinner"
|
| 62 |
+
],
|
| 63 |
+
"steps": 5
|
| 64 |
+
},
|
| 65 |
+
{
|
| 66 |
+
"episode": 7,
|
| 67 |
+
"reward": 2.5793,
|
| 68 |
+
"difficulty": 1,
|
| 69 |
+
"person": "Sam (Introvert)",
|
| 70 |
+
"conflicts_seen": [
|
| 71 |
+
"The Slump"
|
| 72 |
+
],
|
| 73 |
+
"steps": 5
|
| 74 |
+
},
|
| 75 |
+
{
|
| 76 |
+
"episode": 8,
|
| 77 |
+
"reward": 2.5574,
|
| 78 |
+
"difficulty": 2,
|
| 79 |
+
"person": "Maya (Family)",
|
| 80 |
+
"conflicts_seen": [
|
| 81 |
+
"Cold Dinner",
|
| 82 |
+
"ESCALATED: Cold Dinner"
|
| 83 |
+
],
|
| 84 |
+
"steps": 5
|
| 85 |
+
},
|
| 86 |
+
{
|
| 87 |
+
"episode": 9,
|
| 88 |
+
"reward": 2.5277,
|
| 89 |
+
"difficulty": 2,
|
| 90 |
+
"person": "Sam (Introvert)",
|
| 91 |
+
"conflicts_seen": [
|
| 92 |
+
"The Surge"
|
| 93 |
+
],
|
| 94 |
+
"steps": 5
|
| 95 |
+
},
|
| 96 |
+
{
|
| 97 |
+
"episode": 10,
|
| 98 |
+
"reward": 2.4812,
|
| 99 |
+
"difficulty": 2,
|
| 100 |
+
"person": "Alex (Executive)",
|
| 101 |
+
"conflicts_seen": [
|
| 102 |
+
"Check Engine Light",
|
| 103 |
+
"ESCALATED: Check Engine Light"
|
| 104 |
+
],
|
| 105 |
+
"steps": 5
|
| 106 |
+
},
|
| 107 |
+
{
|
| 108 |
+
"episode": 11,
|
| 109 |
+
"reward": 2.4932,
|
| 110 |
+
"difficulty": 2,
|
| 111 |
+
"person": "Leo (Student)",
|
| 112 |
+
"conflicts_seen": [
|
| 113 |
+
"Check Engine Light"
|
| 114 |
+
],
|
| 115 |
+
"steps": 5
|
| 116 |
+
},
|
| 117 |
+
{
|
| 118 |
+
"episode": 12,
|
| 119 |
+
"reward": 2.5473,
|
| 120 |
+
"difficulty": 2,
|
| 121 |
+
"person": "Leo (Student)",
|
| 122 |
+
"conflicts_seen": [
|
| 123 |
+
"The Surge",
|
| 124 |
+
"ESCALATED: The Surge"
|
| 125 |
+
],
|
| 126 |
+
"steps": 5
|
| 127 |
+
},
|
| 128 |
+
{
|
| 129 |
+
"episode": 13,
|
| 130 |
+
"reward": 2.5707,
|
| 131 |
+
"difficulty": 1,
|
| 132 |
+
"person": "Alex (Executive)",
|
| 133 |
+
"conflicts_seen": [
|
| 134 |
+
"The Slump",
|
| 135 |
+
"ESCALATED: The Slump"
|
| 136 |
+
],
|
| 137 |
+
"steps": 5
|
| 138 |
+
},
|
| 139 |
+
{
|
| 140 |
+
"episode": 14,
|
| 141 |
+
"reward": 2.5507,
|
| 142 |
+
"difficulty": 1,
|
| 143 |
+
"person": "Chloe (Creative)",
|
| 144 |
+
"conflicts_seen": [
|
| 145 |
+
"Forgotten Invoice",
|
| 146 |
+
"ESCALATED: Forgotten Invoice"
|
| 147 |
+
],
|
| 148 |
+
"steps": 5
|
| 149 |
+
},
|
| 150 |
+
{
|
| 151 |
+
"episode": 15,
|
| 152 |
+
"reward": 2.572,
|
| 153 |
+
"difficulty": 1,
|
| 154 |
+
"person": "Alex (Executive)",
|
| 155 |
+
"conflicts_seen": [
|
| 156 |
+
"Heated Group Chat"
|
| 157 |
+
],
|
| 158 |
+
"steps": 5
|
| 159 |
+
},
|
| 160 |
+
{
|
| 161 |
+
"episode": 16,
|
| 162 |
+
"reward": 2.5534,
|
| 163 |
+
"difficulty": 3,
|
| 164 |
+
"person": "Alex (Executive)",
|
| 165 |
+
"conflicts_seen": [
|
| 166 |
+
"The Opportunity"
|
| 167 |
+
],
|
| 168 |
+
"steps": 5
|
| 169 |
+
},
|
| 170 |
+
{
|
| 171 |
+
"episode": 17,
|
| 172 |
+
"reward": 2.5396,
|
| 173 |
+
"difficulty": 3,
|
| 174 |
+
"person": "Leo (Student)",
|
| 175 |
+
"conflicts_seen": [
|
| 176 |
+
"Family SOS"
|
| 177 |
+
],
|
| 178 |
+
"steps": 5
|
| 179 |
+
},
|
| 180 |
+
{
|
| 181 |
+
"episode": 18,
|
| 182 |
+
"reward": 2.5572,
|
| 183 |
+
"difficulty": 2,
|
| 184 |
+
"person": "Alex (Executive)",
|
| 185 |
+
"conflicts_seen": [
|
| 186 |
+
"Cold Dinner",
|
| 187 |
+
"ESCALATED: Cold Dinner"
|
| 188 |
+
],
|
| 189 |
+
"steps": 5
|
| 190 |
+
},
|
| 191 |
+
{
|
| 192 |
+
"episode": 19,
|
| 193 |
+
"reward": 2.5503,
|
| 194 |
+
"difficulty": 3,
|
| 195 |
+
"person": "Maya (Family)",
|
| 196 |
+
"conflicts_seen": [
|
| 197 |
+
"The Warning Sign",
|
| 198 |
+
"ESCALATED: The Warning Sign"
|
| 199 |
+
],
|
| 200 |
+
"steps": 5
|
| 201 |
+
},
|
| 202 |
+
{
|
| 203 |
+
"episode": 20,
|
| 204 |
+
"reward": 2.5437,
|
| 205 |
+
"difficulty": 3,
|
| 206 |
+
"person": "Maya (Family)",
|
| 207 |
+
"conflicts_seen": [
|
| 208 |
+
"The Warning Sign",
|
| 209 |
+
"ESCALATED: The Warning Sign"
|
| 210 |
+
],
|
| 211 |
+
"steps": 5
|
| 212 |
+
},
|
| 213 |
+
{
|
| 214 |
+
"episode": 21,
|
| 215 |
+
"reward": 2.5045,
|
| 216 |
+
"difficulty": 2,
|
| 217 |
+
"person": "Alex (Executive)",
|
| 218 |
+
"conflicts_seen": [
|
| 219 |
+
"Check Engine Light"
|
| 220 |
+
],
|
| 221 |
+
"steps": 5
|
| 222 |
+
},
|
| 223 |
+
{
|
| 224 |
+
"episode": 22,
|
| 225 |
+
"reward": 2.5447,
|
| 226 |
+
"difficulty": 2,
|
| 227 |
+
"person": "Maya (Family)",
|
| 228 |
+
"conflicts_seen": [
|
| 229 |
+
"Cold Dinner",
|
| 230 |
+
"ESCALATED: Cold Dinner"
|
| 231 |
+
],
|
| 232 |
+
"steps": 5
|
| 233 |
+
},
|
| 234 |
+
{
|
| 235 |
+
"episode": 23,
|
| 236 |
+
"reward": 2.5427,
|
| 237 |
+
"difficulty": 3,
|
| 238 |
+
"person": "Leo (Student)",
|
| 239 |
+
"conflicts_seen": [
|
| 240 |
+
"Family SOS"
|
| 241 |
+
],
|
| 242 |
+
"steps": 5
|
| 243 |
+
},
|
| 244 |
+
{
|
| 245 |
+
"episode": 24,
|
| 246 |
+
"reward": 2.534,
|
| 247 |
+
"difficulty": 2,
|
| 248 |
+
"person": "Alex (Executive)",
|
| 249 |
+
"conflicts_seen": [
|
| 250 |
+
"The Surge",
|
| 251 |
+
"ESCALATED: The Surge"
|
| 252 |
+
],
|
| 253 |
+
"steps": 5
|
| 254 |
+
},
|
| 255 |
+
{
|
| 256 |
+
"episode": 25,
|
| 257 |
+
"reward": 2.5273,
|
| 258 |
+
"difficulty": 2,
|
| 259 |
+
"person": "Alex (Executive)",
|
| 260 |
+
"conflicts_seen": [
|
| 261 |
+
"The Surge"
|
| 262 |
+
],
|
| 263 |
+
"steps": 5
|
| 264 |
+
},
|
| 265 |
+
{
|
| 266 |
+
"episode": 26,
|
| 267 |
+
"reward": 2.5436,
|
| 268 |
+
"difficulty": 3,
|
| 269 |
+
"person": "Maya (Family)",
|
| 270 |
+
"conflicts_seen": [
|
| 271 |
+
"The Warning Sign"
|
| 272 |
+
],
|
| 273 |
+
"steps": 5
|
| 274 |
+
},
|
| 275 |
+
{
|
| 276 |
+
"episode": 27,
|
| 277 |
+
"reward": 2.5452,
|
| 278 |
+
"difficulty": 3,
|
| 279 |
+
"person": "Maya (Family)",
|
| 280 |
+
"conflicts_seen": [
|
| 281 |
+
"The Opportunity",
|
| 282 |
+
"ESCALATED: The Opportunity"
|
| 283 |
+
],
|
| 284 |
+
"steps": 5
|
| 285 |
+
},
|
| 286 |
+
{
|
| 287 |
+
"episode": 28,
|
| 288 |
+
"reward": 2.5287,
|
| 289 |
+
"difficulty": 2,
|
| 290 |
+
"person": "Chloe (Creative)",
|
| 291 |
+
"conflicts_seen": [
|
| 292 |
+
"The Surge",
|
| 293 |
+
"ESCALATED: The Surge"
|
| 294 |
+
],
|
| 295 |
+
"steps": 5
|
| 296 |
+
},
|
| 297 |
+
{
|
| 298 |
+
"episode": 29,
|
| 299 |
+
"reward": 2.4947,
|
| 300 |
+
"difficulty": 2,
|
| 301 |
+
"person": "Alex (Executive)",
|
| 302 |
+
"conflicts_seen": [
|
| 303 |
+
"Check Engine Light",
|
| 304 |
+
"ESCALATED: Check Engine Light"
|
| 305 |
+
],
|
| 306 |
+
"steps": 5
|
| 307 |
+
},
|
| 308 |
+
{
|
| 309 |
+
"episode": 30,
|
| 310 |
+
"reward": 2.5534,
|
| 311 |
+
"difficulty": 2,
|
| 312 |
+
"person": "Sam (Introvert)",
|
| 313 |
+
"conflicts_seen": [
|
| 314 |
+
"Cold Dinner"
|
| 315 |
+
],
|
| 316 |
+
"steps": 5
|
| 317 |
+
},
|
| 318 |
+
{
|
| 319 |
+
"episode": 31,
|
| 320 |
+
"reward": 2.5459,
|
| 321 |
+
"difficulty": 2,
|
| 322 |
+
"person": "Chloe (Creative)",
|
| 323 |
+
"conflicts_seen": [
|
| 324 |
+
"Cold Dinner"
|
| 325 |
+
],
|
| 326 |
+
"steps": 5
|
| 327 |
+
},
|
| 328 |
+
{
|
| 329 |
+
"episode": 32,
|
| 330 |
+
"reward": 2.4748,
|
| 331 |
+
"difficulty": 2,
|
| 332 |
+
"person": "Chloe (Creative)",
|
| 333 |
+
"conflicts_seen": [
|
| 334 |
+
"The Surge"
|
| 335 |
+
],
|
| 336 |
+
"steps": 5
|
| 337 |
+
},
|
| 338 |
+
{
|
| 339 |
+
"episode": 33,
|
| 340 |
+
"reward": 2.5597,
|
| 341 |
+
"difficulty": 2,
|
| 342 |
+
"person": "Chloe (Creative)",
|
| 343 |
+
"conflicts_seen": [
|
| 344 |
+
"Cold Dinner",
|
| 345 |
+
"ESCALATED: Cold Dinner"
|
| 346 |
+
],
|
| 347 |
+
"steps": 5
|
| 348 |
+
},
|
| 349 |
+
{
|
| 350 |
+
"episode": 34,
|
| 351 |
+
"reward": 2.4873,
|
| 352 |
+
"difficulty": 2,
|
| 353 |
+
"person": "Sam (Introvert)",
|
| 354 |
+
"conflicts_seen": [
|
| 355 |
+
"Check Engine Light",
|
| 356 |
+
"ESCALATED: Check Engine Light"
|
| 357 |
+
],
|
| 358 |
+
"steps": 5
|
| 359 |
+
},
|
| 360 |
+
{
|
| 361 |
+
"episode": 35,
|
| 362 |
+
"reward": 2.5366,
|
| 363 |
+
"difficulty": 3,
|
| 364 |
+
"person": "Leo (Student)",
|
| 365 |
+
"conflicts_seen": [
|
| 366 |
+
"Family SOS"
|
| 367 |
+
],
|
| 368 |
+
"steps": 5
|
| 369 |
+
},
|
| 370 |
+
{
|
| 371 |
+
"episode": 36,
|
| 372 |
+
"reward": 2.5337,
|
| 373 |
+
"difficulty": 3,
|
| 374 |
+
"person": "Maya (Family)",
|
| 375 |
+
"conflicts_seen": [
|
| 376 |
+
"The Opportunity"
|
| 377 |
+
],
|
| 378 |
+
"steps": 5
|
| 379 |
+
},
|
| 380 |
+
{
|
| 381 |
+
"episode": 37,
|
| 382 |
+
"reward": 2.5552,
|
| 383 |
+
"difficulty": 4,
|
| 384 |
+
"person": "Leo (Student)",
|
| 385 |
+
"conflicts_seen": [
|
| 386 |
+
"The Big Relocation",
|
| 387 |
+
"ESCALATED: The Big Relocation"
|
| 388 |
+
],
|
| 389 |
+
"steps": 5
|
| 390 |
+
},
|
| 391 |
+
{
|
| 392 |
+
"episode": 38,
|
| 393 |
+
"reward": 2.4982,
|
| 394 |
+
"difficulty": 3,
|
| 395 |
+
"person": "Chloe (Creative)",
|
| 396 |
+
"conflicts_seen": [
|
| 397 |
+
"Family SOS",
|
| 398 |
+
"ESCALATED: Family SOS"
|
| 399 |
+
],
|
| 400 |
+
"steps": 5
|
| 401 |
+
},
|
| 402 |
+
{
|
| 403 |
+
"episode": 39,
|
| 404 |
+
"reward": 2.4741,
|
| 405 |
+
"difficulty": 4,
|
| 406 |
+
"person": "Sam (Introvert)",
|
| 407 |
+
"conflicts_seen": [
|
| 408 |
+
"Judgment Day",
|
| 409 |
+
"ESCALATED: Judgment Day"
|
| 410 |
+
],
|
| 411 |
+
"steps": 5
|
| 412 |
+
},
|
| 413 |
+
{
|
| 414 |
+
"episode": 40,
|
| 415 |
+
"reward": 2.5425,
|
| 416 |
+
"difficulty": 3,
|
| 417 |
+
"person": "Maya (Family)",
|
| 418 |
+
"conflicts_seen": [
|
| 419 |
+
"The Opportunity"
|
| 420 |
+
],
|
| 421 |
+
"steps": 5
|
| 422 |
+
},
|
| 423 |
+
{
|
| 424 |
+
"episode": 41,
|
| 425 |
+
"reward": 2.5203,
|
| 426 |
+
"difficulty": 3,
|
| 427 |
+
"person": "Alex (Executive)",
|
| 428 |
+
"conflicts_seen": [
|
| 429 |
+
"Family SOS",
|
| 430 |
+
"ESCALATED: Family SOS"
|
| 431 |
+
],
|
| 432 |
+
"steps": 5
|
| 433 |
+
},
|
| 434 |
+
{
|
| 435 |
+
"episode": 42,
|
| 436 |
+
"reward": 2.5183,
|
| 437 |
+
"difficulty": 3,
|
| 438 |
+
"person": "Alex (Executive)",
|
| 439 |
+
"conflicts_seen": [
|
| 440 |
+
"Family SOS"
|
| 441 |
+
],
|
| 442 |
+
"steps": 5
|
| 443 |
+
},
|
| 444 |
+
{
|
| 445 |
+
"episode": 43,
|
| 446 |
+
"reward": 2.54,
|
| 447 |
+
"difficulty": 3,
|
| 448 |
+
"person": "Leo (Student)",
|
| 449 |
+
"conflicts_seen": [
|
| 450 |
+
"The Warning Sign"
|
| 451 |
+
],
|
| 452 |
+
"steps": 5
|
| 453 |
+
},
|
| 454 |
+
{
|
| 455 |
+
"episode": 44,
|
| 456 |
+
"reward": 2.5525,
|
| 457 |
+
"difficulty": 3,
|
| 458 |
+
"person": "Leo (Student)",
|
| 459 |
+
"conflicts_seen": [
|
| 460 |
+
"The Warning Sign",
|
| 461 |
+
"ESCALATED: The Warning Sign"
|
| 462 |
+
],
|
| 463 |
+
"steps": 5
|
| 464 |
+
},
|
| 465 |
+
{
|
| 466 |
+
"episode": 45,
|
| 467 |
+
"reward": 1.2349,
|
| 468 |
+
"difficulty": 4,
|
| 469 |
+
"person": "Leo (Student)",
|
| 470 |
+
"conflicts_seen": [
|
| 471 |
+
"Tax Audit"
|
| 472 |
+
],
|
| 473 |
+
"steps": 5
|
| 474 |
+
},
|
| 475 |
+
{
|
| 476 |
+
"episode": 46,
|
| 477 |
+
"reward": 2.497,
|
| 478 |
+
"difficulty": 4,
|
| 479 |
+
"person": "Sam (Introvert)",
|
| 480 |
+
"conflicts_seen": [
|
| 481 |
+
"The Big Relocation"
|
| 482 |
+
],
|
| 483 |
+
"steps": 5
|
| 484 |
+
},
|
| 485 |
+
{
|
| 486 |
+
"episode": 47,
|
| 487 |
+
"reward": 2.5601,
|
| 488 |
+
"difficulty": 4,
|
| 489 |
+
"person": "Maya (Family)",
|
| 490 |
+
"conflicts_seen": [
|
| 491 |
+
"The Big Relocation"
|
| 492 |
+
],
|
| 493 |
+
"steps": 5
|
| 494 |
+
},
|
| 495 |
+
{
|
| 496 |
+
"episode": 48,
|
| 497 |
+
"reward": 2.5492,
|
| 498 |
+
"difficulty": 4,
|
| 499 |
+
"person": "Maya (Family)",
|
| 500 |
+
"conflicts_seen": [
|
| 501 |
+
"Judgment Day",
|
| 502 |
+
"ESCALATED: Judgment Day"
|
| 503 |
+
],
|
| 504 |
+
"steps": 5
|
| 505 |
+
},
|
| 506 |
+
{
|
| 507 |
+
"episode": 49,
|
| 508 |
+
"reward": 2.5086,
|
| 509 |
+
"difficulty": 4,
|
| 510 |
+
"person": "Sam (Introvert)",
|
| 511 |
+
"conflicts_seen": [
|
| 512 |
+
"Judgment Day"
|
| 513 |
+
],
|
| 514 |
+
"steps": 5
|
| 515 |
+
},
|
| 516 |
+
{
|
| 517 |
+
"episode": 50,
|
| 518 |
+
"reward": 2.5578,
|
| 519 |
+
"difficulty": 3,
|
| 520 |
+
"person": "Maya (Family)",
|
| 521 |
+
"conflicts_seen": [
|
| 522 |
+
"The Warning Sign"
|
| 523 |
+
],
|
| 524 |
+
"steps": 5
|
| 525 |
+
}
|
| 526 |
+
]
|
docs/CONTRIBUTING.md
ADDED
|
@@ -0,0 +1,96 @@
|
| 1 |
+
# Contributing to LifeStack
|
| 2 |
+
|
| 3 |
+
This document defines the **documentation rule** for the project.
|
| 4 |
+
**Nothing ships without its matching doc entry.**
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## The Rule: Doc-First Development
|
| 9 |
+
|
| 10 |
+
Every change that adds, removes, or significantly modifies a feature must include
|
| 11 |
+
**all three** of the following before the commit is made:
|
| 12 |
+
|
| 13 |
+
| # | Action | Where |
|
| 14 |
+
|---|---|---|
|
| 15 |
+
| 1 | **Create or update a doc file** | `docs/<topic>.md` |
|
| 16 |
+
| 2 | **Update README.md** | File Structure table + relevant section |
|
| 17 |
+
| 3 | **Update `docs/INDEX.md`** | Add a one-line entry for the new doc |
|
| 18 |
+
|
| 19 |
+
> [!IMPORTANT]
|
| 20 |
+
> A pull request / commit that adds a new script, module, or feature **without**
|
| 21 |
+
> updating `docs/INDEX.md` and `README.md` is considered incomplete and should
|
| 22 |
+
> not be merged.
|
| 23 |
+
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
## What Counts as "a Feature"
|
| 27 |
+
|
| 28 |
+
| Change type | Doc required? |
|
| 29 |
+
|---|---|
|
| 30 |
+
| New Python module (`core/`, `agent/`, `intake/`) | ✅ Yes: `docs/<module>.md` |
|
| 31 |
+
| New script (`scripts/*.py`) | ✅ Yes: entry in `docs/scripts.md` |
|
| 32 |
+
| New Gradio tab in `app.py` | ✅ Yes: entry in `docs/app.md` |
|
| 33 |
+
| New CLI argument to an existing script | ✅ Yes: update relevant doc |
|
| 34 |
+
| Bug fix with no API surface change | ❌ No (but update changelog if breaking) |
|
| 35 |
+
| Refactor with no API surface change | ❌ No |
|
| 36 |
+
| New environment variable / secret | ✅ Yes: update `docs/configuration.md` |
|
| 37 |
+
| New dependency in `requirements.txt` | ✅ Yes: note in relevant doc + README |
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## Doc File Conventions
|
| 42 |
+
|
| 43 |
+
- All docs live in `docs/`. No `.md` files at repo root except `README.md` and this file.
|
| 44 |
+
- File names are lowercase with underscores: `docs/lifestack_env.md`, `docs/eval.md`.
|
| 45 |
+
- Each doc starts with a `# Title` h1 and a one-line summary.
|
| 46 |
+
- Use `## Overview`, `## Usage`, `## API / Parameters`, `## Examples` sections.
|
| 47 |
+
- Code blocks must have a language tag (` ```python `, ` ```bash `).
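
The conventions above could be checked mechanically before a commit. The following is only a minimal sketch, not part of the repository: `check_doc`, `NAME_PATTERN`, and `EXEMPT_NAMES` are hypothetical names, and the exemption list for upper-case files is an assumption. It covers only the file-name, first-heading, and language-tag rules.

```python
import re
from pathlib import Path

# Assumption: upper-case index-style files are exempt from the lowercase rule.
EXEMPT_NAMES = {"INDEX.md", "CONTRIBUTING.md", "DEPLOYMENT.md"}
NAME_PATTERN = re.compile(r"^[a-z0-9_]+\.md$")


def check_doc(path, text):
    """Return a list of convention violations for one doc file (hypothetical helper)."""
    problems = []
    name = Path(path).name
    if name not in EXEMPT_NAMES and not NAME_PATTERN.match(name):
        problems.append("file name should be lowercase with underscores")

    lines = text.splitlines()
    # The first non-blank line must be an h1 title.
    first = next((line for line in lines if line.strip()), "")
    if not first.startswith("# "):
        problems.append("doc should start with a '# Title' heading")

    # Track fence state so only opening fences are checked for a language tag.
    in_fence = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped.startswith("```"):
            continue
        if not in_fence and stripped == "```":
            problems.append(f"line {number}: code block has no language tag")
        in_fence = not in_fence
    return problems
```

Calling `check_doc("docs/eval.md", Path("docs/eval.md").read_text())` would return an empty list for a compliant file; wiring it into a pre-commit hook is left open.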
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
## Checklist (copy into every PR / commit message)
|
| 52 |
+
|
| 53 |
+
```
|
| 54 |
+
Docs checklist:
|
| 55 |
+
[ ] docs/<topic>.md created or updated
|
| 56 |
+
[ ] docs/INDEX.md updated with new entry
|
| 57 |
+
[ ] README.md File Structure table updated
|
| 58 |
+
[ ] README.md Quickstart / relevant section updated (if CLI changed)
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
---
|
| 62 |
+
|
| 63 |
+
## Docs Folder Structure
|
| 64 |
+
|
| 65 |
+
```
|
| 66 |
+
docs/
|
| 67 |
+
├── INDEX.md ← Master index of all docs (ALWAYS update this)
|
| 68 |
+
├── CONTRIBUTING.md ← This file (the rule)
|
| 69 |
+
├── lifestack_env.md ← core/lifestack_env.py reference
|
| 70 |
+
├── reward.md ← core/reward.py reference
|
| 71 |
+
├── task.md ← core/task.py schema reference
|
| 72 |
+
├── memory.md ← agent/memory.py reference
|
| 73 |
+
├── conflict_generator.md ← agent/conflict_generator.py reference
|
| 74 |
+
├── app.md ← app.py Gradio interface reference
|
| 75 |
+
├── eval.md ← scripts/eval.py reference
|
| 76 |
+
├── train_trl.md ← scripts/train_trl.py reference
|
| 77 |
+
├── scripts.md ← All other scripts reference
|
| 78 |
+
└── configuration.md ← Env vars, secrets, openenv.yaml
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
+
## Commit Message Format
|
| 84 |
+
|
| 85 |
+
```
|
| 86 |
+
<type>: <short description>
|
| 87 |
+
|
| 88 |
+
- <file changed>: <what changed>
|
| 89 |
+
- docs/<doc>.md: <created|updated>
|
| 90 |
+
- docs/INDEX.md: <added entry for X>
|
| 91 |
+
- README.md: <updated section Y>
|
| 92 |
+
|
| 93 |
+
Docs checklist: ✅ all three updated
|
| 94 |
+
```
|
| 95 |
+
|
| 96 |
+
Types: `feat` | `fix` | `refactor` | `docs` | `test` | `chore`
|
docs/DEPLOYMENT.md
ADDED
|
@@ -0,0 +1,427 @@
|
| 1 |
+
# Meta-R2: Complete HuggingFace Deployment Guide (Option A)
|
| 2 |
+
|
| 3 |
+
> This guide walks you through every single step to deploy Meta-R2 to HuggingFace using the cleanest architecture:
|
| 4 |
+
> - **Your trained model (500MB)** → uploaded as a **HuggingFace Model Repository**
|
| 5 |
+
> - **Your code + environment** → deployed as a **HuggingFace Space** (Docker)
|
| 6 |
+
>
|
| 7 |
+
> The Space will auto-download the model from the Model Repo at startup. No Git LFS. No 500MB in your code repo.
|
| 8 |
+
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
## 🗺️ Architecture Overview
|
| 12 |
+
|
| 13 |
+
```
|
| 14 |
+
HuggingFace
|
| 15 |
+
├── YOUR-USERNAME/lifestack-agent ← Model Repo (the 500MB weights)
|
| 16 |
+
│   ├── config.json
|
| 17 |
+
│   ├── tokenizer.json
|
| 18 |
+
│   ├── tokenizer_config.json
|
| 19 |
+
│   ├── special_tokens_map.json
|
| 20 |
+
│   └── model.safetensors (or pytorch_model.bin)
|
| 21 |
+
│
|
| 22 |
+
└── YOUR-USERNAME/meta-r2 [SPACE] ← Code Repo (Docker Space)
|
| 23 |
+
    ├── Dockerfile (already exists ✅)
|
| 24 |
+
    ├── requirements.txt (already exists ✅)
|
| 25 |
+
    ├── app_flask.py (entry point ✅)
|
| 26 |
+
    ├── core/ agent/ scripts/ ... (all your code ✅)
|
| 27 |
+
    └── openenv.yaml (already exists ✅)
|
| 28 |
+
↓ at startup
|
| 29 |
+
agent.py calls AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")
|
| 30 |
+
→ HuggingFace downloads the model to the Space's /root/.cache/huggingface/
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
---
|
| 34 |
+
|
| 35 |
+
## ✅ Pre-Flight Checklist (Do These Before Anything Else)
|
| 36 |
+
|
| 37 |
+
Go through every item below before starting the upload steps.
|
| 38 |
+
|
| 39 |
+
### 1. Confirm Your Trained Model Files Exist
|
| 40 |
+
|
| 41 |
+
Unzip the 500MB file from Kaggle. Open the folder. You **must** see these files:
|
| 42 |
+
|
| 43 |
+
```
|
| 44 |
+
lifestack_model/
|
| 45 |
+
├── config.json ← REQUIRED
|
| 46 |
+
├── tokenizer.json ← REQUIRED
|
| 47 |
+
├── tokenizer_config.json ← REQUIRED
|
| 48 |
+
├── special_tokens_map.json ← REQUIRED (may be missing; check below)
|
| 49 |
+
βββ model.safetensors β REQUIRED (the big file)
|
| 50 |
+
OR
|
| 51 |
+
βββ pytorch_model.bin β (alternative format, also fine)
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
> **If any of these are missing**, the model is an incomplete checkpoint. Re-download or re-run training with `save_model=True` at the end of `train_trl.py`.
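
The file check above can also be scripted. The following is a minimal sketch, not part of the project: the helper name `missing_model_files` and the two file lists are assumptions taken from the folder listing above.

```python
import os

# Files the listing above marks as required (assumed set, copied from that listing).
REQUIRED_FILES = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]
# Either weight format is acceptable.
WEIGHT_FILES = ["model.safetensors", "pytorch_model.bin"]


def missing_model_files(model_dir):
    """Return the names of expected files that are absent from model_dir."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    missing = [name for name in REQUIRED_FILES if name not in present]
    # At least one of the two weight formats must be present.
    if not any(name in present for name in WEIGHT_FILES):
        missing.append("model.safetensors or pytorch_model.bin")
    return missing
```

Calling `missing_model_files("./lifestack_model")` returns an empty list when the folder looks complete, and a list of the missing names otherwise.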
|
| 55 |
+
|
| 56 |
+
### 2. Confirm `requirements.txt` Is Correct
|
| 57 |
+
|
| 58 |
+
Your `requirements.txt` already has:
|
| 59 |
+
- `openenv-core>=0.2.3` ✅ (latest version, confirmed)
- `pydantic>=2.7.0` ✅
- `transformers>=4.40.0` ✅ (needed to download model from Hub)
- `torch>=2.0.0` ✅
|
| 63 |
+
|
| 64 |
+
**No changes needed** to `requirements.txt`.
|
| 65 |
+
|
| 66 |
+
### 3. Confirm the `Dockerfile` Entry Point
|
| 67 |
+
|
| 68 |
+
Your `Dockerfile` already runs:
|
| 69 |
+
```dockerfile
|
| 70 |
+
CMD ["python", "app_flask.py"]
|
| 71 |
+
```
|
| 72 |
+
This is correct. `app_flask.py` is the web server.
|
| 73 |
+
|
| 74 |
+
**No changes needed** to the `Dockerfile`.
|
| 75 |
+
|
| 76 |
+
### 4. Make Sure `.env` is in `.gitignore`
|
| 77 |
+
|
| 78 |
+
Check your `.gitignore` β it already has:
|
| 79 |
+
```
|
| 80 |
+
.env
|
| 81 |
+
```
|
| 82 |
+
✅ Your `GROQ_API_KEY` will **never** be pushed to GitHub or HuggingFace by accident.
|
| 83 |
+
|
| 84 |
+
### 5. Make the One Required Code Change in `agent.py`
|
| 85 |
+
|
| 86 |
+
This is the only code edit required for Option A.
|
| 87 |
+
|
| 88 |
+
Open `/Users/dayalgupta/Desktop/Meta-R2/agent/agent.py` and find **lines 13–18**:
|
| 89 |
+
|
| 90 |
+
```python
# CURRENT CODE (lines 13-18):
self.api_key = os.getenv('GROQ_API_KEY')
self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')

# Fallback to current directory if default existence
if not self.local_model_path and os.path.exists("./lifestack_model"):
    self.local_model_path = "./lifestack_model"
```
|
| 99 |
+
|
| 100 |
+
**Change it to this** (replace `YOUR-USERNAME` with your actual HuggingFace username):
|
| 101 |
+
|
| 102 |
+
```python
# UPDATED CODE:
self.api_key = os.getenv('GROQ_API_KEY')
self.local_model_path = local_model_path or os.getenv('LIFESTACK_MODEL_PATH')

# 1. Check for local folder (Kaggle / local dev)
if not self.local_model_path and os.path.exists("./lifestack_model"):
    self.local_model_path = "./lifestack_model"

# 2. Fall back to HuggingFace Hub model repo (production / Space deployment)
if not self.local_model_path:
    self.local_model_path = "YOUR-USERNAME/lifestack-agent"
```
|
| 115 |
+
|
| 116 |
+
**Why this works:** `AutoModelForCausalLM.from_pretrained()` (which already exists on line 41) accepts either a local folder path OR a HuggingFace Hub repo ID like `"username/repo-name"`. No other code change is needed.
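
The priority order of that fallback can be sketched as a standalone function. This is an illustration only: `resolve_model_path` and its parameters are hypothetical names, not code from `agent.py`.

```python
import os


def resolve_model_path(explicit_path=None, env=None,
                       local_dir="./lifestack_model",
                       hub_repo_id="YOUR-USERNAME/lifestack-agent"):
    """Pick the model source in the same priority order as the snippet above."""
    env = os.environ if env is None else env
    # 1. Explicit argument first, then the LIFESTACK_MODEL_PATH variable.
    path = explicit_path or env.get("LIFESTACK_MODEL_PATH")
    # 2. A local folder, if it exists (this is true even for an empty folder).
    if not path and os.path.exists(local_dir):
        path = local_dir
    # 3. Otherwise the Hub repo ID, which from_pretrained() resolves by downloading.
    return path or hub_repo_id
```

The returned string can be passed to `from_pretrained()` unchanged, whether it is a folder path or a repo ID.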
|
| 117 |
+
|
| 118 |
+
### 6. Verify `lifestack_model/` Is NOT in Your Code Repo
|
| 119 |
+
|
| 120 |
+
Your model (500MB) should NOT be in the `Meta-R2` GitHub repository. Confirm:
|
| 121 |
+
```bash
|
| 122 |
+
ls /Users/dayalgupta/Desktop/Meta-R2/lifestack_model/
|
| 123 |
+
# Should print "No such file or directory", or nothing at all if the directory is empty
|
| 124 |
+
```
|
| 125 |
+
If it has files, remove them:
|
| 126 |
+
```bash
|
| 127 |
+
rm -rf /Users/dayalgupta/Desktop/Meta-R2/lifestack_model/*
|
| 128 |
+
```
|
| 129 |
+
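Deleting the folder entirely is the safer choice: the fallback in `agent.py` tests `os.path.exists("./lifestack_model")`, which is also true for an empty folder, so an empty folder would be picked as the model path and fail to load. Git does not track empty directories, so the folder will not reach the Space either way.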
|
| 130 |
+
|
| 131 |
+
---
|
| 132 |
+
|
| 133 |
+
## 📦 PART 1: Upload the Model to HuggingFace Hub
|
| 134 |
+
|
| 135 |
+
### Step 1.1: Create a HuggingFace Account
|
| 136 |
+
|
| 137 |
+
Go to **https://huggingface.co** → click **Sign Up** → create your account. Remember your username (e.g., `dayal-gupta`); you will use it everywhere.
|
| 138 |
+
|
| 139 |
+
### Step 1.2: Create a New Model Repository
|
| 140 |
+
|
| 141 |
+
1. Go to **https://huggingface.co/new** (or click the `+` button → "New Model")
|
| 142 |
+
2. Fill in:
|
| 143 |
+
- **Owner:** your username
|
| 144 |
+
- **Model name:** `lifestack-agent` (this becomes `YOUR-USERNAME/lifestack-agent`)
|
| 145 |
+
- **License:** `MIT` (recommended for hackathons)
|
| 146 |
+
- **Visibility:** `Public` (required for the Space to download it without auth)
|
| 147 |
+
3. Click **Create Model**
|
| 148 |
+
|
| 149 |
+
You now have an empty model repo at `https://huggingface.co/YOUR-USERNAME/lifestack-agent`.
|
| 150 |
+
|
| 151 |
+
### Step 1.3: Install the HuggingFace CLI
|
| 152 |
+
|
| 153 |
+
On your Mac terminal:
|
| 154 |
+
```bash
|
| 155 |
+
pip install huggingface_hub
|
| 156 |
+
huggingface-cli login
|
| 157 |
+
```
|
| 158 |
+
|
| 159 |
+
When prompted, go to **https://huggingface.co/settings/tokens** → click **New token** → name it anything → **Role: Write** → copy the token → paste it into the terminal.
|
| 160 |
+
|
| 161 |
+
### Step 1.4: Upload the Model Files
|
| 162 |
+
|
| 163 |
+
Navigate to where your unzipped model folder is (e.g., Desktop) and run:
|
| 164 |
+
|
| 165 |
+
```bash
|
| 166 |
+
# Replace the path with wherever your unzipped model folder is:
|
| 167 |
+
huggingface-cli upload YOUR-USERNAME/lifestack-agent /path/to/your/lifestack_model/ .
|
| 168 |
+
```
|
| 169 |
+
|
| 170 |
+
**Example (if you unzipped on Desktop):**
|
| 171 |
+
```bash
|
| 172 |
+
huggingface-cli upload dayal-gupta/lifestack-agent /Users/dayalgupta/Desktop/lifestack_model/ .
|
| 173 |
+
```
|
| 174 |
+
|
| 175 |
+
This uploads ALL files from the local folder to the root of the HF repo. The `.` at the end means "upload to the root of the repo."
|
| 176 |
+
|
| 177 |
+
**This typically takes 3–8 minutes** for a 500MB file on a normal connection. You'll see a progress bar.
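
The same upload can be scripted from Python with the `huggingface_hub` library, which is handy if the CLI is unavailable. The sketch below assumes you are already logged in (Step 1.3); the wrapper name `upload_model_folder` and its `api` parameter are hypothetical, while `HfApi.upload_folder` is the library call doing the work.

```python
def upload_model_folder(folder_path, repo_id, api=None):
    """Upload every file in folder_path to the root of a Hub model repo."""
    if api is None:
        # Imported lazily so the helper can be exercised without the library installed.
        from huggingface_hub import HfApi
        api = HfApi()
    # With no path_in_repo given, files land at the root of the repo.
    return api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="model",
    )
```

For example, `upload_model_folder("/path/to/your/lifestack_model", "YOUR-USERNAME/lifestack-agent")` mirrors the CLI command above.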
|
| 178 |
+
|
| 179 |
+
### Step 1.5: Verify the Upload
|
| 180 |
+
|
| 181 |
+
Go to `https://huggingface.co/YOUR-USERNAME/lifestack-agent` in your browser.
|
| 182 |
+
|
| 183 |
+
You should see all files listed: `config.json`, `tokenizer.json`, `model.safetensors`, etc.
|
| 184 |
+
|
| 185 |
+
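Click on `config.json` and confirm it contains a `"model_type"` key. Its presence shows the config was written correctly; it does not by itself prove that the weight file is complete, so also check that the weight file's size matches your local copy.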
|
| 186 |
+
|
| 187 |
+
### Step 1.6: Add a Model Card (Optional but Impressive for Judges)
|
| 188 |
+
|
| 189 |
+
Click the **"Model Card"** tab on your repo page → click the pencil icon to edit → paste this:
|
| 190 |
+
|
| 191 |
+
```markdown
|
| 192 |
+
---
|
| 193 |
+
language: en
|
| 194 |
+
license: mit
|
| 195 |
+
tags:
|
| 196 |
+
- reinforcement-learning
|
| 197 |
+
- life-simulation
|
| 198 |
+
- grpo
|
| 199 |
+
- llama
|
| 200 |
+
- openenv
|
| 201 |
+
---
|
| 202 |
+
|
| 203 |
+
# LifeStack Agent: GRPO Fine-tuned
|
| 204 |
+
|
| 205 |
+
This model is the trained agent for [Meta-R2](https://huggingface.co/spaces/YOUR-USERNAME/meta-r2),
|
| 206 |
+
a reinforcement learning environment that simulates complex real-life decision-making scenarios.
|
| 207 |
+
|
| 208 |
+
Fine-tuned using GRPO (Group Relative Policy Optimization) via TRL on a custom reward function
|
| 209 |
+
spanning 23 life metrics across 6 domains: career, finances, relationships, physical health,
|
| 210 |
+
mental wellbeing, and time management.
|
| 211 |
+
|
| 212 |
+
## Usage
|
| 213 |
+
```python
|
| 214 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 215 |
+
model = AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")
|
| 216 |
+
tokenizer = AutoTokenizer.from_pretrained("YOUR-USERNAME/lifestack-agent")
|
| 217 |
+
```
|
| 218 |
+
```
|
| 219 |
+
|
| 220 |
+
Click **Save**.
|
| 221 |
+
|
| 222 |
+
---
|
| 223 |
+
|
| 224 |
+
## PART 2: Deploy the Project as a HuggingFace Space
|
| 225 |
+
|
| 226 |
+
### Step 2.1: Create a New Space
|
| 227 |
+
|
| 228 |
+
1. Go to **https://huggingface.co/new-space**
|
| 229 |
+
2. Fill in:
|
| 230 |
+
- **Owner:** your username
|
| 231 |
+
- **Space name:** `meta-r2`
|
| 232 |
+
- **License:** `MIT`
|
| 233 |
+
- **SDK:** Select **"Docker"** (very important: NOT Gradio or Streamlit)
|
| 234 |
+
- **Visibility:** `Public`
|
| 235 |
+
3. Click **Create Space**
|
| 236 |
+
|
| 237 |
+
You now have an empty Space at `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2`.
|
| 238 |
+
|
| 239 |
+
### Step 2.2: Connect Your GitHub Repository to the Space
|
| 240 |
+
|
| 241 |
+
This is the cleanest method: HuggingFace will auto-sync from your GitHub repo.
|
| 242 |
+
|
| 243 |
+
1. In your Space, click the **Settings** tab (gear icon)
|
| 244 |
+
2. Scroll down to **"Repository"** section
|
| 245 |
+
3. Click **"Link to a GitHub repository"**
|
| 246 |
+
4. Authorize HuggingFace to access your GitHub
|
| 247 |
+
5. Select the repo: `oki-dokii/Meta-R2`
|
| 248 |
+
6. Set branch: `main`
|
| 249 |
+
7. Click **Save**
|
| 250 |
+
|
| 251 |
+
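Once the link is active, every `git push` to `main` should redeploy the Space. If your Space settings do not offer a GitHub link, sync through a GitHub Actions workflow instead (this repository ships `.github/workflows/deploy.yml`, which may already cover this) or use the manual push below.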
|
| 252 |
+
|
| 253 |
+
**Alternative (manual push):** If you don't want to link GitHub, you can push directly to the HuggingFace Space repo:
|
| 254 |
+
|
| 255 |
+
```bash
|
| 256 |
+
cd /Users/dayalgupta/Desktop/Meta-R2
|
| 257 |
+
|
| 258 |
+
# Add HF Space as a second remote:
|
| 259 |
+
git remote add space https://huggingface.co/spaces/YOUR-USERNAME/meta-r2
|
| 260 |
+
|
| 261 |
+
# Push your code:
|
| 262 |
+
git push space main
|
| 263 |
+
```
|
| 264 |
+
|
| 265 |
+
### Step 2.3: Add the `GROQ_API_KEY` Secret to the Space
|
| 266 |
+
|
| 267 |
+
Your app needs the Groq API key at runtime. **Never hardcode it.** HuggingFace Spaces have a Secrets system for this.
|
| 268 |
+
|
| 269 |
+
1. In your Space, click the **Settings** tab
|
| 270 |
+
2. Scroll down to **"Variables and secrets"**
|
| 271 |
+
3. Click **"New secret"**
|
| 272 |
+
4. Fill in:
|
| 273 |
+
- **Name:** `GROQ_API_KEY`
|
| 274 |
+
- **Value:** your actual Groq API key (get it from https://console.groq.com/keys)
|
| 275 |
+
5. Click **Save**
|
| 276 |
+
|
| 277 |
+
Your `agent.py` already reads this via `os.getenv('GROQ_API_KEY')` ✅, so no code change is needed.
|
| 278 |
+
|
| 279 |
+
### Step 2.4: Add an `HF_TOKEN` Secret (Only Needed if the Model Repo Is Private)
|
| 280 |
+
|
| 281 |
+
If your model repo is **Public** (which we set in Step 1.2), you can **skip this step**.
|
| 282 |
+
|
| 283 |
+
If your model repo is **Private**, add another secret:
|
| 284 |
+
- **Name:** `HF_TOKEN`
|
| 285 |
+
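- **Value:** a HuggingFace access token with read access to the model repo (the write token from Step 1.3 also works, but a read-only token is the safer choice for a deployed app)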
|
| 286 |
+
|
| 287 |
+
Then add these lines at the top of `app_flask.py` (before any model-loading code):
|
| 288 |
+
```python
import os
from huggingface_hub import login

hf_token = os.getenv("HF_TOKEN")
if hf_token:
    login(token=hf_token)
```
|
| 295 |
+
|
| 296 |
+
### Step 2.5: Trigger the First Build
|
| 297 |
+
|
| 298 |
+
After pushing your code (Step 2.2), the Space will automatically start building.
|
| 299 |
+
|
| 300 |
+
1. Go to your Space URL: `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2`
|
| 301 |
+
2. Click the **"App"** tab; you'll see a build log
|
| 302 |
+
3. The first build typically takes **3–5 minutes** (Docker pulls the base image and installs packages)
|
| 303 |
+
4. After the build, the status changes to **"Running"** and the app boots
|
| 304 |
+
|
| 305 |
+
**During the first boot**, the Space calls `AutoModelForCausalLM.from_pretrained("YOUR-USERNAME/lifestack-agent")`, which downloads the 500MB model. Expect this to take roughly 60–90 seconds on HuggingFace infrastructure. The download is cached inside the running container, so later loads in the same container skip it; without persistent storage, a restart or rebuild starts from a fresh container and may download the model again.
|
| 306 |
+
|
| 307 |
+
---
|
| 308 |
+
|
| 309 |
+
## PART 3: Verify Everything Is Working
|
| 310 |
+
|
| 311 |
+
### Step 3.1: Check the Build Log
|
| 312 |
+
|
| 313 |
+
In your Space, click the **"Logs"** tab. You should see:
|
| 314 |
+
|
| 315 |
+
```
✅ Step 1/7 : FROM python:3.11-slim
✅ Successfully built ...
✅ Successfully tagged ...
```
|
| 320 |
+
|
| 321 |
+
If you see a red error, check the troubleshooting section below.
|
| 322 |
+
|
| 323 |
+
### Step 3.2: Check the App Boot Log
|
| 324 |
+
|
| 325 |
+
After the build, click the **"App"** tab. In the log output you should see:
|
| 326 |
+
|
| 327 |
+
```
📦 Loading local GRPO model from YOUR-USERNAME/lifestack-agent...
✅ Local model LOADED.
 * Running on http://0.0.0.0:7860
```
|
| 332 |
+
|
| 333 |
+
If you see `⚠️ Failed to load local model ... Falling back to Groq.`, the model download failed. Check that the HF model repo ID in `agent.py` is correct and that the repo is public.
|
| 334 |
+
|
| 335 |
+
### Step 3.3: Test the Live App
|
| 336 |
+
|
| 337 |
+
Go to `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` and click through the demo:
|
| 338 |
+
1. The web UI (served by `app_flask.py`) should load
|
| 339 |
+
2. Start an episode; the agent should respond with life decisions
|
| 340 |
+
3. Check that rewards are non-zero and steps > 5 (confirms the Task system is working)
|
| 341 |
+
|
| 342 |
+
---
|
| 343 |
+
|
| 344 |
+
## 🛠️ Troubleshooting Common Issues
|
| 345 |
+
|
| 346 |
+
| Error | Cause | Fix |
|
| 347 |
+
|---|---|---|
|
| 348 |
+
| `ModuleNotFoundError: openenv` | Wrong package in requirements.txt | Confirm `openenv-core>=0.2.3` is in `requirements.txt` (not `openenv`) |
|
| 349 |
+
| `OSError: Can't load model` | Wrong repo ID in `agent.py` | Make sure it's `"YOUR-ACTUAL-USERNAME/lifestack-agent"` not literally `YOUR-USERNAME` |
|
| 350 |
+
| `Build failed: torch install timeout` | `torch>=2.0.0` is huge (2GB+) | Add `--extra-index-url https://download.pytorch.org/whl/cpu` to the `pip install` command in the Dockerfile so the smaller CPU-only wheel is used |
|
| 351 |
+
| `Port 7860 not responding` | `app_flask.py` binding to wrong interface | Confirm `app.run(host='0.0.0.0', port=7860)` at the bottom of `app_flask.py` |
|
| 352 |
+
| `GROQ_API_KEY not found` | Secret not set | Go to Space Settings → Variables and secrets → add `GROQ_API_KEY` |
|
| 353 |
+
| `Space keeps restarting` | Out of memory (free tier is 16GB RAM) | torch on CPU for a 500MB model may OOM; see the "Reducing Memory Usage" note below |
|
| 354 |
+
|
| 355 |
+
### Reducing Memory Usage (If Space OOMs)
|
| 356 |
+
|
| 357 |
+
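Free HuggingFace Spaces have 16GB RAM. Loading a 500MB model in float32 uses roughly 2GB RAM, which is fine. If you still hit OOM, pass these arguments to the `from_pretrained` call in `agent.py` (lines 41–44). This needs `import torch` in that file, and `low_cpu_mem_usage` and `device_map` need the `accelerate` package installed. Note that float16 inference can be slow on CPU; `torch.bfloat16` is an alternative worth trying.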
|
| 358 |
+
|
| 359 |
+
```python
self.local_model = AutoModelForCausalLM.from_pretrained(
    self.local_model_path,
    torch_dtype=torch.float16,   # ← half precision, halves memory
    low_cpu_mem_usage=True,      # ← stream-loads, avoids peak RAM spike
    device_map="cpu"             # ← explicitly CPU on free tier
)
```
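
To see why half precision helps, the weight memory can be estimated from the parameter count. This is a rough sketch under stated assumptions: it counts weights only (no activations or runtime overhead), and the assumption that the 500MB checkpoint is stored in float16 is illustrative, not something read from the actual model. Both function names are hypothetical.

```python
# Bytes used per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def params_from_checkpoint(size_bytes, stored_dtype="float16"):
    """Estimate the parameter count from a checkpoint's size on disk."""
    return size_bytes // BYTES_PER_PARAM[stored_dtype]


def weight_memory_gb(num_params, dtype="float32"):
    """Approximate RAM needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


# A 500 MiB checkpoint, assumed to be stored in float16:
num_params = params_from_checkpoint(500 * 1024 ** 2)
print(weight_memory_gb(num_params, "float32"))  # weights loaded as float32
print(weight_memory_gb(num_params, "float16"))  # weights loaded as float16
```

Under these assumptions the float16 figure is exactly half the float32 figure, which is the saving the `torch_dtype` argument aims for.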
|
| 367 |
+
|
| 368 |
+
---
|
| 369 |
+
|
| 370 |
+
## Final Pre-Submission Checklist
|
| 371 |
+
|
| 372 |
+
Before submitting to the hackathon, verify every item:
|
| 373 |
+
|
| 374 |
+
- [ ] `https://huggingface.co/YOUR-USERNAME/lifestack-agent` exists and has all model files
|
| 375 |
+
- [ ] `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` shows **"Running"** status (green dot)
|
| 376 |
+
- [ ] The Space app loads in browser without errors
|
| 377 |
+
- [ ] The Space log shows `✅ Local model LOADED` (not "Falling back to Groq")
|
| 378 |
+
- [ ] An episode runs and produces steps > 5 (confirms Task system is working)
|
| 379 |
+
- [ ] `GROQ_API_KEY` secret is set in Space settings (as fallback)
|
| 380 |
+
- [ ] The model repo has a Model Card explaining what it is
|
| 381 |
+
- [ ] Your `README.md` in the code repo links to both: the Space URL and the Model URL
|
| 382 |
+
- [ ] `agent.py` has been updated with `"YOUR-USERNAME/lifestack-agent"` as the HF Hub fallback
|
| 383 |
+
- [ ] `lifestack_model/` folder in your local `Meta-R2/` repo is empty (model not in code repo)
|
| 384 |
+
- [ ] Bugs 1, 2, and 3 are fixed and committed (already done ✅)
|
| 385 |
+
|
| 386 |
+
---
|
| 387 |
+
|
| 388 |
+
## Quick Reference: All URLs
|
| 389 |
+
|
| 390 |
+
Replace `YOUR-USERNAME` with your HuggingFace username everywhere:
|
| 391 |
+
|
| 392 |
+
| What | URL |
|
| 393 |
+
|---|---|
|
| 394 |
+
| HuggingFace profile | `https://huggingface.co/YOUR-USERNAME` |
|
| 395 |
+
| Model repo | `https://huggingface.co/YOUR-USERNAME/lifestack-agent` |
|
| 396 |
+
| Space (live demo) | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` |
|
| 397 |
+
| Space settings (secrets) | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2/settings` |
|
| 398 |
+
| Space build logs | `https://huggingface.co/spaces/YOUR-USERNAME/meta-r2` → Logs tab |
|
| 399 |
+
| HuggingFace API tokens | `https://huggingface.co/settings/tokens` |
|
| 400 |
+
| Groq API keys | `https://console.groq.com/keys` |
|
| 401 |
+
|
| 402 |
+
---
|
| 403 |
+
|
| 404 |
+
## ⚡ The Exact Commands to Run Right Now (In Order)
|
| 405 |
+
|
| 406 |
+
```bash
|
| 407 |
+
# 1. Install HF CLI
|
| 408 |
+
pip install huggingface_hub
|
| 409 |
+
|
| 410 |
+
# 2. Login (will prompt for token)
|
| 411 |
+
huggingface-cli login
|
| 412 |
+
|
| 413 |
+
# 3. Upload model (change the path to your unzipped model folder)
|
| 414 |
+
huggingface-cli upload YOUR-USERNAME/lifestack-agent /path/to/lifestack_model/ .
|
| 415 |
+
|
| 416 |
+
# 4. Make the agent.py code change (edit manually in VS Code, then):
|
| 417 |
+
cd /Users/dayalgupta/Desktop/Meta-R2
|
| 418 |
+
git add agent/agent.py
|
| 419 |
+
git commit -m "feat: add HuggingFace Hub model fallback for Option A deployment"
|
| 420 |
+
git push origin main
|
| 421 |
+
|
| 422 |
+
# 5. Push to HuggingFace Space (if not using GitHub auto-sync):
|
| 423 |
+
git remote add space https://huggingface.co/spaces/YOUR-USERNAME/meta-r2
|
| 424 |
+
git push space main
|
| 425 |
+
```
|
| 426 |
+
|
| 427 |
+
That's it. The Space will build and boot automatically.
|
docs/INDEX.md
ADDED
|
@@ -0,0 +1,40 @@
|
| 1 |
+
# LifeStack: Documentation Index
|
| 2 |
+
|
| 3 |
+
> **Rule:** Every new feature, script, or module must add a one-line entry here.
|
| 4 |
+
> See [CONTRIBUTING.md](CONTRIBUTING.md) for the full documentation rule.
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## Core Modules
|
| 9 |
+
|
| 10 |
+
| Doc | Module | Description |
|
| 11 |
+
|---|---|---|
|
| 12 |
+
| [lifestack_env.md](lifestack_env.md) | `core/lifestack_env.py` | Main OpenEnv environment: step, reset, observation, WorldEngine, PartialObsFilter |
|
| 13 |
+
| [reward.md](reward.md) | `core/reward.py` | Task-aware reward orchestrator with milestone, cascade, and efficiency components |
|
| 14 |
+
| [task.md](task.md) | `core/task.py` | Task / Route / Milestone / ExoEvent dataclass schema |
|
| 15 |
+
| [memory.md](memory.md) | `agent/memory.py` | ChromaDB-backed trajectory + feedback storage |
|
| 16 |
+
| [conflict_generator.md](conflict_generator.md) | `agent/conflict_generator.py` | ConflictEvent templates and TaskGenerator |
|
| 17 |
+
|
| 18 |
+
## Application
|
| 19 |
+
|
| 20 |
+
| Doc | File | Description |
|
| 21 |
+
|---|---|---|
|
| 22 |
+
| [app.md](app.md) | `app.py` | Gradio multi-tab interface: tabs, callbacks, module-level singletons |
|
| 23 |
+
|
| 24 |
+
## Scripts
|
| 25 |
+
|
| 26 |
+
| Doc | Script | Description |
|
| 27 |
+
|---|---|---|
|
| 28 |
+
| [eval.md](eval.md) | `scripts/eval.py` | Standalone random-baseline evaluation runner |
|
| 29 |
+
| [train_trl.md](train_trl.md) | `scripts/train_trl.py` | GRPO curriculum training via HuggingFace TRL + Unsloth |
|
| 30 |
+
| [scripts.md](scripts.md) | `scripts/` (others) | run_episode, smoke_test, test_lifestack, longitudinal_demo |
|
| 31 |
+
|
| 32 |
+
## Configuration & Operations
|
| 33 |
+
|
| 34 |
+
| Doc | File | Description |
|
| 35 |
+
|---|---|---|
|
| 36 |
+
| [configuration.md](configuration.md) | `.env`, `openenv.yaml` | Environment variables, secrets, server config |
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
*Last updated: 2026-04-23. Add a row here whenever a new doc is created.*
|
docs/app.md
ADDED
|
@@ -0,0 +1,78 @@
|
| 1 |
+
# app.md: Gradio Interface Reference
|
| 2 |
+
|
| 3 |
+
`app.py`: Gradio multi-tab interactive interface for LifeStack.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Overview
|
| 8 |
+
|
| 9 |
+
`app.py` is the entry point for the demo. It wires together all LifeStack modules into
|
| 10 |
+
a single Gradio `Blocks` application served on `http://127.0.0.1:7860`.
|
| 11 |
+
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
## Module-level Singletons
|
| 15 |
+
|
| 16 |
+
These are instantiated once at import time:
|
| 17 |
+
|
| 18 |
+
| Variable | Type | Purpose |
|
| 19 |
+
|---|---|---|
|
| 20 |
+
| `MEMORY` | `LifeStackMemory` | ChromaDB trajectory + feedback store |
|
| 21 |
+
| `AGENT` | `LifeStackAgent` | LLM-backed decision agent |
|
| 22 |
+
| `INTAKE` | `LifeIntake` | NL → structured conflict parser |
|
| 23 |
+
| `DEMO_CONFLICT` | `ConflictEvent` | Fixed "Friday 6PM" conflict for tab 1 |
|
| 24 |
+
| `DEMO_PREDICTOR` | `TrajectoryPredictor` | 7-day risk score tracker |
|
| 25 |
+
| `LONG_DEMO` | `LongitudinalDemo` | Arjun's multi-week journey |
|
| 26 |
+
| `GMAIL` | `GmailSignalExtractor` | Optional Gmail stress signal extractor |
|
| 27 |
+
|
| 28 |
+
---
|
| 29 |
+
|
| 30 |
+
## Tabs
|
| 31 |
+
|
| 32 |
+
| Tab | Label | Key Function |
|
| 33 |
+
|---|---|---|
|
| 34 |
+
| 1 | 🎯 Live Demo | `run_demo(person_label, conflict_label)` |
| 2 | Try Your Situation | `run_custom(situation, sliders..., gmail_signals)` |
| 3 | Training Results | `load_training_tab()` |
| 4 | Arjun's Journey | `LONG_DEMO.show_longitudinal_comparison()` |
| 5 | 🗺️ Task Explorer | `load_demo_task()` |
| 6 | 💬 Follow-up | `submit_outcome_feedback(...)` |
|
| 40 |
+
|
| 41 |
+
---
|
| 42 |
+
|
| 43 |
+
## Key Functions
|
| 44 |
+
|
| 45 |
+
### `submit_outcome_feedback(ep_id, score, domains_up, domains_down, notes, time_spent)`
|
| 46 |
+
|
| 47 |
+
Stores real-world outcome data into ChromaDB via `MEMORY.store_feedback(feedback)`.
|
| 48 |
+
|
| 49 |
+
> **Note:** Uses `MEMORY` (the module-level `LifeStackMemory` instance). The previously
|
| 50 |
+
> undefined `AGENT_MEMORY` reference was corrected to `MEMORY` on 2026-04-23.
|
| 51 |
+
|
| 52 |
+
### `run_demo(person_label, conflict_label)`
|
| 53 |
+
|
| 54 |
+
Generator: yields `(pred_html, before_html, narrative, decision_html)` tuples for each
|
| 55 |
+
animation frame. Runs cascade animation then agent intervention.
|
| 56 |
+
|
| 57 |
+
### `run_custom(situation, ...)`
|
| 58 |
+
|
| 59 |
+
Calls `INTAKE.full_intake()` to parse NL input, then `AGENT.get_action()`, steps the env,
|
| 60 |
+
returns `(life_html, after_html, plan_html)`.
|
| 61 |
+
|
| 62 |
+
---
|
| 63 |
+
|
| 64 |
+
## Running
|
| 65 |
+
|
| 66 |
+
```bash
|
| 67 |
+
python app.py
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
Starts on port `7860` with `share=False`. Edit the `__main__` block to change the port or theme.
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
|
| 74 |
+
## Change Log
|
| 75 |
+
|
| 76 |
+
| Date | Change |
|
| 77 |
+
|---|---|
|
| 78 |
+
| 2026-04-23 | `AGENT_MEMORY` undefined crash fixed; replaced with `MEMORY` in `submit_outcome_feedback` |
|
docs/configuration.md
ADDED
|
@@ -0,0 +1,71 @@
|
| 1 |
+
# configuration.md: Configuration Reference
|
| 2 |
+
|
| 3 |
+
Environment variables, secrets, and server configuration for LifeStack.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Environment Variables
|
| 8 |
+
|
| 9 |
+
Copy `.env.example` to `.env` and fill in values:
|
| 10 |
+
|
| 11 |
+
```bash
|
| 12 |
+
cp .env.example .env
|
| 13 |
+
```
|
| 14 |
+
|
| 15 |
+
| Variable | Required | Description |
|
| 16 |
+
|---|---|---|
|
| 17 |
+
| `OPENAI_API_KEY` | For agent/training | API key for the LLM agent and GRPO reward function |
|
| 18 |
+
| `GROQ_API_KEY` | Optional | Alternative fast-inference backend |
|
| 19 |
+
| `GOOGLE_CLIENT_SECRET_FILE` | Optional | Path to the Google OAuth desktop client credentials JSON used for Gmail intake (name as given in `.env.example`) |
|
| 20 |
+
|
| 21 |
+
> **Never commit `.env`**; it is listed in `.gitignore`.
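
A startup check can catch a missing variable before the app fails deep inside a request. The helper below is a hypothetical sketch (`missing_env_vars` is not part of the project); pass it whichever variable names your deployment actually needs.

```python
import os


def missing_env_vars(names, env=None):
    """Return the variable names from `names` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["GROQ_API_KEY"])
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Treating an empty string as missing is a deliberate choice here, since a blank value copied from `.env.example` is as unusable as no value.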
|
| 22 |
+
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
## `openenv.yaml`
|
| 26 |
+
|
| 27 |
+
Defines the OpenEnv service manifest for MCP / REST integration.
|
| 28 |
+
|
| 29 |
+
```yaml
|
| 30 |
+
name: lifestack
|
| 31 |
+
version: "1.1.0"
|
| 32 |
+
entry: server.py
|
| 33 |
+
port: 8000
|
| 34 |
+
```
|
| 35 |
+
|
| 36 |
+
Edit this file if you rename the server entry point or change the port.
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## Gradio App
|
| 41 |
+
|
| 42 |
+
Configured in `app.py` `__main__` block:
|
| 43 |
+
|
| 44 |
+
```python
app.launch(
    share=False,
    server_port=7860,
    show_error=True,
)
```
|
| 51 |
+
|
| 52 |
+
Change `server_port` or set `share=True` for a public Gradio link.
|
| 53 |
+
|
| 54 |
+
---
|
| 55 |
+
|
| 56 |
+
## Docker
|
| 57 |
+
|
| 58 |
+
```bash
|
| 59 |
+
docker build -t lifestack:latest .
|
| 60 |
+
docker run -p 7860:7860 --env-file .env lifestack:latest
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
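The `Dockerfile` installs `requirements.txt` and, per `docs/DEPLOYMENT.md`, runs `python app_flask.py`; run `python app.py` directly to start the Gradio interface instead.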
|
| 64 |
+
|
| 65 |
+
---
|
| 66 |
+
|
| 67 |
+
## Change Log
|
| 68 |
+
|
| 69 |
+
| Date | Change |
|
| 70 |
+
|---|---|
|
| 71 |
+
| 2026-04-23 | Initial doc created |
|
docs/conflict_generator.md
ADDED
|
@@ -0,0 +1,75 @@
|
| 1 |
+
# conflict_generator.md: Conflict Generator Reference
|
| 2 |
+
|
| 3 |
+
`agent/conflict_generator.py`: ConflictEvent templates and TaskGenerator.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Overview
|
| 8 |
+
|
| 9 |
+
Two parallel systems for generating crises:
|
| 10 |
+
|
| 11 |
+
| System | Purpose |
|
| 12 |
+
|---|---|
|
| 13 |
+
| `ConflictEvent` + `TEMPLATES` | 15 handcrafted conflicts at difficulty 1–5 |
|
| 14 |
+
| `TaskGenerator` | Generates long-horizon `Task` objects (two domains) |
|
| 15 |
+
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
## `ConflictEvent` (Legacy)
|
| 19 |
+
|
| 20 |
+
```python
from dataclasses import dataclass


@dataclass
class ConflictEvent:
    id: str
    title: str
    story: str
    primary_disruption: dict      # Metric deltas applied on env reset
    decisions_required: list[str]
    resource_budget: dict         # {"time", "money", "energy"}
    difficulty: int               # 1–5
```
|
| 31 |
+
|
| 32 |
+
### Helper functions
|
| 33 |
+
|
| 34 |
+
```python
|
| 35 |
+
conflict = generate_conflict() # random from all 15
|
| 36 |
+
conflict = generate_conflict(difficulty=3) # difficulty-3 pool
|
| 37 |
+
escalated = escalate_conflict(conflict)  # 1.4× disruption, 0.7× budget
|
| 38 |
+
new, reason = adaptive_escalate(conflict, agent_history) # auto-tune
|
| 39 |
+
```
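
The scaling that `escalate_conflict` is described as applying (1.4× disruption, 0.7× budget) can be illustrated with a small stand-in. This is a sketch, not the project's implementation: `ConflictSketch` and `escalate_sketch` are hypothetical names, the metric keys are made up, and the real function may also clamp values or change the difficulty.

```python
from dataclasses import dataclass, replace


@dataclass
class ConflictSketch:
    """Stand-in holding only the fields the scaling touches."""
    primary_disruption: dict
    resource_budget: dict
    difficulty: int


def escalate_sketch(conflict, disruption_scale=1.4, budget_scale=0.7):
    """Return a copy with harsher disruptions and a tighter budget."""
    return replace(
        conflict,
        primary_disruption={k: v * disruption_scale
                            for k, v in conflict.primary_disruption.items()},
        resource_budget={k: v * budget_scale
                         for k, v in conflict.resource_budget.items()},
    )


base = ConflictSketch({"stress": 10.0}, {"time": 10.0}, difficulty=2)
harder = escalate_sketch(base)
```

Returning a copy rather than mutating the input keeps the original template reusable for later episodes.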
|
| 40 |
+
|
| 41 |
+
---
|
| 42 |
+
|
| 43 |
+
## `TaskGenerator`
|
| 44 |
+
|
| 45 |
+
```python
|
| 46 |
+
generator = TaskGenerator()
|
| 47 |
+
task = generator.generate()
|
| 48 |
+
task = generator.generate(domain="flight_crisis", difficulty=4)
|
| 49 |
+
task = generator.generate(domain="code_merge_crisis")
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
### Supported Domains
|
| 53 |
+
|
| 54 |
+
| Domain | Goal |
|
| 55 |
+
|---|---|
|
| 56 |
+
| `flight_crisis` | Survive Airport Cancellation |
|
| 57 |
+
| `code_merge_crisis` | Resolve Production Outage |
|
| 58 |
+
|
| 59 |
+
Unknown domains fall back to `flight_crisis`.
|
| 60 |
+
|
| 61 |
+
---
|
| 62 |
+
|
| 63 |
+
## Adding a New Domain
|
| 64 |
+
|
| 65 |
+
1. Add `generate_<domain>(self, difficulty) -> Task` to `TaskGenerator`.
|
| 66 |
+
2. Add to the `if/elif` in `generate()`.
|
| 67 |
+
3. Update this file, `docs/INDEX.md`, and `README.md`.
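
The dispatch described in these steps might look like the sketch below. It is an assumption-laden illustration: `TaskGeneratorSketch` is a hypothetical stand-in that returns plain dicts instead of `Task` objects, and the default difficulty of 3 is invented for the example.

```python
import random


class TaskGeneratorSketch:
    """Toy dispatcher mirroring the if/elif pattern described above."""

    DOMAINS = ["flight_crisis", "code_merge_crisis"]

    def generate_flight_crisis(self, difficulty):
        return {"domain": "flight_crisis", "difficulty": difficulty}

    def generate_code_merge_crisis(self, difficulty):
        return {"domain": "code_merge_crisis", "difficulty": difficulty}

    def generate(self, domain=None, difficulty=3):
        # No domain given: pick one at random.
        if domain is None:
            domain = random.choice(self.DOMAINS)
        if domain == "code_merge_crisis":
            return self.generate_code_merge_crisis(difficulty)
        # A new domain would add its own branch here, before the fallback.
        # Unknown domains fall back to flight_crisis.
        return self.generate_flight_crisis(difficulty)
```

Keeping the fallback as the final return means a typo in a domain name still yields a valid task rather than an exception.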
|
| 68 |
+
|
| 69 |
+
---
|
| 70 |
+
|
| 71 |
+
## Change Log
|
| 72 |
+
|
| 73 |
+
| Date | Change |
|
| 74 |
+
|---|---|
|
| 75 |
+
| 2026-04-23 | Initial doc created |
|
docs/eval.md
ADDED
|
@@ -0,0 +1,86 @@
|
| 1 |
+
# eval.py: Evaluation Runner Reference
|
| 2 |
+
|
| 3 |
+
`scripts/eval.py`: Standalone LifeStack evaluation runner using a random-action baseline.
|
| 4 |
+
|
| 5 |
+
No model, no GPU, no API key required.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Overview
|
| 10 |
+
|
| 11 |
+
Runs N independent episodes against `LifeStackEnv` using uniformly random actions as a
|
| 12 |
+
baseline policy. Prints a live per-episode table and aggregate statistics at the end.
|
| 13 |
+
|
| 14 |
+
Useful for:
|
| 15 |
+
- Verifying environment correctness after changes
|
| 16 |
+
- Establishing a random-baseline reward floor before training
|
| 17 |
+
- CI smoke checks (no external dependencies)
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
## Usage
|
| 22 |
+
|
| 23 |
+
```bash
|
| 24 |
+
# Default: 10 episodes, any domain
|
| 25 |
+
python scripts/eval.py
|
| 26 |
+
|
| 27 |
+
# 20 episodes, flight_crisis domain only
|
| 28 |
+
python scripts/eval.py --episodes 20 --domain flight_crisis
|
| 29 |
+
|
| 30 |
+
# Verbose per-step output
|
| 31 |
+
python scripts/eval.py --episodes 5 --verbose
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
---
|
| 35 |
+
|
| 36 |
+
## CLI Arguments
|
| 37 |
+
|
| 38 |
+
| Argument | Type | Default | Description |
|
| 39 |
+
|---|---|---|---|
|
| 40 |
+
| `--episodes` | `int` | `10` | Number of episodes to run |
|
| 41 |
+
| `--domain` | `str` | `None` | Optional domain filter passed to `TaskGenerator.generate()` |
|
| 42 |
+
| `--verbose` | flag | `False` | Print per-step action, reward, and done status |
|
| 43 |
+
|
| 44 |
+
Supported `--domain` values: `flight_crisis`, `code_merge_crisis` (or omit for random).
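
The three arguments in the table map directly onto `argparse`. The sketch below shows one way to declare them; `build_parser` is a hypothetical name and the help strings are paraphrased from the table, so the real script may differ in detail.

```python
import argparse


def build_parser():
    """Declare the CLI arguments listed in the table above."""
    parser = argparse.ArgumentParser(
        description="LifeStack random-baseline evaluation")
    parser.add_argument("--episodes", type=int, default=10,
                        help="Number of episodes to run")
    parser.add_argument("--domain", type=str, default=None,
                        help="Optional domain filter (omit for random)")
    parser.add_argument("--verbose", action="store_true",
                        help="Print per-step action, reward, and done status")
    return parser


args = build_parser().parse_args(["--episodes", "20", "--domain", "flight_crisis"])
```

Using `action="store_true"` makes `--verbose` a flag that defaults to `False`, matching the table.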
|
| 45 |
+
|
| 46 |
+
---
|
| 47 |
+
|
| 48 |
+
## Output
|
| 49 |
+
|
| 50 |
+
### Per-episode table
|
| 51 |
+
|
| 52 |
+
```
|
| 53 |
+
EP TOTAL REWARD STEPS DOMAIN SUCCESS
|
| 54 |
+
ββββ ββββββββββββ ββββββ ββββββββββββββββββββ βββββββ
|
| 55 |
+
1 0.3120 8 flight_crisis β
2 1.8450 12 code_merge_crisis β
```

### Aggregate stats

```
─────────────────────────────────────────────────────────
Episodes     : 10
Mean Reward  : 0.8231
Success Rate : 30.0%
Mean Steps   : 10.4
```
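
The aggregate figures above can be reproduced from per-episode records. The following is a minimal sketch, not the code in `scripts/eval.py`: the `summarize` helper and the record keys `total_reward`, `steps`, and `success` are assumptions made for illustration.

```python
from statistics import mean


def summarize(results):
    """Aggregate per-episode records into the four summary fields.

    `results` is a list of dicts with the hypothetical keys
    `total_reward`, `steps`, and `success`.
    """
    if not results:
        raise ValueError("need at least one episode result")
    successes = sum(1 for r in results if r["success"])
    return {
        "episodes": len(results),
        "mean_reward": mean(r["total_reward"] for r in results),
        "success_rate": 100.0 * successes / len(results),
        "mean_steps": mean(r["steps"] for r in results),
    }


# Two made-up episodes, purely for illustration.
results = [
    {"total_reward": 0.5, "steps": 8, "success": False},
    {"total_reward": 1.5, "steps": 12, "success": True},
]
summary = summarize(results)
print(f"Episodes     : {summary['episodes']}")
print(f"Mean Reward  : {summary['mean_reward']:.4f}")
print(f"Success Rate : {summary['success_rate']:.1f}%")
print(f"Mean Steps   : {summary['mean_steps']:.1f}")
```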

---

## Action Space (Random Baseline)

Each step samples uniformly from:
`execute`, `inspect`, `plan`, `wait`, `communicate`, `spend`, `delegate`

- `execute` actions target a real route ID from the active task.
- `inspect` actions target a real hidden-state key from the active task.
- Other actions apply a small random metric nudge and resource cost.
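
The sampling rule above can be sketched as a standalone function. This is an illustration under stated assumptions rather than the runner's actual code: `sample_random_action`, the plain-dict action shape, the nudged metric name, and the nudge and cost ranges are all hypothetical.

```python
import random

ACTION_TYPES = ["execute", "inspect", "plan", "wait", "communicate", "spend", "delegate"]


def sample_random_action(route_ids, hidden_keys, rng):
    """Sample one action uniformly over the seven action types.

    `route_ids` and `hidden_keys` must be non-empty lists taken from the
    active task, so execute/inspect always point at something real.
    """
    action_type = rng.choice(ACTION_TYPES)
    if action_type == "execute":
        return {"action_type": action_type, "target": rng.choice(route_ids)}
    if action_type == "inspect":
        return {"action_type": action_type, "target": rng.choice(hidden_keys)}
    # Every other type gets a small random nudge and a small cost (assumed ranges).
    return {
        "action_type": action_type,
        "metric_changes": {"career.workload": round(rng.uniform(-5.0, 5.0), 2)},
        "resource_cost": {"time": round(rng.uniform(0.0, 1.0), 2)},
    }


rng = random.Random(0)
samples = [sample_random_action(["r1", "r2"], ["secret_key"], rng) for _ in range(200)]
```

Passing a seeded `random.Random` instance keeps a baseline run reproducible.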

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | File created; implements random-baseline evaluation runner |

docs/lifestack_env.md
ADDED
|
@@ -0,0 +1,131 @@
# lifestack_env.py - Environment Reference

`core/lifestack_env.py` - The main OpenEnv-compatible RL environment for LifeStack.

---

## Overview

`LifeStackEnv` wraps the full simulation: metric cascades, world events, partial
observability, route execution, milestone tracking, and reward calculation.

Key classes in this file:

| Class | Role |
|---|---|
| `LifeStackAction` | Pydantic action schema (metric_changes, resource_cost, action_type, ...) |
| `LifeStackObservation` | Pydantic observation schema (metrics, resources, step, done, reward, metadata) |
| `LifeStackState` | Internal state (current_metrics, budget, task, world_state, hidden_state, ...) |
| `PartialObsFilter` | Converts full world state into the agent's partial observation |
| `WorldEngine` | Fires deterministic/probabilistic ExoEvents each step |
| `LifeStackEnv` | The environment itself; inherits from OpenEnv `Environment` |

---

## API

### `LifeStackEnv.__init__(seed, task, max_steps=30)`

```python
from core.lifestack_env import LifeStackAction, LifeStackEnv

env = LifeStackEnv()
env = LifeStackEnv(seed=42, max_steps=50)
```

### `LifeStackEnv.reset(...) -> LifeStackObservation`

```python
obs = env.reset(task=my_task, episode_id="ep_001")
```

Parameters:
- `task`: a `Task` object (from `core/task.py`). Defaults to `FlightCrisisTask()`.
- `seed`: optional int for reproducibility.
- `conflict`: legacy `ConflictEvent` for metric disruption on reset.
- `budget`: dict with `time`, `money`, `energy` overrides.
- `person`: optional `SimPerson` for personality-driven drift.

### `LifeStackEnv.step(action) -> LifeStackObservation`

```python
obs = env.step(LifeStackAction(action_type="execute", target="rebook_premium"))
```

Supported `action_type` values:

| Type | Effect |
|---|---|
| `inspect` | Reveals a hidden-state key into the observation |
| `execute` | Attempts to activate a Route by `target` (route id) |
| `wait` | Passes the step; triggers stress penalty after 4 consecutive waits |
| `rollback` | Reverts metrics/budget to the previous step (once per episode) |
| `plan` / `communicate` / `spend` / `delegate` | Apply `metric_changes` and `resource_cost` |

### `LifeStackEnv.render()`

Prints a colour-coded terminal summary of the current state and task progress.

---

## PartialObsFilter

```python
PartialObsFilter.filter(task, revealed_keys) -> dict
```

- Base: `task.visible_world` (always visible).
- Keys in `revealed_keys` that exist in `task.mutable_world` are added as-is.
- Keys in `revealed_keys` that exist in `task.hidden_state` are wrapped as
  `{"value": <val>, "source": "inspect"}` to signal to the agent that they came from an `inspect` action.
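
The three filtering rules can be sketched without the environment itself. This is a minimal illustration, not the `PartialObsFilter` source: the function takes the three world dicts directly instead of a `Task`, and it assumes that `mutable_world` wins when a key appears in both dicts and that unknown keys are ignored.

```python
def filter_observation(visible_world, mutable_world, hidden_state, revealed_keys):
    """Build the agent's partial view from the three world dicts."""
    observation = dict(visible_world)  # base layer, always visible
    for key in revealed_keys:
        if key in mutable_world:
            # Mutable-world keys are copied through unchanged.
            observation[key] = mutable_world[key]
        elif key in hidden_state:
            # Hidden keys are wrapped so the agent can tell they were inspected.
            observation[key] = {"value": hidden_state[key], "source": "inspect"}
    return observation


view = filter_observation(
    visible_world={"public_info": "visible"},
    mutable_world={"gate": "B12"},
    hidden_state={"secret_key": True},
    revealed_keys=["gate", "secret_key", "unknown"],
)
```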

---

## Observation `metadata` fields

```python
obs.metadata = {
    "world_state": dict,       # partial view after filter
    "goal": str,
    "active_route": str | None,
    "milestones": list[str],
    "events": list[str],
    "success": bool,
    "failure": bool,
    "failure_reason": str,
    "routes_remaining": int,
    "breakdown": dict,         # reward component breakdown
    "info": list[str],         # step-level diagnostic messages
}
```

Key `info` message prefixes:

| Prefix | Meaning |
|---|---|
| `INSPECT_REVEALED:` | Key added to inspected list |
| `INSPECT_REVEALED_HIDDEN:` | Key was in `hidden_state`; value included |
| `INSPECT_REDUNDANT:` | Key already revealed, no-op |
| `ROUTE_SUCCESS:` | Route executed and consequences applied |
| `ROUTE_BLOCKED:` | Route was closed by a prior ExoEvent |
| `PRECONDITIONS_FAILED:` | Route preconditions not met |
| `MILESTONE_UNLOCKED:` | A milestone condition was met |
| `EVENT_FIRED:` | An ExoEvent triggered this step |
| `WAIT_CAP_EXCEEDED:` | 4+ consecutive waits; stress penalty applied |

---

## End Conditions

| Condition | `done` | `success` | `failure` |
|---|---|---|---|
| `step_count >= max_steps` | ✅ | depends | - |
| All `success_conditions` met | ✅ | ✅ | - |
| Any `failure_conditions` entry met | ✅ | - | ✅ |
| Any metric hits 0 | ✅ | - | ✅ |
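
The table can be read as a small decision function. The sketch below is one plausible reading, not the environment's implementation: `check_end` is a hypothetical helper, and checking failure before success is an assumption the table does not spell out.

```python
def check_end(step_count, max_steps, success_met, failure_met, metrics):
    """Return (done, success, failure) for the current step."""
    # Failure: an explicit failure condition, or any metric at (or below) zero.
    if failure_met or any(value <= 0 for value in metrics.values()):
        return True, False, True
    # Success: all success conditions are met.
    if success_met:
        return True, True, False
    # Timeout: the episode ends, and success "depends" on the check above.
    if step_count >= max_steps:
        return True, False, False
    return False, False, False


timeout = check_end(30, 30, False, False, {"health": 50.0})
collapsed = check_end(3, 30, False, False, {"health": 0.0})
```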

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | `PartialObsFilter.filter()` now reads `mutable_world` + `hidden_state` directly from `Task`; removed `world` param; hidden keys wrapped with `source: inspect`; `INSPECT_REVEALED_HIDDEN` info message added |

docs/memory.md
ADDED
|
@@ -0,0 +1,90 @@
# memory.md - LifeStackMemory Reference

`agent/memory.py` - ChromaDB-backed trajectory and human-feedback storage.

---

## Overview

`LifeStackMemory` persists two types of data:

| Collection | What's stored |
|---|---|
| `collection` (trajectories) | Successful episode decisions: action type, reward, reasoning |
| `feedback_collection` | Real-world outcome feedback submitted via the Follow-up tab |

Only trajectories with `total_reward >= 2.0` are stored (threshold prevents noise).

---

## API

### Instantiation

```python
from agent.memory import LifeStackMemory

memory = LifeStackMemory(silent=True)                      # default path
memory = LifeStackMemory(silent=True, path="./my_memory")  # custom path
```

The module-level singleton in `app.py` is named `MEMORY`:

```python
MEMORY = LifeStackMemory(silent=True)
```

### `store_trajectory(...)`

```python
memory.store_trajectory(
    conflict_title="Friday 6PM",
    route_taken="communicate",
    total_reward=2.5,
    metrics_diff_str="career.workload: -15.0",
    reasoning="Delegating resolved workload spike",
)
```

Silently skips storage if `total_reward < 2.0`.

### `store_feedback(feedback: OutcomeFeedback)`

```python
from core.feedback import OutcomeFeedback

feedback = OutcomeFeedback(
    episode_id="A1B2C3D4",
    overall_effectiveness=8,
    domains_improved=["career", "mental_wellbeing"],
    domains_worsened=[],
    unexpected_effects="Felt more confident",
    resolution_time_hours=2.0,
)
memory.store_feedback(feedback)
```

Used by the **Follow-up** tab in `app.py`.

### `get_stats() -> dict`

```python
stats = memory.get_stats()
# {
#   "total_memories": 42,
#   "average_reward": 2.71,
#   "by_action_type": {"communicate": 18, "delegate": 12, ...}
# }
```
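
To make the storage threshold and the stats shape concrete, here is a minimal in-memory stand-in. It is a sketch only: `InMemoryTrajectoryStore` is a hypothetical class, it does not use ChromaDB, and it assumes `by_action_type` counts the `route_taken` values.

```python
from collections import Counter

REWARD_THRESHOLD = 2.0  # trajectories below this reward are not stored


class InMemoryTrajectoryStore:
    """Hypothetical stand-in mirroring the threshold and stats behaviour."""

    def __init__(self):
        self._records = []

    def store_trajectory(self, route_taken, total_reward):
        if total_reward < REWARD_THRESHOLD:
            return False  # skipped without raising
        self._records.append({"route_taken": route_taken, "total_reward": total_reward})
        return True

    def get_stats(self):
        total = len(self._records)
        average = sum(r["total_reward"] for r in self._records) / total if total else 0.0
        return {
            "total_memories": total,
            "average_reward": round(average, 2),
            "by_action_type": dict(Counter(r["route_taken"] for r in self._records)),
        }


store = InMemoryTrajectoryStore()
stored_first = store.store_trajectory("communicate", 2.5)
stored_low = store.store_trajectory("wait", 1.0)
stored_third = store.store_trajectory("delegate", 3.0)
stats = store.get_stats()
```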

### `query(conflict_description, n_results=3) -> list[dict]`

Retrieves the most semantically similar past decisions for a given situation description.

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | `AGENT_MEMORY` reference in `app.py` corrected to `MEMORY` (the actual singleton) |

docs/reward.md
ADDED
|
@@ -0,0 +1,82 @@
# reward.md - Reward System Reference

`core/reward.py` - Task-aware reward orchestrator.

---

## Overview

Two reward functions are available:

| Function | Used when |
|---|---|
| `compute_reward(...)` | Legacy / no-task episodes |
| `compute_task_reward(...)` | All task-driven episodes (v2.0+) |

---

## `compute_task_reward` - Components

```
reward = (0.35 × milestone)      # Reaching key progress markers
       + (0.25 × completion)     # Final goal achievement (binary 1.0 if any goal met)
       + (0.15 × outcome)        # Isolated local metric improvement
       + (0.10 × replan_bonus)   # Recovery after ExoEvents
       + (0.10 × efficiency)     # Resource preservation relative to delta
       + (0.05 × reasoning)      # Logical coherence & action alignment
       + penalties
```
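
As a worked example of the weighted sum, the sketch below combines made-up component scores with one penalty. It only illustrates the arithmetic: `combine_reward` is a hypothetical helper, not a function from `core/reward.py`, and penalties are assumed to be passed as negative numbers.

```python
WEIGHTS = {
    "milestone": 0.35,
    "completion": 0.25,
    "outcome": 0.15,
    "replan_bonus": 0.10,
    "efficiency": 0.10,
    "reasoning": 0.05,
}


def combine_reward(components, penalties=()):
    """Weighted sum of component scores plus (negative) penalties."""
    base = sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())
    return base + sum(penalties)


# 0.35*1.0 + 0.25*1.0 + 0.15*0.5 = 0.675, then one -0.20 penalty.
total = combine_reward(
    {"milestone": 1.0, "completion": 1.0, "outcome": 0.5},
    penalties=[-0.20],
)
```

The six weights sum to 1.0, so with every component at 1.0 and no penalties the base reward is 1.0.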

### Penalties

| Penalty | Value | Level | Trigger |
|---|---|---|---|
| `INACTION_PENALTY` | `-0.40` | Step | `actions_taken == 0` |
| `TASK_INACTION_PENALTY` | `-0.20` | Task | `actions_taken == 0` (additive to step penalty) |
| `CRITICAL_FLOOR_VIOLATION` | `-0.50` | Step | Any metric drops below 20 |
| `DEAD_END` | `-0.50` | Task | All viable routes closed without success |
| `CASCADE_SPREAD_WIDER` | `-0.30` | Step | Changes spread wider than disruption baseline |
| `RELATIONSHIP_COLLAPSE` | `-0.15` | Step | Relationships drop more than 20 points in one step |
| `CUMULATIVE_RELATIONSHIP_EROSION` | `-0.15` | Episode | Cumulative relationship drop more than 20 points |
| `PLAUSIBILITY_VIOLATION` | `-0.10 to -0.30` | Step | Implausible metric/cost ratio |
| `TIMEOUT` | `-0.20` | Task | Max steps reached without resolution |

---

## Return Value

Both functions return `(reward: float, breakdown: dict)`, but the component keys differ slightly.

```python
breakdown = {
    "components": {
        # compute_reward(...)
        "outcome": float,
        "containment": float,
        "efficiency": float,
        "preservation": float,
        "format_compliance": float,
        "plausibility": float,
        "reasoning_alignment": float,

        # compute_task_reward(...)
        "local_metric_delta": float,
        "milestone": float,
        "completion": float,
        "replan": float,
        "reasoning": float,
        "timeout_penalty": float,
    },
    "penalties_fired": list[str],
    "base_reward": float,
    "penalties_total": float,
}
```

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | Initial doc created |

docs/scripts.md
ADDED
|
@@ -0,0 +1,85 @@
# scripts.md - Other Scripts Reference

Reference for scripts not covered by dedicated doc files.

---

## `scripts/run_episode.py`

Runs a single full episode with the LLM agent (requires API key).

```bash
python scripts/run_episode.py
python scripts/run_episode.py --difficulty 3 --verbose
```

Returns a result dict with `total_reward`, `steps`, `domain`.

---

## `scripts/train.py`

Legacy training loop (pre-TRL). Uses a simple policy gradient loop without curriculum.
Prefer `train_trl.py` for new training runs.

---

## `scripts/smoke_test.py`

Quick sanity check: imports all core modules, resets the env once, takes one step.
No agent required. Exits with code 0 on success.

```bash
python scripts/smoke_test.py
```

---

## `scripts/test_lifestack.py`

Full edge-case test suite (11 tests). It does not need the pytest runner by default:
run it directly, or via `pytest scripts/test_lifestack.py`.

```bash
python scripts/test_lifestack.py
pytest scripts/test_lifestack.py -v
```

Tests requiring `OPENAI_API_KEY` are automatically skipped when the key is absent.

### Tests

| # | Name | What it checks |
|---|---|---|
| 1 | Cascade floor | Metrics never go below 0 |
| 2 | Cascade ceiling | Metrics never exceed 100 |
| 3 | Resource exhaustion | `deduct()` returns False without going negative |
| 4 | Inaction penalty | `INACTION_PENALTY` fires when `actions_taken=0` |
| 5 | Critical floor penalty | `CRITICAL_FLOOR_VIOLATION` fires below threshold |
| 6 | Cascade dampening | Second-order deltas < first-order delta |
| 7 | SimPerson uptake bounds | All uptake values in [0.1, 1.0] |
| 8 | Memory threshold | Only reward >= 2.0 stored |
| 9 | Episode termination | `done=True` after horizon steps |
| 10 | Task-driven smoke | Inspect + Route execute without crash |
| 11 | Full episode smoke | `run_episode()` returns float reward *(skipped without API key)* |

---

## `scripts/longitudinal_demo.py`

Seeds Arjun's multi-week journey into ChromaDB and renders a comparison view.
Used by Tab 4 (Arjun's Journey) in `app.py`.

---

## `scripts/validate_simperson.py`

Validates that all `SimPerson` personality trait combinations produce valid uptake values.

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | `test_lifestack.py`: `steps<=5` assertion fixed to `steps<=30`; `import pytest` added; `@pytest.mark.skipif` added to test 11 |

docs/task.md
ADDED
|
@@ -0,0 +1,132 @@
# task.py - Task Schema Reference

`core/task.py` - Dataclass definitions for the LifeStack long-horizon episode schema.

---

## Overview

A `Task` is the complete specification of a single episode. It defines what the agent
must achieve, how the world can change around it, and what routes are available.

---

## Dataclasses

### `Task`

```python
@dataclass
class Task:
    id: str                          # Unique task identifier
    domain: str                      # e.g. "flight_crisis", "code_merge_crisis"
    goal: str                        # Human-readable goal description
    constraints: dict                # e.g. {"budget_max": 800, "deadline_step": 10}
    hidden_state: dict               # Keys not visible without inspect
    mutable_world: dict              # Keys that can change during the episode
    visible_world: dict              # Keys always visible in the observation
    success_conditions: list[dict]   # [{key, value}]; all must be met
    failure_conditions: list[dict]   # [{key, value}]; any triggers failure
    event_schedule: list[ExoEvent]   # Deterministic/probabilistic events
    viable_routes: list[Route]       # Available action paths
    milestones: list[Milestone]      # Progress checkpoints
    horizon: int                     # Max steps per episode
    difficulty: int                  # 1-5 scale
    domain_metadata: dict            # Free-form extra info (e.g. {"story": "..."})
```
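
The "all must be met" and "any triggers failure" comments on `success_conditions` and `failure_conditions` can be sketched as one helper. This is an assumption-laden illustration: `conditions_met` is hypothetical, it compares against a single flat world dict, and it treats an empty condition list as "not met".

```python
def conditions_met(conditions, world, require_all):
    """Evaluate [{key, value}] conditions against a flat world dict."""
    checks = [world.get(cond["key"]) == cond["value"] for cond in conditions]
    if not checks:
        return False  # assumption: no conditions means nothing is met
    return all(checks) if require_all else any(checks)


world = {"done": True, "budget_blown": False}
success = conditions_met([{"key": "done", "value": True}], world, require_all=True)
failure = conditions_met(
    [{"key": "budget_blown", "value": True}, {"key": "missed_flight", "value": True}],
    world,
    require_all=False,
)
```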

### `Route`

```python
@dataclass
class Route:
    id: str
    name: str
    description: str
    required_action_types: list[str]   # e.g. ["communicate", "spend"]
    preconditions: dict                # World/hidden state conditions that must be true
    consequences: dict                 # World state mutations on success
    closes_routes: list[str]           # Route IDs that become unavailable after this
    milestones_unlocked: list[str]     # Milestone IDs unlocked on route success
    final_reward: float                # Bonus reward on route completion
```

### `Milestone`

```python
@dataclass
class Milestone:
    id: str
    description: str
    condition_key: str      # World/hidden state key to check
    condition_value: Any    # Value it must equal for milestone to be met
    reward: float           # Reward added when milestone is first reached
```

### `ExoEvent`

```python
@dataclass
class ExoEvent:
    step: int                     # Step at which to fire (-1 = probabilistic each step)
    probability: float            # Firing probability if step == -1
    id: str
    description: str
    world_mutation: dict          # Applied to mutable_world on fire
    hidden_state_mutation: dict   # Applied to hidden_state on fire
    closes_routes: list[str]      # Routes closed when this event fires
```
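
The `step` / `probability` convention on `ExoEvent` can be sketched as a firing check. This is a hedged illustration, not `WorldEngine` code: `SketchEvent` is a cut-down hypothetical stand-in for `ExoEvent`, and `should_fire` is an assumed helper name.

```python
import random
from dataclasses import dataclass


@dataclass
class SketchEvent:
    """Only the two fields that decide when an event fires."""
    step: int           # fixed step, or -1 for "roll every step"
    probability: float  # used only when step == -1


def should_fire(event, current_step, rng):
    if event.step == -1:
        # Probabilistic event: independent roll on every step.
        return rng.random() < event.probability
    # Deterministic event: fires exactly on its scheduled step.
    return event.step == current_step


rng = random.Random(42)
scheduled = SketchEvent(step=3, probability=0.0)
always = SketchEvent(step=-1, probability=1.0)
never = SketchEvent(step=-1, probability=0.0)
```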

---

## Built-in Tasks

| Class | Domain | Description |
|---|---|---|
| `FlightCrisisTask` | `flight_crisis` | Cancelled flight: rebook or work from lounge |

---

## Creating a Custom Task

```python
from core.task import Task, Route, Milestone, ExoEvent

my_task = Task(
    id="my_task",
    domain="my_domain",
    goal="Do the thing",
    constraints={"budget_max": 500, "deadline_step": 8},
    hidden_state={"secret_key": True},
    mutable_world={},
    visible_world={"public_info": "visible"},
    success_conditions=[{"key": "done", "value": True}],
    failure_conditions=[],
    event_schedule=[],
    viable_routes=[
        Route(id="r1", name="Route One", description="...",
              required_action_types=["execute"],
              preconditions={}, consequences={"done": True},
              closes_routes=[], milestones_unlocked=[], final_reward=1.0)
    ],
    milestones=[],
    horizon=20,
    difficulty=2,
    domain_metadata={"story": "A short story about the crisis."}
)
```

Then pass it to the environment:

```python
from core.lifestack_env import LifeStackEnv

env = LifeStackEnv()
obs = env.reset(task=my_task)
```

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | Initial doc created |

docs/train_trl.md
ADDED
|
@@ -0,0 +1,97 @@
# train_trl.py - GRPO Training Reference

`scripts/train_trl.py` - Curriculum GRPO training via HuggingFace TRL + Unsloth.

---

## Overview

Trains a small LLM (default: `Qwen2.5-1.5B-Instruct`) to resolve LifeStack life conflicts
using **Group Relative Policy Optimization (GRPO)**. Implements a success-based curriculum
that automatically increases difficulty when the agent's average reward exceeds 0.6.

Requires: `unsloth`, `trl`, `datasets`, `transformers`, `accelerate` (Colab / GPU).

---

## Usage

```bash
# Full curriculum training (5 stages × 100 prompts)
python scripts/train_trl.py
```

No CLI arguments; edit the constants at the top of the file to change stages, prompts, or output directory.

---

## Architecture

### Reward Functions (multi-signal GRPO)

| Function | Signal |
|---|---|
| `reward_format_fn` | JSON format compliance |
| `reward_plausibility_fn` | Penalises zero-cost metric changes |
| `reward_task_success_fn` | Core env-step outcome reward |
| `reward_milestone_fn` | Milestone progress bonus |
| `reward_reasoning_fn` | Planning coherence score |
| `reward_human_feedback_fn` | Alignment with past real-world outcome feedback |

### `get_lifestack_evaluation(completion, prompt) -> dict`

The central reward computation function. Parses the LLM's JSON completion, reconstructs
the Task from the prompt's `<SYSTEM_METADATA>` block, steps the env, and returns:

```python
{
    "reward": float,
    "breakdown": dict,  # from obs.metadata["breakdown"]
    "action": LifeStackAction
}
```

Returns `{"reward": -0.5, "breakdown": {"error": ...}}` on any parse or env failure.

#### Task Construction Hardening (2026-04-23)

The `Task(...)` call inside `get_lifestack_evaluation` is wrapped in its own
`try/except`. On exception, it logs `[reward] Task construction failed: <error>` and
returns the `-0.5` fallback immediately. A field-presence check on
`(id, goal, constraints, mutable_world, visible_world)` follows construction.
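
The fallback behaviour can be illustrated with a small wrapper. This is a sketch of the pattern only, not the body of `get_lifestack_evaluation`: `safe_evaluate` and its `evaluate` callback are hypothetical, and the callback stands in for the task reconstruction and env step.

```python
import json

FALLBACK_REWARD = -0.5


def safe_evaluate(completion, evaluate):
    """Parse a JSON completion and score it, falling back to -0.5 on any error."""
    try:
        action = json.loads(completion)
        if not isinstance(action, dict):
            raise ValueError("completion is not a JSON object")
        return evaluate(action)
    except Exception as exc:  # broad on purpose: any failure maps to the fallback
        return {"reward": FALLBACK_REWARD, "breakdown": {"error": str(exc)}}


def toy_evaluate(action):
    # Stand-in scorer: rewards actions that name a target.
    return {"reward": 1.0 if "target" in action else 0.0, "breakdown": {}}


good = safe_evaluate('{"action_type": "execute", "target": "r1"}', toy_evaluate)
bad = safe_evaluate("not json at all", toy_evaluate)
```

Catching every exception keeps a single malformed completion from aborting a whole training batch, at the cost of hiding the failure unless the `error` field is logged.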

### Curriculum (`train_curriculum`)

```
Stage 1: difficulty=1 → train → eval → if avg_reward > 0.6: difficulty++
Stage 2: difficulty=2 → ...
...
Stage 5: difficulty=5 → final save
```
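
The promotion rule in the diagram reduces to a one-line comparison. The sketch below is an illustration with assumed names (`next_difficulty`, `max_difficulty`); the 0.6 threshold and the five stages come from the text above.

```python
def next_difficulty(difficulty, avg_reward, threshold=0.6, max_difficulty=5):
    """Move up one difficulty level only when the eval average beats the threshold."""
    if avg_reward > threshold and difficulty < max_difficulty:
        return difficulty + 1
    return difficulty


# Hypothetical eval averages for five consecutive stages.
difficulty = 1
history = []
for avg_reward in [0.7, 0.5, 0.65, 0.9, 0.95]:
    difficulty = next_difficulty(difficulty, avg_reward)
    history.append(difficulty)
```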

### Dataset (`generate_dataset`)

Generates `N` prompts by:
1. Sampling a `TaskGenerator` task (flight_crisis or code_merge_crisis)
2. Merging a legacy `ConflictEvent` disruption for variety
3. Cascading the disruption through the `DependencyGraph`
4. Embedding task metadata in a `<SYSTEM_METADATA>` block for reward reconstruction
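
Step 4 implies a round trip: metadata goes into the prompt and is read back at reward time. The sketch below shows one way that could work. It assumes the block is JSON wrapped in `<SYSTEM_METADATA>...</SYSTEM_METADATA>` tags; the actual payload format, the closing tag, and the helper names `embed_metadata` and `extract_metadata` are not confirmed by this document.

```python
import json
import re

_METADATA_RE = re.compile(r"<SYSTEM_METADATA>(.*?)</SYSTEM_METADATA>", re.DOTALL)


def embed_metadata(prompt_text, metadata):
    """Append task metadata to a prompt as a tagged JSON block."""
    return f"{prompt_text}\n<SYSTEM_METADATA>{json.dumps(metadata)}</SYSTEM_METADATA>"


def extract_metadata(prompt):
    """Recover the metadata dict from a prompt, or raise if the block is missing."""
    match = _METADATA_RE.search(prompt)
    if match is None:
        raise ValueError("prompt has no <SYSTEM_METADATA> block")
    return json.loads(match.group(1))


metadata = {"id": "my_task", "domain": "flight_crisis", "difficulty": 2}
prompt = embed_metadata("Your flight was cancelled. What do you do?", metadata)
recovered = extract_metadata(prompt)
```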

---

## Outputs

| Path | Contents |
|---|---|
| `./lifestack_model/` | Final saved model + tokenizer |
| `./lifestack_model/stage_N/` | Per-stage checkpoints |
| `training_logs/generations.jsonl` | Sampled generations (every 20 reward calls) |
| `grpo_reward_curve.png` | 50-episode eval reward curve |

---

## Change Log

| Date | Change |
|---|---|
| 2026-04-23 | `Task()` construction wrapped in try/except + field validation; returns -0.5 fallback on failure |