# LifeStack Hackathon Sprint: Implementation Plan

## Context

**Submission deadline:** 26 Apr 5 PM. Offline from 25 Apr 8 AM. ~30 hours of offline build time.

The LifeStack Flask demo (`app_flask.py` + `templates/index.html`) already ships 10 API endpoints, a 6-tab UI, and a working agent/memory/cascade/reward pipeline. This sprint adds **13 features** (demo panels, APIs, an RLHF loop, multi-step training, real-data connectors, tests, a blog post) without breaking existing endpoints: all work is additive.

Budget: **$90 HF credits**, covering T4 Small for the always-on demo Space, A10G for GRPO training runs, and the HF Inference API for the NLP panel. Target trained checkpoint: **`jdsb06/lifestack-grpo-v2`** (user will push).

Key reusable primitives already in repo (do not rebuild):
- `core/cascade_utils.py:5 animate_cascade()`: returns a list of 4 frames with `flat` + `status` dicts
- `agent/counterfactuals.py:10 generate_counterfactuals()`: returns a list of alternatives
- `agent/memory.py:74 LifeStackMemory.store_trajectory()` and `:128 store_feedback(OutcomeFeedback)`
- `core/feedback.py OutcomeFeedback` + `compute_human_feedback_reward()`
- `core/life_state.py:61 LifeMetrics.flatten()`: 23 metric paths
- `agent/conflict_generator.py TEMPLATES` (13 scenarios) + `generate_conflict()`
- `core/metric_schema.py VALID_METRIC_PATHS`

Already wired in `app_flask.py`: `/api/feedback/submit` (the F9 backend is done, so F9 reduces to the frontend panel plus training integration) and `/api/simulation/cascade` (kept intact; the new `/api/cascade/frames` is added alongside it).

---

## Implementation Order (Offline Sprint)

1. F1 Trained-vs-Baseline comparison (impact demo)
2. F5 Domain risk heatmap (sidebar, always visible)
3. F3 "Try Your Own" NLP + HF Inference fallback
4. F2 D3 cascade visualisation
5. F4 Personality comparison with OCEAN radar
6. F6 Counterfactual explorer panel
7. F8 Multi-step GRPO training loop + `push_to_hub`
8. F9 RLHF feedback panel + training integration
9. F7 Cold-vs-warm memory ablation demo
10. F10 Health + calendar uploads
11. F11 BLOG.md (~700 words)
12. F12 Four tests
13. F13 Episode history/replay

Before starting, run the smoke tests (`scripts/smoke_test.py`, `scripts/eval.py --episodes 5`, and importing the cascade and counterfactual modules). Fix any failures before adding features.

---

## Cross-Cutting Changes

### `requirements.txt`: additions
- `huggingface_hub` (for F3 InferenceClient and F8 push_to_hub)
- `icalendar` (F10 calendar upload)

### `intake/intake.py`: LLM fallback chain (F3 dependency)
Refactor `_call_llm()` (~line 44) to fall back in order: **HF Inference API (`HF_TOKEN`) → Groq (`GROQ_API_KEY`) → empty string** (the existing behaviour). `LifeIntake.__init__` constructs an `InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=HF_TOKEN)` when `HF_TOKEN` is present and the existing Groq `OpenAI` client when `GROQ_API_KEY` is present. `extract_conflict()` already returns an empty `ConflictEvent` when the LLM returns an empty string; the keyword fallback below strengthens that path.
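The fallback order can be sketched as a loop over optional providers. This is a minimal sketch, assuming each client is wrapped in a plain `prompt -> text` callable; `call_llm_with_fallback` and `Provider` are hypothetical names, and in the real `_call_llm()` the HF branch would go through `InferenceClient.chat_completion` and the Groq branch through the existing `OpenAI` client.

```python
from typing import Callable, Optional, Sequence

# A provider takes a prompt and returns model text; it may raise on network or
# auth errors. In intake.py these would be thin wrappers around the HF
# InferenceClient and the Groq OpenAI client (hypothetical wrappers).
Provider = Callable[[str], str]


def call_llm_with_fallback(prompt: str, providers: Sequence[Optional[Provider]]) -> str:
    """Try each configured provider in order; return "" if every one fails."""
    for provider in providers:
        if provider is None:  # client not constructed (token missing)
            continue
        try:
            text = provider(prompt)
        except Exception:  # any failure falls through to the next provider
            continue
        if text and text.strip():  # treat blank output as a failure too
            return text
    return ""  # existing behaviour: empty string means "no LLM answer"
```

Treating blank output as a failure is a design choice: it lets Groq answer when the HF endpoint returns an empty completion instead of raising.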

**Keyword fallback:** add `_match_template_by_keywords(text: str) -> ConflictEvent | None` that scans `TEMPLATES` for overlap with user text and returns the best match. Called inside `extract_conflict()` when both LLM clients fail.
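One way to score overlap is plain word-set intersection. A sketch, assuming each entry in `TEMPLATES` can be reduced to an `id -> text` string (for example title plus description); this `match_template_by_keywords` returns the template id rather than a `ConflictEvent`, and the stopword list is illustrative only.

```python
from __future__ import annotations

import re
from typing import Mapping, Optional

# Small illustrative stopword list; a real one would be longer.
_STOPWORDS = {"the", "a", "an", "and", "or", "is", "my", "i", "to", "of",
              "in", "on", "at", "for", "just", "got"}


def _tokens(text: str) -> set[str]:
    """Lower-case word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in _STOPWORDS}


def match_template_by_keywords(text: str, templates: Mapping[str, str]) -> Optional[str]:
    """Return the id of the template sharing the most words with `text`.

    Returns None when no template shares any word, so the caller keeps its
    existing empty-ConflictEvent fallback.
    """
    user_words = _tokens(text)
    best_id, best_score = None, 0
    for template_id, template_text in templates.items():
        score = len(user_words & _tokens(template_text))
        if score > best_score:  # strict ">" keeps the first template on ties
            best_id, best_score = template_id, score
    return best_id
```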

### `app_flask.py`: shared helpers (used by F1, F4, F5, F7)
- `_run_episode(person, conflict, steps, seed, agent_fn) -> list[step_dict]`: initialises a fresh `LifeStackEnv`, applies the conflict disruption, loops `steps` iterations calling `agent_fn(metrics, budget, conflict, person)` to pick an action, runs `env.step()`, and collects `{step, action_type, target, reward, metrics, cost}`. `agent_fn` is injected so F1 can pass a random-action picker and a `LifeStackAgent.get_action`-wrapped version.
- `_random_action(metrics, budget, conflict, person) -> AgentAction`: samples uniformly from `core.action_space.EXAMPLE_ACTIONS` (line 98–196) and jitters `metric_changes` slightly so the baseline isn't deterministic. Same return shape as `AGENT.get_action()`.
- `compute_domain_health(flat_metrics: dict) -> dict[str, float]`: averages sub-metrics per domain, inverts `INVERTED_METRICS` (line 67, already defined), returns `{career, finances, relationships, physical_health, mental_wellbeing, time}` each in [0,1].
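A sketch of `compute_domain_health`, assuming metric paths have the form `domain.metric` and every metric sits on a shared 0–100 scale (a metric such as `time.free_hours_per_week` may not, so the real helper may need per-metric ranges). `INVERTED_METRICS` is passed in explicitly here to keep the example self-contained; metric names used in any example are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from typing import AbstractSet, Mapping

DOMAINS = ("career", "finances", "relationships", "physical_health", "mental_wellbeing", "time")


def compute_domain_health(
    flat_metrics: Mapping[str, float],
    inverted: AbstractSet[str],
    scale: float = 100.0,  # assumed shared metric range
) -> dict[str, float]:
    """Average normalised sub-metrics per domain; 1.0 = healthiest."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for path, value in flat_metrics.items():
        domain = path.split(".", 1)[0]  # "career.workload" -> "career"
        norm = min(max(value / scale, 0.0), 1.0)  # clamp into [0, 1]
        if path in inverted:
            norm = 1.0 - norm  # e.g. high stress means low health
        buckets[domain].append(norm)
    return {d: sum(vals) / len(vals) for d, vals in buckets.items() if d in DOMAINS}
```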

### `templates/index.html`: UI integration pattern
Each feature that needs its own view (F1, F4, F7, F13) adds one tab button in the nav bar (line 37–44) and one content `<div id="content-X">` in the main section (line 46–202); F2, F3, F5, F6, F9 and F10 extend existing views or add shared panels instead. Reuse existing classes: `.glass`, `.tab-active`, `.metric-bar`, Tailwind (`.rounded-2xl`, `.p-6`, `.space-y-6`, `.grid grid-cols-2 gap-6`, `.text-slate-400`, `.bg-indigo-500/10`). Chart.js is already loaded via CDN (line 8); D3 v7 is to be added.

---

## Feature-by-Feature

### F1: Trained vs Baseline Comparison
**Backend (`app_flask.py`):**
- `POST /api/comparison/run` → body `{conflict, person, steps=5, seed=42}`.
  - Resolve `conflict` via `CONFLICT_CHOICES`, `person` via `PERSONS`.
  - Call `_run_episode(..., agent_fn=_random_action)` → `baseline`.
  - Call `_run_episode(..., agent_fn=lambda m,b,c,p: AGENT.get_action(m,b,c,p))` with the identical seed → `trained`.
  - Compute `reward_delta = sum(trained_rewards) - sum(baseline_rewards)`.
  - Return `{baseline: [...], trained: [...], reward_delta}`.
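The baseline side of F1 can be made reproducible by giving the random picker its own seeded RNG. A minimal sketch, assuming `EXAMPLE_ACTIONS` entries can be represented as dicts with a `metric_changes` mapping and that step dicts carry a `reward` key (the real entries may be `AgentAction` objects); `random_action` and `reward_delta` are hypothetical names and the ±10% jitter is an arbitrary choice.

```python
import random
from typing import Sequence


def random_action(example_actions: Sequence[dict], rng: random.Random, jitter: float = 0.1) -> dict:
    """Uniformly pick an example action and scale each metric change by 1 ± jitter."""
    base = rng.choice(example_actions)
    changes = {
        path: delta * (1.0 + rng.uniform(-jitter, jitter))
        for path, delta in base["metric_changes"].items()
    }
    return {**base, "metric_changes": changes}  # copy; the template stays untouched


def reward_delta(baseline_steps: Sequence[dict], trained_steps: Sequence[dict]) -> float:
    """sum(trained rewards) - sum(baseline rewards)."""
    return sum(s["reward"] for s in trained_steps) - sum(s["reward"] for s in baseline_steps)
```

Seeding a local `random.Random(seed)` rather than the global `random` module keeps the baseline deterministic per seed without affecting any other code that draws random numbers.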

**Frontend:**
- New tab "Comparison". Two side-by-side `.glass` cards titled "Baseline (Random)" and "GRPO-Trained". For each step, render action-type badge + reward bar. Delta banner at the bottom (`bg-indigo-500/10`) showing `+X.XX`.

### F2: Live Cascade Visualisation (D3)
**Backend:**
- `POST /api/cascade/frames` → body `{primary_disruption: {metric_path: delta}}`. Calls `animate_cascade(primary_disruption, LifeMetrics())` and returns `{frames}`. Leaves the existing `/api/simulation/cascade` untouched.

**Frontend:**
- Add D3 v7 CDN line in `<head>`.
- New section inside the "Situational Portal" tab (below the existing cascade timeline at line ~70): `<svg id="cascade-graph" width="720" height="420">`.
- JS module `renderCascade(frames)`: creates 23 nodes from `VALID_METRIC_PATHS`, clusters by domain (6 cluster centres at: career TL, finances TR, relationships ML, physical_health MR, mental_wellbeing BC, time TC), draws edges from a hardcoded copy of the 20+ edges in `DependencyGraph.edges`. Iterates frames with 600ms `setTimeout`, recolouring nodes based on `frames[i].status[metric]`: `unchanged→#334155`, `primary→#ef4444`, `first→#f97316`, `second→#facc15`.
- Called from the existing simulation-action flow after each `/api/simulation/action` response.

### F3: "Try Your Own Situation" NLP Panel
**Backend:**
- `/api/custom/run` already exists (line 162) and is fully wired. No route changes.
- `intake/intake.py` cross-cutting change above adds HF→Groq→keyword fallback.

**Frontend:**
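- The existing "Try Your Case" tab (`#tab-custom`) is currently slider-heavy. Add a prominent textarea + Submit button above the sliders. On submit, POST `{situation: text}` as JSON to `/api/custom/run` (`fetch('/api/custom/run', {method: 'POST', headers: {'Content-Type': 'application/json'}, body: JSON.stringify({situation: text})})`), then render a card with detected domain(s), recommended action type/target, metric deltas as coloured badges (green for an improvement, red for a regression, using the `INVERTED_METRICS` set to decide which direction counts as improvement), and a reward bar.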

### F4: Personality Comparison
**Backend:**
- `POST /api/personality/compare` → body `{conflict_id="d5_friday", person_a, person_b, steps=3}`.
  - Look up persons from `PERSONS`. Run `_run_episode` twice with the trained agent on the same conflict + seed.
  - Return `{person_a: {name, actions, total_reward, ocean: {O,C,E,A,N}}, person_b: {...}, dominant_trait: "neuroticism"}` where `dominant_trait = argmax(|ocean_a[t] - ocean_b[t]|)`.
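The `dominant_trait` rule is a single argmax over the five traits. A sketch, assuming OCEAN scores are dicts keyed `O`, `C`, `E`, `A`, `N` as in the response shape, with a hypothetical `OCEAN_NAMES` table that produces names such as `"neuroticism"`.

```python
from typing import Mapping

# Short OCEAN keys (as in the response shape) -> full trait names.
OCEAN_NAMES = {
    "O": "openness",
    "C": "conscientiousness",
    "E": "extraversion",
    "A": "agreeableness",
    "N": "neuroticism",
}


def dominant_trait(ocean_a: Mapping[str, float], ocean_b: Mapping[str, float]) -> str:
    """Trait with the largest absolute gap between the two persons."""
    # max() keeps the first key on ties, i.e. O, C, E, A, N order.
    key = max(OCEAN_NAMES, key=lambda t: abs(ocean_a[t] - ocean_b[t]))
    return OCEAN_NAMES[key]
```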

**Frontend:**
- New tab "Personality". Two `.glass` columns. Each has a Chart.js radar chart (already CDN-loaded) with 5 axes (OCEAN). Below the radar: action sequence + total reward. Banner highlighting the dominant trait.

### F5: Domain Risk Heatmap
**Backend:** `compute_domain_health()` helper (see the cross-cutting section). Every response from `/api/simulation/start`, `/api/simulation/action` and `/api/custom/run` gains a `domain_health` field derived from the metrics already in the payload; no new route.

**Frontend:** Persistent top bar above the tab nav (inserted at ~line 35): 6 cells (2×3 grid on small screens, 6×1 on large). Each cell shows the domain emoji from `DOMAIN_EMOJI` on a pill whose background is `hsl(h * 120, 70%, 45%)`, where `h` is the domain's health in [0,1], so 0 renders red and 1 renders green. Re-rendered from every simulation response.

### F6: Counterfactual Explorer
**Backend:**
- `POST /api/counterfactuals/generate` → body `{conflict, person, chosen_action: {...}}`. Reconstructs state, calls `generate_counterfactuals(AGENT, metrics, budget, conflict, person, chosen_action)`, and returns `{chosen: {...}, alternatives: [3 items from the returned list]}`. (Counterfactuals already appear inside the `/api/simulation/action` response; this route is the on-demand variant F6 needs.)

**Frontend:** "What If?" collapsible panel appended below each step output. 3 alternative cards sorted by predicted reward. Chosen action outlined in indigo, best alt in green, worst in red.
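The ordering and colouring rule could be computed server-side so the panel only renders. A sketch, assuming each alternative carries a `predicted_reward` field (a hypothetical name; the key actually returned by `generate_counterfactuals` may differ); the chosen action's indigo outline is handled separately.

```python
from __future__ import annotations


def rank_alternatives(alternatives: list[dict]) -> list[dict]:
    """Sort by predicted reward (best first) and tag cards for colouring."""
    ranked = sorted(alternatives, key=lambda alt: alt["predicted_reward"], reverse=True)
    tagged = []
    for i, alt in enumerate(ranked):
        if i == 0:
            tag = "best"  # green card
        elif i == len(ranked) - 1:
            tag = "worst"  # red card
        else:
            tag = "neutral"
        tagged.append({**alt, "highlight": tag})
    return tagged
```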

### F7: Memory Ablation (Cold vs Warm)
**Backend:**
- `POST /api/memory/ablation` → body `{conflict, person, steps=5}`.
  - Episode 1: pass `memory=None` (or a fresh `LifeStackAgent()` with empty `.memory`). Record actions + rewards.
  - `MEMORY.store_trajectory(conflict_title=..., route_taken=..., total_reward=..., reasoning=...)` for episode 1.
  - Episode 2: reuse `AGENT` (global, backed by ChromaDB via `MEMORY`). Query `MEMORY` for similar trajectories (existing retrieval method) and pass the top-k summary into `get_action`'s `few_shot_context` param.
  - Return `{cold: {actions, reward}, warm: {actions, reward, retrieved_context}, improvement_pct}`.
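The plan does not pin down `improvement_pct`; one reasonable definition, offered as an assumption, is the relative change from the cold reward. Dividing by `abs(cold_reward)` keeps a positive result meaning "warm did better" even when the cold reward is negative, and a zero cold reward returns `None` instead of dividing by zero.

```python
from typing import Optional


def improvement_pct(cold_reward: float, warm_reward: float) -> Optional[float]:
    """Relative gain of the warm run over the cold run, in percent."""
    if cold_reward == 0:
        return None  # undefined; the UI can show the absolute difference instead
    # abs() keeps the sign meaningful when the cold reward is negative.
    return (warm_reward - cold_reward) / abs(cold_reward) * 100.0
```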

**Frontend:** Two-column timeline in a new "Memory" tab. Callout box with `💡 Agent recalled: …` when the warm run has retrieved context. Big percentage banner at the bottom.

### F8: Multi-Step GRPO Training
**`scripts/train_trl.py` (currently 914 lines, single-prompt per scenario):**
- Add `run_full_episode(task, person, model, tokenizer, max_steps=10) -> tuple[list[step_reward], dict]`:
  - For each step: build prompt from current `LifeMetrics` + `ResourceBudget` + conflict, call `model.generate`, parse JSON action, call `env.step()`, append step reward from existing `compute_task_reward()`.
  - Return per-step rewards and a serialised trajectory.
- New CLI flag `--full-episode`. When set, `generate_dataset()` is replaced by `generate_episodic_dataset()` which calls `run_full_episode` per scenario and uses `sum(step_rewards) / max_steps` as the GRPO reward.
- `--dry-run` compatibility: 1 episode × 2 steps with a mock model (the existing dry-run path stays valid).
- After `trainer.save_model()` at line 610, add `if not args.dry_run and args.push_to_hub: model.push_to_hub("jdsb06/lifestack-grpo-v2"); tokenizer.push_to_hub("jdsb06/lifestack-grpo-v2")`. New `--push-to-hub` flag guards it.
- Run on HF A10G once built: `python scripts/train_trl.py --full-episode --stages 5 --push-to-hub` (~$5).
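The rollout loop in `run_full_episode` can be sketched with the model, parser, and environment injected as callables, which also keeps the `--dry-run` mock path simple. This is a sketch under stated assumptions: the callables, the `(state, reward, done)` return of `env_step`, and the `-1.0` penalty for unparseable output are hypothetical, not the existing `train_trl.py` API.

```python
from __future__ import annotations

from typing import Callable, Optional

State = dict
Action = dict


def run_full_episode(
    generate: Callable[[str], str],  # wraps model.generate + tokenizer.decode
    build_prompt: Callable[[State], str],  # metrics + budget + conflict -> prompt
    parse_action: Callable[[str], Optional[Action]],  # None if JSON is invalid
    env_step: Callable[[Action], tuple[State, float, bool]],  # -> (state, reward, done)
    initial_state: State,
    max_steps: int = 10,
    invalid_action_reward: float = -1.0,  # hypothetical penalty for unparseable output
) -> tuple[list[float], list[dict]]:
    state = initial_state
    rewards: list[float] = []
    trajectory: list[dict] = []
    for step in range(max_steps):
        text = generate(build_prompt(state))
        action = parse_action(text)
        if action is None:
            # Stop the rollout: the env cannot step on an unparseable action.
            rewards.append(invalid_action_reward)
            trajectory.append({"step": step, "raw": text, "action": None,
                               "reward": invalid_action_reward})
            break
        state, reward, done = env_step(action)
        rewards.append(reward)
        trajectory.append({"step": step, "action": action, "reward": reward})
        if done:
            break
    return rewards, trajectory


def episodic_reward(step_rewards: list[float], max_steps: int) -> float:
    """GRPO reward from the plan: sum(step_rewards) / max_steps."""
    return sum(step_rewards) / max_steps
```

Because `episodic_reward` divides by `max_steps` rather than by the number of steps actually taken, an episode that ends early scores as if the remaining steps earned 0.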

### F9: RLHF Loop
- **Backend:** `/api/feedback/submit` already fully implemented (line 267). No route changes needed.
- **Frontend:** Post-episode feedback panel (rendered after every completed simulation/custom/comparison episode). Slider 0–10, domain checkboxes (6 domains × improved/worsened), textarea. Submit posts `{episode_id, score, improved[], worsened[], notes, time}` to the existing endpoint.
- **Training integration (`scripts/train_trl.py`):** New `--with-human-feedback` flag. When set, a new reward component `reward_human_feedback_fn` (hook already exists around line 379) loads stored feedback from `MEMORY.feedback_collection` by `episode_id` (if this is a ChromaDB collection, an exact lookup is `.get(where={"episode_id": ...})`, since `.query()` performs similarity search) and blends `compute_human_feedback_reward()` output at weight 0.10, rescaling the existing weights proportionally so the total weight is unchanged.
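The proportional rebalancing amounts to a single rescale of the existing weights. A sketch, assuming the existing weights are meant to sum to 1; `add_reward_component`, `blend`, and any component names used with them are hypothetical.

```python
from __future__ import annotations

from typing import Mapping


def add_reward_component(weights: Mapping[str, float], name: str,
                         new_weight: float = 0.10) -> dict[str, float]:
    """Give `name` the weight `new_weight` and shrink the others proportionally."""
    if not 0.0 < new_weight < 1.0:
        raise ValueError("new_weight must be in (0, 1)")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("existing weights must have a positive sum")
    scale = (1.0 - new_weight) / total  # existing weights now sum to 1 - new_weight
    rebalanced = {k: w * scale for k, w in weights.items()}
    rebalanced[name] = new_weight
    return rebalanced


def blend(component_rewards: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of per-component rewards."""
    return sum(weights[k] * component_rewards[k] for k in weights)
```

For example, weights of 0.5 / 0.3 / 0.2 become 0.45 / 0.27 / 0.18, with 0.10 for human feedback, so the total stays 1.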

### F10: Real Data Integrations
**Backend:**
- `POST /api/data/health/upload` (multipart): accepts `.json` (Google Fit) or `.xml` (Apple Health). Parse `steps`, `heart_rate_resting`, `sleep_hours` (approximate parse; tolerate missing fields). Map to `physical_health.fitness`, `physical_health.energy`, `physical_health.sleep_quality`. Store in new module-level dict `USER_HEALTH_OVERRIDES`. Return `{parsed_metrics, events_found}`.
- `POST /api/data/calendar/upload` (multipart): `.ics` via `icalendar.Calendar.from_ical()`. Count events in the next 7 days → `time.free_hours_per_week` (inverse), `career.workload`. Keyword match ("gym", "run", "yoga") → bump `physical_health.fitness`. Return the same shape.
- `/api/simulation/start` and `/api/custom/run` consult `USER_HEALTH_OVERRIDES` when initialising `LifeMetrics()`.
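The calendar heuristics reduce to counting events in a 7-day window and matching keywords. A sketch that works on `(start, summary)` pairs so it runs without `icalendar`; the docstring shows one way to build those pairs with `icalendar`, assuming timezones and all-day `date` values are normalised first. `summarise_week` is a hypothetical name, and mapping the counts onto `time.free_hours_per_week` and `career.workload` is left out because the plan does not specify that formula.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Tuple

FITNESS_KEYWORDS = {"gym", "run", "yoga"}


def summarise_week(events: Iterable[Tuple[datetime, str]], now: datetime) -> dict:
    """Count events starting within 7 days of `now` and those that look like workouts.

    `events` are (start, summary) pairs. With icalendar they could be built as
        cal = Calendar.from_ical(raw_bytes)
        events = [(c.get("dtstart").dt, str(c.get("summary", ""))) for c in cal.walk("VEVENT")]
    after converting all-day `date` values and timezones so they compare with `now`.
    """
    horizon = now + timedelta(days=7)
    upcoming = [(start, summary) for start, summary in events if now <= start < horizon]
    # Whole-word match, so "brunch" does not count as "run".
    fitness = sum(
        1 for _, summary in upcoming
        if FITNESS_KEYWORDS & set(re.findall(r"[a-z]+", summary.lower()))
    )
    return {"events_found": len(upcoming), "fitness_events": fitness}
```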

**Frontend:** New "Connect My Data" subsection at the top of "Try Your Case". Two file inputs. After upload, render a chip list such as `📊 From your real data – physical_health.fitness: 78`.

### F11: BLOG.md (~700 words)
Rewrite the 13-line BLOG.md with 5 sections: Problem, What We Built, Key Results (+125%, +155%, +116%, already in README lines 45–71), What We Learned, What's Next. Inline-cite the 4 papers from README lines 233–241 (Starcke & Brand 2012; Roijers et al. 2013; Mullainathan & Shafir 2013; Wang et al. 2024).

### F12: Four Tests (tests/)
- `test_env_reset.py`: `LifeStackEnv().reset()` → budget is fresh; reset twice → metrics identical. ~20 lines, pytest.
- `test_cascade.py`: `animate_cascade({"mental_wellbeing.stress_level": 30}, LifeMetrics())` returns 4 frames; frame 0 status all `unchanged`; frame 1 has at least one `primary`.
- `test_task_generator.py` (scoped per user answer): asserts `generate_conflict()` returns a valid `ConflictEvent` for each of the 6 life domains and `TEMPLATES` covers difficulties 1–5.
- `test_reward.py`: `compute_reward()` result in `[-1, 1]`; plausibility component penalises a 0-cost, 50-delta action.

### F13: Episode History
**Backend:** 
- Maintain ring buffer `EPISODE_HISTORY: deque[dict] = deque(maxlen=5)` module-level in `app_flask.py`. After every episode-producing route, append `{id, conflict, steps[], final_reward, timestamp}`.
- `GET /api/history/list` returns summaries. `GET /api/history/replay/<episode_id>` returns full step log.
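A sketch of the ring buffer and the two lookups behind the history routes. `record_episode`, `history_list`, `history_replay`, and the 8-character id are hypothetical; the lock is an assumption motivated by Flask possibly serving requests on several threads, where iterating a deque while another thread appends raises `RuntimeError`.

```python
import threading
import time
import uuid
from collections import deque
from typing import Optional

EPISODE_HISTORY: deque = deque(maxlen=5)  # appending a 6th entry drops the oldest
_HISTORY_LOCK = threading.Lock()  # guards iteration against concurrent appends


def record_episode(conflict: str, steps: list, final_reward: float) -> str:
    """Append one finished episode and return its id."""
    episode_id = uuid.uuid4().hex[:8]
    entry = {"id": episode_id, "conflict": conflict, "steps": steps,
             "final_reward": final_reward, "timestamp": time.time()}
    with _HISTORY_LOCK:
        EPISODE_HISTORY.append(entry)
    return episode_id


def history_list() -> list:
    """Summaries for GET /api/history/list, newest first (no step logs)."""
    with _HISTORY_LOCK:
        return [{k: ep[k] for k in ("id", "conflict", "final_reward", "timestamp")}
                for ep in reversed(EPISODE_HISTORY)]


def history_replay(episode_id: str) -> Optional[dict]:
    """Full entry for GET /api/history/replay/<episode_id>; None maps to a 404."""
    with _HISTORY_LOCK:
        return next((ep for ep in EPISODE_HISTORY if ep["id"] == episode_id), None)
```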

**Frontend:** New "History" tab, accordion list, click-to-expand per episode.

---

## Critical Files to Modify

| File | Features touching it |
|------|------|
| `app_flask.py` | F1, F2, F4, F5, F6, F7, F10, F13 (9 new routes, 3 helpers, `EPISODE_HISTORY` deque, `USER_HEALTH_OVERRIDES` dict) |
| `intake/intake.py` | F3 (LLM fallback chain, keyword match) |
| `templates/index.html` | F1, F2, F3, F4, F5, F6, F7, F9, F10, F13 (new tabs, heatmap bar, D3 SVG, feedback panel) |
| `scripts/train_trl.py` | F8 (`run_full_episode`, `--full-episode`, `--push-to-hub`), F9 (`--with-human-feedback`) |
| `requirements.txt` | `huggingface_hub`, `icalendar` |
| `BLOG.md` | F11 (full rewrite) |
| `tests/test_env_reset.py`, `test_cascade.py`, `test_task_generator.py`, `test_reward.py` | F12 (new files) |

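No other files get edited. No existing route is removed or has its request contract changed, and no dataclass is modified; existing routes only gain additive response fields and hooks (F5 `domain_health`, F10 overrides, F13 history logging).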

---

## Verification

**Local (no GPU):**
```bash
python scripts/smoke_test.py
python scripts/eval.py --episodes 5
python -m pytest tests/ -v
python scripts/train_trl.py --full-episode --dry-run   # F8 dry-run
python app_flask.py  # open localhost:7860, click through each new tab
```

**HF Inference API check (F3):**
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=os.getenv("HF_TOKEN"))
reply = client.chat_completion([{"role": "user", "content": "Reply OK"}], max_tokens=5)
print(reply.choices[0].message.content)  # any short reply confirms token + model work
```

**HF Space (T4, $0.60/hr, leave running 25 Apr 8 AM → 26 Apr 5 PM ≈ $20):**
1. Space settings → hardware: T4 Small.
2. Secrets: `HF_TOKEN`, `GROQ_API_KEY`.
3. Push branch → confirm the Flask app starts on port 7860 → open every tab.

**A10G training run (F8, ~$5, one-off):** 
```bash
python scripts/train_trl.py --full-episode --stages 5 --push-to-hub
```
Afterwards: `https://huggingface.co/jdsb06/lifestack-grpo-v2` should show the checkpoint.

**End-to-end demo walkthrough to rehearse before 26 Apr 5 PM:**
1. Open Situational Portal → run the Friday 6PM conflict → cascade SVG animates, heatmap shifts red.
2. Switch to the Comparison tab → same conflict → watch the delta bar fill positive.
3. Personality tab → Alex vs Chloe → radars + different rewards.
4. Try Your Case → paste "I just got fired and rent is due tomorrow" → plan card renders.
5. Memory tab → cold vs warm ablation → +116% banner.
6. Submit a feedback slider → stats endpoint reflects the new feedback count.