# LifeStack Hackathon Sprint — Implementation Plan
## Context
**Submission deadline:** 26 Apr 5 PM. Offline from 25 Apr 8 AM. ~30 hours of offline build time.
The LifeStack Flask demo (`app_flask.py` + `templates/index.html`) already ships 10 API endpoints, a 6-tab UI, and a working agent/memory/cascade/reward pipeline. This sprint adds **13 features** (demo panels, APIs, RLHF loop, multi-step training, real-data connectors, tests, blog). All work is additive: existing endpoints must keep working unchanged.
Budget: **$90 HF credits** — T4 Small for the always-on demo Space, A10G for GRPO training runs, HF Inference API for the NLP panel. Target trained checkpoint: **`jdsb06/lifestack-grpo-v2`** (user will push).
Key reusable primitives already in repo (do not rebuild):
- `core/cascade_utils.py:5 animate_cascade()` — returns list of 4 frames with `flat` + `status` dicts
- `agent/counterfactuals.py:10 generate_counterfactuals()` — returns list of alternatives
- `agent/memory.py:74 LifeStackMemory.store_trajectory()` and `:128 store_feedback(OutcomeFeedback)`
- `core/feedback.py OutcomeFeedback` + `compute_human_feedback_reward()`
- `core/life_state.py:61 LifeMetrics.flatten()` — 23 metric paths
- `agent/conflict_generator.py TEMPLATES` (13 scenarios) + `generate_conflict()`
- `core/metric_schema.py VALID_METRIC_PATHS`
Already wired in `app_flask.py`: `/api/feedback/submit` (Feature 9 backend is done — scope of F9 reduces to frontend panel + training integration); `/api/simulation/cascade` (kept intact, new `/api/cascade/frames` added alongside).
---
## Implementation Order (Offline Sprint)
1. F1 Trained-vs-Baseline comparison (impact demo)
2. F5 Domain risk heatmap (sidebar, always visible)
3. F3 "Try Your Own" NLP + HF Inference fallback
4. F2 D3 cascade visualisation
5. F4 Personality comparison with OCEAN radar
6. F6 Counterfactual explorer panel
7. F8 Multi-step GRPO training loop + `push_to_hub`
8. F9 RLHF feedback panel + training integration
9. F7 Cold-vs-warm memory ablation demo
10. F10 Health + calendar uploads
11. F11 BLOG.md (~700 words)
12. F12 Four tests
13. F13 Episode history/replay
Before starting, run the smoke tests (`scripts/smoke_test.py`, `scripts/eval.py --episodes 5`, cascade/counterfactual imports). Fix any failures before adding features.
---
## Cross-Cutting Changes
### `requirements.txt` — add
- `huggingface_hub` (for F3 InferenceClient and F8 push_to_hub)
- `icalendar` (F10 calendar upload)
### `intake/intake.py` — LLM fallback chain (F3 dependency)
Refactor `_call_llm()` (~line 44) to cascade: **HF Inference API (`HF_TOKEN`) → Groq (`GROQ_API_KEY`) → empty-string fallback** (existing behaviour). `LifeIntake.__init__` constructs an `InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=HF_TOKEN)` when `HF_TOKEN` is present, and the existing Groq `OpenAI` client when `GROQ_API_KEY` is present. `extract_conflict()` already returns an empty `ConflictEvent` when the LLM returns an empty string; the keyword fallback below strengthens that path.
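A minimal sketch of the cascade order described above, assuming each provider is wrapped in a callable that takes a prompt and returns text. The names `call_with_fallback`, `hf_call`, and `groq_call` are hypothetical stand-ins, not the repo's actual functions; the real `_call_llm()` may be structured differently.

```python
from typing import Callable, Iterable


def call_with_fallback(prompt: str, providers: Iterable[Callable[[str], str]]) -> str:
    """Try each provider in order; return the first non-empty reply, else ''."""
    for provider in providers:
        try:
            reply = provider(prompt)
        except Exception:
            # Network errors, rate limits, missing credentials: try the next one.
            continue
        if reply and reply.strip():
            return reply
    # Matches the existing behaviour: an empty string signals "no LLM available".
    return ""


# Hypothetical stand-ins for the HF Inference and Groq client wrappers.
def hf_call(prompt: str) -> str:
    raise RuntimeError("HF_TOKEN not set")


def groq_call(prompt: str) -> str:
    return '{"title": "Missed deadline"}'


reply = call_with_fallback("Describe the conflict", [hf_call, groq_call])
```

Passing the providers as a list keeps the order explicit and makes it straightforward to skip a provider whose key is absent.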
**Keyword fallback:** add `_match_template_by_keywords(text: str) -> ConflictEvent | None` that scans `TEMPLATES` for overlap with user text and returns the best match. Called inside `extract_conflict()` when both LLM clients fail.
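One possible way to score the overlap is a simple shared-word count. This sketch assumes each template exposes some descriptive text; the `SAMPLE_TEMPLATES` entries, their `id`/`description` fields, and the stopword list are all hypothetical, and the real `TEMPLATES` may have a different shape. It returns the matching template rather than a `ConflictEvent`; the conversion is left out.

```python
import re

# Hypothetical stand-in for agent.conflict_generator.TEMPLATES.
SAMPLE_TEMPLATES = [
    {"id": "job_loss", "description": "laid off from job, lost income and salary"},
    {"id": "health_scare", "description": "unexpected illness, hospital visit, doctor"},
    {"id": "breakup", "description": "relationship ended, partner left, breakup"},
]

# Small illustrative stopword list so filler words do not count as matches.
_STOPWORDS = {"a", "an", "and", "from", "i", "my", "of", "the", "to", "was"}


def _tokens(text: str) -> set[str]:
    """Lowercase alphabetic words minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in _STOPWORDS}


def match_template_by_keywords(text, templates=SAMPLE_TEMPLATES, min_overlap=1):
    """Return the template sharing the most words with `text`, or None."""
    user_words = _tokens(text)
    best, best_score = None, 0
    for template in templates:
        score = len(user_words & _tokens(template["description"]))
        if score > best_score:
            best, best_score = template, score
    return best if best_score >= min_overlap else None


match = match_template_by_keywords("I was laid off and lost my salary")
```

Raising `min_overlap` trades recall for fewer spurious matches on short inputs.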
### `app_flask.py` — shared helpers (used by F1, F4, F5, F7)
- `_run_episode(person, conflict, steps, seed, agent_fn) -> list[step_dict]`: initialises a fresh `LifeStackEnv`, applies the conflict disruption, loops `steps` iterations calling `agent_fn(metrics, budget, conflict, person)` to pick an action, runs `env.step()`, and collects `{step, action_type, target, reward, metrics, cost}`. `agent_fn` is injected so F1 can pass a random-action picker and a `LifeStackAgent.get_action`-wrapped version.
- `_random_action(metrics, budget, conflict, person) -> AgentAction`: samples uniformly from `core.action_space.EXAMPLE_ACTIONS` (lines 98–196) and jitters `metric_changes` slightly so the baseline isn't deterministic. Same return shape as `AGENT.get_action()`.
- `compute_domain_health(flat_metrics: dict) -> dict[str, float]`: averages sub-metrics per domain, inverts `INVERTED_METRICS` (line 67, already defined), returns `{career, finances, relationships, physical_health, mental_wellbeing, time}` each in [0,1].
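The domain-health helper above could look roughly like this. The sketch assumes flat metric keys are dotted paths of the form `domain.sub_metric` and that raw values sit on a 0–100 scale; both are assumptions, as is the `SAMPLE_INVERTED` set, which stands in for the real `INVERTED_METRICS`.

```python
from collections import defaultdict

# Hypothetical sample; the real set is INVERTED_METRICS in app_flask.py.
SAMPLE_INVERTED = {"mental_wellbeing.stress_level", "finances.debt_pressure"}


def compute_domain_health(flat_metrics, inverted=SAMPLE_INVERTED, scale=100.0):
    """Average sub-metrics per domain, flipping metrics where higher is worse."""
    buckets = defaultdict(list)
    for path, value in flat_metrics.items():
        domain = path.split(".", 1)[0]  # assumed "domain.sub_metric" layout
        score = value / scale           # assumed 0-100 raw scale
        if path in inverted:
            score = 1.0 - score
        # Clamp so out-of-range inputs cannot push a domain outside [0, 1].
        buckets[domain].append(min(1.0, max(0.0, score)))
    return {domain: sum(vals) / len(vals) for domain, vals in buckets.items()}


health = compute_domain_health({
    "mental_wellbeing.stress_level": 80,
    "mental_wellbeing.motivation": 60,
    "finances.liquidity": 50,
})
```

If the real metrics are already normalised to [0, 1], passing `scale=1.0` keeps the same logic.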
### `templates/index.html` — UI integration pattern
Every new feature adds one new tab button in the nav bar (lines 37–44) and one content `<div>` in the main section (lines 46–202). Reuse existing classes: `.glass`, `.tab-active`, `.metric-bar`, Tailwind (`.rounded-2xl`, `.p-6`, `.space-y-6`, `.grid grid-cols-2 gap-6`, `.text-slate-400`, `.bg-indigo-500/10`). Chart.js is already loaded via CDN (line 8); D3 v7 to be added.
---
## Feature-by-Feature
### F1 — Trained vs Baseline Comparison
**Backend — `app_flask.py`:**
- `POST /api/comparison/run` → body `{conflict, person, steps=5, seed=42}`.
- Resolve `conflict` via `CONFLICT_CHOICES`, `person` via `PERSONS`.
- Call `_run_episode(..., agent_fn=_random_action)` → `baseline`.
- Call `_run_episode(..., agent_fn=lambda m,b,c,p: AGENT.get_action(m,b,c,p))` with identical seed → `trained`.
- Compute `reward_delta = sum(trained_rewards) - sum(baseline_rewards)`.
- Return `{baseline: [...], trained: [...], reward_delta}`.
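The seed-matched comparison can be illustrated with a toy environment. Everything here (`ToyEnv`, `ACTIONS`, the two agents, the simplified `agent_fn(rng)` signature) is a hypothetical stand-in for `LifeStackEnv`, `EXAMPLE_ACTIONS`, and the real agents; only the shape of the response and the `reward_delta` formula follow the plan above.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for LifeStackEnv: reward = action value + seeded noise."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def step(self, action):
        return action["value"] + self.rng.uniform(-0.05, 0.05)


ACTIONS = [
    {"type": "rest", "value": 0.1},
    {"type": "negotiate", "value": 0.4},
    {"type": "delegate", "value": 0.7},
]


def random_agent(rng):
    return rng.choice(ACTIONS)


def greedy_agent(rng):
    # Stands in for the trained policy: always picks the highest-value action.
    return max(ACTIONS, key=lambda a: a["value"])


def run_episode(agent_fn, steps=5, seed=42):
    env = ToyEnv(seed)                    # fresh environment per episode
    agent_rng = random.Random(seed + 1)   # separate stream so env noise matches across runs
    trace = []
    for step in range(steps):
        action = agent_fn(agent_rng)
        reward = env.step(action)
        trace.append({"step": step, "action_type": action["type"], "reward": reward})
    return trace


def compare(steps=5, seed=42):
    baseline = run_episode(random_agent, steps, seed)
    trained = run_episode(greedy_agent, steps, seed)
    reward_delta = sum(s["reward"] for s in trained) - sum(s["reward"] for s in baseline)
    return {"baseline": baseline, "trained": trained, "reward_delta": reward_delta}


result = compare()
```

Keeping the environment's random stream separate from the agent's means both runs see identical noise, so the delta reflects only the difference in chosen actions.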
**Frontend:**
- New tab "Comparison". Two side-by-side `.glass` cards titled "Baseline (Random)" and "GRPO-Trained". For each step, render action-type badge + reward bar. Delta banner at the bottom (`bg-indigo-500/10`) showing `+X.XX`.
### F2 — Live Cascade Visualisation (D3)
**Backend:**
- `POST /api/cascade/frames` → body `{primary_disruption: {metric_path: delta}}`. Calls `animate_cascade(primary_disruption, LifeMetrics())` and returns `{frames}`. Keeps existing `/api/simulation/cascade` untouched.
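A framework-free sketch of the request handling for this endpoint, written as a plain function so it can be exercised without Flask. The handler name, the injected `animate_fn`, the validation against a set of valid paths, and the 400 responses are assumptions for illustration; the plan above only specifies the request body and the `{frames}` response.

```python
def cascade_frames_handler(body, animate_fn, valid_paths):
    """Return (payload, status) for a /api/cascade/frames-style request."""
    disruption = (body or {}).get("primary_disruption")
    if not isinstance(disruption, dict) or not disruption:
        return {"error": "primary_disruption must be a non-empty object"}, 400
    # Assumed check: reject metric paths the schema does not know about.
    unknown = sorted(p for p in disruption if p not in valid_paths)
    if unknown:
        return {"error": f"unknown metric paths: {unknown}"}, 400
    return {"frames": animate_fn(disruption)}, 200


# Hypothetical stand-ins for animate_cascade() and VALID_METRIC_PATHS.
def fake_animate(disruption):
    return [{"flat": dict(disruption), "status": {}} for _ in range(4)]


SAMPLE_PATHS = {"career.workload", "mental_wellbeing.stress_level"}

payload, status = cascade_frames_handler(
    {"primary_disruption": {"career.workload": 0.3}}, fake_animate, SAMPLE_PATHS
)
```

In the Flask route, the body would come from `request.get_json()` and the tuple would be passed through `jsonify`.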
**Frontend:**
- Add D3 v7 CDN line in `<head>`.
- New section inside the "Situational Portal" tab (below the existing cascade timeline at line ~70): `