LifeStack Hackathon Sprint: Implementation Plan
Context
Submission deadline: 26 Apr 5 PM. Offline from 25 Apr 8 AM. ~30 hours of offline build time.
The LifeStack Flask demo (app_flask.py + templates/index.html) already ships 10 API endpoints, a 6-tab UI, and a working agent/memory/cascade/reward pipeline. This sprint adds 13 features (demo panels, APIs, RLHF loop, multi-step training, real-data connectors, tests, blog). All work is additive: no existing endpoint may break.
Budget: $90 in HF credits, covering a T4 Small for the always-on demo Space, an A10G for GRPO training runs, and the HF Inference API for the NLP panel. Target trained checkpoint: jdsb06/lifestack-grpo-v2 (user will push).
Key reusable primitives already in repo (do not rebuild):
- core/cascade_utils.py:5 animate_cascade(): returns a list of 4 frames with flat + status dicts
- agent/counterfactuals.py:10 generate_counterfactuals(): returns a list of alternatives
- agent/memory.py:74 LifeStackMemory.store_trajectory() and :128 store_feedback(OutcomeFeedback)
- core/feedback.py OutcomeFeedback + compute_human_feedback_reward()
- core/life_state.py:61 LifeMetrics.flatten(): 23 metric paths
- agent/conflict_generator.py TEMPLATES (13 scenarios) + generate_conflict()
- core/metric_schema.py VALID_METRIC_PATHS
Already wired in app_flask.py: /api/feedback/submit (the Feature 9 backend is done, so the scope of F9 reduces to the frontend panel + training integration); /api/simulation/cascade (kept intact, with the new /api/cascade/frames added alongside).
Implementation Order (Offline Sprint)
- F1 Trained-vs-Baseline comparison (impact demo)
- F5 Domain risk heatmap (sidebar, always visible)
- F3 "Try Your Own" NLP + HF Inference fallback
- F2 D3 cascade visualisation
- F4 Personality comparison with OCEAN radar
- F6 Counterfactual explorer panel
- F8 Multi-step GRPO training loop + push_to_hub
- F9 RLHF feedback panel + training integration
- F7 Cold-vs-warm memory ablation demo
- F10 Health + calendar uploads
- F11 BLOG.md (~700 words)
- F12 Four tests
- F13 Episode history/replay
Before starting, run the smoke tests (scripts/smoke_test.py, scripts/eval.py --episodes 5, cascade/counterfactual imports). Fix any failures before adding features.
Cross-Cutting Changes
requirements.txt: add
- huggingface_hub (for F3 InferenceClient and F8 push_to_hub)
- icalendar (F10 calendar upload)
intake/intake.py: LLM fallback chain (F3 dependency)
Refactor _call_llm() (~line 44) to cascade: HF Inference API (HF_TOKEN) → Groq (GROQ_API_KEY) → empty-string fallback (existing behaviour). LifeIntake.__init__ constructs an InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=HF_TOKEN) when HF_TOKEN is present, and the existing Groq OpenAI client when GROQ_API_KEY is present. extract_conflict() already returns an empty ConflictEvent when the LLM returns an empty string; the keyword fallback below strengthens that path.
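The cascade itself can be sketched without any provider SDK. The snippet below is a minimal illustration rather than the repo's _call_llm(): call_with_fallback and the two stub providers are hypothetical names, and each provider is assumed to be a plain callable that takes a prompt and returns text.

```python
from typing import Callable, Sequence


def call_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order; return the first non-empty reply, else ""."""
    for provider in providers:
        try:
            reply = provider(prompt)
        except Exception:
            # Any provider failure (network, auth, quota) moves on to the next one.
            continue
        if reply and reply.strip():
            return reply
    # Nothing answered: keep the existing empty-string behaviour.
    return ""


# Hypothetical stand-ins for the HF and Groq clients.
def failing_hf(prompt: str) -> str:
    raise RuntimeError("HF Inference API unavailable")


def groq_stub(prompt: str) -> str:
    return '{"domain": "career"}'


reply = call_with_fallback("Reply OK", [failing_hf, groq_stub])
```

Passing the providers as an ordered list keeps the priority explicit and makes the chain easy to exercise offline with stubs.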
Keyword fallback: add _match_template_by_keywords(text: str) -> ConflictEvent | None that scans TEMPLATES for overlap with user text and returns the best match. Called inside extract_conflict() when both LLM clients fail.
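One way the keyword scan could look is sketched below. The Template dataclass and its keywords field are assumptions made for illustration (the real shape of TEMPLATES entries is not shown here), and the sketch returns the matched template rather than a ConflictEvent.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Template:
    # Hypothetical stand-in for one TEMPLATES entry.
    title: str
    keywords: FrozenSet[str]


def match_template_by_keywords(text: str, templates: Sequence[Template]) -> Optional[Template]:
    """Return the template sharing the most words with the text, or None if none overlap."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, best_overlap = None, 0
    for template in templates:
        overlap = len(words & template.keywords)
        if overlap > best_overlap:  # strict '>' keeps the earliest template on ties
            best, best_overlap = template, overlap
    return best


demo_templates = [
    Template("Job loss", frozenset({"fired", "layoff", "job"})),
    Template("Rent crunch", frozenset({"rent", "landlord", "due"})),
]
match = match_template_by_keywords("I just got fired and rent is due tomorrow", demo_templates)
```

Here the rent template wins with two overlapping words against one; a text with no overlap returns None, which lets extract_conflict() fall through to its empty ConflictEvent.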
app_flask.py: shared helpers (used by F1, F4, F5, F7)
- _run_episode(person, conflict, steps, seed, agent_fn) -> list[step_dict]: initialises a fresh LifeStackEnv, applies the conflict disruption, loops for steps iterations calling agent_fn(metrics, budget, conflict, person) to pick an action, runs env.step(), and collects {step, action_type, target, reward, metrics, cost}. agent_fn is injected so F1 can pass either a random-action picker or a LifeStackAgent.get_action-wrapped version.
- _random_action(metrics, budget, conflict, person) -> AgentAction: samples uniformly from core.action_space.EXAMPLE_ACTIONS (lines 98–196) and jitters metric_changes slightly so the baseline isn't deterministic. Same return shape as AGENT.get_action().
- compute_domain_health(flat_metrics: dict) -> dict[str, float]: averages sub-metrics per domain, inverts INVERTED_METRICS (line 67, already defined), and returns {career, finances, relationships, physical_health, mental_wellbeing, time}, each in [0,1].
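The domain-health helper described above might look roughly like this. It is a sketch under two assumptions that the plan does not state: flat metrics are on a 0-100 scale, and the INVERTED set below is a hypothetical stand-in for the repo's INVERTED_METRICS. Unlike the planned helper, it only returns domains present in the input.

```python
from collections import defaultdict
from typing import Dict

# Assumption: stand-in for INVERTED_METRICS (metrics where a higher value is worse).
INVERTED = {"mental_wellbeing.stress_level", "career.workload"}


def domain_health_sketch(flat_metrics: Dict[str, float]) -> Dict[str, float]:
    """Average per-domain scores in [0, 1], flipping inverted metrics."""
    buckets = defaultdict(list)
    for path, value in flat_metrics.items():
        domain = path.split(".", 1)[0]
        # Assumption: raw values are on a 0-100 scale; clamp after scaling.
        score = min(max(value / 100.0, 0.0), 1.0)
        if path in INVERTED:
            score = 1.0 - score
        buckets[domain].append(score)
    return {domain: sum(scores) / len(scores) for domain, scores in buckets.items()}


health = domain_health_sketch({
    "physical_health.fitness": 80,
    "physical_health.energy": 60,
    "mental_wellbeing.stress_level": 70,
})
```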
templates/index.html β UI integration pattern
Every feature that gets its own tab adds one new tab button in the nav bar (lines 37–44) and one content <div id="content-X"> in the main section (lines 46–202). Reuse existing classes: .glass, .tab-active, .metric-bar, Tailwind (.rounded-2xl, .p-6, .space-y-6, .grid grid-cols-2 gap-6, .text-slate-400, .bg-indigo-500/10). Chart.js is already loaded via CDN (line 8); D3 v7 is to be added.
Feature-by-Feature
F1: Trained vs Baseline Comparison
Backend (app_flask.py):
- POST /api/comparison/run, body {conflict, person, steps=5, seed=42}.
  - Resolve conflict via CONFLICT_CHOICES, person via PERSONS.
  - Call _run_episode(..., agent_fn=_random_action) → baseline.
  - Call _run_episode(..., agent_fn=lambda m,b,c,p: AGENT.get_action(m,b,c,p)) with identical seed → trained.
  - Compute reward_delta = sum(trained_rewards) - sum(baseline_rewards).
  - Return {baseline: [...], trained: [...], reward_delta}.
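The response assembly for this route is small enough to sketch directly. The helper names below (build_comparison, format_delta) are hypothetical, and the step dicts carry only the reward field needed for the delta; the reward values are made up for illustration.

```python
from typing import Dict, List


def build_comparison(baseline: List[Dict], trained: List[Dict]) -> Dict:
    """Assemble the {baseline, trained, reward_delta} payload from two step logs."""
    reward_delta = sum(s["reward"] for s in trained) - sum(s["reward"] for s in baseline)
    return {"baseline": baseline, "trained": trained, "reward_delta": reward_delta}


def format_delta(reward_delta: float) -> str:
    """Render the delta banner text with an explicit sign and two decimals."""
    return f"{reward_delta:+.2f}"


payload = build_comparison(
    baseline=[{"step": 0, "reward": 0.1}, {"step": 1, "reward": 0.2}],
    trained=[{"step": 0, "reward": 0.4}, {"step": 1, "reward": 0.5}],
)
banner = format_delta(payload["reward_delta"])
```

Formatting the sign on the server or in one shared helper keeps the "+X.XX" banner consistent when the trained agent underperforms and the delta goes negative.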
Frontend:
- New tab "Comparison". Two side-by-side .glass cards titled "Baseline (Random)" and "GRPO-Trained". For each step, render an action-type badge + reward bar. Delta banner at the bottom (bg-indigo-500/10) showing +X.XX.
F2: Live Cascade Visualisation (D3)
Backend:
- POST /api/cascade/frames, body {primary_disruption: {metric_path: delta}}. Calls animate_cascade(primary_disruption, LifeMetrics()) and returns {frames}. Keeps the existing /api/simulation/cascade untouched.
Frontend:
- Add the D3 v7 CDN line in <head>.
- New section inside the "Situational Portal" tab (below the existing cascade timeline at line ~70): <svg id="cascade-graph" width="720" height="420">.
- JS module renderCascade(frames): creates 23 nodes from VALID_METRIC_PATHS, clusters them by domain (6 cluster centres: career top-left, finances top-right, relationships middle-left, physical_health middle-right, mental_wellbeing bottom-centre, time top-centre), and draws edges from a hardcoded copy of the 20+ edges in DependencyGraph.edges. Iterates over frames with a 600 ms setTimeout, recolouring nodes based on frames[i].status[metric]: unchanged → #334155, primary → #ef4444, first → #f97316, second → #facc15.
- Called from the existing simulation-action flow after each /api/simulation/action response.
F3: "Try Your Own Situation" NLP Panel
Backend:
- /api/custom/run already exists (line 162) and is fully wired. No route changes.
- The intake/intake.py cross-cutting change above adds the HF → Groq → keyword fallback.
Frontend:
- The existing "Try Your Case" tab (#tab-custom) is currently slider-heavy. Add a prominent textarea + Submit button above the sliders. On submit, fetch('/api/custom/run', {situation: text}) → render a card with detected domain(s), recommended action type/target, metric deltas as coloured badges (green when the delta is an improvement, i.e. positive on positive-sense metrics or negative on metrics in the INVERTED_METRICS set; red otherwise), and a reward bar.
F4: Personality Comparison
Backend:
- POST /api/personality/compare, body {conflict_id="d5_friday", person_a, person_b, steps=3}.
  - Look up persons from PERSONS. Run _run_episode twice with the trained agent on the same conflict + seed.
  - Return {person_a: {name, actions, total_reward, ocean: {O,C,E,A,N}}, person_b: {...}, dominant_trait: "neuroticism"} where dominant_trait = argmax over traits t of |ocean_a[t] - ocean_b[t]|.
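The dominant-trait rule above is a one-line argmax over absolute differences. A minimal sketch, with made-up OCEAN scores and a hypothetical TRAIT_NAMES mapping from the single-letter keys to the full names used in the response:

```python
from typing import Dict

# Hypothetical mapping from the {O,C,E,A,N} keys to response-friendly names.
TRAIT_NAMES = {
    "O": "openness",
    "C": "conscientiousness",
    "E": "extraversion",
    "A": "agreeableness",
    "N": "neuroticism",
}


def dominant_trait(ocean_a: Dict[str, float], ocean_b: Dict[str, float]) -> str:
    """Return the trait on which the two profiles differ the most."""
    key = max(TRAIT_NAMES, key=lambda t: abs(ocean_a[t] - ocean_b[t]))
    return TRAIT_NAMES[key]


# Illustrative scores only; real values come from PERSONS.
person_a = {"O": 0.7, "C": 0.6, "E": 0.5, "A": 0.6, "N": 0.2}
person_b = {"O": 0.6, "C": 0.7, "E": 0.5, "A": 0.5, "N": 0.8}
trait = dominant_trait(person_a, person_b)
```

Because max() returns the first maximal key, ties resolve in O, C, E, A, N order.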
Frontend:
- New tab "Personality". Two .glass columns. Each has a Chart.js radar chart (already CDN-loaded) with 5 axes (OCEAN). Below the radar: action sequence + total reward. Banner highlighting the dominant trait.
F5: Domain Risk Heatmap
Backend: compute_domain_health() helper added (cross-cutting section). Every response from /api/simulation/start, /api/simulation/action, /api/custom/run gets an extra domain_health field derived from the metrics already in the payload; no new route.
Frontend: Persistent top bar above the tab nav (inserted at ~line 35): 6 cells (2×3 grid on small screens, 6×1 on large). Each cell shows the domain emoji from DOMAIN_EMOJI and a pill background coloured via hsl((1 - h) * 120, 70%, 45%), where h is the domain's risk (1 - domain_health), so that healthy domains render green and at-risk ones red. Re-rendered from every simulation response.
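The colour formula is easy to sanity-check outside the browser. The sketch below assumes h is a risk value in [0, 1] (that is, 1 minus the domain health), since hue 0 is red and hue 120 is green; risk_to_hsl is a hypothetical helper name.

```python
def risk_to_hsl(h: float) -> str:
    """Map a risk value in [0, 1] to the pill colour: 0 -> green (hue 120), 1 -> red (hue 0)."""
    h = min(max(h, 0.0), 1.0)  # clamp out-of-range inputs
    hue = (1.0 - h) * 120.0
    return f"hsl({hue:.0f}, 70%, 45%)"


healthy = risk_to_hsl(0.0)
at_risk = risk_to_hsl(1.0)
midway = risk_to_hsl(0.5)
```

If the frontend feeds domain_health in directly instead of risk, the same effect comes from using hsl(h * 120, 70%, 45%).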
F6: Counterfactual Explorer
Backend:
- POST /api/counterfactuals/generate, body {conflict, person, chosen_action: {...}}. Reconstructs state, calls generate_counterfactuals(AGENT, metrics, budget, conflict, person, chosen_action), and returns {chosen: {...}, alternatives: [3 items from the list]}. (Counterfactuals already appear inside the /api/simulation/action response; this route is the on-demand variant Feature 6 wants.)
Frontend: "What If?" collapsible panel appended below each step output. 3 alternative cards sorted by predicted reward. Chosen action outlined in indigo, best alt in green, worst in red.
F7: Memory Ablation (Cold vs Warm)
Backend:
- POST /api/memory/ablation, body {conflict, person, steps=5}.
  - Episode 1: pass memory=None (or a fresh LifeStackAgent() with empty .memory). Record actions + rewards.
  - MEMORY.store_trajectory(conflict_title=..., route_taken=..., total_reward=..., reasoning=...) for episode 1.
  - Episode 2: reuse AGENT (global; has ChromaDB via MEMORY). Query MEMORY for similar trajectories (existing retrieval method) and pass the top-k summary into get_action's few_shot_context param.
  - Return {cold: {actions, reward}, warm: {actions, reward, retrieved_context}, improvement_pct}.
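The plan does not pin down how improvement_pct is computed. One plausible definition, shown here purely as an assumption, is the relative change of the warm reward over the cold reward, with a guard for a zero cold reward:

```python
from typing import Optional


def improvement_pct(cold_reward: float, warm_reward: float) -> Optional[float]:
    """Relative change of warm vs cold reward in percent; None when cold is zero.

    Assumption: dividing by abs(cold_reward) so a negative cold reward still
    yields a positive percentage when the warm run scores higher.
    """
    if cold_reward == 0:
        return None
    return (warm_reward - cold_reward) / abs(cold_reward) * 100.0


doubled = improvement_pct(1.0, 2.0)
from_negative = improvement_pct(-0.5, 0.25)
```

Returning None for a zero baseline lets the frontend hide the percentage banner instead of showing an infinite or misleading value.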
Frontend: Two-column timeline in a new "Memory" tab. Callout box reading "Agent recalled: …" when warm has retrieved context. Big percentage banner at the bottom.
F8: Multi-Step GRPO Training
scripts/train_trl.py (currently 914 lines, single-prompt per scenario):
- Add run_full_episode(task, person, model, tokenizer, max_steps=10) -> tuple[list[step_reward], dict]:
  - For each step: build the prompt from the current LifeMetrics + ResourceBudget + conflict, call model.generate, parse the JSON action, call env.step(), and append the step reward from the existing compute_task_reward().
  - Return per-step rewards and a serialised trajectory.
- New CLI flag --full-episode. When set, generate_dataset() is replaced by generate_episodic_dataset(), which calls run_full_episode per scenario and uses sum(step_rewards) / max_steps as the GRPO reward.
- --dry-run compatibility: 1 episode × 2 steps with a mock model (existing dry-run path stays valid).
- After trainer.save_model() at line 610, add if not args.dry_run and args.push_to_hub: model.push_to_hub("jdsb06/lifestack-grpo-v2"); tokenizer.push_to_hub("jdsb06/lifestack-grpo-v2"). The new --push-to-hub flag guards it.
- Run on an HF A10G once built: python scripts/train_trl.py --full-episode --stages 5 --push-to-hub (~$5).
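The episode loop and its reward aggregation can be sketched with the model and environment abstracted away. Everything below is a simplified stand-in for run_full_episode: rollout_episode, the policy callable, and the env_step callable (returning next state, reward, done) are hypothetical, and the toy environment exists only to make the example runnable.

```python
from typing import Any, Callable, Dict, List, Tuple


def rollout_episode(
    policy: Callable[[Any], Any],
    env_step: Callable[[Any, Any], Tuple[Any, float, bool]],
    state: Any,
    max_steps: int = 10,
) -> Tuple[List[float], Dict]:
    """Run up to max_steps steps, collecting per-step rewards and a trajectory."""
    rewards: List[float] = []
    trajectory: List[Dict] = []
    for step in range(max_steps):
        action = policy(state)
        state, reward, done = env_step(state, action)
        rewards.append(reward)
        trajectory.append({"step": step, "action": action, "reward": reward})
        if done:
            break
    return rewards, {"steps": trajectory}


def episodic_reward(step_rewards: List[float], max_steps: int) -> float:
    """GRPO reward as described in the plan: sum(step_rewards) / max_steps."""
    return sum(step_rewards) / max_steps


# Toy environment: the state is a counter and the episode ends after three steps.
def toy_step(state: int, action: str) -> Tuple[int, float, bool]:
    return state + 1, 0.5, state + 1 >= 3


step_rewards, trajectory = rollout_episode(lambda s: "rest", toy_step, state=0, max_steps=10)
reward = episodic_reward(step_rewards, max_steps=10)
```

Note that dividing by max_steps rather than by the number of steps actually taken scales down the reward of episodes that terminate early, as in the toy run here (three steps out of ten).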
F9: RLHF Loop
- Backend: /api/feedback/submit is already fully implemented (line 267). No route changes needed.
- Frontend: Post-episode feedback panel (rendered after every completed simulation/custom/comparison episode). Slider 0–10, domain checkboxes (6 domains × improved/worsened), textarea. Submit posts {episode_id, score, improved[], worsened[], notes, time} to the existing endpoint.
- Training integration (scripts/train_trl.py): New --with-human-feedback flag. When set, a new reward component reward_human_feedback_fn (hook already exists around line 379) loads stored feedback via MEMORY.feedback_collection.query() keyed by episode_id and blends the compute_human_feedback_reward() output in at weight 0.10, rebalancing the existing weights proportionally.
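Proportional rebalancing means every existing weight is scaled by the same factor so that the total stays at 1 once the new component takes its 0.10 share. A minimal sketch follows; the component names and starting weights are invented for illustration and are not the values in train_trl.py.

```python
from typing import Dict


def blend_in_component(weights: Dict[str, float], name: str, new_weight: float) -> Dict[str, float]:
    """Add a component at new_weight and scale the existing weights to sum to 1 - new_weight."""
    if not 0.0 <= new_weight < 1.0:
        raise ValueError("new_weight must be in [0, 1)")
    if name in weights:
        raise ValueError(f"{name!r} is already a reward component")
    scale = (1.0 - new_weight) / sum(weights.values())
    blended = {key: value * scale for key, value in weights.items()}
    blended[name] = new_weight
    return blended


# Hypothetical existing weights; only their ratios matter for the rebalance.
existing = {"task": 0.5, "plausibility": 0.3, "format": 0.2}
blended = blend_in_component(existing, "human_feedback", 0.10)
```

With these inputs the existing components shrink by 10% each (0.45, 0.27, 0.18) and keep their relative ratios.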
F10: Real Data Integrations
Backend:
- POST /api/data/health/upload (multipart): accepts .json (Google Fit) or .xml (Apple Health). Parses steps, heart_rate_resting, sleep_hours (approximate parse; tolerate missing fields). Maps them to physical_health.fitness, physical_health.energy, physical_health.sleep_quality. Stores the result in a new module-level dict USER_HEALTH_OVERRIDES. Returns {parsed_metrics, events_found}.
- POST /api/data/calendar/upload (multipart): parses .ics via icalendar.Calendar.from_ical(). Counts events in the next 7 days → time.free_hours_per_week (inverse), career.workload. Keyword match ("gym", "run", "yoga") → bump physical_health.fitness. Returns the same shape.
- /api/simulation/start and /api/custom/run consult USER_HEALTH_OVERRIDES when initialising LifeMetrics().
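Once the .ics file has been parsed, the calendar logic reduces to a 7-day window count plus a keyword check. The sketch below deliberately skips the icalendar parsing and works on already-extracted (start, summary) pairs; summarise_week and its return keys are hypothetical, and how the counts translate into metric values is left open, as in the plan.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

FITNESS_KEYWORDS = {"gym", "run", "yoga"}


def summarise_week(events: List[Tuple[datetime, str]], now: datetime) -> Dict[str, int]:
    """Count events in the next 7 days and how many of them look fitness-related."""
    horizon = now + timedelta(days=7)
    upcoming = [(start, summary) for start, summary in events if now <= start < horizon]
    fitness = 0
    for _, summary in upcoming:
        # Whole-word match so "run" does not fire on words like "brunch".
        words = set(re.findall(r"[a-z]+", summary.lower()))
        if words & FITNESS_KEYWORDS:
            fitness += 1
    return {"events_found": len(upcoming), "fitness_events": fitness}


now = datetime(2024, 1, 1, 8, 0)
week = summarise_week(
    [
        (now + timedelta(days=1), "Gym session"),
        (now + timedelta(days=2), "Sprint review"),
        (now + timedelta(days=3), "Brunch with Sam"),
        (now + timedelta(days=9), "Yoga class"),   # outside the window
        (now - timedelta(days=1), "Morning run"),  # already in the past
    ],
    now,
)
```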
Frontend: New "Connect My Data" subsection at the top of "Try Your Case". Two file inputs. After upload, render a chip list with entries such as "From your real data → physical_health.fitness: 78".
F11: BLOG.md (~700 words)
Rewrite the 13-line BLOG.md with 5 sections: Problem, What We Built, Key Results (+125%, +155%, +116%; already in README lines 45–71), What We Learned, What's Next. Inline-cite the 4 papers from README lines 233–241 (Starcke & Brand 2012; Roijers et al. 2013; Mullainathan & Shafir 2013; Wang et al. 2024).
F12: Four Tests (tests/)
- test_env_reset.py: LifeStackEnv().reset() → budget is fresh; reset twice → metrics identical. ~20 lines, pytest.
- test_cascade.py: animate_cascade({"mental_wellbeing.stress_level": 30}, LifeMetrics()) returns 4 frames; frame 0 status all unchanged; frame 1 has at least one primary.
- test_task_generator.py (scoped per user answer): asserts generate_conflict() returns a valid ConflictEvent for each of the 6 life domains and TEMPLATES covers difficulties 1–5.
- test_reward.py: compute_reward() result in [-1, 1]; plausibility component penalises a 0-cost, 50-delta action.
F13: Episode History
Backend:
- Maintain a module-level ring buffer EPISODE_HISTORY: deque[dict] = deque(maxlen=5) in app_flask.py. After every episode-producing route, append {id, conflict, steps[], final_reward, timestamp}.
- GET /api/history/list returns summaries.
- GET /api/history/replay/<episode_id> returns the full step log.
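The ring buffer behaviour comes for free from collections.deque with maxlen: appending a sixth episode silently evicts the oldest. The sketch below shows the storage side only, without Flask; record_episode, list_history, and replay are hypothetical helper names, and using uuid4 for the id is an assumption.

```python
import time
import uuid
from collections import deque
from typing import Deque, Dict, List, Optional

EPISODE_HISTORY: Deque[Dict] = deque(maxlen=5)


def record_episode(conflict: str, steps: List[Dict], final_reward: float) -> str:
    """Append an episode to the ring buffer and return its id."""
    episode = {
        "id": uuid.uuid4().hex,  # assumption: any unique string id would do
        "conflict": conflict,
        "steps": steps,
        "final_reward": final_reward,
        "timestamp": time.time(),
    }
    EPISODE_HISTORY.append(episode)  # the oldest entry drops out once 5 are stored
    return episode["id"]


def list_history() -> List[Dict]:
    """Summaries for /api/history/list: everything except the step log."""
    keys = ("id", "conflict", "final_reward", "timestamp")
    return [{key: episode[key] for key in keys} for episode in EPISODE_HISTORY]


def replay(episode_id: str) -> Optional[Dict]:
    """Full record for /api/history/replay/<episode_id>, or None if evicted or unknown."""
    return next((e for e in EPISODE_HISTORY if e["id"] == episode_id), None)
```

A replay request for an evicted id returns None here, which the route can translate into a 404.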
Frontend: New "History" tab, accordion list, click-to-expand per episode.
Critical Files to Modify
| File | Features touching it |
|---|---|
| app_flask.py | F1, F2, F4, F5, F6, F7, F10, F13 (9 new routes, 3 helpers, 1 deque) |
| intake/intake.py | F3 (LLM fallback chain, keyword match) |
| templates/index.html | F1, F2, F3, F4, F5, F6, F7, F9, F10, F13 (new tabs, heatmap bar, D3 SVG, feedback panel) |
| scripts/train_trl.py | F8 (run_full_episode, --full-episode, --push-to-hub), F9 (--with-human-feedback) |
| requirements.txt | huggingface_hub, icalendar |
| BLOG.md | F11 (full rewrite) |
| tests/test_env_reset.py, test_cascade.py, test_task_generator.py, test_reward.py | F12 (new files) |
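No other files get edited. No existing dataclass is modified, and existing routes change only additively (the extra domain_health field from F5 and the USER_HEALTH_OVERRIDES lookup from F10).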
Verification
Local (no GPU):
python scripts/smoke_test.py
python scripts/eval.py --episodes 5
python -m pytest tests/ -v
python scripts/train_trl.py --full-episode --dry-run # F8 dry-run
python app_flask.py # open localhost:7860, click through each new tab
HF Inference API check (F3):
from huggingface_hub import InferenceClient; import os
c = InferenceClient(model="Qwen/Qwen2.5-1.5B-Instruct", token=os.getenv("HF_TOKEN"))
print(c.chat_completion([{"role":"user","content":"Reply OK"}], max_tokens=5).choices[0].message.content)
HF Space (T4, $0.60/hr, leave running 25 Apr 8 AM – 26 Apr 5 PM ≈ $20):
- Space settings → hardware: T4 Small.
- Secrets: HF_TOKEN, GROQ_API_KEY.
- Push branch → confirm Flask app starts on port 7860 → open every tab.
A10G training run (F8, ~$5, one-off):
python scripts/train_trl.py --full-episode --stages 5 --push-to-hub
Afterwards: https://huggingface.co/jdsb06/lifestack-grpo-v2 should show the checkpoint.
End-to-end demo walkthrough to rehearse before 26 Apr 5 PM:
- Open Situational Portal → run Friday 6PM conflict → cascade SVG animates, heatmap shifts red.
- Switch to Comparison tab → same conflict → watch delta bar fill positive.
- Personality tab → Alex vs Chloe → radars + different rewards.
- Try Your Case → paste "I just got fired and rent is due tomorrow" → plan card renders.
- Memory tab → cold vs warm ablation → +116% banner.
- Submit a feedback slider → stats endpoint reflects new feedback count.