Space: Israelbliz / User-Modeling-Agent

Israelbliz committed 17 days ago

Commit 07c68ca (verified) · 1 parent: 1539e17

Upload app and README

Files changed (2)

README.md +127 -0
app.py +538 -0

README.md ADDED

	@@ -0,0 +1,127 @@

+---
+title: User Modeling Agent
+emoji: 📝
+colorFrom: green
+colorTo: red
+sdk: docker
+app_port: 7860
+pinned: false
+---
+# User Modeling Agent
+**DSN × BCT LLM Agent Challenge 2026 — Task A.**
+An agent that reads a person into a behavioural *persona*, then writes the
+star rating and the review that person would leave for an unseen product —
+and critiques and revises its own draft before returning it.
+> Live demo: *(your HuggingFace Space URL)*
+> Code: *(your GitHub repo URL)*
+---
+## What it does
+Given a **user persona** and **product details**, the agent produces:
+- a **star rating** (1–5) the user would likely give, and
+- a **written review** in that user's voice — tone, length, and quirks matched.
+It is not a generic review generator. Every output is conditioned on a
+specific reader, and the rating is reasoned, not guessed.
+## The agentic workflow
+The system is an agent, not a single prompt. It runs a five-step pipeline, with a critique-and-revise loop at step 4:
+1. **Build the persona.** A `PersonaEngine` extracts a structured persona —
+   quantitative signals (average rating, rating spread, review length,
+   domains, rating distribution) and qualitative voice (tone, preferred
+   themes, common complaints, a one-line voice descriptor) distilled by an
+   LLM from sample reviews. In the deployed app the persona can also be
+   *composed directly* from typed input — the brief's persona-as-input
+   contract.
+2. **Select grounding history.** For a real user, the agent picks the few
+   past reviews most similar to the target item, so it writes from concrete
+   evidence of how this person actually phrases things.
+3. **Generate the rating and review.** A single LLM call, with the rating
+   reasoned in two explicit steps — first the persona *prior* (what this
+   user usually gives), then the *item evidence* (what the title and
+   description signal). The final rating is the prior adjusted by the
+   evidence, so a generous reviewer still rates a poor item low and a
+   critical reviewer still rates a strong item high.
+4. **Self-reflection — critique and revise.** A critic LLM audits the draft
+   for rating–text consistency, voice match, and on-topic fit. If it
+   objects, the agent rewrites with that feedback and re-checks — up to two
+   cycles. This act → critique → revise loop is what makes it an agent.
+5. **Post-process.** The rating is clamped to the 1–5 range. An optional Nigerian
+   Pidgin rendering layer can restyle the review while preserving meaning,
+   sentiment, and rating.
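This is a minimal sketch of the quantitative half of step 1. It assumes each past review is a plain dict with `rating`, `text` and `domain` keys (the column names the demo reads from `reviews.parquet`) and that review length is counted in words. The project's actual `PersonaEngine` may measure these signals differently. The returned keys mirror the numeric fields `app.py` passes to `UserPersona`, and `quantitative_signals` is a hypothetical name.

```python
from collections import Counter
from statistics import mean, pstdev


def quantitative_signals(reviews: list[dict]) -> dict:
    """Summarise one user's past reviews into numeric persona signals.

    Assumes each review is a dict with "rating" (a whole-star value 1-5),
    "text" and "domain"; review length is counted in words (an assumption).
    """
    if not reviews:
        raise ValueError("need at least one past review")
    ratings = [float(r["rating"]) for r in reviews]
    lengths = [len(str(r["text"]).split()) for r in reviews]
    n = len(ratings)
    # Share of reviews at each star level; whole-star ratings assumed.
    counts = Counter(int(x) for x in ratings)
    distribution = {star: counts.get(star, 0) / n for star in range(1, 6)}
    domains = sorted({str(r["domain"]) for r in reviews})
    return {
        "n_reviews": n,
        "avg_rating": mean(ratings),
        "std_rating": pstdev(ratings),        # "rating spread"
        "avg_review_length": mean(lengths),
        "std_review_length": pstdev(lengths),
        "domains": domains,
        "n_domains": len(domains),
        "rating_distribution": distribution,  # star -> share of reviews
    }


history = [
    {"rating": 5.0, "text": "Loved every page of it", "domain": "Books"},
    {"rating": 4.0, "text": "Strong characters, slow middle", "domain": "Books"},
    {"rating": 2.0, "text": "Dull", "domain": "Movies_and_TV"},
]
signals = quantitative_signals(history)
```

The qualitative half of step 1 (tone, themes, complaints, voice one-liner) comes from an LLM and is not reproduced here.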
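Step 2 does not name its similarity measure. The sketch below uses Jaccard overlap between word sets purely as a stand-in; embeddings or TF-IDF would be more usual choices. `select_grounding`, `_tokens`, and the `title`/`text` dict keys are illustrative names, not the project's API.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lower-cased word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def select_grounding(history: list[dict], item_text: str, k: int = 3) -> list[dict]:
    """Return the k past reviews most similar to the target item.

    Similarity is Jaccard overlap of word sets (a stand-in technique) between
    the item text and each past review's title plus text.
    """
    target = _tokens(item_text)

    def jaccard(review: dict) -> float:
        words = _tokens(f"{review.get('title', '')} {review.get('text', '')}")
        union = target | words
        return len(target & words) / len(union) if union else 0.0

    # sorted() stays stable with reverse=True, so ties keep history order.
    return sorted(history, key=jaccard, reverse=True)[:k]
```

A cheap lexical score like this keeps grounding fast and deterministic. Its weakness is that it misses paraphrases, which is where an embedding model would earn its cost.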
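In the agent, the prior-plus-evidence reasoning of step 3 happens inside the LLM prompt, not in arithmetic. The numeric sketch below only illustrates its shape, together with the clamp from step 5. Two inputs are assumptions: `evidence`, a hypothetical item signal in [-1, 1], and `weight`, how far full-strength evidence can move the rating. `predict_rating` is likewise hypothetical.

```python
def predict_rating(prior: float, evidence: float, weight: float = 2.0) -> float:
    """Persona prior adjusted by item evidence, then clamped to the 1-5 scale.

    prior    -- what this user usually gives (e.g. their average), 1-5
    evidence -- hypothetical item signal in [-1, 1]; 0 means neutral
    weight   -- hypothetical: stars that full-strength evidence can move
    """
    if not -1.0 <= evidence <= 1.0:
        raise ValueError("evidence must lie in [-1, 1]")
    adjusted = prior + weight * evidence
    return min(5.0, max(1.0, adjusted))  # step 5: clamp to range


generous_on_poor_item = predict_rating(prior=4.6, evidence=-1.0)   # about 2.6
critical_on_strong_item = predict_rating(prior=2.2, evidence=1.0)  # about 4.2
```

Predicting from the prior alone would return 4.6 and 2.2 for these two cases. The evidence term is what lets a generous reviewer rate a poor item low.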
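Step 4 can be sketched as control flow, assuming the critic and reviser are plain callables; in the project they are LLM calls. The returned fields echo what the demo displays (`reflection_iterations`, `reflection_refined`, `reflection_notes`). The `"passed"` note follows the convention `reflection_stepper` in `app.py` filters on. `reflect` and `Reflection` are otherwise hypothetical names. In this sketch, the draft produced by the last allowed revision is returned without a further check.

```python
from typing import Callable, NamedTuple, Optional


class Reflection(NamedTuple):
    draft: str
    iterations: int     # critique cycles actually run
    refined: bool       # True if at least one revision happened
    notes: list[str]    # critic feedback per cycle; "passed" = no objection


def reflect(draft: str,
            critique: Callable[[str], Optional[str]],
            revise: Callable[[str, str], str],
            max_cycles: int = 2) -> Reflection:
    """Critique -> revise loop, at most max_cycles critique passes.

    critique(draft) returns None if the draft is acceptable, else feedback;
    revise(draft, feedback) returns a rewritten draft.
    """
    notes: list[str] = []
    refined = False
    for cycle in range(1, max_cycles + 1):
        feedback = critique(draft)
        if feedback is None:
            notes.append("passed")
            return Reflection(draft, cycle, refined, notes)
        notes.append(feedback)
        draft = revise(draft, feedback)
        refined = True
    # Cycle budget spent: the last revision is returned without a re-check.
    return Reflection(draft, max_cycles, refined, notes)


def toy_critic(text: str) -> Optional[str]:
    return None if "pacing" in text else "mentions nothing the persona cares about"


def toy_reviser(text: str, feedback: str) -> str:
    return text + " The pacing drags in the middle."


result = reflect("A fine book.", toy_critic, toy_reviser)
# result.refined is True and result.notes ends with "passed"
```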
+The agent degrades gracefully: if an LLM call fails, it falls back to a
+deterministic persona rather than crashing.
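One way the fallback could look, assuming enrichment is a single callable that may raise on rate limits, timeouts or malformed output. `enrich_with_fallback` is an invented name, and so is the rule that maps average rating to a tone. The project's deterministic persona may be built differently.

```python
from typing import Callable


def enrich_with_fallback(signals: dict, llm_enrich: Callable[[dict], dict]) -> dict:
    """Add qualitative persona fields via an LLM, or deterministic ones on failure.

    llm_enrich is a hypothetical callable returning fields such as "tone" and
    "preferred_themes". The fallback tone rule is invented for illustration.
    """
    try:
        return {**signals, **llm_enrich(signals)}
    except Exception:  # rate limit, timeout, unparseable output, ...
        avg = signals.get("avg_rating", 3.0)
        if avg >= 4.0:
            tone = "enthusiastic"
        elif avg <= 2.5:
            tone = "critical"
        else:
            tone = "analytical"
        # Same keys as a successful enrichment, so callers never branch.
        return {**signals, "tone": tone, "preferred_themes": [],
                "common_complaints": [], "voice_one_liner": ""}
```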
+## How it maps to the Task A rubric
+- **Review Text Quality** — reviews are grounded in the user's real past
+  reviews and self-critiqued for voice match.
+- **Rating Accuracy** — the two-step prior-plus-evidence rating logic
+  addresses the common failure of predicting from the user average alone.
+- **Behavioural Fidelity** — persona-conditioned generation; the persona
+  portrait is visible in the app for inspection.
+- **Nigerian contextualization (bonus)** — a toggleable Nigerian Pidgin
+  rendering layer; off by default so scored output stays standard English.
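For the Rating Accuracy criterion, the demo colours each prediction by its absolute error, using the `delta` chip in `app.py`. An error within 0.5★ is "good", within 1.0★ is "mid", and anything larger is "far". The sketch reproduces that banding. It also adds a mean absolute error over many users, in line with the note below that evaluation averages across users. Using MAE as the headline metric is an assumption, because the rubric's exact scoring is not stated here.

```python
from statistics import mean


def rating_delta(predicted: float, actual: float) -> tuple[float, str]:
    """Absolute error of one prediction, banded like the demo's delta chip."""
    d = abs(predicted - actual)
    band = "good" if d <= 0.5 else ("mid" if d <= 1.0 else "far")
    return d, band


def mean_absolute_error(pairs: list[tuple[float, float]]) -> float:
    """MAE over (predicted, actual) pairs, e.g. one held-out item per user."""
    if not pairs:
        raise ValueError("no predictions to score")
    return mean(abs(p - a) for p, a in pairs)
```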
+## Running locally
+```bash
+pip install -r requirements.txt
+# set your key in a .env file:  LLM_PROVIDER=gemini  and  GEMINI_API_KEY=...
+streamlit run app.py
+```
+The processed data must be present: `data/processed/reviews.parquet` and `data/processed/items.parquet`.
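A small pre-flight check can confirm the data is in place before launch. It assumes the processed directory is `data/processed` and that the demo needs exactly the two files its `load_data()` reads. `missing_data_files` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# File names taken from load_data() in app.py; the directory is assumed to
# be data/processed, matching this README.
REQUIRED_FILES = ("reviews.parquet", "items.parquet")


def missing_data_files(processed_dir: Path = Path("data/processed")) -> list[str]:
    """Return the required parquet files that are absent; empty means ready."""
    return [name for name in REQUIRED_FILES if not (processed_dir / name).is_file()]
```

Run from the repository root before `streamlit run app.py`, this names the specific missing files. The app's load error only reports the failing exception.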
+A FastAPI service is also available:
+```bash
+uvicorn task_a_user_modeling.main:app --reload
+```
+## Project layout
+```
+core/                 shared engine — config, llm, persona, reflection, nigerian
+task_a_user_modeling/ the Impersonation agent + FastAPI service
+scripts/              test harness (test_task_a.py)
+data/processed/       Amazon Reviews 2023 — Books · Movies & TV · Kindle Store
+app.py                Streamlit demo
+```
+## Configuration
+Set in a `.env` file (never commit it):
+- `LLM_PROVIDER` — `gemini` or `openai`
+- `GEMINI_API_KEY` or `OPENAI_API_KEY`, whichever matches the provider
+On a HuggingFace Space, set these as **Secrets** in Space settings.
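These settings can be validated before any LLM call. The sketch assumes the `.env` file has already been loaded into the process environment, for example by python-dotenv; the project's own loader is not shown here. On a Space, Secrets arrive as environment variables. `resolve_llm_config` and `KEY_VARS` are illustrative names, and treating the provider name case-insensitively is an assumption.

```python
import os
from typing import Mapping

# Provider -> API-key variable, as listed under Configuration.
KEY_VARS = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_llm_config(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (provider, api_key), failing early with a readable message."""
    # Case-insensitive provider matching is an assumption of this sketch.
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in KEY_VARS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(KEY_VARS)}, got {provider!r}")
    key_var = KEY_VARS[provider]
    key = env.get(key_var, "").strip()
    if not key:
        raise ValueError(f"{key_var} is not set for provider {provider!r}")
    return provider, key
```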
+## Notes and honest limitations
+- The self-reflection critic checks internal consistency; it cannot catch a
+  rating that is wrong but self-consistent.
+- Rating prediction on hard cases (a critical user who loved something) is
+  improved by the two-step logic but can still be ~0.5–1.0★ off.
+- LLM output is non-deterministic; single-run results vary, so evaluation
+  averages across many users.
+## Credits
+Built for the DSN × BCT LLM Agent Challenge 2026.
+Author: *(your name)*. Dataset: Amazon Reviews 2023.

app.py ADDED

	@@ -0,0 +1,538 @@

+"""User Modeling Agent — the demo.
+DSN × BCT LLM Agent Challenge · Task A.
+Takes a user persona and product details as input, and generates a star
+rating and a written review as that user would write it — then critiques
+and revises its own draft (self-reflection). Optionally renders the review
+in Nigerian English.
+Two ways to use it:
+  1. Compose a persona  — type a persona + product (the brief's input contract)
+  2. Dataset reader     — pick a real user, compare against ground truth
+Run:
+    streamlit run app.py
+"""
+from __future__ import annotations
+import html
+import sys
+from pathlib import Path
+ROOT = Path(__file__).resolve().parent
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+import pandas as pd
+import streamlit as st
+from core.config import settings
+from core.persona import PersonaEngine, UserPersona
+from task_a_user_modeling.agent import ImpersonationAgent, ItemInput
+st.set_page_config(page_title="User Modeling Agent", page_icon="✶",
+                   layout="wide", initial_sidebar_state="expanded")
+esc = html.escape
+# ══════════════════════════════════════════════════════════════════════════════
+# Design system
+# ══════════════════════════════════════════════════════════════════════════════
+CSS = """
+<style>
+@import url('https://fonts.googleapis.com/css2?family=Fraunces:opsz,wght@9..144,400;9..144,500;9..144,600;9..144,900&family=Newsreader:ital,opsz,wght@0,6..72,400;0,6..72,500;0,6..72,600;1,6..72,400&family=Spline+Sans+Mono:wght@400;500;600&display=swap');
+:root {
+  --paper:#f3ecdb; --paper-2:#fffdf6; --paper-3:#ece2cb;
+  --pine:#1d3a2b; --pine-2:#2c5440; --pine-ink:#14241b;
+  --clay:#b0472b; --ochre:#c98a3c; --gold:#d8a64a;
+  --ink:#221e16; --muted:#6f6651; --hair:#d4c8aa;
+}
+.stApp { background:var(--paper); color:var(--ink); }
+.stApp::before {
+  content:""; position:fixed; inset:0; pointer-events:none; z-index:0;
+  background:
+    radial-gradient(900px 600px at 12% -5%, rgba(45,84,64,.10), transparent 60%),
+    radial-gradient(800px 600px at 95% 8%, rgba(176,71,43,.08), transparent 55%);
+}
+[data-testid="stMainBlockContainer"] { max-width:1140px; padding-top:2rem; padding-bottom:4rem; }
+h1,h2,h3,h4 { font-family:'Fraunces',Georgia,serif !important; color:var(--pine) !important;
+  letter-spacing:-0.015em; font-weight:600 !important; }
+html,body,p,div,span,label,li,.stMarkdown { font-family:'Newsreader',Georgia,serif; }
+.stCaption,[data-testid="stCaptionContainer"] { font-family:'Spline Sans Mono',monospace !important; }
+.masthead { position:relative; z-index:1; margin-bottom:0.3rem; }
+.mast-rule { height:2px; background:var(--pine); margin-bottom:0.5rem; }
+.mast-kicker { font-family:'Spline Sans Mono',monospace; font-size:0.70rem;
+  letter-spacing:0.30em; text-transform:uppercase; color:var(--clay); font-weight:600; }
+.mast-title { font-family:'Fraunces',serif; font-weight:900;
+  font-size:clamp(2.3rem,5.5vw,3.7rem); line-height:1.0; color:var(--pine);
+  margin:0.16rem 0 0.1rem; letter-spacing:-0.03em; }
+.mast-title .em { color:var(--clay); font-style:italic; font-weight:500; }
+.mast-stand { font-family:'Newsreader',serif; font-size:1.08rem; color:#45402f;
+  max-width:66ch; line-height:1.45; }
+.mast-stand em { color:var(--clay); font-style:italic; }
+.mast-rule-bot { height:1px; background:var(--hair); margin:0.85rem 0 0.2rem; }
+.sec-label { font-family:'Spline Sans Mono',monospace; font-size:0.70rem;
+  letter-spacing:0.2em; text-transform:uppercase; color:var(--clay);
+  font-weight:600; margin:0.3rem 0 0.15rem; }
+.card { background:var(--paper-2); border:1px solid var(--hair); border-radius:3px;
+  padding:1.1rem 1.3rem; margin:0.5rem 0 0.85rem; position:relative; z-index:1; }
+.card-kicker { font-family:'Spline Sans Mono',monospace; font-size:0.64rem;
+  letter-spacing:0.2em; text-transform:uppercase; color:var(--clay);
+  font-weight:600; margin-bottom:0.5rem; }
+.persona-quote { font-family:'Fraunces',serif; font-weight:500; font-style:italic;
+  font-size:1.26rem; line-height:1.34; color:var(--pine); margin:0.1rem 0 0.8rem;
+  padding-left:0.8rem; border-left:3px solid var(--ochre); }
+.pstats { display:flex; gap:1.7rem; flex-wrap:wrap; align-items:flex-end; }
+.pstat .num { font-family:'Fraunces',serif; font-weight:900; font-size:1.5rem;
+  color:var(--pine); line-height:1; }
+.pstat .lab { font-family:'Spline Sans Mono',monospace; font-size:0.60rem;
+  letter-spacing:0.13em; text-transform:uppercase; color:var(--muted); margin-top:0.2rem; }
+.chips { margin-top:0.6rem; }
+.chip-lab { font-family:'Spline Sans Mono',monospace; font-size:0.60rem;
+  letter-spacing:0.12em; text-transform:uppercase; color:var(--muted); margin-right:0.4rem; }
+.chip { display:inline-block; margin:0.15rem 0.25rem 0.15rem 0; padding:0.15rem 0.6rem;
+  border-radius:999px; font-family:'Spline Sans Mono',monospace; font-size:0.72rem;
+  background:var(--paper-3); color:var(--pine-2); border:1px solid var(--hair); }
+.chip.warn { background:#f0ddd2; color:var(--clay); border-color:#e3c4b4; }
+.panel { background:var(--pine-ink); border-radius:3px; padding:1.35rem 1.55rem;
+  margin:0.5rem 0 0.85rem; position:relative; z-index:1;
+  box-shadow:0 14px 34px -22px rgba(20,36,27,.7); }
+.panel .card-kicker { color:var(--gold); }
+.rating-row { display:flex; align-items:center; gap:0.8rem; margin:0.25rem 0 0.65rem; }
+.rating-chip { font-family:'Fraunces',serif; font-weight:900; font-size:1.6rem;
+  background:var(--clay); color:#fff7ec; padding:0.05rem 0.65rem; border-radius:3px; }
+.stars { font-size:1.15rem; letter-spacing:0.1em; color:var(--gold); }
+.review-body { font-family:'Newsreader',serif; font-size:1.1rem; line-height:1.7;
+  color:#f0e9d6; white-space:pre-wrap; }
+.naija-badge { display:inline-block; margin-left:0.45rem; font-family:'Spline Sans Mono',monospace;
+  font-size:0.60rem; letter-spacing:0.12em; font-weight:600; background:#e9f0e2;
+  color:var(--pine); padding:0.12rem 0.5rem; border-radius:999px; border:1px solid #cdd9bf; }
+.stepper { display:flex; gap:0; margin:0.3rem 0 0.2rem; flex-wrap:wrap; }
+.step { flex:1; min-width:125px; padding:0.5rem 0.65rem; position:relative; }
+.step .dot { width:11px; height:11px; border-radius:50%; background:var(--pine); margin-bottom:0.35rem; }
+.step.flag .dot { background:var(--clay); }
+.step.pass .dot { background:var(--pine-2); }
+.step .st-name { font-family:'Fraunces',serif; font-weight:600; font-size:0.93rem;
+  color:var(--pine); line-height:1.1; }
+.step .st-sub { font-family:'Spline Sans Mono',monospace; font-size:0.63rem;
+  color:var(--muted); margin-top:0.18rem; }
+.step:not(:last-child)::after { content:""; position:absolute; top:0.87rem; right:-2px;
+  width:100%; height:1px;
+  background:repeating-linear-gradient(90deg,var(--hair) 0 6px,transparent 6px 12px); }
+.critique-note { font-family:'Newsreader',serif; font-style:italic; font-size:0.93rem;
+  color:#5a4030; line-height:1.45; background:#f0ddd2; border-left:3px solid var(--clay);
+  padding:0.5rem 0.75rem; border-radius:2px; margin-top:0.45rem; }
+.cmp { background:var(--paper-2); border:1px solid var(--hair); border-radius:3px;
+  padding:0.9rem 1.05rem; height:100%; }
+.cmp.truth { border-top:3px solid var(--pine-2); }
+.cmp.agent { border-top:3px solid var(--clay); }
+.cmp-head { font-family:'Spline Sans Mono',monospace; font-size:0.62rem;
+  letter-spacing:0.15em; text-transform:uppercase; color:var(--muted); margin-bottom:0.35rem; }
+.cmp-body { font-family:'Newsreader',serif; font-size:0.97rem; line-height:1.5;
+  color:#4a4434; white-space:pre-wrap; }
+.delta { font-family:'Spline Sans Mono',monospace; font-size:0.70rem; font-weight:600;
+  padding:0.16rem 0.55rem; border-radius:999px; }
+.delta.good { background:#e3ecd9; color:var(--pine); }
+.delta.mid { background:#f3e6c8; color:#8a6420; }
+.delta.far { background:#f0d8cc; color:var(--clay); }
+.empty { border:1px dashed var(--hair); border-radius:3px; padding:1.5rem; text-align:center;
+  font-family:'Newsreader',serif; font-style:italic; color:var(--muted); font-size:1rem;
+  background:rgba(255,253,246,.5); }
+@keyframes rise { from{opacity:0;transform:translateY(13px);} to{opacity:1;transform:translateY(0);} }
+.reveal { animation:rise 0.55s cubic-bezier(.2,.7,.2,1) both; }
+.d1{animation-delay:.04s;} .d2{animation-delay:.13s;} .d3{animation-delay:.22s;}
+.stButton > button { background:var(--pine); color:var(--paper); border:none; border-radius:3px;
+  font-family:'Spline Sans Mono',monospace; font-weight:600; font-size:0.82rem;
+  letter-spacing:0.05em; padding:0.55rem 1rem; }
+.stButton > button:hover { background:var(--clay); color:#fff7ec; }
+[data-testid="stSidebar"] { background:var(--pine-ink); border-right:1px solid #2c4133; }
+[data-testid="stSidebar"] * { color:#e7e0cd; }
+[data-testid="stSidebar"] h1,[data-testid="stSidebar"] h2,[data-testid="stSidebar"] h3 { color:#f3ecdb !important; }
+[data-baseweb="tab-list"] { gap:0.3rem; border-bottom:2px solid var(--pine); }
+[data-baseweb="tab"] { font-family:'Fraunces',serif !important; font-weight:600;
+  font-size:1rem; color:var(--muted); }
+[data-baseweb="tab"][aria-selected="true"] { color:var(--pine) !important; }
+[data-baseweb="tab-highlight"] { background:var(--clay) !important; height:3px; }
+.foot { margin-top:2.2rem; padding-top:0.85rem; border-top:1px solid var(--hair);
+  font-family:'Spline Sans Mono',monospace; font-size:0.68rem; color:var(--muted); line-height:1.6; }
+</style>
+"""
+st.markdown(CSS, unsafe_allow_html=True)
+# ══════════════════════════════════════════════════════════════════════════════
+# HTML builders
+# ══════════════════════════════════════════════════════════════════════════════
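+def stars(r: float) -> str:
+    # Round half up and clamp to 0-5; built-in round() sends halves to the
+    # even neighbour, so 3.5 and 4.5 would both have shown four stars.
+    f = max(0, min(5, int(r + 0.5)))
+    return "★" * f + "☆" * (5 - f)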
+def persona_card(p: UserPersona) -> str:
+    themes = "".join(f'<span class="chip">{esc(t)}</span>'
+                     for t in p.preferred_themes) or '<span class="chip">—</span>'
+    comps = "".join(f'<span class="chip warn">{esc(t)}</span>'
+                    for t in p.common_complaints) or '<span class="chip warn">—</span>'
+    nrev = (f'{p.n_reviews}' if p.n_reviews else 'composed')
+    return f"""
+    <div class="card reveal d1">
+      <div class="card-kicker">The Reader · persona</div>
+      <div class="persona-quote">“{esc(p.voice_one_liner or 'No voice captured.')}”</div>
+      <div class="pstats">
+        <div class="pstat"><div class="num">{nrev}</div><div class="lab">history</div></div>
+        <div class="pstat"><div class="num">{p.avg_rating:.1f}★</div><div class="lab">avg rating</div></div>
+        <div class="pstat"><div class="num">{esc(p.tone or '—')}</div><div class="lab">tone</div></div>
+      </div>
+      <div class="chips"><span class="chip-lab">drawn to</span>{themes}</div>
+      <div class="chips"><span class="chip-lab">put off by</span>{comps}</div>
+    </div>"""
+def reflection_stepper(iters: int, refined: bool, notes: list[str] | None) -> str:
+    steps = ['<div class="step pass"><div class="dot"></div>'
+             '<div class="st-name">First draft</div>'
+             '<div class="st-sub">generated in-voice</div></div>']
+    if refined:
+        steps += ['<div class="step flag"><div class="dot"></div>'
+                  '<div class="st-name">Self-critique</div>'
+                  '<div class="st-sub">found issues</div></div>',
+                  '<div class="step pass"><div class="dot"></div>'
+                  '<div class="st-name">Revised draft</div>'
+                  '<div class="st-sub">rewritten with feedback</div></div>',
+                  '<div class="step pass"><div class="dot"></div>'
+                  '<div class="st-name">Re-checked</div>'
+                  '<div class="st-sub">critique cleared</div></div>']
+    else:
+        steps += ['<div class="step pass"><div class="dot"></div>'
+                  '<div class="st-name">Self-critique</div>'
+                  '<div class="st-sub">passed first pass</div></div>',
+                  '<div class="step pass"><div class="dot"></div>'
+                  '<div class="st-name">Accepted</div>'
+                  '<div class="st-sub">no revision needed</div></div>']
+    note = ""
+    if notes:
+        real = [n for n in notes if n and n.strip().lower() != "passed"]
+        if real:
+            note = f'<div class="critique-note">The critic flagged: {esc(real[0])}</div>'
+    return f"""
+    <div class="card reveal d3">
+      <div class="card-kicker">Self-reflection · {iters} critique cycle(s)</div>
+      <div class="stepper">{''.join(steps)}</div>
+      {note}
+    </div>"""
+# ══════════════════════════════════════════════════════════════════════════════
+# Cached resources
+# ══════════════════════════════════════════════════════════════════════════════
+@st.cache_data(show_spinner=False)
+def load_data():
+    rev = pd.read_parquet(settings.processed_dir / "reviews.parquet")
+    items = pd.read_parquet(settings.processed_dir / "items.parquet")
+    return rev, items
+@st.cache_resource(show_spinner=False)
+def get_engines():
+    return PersonaEngine(), ImpersonationAgent()
+def composed_persona(desc: str, themes: list[str], dislikes: list[str],
+                     tone: str, avg_rating: float) -> UserPersona:
+    """Build a UserPersona from typed input — the brief's persona-as-input contract."""
+    # rating distribution skewed around the stated average
+    lo, hi = int(avg_rating), min(5, int(avg_rating) + 1)
+    dist = {lo: 0.55, hi: 0.35} if lo != hi else {lo: 0.9}
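+    dist.setdefault(3, 0.1)
+    # normalise: when 3 is already lo or hi, setdefault adds nothing and the
+    # weights would otherwise sum to 0.9
+    total = sum(dist.values())
+    dist = {k: v / total for k, v in dist.items()}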
+    return UserPersona(
+        user_id="composed", n_reviews=0, avg_rating=avg_rating,
+        std_rating=0.6, avg_review_length=90.0, std_review_length=30.0,
+        verified_rate=1.0, domains=[], n_domains=0,
+        rating_distribution=dist, top_terms=[],
+        tone=tone, preferred_themes=themes, common_complaints=dislikes,
+        voice_one_liner=desc, history_samples=[],
+    )
+# ══════════════════════════════════════════════════════════════════════════════
+# Masthead
+# ══════════════════════════════════════════════════════════════════════════════
+st.markdown("""
+<div class="masthead">
+  <div class="mast-rule"></div>
+  <div class="mast-kicker">DSN × BCT LLM Agent Challenge · Task A</div>
+  <div class="mast-title">User Modeling <span class="em">Agent</span></div>
+  <div class="mast-stand">
+    Give it a <em>user persona</em> and a <em>product</em>. It writes the star
+    rating and the review that user would write — weighing what they usually do
+    against what this specific item signals — then <em>critiques and revises</em>
+    its own draft before showing it.
+  </div>
+  <div class="mast-rule-bot"></div>
+</div>
+""", unsafe_allow_html=True)
+try:
+    reviews, items = load_data()
+except Exception as e:
+    st.error(f"Could not load data — ensure data/processed/*.parquet exist.\n\n{e}")
+    st.stop()
+train = reviews[reviews["split"] == "train"]
+test = reviews[reviews["split"] == "test"]
+persona_engine, agent = get_engines()
+with st.sidebar:
+    st.markdown("## ✶  Controls")
+    naija = st.toggle("🇳🇬  Naija mode", value=False,
+                      help="Render the review in Nigerian English. Meaning, "
+                           "sentiment and rating are preserved — only voice shifts.")
+    st.caption("Naija mode ON — review in Nigerian English."
+               if naija else "Standard English output.")
+    st.divider()
+    st.markdown("### How it works")
+    st.caption("The agent builds a persona, drafts a review in that voice, then "
+               "runs a self-reflection loop — a critic LLM checks rating-text "
+               "consistency, voice match and on-topic fit, and the agent revises "
+               "if the critic objects.")
+    st.divider()
+    st.caption("Built by Israel")
+st.session_state.setdefault("result", None)
+st.session_state.setdefault("ctx", None)
+# ══════════════════════════════════════════════════════════════════════════════
+# Tabs — Compose (primary) · Dataset reader (secondary)
+# ══════════════════════════════════════════════════════════════════════════════
+tab_compose, tab_dataset = st.tabs(["✎  Compose a persona",
+                                    "⊞  Dataset reader"])
+# ── COMPOSE ───────────────────────────────────────────────────────────────────
+with tab_compose:
+    st.markdown('<div class="sec-label">Input · persona and product</div>',
+                unsafe_allow_html=True)
+    st.markdown("Describe a reader and a product. The agent will write the "
+                "review that reader would leave.")
+    cL, cR = st.columns(2)
+    with cL:
+        st.markdown("**The reader**")
+        p_desc = st.text_area(
+            "Describe the reader's reviewing voice",
+            value="A thoughtful reader who loves character-driven stories and "
+                  "rich world-building, but is impatient with slow pacing.",
+            height=90, key="p_desc")
+        p_themes = st.text_input("Drawn to (comma-separated)",
+                                 value="character development, immersive worlds, "
+                                       "original plots", key="p_themes")
+        p_dislikes = st.text_input("Put off by (comma-separated)",
+                                   value="slow pacing, thin characters", key="p_dis")
+        c1, c2 = st.columns(2)
+        with c1:
+            p_tone = st.selectbox("Tone", ["enthusiastic", "analytical", "casual",
+                                           "critical", "earnest", "terse"], key="p_tone")
+        with c2:
+            p_rating = st.slider("Typical rating", 1.0, 5.0, 4.0, 0.5, key="p_rate")
+    with cR:
+        st.markdown("**The product**")
+        i_title = st.text_input("Title",
+                                value="The Midnight Library", key="i_title")
+        i_domain = st.selectbox("Domain", ["Books", "Movies_and_TV", "Kindle_Store"],
+                                key="i_domain")
+        i_desc = st.text_area(
+            "Description / synopsis",
+            value="A novel about a library between life and death, where each "
+                  "book lets a woman try a different version of her life.",
+            height=110, key="i_desc")
+    go = st.button("Generate review  ✶", key="go_compose", use_container_width=True)
+    if go:
+        try:
+            with st.status("The agent is working…", expanded=True) as status:
+                themes = [t.strip() for t in p_themes.split(",") if t.strip()]
+                dislikes = [t.strip() for t in p_dislikes.split(",") if t.strip()]
+                st.write("Assembling the persona…")
+                persona = composed_persona(p_desc, themes, dislikes, p_tone, p_rating)
+                item = ItemInput(parent_asin="composed", title=i_title,
+                                 description=i_desc, categories="",
+                                 domain=i_domain)
+                st.write("Drafting in the reader's voice, then self-critiquing…")
+                result = agent.run(persona, item, naija_mode=naija)
+                st.write("Self-reflection complete")
+                status.update(label="Review generated", state="complete")
+            st.session_state.result = result
+            st.session_state.ctx = {"persona": persona, "item": item, "truth": None}
+        except Exception as e:
+            st.session_state.result = None
+            st.markdown(f'<div class="card" style="border-left:3px solid var(--clay)">'
+                        f'<div class="card-kicker">Generation interrupted</div>'
+                        f'The model call did not complete — it may be rate-limited. '
+                        f'Try again shortly.<br><span style="font-family:Spline Sans Mono,'
+                        f'monospace;font-size:0.72rem;color:#6f6651">'
+                        f'{esc(type(e).__name__)}</span></div>', unsafe_allow_html=True)
+# ── DATASET READER ────────────────────────────────────────────────────────────
+with tab_dataset:
+    st.markdown('<div class="sec-label">Input · a real reader from the data</div>',
+                unsafe_allow_html=True)
+    st.markdown("Pick a reader. The agent builds their persona from real history "
+                "and writes a review of a held-out item — compared to what they "
+                "actually wrote.")
+    elig = train.groupby("user_id").size().reset_index(name="n")
+    elig = elig[(elig["n"] >= 5) & (elig["user_id"].isin(set(test["user_id"])))]
+    users = elig.sample(min(40, len(elig)), random_state=7)["user_id"].tolist()
+    cc1, cc2 = st.columns([3, 1])
+    with cc1:
+        user = st.selectbox("Reader", users, key="sel_user",
+                            label_visibility="collapsed")
+    with cc2:
+        go_ds = st.button("Generate  ✶", key="go_ds", use_container_width=True)
+    if go_ds and user:
+        try:
+            with st.status("The agent is working…", expanded=True) as status:
+                ut = test[test["user_id"] == user]
+                if ut.empty:
+                    status.update(label="No held-out item for this reader",
+                                  state="error")
+                    st.stop()
+                tr = ut.iloc[0]
+                tid = tr["parent_asin"]
+                meta = items[items["parent_asin"] == tid]
+                if meta.empty:
+                    item = ItemInput(parent_asin=tid, title=str(tr.get("title", "")),
+                                     description="", categories="", domain=tr["domain"])
+                else:
+                    m = meta.iloc[0]
+                    item = ItemInput(parent_asin=tid, title=str(m.get("title", "")),
+                                     description=str(m.get("description", ""))[:1500],
+                                     categories=str(m.get("categories", "")),
+                                     domain=tr["domain"],
+                                     average_rating=(float(m["average_rating"])
+                                                     if pd.notna(m.get("average_rating"))
+                                                     else None))
+                st.write("Reading the reader's history…")
+                persona = persona_engine.from_dataframe(user, train)
+                persona = persona_engine.enrich(persona)
+                st.write(f"Persona built from {persona.n_reviews} reviews")
+                st.write("Drafting in their voice, then self-critiquing…")
+                result = agent.run(persona, item, naija_mode=naija)
+                st.write("Self-reflection complete")
+                status.update(label="Review generated", state="complete")
+            st.session_state.result = result
+            st.session_state.ctx = {"persona": persona, "item": item,
+                                    "truth": {"rating": float(tr["rating"]),
+                                              "text": str(tr["text"])}}
+        except Exception as e:
+            st.session_state.result = None
+            st.markdown(f'<div class="card" style="border-left:3px solid var(--clay)">'
+                        f'<div class="card-kicker">Generation interrupted</div>'
+                        f'The model call did not complete — it may be rate-limited. '
+                        f'Try again shortly.<br><span style="font-family:Spline Sans Mono,'
+                        f'monospace;font-size:0.72rem;color:#6f6651">'
+                        f'{esc(type(e).__name__)}</span></div>', unsafe_allow_html=True)
+# ══════════════════════════════════════════════════════════════════════════════
+# Result — shown below both tabs
+# ══════════════════════════════════════════════════════════════════════════════
+res = st.session_state.result
+ctx = st.session_state.ctx
+st.markdown("---")
+if res and ctx:
+    st.markdown(persona_card(ctx["persona"]), unsafe_allow_html=True)
+    it = ctx["item"]
+    st.markdown(f"""
+    <div class="card reveal d2">
+      <div class="card-kicker">The Item</div>
+      <span style="font-family:Spline Sans Mono,monospace;font-size:0.6rem;
+        letter-spacing:0.13em;text-transform:uppercase;color:var(--pine-2)">
+        {esc(it.domain)}</span>
+      <div style="font-family:Fraunces,serif;font-weight:600;font-size:1.14rem;
+        color:var(--ink);margin-top:0.1rem">{esc(it.title)}</div>
+    </div>""", unsafe_allow_html=True)
+    badge = '<span class="naija-badge">NAIJA VOICE</span>' if res.naija_mode else ""
+    st.markdown(f"""
+    <div class="panel reveal d3">
+      <div class="card-kicker">The Generated Review · written as the reader</div>
+      <div class="rating-row">
+        <span class="rating-chip">{res.rating:.1f}</span>
+        <span class="stars">{stars(res.rating)}</span>{badge}
+      </div>
+      <div class="review-body">{esc(res.review)}</div>
+    </div>""", unsafe_allow_html=True)
+    st.markdown(reflection_stepper(res.reflection_iterations,
+                                   res.reflection_refined,
+                                   res.reflection_notes), unsafe_allow_html=True)
+    st.markdown('<div class="sec-label">Why this rating</div>', unsafe_allow_html=True)
+    truth = ctx.get("truth")
+    if truth:
+        col1, col2 = st.columns(2)
+        with col1:
+            st.markdown(f"""
+            <div class="cmp agent reveal d1">
+              <div class="cmp-head">The agent rated it {res.rating:.1f}★</div>
+              <div class="cmp-body">{esc(res.reasoning)}</div>
+            </div>""", unsafe_allow_html=True)
+        with col2:
+            d = abs(res.rating - truth["rating"])
+            dc = "good" if d <= 0.5 else ("mid" if d <= 1.0 else "far")
+            t = truth["text"].replace("<br />", "\n").replace("<br>", "\n")
+            t = t[:520] + ("…" if len(t) > 520 else "")
+            st.markdown(f"""
+            <div class="cmp truth reveal d2">
+              <div class="cmp-head">The reader actually wrote &nbsp;
+                <span class="delta {dc}">Δ {d:.1f}★</span></div>
+              <div style="margin:0.15rem 0 0.35rem">
+                <span class="stars" style="color:var(--pine-2)">{stars(truth['rating'])}</span>
+                <span style="font-family:Spline Sans Mono,monospace;font-size:0.74rem;
+                color:#6f6651"> {truth['rating']:.1f}★</span></div>
+              <div class="cmp-body">{esc(t)}</div>
+            </div>""", unsafe_allow_html=True)
+    else:
+        st.markdown(f"""
+        <div class="cmp agent reveal d1">
+          <div class="cmp-head">The agent rated it {res.rating:.1f}★</div>
+          <div class="cmp-body">{esc(res.reasoning)}</div>
+        </div>""", unsafe_allow_html=True)
+    st.caption(f"grounded on {res.used_history_count} similar past reviews")
+else:
+    st.markdown('<div class="empty">Compose a persona and a product, or pick a '
+                'dataset reader — then press <b>Generate</b>. The agent writes '
+                'the review in that reader\'s voice and shows its reasoning.</div>',
+                unsafe_allow_html=True)
+st.markdown("""
+<div class="foot">
+  User Modeling Agent · DSN × BCT LLM Agent Challenge 2026 ·
+  persona → draft in-voice → self-reflection critique &amp; revise ·
+  rating predicted as persona prior adjusted by item evidence
+</div>
+""", unsafe_allow_html=True)