title: User Modeling Agent
emoji: 📝
colorFrom: green
colorTo: red
sdk: docker
app_port: 7860
pinned: false

User Modeling Agent

DSN × BCT LLM Agent Challenge 2026, Task A.

An agent that distills a person's reviewing behaviour into a persona, then writes the star rating and the review that person would leave for an unseen product. It critiques and revises its own draft before returning it.

Live demo: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent

Code: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent/tree/main


What it does

Given a person and product details, the agent produces:

  • a star rating (1–5) the person would likely give, and
  • a written review in that person's voice, with tone, length, and quirks matched.

It is not a generic review generator. Every output is conditioned on a specific person, and the rating is reasoned, not guessed.

Three input modes

The same persona engine is fed by three input modes:

  • Compose a persona: describe the person's reviewing voice in free text.
  • Dataset reader: pick a real user from the data; the agent is scored against a genuinely held-out review.
  • Build from past reviews: paste a few of the person's actual past reviews, and the agent builds the persona from them.
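As a rough illustration of the "build from past reviews" path, the quantitative half of a persona could be computed with a few standard-library calls. This is a minimal sketch, not the project's PersonaEngine: the function name `quantitative_persona`, the dictionary keys, and the use of population standard deviation for "spread" are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def quantitative_persona(reviews):
    """Summarise past reviews into numeric persona signals.

    `reviews` is a list of dicts with hypothetical keys
    "rating", "text", and "domain" (assumed for this sketch).
    """
    if not reviews:
        raise ValueError("need at least one past review")
    ratings = [r["rating"] for r in reviews]
    word_counts = [len(r["text"].split()) for r in reviews]
    return {
        "avg_rating": mean(ratings),
        "rating_spread": pstdev(ratings),  # assumption: spread = population std dev
        "avg_review_words": mean(word_counts),
        "domains": sorted({r["domain"] for r in reviews}),
        "rating_distribution": dict(sorted(Counter(ratings).items())),
    }


past = [
    {"rating": 5, "text": "Loved every page of it", "domain": "Books"},
    {"rating": 3, "text": "Slow start but a decent ending", "domain": "Books"},
    {"rating": 4, "text": "Great cast and a tight plot", "domain": "Movies & TV"},
]
persona = quantitative_persona(past)
```

The qualitative voice (tone, themes, complaints) is not covered here, since the README says an LLM distills it from sample reviews.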

The agentic workflow

The system is an agent, not a single prompt. It runs a five-step loop:

  1. Build the persona. A PersonaEngine extracts a structured persona: quantitative signals (average rating, rating spread, review length, domains, rating distribution) and a qualitative voice (tone, preferred themes, common complaints, a one-line voice descriptor) that an LLM distills from sample reviews, with a deterministic fallback if that call fails.

  2. Select grounding history. For a real person, the agent picks the few past reviews most similar to the target item, so it writes from concrete evidence of how this person actually phrases things.

  3. Generate the rating and review. A single LLM call produces both, with the rating reasoned in two explicit steps: first the persona prior (what this person usually gives), then the item evidence (what the title and description signal). The final rating is the prior adjusted by the evidence, so a generous reviewer still rates a poor item low and a critical reviewer still rates a strong item high.

  4. Self-reflection: critique and revise. A critic LLM audits the draft for rating–text consistency, voice match, and on-topic fit. If it objects, the agent rewrites with that feedback and re-checks, for up to two cycles. This act → critique → revise loop is what makes it an agent.

  5. Post-process. The rating is clamped to range. An optional Nigerian Pidgin rendering layer can restyle the review while preserving meaning, sentiment, and rating.
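The control flow of steps 3 to 5 can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `generate` and `critique` are hypothetical callables standing in for the LLM calls, the critic is assumed to return `None` when it has no objection, and "up to two cycles" is read here as at most two critique-and-rewrite rounds.

```python
from typing import Callable, Optional, Tuple

Draft = Tuple[float, str]  # (rating, review text)


def clamp_rating(rating: float, low: float = 1.0, high: float = 5.0) -> float:
    """Force the rating into the valid star range."""
    return max(low, min(high, rating))


def generate_with_reflection(
    generate: Callable[[Optional[str]], Draft],
    critique: Callable[[Draft], Optional[str]],
    max_cycles: int = 2,
) -> Draft:
    """Act, critique, revise; then post-process the rating."""
    draft = generate(None)  # first draft, no feedback yet
    for _ in range(max_cycles):
        feedback = critique(draft)
        if feedback is None:  # critic accepts the draft
            break
        draft = generate(feedback)  # rewrite using the critic's objection
    rating, text = draft
    return clamp_rating(rating), text


# Hypothetical stand-ins for the two LLM calls.
def fake_generate(feedback: Optional[str]) -> Draft:
    if feedback is None:
        return (7.0, "Terrible book, would not recommend.")
    return (2.0, "Terrible book, would not recommend.")


def fake_critique(draft: Draft) -> Optional[str]:
    rating, text = draft
    if rating > 3 and "Terrible" in text:
        return "Rating is too high for such a negative review."
    return None


result = generate_with_reflection(fake_generate, fake_critique)
```

With these stubs the first draft is rejected for rating–text inconsistency, the revision passes on the second check, and the loop exits early.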

Reliability

  • Provider failover. The agent runs a primary and a secondary LLM provider. If the primary fails (quota, rate limit, or a transient service error), the same call is retried automatically on the secondary, so a live demo does not break when one provider is briefly unavailable.
  • Graceful degradation. If an LLM call fails, the agent falls back to a deterministic persona rather than crashing.
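One way to picture the failover behaviour is a loop that tries each provider in order and returns the first successful answer. The sketch below is an assumption about the shape of such a helper, not the project's actual `llm` module: `call_with_failover` and the two stub providers are hypothetical names, and a real client would catch narrower exception types than `Exception`.

```python
from typing import Callable, Sequence


def call_with_failover(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # sketch only: catch narrower errors in practice
            last_error = exc
    raise RuntimeError("all LLM providers failed") from last_error


# Hypothetical stand-ins for the primary and secondary provider clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("rate limit reached")


def healthy_secondary(prompt: str) -> str:
    return f"secondary: {prompt}"


answer = call_with_failover("Write a review", [flaky_primary, healthy_secondary])
```

A caller can wrap `call_with_failover` in its own `try`/`except RuntimeError` to fall back to a deterministic result when every provider fails.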

How it maps to the Task A rubric

  • Review Text Quality: reviews are grounded in the person's real past reviews and self-critiqued for voice match.
  • Rating Accuracy: the two-step prior-plus-evidence rating logic corrects the common failure of predicting from the user average alone.
  • Behavioural Fidelity: persona-conditioned generation; the persona portrait is visible in the app for inspection.
  • Nigerian contextualization (bonus): a toggleable Nigerian Pidgin rendering layer; off by default so scored output stays standard English.
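To make the prior-plus-evidence idea concrete, here is a purely numeric illustration. In the agent this adjustment happens inside the LLM's reasoning, not in a formula, so everything below is an assumption for the sake of a worked example: `evidence_shift` is a hypothetical signed number of stars, and rounding half up is an arbitrary choice.

```python
import math


def adjust_rating(prior: float, evidence_shift: float) -> int:
    """Shift the user's usual rating by the item evidence, then clamp to 1..5."""
    raw = prior + evidence_shift
    rounded = math.floor(raw + 0.5)  # round half up (assumption)
    return max(1, min(5, rounded))


# A generous reviewer (usually about 4.6 stars) facing a weak item.
generous_on_poor_item = adjust_rating(prior=4.6, evidence_shift=-2.5)

# A critical reviewer (usually about 2.4 stars) facing a strong item.
critical_on_strong_item = adjust_rating(prior=2.4, evidence_shift=2.0)

# Predicting from the user average alone ignores the item entirely.
average_only = adjust_rating(prior=4.6, evidence_shift=0.0)
```

Here the generous reviewer lands on 2 stars rather than the 5 an average-only prediction would give, and the critical reviewer lands on 4.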

Running locally

pip install -r requirements.txt
# set your keys in a .env file:
#   LLM_PROVIDER=openai
#   OPENAI_API_KEY=...
#   GEMINI_API_KEY=...
streamlit run app.py

LLM_PROVIDER sets the primary provider; the other provider, if its key is present, is used as the automatic failover. The processed data (data/processed/*.parquet) must be present.

Project layout

core/                 shared engine: config, llm, persona, reflection, nigerian
task_a_user_modeling/ the User Modeling agent
scripts/              test harness (test_task_a.py)
data/processed/       Amazon Reviews 2023: Books · Movies & TV · Kindle Store
app.py                Streamlit demo with three input modes

Configuration

Set in a .env file (never commit it):

  • LLM_PROVIDER: openai or gemini (the primary provider)
  • OPENAI_API_KEY / GEMINI_API_KEY: set both so that the non-primary provider serves as the automatic failover

On a HuggingFace Space, set these as Secrets in Space settings.
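The way these variables determine the provider order might look like the sketch below. It assumes the variables are already present in the process environment (for example, loaded from .env or injected as Space Secrets); the helper name `provider_order` and the error messages are hypothetical and not taken from the project's config module.

```python
import os

# Environment variable holding the API key for each supported provider.
KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def provider_order(env=os.environ):
    """Return [primary, failover...] based on LLM_PROVIDER and available keys."""
    primary = env.get("LLM_PROVIDER", "").strip().lower()
    if primary not in KEY_VARS:
        raise ValueError("LLM_PROVIDER must be 'openai' or 'gemini'")
    if not env.get(KEY_VARS[primary]):
        raise ValueError(f"{KEY_VARS[primary]} is not set")
    order = [primary]
    # The other provider joins as failover only if its key is present.
    order += [name for name in KEY_VARS if name != primary and env.get(KEY_VARS[name])]
    return order


both_keys = provider_order(
    {"LLM_PROVIDER": "gemini", "OPENAI_API_KEY": "sk-example", "GEMINI_API_KEY": "g-example"}
)
one_key = provider_order({"LLM_PROVIDER": "openai", "OPENAI_API_KEY": "sk-example"})
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching real credentials.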

Notes and honest limitations

  • The self-reflection critic checks internal consistency; it cannot catch a rating that is wrong but self-consistent.
  • Rating prediction on hard cases (a critical user who loved something) is improved by the two-step logic but can still be ~0.5–1.0★ off.
  • LLM output is non-deterministic; single-run results vary, so evaluation averages across many users.
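Because single runs vary, a rating error is more meaningful when averaged over many users. The sketch below uses mean absolute error as the metric, which is an assumption (the README does not name the metric the test harness uses), and the `(predicted, actual)` pairs are made-up numbers for illustration only.

```python
from statistics import mean


def mean_absolute_error(pairs):
    """Average |predicted - actual| over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    return mean(abs(predicted - actual) for predicted, actual in pairs)


# Made-up example: one held-out review per user.
results = [(4, 5), (2, 2), (5, 4), (3, 1)]
mae = mean_absolute_error(results)
```

Per-user errors of 1, 0, 1, and 2 stars average to an error of one star across the four users.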

Credits

Built for the DSN × BCT LLM Agent Challenge 2026. Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.