---
title: User Modeling Agent
emoji: π
colorFrom: green
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---
# User Modeling Agent
**DSN × BCT LLM Agent Challenge 2026, Task A.**
An agent that reads a person into a behavioural *persona*, then writes the
star rating and the review that person would leave for an unseen product,
and critiques and revises its own draft before returning it.
> Live demo: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent
> Code: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent/tree/main
---
## What it does
Given a **person** and **product details**, the agent produces:
- a **star rating** (1–5) the person would likely give, and
- a **written review** in that person's voice, with tone, length, and quirks matched.
It is not a generic review generator. Every output is conditioned on a
specific person, and the rating is reasoned, not guessed.
## Three input modes
The same persona engine is fed by three input modes:
- **Compose a persona:** describe the person's reviewing voice in free text.
- **Dataset reader:** a real user from the data; the agent is scored against
a genuinely held-out review.
- **Build from past reviews:** paste a few of the person's actual past
reviews, and the agent builds the persona from them.
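
As an illustration of the "build from past reviews" mode, the numeric side of a persona could be computed roughly as below. This is a minimal sketch, not the project's `PersonaEngine`: the function name `quantitative_signals` and the review keys `rating`, `text`, and `domain` are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def quantitative_signals(reviews):
    """Summarise past reviews into simple numeric persona signals.

    `reviews` is a list of dicts with hypothetical keys "rating" (1-5),
    "text", and "domain"; the real schema may differ.
    """
    if not reviews:
        raise ValueError("need at least one past review")
    ratings = [r["rating"] for r in reviews]
    lengths = [len(r["text"].split()) for r in reviews]
    return {
        "avg_rating": mean(ratings),
        "rating_spread": pstdev(ratings),  # population std. dev.; 0.0 for one review
        "avg_review_words": mean(lengths),
        "domains": sorted({r["domain"] for r in reviews}),
        "rating_distribution": dict(Counter(ratings)),
    }


past = [
    {"rating": 5, "text": "Loved every page of it", "domain": "Books"},
    {"rating": 4, "text": "Solid plot, slow start", "domain": "Books"},
    {"rating": 3, "text": "Watchable but forgettable", "domain": "Movies & TV"},
]
signals = quantitative_signals(past)
```

Signals like these are cheap and deterministic, which is why they also make a sensible fallback when an LLM call for the qualitative voice is unavailable.
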
## The agentic workflow
The system is an agent, not a single prompt. It runs a five-step loop:
1. **Build the persona.** A `PersonaEngine` extracts a structured persona:
quantitative signals (average rating, rating spread, review length,
domains, rating distribution) and a qualitative voice (tone, preferred
themes, common complaints, a one-line voice descriptor) distilled by an
LLM from sample reviews, with a deterministic fallback if that call fails.
2. **Select grounding history.** For a real person, the agent picks the few
past reviews most similar to the target item, so it writes from concrete
evidence of how this person actually phrases things.
3. **Generate the rating and review.** A single LLM call, with the rating
reasoned in two explicit steps: first the persona *prior* (what this
person usually gives), then the *item evidence* (what the title and
description signal). The final rating is the prior adjusted by the
evidence, so a generous reviewer still rates a poor item low and a
critical reviewer still rates a strong item high.
4. **Self-reflection: critique and revise.** A critic LLM audits the draft
for rating-text consistency, voice match, and on-topic fit. If it objects,
the agent rewrites with that feedback and re-checks, for up to two cycles.
This act → critique → revise loop is what makes it an agent.
5. **Post-process.** The rating is clamped to range. An optional Nigerian
Pidgin rendering layer can restyle the review while preserving meaning,
sentiment, and rating.
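
The control flow of steps 3 to 5 can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's code: `generate` and `critique` are hypothetical stand-ins for the LLM calls, and the exact point at which the real agent stops re-checking may differ.

```python
def clamp_rating(rating, low=1, high=5):
    """Round to a whole star and force the result into the valid range."""
    return max(low, min(high, int(round(rating))))


def run_with_reflection(generate, critique, max_cycles=2):
    """Draft, then critique and revise at most `max_cycles` times.

    `generate(feedback)` returns a dict with "rating" and "review";
    `critique(draft)` returns None to accept, or a feedback string.
    Both are hypothetical stand-ins for LLM calls.
    """
    draft = generate(None)
    for _ in range(max_cycles):
        feedback = critique(draft)
        if feedback is None:
            break  # the critic accepted the draft
        draft = generate(feedback)
    draft["rating"] = clamp_rating(draft["rating"])
    return draft


# Stub "models" so the sketch runs without any API key.
def fake_generate(feedback):
    rating = 5.4 if feedback is None else 1.6
    return {"rating": rating, "review": "Terrible. Fell apart in a week."}


def fake_critique(draft):
    if draft["rating"] >= 4 and "Terrible" in draft["review"]:
        return "Rating is high but the text is negative."
    return None


result = run_with_reflection(fake_generate, fake_critique)
```

Bounding the loop keeps latency and cost predictable: a draft the critic never accepts is still returned after the last revision rather than looping indefinitely.
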
## Reliability
- **Provider failover.** The agent runs a primary and a secondary LLM
provider. If the primary fails (quota, rate limit, or a transient service
error), the same call is retried automatically on the secondary, so a live
demo does not break when one provider is briefly unavailable.
- **Graceful degradation.** If an LLM call fails, the agent falls back to a
deterministic persona rather than crashing.
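
A failover wrapper of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's implementation: `call_with_failover` and the provider callables are hypothetical, and a real client would catch the SDK's specific quota and rate-limit errors instead of a bare `Exception`.

```python
def call_with_failover(prompt, providers):
    """Try each provider in order and return the first successful reply.

    `providers` is an ordered list of (name, callable) pairs, primary first.
    The callables are hypothetical wrappers around real SDK clients.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers: the primary is rate limited, the secondary answers.
def flaky_primary(prompt):
    raise TimeoutError("rate limit exceeded")


def healthy_secondary(prompt):
    return f"echo: {prompt}"


used, reply = call_with_failover(
    "Rate this book.",
    [("openai", flaky_primary), ("gemini", healthy_secondary)],
)
```

Collecting the error messages before raising makes it easier to tell, after the fact, whether both providers failed for the same reason.
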
## How it maps to the Task A rubric
- **Review Text Quality:** reviews are grounded in the person's real past
reviews and self-critiqued for voice match.
- **Rating Accuracy:** the two-step prior-plus-evidence rating logic
corrects the common failure of predicting from the user average alone.
- **Behavioural Fidelity:** persona-conditioned generation; the persona
portrait is visible in the app for inspection.
- **Nigerian contextualization (bonus):** a toggleable Nigerian Pidgin
rendering layer; off by default so scored output stays standard English.
## Running locally
```bash
pip install -r requirements.txt
# set your keys in a .env file:
# LLM_PROVIDER=openai
# OPENAI_API_KEY=...
# GEMINI_API_KEY=...
streamlit run app.py
```
`LLM_PROVIDER` sets the primary provider; the other provider, if its key is
present, is used as the automatic failover. The processed data
(`data/processed/*.parquet`) must be present.
## Project layout
```
core/                  shared engine: config, llm, persona, reflection, nigerian
task_a_user_modeling/  the User Modeling agent
scripts/               test harness (test_task_a.py)
data/processed/        Amazon Reviews 2023: Books · Movies & TV · Kindle Store
app.py                 Streamlit demo with three input modes
```
## Configuration
Set in a `.env` file (never commit it):
- `LLM_PROVIDER`: `openai` or `gemini` (the primary provider)
- `OPENAI_API_KEY` / `GEMINI_API_KEY`: both should be set so the unused one
serves as the automatic failover
On a HuggingFace Space, set these as **Secrets** in Space settings.
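
To make the primary/failover rule concrete, the provider order could be derived from these variables along the following lines. This is only a sketch: `provider_order` is a hypothetical helper, and defaulting to `openai` when `LLM_PROVIDER` is unset is an assumption, not documented behaviour.

```python
import os

# Environment variable holding the API key for each supported provider.
KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def provider_order(env=None):
    """Return usable providers, primary first, skipping any without a key.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    The "openai" default for LLM_PROVIDER is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", "openai").strip().lower()
    if primary not in KEY_VARS:
        raise ValueError(f"unknown LLM_PROVIDER: {primary!r}")
    ordered = [primary] + [name for name in KEY_VARS if name != primary]
    return [name for name in ordered if env.get(KEY_VARS[name])]


both_keys = provider_order(
    {"LLM_PROVIDER": "gemini", "OPENAI_API_KEY": "sk-x", "GEMINI_API_KEY": "g-y"}
)
one_key = provider_order({"LLM_PROVIDER": "gemini", "GEMINI_API_KEY": "g-y"})
```

With only one key set the list has a single entry, so there is no failover; that is the reason for setting both.
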
## Notes and honest limitations
- The self-reflection critic checks internal consistency; it cannot catch a
rating that is wrong but self-consistent.
- Rating prediction on hard cases (a critical user who loved something) is
improved by the two-step logic but can still be roughly 0.5–1.0 stars off.
- LLM output is non-deterministic; single-run results vary, so evaluation
averages across many users.
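
One simple way to average over many users is the mean absolute error between predicted and held-out ratings. The sketch below assumes that metric for illustration; the README does not state which metric the evaluation uses, and the sample numbers are made up for the example.

```python
from statistics import mean


def mean_absolute_error(pairs):
    """Average |predicted - actual| over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    return mean(abs(predicted - actual) for predicted, actual in pairs)


# Illustrative (predicted, held-out) ratings, one pair per user; not real results.
results = [(4, 5), (2, 2), (5, 4), (3, 1)]
mae = mean_absolute_error(results)
```

Because single runs vary, a figure like this is more meaningful when the same users are scored over several runs and the errors are pooled.
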
## Credits
Built for the DSN × BCT LLM Agent Challenge 2026.
Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.