---
title: User Modeling Agent
emoji: π
colorFrom: green
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---

# User Modeling Agent

**DSN × BCT LLM Agent Challenge 2026, Task A.**

An agent that reads a person into a behavioural *persona*, then writes the
star rating and the review that person would leave for an unseen product,
and critiques and revises its own draft before returning it.

> Live demo: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent
> Code: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent/tree/main

---

## What it does

Given a **person** and **product details**, the agent produces:

- a **star rating** (1–5) the person would likely give, and
- a **written review** in that person's voice, with tone, length, and quirks matched.

It is not a generic review generator. Every output is conditioned on a
specific person, and the rating is reasoned, not guessed.

## Three input modes

The same persona engine is fed by three input modes:

- **Compose a persona**: describe the person's reviewing voice in free text.
- **Dataset reader**: a real user from the data; the agent is scored against
  a genuinely held-out review.
- **Build from past reviews**: paste a few of the person's actual past
  reviews, and the agent builds the persona from them.
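
As a rough illustration of what "building the persona" from pasted reviews could involve on the numeric side, the sketch below computes a few summary signals from a handful of reviews. The function name `quantitative_signals` and the review schema (`rating`, `text`, `domain`) are assumptions made for this example, not the project's actual `PersonaEngine` interface, and the three sample reviews are made up.

```python
from collections import Counter
from statistics import mean, pstdev


def quantitative_signals(reviews):
    """Summarise past reviews into numeric persona signals.

    `reviews` is a list of dicts with 'rating' (1-5), 'text' and 'domain'.
    This schema is hypothetical and chosen only for the example.
    """
    if not reviews:
        raise ValueError("need at least one past review")
    ratings = [r["rating"] for r in reviews]
    word_counts = [len(r["text"].split()) for r in reviews]
    return {
        "avg_rating": round(mean(ratings), 2),
        # Population standard deviation as a simple measure of rating spread.
        "rating_spread": round(pstdev(ratings), 2),
        "avg_review_words": round(mean(word_counts), 1),
        "domains": sorted({r["domain"] for r in reviews}),
        "rating_distribution": dict(sorted(Counter(ratings).items())),
    }


# Made-up sample reviews, for illustration only.
past = [
    {"rating": 5, "text": "Loved every page of it", "domain": "Books"},
    {"rating": 4, "text": "Solid plot but slow start", "domain": "Books"},
    {"rating": 2, "text": "Dull and far too long", "domain": "Movies & TV"},
]
signals = quantitative_signals(past)
print(signals)
```

The qualitative side of the persona (tone, themes, complaints) comes from an LLM in the real agent and is not modelled here.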

## The agentic workflow

The system is an agent, not a single prompt. It runs a five-step loop:

1. **Build the persona.** A `PersonaEngine` extracts a structured persona:
   quantitative signals (average rating, rating spread, review length,
   domains, rating distribution) and a qualitative voice (tone, preferred
   themes, common complaints, a one-line voice descriptor) distilled by an
   LLM from sample reviews, with a deterministic fallback if that call fails.
2. **Select grounding history.** For a real person, the agent picks the few
   past reviews most similar to the target item, so it writes from concrete
   evidence of how this person actually phrases things.
3. **Generate the rating and review.** A single LLM call, with the rating
   reasoned in two explicit steps: first the persona *prior* (what this
   person usually gives), then the *item evidence* (what the title and
   description signal). The final rating is the prior adjusted by the
   evidence, so a generous reviewer still rates a poor item low and a
   critical reviewer still rates a strong item high.
4. **Self-reflection: critique and revise.** A critic LLM audits the draft
   for rating–text consistency, voice match, and on-topic fit. If it objects,
   the agent rewrites with that feedback and re-checks, for up to two cycles.
   This act → critique → revise loop is what makes it an agent.
5. **Post-process.** The rating is clamped to the 1–5 range. An optional
   Nigerian Pidgin rendering layer can restyle the review while preserving
   meaning, sentiment, and rating.
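
Steps 4 and 5 can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the project's code: `Draft`, `reflect_and_revise`, `clamp_rating`, and the toy critic and reviser are hypothetical names, and the two callables stand in for the LLM calls the real agent makes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_REVISIONS = 2  # "up to two cycles" in step 4


@dataclass
class Draft:
    rating: float
    review: str


def clamp_rating(rating: float, low: int = 1, high: int = 5) -> int:
    """Round to a whole star and keep the result inside [low, high]."""
    return max(low, min(high, round(rating)))


def reflect_and_revise(
    draft: Draft,
    critic: Callable[[Draft], Optional[str]],
    reviser: Callable[[Draft, str], Draft],
) -> Tuple[Draft, int]:
    """Critique the draft; while the critic objects, revise and re-check.

    `critic` returns None when satisfied, otherwise an objection string.
    `reviser` takes the draft plus the objection and returns a new draft.
    """
    cycles = 0
    objection = critic(draft)
    while objection is not None and cycles < MAX_REVISIONS:
        draft = reviser(draft, objection)
        cycles += 1
        objection = critic(draft)
    # Post-process: the rating is clamped whatever the loop produced.
    return Draft(clamp_rating(draft.rating), draft.review), cycles


# Toy stand-ins for the critic and reviser LLM calls (assumptions).
def toy_critic(d: Draft) -> Optional[str]:
    if d.rating <= 2 and "loved" in d.review.lower():
        return "rating and text disagree"
    return None


def toy_reviser(d: Draft, objection: str) -> Draft:
    return Draft(d.rating, "Not for me; the pacing dragged.")


final, cycles = reflect_and_revise(Draft(1.6, "Loved it!"), toy_critic, toy_reviser)
print(final, cycles)
```

Capping the loop means a critic that never stops objecting cannot stall the request; the last revision is returned as-is.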

## Reliability

- **Provider failover.** The agent runs a primary and a secondary LLM
  provider. If the primary fails (quota, rate limit, or a transient service
  error), the same call is retried automatically on the secondary, so a live
  demo does not break when one provider is briefly unavailable.
- **Graceful degradation.** If an LLM call fails, the agent falls back to a
  deterministic persona rather than crashing.
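
The failover behaviour can be pictured as trying providers in order and returning the first successful answer. The sketch below assumes each provider is wrapped as a plain callable; `call_with_failover`, `AllProvidersFailed`, and the two fake providers are hypothetical names used for illustration, and the real agent's error handling may be more selective than a blanket `except Exception`.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def call_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. quota, rate limit, transient error
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Fake providers standing in for real SDK calls (assumptions).
def flaky_primary(prompt: str) -> str:
    raise RuntimeError("rate limit exceeded")


def steady_secondary(prompt: str) -> str:
    return f"response to: {prompt}"


used, text = call_with_failover(
    "Write the review.",
    [("primary", flaky_primary), ("secondary", steady_secondary)],
)
print(used, "->", text)
```

If every provider fails, the caller can catch `AllProvidersFailed` and switch to the deterministic fallback instead of crashing.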

## How it maps to the Task A rubric

- **Review Text Quality**: reviews are grounded in the person's real past
  reviews and self-critiqued for voice match.
- **Rating Accuracy**: the two-step prior-plus-evidence rating logic
  corrects the common failure of predicting from the user average alone.
- **Behavioural Fidelity**: persona-conditioned generation; the persona
  portrait is visible in the app for inspection.
- **Nigerian contextualization (bonus)**: a toggleable Nigerian Pidgin
  rendering layer; off by default so scored output stays standard English.
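
The prior-plus-evidence idea behind the Rating Accuracy point can be illustrated with simple arithmetic. In the agent this reasoning happens inside the LLM call, so the function below is only a numeric analogy: `adjust_rating` and its signed `evidence` shift are hypothetical, and the input values are made up to show the two cases the workflow describes.

```python
def adjust_rating(prior: float, evidence: float) -> int:
    """Shift the reviewer's usual rating by a signed item-evidence term.

    prior    -- what this person usually gives (1-5), e.g. their average
    evidence -- negative for a weak item, positive for a strong one
    The result is rounded to a whole star and kept inside 1-5.
    """
    return max(1, min(5, round(prior + evidence)))


# A generous reviewer (prior 4.6) facing a poor item still lands low.
generous_on_poor = adjust_rating(4.6, -2.5)

# A critical reviewer (prior 2.4) facing a strong item still lands high.
critical_on_strong = adjust_rating(2.4, 2.0)

# With no item evidence, the prediction collapses to the user average,
# which is the failure mode the two-step logic is meant to correct.
average_only = adjust_rating(4.6, 0.0)

print(generous_on_poor, critical_on_strong, average_only)
```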

## Running locally

```bash
pip install -r requirements.txt
# set your keys in a .env file:
#   LLM_PROVIDER=openai
#   OPENAI_API_KEY=...
#   GEMINI_API_KEY=...
streamlit run app.py
```

`LLM_PROVIDER` sets the primary provider; the other provider, if its key is
present, is used as the automatic failover. The processed data
(`data/processed/*.parquet`) must be present.

## Project layout

```
core/                    shared engine: config, llm, persona, reflection, nigerian
task_a_user_modeling/    the User Modeling agent
scripts/                 test harness (test_task_a.py)
data/processed/          Amazon Reviews 2023: Books · Movies & TV · Kindle Store
app.py                   Streamlit demo with three input modes
```

## Configuration

Set in a `.env` file (never commit it):

- `LLM_PROVIDER`: `openai` or `gemini` (the primary provider)
- `OPENAI_API_KEY` / `GEMINI_API_KEY`: set both, so the provider that is not
  primary can serve as the automatic failover

On a Hugging Face Space, set these as **Secrets** in the Space settings.
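
One way to turn these variables into a provider order is sketched below. The variable names come from this README; the function `provider_order`, the `KEY_VARS` mapping, and the choice to raise when the primary key is missing are assumptions for illustration, and the project's own `core` config code may behave differently.

```python
import os
from typing import List, Mapping

# Environment variable holding the API key for each supported provider.
KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def provider_order(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return providers to try, primary first, then any keyed failover."""
    primary = env.get("LLM_PROVIDER", "").strip().lower()
    if primary not in KEY_VARS:
        raise ValueError("LLM_PROVIDER must be 'openai' or 'gemini'")
    if not env.get(KEY_VARS[primary]):
        raise ValueError(f"{KEY_VARS[primary]} is not set for the primary provider")
    order = [primary]
    for name, var in KEY_VARS.items():
        # The other provider joins the list only when its key is present.
        if name != primary and env.get(var):
            order.append(name)
    return order


# Example with an explicit mapping instead of the real environment.
both_keys = {"LLM_PROVIDER": "gemini", "GEMINI_API_KEY": "g", "OPENAI_API_KEY": "o"}
print(provider_order(both_keys))
```

With only one key set, the list has a single entry and there is no failover, which is why setting both keys is recommended.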

## Notes and honest limitations

- The self-reflection critic checks internal consistency; it cannot catch a
  rating that is wrong but self-consistent.
- Rating prediction on hard cases (a critical user who loved something) is
  improved by the two-step logic but can still be about 0.5–1.0 stars off.
- LLM output is non-deterministic; single-run results vary, so evaluation
  averages across many users.
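
Averaging across users can be sketched as a per-user mean absolute error followed by a mean over users. The metric choice, the function names, and the numbers below are assumptions for illustration; they are toy values, not results from this project or its test harness.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Each pair is (predicted rating, actual held-out rating).
Pairs = List[Tuple[int, int]]


def mean_absolute_error(pairs: Pairs) -> float:
    """Average absolute gap between predicted and actual ratings."""
    return mean(abs(predicted - actual) for predicted, actual in pairs)


def evaluate(runs_by_user: Dict[str, Pairs]) -> Tuple[Dict[str, float], float]:
    """Score each user separately, then average so no single run dominates."""
    per_user = {user: mean_absolute_error(pairs) for user, pairs in runs_by_user.items()}
    return per_user, mean(per_user.values())


# Toy data: repeated runs for three hypothetical users.
toy_runs = {
    "user_a": [(4, 5), (5, 5)],
    "user_b": [(2, 3), (3, 3)],
    "user_c": [(4, 4)],
}
per_user, overall = evaluate(toy_runs)
print(per_user, round(overall, 3))
```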

## Credits

Built for the DSN × BCT LLM Agent Challenge 2026.
Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.