	---
	title: User Modeling Agent
	emoji: 📝
	colorFrom: green
	colorTo: red
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# User Modeling Agent

	DSN × BCT LLM Agent Challenge 2026 — Task A.

	An agent that distills a person's reviewing behaviour into a persona, then
	writes the star rating and the review that person would likely leave for an
	unseen product. It critiques and revises its own draft before returning it.

	> Live demo: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent

	> Code: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent/tree/main

	---

	## What it does

	Given a person and product details, the agent produces:

	- a star rating (1–5) the person would likely give, and
	- a written review in that person's voice — tone, length, and quirks matched.

	It is not a generic review generator. Every output is conditioned on a
	specific person, and the rating is reasoned, not guessed.

	## Three input modes

	The same persona engine is fed by three input modes:

	- Compose a persona — describe the person's reviewing voice in free text.
	- Dataset reader — a real user from the data; the agent is scored against
	a genuinely held-out review.
	- Build from past reviews — paste a few of the person's actual past
	reviews, and the agent builds the persona from them.
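
As a rough illustration of the "build from past reviews" mode, the sketch below computes the kind of quantitative signals a persona can hold: average rating, rating spread, typical review length, domains, and rating distribution. `PastReview` and `quantitative_signals` are hypothetical names chosen for this example, not the project's `PersonaEngine` API, and the qualitative voice would still be distilled by an LLM.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class PastReview:
    """One pasted review (hypothetical shape, for illustration only)."""
    rating: int   # 1-5 stars
    text: str
    domain: str   # e.g. "Books"


def quantitative_signals(reviews: list[PastReview]) -> dict:
    """Summarise the numeric side of a persona from past reviews."""
    if not reviews:
        raise ValueError("need at least one past review")
    ratings = [r.rating for r in reviews]
    return {
        "avg_rating": round(mean(ratings), 2),
        # Population standard deviation; 0.0 when every rating is identical.
        "rating_spread": round(pstdev(ratings), 2),
        "avg_review_words": round(mean(len(r.text.split()) for r in reviews), 1),
        "domains": sorted({r.domain for r in reviews}),
        "rating_distribution": dict(sorted(Counter(ratings).items())),
    }


reviews = [
    PastReview(5, "Loved it. Tight plot, great ending.", "Books"),
    PastReview(4, "Good pacing but the middle dragged a bit.", "Books"),
    PastReview(2, "Flat characters and a lazy script.", "Movies & TV"),
]
signals = quantitative_signals(reviews)
print(signals)
```

With only a handful of pasted reviews these numbers are noisy, so a real persona would probably lean more on the qualitative voice in that mode.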

	## The agentic workflow

	The system is an agent, not a single prompt. It runs a five-step pipeline with a self-critique loop in step 4:

	1. Build the persona. A `PersonaEngine` extracts a structured persona —
	quantitative signals (average rating, rating spread, review length,
	domains, rating distribution) and a qualitative voice (tone, preferred
	themes, common complaints, a one-line voice descriptor) distilled by an
	LLM from sample reviews, with a deterministic fallback if that call fails.

	2. Select grounding history. For a real person, the agent picks the few
	past reviews most similar to the target item, so it writes from concrete
	evidence of how this person actually phrases things.

	3. Generate the rating and review. A single LLM call, with the rating
	reasoned in two explicit steps — first the persona prior (what this
	person usually gives), then the item evidence (what the title and
	description signal). The final rating is the prior adjusted by the
	evidence, so a generous reviewer still rates a poor item low and a
	critical reviewer still rates a strong item high.

	4. Self-reflection — critique and revise. A critic LLM audits the draft
	for rating–text consistency, voice match, and on-topic fit. If it objects,
	the agent rewrites with that feedback and re-checks — up to two cycles.
	This act → critique → revise loop is what makes it an agent.

	5. Post-process. The rating is clamped to the 1–5 range. An optional
	Nigerian Pidgin rendering layer can restyle the review while preserving
	meaning, sentiment, and rating.
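
The act → critique → revise loop of step 4 and the clamp of step 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: `Draft`, `Verdict`, `reflect_and_revise`, and the `critic` and `reviser` callables are hypothetical stand-ins for the real LLM calls, and "two cycles" is read here as at most two critique-and-revise rounds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    rating: float
    review: str


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def clamp_rating(rating: float, low: float = 1, high: float = 5) -> float:
    """Clamp a rating to the valid star range."""
    return max(low, min(high, rating))


def reflect_and_revise(
    draft: Draft,
    critic: Callable[[Draft], Verdict],
    reviser: Callable[[Draft, str], Draft],
    max_cycles: int = 2,
) -> Draft:
    """Critique the draft; revise with the critic's feedback, at most max_cycles times."""
    for _ in range(max_cycles):
        verdict = critic(draft)
        if verdict.ok:
            break
        draft = reviser(draft, verdict.feedback)
    return Draft(clamp_rating(draft.rating), draft.review)


# Toy stand-ins for the LLM critic and reviser (assumptions for this demo).
def toy_critic(d: Draft) -> Verdict:
    if d.rating >= 4 and "disappointing" in d.review:
        return Verdict(False, "rating and text disagree")
    return Verdict(True)


def toy_reviser(d: Draft, feedback: str) -> Draft:
    return Draft(2.0, d.review)


final = reflect_and_revise(Draft(5.0, "Honestly disappointing."), toy_critic, toy_reviser)
print(final)
```

Bounding the loop keeps latency and cost predictable: a critic that never approves still returns a draft after two revisions instead of looping forever.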

	## Reliability

	- Provider failover. The agent runs a primary and a secondary LLM
	provider. If the primary fails — quota, rate limit or a transient service
	error — the same call is retried automatically on the secondary, so a live
	demo does not break when one provider is briefly unavailable.
	- Graceful degradation. If the persona-distillation LLM call still fails,
	the agent falls back to a deterministic persona rather than crashing.
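
One way to picture the failover behaviour is an ordered list of providers tried in turn. The sketch below is illustrative only: `ProviderError`, `call_with_failover`, and the two toy provider functions are hypothetical, and the real agent may classify errors differently.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error for quota, rate-limit, or transient service failures."""


def call_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Toy providers for the demo: the primary always hits a rate limit.
def flaky_primary(prompt: str) -> str:
    raise ProviderError("rate limit exceeded")


def healthy_secondary(prompt: str) -> str:
    return f"review for: {prompt}"


used, text = call_with_failover(
    "a slow-burn mystery novel",
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
)
print(used, "->", text)
```

A caller can catch the final `RuntimeError` and fall back to a deterministic result, which is the graceful-degradation path described above.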

	## How it maps to the Task A rubric

	- Review Text Quality — reviews are grounded in the person's real past
	reviews and self-critiqued for voice match.
	- Rating Accuracy — the two-step prior-plus-evidence rating logic
	corrects the common failure of predicting from the user average alone.
	- Behavioural Fidelity — persona-conditioned generation; the persona
	portrait is visible in the app for inspection.
	- Nigerian contextualization (bonus) — a toggleable Nigerian Pidgin
	rendering layer; off by default so scored output stays standard English.
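
The prior-plus-evidence idea behind the Rating Accuracy point can be shown with plain arithmetic. In the agent both steps are reasoned by the LLM, so the function below is only an illustrative analogue; `adjust_rating` and `evidence_shift` are hypothetical names, and the numeric shifts are made up for the example.

```python
def adjust_rating(persona_prior: float, evidence_shift: float) -> int:
    """Combine the persona prior with an item-evidence shift, then clamp to 1-5.

    persona_prior: what this person usually gives (e.g. their average rating).
    evidence_shift: signed adjustment from the item's title and description
        (negative for a weak item, positive for a strong one).
    """
    raw = persona_prior + evidence_shift
    # Note: Python's round() rounds exact .5 ties to the nearest even integer.
    return int(max(1, min(5, round(raw))))


# A generous reviewer (prior 4.6) still rates a poor item low.
generous_on_poor_item = adjust_rating(4.6, -2.5)
# A critical reviewer (prior 2.4) still rates a strong item high.
critical_on_strong_item = adjust_rating(2.4, 2.0)
# Predicting from the user average alone ignores the item entirely.
average_only = (adjust_rating(4.6, 0.0), adjust_rating(2.4, 0.0))
print(generous_on_poor_item, critical_on_strong_item, average_only)
```

The last line shows the failure mode the rubric point refers to: with no evidence shift, every item gets the reviewer's usual rating regardless of its quality.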

	## Running locally

	```bash
	pip install -r requirements.txt
	# set your keys in a .env file:
	# LLM_PROVIDER=openai
	# OPENAI_API_KEY=...
	# GEMINI_API_KEY=...
	streamlit run app.py
	```

	`LLM_PROVIDER` sets the primary provider; the other provider, if its key is
	present, is used as the automatic failover. The processed data
	(`data/processed/*.parquet`) must be present.
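
The primary and failover selection described above could be resolved along these lines. This is a sketch, assuming the `.env` values have already been loaded into the process environment; `KEY_VARS` and `provider_order` are hypothetical names, not the project's config module.

```python
import os

# Provider name -> environment variable holding its API key (names from this README).
KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def provider_order(env=None) -> list[str]:
    """Return providers to try in order: the primary first, then any other with a key set."""
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", "").strip().lower()
    if primary not in KEY_VARS:
        raise ValueError(f"unknown LLM_PROVIDER: {primary!r}")
    if not env.get(KEY_VARS[primary]):
        raise ValueError(f"{KEY_VARS[primary]} is not set for the primary provider")
    fallbacks = [p for p in KEY_VARS if p != primary and env.get(KEY_VARS[p])]
    return [primary, *fallbacks]


order = provider_order(
    {"LLM_PROVIDER": "gemini", "GEMINI_API_KEY": "g-key", "OPENAI_API_KEY": "o-key"}
)
print(order)
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching real keys.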

	## Project layout

	```
	core/                   shared engine: config, llm, persona, reflection, nigerian
	task_a_user_modeling/   the User Modeling agent
	scripts/                test harness (test_task_a.py)
	data/processed/         Amazon Reviews 2023: Books · Movies & TV · Kindle Store
	app.py                  Streamlit demo with three input modes
	```

	## Configuration

	Set in a `.env` file (never commit it):

	- `LLM_PROVIDER` — `openai` or `gemini` (the primary provider)
	- `OPENAI_API_KEY` / `GEMINI_API_KEY` — set both so that the non-primary
	provider can serve as the automatic failover

	On a Hugging Face Space, set these as Secrets in the Space settings.

	## Notes and honest limitations

	- The self-reflection critic checks internal consistency; it cannot catch a
	rating that is wrong but self-consistent.
	- Rating prediction on hard cases (a critical user who loved something) is
	improved by the two-step logic but can still be ~0.5–1.0★ off.
	- LLM output is non-deterministic; single-run results vary, so evaluation
	averages across many users.
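
Averaging across users can be as simple as a mean absolute error over (predicted, held-out) rating pairs. The source does not name its metric, so MAE is an assumption here, and `mean_absolute_error` and the sample pairs are made up for illustration.

```python
from statistics import mean


def mean_absolute_error(pairs):
    """pairs: iterable of (predicted_rating, held_out_rating), one per user."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no evaluation pairs")
    return mean(abs(pred - true) for pred, true in pairs)


# Illustrative pairs only, not results from the project.
results = [(4, 5), (2, 2), (5, 4), (3, 1)]
mae = mean_absolute_error(results)
print(f"MAE over {len(results)} users: {mae:.2f}")
```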

	## Credits

	Built for the DSN × BCT LLM Agent Challenge 2026.
	Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.