Israelbliz committed
Commit 6f2d08c · verified · 1 Parent(s): a7a3666

Upload README.md

  1. README.md +141 -0
README.md ADDED
---
title: User Modeling Agent
emoji: πŸ“
colorFrom: green
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---

# User Modeling Agent

**DSN × BCT LLM Agent Challenge 2026, Task A.**

An agent that distils a person into a behavioural *persona*, then writes the
star rating and the review that person would leave for an unseen product,
critiquing and revising its own draft before returning it.

> Live demo: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent
> Code: https://huggingface.co/spaces/Israelbliz/User-Modeling-Agent/tree/main

---

## What it does

Given a **person** and **product details**, the agent produces:

- a **star rating** (1–5) the person would likely give, and
- a **written review** in that person's voice, with tone, length, and quirks matched.

It is not a generic review generator. Every output is conditioned on a
specific person, and the rating is reasoned, not guessed.

## Three input modes

The same persona engine is fed by three input modes:

- **Compose a persona**: describe the person's reviewing voice in free text.
- **Dataset reader**: pick a real user from the data; the agent is scored against
  a genuinely held-out review.
- **Build from past reviews**: paste a few of the person's actual past
  reviews, and the agent builds the persona from them.

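As an illustration of the "build from past reviews" mode, the quantitative side of a persona (average rating, rating spread, review length, domains, rating distribution) could be computed along the lines below. This is a minimal sketch, not the project's `PersonaEngine`: the function name `quantitative_signals` and the review keys `rating`, `text`, and `domain` are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def quantitative_signals(reviews):
    """Summarise a person's past reviews into simple numeric signals.

    Each review is a dict with the hypothetical keys 'rating' (1-5),
    'text', and 'domain'; the real data schema may differ.
    """
    if not reviews:
        raise ValueError("need at least one past review")
    ratings = [r["rating"] for r in reviews]
    word_counts = [len(r["text"].split()) for r in reviews]
    return {
        "avg_rating": mean(ratings),
        # Population standard deviation; 0.0 when there is a single review.
        "rating_spread": pstdev(ratings),
        "avg_review_words": mean(word_counts),
        "domains": sorted({r["domain"] for r in reviews}),
        "rating_distribution": dict(sorted(Counter(ratings).items())),
    }


past_reviews = [
    {"rating": 5, "text": "Loved it", "domain": "Books"},
    {"rating": 3, "text": "It was fine overall", "domain": "Books"},
]
signals = quantitative_signals(past_reviews)
print(signals["avg_rating"], signals["rating_distribution"])
```

The qualitative voice (tone, themes, complaints) is distilled by an LLM and is not covered by this sketch.
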
## The agentic workflow

The system is an agent, not a single prompt. It runs five steps, with a
critique-and-revise loop at step 4:

1. **Build the persona.** A `PersonaEngine` extracts a structured persona:
   quantitative signals (average rating, rating spread, review length,
   domains, rating distribution) and a qualitative voice (tone, preferred
   themes, common complaints, a one-line voice descriptor) distilled by an
   LLM from sample reviews, with a deterministic fallback if that call fails.

2. **Select grounding history.** For a real person, the agent picks the few
   past reviews most similar to the target item, so it writes from concrete
   evidence of how this person actually phrases things.

3. **Generate the rating and review.** A single LLM call, with the rating
   reasoned in two explicit steps: first the persona *prior* (what this
   person usually gives), then the *item evidence* (what the title and
   description signal). The final rating is the prior adjusted by the
   evidence, so a generous reviewer still rates a poor item low and a
   critical reviewer still rates a strong item high.

4. **Self-reflection: critique and revise.** A critic LLM audits the draft
   for rating–text consistency, voice match, and on-topic fit. If it objects,
   the agent rewrites with that feedback and re-checks, for up to two cycles.
   This act → critique → revise loop is what makes it an agent.

5. **Post-process.** The rating is clamped to the 1–5 range. An optional
   Nigerian Pidgin rendering layer can restyle the review while preserving
   meaning, sentiment, and rating.

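The critique-and-revise loop and the final rating clamp can be sketched as follows. This is an assumption-laden outline rather than the project's code: `reflect_and_revise`, `clamp_rating`, and the `critic` / `reviser` callables are hypothetical names, and the LLM calls are replaced by plain functions. Here one cycle means one critique followed by one rewrite; whether the real agent runs a last check after the final rewrite is not stated in this README.

```python
from typing import Callable, Optional, Tuple

Draft = Tuple[int, str]  # (rating, review text)


def clamp_rating(rating: int, low: int = 1, high: int = 5) -> int:
    """Force a rating into the valid star range."""
    return max(low, min(high, rating))


def reflect_and_revise(
    draft: Draft,
    critic: Callable[[Draft], Optional[str]],
    reviser: Callable[[Draft, str], Draft],
    max_cycles: int = 2,
) -> Draft:
    """Run up to max_cycles critique/rewrite rounds, then clamp the rating.

    critic returns None to accept the draft, or a feedback string to object.
    reviser returns a new draft given the old draft and the feedback.
    """
    for _ in range(max_cycles):
        feedback = critic(draft)
        if feedback is None:
            break  # the critic is satisfied
        draft = reviser(draft, feedback)
    rating, text = draft
    return clamp_rating(rating), text


# Stub critic and reviser standing in for LLM calls (illustrative only).
def stub_critic(draft: Draft) -> Optional[str]:
    rating, text = draft
    if rating >= 4 and "disappointing" in text:
        return "rating is too high for a negative review"
    return None


def stub_reviser(draft: Draft, feedback: str) -> Draft:
    return (2, draft[1])


print(reflect_and_revise((5, "Frankly disappointing."), stub_critic, stub_reviser))
```

Capping the number of cycles keeps latency and cost bounded even when the critic never accepts a draft.
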
## Reliability

- **Provider failover.** The agent runs a primary and a secondary LLM
  provider. If the primary fails (quota, rate limit, or a transient service
  error), the same call is retried automatically on the secondary, so a live
  demo does not break when one provider is briefly unavailable.
- **Graceful degradation.** If the LLM call that distils the persona fails,
  the agent falls back to a deterministic persona rather than crashing.

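A provider failover of this kind might look like the sketch below. The names `call_with_failover` and `AllProvidersFailed` are hypothetical, and each provider is modelled as a plain callable taking a prompt; the project's actual LLM wrapper may be structured differently and would likely catch provider-specific errors rather than every exception.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def call_with_failover(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful reply."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # sketch: narrow this to quota/rate-limit/transient errors
            errors.append(exc)
    raise AllProvidersFailed(f"{len(errors)} provider(s) failed: {errors!r}")


# Stub providers standing in for real API clients (illustrative only).
def flaky_primary(prompt: str) -> str:
    raise RuntimeError("rate limit exceeded")


def steady_secondary(prompt: str) -> str:
    return f"secondary reply to: {prompt}"


print(call_with_failover("Write a review", [flaky_primary, steady_secondary]))
```

Passing the providers as an ordered list keeps the call site identical whether one or two keys are configured.
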
## How it maps to the Task A rubric

- **Review Text Quality**: reviews are grounded in the person's real past
  reviews and self-critiqued for voice match.
- **Rating Accuracy**: the two-step prior-plus-evidence rating logic
  corrects the common failure of predicting from the user average alone.
- **Behavioural Fidelity**: persona-conditioned generation; the persona
  portrait is visible in the app for inspection.
- **Nigerian contextualization (bonus)**: a toggleable Nigerian Pidgin
  rendering layer; off by default so scored output stays standard English.

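To make the prior-plus-evidence idea concrete, here is a purely numeric illustration. In the agent this reasoning happens inside the LLM prompt, so the function `adjust_prior`, the 1–5 `item_score`, and the blending weight of 0.6 are all assumptions chosen for the example, not values taken from the project.

```python
def adjust_prior(prior: float, item_score: float, weight: float = 0.6) -> int:
    """Blend a user's usual rating with an item-quality signal.

    prior      -- the user's typical rating on the 1-5 scale
    item_score -- how strong the item looks on the same 1-5 scale (assumed input)
    weight     -- how much the item evidence counts relative to the prior (assumed)
    """
    blended = (1 - weight) * prior + weight * item_score
    rounded = int(blended + 0.5)  # round half up for positive values
    return max(1, min(5, rounded))


# A generous reviewer facing a weak item, and a critical reviewer facing a strong one.
print(adjust_prior(prior=4.6, item_score=1.5))
print(adjust_prior(prior=2.2, item_score=4.8))
```

Predicting from the user average alone would give roughly 5 and 2 stars for these two cases; letting the evidence move the prior yields 3 and 4 instead.
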
## Running locally

```bash
pip install -r requirements.txt
# set your keys in a .env file:
#   LLM_PROVIDER=openai
#   OPENAI_API_KEY=...
#   GEMINI_API_KEY=...
streamlit run app.py
```

`LLM_PROVIDER` sets the primary provider; the other provider, if its key is
present, is used as the automatic failover. The processed data
(`data/processed/*.parquet`) must be present.

## Project layout

```
core/                   shared engine: config, llm, persona, reflection, nigerian
task_a_user_modeling/   the User Modeling agent
scripts/                test harness (test_task_a.py)
data/processed/         Amazon Reviews 2023: Books · Movies & TV · Kindle Store
app.py                  Streamlit demo with three input modes
```

## Configuration

Set in a `.env` file (never commit it):

- `LLM_PROVIDER`: `openai` or `gemini` (the primary provider)
- `OPENAI_API_KEY` / `GEMINI_API_KEY`: set both, so that the key not used by
  the primary provider enables the automatic failover

On a Hugging Face Space, set these as **Secrets** in the Space settings.

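The way these variables determine the primary and failover providers could be resolved roughly as below. This is a sketch under assumptions: `provider_order` is a hypothetical helper, and it reads variables that are already in the environment (for example loaded from `.env` by the app, or injected as Space Secrets); it does not parse the `.env` file itself.

```python
import os

# Environment variable that holds the API key for each supported provider.
_KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def provider_order(env=None):
    """Return provider names in call order: primary first, then any failover."""
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", "").strip().lower()
    if primary not in _KEY_VARS:
        raise ValueError("LLM_PROVIDER must be 'openai' or 'gemini'")
    if not env.get(_KEY_VARS[primary]):
        raise ValueError(f"{_KEY_VARS[primary]} is not set")
    order = [primary]
    for name, key_var in _KEY_VARS.items():
        # The other provider only joins as failover when its key is present.
        if name != primary and env.get(key_var):
            order.append(name)
    return order


example_env = {
    "LLM_PROVIDER": "gemini",
    "OPENAI_API_KEY": "sk-example",
    "GEMINI_API_KEY": "example-key",
}
print(provider_order(example_env))
```

Failing fast on a missing primary key surfaces configuration mistakes at startup instead of during the first request.
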
## Notes and honest limitations

- The self-reflection critic checks internal consistency; it cannot catch a
  rating that is wrong but self-consistent.
- Rating prediction on hard cases (a critical user who loved something) is
  improved by the two-step logic but can still be off by roughly 0.5–1.0★.
- LLM output is non-deterministic; single-run results vary, so evaluation
  averages across many users.

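Averaging across users can be as simple as a mean absolute error over predicted and held-out ratings. The sketch below assumes that metric for illustration; this README does not state which metric the test harness or the challenge actually uses, and `mean_absolute_error` is a hypothetical helper.

```python
from statistics import mean


def mean_absolute_error(pairs):
    """Average |predicted - actual| over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    return mean(abs(predicted - actual) for predicted, actual in pairs)


# One (predicted, actual) pair per user; the numbers are made up for the example.
results = [(4, 5), (3, 3), (2, 4)]
print(mean_absolute_error(results))
```

Because single runs vary, the same users can be scored over several runs and the per-run errors averaged again.
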
## Credits

Built for the DSN × BCT LLM Agent Challenge 2026.
Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.