Israelbliz committed on
Commit a7a3666 · verified · 1 Parent(s): ba4acf2

Delete README.md

Files changed (1)
  1. README.md +0 -141
README.md DELETED
@@ -1,141 +0,0 @@
- ---
- title: User Modeling Agent
- emoji: πŸ“
- colorFrom: green
- colorTo: red
- sdk: docker
- app_port: 7860
- pinned: false
- ---
-
- # User Modeling Agent
-
- **DSN × BCT LLM Agent Challenge 2026, Task A.**
-
- An agent that distils a person into a behavioural *persona*, then writes the
- star rating and the review that person would leave for an unseen product,
- and critiques and revises its own draft before returning it.
-
- > Live demo: *(your HuggingFace Space URL)*
- > Code: *(your HuggingFace Space URL)*
-
- ---
-
- ## What it does
-
- Given a **person** and **product details**, the agent produces:
-
- - a **star rating** (1–5) the person would likely give, and
- - a **written review** in that person's voice, with tone, length, and quirks matched.
-
- It is not a generic review generator. Every output is conditioned on a
- specific person, and the rating is reasoned, not guessed.
-
- ## Three input modes
-
- Three input modes feed the same persona engine:
-
- - **Compose a persona**: describe the person's reviewing voice in free text.
- - **Dataset reader**: pick a real user from the data; the agent is scored
-   against a genuinely held-out review.
- - **Build from past reviews**: paste a few of the person's actual past
-   reviews, and the agent builds the persona from them.
-
- ## The agentic workflow
-
- The system is an agent, not a single prompt. It runs five steps:
-
- 1. **Build the persona.** A `PersonaEngine` extracts a structured persona:
-    quantitative signals (average rating, rating spread, review length,
-    domains, rating distribution) and a qualitative voice (tone, preferred
-    themes, common complaints, a one-line voice descriptor) distilled by an
-    LLM from sample reviews, with a deterministic fallback if that call fails.
-
- 2. **Select grounding history.** For a real person, the agent picks the few
-    past reviews most similar to the target item, so it writes from concrete
-    evidence of how this person actually phrases things.
-
- 3. **Generate the rating and review.** A single LLM call, with the rating
-    reasoned in two explicit steps: first the persona *prior* (what this
-    person usually gives), then the *item evidence* (what the title and
-    description signal). The final rating is the prior adjusted by the
-    evidence, so a generous reviewer still rates a poor item low and a
-    critical reviewer still rates a strong item high.
-
- 4. **Self-reflection: critique and revise.** A critic LLM audits the draft
-    for rating–text consistency, voice match, and on-topic fit. If it objects,
-    the agent rewrites with that feedback and re-checks, for up to two cycles.
-    This act → critique → revise loop is what makes it an agent.
-
- 5. **Post-process.** The rating is clamped to the 1–5 range. An optional
-    Nigerian Pidgin rendering layer can restyle the review while preserving
-    meaning, sentiment, and rating.
-
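Steps 3 to 5 can be sketched in a few lines. This is a minimal illustration, not the project's code: `Draft`, `adjust_rating`, `clamp_rating`, `reflect`, and the additive form of the prior-plus-evidence adjustment are all assumptions made for the example, and the critic and reviser are passed in as plain callables so that no LLM client is needed.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_CYCLES = 2  # step 4 allows up to two critique-and-revise cycles


@dataclass
class Draft:
    rating: float  # unclamped rating from the generator
    review: str


def adjust_rating(prior: float, evidence_shift: float) -> float:
    """Step 3 (assumed additive form): persona prior shifted by item evidence."""
    return prior + evidence_shift


def clamp_rating(rating: float, low: int = 1, high: int = 5) -> int:
    """Step 5: round to a whole star and clamp to the valid range."""
    return max(low, min(high, round(rating)))


def reflect(
    draft: Draft,
    critic: Callable[[Draft], Optional[str]],
    revise: Callable[[Draft, str], Draft],
) -> Draft:
    """Step 4: revise the draft while the critic objects, at most MAX_CYCLES times."""
    for _ in range(MAX_CYCLES):
        objection = critic(draft)  # None means the critic is satisfied
        if objection is None:
            break
        draft = revise(draft, objection)
    return draft
```

Under these assumptions, a generous reviewer (prior 4.6) facing strongly negative item evidence (shift of -3.0) lands on a low rating after clamping, which is the behaviour step 3 describes.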
- ## Reliability
-
- - **Provider failover.** The agent runs a primary and a secondary LLM
-   provider. If the primary fails (quota, rate limit, or a transient service
-   error), the same call is retried automatically on the secondary, so a live
-   demo does not break when one provider is briefly unavailable.
- - **Graceful degradation.** If an LLM call fails, the agent falls back to a
-   deterministic persona rather than crashing.
-
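The two reliability behaviours above could look roughly like this. The sketch is an assumption about the shape of the logic, not the repository's implementation: `call_with_failover`, `persona_or_fallback`, and `AllProvidersFailed` are hypothetical names, and each provider is modelled as a plain function that takes a prompt and returns text.

```python
from typing import Callable, List, Sequence, Tuple

# (provider name, function that sends a prompt and returns the reply)
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider raised an error."""


def call_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider name, reply) from the first success."""
    errors: List[str] = []
    for name, send in providers:
        try:
            return name, send(prompt)
        except Exception as exc:  # quota, rate limit, transient service error
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def persona_or_fallback(build_with_llm: Callable[[], dict],
                        deterministic_persona: Callable[[], dict]) -> dict:
    """Graceful degradation: use the deterministic persona if the LLM path fails."""
    try:
        return build_with_llm()
    except Exception:
        return deterministic_persona()
```

Passing the providers as an ordered sequence keeps the primary/secondary choice in configuration rather than in the call site.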
- ## How it maps to the Task A rubric
-
- - **Review Text Quality**: reviews are grounded in the person's real past
-   reviews and self-critiqued for voice match.
- - **Rating Accuracy**: the two-step prior-plus-evidence rating logic
-   corrects the common failure of predicting from the user average alone.
- - **Behavioural Fidelity**: persona-conditioned generation; the persona
-   portrait is visible in the app for inspection.
- - **Nigerian contextualization (bonus)**: a toggleable Nigerian Pidgin
-   rendering layer; off by default so scored output stays standard English.
-
- ## Running locally
-
- ```bash
- pip install -r requirements.txt
- # set your keys in a .env file:
- # LLM_PROVIDER=openai
- # OPENAI_API_KEY=...
- # GEMINI_API_KEY=...
- streamlit run app.py
- ```
-
- `LLM_PROVIDER` sets the primary provider; the other provider, if its key is
- present, is used as the automatic failover. The processed data
- (`data/processed/*.parquet`) must be present.
-
- ## Project layout
-
- ```
- core/                    shared engine: config, llm, persona, reflection, nigerian
- task_a_user_modeling/    the User Modeling agent
- scripts/                 test harness (test_task_a.py)
- data/processed/          Amazon Reviews 2023: Books · Movies & TV · Kindle Store
- app.py                   Streamlit demo, three input modes
- ```
-
- ## Configuration
-
- Set in a `.env` file (never commit it):
-
- - `LLM_PROVIDER`: `openai` or `gemini` (the primary provider)
- - `OPENAI_API_KEY` / `GEMINI_API_KEY`: both should be set so the unused one
-   serves as the automatic failover
-
- On a HuggingFace Space, set these as **Secrets** in Space settings.
-
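One way to turn these variables into a provider order is sketched below. `provider_order` and `KEY_VARS` are hypothetical names, and defaulting to `openai` when `LLM_PROVIDER` is unset is an assumption of this example, not documented behaviour. The function reads from a mapping so it works with `os.environ` after a `.env` file has been loaded.

```python
import os
from typing import List, Mapping, Optional

# Environment variable that holds the key for each supported provider.
KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def provider_order(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return [primary, failover...] based on LLM_PROVIDER and the keys present."""
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", "openai").strip().lower()  # assumed default
    if primary not in KEY_VARS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(KEY_VARS)}")
    if not env.get(KEY_VARS[primary]):
        raise ValueError(f"{KEY_VARS[primary]} is not set for the primary provider")
    order = [primary]
    # Any other provider with a key present becomes the automatic failover.
    for name, key_var in KEY_VARS.items():
        if name != primary and env.get(key_var):
            order.append(name)
    return order
```

With both keys set and `LLM_PROVIDER=gemini`, this yields `["gemini", "openai"]`; with only the primary key set, there is no failover entry.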
- ## Notes and honest limitations
-
- - The self-reflection critic checks internal consistency; it cannot catch a
-   rating that is wrong but self-consistent.
- - Rating prediction on hard cases (a critical user who loved something) is
-   improved by the two-step logic but can still be ~0.5–1.0★ off.
- - LLM output is non-deterministic; single-run results vary, so evaluation
-   averages across many users.
-
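Averaging across users can be illustrated with a mean absolute error over (predicted, held-out) rating pairs. The README does not name the metric used, so treating it as MAE is an assumption, and `mean_absolute_error` is a hypothetical helper written for this example.

```python
from statistics import mean
from typing import Iterable, Tuple


def mean_absolute_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average |predicted - actual| star difference over many users."""
    errors = [abs(predicted - actual) for predicted, actual in pairs]
    if not errors:
        raise ValueError("need at least one (predicted, actual) pair")
    return mean(errors)
```

Because single runs vary, a figure like this is only meaningful when computed over many users rather than one prediction.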
- ## Credits
-
- Built for the DSN × BCT LLM Agent Challenge 2026.
- Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.