Spaces: Israelbliz/User-Modeling-Agent

Israelbliz committed 10 days ago

Commit a7a3666 (verified) · 1 Parent(s): ba4acf2

Delete README.md


Files changed (1)

README.md +0 -141

README.md DELETED

@@ -1,141 +0,0 @@
----
-title: User Modeling Agent
-emoji: 📝
-colorFrom: green
-colorTo: red
-sdk: docker
-app_port: 7860
-pinned: false
----
-# User Modeling Agent
-**DSN × BCT LLM Agent Challenge 2026 — Task A.**
-An agent that reads a person into a behavioural *persona*, then writes the
-star rating and the review that person would leave for an unseen product —
-and critiques and revises its own draft before returning it.
-> Live demo: *(your HuggingFace Space URL)*
-> Code: *(your HuggingFace Space URL)*
----
-## What it does
-Given a **person** and **product details**, the agent produces:
-- a **star rating** (1–5) the person would likely give, and
-- a **written review** in that person's voice — tone, length, and quirks matched.
-It is not a generic review generator. Every output is conditioned on a
-specific person, and the rating is reasoned, not guessed.
-## Three input modes
-The same persona engine is fed by three input modes:
-- **Compose a persona** — describe the person's reviewing voice in free text.
-- **Dataset reader** — a real user from the data; the agent is scored against
-  a genuinely held-out review.
-- **Build from past reviews** — paste a few of the person's actual past
-  reviews, and the agent builds the persona from them.
-## The agentic workflow
-The system is an agent, not a single prompt. It runs a five-step loop:
-1. **Build the persona.** A `PersonaEngine` extracts a structured persona —
-   quantitative signals (average rating, rating spread, review length,
-   domains, rating distribution) and a qualitative voice (tone, preferred
-   themes, common complaints, a one-line voice descriptor) distilled by an
-   LLM from sample reviews, with a deterministic fallback if that call fails.
-2. **Select grounding history.** For a real person, the agent picks the few
-   past reviews most similar to the target item, so it writes from concrete
-   evidence of how this person actually phrases things.
-3. **Generate the rating and review.** A single LLM call, with the rating
-   reasoned in two explicit steps — first the persona *prior* (what this
-   person usually gives), then the *item evidence* (what the title and
-   description signal). The final rating is the prior adjusted by the
-   evidence, so a generous reviewer still rates a poor item low and a
-   critical reviewer still rates a strong item high.
-4. **Self-reflection — critique and revise.** A critic LLM audits the draft
-   for rating–text consistency, voice match, and on-topic fit. If it objects,
-   the agent rewrites with that feedback and re-checks — up to two cycles.
-   This act → critique → revise loop is what makes it an agent.
-5. **Post-process.** The rating is clamped to range. An optional Nigerian
-   Pidgin rendering layer can restyle the review while preserving meaning,
-   sentiment, and rating.
-## Reliability
-- **Provider failover.** The agent runs a primary and a secondary LLM
-  provider. If the primary fails — quota, rate limit or a transient service
-  error — the same call is retried automatically on the secondary, so a live
-  demo does not break when one provider is briefly unavailable.
-- **Graceful degradation.** If the persona-extraction LLM call fails, the
-  agent falls back to a deterministic persona rather than crashing.
-## How it maps to the Task A rubric
-- **Review Text Quality** — reviews are grounded in the person's real past
-  reviews and self-critiqued for voice match.
-- **Rating Accuracy** — the two-step prior-plus-evidence rating logic
-  corrects the common failure of predicting from the user average alone.
-- **Behavioural Fidelity** — persona-conditioned generation; the persona
-  portrait is visible in the app for inspection.
-- **Nigerian contextualization (bonus)** — a toggleable Nigerian Pidgin
-  rendering layer; off by default so scored output stays standard English.
-## Running locally
-```bash
-pip install -r requirements.txt
-# set your keys in a .env file:
-#   LLM_PROVIDER=openai
-#   OPENAI_API_KEY=...
-#   GEMINI_API_KEY=...
-streamlit run app.py
-```
-`LLM_PROVIDER` sets the primary provider; the other provider, if its key is
-present, is used as the automatic failover. The processed data
-(`data/processed/*.parquet`) must be present.
-## Project layout
-```
-core/                 shared engine — config, llm, persona, reflection, nigerian
-task_a_user_modeling/ the User Modeling agent
-scripts/              test harness (test_task_a.py)
-data/processed/       Amazon Reviews 2023 — Books · Movies & TV · Kindle Store
-app.py                Streamlit demo — three input modes
-```
-## Configuration
-Set in a `.env` file (never commit it):
-- `LLM_PROVIDER` — `openai` or `gemini` (the primary provider)
-- `OPENAI_API_KEY` / `GEMINI_API_KEY` — both should be set so the unused one
-  serves as the automatic failover
-On a HuggingFace Space, set these as **Secrets** in Space settings.
-## Notes and honest limitations
-- The self-reflection critic checks internal consistency; it cannot catch a
-  rating that is wrong but self-consistent.
-- Rating prediction on hard cases (a critical user who loved something) is
-  improved by the two-step logic but can still be ~0.5–1.0★ off.
-- LLM output is non-deterministic; single-run results vary, so evaluation
-  averages across many users.
-## Credits
-Built for the DSN × BCT LLM Agent Challenge 2026.
-Author: Israel Akomodesegbe. Team: Winning Team. Dataset: Amazon Reviews 2023.
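
For readers of this diff, the control flow the deleted README describes (provider failover, the critique-and-revise loop capped at two cycles, and clamping the final rating) can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's code: `call_with_failover`, `clamp_rating`, `reflect_and_revise`, and the `Draft`, `Generate`, and `Critique` aliases are all hypothetical names, and the LLM calls are represented by plain callables. It also assumes one reading of "up to two cycles": at most two revisions, with no extra critic pass after the second.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical aliases; the repository's real interfaces are not shown in this diff.
Draft = Tuple[float, str]                    # (rating, review text)
Generate = Callable[[Optional[str]], Draft]  # takes critic feedback (or None), returns a draft
Critique = Callable[[Draft], Optional[str]]  # returns an objection, or None to accept


def call_with_failover(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # e.g. quota, rate limit, transient service error
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def clamp_rating(rating: float, low: float = 1.0, high: float = 5.0) -> float:
    """Force a rating into the valid star range."""
    return max(low, min(high, rating))


def reflect_and_revise(generate: Generate, critique: Critique, max_cycles: int = 2) -> Draft:
    """Draft, then critique and rewrite until the critic accepts or cycles run out."""
    draft = generate(None)  # first draft, no feedback yet
    for _ in range(max_cycles):
        feedback = critique(draft)
        if feedback is None:  # critic has no objection
            break
        draft = generate(feedback)  # rewrite using the critic's feedback
    rating, text = draft
    return clamp_rating(rating), text
```

In a real agent, `generate` and `critique` would each wrap an LLM request routed through something like `call_with_failover`, with the primary provider listed first. Keeping the loop independent of any provider makes it testable with stub callables.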