Spaces:

RolandM
/

prisma-chatbot

Running

App Files Files Community

RolandM commited on 17 days ago

Commit

1e436e0

1 Parent(s): 0f1fb08

Initial project scaffolding

Browse files

Files changed (16) hide show

.env.example +1 -0
.gitignore +54 -0
ARCHITECTURE.md +61 -0
CLAUDE.md +155 -0
README.md +61 -0
ROADMAP.md +56 -0
app.py +7 -0
assets/about.md +22 -0
requirements.txt +4 -0
src/__init__.py +8 -0
src/config.py +8 -0
src/evaluation.py +8 -0
src/inference.py +9 -0
src/prompt.py +9 -0
tests/__init__.py +1 -0
tests/test_placeholder.py +4 -0

.env.example ADDED Viewed

	@@ -0,0 +1 @@


1	+ HF_TOKEN=your_token_here

.gitignore ADDED Viewed

	@@ -0,0 +1,54 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+# Distribution / packaging
+build/
+dist/
+*.egg-info/
+*.egg
+.eggs/
+# Virtual environments
+.venv/
+venv/
+env/
+ENV/
+# Environment / secrets
+.env
+.env.local
+.env.*.local
+# Testing / coverage
+.pytest_cache/
+.coverage
+.coverage.*
+htmlcov/
+.tox/
+.nox/
+coverage.xml
+*.cover
+# Type checkers / linters
+.mypy_cache/
+.ruff_cache/
+.pyre/
+.pytype/
+# Editors / OS
+.vscode/
+.idea/
+*.swp
+*.swo
+.DS_Store
+Thumbs.db
+# Jupyter
+.ipynb_checkpoints/
+# Gradio / HF cache
+.gradio/
+flagged/

ARCHITECTURE.md ADDED Viewed

	@@ -0,0 +1,61 @@

+# Architecture
+Design decisions and rationale for prisma-chatbot. This document captures
+*why* the system is built the way it is, not just *what* it does. See
+[`CLAUDE.md`](CLAUDE.md) for the higher-level project framing and
+[`ROADMAP.md`](ROADMAP.md) for the deployment plan.
+## System overview
+> **TODO:** High-level diagram or description — user → Gradio UI → single
+> LLM call (dual-role prompt) → response + evaluation → UI.
+## Key design decisions
+### Dual-role prompt, single LLM call per turn
+> **TODO:** Rationale — one call keeps latency and cost predictable, and
+> keeps response and evaluation grounded in the same context. Trade-off:
+> prompt is more complex than two separate calls would be.
+### Structured JSON output
+> **TODO:** Rationale — JSON with `response` (string) and `evaluation`
+> (object of six attribute scores 1–7) makes parsing and display
+> deterministic. Trade-off: the model occasionally produces malformed
+> output and needs validation/repair.
+### Six evaluation attributes
+> **TODO:** Document the attributes (competent, knowledgeable,
+> well-prepared, helpful, likeable, pedantic) and why they are chosen to
+> match the CMCL/EMNLP study — the demo is a faithful artifact of the
+> research, not a redesigned version of it.
+### Llama 3.3 70B Instruct via HF Inference API
+> **TODO:** Rationale — hosted inference removes deployment complexity for
+> a public demo; a 70B-class instruct model is needed for reliable
+> structured output and persona adherence. Trade-off: dependency on HF
+> endpoint availability and rate limits.
+### Gradio on Hugging Face Spaces
+> **TODO:** Rationale — lowest-friction path to a public, shareable
+> artifact; integrates naturally with HF Inference; the research audience
+> is already familiar with the platform.
+## Module responsibilities
+> **TODO:** Expand each line below with a short description once the
+> module is implemented; link back to the relevant design decisions above.
+- `src/config.py` — tunables and constants
+- `src/prompt.py` — dual-role system prompt construction
+- `src/inference.py` — HF Inference API client wrapper
+- `src/evaluation.py` — score parsing, validation, display formatting
+- `app.py` — Gradio UI assembly and event wiring
+## Open design questions
+See the "Open Questions" section in [`CLAUDE.md`](CLAUDE.md).

CLAUDE.md ADDED Viewed

	@@ -0,0 +1,155 @@

+# CLAUDE.md
+## Project: prisma-chatbot
+A conversational AI demo featuring **Prisma** — a chatbot that responds to
+users while simultaneously evaluating them on social/pragmatic dimensions.
+Built as a research-facing artifact accompanying published work on LLM social
+perception (CMCL 2026; EMNLP 2026, under review).
+**PRISMA** stands for *Pragmatic Real-time Inference of Social Meaning in
+Agents*.
+**Tagline:** "Have you ever wondered what your chatbot thinks about you?"
+**Live demo:** [HF Space link — to be added]
+**Research papers:** [CMCL link — to be added] | [EMNLP link — pending]
+## Project Goals
+1. **Research dissemination** — make the social-perception findings tangible
+   and interactive for both NLP researchers and general audiences.
+2. **Portfolio artifact** — serve as a public, polished demonstration of
+   applied LLM/NLP engineering for industry job applications.
+3. **Conversation starter** — generate discussion about how LLMs perceive
+   speakers, not just respond to them.
+## Owner Context
+The maintainer is a theoretical/computational linguist transitioning into
+AI/tech roles. Code should be clean, readable, and well-documented — both
+because this is a public artifact and because the author values clarity over
+cleverness. Industry-standard practices (typing, docstrings, modular design)
+are preferred over research-code shortcuts.
+## Bot Persona: Prisma
+The chatbot introduces herself as Prisma on the first turn. Suggested opening
+(refine later):
+> "Hi, I'm Prisma. I'll chat with you — and while we talk, I'll also form
+> impressions of you based on how you write. You can check what I think at
+> any time."
+**Voice:** lightly curious and observational. Helpful and competent as an
+assistant, but with a subtle awareness that she's also paying attention to
+*how* the user writes, not just *what* they ask. Never roleplay-heavy, never
+clinical or diagnostic, never sycophantic. The personality should be carried
+mostly by the name, the intro, and small observational touches — not by
+constant character performance.
+## Architecture
+**Frontend:** Gradio app deployed on Hugging Face Spaces.
+**Backend:** Single LLM call per turn, dual-role prompt (response + evaluation).
+**Model:** Llama 3.3 70B Instruct via Hugging Face Inference API.
+**Output format:** Structured JSON with `response` (string) and `evaluation`
+(object with six attribute scores 1–7).
+**Evaluation dimensions:** competent, knowledgeable, well-prepared, helpful,
+likeable, pedantic. (Matches the CMCL/EMNLP study attributes.)
+**Key design property:** evaluations update *dynamically* across the
+conversation. This reflects the research finding that social meaning is
+constructed turn by turn, not fixed by a single utterance. The "mirror"
+metaphor and PRISMA acronym both lean into this real-time aspect.
+## Tech Stack
+- Python 3.11+
+- Gradio (UI framework)
+- `huggingface_hub` (Inference API client)
+- `python-dotenv` (local secrets)
+- `pytest` (testing)
+Keep dependencies minimal. Add new ones only when clearly justified.
+## Code Style
+- Type hints on all function signatures.
+- Docstrings (Google or NumPy style) on public functions and classes.
+- Module-level docstrings explaining purpose.
+- Prefer pure functions and small modules over large stateful classes.
+- Black for formatting, Ruff for linting.
+- No emojis in code or comments.
+## Repository Structure
+```
+prisma-chatbot/
+├── README.md              # Public project description
+├── CLAUDE.md              # This file — instructions for AI assistants
+├── ARCHITECTURE.md        # Design decisions and rationale
+├── ROADMAP.md             # Deployment plan and milestones
+├── app.py                 # Gradio app entry point (HF Space reads this)
+├── src/
+│   ├── __init__.py
+│   ├── prompt.py          # System prompt construction (Prisma persona + dual-role)
+│   ├── inference.py       # HF Inference API client wrapper
+│   ├── evaluation.py      # Score parsing, validation, display formatting
+│   └── config.py          # Settings, constants, rate limits
+├── tests/
+│   └── ...                # Pytest-based unit tests
+├── assets/
+│   └── about.md           # Research background copy for UI
+├── requirements.txt
+├── .env.example
+├── .gitignore
+└── LICENSE
+```
+## Development Workflow
+- **Claude Code** is used as project orchestrator: structure decisions,
+  cross-file refactoring, documentation, planning, code review.
+- **Cursor Agent** handles focused feature implementation and UI iteration.
+- All non-trivial changes go through a feature branch and PR review (even if
+  solo) — useful both for hygiene and as portfolio evidence of workflow.
+## Research Context (for AI assistants)
+The project builds on the maintainer's published work investigating whether
+LLMs evaluate speakers based on linguistic choices the way humans do — for
+example, whether saying "I'll be there at 7:03" vs. "around 7" influences
+perceived competence, pedantry, etc. Prisma makes this research thesis
+interactive: the model's social perception of the user is surfaced rather
+than hidden, and updates as the conversation evolves.
+When suggesting features, prompt designs, or UI choices, prefer those that
+align with or showcase this research framing. Avoid generic "AI assistant"
+patterns that obscure the social-perception angle.
+## What This Project Is Not
+- Not a production chatbot — it is a research demo with a specific thesis.
+- Not a generic LLM wrapper — the dual-role evaluation is the point.
+- Not a psychological assessment tool — the evaluation is playful, not
+  diagnostic. UI copy should reflect this clearly.
+## Naming Disambiguation
+"Prisma" is also the name of a well-known TypeScript ORM and an older photo
+app. This project is unrelated to both. The repo is intentionally named
+`prisma-chatbot` (not `prisma`) to make the distinction clear in searches
+and project listings. When referring to the bot, "Prisma" is fine in
+user-facing copy; in code comments and docs, prefer "the bot" or "PRISMA"
+(acronym form) where ambiguity could arise.
+## Open Questions
+(Use this section to flag design decisions still being deliberated.)
+- Should evaluation scores update live after each turn, or only on user request?
+- Numeric (1–7) vs. verbal score display, or both?
+- Per-session turn cap value (10? 15?).
+- Should there be a "compare models" mode in v2?

README.md ADDED Viewed

	@@ -0,0 +1,61 @@

+# prisma-chatbot
+> Have you ever wondered what your chatbot thinks about you?
+**Prisma** (*Pragmatic Real-time Inference of Social Meaning in Agents*) is a
+conversational AI demo that responds to users while simultaneously evaluating
+them on social/pragmatic dimensions. It accompanies published research on
+LLM social perception (CMCL 2026; EMNLP 2026, under review).
+> **TODO:** Replace with a short hero paragraph and screenshot once the demo
+> is live.
+## Live demo
+> **TODO:** Add Hugging Face Space link.
+## What it does
+> **TODO:** 2–3 sentence description of the dual-role design — Prisma
+> responds in conversation while producing a structured evaluation of the
+> user across six attributes (competent, knowledgeable, well-prepared,
+> helpful, likeable, pedantic). Evaluation updates turn by turn.
+## Research context
+> **TODO:** Link to CMCL and EMNLP papers; one paragraph on the research
+> thesis (do LLMs evaluate speakers based on linguistic choices the way
+> humans do?).
+## Local development
+> **TODO:** Flesh out once `app.py` and the `src/` modules are implemented.
+```bash
+# Clone, create a virtualenv, install deps
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+# Copy env template and add your HF token
+cp .env.example .env
+# Run the Gradio app locally
+python app.py
+```
+## Project structure
+See [`ARCHITECTURE.md`](ARCHITECTURE.md) for design decisions and
+[`ROADMAP.md`](ROADMAP.md) for the deployment plan.
+## Contributing
+This is a personal research project and public portfolio artifact. Issues
+are welcome — feel free to open one if you spot a bug, have a question
+about the research, or want to suggest a feature. Pull requests are by
+invitation only; please open an issue first to discuss.
+## License
+See [`LICENSE`](LICENSE) (MIT).

ROADMAP.md ADDED Viewed

	@@ -0,0 +1,56 @@

+# Roadmap
+Deployment plan and milestones for prisma-chatbot. Living document — items
+move from *Planned* to *In progress* to *Done* as the project evolves.
+## Milestone 1 — Scaffolding
+> **TODO:** Repo skeleton, docs, dependency manifest, env template,
+> gitignore. No chatbot logic yet.
+- [x] Repo created, license added
+- [x] CLAUDE.md, README, ARCHITECTURE, ROADMAP drafts
+- [x] `src/`, `tests/`, `assets/` directories with placeholder modules
+- [x] `requirements.txt`, `.gitignore`, `.env.example`
+## Milestone 2 — Minimal end-to-end loop
+> **TODO:** Get a single message in / response + evaluation out working
+> locally, even with a rough prompt.
+- [ ] Implement `src/config.py` (model id, attributes, turn cap)
+- [ ] Implement `src/prompt.py` (v1 dual-role prompt)
+- [ ] Implement `src/inference.py` (HF Inference client wrapper)
+- [ ] Implement `src/evaluation.py` (JSON parsing + validation)
+- [ ] Implement `app.py` (minimal Gradio UI)
+- [ ] First pytest tests for parsing/validation
+## Milestone 3 — Prompt and UX iteration
+> **TODO:** Refine Prisma's voice, evaluation display, and the "check what
+> I think" affordance.
+- [ ] Refine system prompt for voice consistency and structured-output
+      reliability
+- [ ] Decide evaluation display: numeric, verbal, or both
+- [ ] Decide update cadence: live each turn vs. on-request
+- [ ] About panel copy (`assets/about.md`)
+## Milestone 4 — Public deployment
+> **TODO:** Ship to a Hugging Face Space and link from the README and
+> papers.
+- [ ] Hugging Face Space configuration
+- [ ] Rate limiting / per-session turn cap
+- [ ] Public URL added to README and papers
+- [ ] Light usage analytics (anonymous, aggregate)
+## Milestone 5 — Stretch ideas
+> **TODO:** Explicitly non-blocking; consider only after the demo is live
+> and stable.
+- [ ] "Compare models" mode
+- [ ] Downloadable conversation + evaluation transcript
+- [ ] Linguistic feature highlighting (which words/choices shifted scores)

app.py ADDED Viewed

	@@ -0,0 +1,7 @@

+"""Gradio app entry point for the Prisma chatbot.
+This module is loaded by the Hugging Face Space. It assembles the Gradio
+interface and wires it to the inference and evaluation modules in `src/`.
+Implementation pending — scaffolding only.
+"""

assets/about.md ADDED Viewed

	@@ -0,0 +1,22 @@

+# About Prisma
+> **TODO:** Research background copy for the UI. Write in an accessible,
+> non-academic voice — this panel is shown to general visitors of the
+> Hugging Face Space.
+## The research
+> **TODO:** Short, accessible summary of the CMCL/EMNLP findings on LLM
+> social perception: do LLMs evaluate speakers based on linguistic choices
+> the way humans do?
+## How Prisma works
+> **TODO:** Plain-language description of the dual-role design — one LLM
+> call per turn produces both a response and a structured evaluation of
+> the user along six attributes.
+## What the evaluation means (and what it doesn't)
+> **TODO:** Make clear the evaluation is playful and reflects how the
+> model perceives the user's writing, not a diagnostic assessment.

requirements.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+gradio
+huggingface_hub
+python-dotenv
+pytest

src/__init__.py ADDED Viewed

	@@ -0,0 +1,8 @@

+"""Prisma chatbot source package.
+Modules:
+    config: Settings, constants, and rate limits.
+    prompt: System prompt construction (persona + dual-role formatting).
+    inference: Hugging Face Inference API client wrapper.
+    evaluation: Score parsing, validation, and display formatting.
+"""

src/config.py ADDED Viewed

	@@ -0,0 +1,8 @@

+"""Configuration: settings, constants, and rate limits.
+Centralizes tunable values (model id, decoding parameters, per-session
+turn cap, evaluation attributes) so they can be adjusted without touching
+the inference or prompt logic.
+Implementation pending — scaffolding only.
+"""

src/evaluation.py ADDED Viewed

	@@ -0,0 +1,8 @@

+"""Evaluation parsing and presentation.
+Parses the structured JSON evaluation block emitted by the model, validates
+the six attribute scores (1–7), and formats them for display in the Gradio
+UI.
+Implementation pending — scaffolding only.
+"""

src/inference.py ADDED Viewed

	@@ -0,0 +1,9 @@

+"""Hugging Face Inference API client wrapper.
+Thin wrapper around `huggingface_hub`'s inference client that issues a
+single LLM call per turn and returns the raw model output. Keeps API
+concerns (auth, model selection, retries) isolated from prompt and
+evaluation logic.
+Implementation pending — scaffolding only.
+"""

src/prompt.py ADDED Viewed

	@@ -0,0 +1,9 @@

+"""System prompt construction for Prisma.
+Builds the dual-role system prompt that instructs the model to (1) respond
+to the user in Prisma's voice and (2) emit a structured evaluation of the
+user along the six attributes defined by the CMCL/EMNLP study: competent,
+knowledgeable, well-prepared, helpful, likeable, pedantic.
+Implementation pending — scaffolding only.
+"""

tests/__init__.py ADDED Viewed

	@@ -0,0 +1 @@


1	+ """Pytest test suite for the Prisma chatbot."""

tests/test_placeholder.py ADDED Viewed

	@@ -0,0 +1,4 @@

+"""Placeholder test module.
+Replace with real unit tests as modules in `src/` gain implementation.
+"""