Spaces:
Running
Running
Initial project scaffolding
Browse files- .env.example +1 -0
- .gitignore +54 -0
- ARCHITECTURE.md +61 -0
- CLAUDE.md +155 -0
- README.md +61 -0
- ROADMAP.md +56 -0
- app.py +7 -0
- assets/about.md +22 -0
- requirements.txt +4 -0
- src/__init__.py +8 -0
- src/config.py +8 -0
- src/evaluation.py +8 -0
- src/inference.py +9 -0
- src/prompt.py +9 -0
- tests/__init__.py +1 -0
- tests/test_placeholder.py +4 -0
.env.example
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
HF_TOKEN=your_token_here
|
.gitignore
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Byte-compiled / optimized / DLL files
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.py[cod]
|
| 4 |
+
*$py.class
|
| 5 |
+
*.so
|
| 6 |
+
|
| 7 |
+
# Distribution / packaging
|
| 8 |
+
build/
|
| 9 |
+
dist/
|
| 10 |
+
*.egg-info/
|
| 11 |
+
*.egg
|
| 12 |
+
.eggs/
|
| 13 |
+
|
| 14 |
+
# Virtual environments
|
| 15 |
+
.venv/
|
| 16 |
+
venv/
|
| 17 |
+
env/
|
| 18 |
+
ENV/
|
| 19 |
+
|
| 20 |
+
# Environment / secrets
|
| 21 |
+
.env
|
| 22 |
+
.env.local
|
| 23 |
+
.env.*.local
|
| 24 |
+
|
| 25 |
+
# Testing / coverage
|
| 26 |
+
.pytest_cache/
|
| 27 |
+
.coverage
|
| 28 |
+
.coverage.*
|
| 29 |
+
htmlcov/
|
| 30 |
+
.tox/
|
| 31 |
+
.nox/
|
| 32 |
+
coverage.xml
|
| 33 |
+
*.cover
|
| 34 |
+
|
| 35 |
+
# Type checkers / linters
|
| 36 |
+
.mypy_cache/
|
| 37 |
+
.ruff_cache/
|
| 38 |
+
.pyre/
|
| 39 |
+
.pytype/
|
| 40 |
+
|
| 41 |
+
# Editors / OS
|
| 42 |
+
.vscode/
|
| 43 |
+
.idea/
|
| 44 |
+
*.swp
|
| 45 |
+
*.swo
|
| 46 |
+
.DS_Store
|
| 47 |
+
Thumbs.db
|
| 48 |
+
|
| 49 |
+
# Jupyter
|
| 50 |
+
.ipynb_checkpoints/
|
| 51 |
+
|
| 52 |
+
# Gradio / HF cache
|
| 53 |
+
.gradio/
|
| 54 |
+
flagged/
|
ARCHITECTURE.md
ADDED
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Architecture
|
| 2 |
+
|
| 3 |
+
Design decisions and rationale for prisma-chatbot. This document captures
|
| 4 |
+
*why* the system is built the way it is, not just *what* it does. See
|
| 5 |
+
[`CLAUDE.md`](CLAUDE.md) for the higher-level project framing and
|
| 6 |
+
[`ROADMAP.md`](ROADMAP.md) for the deployment plan.
|
| 7 |
+
|
| 8 |
+
## System overview
|
| 9 |
+
|
| 10 |
+
> **TODO:** High-level diagram or description — user → Gradio UI → single
|
| 11 |
+
> LLM call (dual-role prompt) → response + evaluation → UI.
|
| 12 |
+
|
| 13 |
+
## Key design decisions
|
| 14 |
+
|
| 15 |
+
### Dual-role prompt, single LLM call per turn
|
| 16 |
+
|
| 17 |
+
> **TODO:** Rationale — one call keeps latency and cost predictable, and
|
| 18 |
+
> keeps response and evaluation grounded in the same context. Trade-off:
|
| 19 |
+
> prompt is more complex than two separate calls would be.
|
| 20 |
+
|
| 21 |
+
### Structured JSON output
|
| 22 |
+
|
| 23 |
+
> **TODO:** Rationale — JSON with `response` (string) and `evaluation`
|
| 24 |
+
> (object of six attribute scores 1–7) makes parsing and display
|
| 25 |
+
> deterministic. Trade-off: the model occasionally produces malformed
|
| 26 |
+
> output and needs validation/repair.
|
| 27 |
+
|
| 28 |
+
### Six evaluation attributes
|
| 29 |
+
|
| 30 |
+
> **TODO:** Document the attributes (competent, knowledgeable,
|
| 31 |
+
> well-prepared, helpful, likeable, pedantic) and why they are chosen to
|
| 32 |
+
> match the CMCL/EMNLP study — the demo is a faithful artifact of the
|
| 33 |
+
> research, not a redesigned version of it.
|
| 34 |
+
|
| 35 |
+
### Llama 3.3 70B Instruct via HF Inference API
|
| 36 |
+
|
| 37 |
+
> **TODO:** Rationale — hosted inference removes deployment complexity for
|
| 38 |
+
> a public demo; a 70B-class instruct model is needed for reliable
|
| 39 |
+
> structured output and persona adherence. Trade-off: dependency on HF
|
| 40 |
+
> endpoint availability and rate limits.
|
| 41 |
+
|
| 42 |
+
### Gradio on Hugging Face Spaces
|
| 43 |
+
|
| 44 |
+
> **TODO:** Rationale — lowest-friction path to a public, shareable
|
| 45 |
+
> artifact; integrates naturally with HF Inference; the research audience
|
| 46 |
+
> is already familiar with the platform.
|
| 47 |
+
|
| 48 |
+
## Module responsibilities
|
| 49 |
+
|
| 50 |
+
> **TODO:** Expand each line below with a short description once the
|
| 51 |
+
> module is implemented; link back to the relevant design decisions above.
|
| 52 |
+
|
| 53 |
+
- `src/config.py` — tunables and constants
|
| 54 |
+
- `src/prompt.py` — dual-role system prompt construction
|
| 55 |
+
- `src/inference.py` — HF Inference API client wrapper
|
| 56 |
+
- `src/evaluation.py` — score parsing, validation, display formatting
|
| 57 |
+
- `app.py` — Gradio UI assembly and event wiring
|
| 58 |
+
|
| 59 |
+
## Open design questions
|
| 60 |
+
|
| 61 |
+
See the "Open Questions" section in [`CLAUDE.md`](CLAUDE.md).
|
CLAUDE.md
ADDED
|
@@ -0,0 +1,155 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# CLAUDE.md
|
| 2 |
+
|
| 3 |
+
## Project: prisma-chatbot
|
| 4 |
+
|
| 5 |
+
A conversational AI demo featuring **Prisma** — a chatbot that responds to
|
| 6 |
+
users while simultaneously evaluating them on social/pragmatic dimensions.
|
| 7 |
+
Built as a research-facing artifact accompanying published work on LLM social
|
| 8 |
+
perception (CMCL 2026; EMNLP 2026, under review).
|
| 9 |
+
|
| 10 |
+
**PRISMA** stands for *Pragmatic Real-time Inference of Social Meaning in
|
| 11 |
+
Agents*.
|
| 12 |
+
|
| 13 |
+
**Tagline:** "Have you ever wondered what your chatbot thinks about you?"
|
| 14 |
+
|
| 15 |
+
**Live demo:** [HF Space link — to be added]
|
| 16 |
+
**Research papers:** [CMCL link — to be added] | [EMNLP link — pending]
|
| 17 |
+
|
| 18 |
+
## Project Goals
|
| 19 |
+
|
| 20 |
+
1. **Research dissemination** — make the social-perception findings tangible
|
| 21 |
+
and interactive for both NLP researchers and general audiences.
|
| 22 |
+
2. **Portfolio artifact** — serve as a public, polished demonstration of
|
| 23 |
+
applied LLM/NLP engineering for industry job applications.
|
| 24 |
+
3. **Conversation starter** — generate discussion about how LLMs perceive
|
| 25 |
+
speakers, not just respond to them.
|
| 26 |
+
|
| 27 |
+
## Owner Context
|
| 28 |
+
|
| 29 |
+
The maintainer is a theoretical/computational linguist transitioning into
|
| 30 |
+
AI/tech roles. Code should be clean, readable, and well-documented — both
|
| 31 |
+
because this is a public artifact and because the author values clarity over
|
| 32 |
+
cleverness. Industry-standard practices (typing, docstrings, modular design)
|
| 33 |
+
are preferred over research-code shortcuts.
|
| 34 |
+
|
| 35 |
+
## Bot Persona: Prisma
|
| 36 |
+
|
| 37 |
+
The chatbot introduces herself as Prisma on the first turn. Suggested opening
|
| 38 |
+
(refine later):
|
| 39 |
+
|
| 40 |
+
> "Hi, I'm Prisma. I'll chat with you — and while we talk, I'll also form
|
| 41 |
+
> impressions of you based on how you write. You can check what I think at
|
| 42 |
+
> any time."
|
| 43 |
+
|
| 44 |
+
**Voice:** lightly curious and observational. Helpful and competent as an
|
| 45 |
+
assistant, but with a subtle awareness that she's also paying attention to
|
| 46 |
+
*how* the user writes, not just *what* they ask. Never roleplay-heavy, never
|
| 47 |
+
clinical or diagnostic, never sycophantic. The personality should be carried
|
| 48 |
+
mostly by the name, the intro, and small observational touches — not by
|
| 49 |
+
constant character performance.
|
| 50 |
+
|
| 51 |
+
## Architecture
|
| 52 |
+
|
| 53 |
+
**Frontend:** Gradio app deployed on Hugging Face Spaces.
|
| 54 |
+
**Backend:** Single LLM call per turn, dual-role prompt (response + evaluation).
|
| 55 |
+
**Model:** Llama 3.3 70B Instruct via Hugging Face Inference API.
|
| 56 |
+
**Output format:** Structured JSON with `response` (string) and `evaluation`
|
| 57 |
+
(object with six attribute scores 1–7).
|
| 58 |
+
|
| 59 |
+
**Evaluation dimensions:** competent, knowledgeable, well-prepared, helpful,
|
| 60 |
+
likeable, pedantic. (Matches the CMCL/EMNLP study attributes.)
|
| 61 |
+
|
| 62 |
+
**Key design property:** evaluations update *dynamically* across the
|
| 63 |
+
conversation. This reflects the research finding that social meaning is
|
| 64 |
+
constructed turn by turn, not fixed by a single utterance. The "mirror"
|
| 65 |
+
metaphor and PRISMA acronym both lean into this real-time aspect.
|
| 66 |
+
|
| 67 |
+
## Tech Stack
|
| 68 |
+
|
| 69 |
+
- Python 3.11+
|
| 70 |
+
- Gradio (UI framework)
|
| 71 |
+
- `huggingface_hub` (Inference API client)
|
| 72 |
+
- `python-dotenv` (local secrets)
|
| 73 |
+
- `pytest` (testing)
|
| 74 |
+
|
| 75 |
+
Keep dependencies minimal. Add new ones only when clearly justified.
|
| 76 |
+
|
| 77 |
+
## Code Style
|
| 78 |
+
|
| 79 |
+
- Type hints on all function signatures.
|
| 80 |
+
- Docstrings (Google or NumPy style) on public functions and classes.
|
| 81 |
+
- Module-level docstrings explaining purpose.
|
| 82 |
+
- Prefer pure functions and small modules over large stateful classes.
|
| 83 |
+
- Black for formatting, Ruff for linting.
|
| 84 |
+
- No emojis in code or comments.
|
| 85 |
+
|
| 86 |
+
## Repository Structure
|
| 87 |
+
|
| 88 |
+
```
|
| 89 |
+
prisma-chatbot/
|
| 90 |
+
├── README.md # Public project description
|
| 91 |
+
├── CLAUDE.md # This file — instructions for AI assistants
|
| 92 |
+
├── ARCHITECTURE.md # Design decisions and rationale
|
| 93 |
+
├── ROADMAP.md # Deployment plan and milestones
|
| 94 |
+
├── app.py # Gradio app entry point (HF Space reads this)
|
| 95 |
+
├── src/
|
| 96 |
+
│ ├── __init__.py
|
| 97 |
+
│ ├── prompt.py # System prompt construction (Prisma persona + dual-role)
|
| 98 |
+
│ ├── inference.py # HF Inference API client wrapper
|
| 99 |
+
│ ├── evaluation.py # Score parsing, validation, display formatting
|
| 100 |
+
│ └── config.py # Settings, constants, rate limits
|
| 101 |
+
├── tests/
|
| 102 |
+
│ └── ... # Pytest-based unit tests
|
| 103 |
+
├── assets/
|
| 104 |
+
│ └── about.md # Research background copy for UI
|
| 105 |
+
├── requirements.txt
|
| 106 |
+
├── .env.example
|
| 107 |
+
├── .gitignore
|
| 108 |
+
└── LICENSE
|
| 109 |
+
```
|
| 110 |
+
|
| 111 |
+
## Development Workflow
|
| 112 |
+
|
| 113 |
+
- **Claude Code** is used as project orchestrator: structure decisions,
|
| 114 |
+
cross-file refactoring, documentation, planning, code review.
|
| 115 |
+
- **Cursor Agent** handles focused feature implementation and UI iteration.
|
| 116 |
+
- All non-trivial changes go through a feature branch and PR review (even if
|
| 117 |
+
solo) — useful both for hygiene and as portfolio evidence of workflow.
|
| 118 |
+
|
| 119 |
+
## Research Context (for AI assistants)
|
| 120 |
+
|
| 121 |
+
The project builds on the maintainer's published work investigating whether
|
| 122 |
+
LLMs evaluate speakers based on linguistic choices the way humans do — for
|
| 123 |
+
example, whether saying "I'll be there at 7:03" vs. "around 7" influences
|
| 124 |
+
perceived competence, pedantry, etc. Prisma makes this research thesis
|
| 125 |
+
interactive: the model's social perception of the user is surfaced rather
|
| 126 |
+
than hidden, and updates as the conversation evolves.
|
| 127 |
+
|
| 128 |
+
When suggesting features, prompt designs, or UI choices, prefer those that
|
| 129 |
+
align with or showcase this research framing. Avoid generic "AI assistant"
|
| 130 |
+
patterns that obscure the social-perception angle.
|
| 131 |
+
|
| 132 |
+
## What This Project Is Not
|
| 133 |
+
|
| 134 |
+
- Not a production chatbot — it is a research demo with a specific thesis.
|
| 135 |
+
- Not a generic LLM wrapper — the dual-role evaluation is the point.
|
| 136 |
+
- Not a psychological assessment tool — the evaluation is playful, not
|
| 137 |
+
diagnostic. UI copy should reflect this clearly.
|
| 138 |
+
|
| 139 |
+
## Naming Disambiguation
|
| 140 |
+
|
| 141 |
+
"Prisma" is also the name of a well-known TypeScript ORM and an older photo
|
| 142 |
+
app. This project is unrelated to both. The repo is intentionally named
|
| 143 |
+
`prisma-chatbot` (not `prisma`) to make the distinction clear in searches
|
| 144 |
+
and project listings. When referring to the bot, "Prisma" is fine in
|
| 145 |
+
user-facing copy; in code comments and docs, prefer "the bot" or "PRISMA"
|
| 146 |
+
(acronym form) where ambiguity could arise.
|
| 147 |
+
|
| 148 |
+
## Open Questions
|
| 149 |
+
|
| 150 |
+
(Use this section to flag design decisions still being deliberated.)
|
| 151 |
+
|
| 152 |
+
- Should evaluation scores update live after each turn, or only on user request?
|
| 153 |
+
- Numeric (1–7) vs. verbal score display, or both?
|
| 154 |
+
- Per-session turn cap value (10? 15?).
|
| 155 |
+
- Should there be a "compare models" mode in v2?
|
README.md
ADDED
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# prisma-chatbot
|
| 2 |
+
|
| 3 |
+
> Have you ever wondered what your chatbot thinks about you?
|
| 4 |
+
|
| 5 |
+
**Prisma** (*Pragmatic Real-time Inference of Social Meaning in Agents*) is a
|
| 6 |
+
conversational AI demo that responds to users while simultaneously evaluating
|
| 7 |
+
them on social/pragmatic dimensions. It accompanies published research on
|
| 8 |
+
LLM social perception (CMCL 2026; EMNLP 2026, under review).
|
| 9 |
+
|
| 10 |
+
> **TODO:** Replace with a short hero paragraph and screenshot once the demo
|
| 11 |
+
> is live.
|
| 12 |
+
|
| 13 |
+
## Live demo
|
| 14 |
+
|
| 15 |
+
> **TODO:** Add Hugging Face Space link.
|
| 16 |
+
|
| 17 |
+
## What it does
|
| 18 |
+
|
| 19 |
+
> **TODO:** 2–3 sentence description of the dual-role design — Prisma
|
| 20 |
+
> responds in conversation while producing a structured evaluation of the
|
| 21 |
+
> user across six attributes (competent, knowledgeable, well-prepared,
|
| 22 |
+
> helpful, likeable, pedantic). Evaluation updates turn by turn.
|
| 23 |
+
|
| 24 |
+
## Research context
|
| 25 |
+
|
| 26 |
+
> **TODO:** Link to CMCL and EMNLP papers; one paragraph on the research
|
| 27 |
+
> thesis (do LLMs evaluate speakers based on linguistic choices the way
|
| 28 |
+
> humans do?).
|
| 29 |
+
|
| 30 |
+
## Local development
|
| 31 |
+
|
| 32 |
+
> **TODO:** Flesh out once `app.py` and the `src/` modules are implemented.
|
| 33 |
+
|
| 34 |
+
```bash
|
| 35 |
+
# Clone, create a virtualenv, install deps
|
| 36 |
+
python -m venv .venv
|
| 37 |
+
source .venv/bin/activate
|
| 38 |
+
pip install -r requirements.txt
|
| 39 |
+
|
| 40 |
+
# Copy env template and add your HF token
|
| 41 |
+
cp .env.example .env
|
| 42 |
+
|
| 43 |
+
# Run the Gradio app locally
|
| 44 |
+
python app.py
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
## Project structure
|
| 48 |
+
|
| 49 |
+
See [`ARCHITECTURE.md`](ARCHITECTURE.md) for design decisions and
|
| 50 |
+
[`ROADMAP.md`](ROADMAP.md) for the deployment plan.
|
| 51 |
+
|
| 52 |
+
## Contributing
|
| 53 |
+
|
| 54 |
+
This is a personal research project and public portfolio artifact. Issues
|
| 55 |
+
are welcome — feel free to open one if you spot a bug, have a question
|
| 56 |
+
about the research, or want to suggest a feature. Pull requests are by
|
| 57 |
+
invitation only; please open an issue first to discuss.
|
| 58 |
+
|
| 59 |
+
## License
|
| 60 |
+
|
| 61 |
+
See [`LICENSE`](LICENSE) (MIT).
|
ROADMAP.md
ADDED
|
@@ -0,0 +1,56 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Roadmap
|
| 2 |
+
|
| 3 |
+
Deployment plan and milestones for prisma-chatbot. Living document — items
|
| 4 |
+
move from *Planned* to *In progress* to *Done* as the project evolves.
|
| 5 |
+
|
| 6 |
+
## Milestone 1 — Scaffolding
|
| 7 |
+
|
| 8 |
+
> **TODO:** Repo skeleton, docs, dependency manifest, env template,
|
| 9 |
+
> gitignore. No chatbot logic yet.
|
| 10 |
+
|
| 11 |
+
- [x] Repo created, license added
|
| 12 |
+
- [x] CLAUDE.md, README, ARCHITECTURE, ROADMAP drafts
|
| 13 |
+
- [x] `src/`, `tests/`, `assets/` directories with placeholder modules
|
| 14 |
+
- [x] `requirements.txt`, `.gitignore`, `.env.example`
|
| 15 |
+
|
| 16 |
+
## Milestone 2 — Minimal end-to-end loop
|
| 17 |
+
|
| 18 |
+
> **TODO:** Get a single message in / response + evaluation out working
|
| 19 |
+
> locally, even with a rough prompt.
|
| 20 |
+
|
| 21 |
+
- [ ] Implement `src/config.py` (model id, attributes, turn cap)
|
| 22 |
+
- [ ] Implement `src/prompt.py` (v1 dual-role prompt)
|
| 23 |
+
- [ ] Implement `src/inference.py` (HF Inference client wrapper)
|
| 24 |
+
- [ ] Implement `src/evaluation.py` (JSON parsing + validation)
|
| 25 |
+
- [ ] Implement `app.py` (minimal Gradio UI)
|
| 26 |
+
- [ ] First pytest tests for parsing/validation
|
| 27 |
+
|
| 28 |
+
## Milestone 3 — Prompt and UX iteration
|
| 29 |
+
|
| 30 |
+
> **TODO:** Refine Prisma's voice, evaluation display, and the "check what
|
| 31 |
+
> I think" affordance.
|
| 32 |
+
|
| 33 |
+
- [ ] Refine system prompt for voice consistency and structured-output
|
| 34 |
+
reliability
|
| 35 |
+
- [ ] Decide evaluation display: numeric, verbal, or both
|
| 36 |
+
- [ ] Decide update cadence: live each turn vs. on-request
|
| 37 |
+
- [ ] About panel copy (`assets/about.md`)
|
| 38 |
+
|
| 39 |
+
## Milestone 4 — Public deployment
|
| 40 |
+
|
| 41 |
+
> **TODO:** Ship to a Hugging Face Space and link from the README and
|
| 42 |
+
> papers.
|
| 43 |
+
|
| 44 |
+
- [ ] Hugging Face Space configuration
|
| 45 |
+
- [ ] Rate limiting / per-session turn cap
|
| 46 |
+
- [ ] Public URL added to README and papers
|
| 47 |
+
- [ ] Light usage analytics (anonymous, aggregate)
|
| 48 |
+
|
| 49 |
+
## Milestone 5 — Stretch ideas
|
| 50 |
+
|
| 51 |
+
> **TODO:** Explicitly non-blocking; consider only after the demo is live
|
| 52 |
+
> and stable.
|
| 53 |
+
|
| 54 |
+
- [ ] "Compare models" mode
|
| 55 |
+
- [ ] Downloadable conversation + evaluation transcript
|
| 56 |
+
- [ ] Linguistic feature highlighting (which words/choices shifted scores)
|
app.py
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Gradio app entry point for the Prisma chatbot.
|
| 2 |
+
|
| 3 |
+
This module is loaded by the Hugging Face Space. It assembles the Gradio
|
| 4 |
+
interface and wires it to the inference and evaluation modules in `src/`.
|
| 5 |
+
|
| 6 |
+
Implementation pending — scaffolding only.
|
| 7 |
+
"""
|
assets/about.md
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# About Prisma
|
| 2 |
+
|
| 3 |
+
> **TODO:** Research background copy for the UI. Write in an accessible,
|
| 4 |
+
> non-academic voice — this panel is shown to general visitors of the
|
| 5 |
+
> Hugging Face Space.
|
| 6 |
+
|
| 7 |
+
## The research
|
| 8 |
+
|
| 9 |
+
> **TODO:** Short, accessible summary of the CMCL/EMNLP findings on LLM
|
| 10 |
+
> social perception: do LLMs evaluate speakers based on linguistic choices
|
| 11 |
+
> the way humans do?
|
| 12 |
+
|
| 13 |
+
## How Prisma works
|
| 14 |
+
|
| 15 |
+
> **TODO:** Plain-language description of the dual-role design — one LLM
|
| 16 |
+
> call per turn produces both a response and a structured evaluation of
|
| 17 |
+
> the user along six attributes.
|
| 18 |
+
|
| 19 |
+
## What the evaluation means (and what it doesn't)
|
| 20 |
+
|
| 21 |
+
> **TODO:** Make clear the evaluation is playful and reflects how the
|
| 22 |
+
> model perceives the user's writing, not a diagnostic assessment.
|
requirements.txt
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
gradio
|
| 2 |
+
huggingface_hub
|
| 3 |
+
python-dotenv
|
| 4 |
+
pytest
|
src/__init__.py
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Prisma chatbot source package.
|
| 2 |
+
|
| 3 |
+
Modules:
|
| 4 |
+
config: Settings, constants, and rate limits.
|
| 5 |
+
prompt: System prompt construction (persona + dual-role formatting).
|
| 6 |
+
inference: Hugging Face Inference API client wrapper.
|
| 7 |
+
evaluation: Score parsing, validation, and display formatting.
|
| 8 |
+
"""
|
src/config.py
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Configuration: settings, constants, and rate limits.
|
| 2 |
+
|
| 3 |
+
Centralizes tunable values (model id, decoding parameters, per-session
|
| 4 |
+
turn cap, evaluation attributes) so they can be adjusted without touching
|
| 5 |
+
the inference or prompt logic.
|
| 6 |
+
|
| 7 |
+
Implementation pending — scaffolding only.
|
| 8 |
+
"""
|
src/evaluation.py
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Evaluation parsing and presentation.
|
| 2 |
+
|
| 3 |
+
Parses the structured JSON evaluation block emitted by the model, validates
|
| 4 |
+
the six attribute scores (1–7), and formats them for display in the Gradio
|
| 5 |
+
UI.
|
| 6 |
+
|
| 7 |
+
Implementation pending — scaffolding only.
|
| 8 |
+
"""
|
src/inference.py
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Hugging Face Inference API client wrapper.
|
| 2 |
+
|
| 3 |
+
Thin wrapper around `huggingface_hub`'s inference client that issues a
|
| 4 |
+
single LLM call per turn and returns the raw model output. Keeps API
|
| 5 |
+
concerns (auth, model selection, retries) isolated from prompt and
|
| 6 |
+
evaluation logic.
|
| 7 |
+
|
| 8 |
+
Implementation pending — scaffolding only.
|
| 9 |
+
"""
|
src/prompt.py
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""System prompt construction for Prisma.
|
| 2 |
+
|
| 3 |
+
Builds the dual-role system prompt that instructs the model to (1) respond
|
| 4 |
+
to the user in Prisma's voice and (2) emit a structured evaluation of the
|
| 5 |
+
user along the six attributes defined by the CMCL/EMNLP study: competent,
|
| 6 |
+
knowledgeable, well-prepared, helpful, likeable, pedantic.
|
| 7 |
+
|
| 8 |
+
Implementation pending — scaffolding only.
|
| 9 |
+
"""
|
tests/__init__.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Pytest test suite for the Prisma chatbot."""
|
tests/test_placeholder.py
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Placeholder test module.
|
| 2 |
+
|
| 3 |
+
Replace with real unit tests as modules in `src/` gain implementation.
|
| 4 |
+
"""
|