Commit 1e436e0 by RolandM · 1 Parent(s): 0f1fb08

Initial project scaffolding

.env.example ADDED
@@ -0,0 +1 @@
+ HF_TOKEN=your_token_here
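A minimal sketch of how the token from this template might be read at startup. It assumes the app reads `HF_TOKEN` from the process environment. The helper name `load_hf_token` is hypothetical and not part of this commit, and the `python-dotenv` import is treated as optional so the sketch also runs where that package is not installed.

```python
from __future__ import annotations

import os


def load_hf_token(env: dict[str, str] | None = None) -> str:
    """Return the Hugging Face token, failing loudly if it is unset or still the placeholder.

    `env` is a hypothetical hook for tests; by default the process environment is used.
    """
    if env is None:
        try:
            # Provided by python-dotenv (listed in requirements.txt), if installed.
            from dotenv import load_dotenv

            load_dotenv()  # copies values from a local .env file into os.environ
        except ImportError:
            pass
        env = dict(os.environ)

    token = env.get("HF_TOKEN", "").strip()
    if not token or token == "your_token_here":
        raise RuntimeError("HF_TOKEN is not set; copy .env.example to .env and fill it in.")
    return token
```

Rejecting the literal placeholder value catches the case where `.env.example` was copied but never edited.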
.gitignore ADDED
@@ -0,0 +1,54 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+
+ # Distribution / packaging
+ build/
+ dist/
+ *.egg-info/
+ *.egg
+ .eggs/
+
+ # Virtual environments
+ .venv/
+ venv/
+ env/
+ ENV/
+
+ # Environment / secrets
+ .env
+ .env.local
+ .env.*.local
+
+ # Testing / coverage
+ .pytest_cache/
+ .coverage
+ .coverage.*
+ htmlcov/
+ .tox/
+ .nox/
+ coverage.xml
+ *.cover
+
+ # Type checkers / linters
+ .mypy_cache/
+ .ruff_cache/
+ .pyre/
+ .pytype/
+
+ # Editors / OS
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ .DS_Store
+ Thumbs.db
+
+ # Jupyter
+ .ipynb_checkpoints/
+
+ # Gradio / HF cache
+ .gradio/
+ flagged/
ARCHITECTURE.md ADDED
@@ -0,0 +1,61 @@
+ # Architecture
+
+ Design decisions and rationale for prisma-chatbot. This document captures
+ *why* the system is built the way it is, not just *what* it does. See
+ [`CLAUDE.md`](CLAUDE.md) for the higher-level project framing and
+ [`ROADMAP.md`](ROADMAP.md) for the deployment plan.
+
+ ## System overview
+
+ > **TODO:** High-level diagram or description — user → Gradio UI → single
+ > LLM call (dual-role prompt) → response + evaluation → UI.
+
+ ## Key design decisions
+
+ ### Dual-role prompt, single LLM call per turn
+
+ > **TODO:** Rationale — one call keeps latency and cost predictable, and
+ > keeps response and evaluation grounded in the same context. Trade-off:
+ > prompt is more complex than two separate calls would be.
+
+ ### Structured JSON output
+
+ > **TODO:** Rationale — JSON with `response` (string) and `evaluation`
+ > (object of six attribute scores 1–7) makes parsing and display
+ > deterministic. Trade-off: the model occasionally produces malformed
+ > output and needs validation/repair.
+
+ ### Six evaluation attributes
+
+ > **TODO:** Document the attributes (competent, knowledgeable,
+ > well-prepared, helpful, likeable, pedantic) and why they are chosen to
+ > match the CMCL/EMNLP study — the demo is a faithful artifact of the
+ > research, not a redesigned version of it.
+
+ ### Llama 3.3 70B Instruct via HF Inference API
+
+ > **TODO:** Rationale — hosted inference removes deployment complexity for
+ > a public demo; a 70B-class instruct model is needed for reliable
+ > structured output and persona adherence. Trade-off: dependency on HF
+ > endpoint availability and rate limits.
+
+ ### Gradio on Hugging Face Spaces
+
+ > **TODO:** Rationale — lowest-friction path to a public, shareable
+ > artifact; integrates naturally with HF Inference; the research audience
+ > is already familiar with the platform.
+
+ ## Module responsibilities
+
+ > **TODO:** Expand each line below with a short description once the
+ > module is implemented; link back to the relevant design decisions above.
+
+ - `src/config.py` — tunables and constants
+ - `src/prompt.py` — dual-role system prompt construction
+ - `src/inference.py` — HF Inference API client wrapper
+ - `src/evaluation.py` — score parsing, validation, display formatting
+ - `app.py` — Gradio UI assembly and event wiring
+
+ ## Open design questions
+
+ See the "Open Questions" section in [`CLAUDE.md`](CLAUDE.md).
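The "Structured JSON output" decision in ARCHITECTURE.md notes that the model occasionally produces malformed output that needs repair. One possible repair step, sketched here under the assumption that the usual failure is valid JSON wrapped in extra prose or a Markdown fence, is to scan for the first decodable JSON object. The function name `extract_json_object` is hypothetical; the commit does not define a repair strategy.

```python
from __future__ import annotations

import json


def extract_json_object(raw: str) -> dict | None:
    """Best effort: return the first decodable JSON object embedded in `raw`, or None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            candidate, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict):
            return candidate
        # Not decodable from this brace; try the next opening brace.
        start = raw.find("{", start + 1)
    return None
```

This only recovers output whose JSON is intact but surrounded by noise. Truncated or structurally broken JSON still yields `None`, so the caller would need a fallback such as retrying the call.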
CLAUDE.md ADDED
@@ -0,0 +1,155 @@
+ # CLAUDE.md
+
+ ## Project: prisma-chatbot
+
+ A conversational AI demo featuring **Prisma** — a chatbot that responds to
+ users while simultaneously evaluating them on social/pragmatic dimensions.
+ Built as a research-facing artifact accompanying published work on LLM social
+ perception (CMCL 2026; EMNLP 2026, under review).
+
+ **PRISMA** stands for *Pragmatic Real-time Inference of Social Meaning in
+ Agents*.
+
+ **Tagline:** "Have you ever wondered what your chatbot thinks about you?"
+
+ **Live demo:** [HF Space link — to be added]
+ **Research papers:** [CMCL link — to be added] | [EMNLP link — pending]
+
+ ## Project Goals
+
+ 1. **Research dissemination** — make the social-perception findings tangible
+    and interactive for both NLP researchers and general audiences.
+ 2. **Portfolio artifact** — serve as a public, polished demonstration of
+    applied LLM/NLP engineering for industry job applications.
+ 3. **Conversation starter** — generate discussion about how LLMs perceive
+    speakers, not just respond to them.
+
+ ## Owner Context
+
+ The maintainer is a theoretical/computational linguist transitioning into
+ AI/tech roles. Code should be clean, readable, and well-documented — both
+ because this is a public artifact and because the author values clarity over
+ cleverness. Industry-standard practices (typing, docstrings, modular design)
+ are preferred over research-code shortcuts.
+
+ ## Bot Persona: Prisma
+
+ The chatbot introduces herself as Prisma on the first turn. Suggested opening
+ (refine later):
+
+ > "Hi, I'm Prisma. I'll chat with you — and while we talk, I'll also form
+ > impressions of you based on how you write. You can check what I think at
+ > any time."
+
+ **Voice:** lightly curious and observational. Helpful and competent as an
+ assistant, but with a subtle awareness that she's also paying attention to
+ *how* the user writes, not just *what* they ask. Never roleplay-heavy, never
+ clinical or diagnostic, never sycophantic. The personality should be carried
+ mostly by the name, the intro, and small observational touches — not by
+ constant character performance.
+
+ ## Architecture
+
+ **Frontend:** Gradio app deployed on Hugging Face Spaces.
+ **Backend:** Single LLM call per turn, dual-role prompt (response + evaluation).
+ **Model:** Llama 3.3 70B Instruct via Hugging Face Inference API.
+ **Output format:** Structured JSON with `response` (string) and `evaluation`
+ (object with six attribute scores 1–7).
+
+ **Evaluation dimensions:** competent, knowledgeable, well-prepared, helpful,
+ likeable, pedantic. (Matches the CMCL/EMNLP study attributes.)
+
+ **Key design property:** evaluations update *dynamically* across the
+ conversation. This reflects the research finding that social meaning is
+ constructed turn by turn, not fixed by a single utterance. The "mirror"
+ metaphor and PRISMA acronym both lean into this real-time aspect.
+
+ ## Tech Stack
+
+ - Python 3.11+
+ - Gradio (UI framework)
+ - `huggingface_hub` (Inference API client)
+ - `python-dotenv` (local secrets)
+ - `pytest` (testing)
+
+ Keep dependencies minimal. Add new ones only when clearly justified.
+
+ ## Code Style
+
+ - Type hints on all function signatures.
+ - Docstrings (Google or NumPy style) on public functions and classes.
+ - Module-level docstrings explaining purpose.
+ - Prefer pure functions and small modules over large stateful classes.
+ - Black for formatting, Ruff for linting.
+ - No emojis in code or comments.
+
+ ## Repository Structure
+
+ ```
+ prisma-chatbot/
+ ├── README.md            # Public project description
+ ├── CLAUDE.md            # This file — instructions for AI assistants
+ ├── ARCHITECTURE.md      # Design decisions and rationale
+ ├── ROADMAP.md           # Deployment plan and milestones
+ ├── app.py               # Gradio app entry point (HF Space reads this)
+ ├── src/
+ │   ├── __init__.py
+ │   ├── prompt.py        # System prompt construction (Prisma persona + dual-role)
+ │   ├── inference.py     # HF Inference API client wrapper
+ │   ├── evaluation.py    # Score parsing, validation, display formatting
+ │   └── config.py        # Settings, constants, rate limits
+ ├── tests/
+ │   └── ...              # Pytest-based unit tests
+ ├── assets/
+ │   └── about.md         # Research background copy for UI
+ ├── requirements.txt
+ ├── .env.example
+ ├── .gitignore
+ └── LICENSE
+ ```
+
+ ## Development Workflow
+
+ - **Claude Code** is used as project orchestrator: structure decisions,
+   cross-file refactoring, documentation, planning, code review.
+ - **Cursor Agent** handles focused feature implementation and UI iteration.
+ - All non-trivial changes go through a feature branch and PR review (even if
+   solo) — useful both for hygiene and as portfolio evidence of workflow.
+
+ ## Research Context (for AI assistants)
+
+ The project builds on the maintainer's published work investigating whether
+ LLMs evaluate speakers based on linguistic choices the way humans do — for
+ example, whether saying "I'll be there at 7:03" vs. "around 7" influences
+ perceived competence, pedantry, etc. Prisma makes this research thesis
+ interactive: the model's social perception of the user is surfaced rather
+ than hidden, and updates as the conversation evolves.
+
+ When suggesting features, prompt designs, or UI choices, prefer those that
+ align with or showcase this research framing. Avoid generic "AI assistant"
+ patterns that obscure the social-perception angle.
+
+ ## What This Project Is Not
+
+ - Not a production chatbot — it is a research demo with a specific thesis.
+ - Not a generic LLM wrapper — the dual-role evaluation is the point.
+ - Not a psychological assessment tool — the evaluation is playful, not
+   diagnostic. UI copy should reflect this clearly.
+
+ ## Naming Disambiguation
+
+ "Prisma" is also the name of a well-known TypeScript ORM and an older photo
+ app. This project is unrelated to both. The repo is intentionally named
+ `prisma-chatbot` (not `prisma`) to make the distinction clear in searches
+ and project listings. When referring to the bot, "Prisma" is fine in
+ user-facing copy; in code comments and docs, prefer "the bot" or "PRISMA"
+ (acronym form) where ambiguity could arise.
+
+ ## Open Questions
+
+ (Use this section to flag design decisions still being deliberated.)
+
+ - Should evaluation scores update live after each turn, or only on user request?
+ - Numeric (1–7) vs. verbal score display, or both?
+ - Per-session turn cap value (10? 15?).
+ - Should there be a "compare models" mode in v2?
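The "key design property" in CLAUDE.md (evaluations that update turn by turn) together with the open question about a per-session turn cap suggests some per-session record of scores. The sketch below is one possible shape, not the project's design: the class name `SessionScores`, the `record` method, and the default cap of 10 are all assumptions, with the cap value taken from one of the candidates listed under Open Questions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six attributes named in CLAUDE.md.
ATTRIBUTES = (
    "competent",
    "knowledgeable",
    "well-prepared",
    "helpful",
    "likeable",
    "pedantic",
)


@dataclass
class SessionScores:
    """Per-session score history with a hard cap on the number of turns."""

    turn_cap: int = 10  # placeholder; the real value is still an open question
    history: list[dict[str, int]] = field(default_factory=list)

    def record(self, scores: dict[str, int]) -> dict[str, int]:
        """Store this turn's scores and return the change per attribute since the last turn."""
        if len(self.history) >= self.turn_cap:
            raise RuntimeError("per-session turn cap reached")
        # On the first turn there is nothing to compare against, so every change is 0.
        previous = self.history[-1] if self.history else scores
        self.history.append(dict(scores))
        return {name: scores[name] - previous[name] for name in ATTRIBUTES}
```

Keeping the full history rather than only the latest scores leaves both display options open: a live view after each turn or a summary on request.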
README.md ADDED
@@ -0,0 +1,61 @@
+ # prisma-chatbot
+
+ > Have you ever wondered what your chatbot thinks about you?
+
+ **Prisma** (*Pragmatic Real-time Inference of Social Meaning in Agents*) is a
+ conversational AI demo that responds to users while simultaneously evaluating
+ them on social/pragmatic dimensions. It accompanies published research on
+ LLM social perception (CMCL 2026; EMNLP 2026, under review).
+
+ > **TODO:** Replace with a short hero paragraph and screenshot once the demo
+ > is live.
+
+ ## Live demo
+
+ > **TODO:** Add Hugging Face Space link.
+
+ ## What it does
+
+ > **TODO:** 2–3 sentence description of the dual-role design — Prisma
+ > responds in conversation while producing a structured evaluation of the
+ > user across six attributes (competent, knowledgeable, well-prepared,
+ > helpful, likeable, pedantic). Evaluation updates turn by turn.
+
+ ## Research context
+
+ > **TODO:** Link to CMCL and EMNLP papers; one paragraph on the research
+ > thesis (do LLMs evaluate speakers based on linguistic choices the way
+ > humans do?).
+
+ ## Local development
+
+ > **TODO:** Flesh out once `app.py` and the `src/` modules are implemented.
+
+ ```bash
+ # From the cloned repo root: create a virtualenv and install deps
+ python -m venv .venv
+ source .venv/bin/activate
+ pip install -r requirements.txt
+
+ # Copy env template and add your HF token
+ cp .env.example .env
+
+ # Run the Gradio app locally
+ python app.py
+ ```
+
+ ## Project structure
+
+ See [`ARCHITECTURE.md`](ARCHITECTURE.md) for design decisions and
+ [`ROADMAP.md`](ROADMAP.md) for the deployment plan.
+
+ ## Contributing
+
+ This is a personal research project and public portfolio artifact. Issues
+ are welcome — feel free to open one if you spot a bug, have a question
+ about the research, or want to suggest a feature. Pull requests are by
+ invitation only; please open an issue first to discuss.
+
+ ## License
+
+ See [`LICENSE`](LICENSE) (MIT).
ROADMAP.md ADDED
@@ -0,0 +1,56 @@
+ # Roadmap
+
+ Deployment plan and milestones for prisma-chatbot. Living document — items
+ move from *Planned* to *In progress* to *Done* as the project evolves.
+
+ ## Milestone 1 — Scaffolding
+
+ > **TODO:** Repo skeleton, docs, dependency manifest, env template,
+ > gitignore. No chatbot logic yet.
+
+ - [x] Repo created, license added
+ - [x] CLAUDE.md, README, ARCHITECTURE, ROADMAP drafts
+ - [x] `src/`, `tests/`, `assets/` directories with placeholder modules
+ - [x] `requirements.txt`, `.gitignore`, `.env.example`
+
+ ## Milestone 2 — Minimal end-to-end loop
+
+ > **TODO:** Get a single message in / response + evaluation out working
+ > locally, even with a rough prompt.
+
+ - [ ] Implement `src/config.py` (model id, attributes, turn cap)
+ - [ ] Implement `src/prompt.py` (v1 dual-role prompt)
+ - [ ] Implement `src/inference.py` (HF Inference client wrapper)
+ - [ ] Implement `src/evaluation.py` (JSON parsing + validation)
+ - [ ] Implement `app.py` (minimal Gradio UI)
+ - [ ] First pytest tests for parsing/validation
+
+ ## Milestone 3 — Prompt and UX iteration
+
+ > **TODO:** Refine Prisma's voice, evaluation display, and the "check what
+ > I think" affordance.
+
+ - [ ] Refine system prompt for voice consistency and structured-output
+   reliability
+ - [ ] Decide evaluation display: numeric, verbal, or both
+ - [ ] Decide update cadence: live each turn vs. on-request
+ - [ ] About panel copy (`assets/about.md`)
+
+ ## Milestone 4 — Public deployment
+
+ > **TODO:** Ship to a Hugging Face Space and link from the README and
+ > papers.
+
+ - [ ] Hugging Face Space configuration
+ - [ ] Rate limiting / per-session turn cap
+ - [ ] Public URL added to README and papers
+ - [ ] Light usage analytics (anonymous, aggregate)
+
+ ## Milestone 5 — Stretch ideas
+
+ > **TODO:** Explicitly non-blocking; consider only after the demo is live
+ > and stable.
+
+ - [ ] "Compare models" mode
+ - [ ] Downloadable conversation + evaluation transcript
+ - [ ] Linguistic feature highlighting (which words/choices shifted scores)
app.py ADDED
@@ -0,0 +1,7 @@
+ """Gradio app entry point for the Prisma chatbot.
+
+ This module is loaded by the Hugging Face Space. It assembles the Gradio
+ interface and wires it to the inference and evaluation modules in `src/`.
+
+ Implementation pending — scaffolding only.
+ """
assets/about.md ADDED
@@ -0,0 +1,22 @@
+ # About Prisma
+
+ > **TODO:** Research background copy for the UI. Write in an accessible,
+ > non-academic voice — this panel is shown to general visitors of the
+ > Hugging Face Space.
+
+ ## The research
+
+ > **TODO:** Short, accessible summary of the CMCL/EMNLP findings on LLM
+ > social perception: do LLMs evaluate speakers based on linguistic choices
+ > the way humans do?
+
+ ## How Prisma works
+
+ > **TODO:** Plain-language description of the dual-role design — one LLM
+ > call per turn produces both a response and a structured evaluation of
+ > the user along six attributes.
+
+ ## What the evaluation means (and what it doesn't)
+
+ > **TODO:** Make clear the evaluation is playful and reflects how the
+ > model perceives the user's writing, not a diagnostic assessment.
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio
+ huggingface_hub
+ python-dotenv
+ pytest
src/__init__.py ADDED
@@ -0,0 +1,8 @@
+ """Prisma chatbot source package.
+
+ Modules:
+     config: Settings, constants, and rate limits.
+     prompt: System prompt construction (persona + dual-role formatting).
+     inference: Hugging Face Inference API client wrapper.
+     evaluation: Score parsing, validation, and display formatting.
+ """
src/config.py ADDED
@@ -0,0 +1,8 @@
+ """Configuration: settings, constants, and rate limits.
+
+ Centralizes tunable values (model id, decoding parameters, per-session
+ turn cap, evaluation attributes) so they can be adjusted without touching
+ the inference or prompt logic.
+
+ Implementation pending — scaffolding only.
+ """
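The docstring for `src/config.py` lists the values it is meant to centralize. A minimal sketch of one way to hold them is a frozen dataclass, in line with the "prefer pure functions and small modules" guidance in CLAUDE.md. The class name `Settings`, the Hub model id string, and every numeric default except the 1–7 score range are assumptions, not values from this commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Immutable bundle of tunables; field names and defaults here are illustrative."""

    # Assumed Hub id for "Llama 3.3 70B Instruct"; verify before use.
    model_id: str = "meta-llama/Llama-3.3-70B-Instruct"
    attributes: tuple[str, ...] = (
        "competent",
        "knowledgeable",
        "well-prepared",
        "helpful",
        "likeable",
        "pedantic",
    )
    score_min: int = 1  # scores run 1-7 per CLAUDE.md
    score_max: int = 7
    turn_cap: int = 10  # placeholder; still an open question
    temperature: float = 0.7  # placeholder decoding parameter
    max_tokens: int = 512  # placeholder decoding parameter


SETTINGS = Settings()
```

Freezing the dataclass means other modules can import `SETTINGS` without any risk of mutating shared state at runtime.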
src/evaluation.py ADDED
@@ -0,0 +1,8 @@
+ """Evaluation parsing and presentation.
+
+ Parses the structured JSON evaluation block emitted by the model, validates
+ the six attribute scores (1–7), and formats them for display in the Gradio
+ UI.
+
+ Implementation pending — scaffolding only.
+ """
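The validation and display steps described in the `src/evaluation.py` docstring could look roughly like the sketch below. It assumes the evaluation arrives as an already-parsed dict and that scores must be integers; the function names `validate_scores` and `format_scores` and the numeric `n/7` display are hypothetical, since the display format is still listed as an open question.

```python
from __future__ import annotations

# The six attributes named in the project docs.
ATTRIBUTES = (
    "competent",
    "knowledgeable",
    "well-prepared",
    "helpful",
    "likeable",
    "pedantic",
)


def validate_scores(evaluation: object) -> dict[str, int]:
    """Return the six scores, raising ValueError if any is missing or outside 1-7."""
    if not isinstance(evaluation, dict):
        raise ValueError("evaluation must be a JSON object")
    scores: dict[str, int] = {}
    for name in ATTRIBUTES:
        value = evaluation.get(name)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 7:
            raise ValueError(f"{name!r} must be an integer from 1 to 7, got {value!r}")
        scores[name] = value
    return scores


def format_scores(scores: dict[str, int]) -> str:
    """Render one 'attribute: n/7' line per attribute, in the canonical order."""
    return "\n".join(f"{name}: {scores[name]}/7" for name in ATTRIBUTES)
```

Extra keys in the model's evaluation object are silently dropped here. Rejecting them instead would be a stricter but equally defensible choice.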
src/inference.py ADDED
@@ -0,0 +1,9 @@
+ """Hugging Face Inference API client wrapper.
+
+ Thin wrapper around `huggingface_hub`'s inference client that issues a
+ single LLM call per turn and returns the raw model output. Keeps API
+ concerns (auth, model selection, retries) isolated from prompt and
+ evaluation logic.
+
+ Implementation pending — scaffolding only.
+ """
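The `src/inference.py` docstring mentions retries as one of the API concerns to isolate. The sketch below shows one way to do that with exponential backoff around an injected `send` callable, which keeps the retry logic testable without network access. The names `call_with_retries` and `make_hf_sender`, the retry counts, and the `max_tokens` value are assumptions. `make_hf_sender` relies on `InferenceClient.chat_completion` from `huggingface_hub` and is imported lazily, so the rest of the sketch runs without that package.

```python
from __future__ import annotations

import time
from typing import Callable

Messages = list[dict[str, str]]


def call_with_retries(
    send: Callable[[Messages], str],
    messages: Messages,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` up to `attempts` times, doubling the wait after each failure."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return send(messages)
        except Exception as error:  # sketch only: real code should catch narrower error types
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2**attempt)
    raise RuntimeError(f"model call failed after {attempts} attempts") from last_error


def make_hf_sender(token: str, model_id: str) -> Callable[[Messages], str]:
    """Build a `send` callable backed by the Hugging Face Inference API (not exercised here)."""
    from huggingface_hub import InferenceClient  # third-party; imported lazily

    client = InferenceClient(token=token)

    def send(messages: Messages) -> str:
        completion = client.chat_completion(messages, model=model_id, max_tokens=512)
        return completion.choices[0].message.content

    return send
```

Passing `sleep` as a parameter is a small design choice that lets tests record the backoff delays instead of actually waiting.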
src/prompt.py ADDED
@@ -0,0 +1,9 @@
+ """System prompt construction for Prisma.
+
+ Builds the dual-role system prompt that instructs the model to (1) respond
+ to the user in Prisma's voice and (2) emit a structured evaluation of the
+ user along the six attributes defined by the CMCL/EMNLP study: competent,
+ knowledgeable, well-prepared, helpful, likeable, pedantic.
+
+ Implementation pending — scaffolding only.
+ """
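As a rough illustration of the dual-role prompt described in the `src/prompt.py` docstring, the sketch below assembles a system prompt that asks for both a reply and a JSON evaluation. The wording of the prompt, the function name `build_system_prompt`, and the example score of 4 are all placeholders. The roadmap leaves the actual v1 prompt for Milestone 2.

```python
from __future__ import annotations

import json

# The six attributes named in the docstring above.
ATTRIBUTES = (
    "competent",
    "knowledgeable",
    "well-prepared",
    "helpful",
    "likeable",
    "pedantic",
)


def build_system_prompt(attributes: tuple[str, ...] = ATTRIBUTES) -> str:
    """Assemble a dual-role system prompt ending in a JSON example of the expected output."""
    # A concrete example object tends to be easier for a model to imitate than a schema.
    example = {
        "response": "<your reply to the user>",
        "evaluation": {name: 4 for name in attributes},
    }
    lines = [
        "You are Prisma, a lightly curious and observational assistant.",
        "For every user message, do two things:",
        "1. Reply helpfully in Prisma's voice.",
        "2. Rate your impression of the user on a scale from 1 to 7 for each of: "
        + ", ".join(attributes)
        + ".",
        "Return only a JSON object shaped like this example:",
        json.dumps(example),
    ]
    return "\n".join(lines)
```

Generating the JSON example from the attribute tuple keeps the prompt and the validation code from drifting apart if the attribute list ever changes.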
tests/__init__.py ADDED
@@ -0,0 +1 @@
+ """Pytest test suite for the Prisma chatbot."""
tests/test_placeholder.py ADDED
@@ -0,0 +1,4 @@
+ """Placeholder test module.
+
+ Replace with real unit tests as modules in `src/` gain implementation.
+ """