---
title: persona-ui
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8501
pinned: false
---
# Persona UI

[![Deploy to Hugging Face Spaces](https://huggingface.co/spaces/implicit-personalization/persona-ui/badge.svg)](https://huggingface.co/spaces/implicit-personalization/persona-ui)

Streamlit interface for persona vector extraction, analysis, and chat.

## Overview

A web app built on top of [persona-vectors](../persona-vectors) that provides these tabs:

- **Chat**: interactive conversations with a model using persona-based system prompts (templated or biography)
- **Analysis**: load local or Hub persona vectors and explore cosine similarity, PCA, UMAP, attribute-colored projections, and dendrograms
- **Probing**: sweep and inspect linear probes trained over saved persona vectors
- **Extract**: run persona-vector extraction from HuggingFace persona datasets or a local JSONL dataset directly from the browser

## Repository Layout

```
persona-ui/
├── app.py                   # Main entry point (Streamlit)
├── state.py                 # Session state management (chat history, KV cache)
├── tabs/
│   ├── chat.py / chat_ui.py / chat_shared.py  # Chat tab
│   ├── compare_chat.py      # Side-by-side chat comparison mode
│   ├── analysis_core.py     # Analysis tab entry point
│   ├── analysis/            # Analysis tab internals
│   │   ├── _shared.py / _state.py            # Shared loading + session state
│   │   ├── cosine.py        # Cosine similarity view
│   │   ├── dendrogram.py    # Persona dendrograms
│   │   └── layered.py       # PCA/UMAP/Isomap projections
│   ├── extract.py           # Extraction tab
│   ├── probe.py / probe_ui.py  # Probe diagnostics + upload/tracing controls
│   └── probe_sweep.py       # Probe sweep tab
└── utils/
    ├── analysis_sources.py  # Local + Hub persona-vector store wiring
    ├── chat.py              # Chat generation logic
    ├── chat_export.py       # Export chat logs to JSON
    ├── contrast.py          # Contrastive token log-prob coloring
    ├── datasets.py          # Dataset loader wrapper
    ├── helpers.py           # UI labels and slug helpers
    ├── probe_trace.py       # Chat-token activation tracing
    ├── probe_overlay.py     # Per-token probe-score overlay
    ├── probes.py / probe_files.py  # Probe loading, scoring, artifact paths
    ├── preload.py           # Background startup warmup
    └── runtime.py           # Model caching and NDIF queries
```

Dataset loading and environment helpers are provided by the sibling [persona-data](https://github.com/implicit-personalization/persona-data) package. 
Core extraction, analysis, and steering logic comes from [persona-vectors](https://github.com/implicit-personalization/persona-vectors).

## Installation

```bash
uv sync
cp .env.example .env
```

## Local Development

The checked-in dependency config uses published packages. For local package
work, uncomment the `tool.uv.sources` block in `pyproject.toml` and keep sibling checkouts next to this repo.

Example:

```bash
git clone <persona-data-url> ../persona-data
git clone <persona-vectors-url> ../persona-vectors
```

Expected layout:

```text
parent/
├── persona-ui
├── persona-data
└── persona-vectors
```

## Quickstart

```bash
streamlit run app.py
```

## Hugging Face Spaces Deployment

This app can be deployed to Hugging Face Spaces using Docker.

### Build Locally

```bash
docker build -t persona-ui .
# Pass your local .env if you want the container to use the same configuration
docker run --env-file .env --rm -p 8501:8501 persona-ui
```

## Configuration

Copy `.env.example` to `.env` and fill in:

```bash
NDIF_API_KEY=...       # Optional shared NDIF key; users can also enter one per session
HF_HOME=...            # Optional: HuggingFace cache directory
HF_TOKEN=...           # Optional: higher Hugging Face Hub rate limits; public datasets do not require it
ARTIFACTS_DIR=...      # Optional: where persona vectors are read from (default: ./artifacts)
PERSONA_VECTORS_HUB_REPO=...  # Optional: default Analysis/Probing Hub dataset repo
PERSONA_UI_STORE_CACHE_ENTRIES=4      # Optional: open local/Hub vector stores kept warm
PERSONA_UI_VECTOR_CACHE_ENTRIES=4     # Optional: loaded analysis datasets kept warm
PERSONA_UI_PREPARED_CACHE_ENTRIES=8   # Optional: prepared projections / k-means groups kept warm
PERSONA_UI_FIGURE_STATE_ENTRIES=2     # Optional: recent rendered Analysis figures kept in-session
PERSONA_UI_PREPARED_STATE_ENTRIES=4   # Optional: recent projection-ready markers kept in-session
```

The app picks up `.env` automatically via `load_dotenv()` on startup, and hosted
environments such as Hugging Face Spaces can provide the same values as
environment variables. If `NDIF_API_KEY` is unset, Chat and Extract users are prompted for a per-session key when they need remote execution.
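As a rough illustration of how these settings might be consumed, the sketch below reads the cache limits from the environment and checks whether a per-session NDIF key prompt would be needed. The function names are hypothetical and not taken from the app's code, the fallback values simply mirror the example values above (the real defaults may differ), and treating an empty `NDIF_API_KEY` as unset is an assumption.

```python
import os

# Assumed fallbacks: these mirror the example values shown above,
# not necessarily the defaults the app actually uses.
CACHE_DEFAULTS = {
    "PERSONA_UI_STORE_CACHE_ENTRIES": 4,
    "PERSONA_UI_VECTOR_CACHE_ENTRIES": 4,
    "PERSONA_UI_PREPARED_CACHE_ENTRIES": 8,
    "PERSONA_UI_FIGURE_STATE_ENTRIES": 2,
    "PERSONA_UI_PREPARED_STATE_ENTRIES": 4,
}


def read_cache_limits(env=None):
    """Hypothetical helper: parse cache sizes, falling back on bad input."""
    env = os.environ if env is None else env
    limits = {}
    for name, default in CACHE_DEFAULTS.items():
        raw = env.get(name, "").strip()
        try:
            limits[name] = int(raw) if raw else default
        except ValueError:
            # Non-numeric values are ignored in favour of the fallback.
            limits[name] = default
    return limits


def ndif_key_needs_prompt(env=None):
    """Hypothetical helper: True when no shared NDIF key is configured."""
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as unset.
    return not env.get("NDIF_API_KEY", "").strip()
```

Passing a plain dict instead of `os.environ` makes the helpers easy to exercise without touching the real environment.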

## Persona Vectors

The Analysis and Probing tabs read persona vectors from either a Hugging Face
dataset (pushed by `persona-vectors/main.py push` or the
`extraction_*.sh` scripts) or from local artifacts. The Extract tab writes
local artifacts to:

```
artifacts/
├── activations/<model_dir>/<mask_strategy>/<prompt_variant>/   # also: persona-vectors/...
│   ├── manifest.json
│   └── <persona_id>.safetensors
└── chats/<model_dir>/<persona_id>/
    └── <export>.json
```

`<model_dir>` is the model name with `/` replaced by `__` (e.g. `google__gemma-2-9b-it`). 
The manifest stores persona names, tensor shape metadata, and sample ids. 
Chat exports still store `dataset_source` in the JSON payload.
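
The naming rule and layout above can be sketched in a few lines of Python. The helper names below are hypothetical and exist only for illustration; the actual path logic lives in the app and in persona-vectors and may handle cases this sketch does not.

```python
from pathlib import Path


def model_dir_name(model_name: str) -> str:
    """Replace '/' with '__', as described for <model_dir> above."""
    return model_name.replace("/", "__")


def activation_path(artifacts_dir, model_name, mask_strategy,
                    prompt_variant, persona_id) -> Path:
    """Hypothetical helper: path to one persona's activation file."""
    return (
        Path(artifacts_dir)
        / "activations"
        / model_dir_name(model_name)
        / mask_strategy
        / prompt_variant
        / f"{persona_id}.safetensors"
    )


def chat_export_dir(artifacts_dir, model_name, persona_id) -> Path:
    """Hypothetical helper: directory holding chat exports for a persona."""
    return Path(artifacts_dir) / "chats" / model_dir_name(model_name) / persona_id
```

For example, `model_dir_name("google/gemma-2-9b-it")` gives `google__gemma-2-9b-it`, matching the example in the text.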