---
title: persona-ui
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8501
pinned: false
---
# Persona UI
[![Deploy to Hugging Face Spaces](https://huggingface.co/spaces/implicit-personalization/persona-ui/badge.svg)](https://huggingface.co/spaces/implicit-personalization/persona-ui)
Streamlit interface for persona vector extraction, analysis, and chat.
## Overview
A web app built on top of [persona-vectors](https://github.com/implicit-personalization/persona-vectors) that provides four tabs:
- **Chat**: interactive conversations with a model using persona-based system prompts (templated or biography)
- **Analysis**: load local or Hub persona vectors and explore cosine similarity, PCA, UMAP, attribute-colored projections, and dendrograms
- **Probing**: sweep and inspect linear probes trained over saved persona vectors
- **Extract**: run persona-vector extraction from Hugging Face persona datasets or a local JSONL dataset directly from the browser
## Repository Layout
```
persona-ui/
├── app.py                          # Main entry point (Streamlit)
├── state.py                        # Session state management (chat history, KV cache)
├── tabs/
│   ├── chat.py / chat_ui.py / chat_shared.py   # Chat tab
│   ├── compare_chat.py             # Side-by-side chat comparison mode
│   ├── analysis_core.py            # Analysis tab entry point
│   ├── analysis/                   # Analysis tab internals
│   │   ├── _shared.py / _state.py  # Shared loading + session state
│   │   ├── cosine.py               # Cosine similarity view
│   │   ├── dendrogram.py           # Persona dendrograms
│   │   └── layered.py              # PCA/UMAP/Isomap projections
│   ├── extract.py                  # Extraction tab
│   ├── probe.py / probe_ui.py      # Probe diagnostics + upload/tracing controls
│   └── probe_sweep.py              # Probe sweep tab
└── utils/
    ├── analysis_sources.py         # Local + Hub persona-vector store wiring
    ├── chat.py                     # Chat generation logic
    ├── chat_export.py              # Export chat logs to JSON
    ├── contrast.py                 # Contrastive token log-prob coloring
    ├── datasets.py                 # Dataset loader wrapper
    ├── helpers.py                  # UI labels and slug helpers
    ├── probe_trace.py              # Chat-token activation tracing
    ├── probe_overlay.py            # Per-token probe-score overlay
    ├── probes.py / probe_files.py  # Probe loading, scoring, artifact paths
    ├── preload.py                  # Background startup warmup
    └── runtime.py                  # Model caching and NDIF queries
```
Dataset loading and environment helpers are provided by the sibling [persona-data](https://github.com/implicit-personalization/persona-data) package.
Core extraction, analysis, and steering logic comes from [persona-vectors](https://github.com/implicit-personalization/persona-vectors).
## Installation
```bash
uv sync
cp .env.example .env
```
## Local Development
The checked-in dependency config uses published packages. For local package
work, uncomment the `tool.uv.sources` block in `pyproject.toml` and keep sibling checkouts next to this repo.
Example:
```bash
git clone <persona-data-url> ../persona-data
git clone <persona-vectors-url> ../persona-vectors
```
Expected layout:
```text
parent/
├── persona-ui
├── persona-data
└── persona-vectors
```
## Quickstart
```bash
streamlit run app.py
```
## Hugging Face Spaces Deployment
This app can be deployed to Hugging Face Spaces using Docker.
### Build Locally
```bash
docker build -t persona-ui .
# Pass your local .env if you want the container to use the same configuration
docker run --env-file .env --rm -p 8501:8501 persona-ui
```
## Configuration
Copy `.env.example` to `.env` and fill in:
```bash
NDIF_API_KEY=... # Optional shared NDIF key; users can also enter one per session
HF_HOME=... # Optional: HuggingFace cache directory
HF_TOKEN=... # Optional: higher Hugging Face Hub rate limits; public datasets do not require it
ARTIFACTS_DIR=... # Optional: where persona vectors are read from (default: ./artifacts)
PERSONA_VECTORS_HUB_REPO=... # Optional: default Analysis/Probing Hub dataset repo
PERSONA_UI_STORE_CACHE_ENTRIES=4 # Optional: open local/Hub vector stores kept warm
PERSONA_UI_VECTOR_CACHE_ENTRIES=4 # Optional: loaded analysis datasets kept warm
PERSONA_UI_PREPARED_CACHE_ENTRIES=8 # Optional: prepared projections / k-means groups kept warm
PERSONA_UI_FIGURE_STATE_ENTRIES=2 # Optional: recent rendered Analysis figures kept in-session
PERSONA_UI_PREPARED_STATE_ENTRIES=4 # Optional: recent projection-ready markers kept in-session
```
The app picks up `.env` automatically via `load_dotenv()` on startup, and hosted
environments such as Hugging Face Spaces can provide the same values as
environment variables. If `NDIF_API_KEY` is unset, Chat and Extract users are prompted for a per-session key when they need remote execution.
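As a rough illustration of how these variables might be consumed, the sketch below reads a few of them from `os.environ` with fallbacks. It is not the app's actual code: `UISettings`, `load_settings`, and `_int_env` are hypothetical names, and the fallback of `4` for the cache sizes is assumed from the example values above rather than confirmed. Calling `load_dotenv()` from `python-dotenv` beforehand would populate `os.environ` from `.env`; it is left out here to keep the sketch standard-library only.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def _int_env(name: str, default: int) -> int:
    """Read a positive integer from the environment, else return `default`."""
    raw = os.environ.get(name, "").strip()
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric values fall back to the default.
        return default
    return value if value > 0 else default


@dataclass(frozen=True)
class UISettings:  # hypothetical container, not part of persona-ui
    artifacts_dir: Path
    ndif_api_key: Optional[str]
    store_cache_entries: int
    vector_cache_entries: int


def load_settings() -> UISettings:
    """Collect settings from environment variables (assumed defaults)."""
    return UISettings(
        # ./artifacts is the documented default for ARTIFACTS_DIR.
        artifacts_dir=Path(os.environ.get("ARTIFACTS_DIR") or "./artifacts"),
        # None here means the user would be asked for a per-session key.
        ndif_api_key=os.environ.get("NDIF_API_KEY") or None,
        # Fallback of 4 is an assumption taken from the example values.
        store_cache_entries=_int_env("PERSONA_UI_STORE_CACHE_ENTRIES", 4),
        vector_cache_entries=_int_env("PERSONA_UI_VECTOR_CACHE_ENTRIES", 4),
    )
```

Treating empty strings the same as unset variables keeps a blank `NDIF_API_KEY=` line in `.env` from being mistaken for a real key.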
## Persona Vectors
The Analysis and Probing tabs read persona vectors from either a Hugging Face
dataset (pushed by `persona-vectors/main.py push` or the
`extraction_*.sh` scripts) or from local artifacts. The Extract tab writes
local artifacts to:
```
artifacts/
├── activations/<model_dir>/<mask_strategy>/<prompt_variant>/   # also: persona-vectors/...
│   ├── manifest.json
│   └── <persona_id>.safetensors
└── chats/<model_dir>/<persona_id>/
    └── <export>.json
```
`<model_dir>` is the model name with `/` replaced by `__` (e.g. `google__gemma-2-9b-it`).
The manifest stores persona names, tensor shape metadata, and sample ids.
Chat exports still store `dataset_source` in the JSON payload.
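
The layout above can be sketched as a pair of path helpers. This is only an illustration of the documented directory scheme: the function names are hypothetical and the app's own helpers may differ. The `mask_strategy`, `prompt_variant`, and `persona_id` values in the usage lines are placeholders, not real option names.

```python
from pathlib import Path


def model_dir_name(model_name: str) -> str:
    """Turn a model name into a directory name by replacing '/' with '__'."""
    return model_name.replace("/", "__")


def activation_path(artifacts_dir, model_name, mask_strategy,
                    prompt_variant, persona_id) -> Path:
    """Path of one persona's activation file under the documented layout."""
    return (
        Path(artifacts_dir)
        / "activations"
        / model_dir_name(model_name)
        / mask_strategy
        / prompt_variant
        / f"{persona_id}.safetensors"
    )


def chat_export_dir(artifacts_dir, model_name, persona_id) -> Path:
    """Directory that holds chat exports for one persona."""
    return Path(artifacts_dir) / "chats" / model_dir_name(model_name) / persona_id


# Placeholder values, for illustration only.
example = activation_path(
    "artifacts", "google/gemma-2-9b-it", "some_mask", "some_variant", "persona_0"
)
print(example.as_posix())
# artifacts/activations/google__gemma-2-9b-it/some_mask/some_variant/persona_0.safetensors
```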