Add HF deploy docs and local Docker runner

- .gitignore +11 -1
- PLAN.md +101 -0
- docs/hf.md +58 -0
- run_docker_local.sh +24 -0
- start_hf_space.sh +0 -0
.gitignore
CHANGED

```diff
@@ -1,2 +1,12 @@
 *.pyc
-
+__pycache__/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.venv/
+
+.DS_Store
+
+.env
+.env.*
+!.env.example
```

PLAN.md
ADDED

# PLAN.md — ConverTA Next Deliverables

This plan captures two deliverables requested after the PI demo, with the current hosting target being Hugging Face Spaces (Docker).

## Deliverables

### 1) Configuration Panel (Personas + Prompt Tweaks)

**Goal:** Let a user select existing surveyor/patient personas and adjust prompt/model parameters used to start a conversation.

**User-visible outcomes**
- UI panel to choose `surveyor_persona_id` and `patient_persona_id` from saved personas.
- Optional prompt overrides (system prompt additions/edits) and model settings (e.g., temperature, model id) that affect the conversation run.

**Primary risks**
- Keeping UI state in sync with backend state if a conversation is already running.
- Avoiding “configuration drift” between what the user selected and what was actually sent.

### 2) Resource Agent Panel (Post-Conversation Insights via Resource Agent Prompt)

**Goal (higher priority):** After a conversation completes, run a dedicated “resource agent” analysis (an additional LLM call) and render structured insights in the Resources panel.

**User-visible outcomes**
- Resources panel populates automatically at conversation end with:
  - Patient health situation(s) mentioned + supporting evidence snippets.
  - Care experience evaluation (good/bad/neutral) + reasons + evidence snippets.
- Clear status UI: `idle → running → complete` and error handling.

**Primary risks**
- Reliably detecting “conversation ended” (stop button, backend status, disconnect, timeout).
- Capturing a complete transcript (including persona metadata).
- Latency/cost of the extra LLM call and its failure modes.

## Proposed Implementation Order

1. **Deliverable #2 — Slice 1 (plumbing, end-to-end)**
2. **Deliverable #1 — Minimal configuration UI (persona selection)**
3. **Deliverable #2 — Slice 2 (quality, schema, robustness)**
4. **Deliverable #1 — Advanced configuration (prompt edits + model params)**

Rationale: implement the highest-value path (#2) first, but in thin slices, so we get a fast “works end-to-end” demo without blocking on perfect configuration UX.

## Milestones and Acceptance Criteria

### Milestone A — Transcript Capture + End-of-Conversation Trigger (for #2)
- Transcript stored per `conversation_id` with ordered messages and persona metadata.
- Trigger condition (MVP): run analysis on `conversation_status: completed` only.
- Transcript scope (MVP): include utterances only (no system prompts, no routing events).
- Acceptance: transcript matches the conversation shown in the Messages panel.
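
A minimal sketch of the store this milestone implies (all names here are assumptions for illustration, not existing code):

```python
# Sketch: in-memory transcript store keyed by conversation_id.
from dataclasses import dataclass, field

@dataclass
class TranscriptMessage:
    index: int        # preserves message order
    speaker: str      # e.g. "surveyor" or "patient"
    persona_id: str   # persona metadata captured per utterance
    text: str

@dataclass
class Transcript:
    conversation_id: str
    messages: list[TranscriptMessage] = field(default_factory=list)

TRANSCRIPTS: dict[str, Transcript] = {}

def record_utterance(conversation_id: str, speaker: str, persona_id: str, text: str) -> None:
    t = TRANSCRIPTS.setdefault(conversation_id, Transcript(conversation_id))
    t.messages.append(TranscriptMessage(len(t.messages), speaker, persona_id, text))
```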

### Milestone B — Resource Agent LLM Call (for #2)
- On end-of-conversation, trigger the analysis request once per conversation (see the sketch below).
- Use the same underlying model as the conversation by default; the “resource agent” difference is the system prompt/context and desired output schema (not a different provider/model).
- Acceptance: Resources panel shows an analysis result for a completed conversation.
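
One way to satisfy the once-per-conversation requirement is a simple guard set; a sketch with assumed names:

```python
# Sketch: fire the resource-agent analysis exactly once per conversation.
_ANALYZED: set[str] = set()

async def run_resource_agent(conversation_id: str) -> None:
    ...  # placeholder for the actual analysis call (see Milestone C)

async def on_status_event(conversation_id: str, status: str) -> None:
    if status == "completed" and conversation_id not in _ANALYZED:
        _ANALYZED.add(conversation_id)  # mark first so duplicate events don't double-fire
        await run_resource_agent(conversation_id)
```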

### Milestone C — Structured Output + UI Rendering (for #2)
- Resource agent prompt requests a strict JSON schema (validated on receipt).
- Evidence best practice (MVP): the model returns evidence pointers into the transcript and the app extracts exact evidence snippets programmatically (sketch after this list).
- UI renders:
  - `health_situation`: list of items with `summary`, `evidence[]`, `confidence`.
  - `care_experience`: `sentiment`, `reasons[]`, `evidence[]`, `confidence`.
- Acceptance: output is stable across runs (no random placeholders), and errors are displayed without breaking the app.
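
A sketch of the schema and the app-side snippet extraction, assuming Pydantic is available; the pointer fields (`message_index`, `char_start`, `char_end`) are assumptions about what “evidence pointers” could look like:

```python
# Sketch: strict output schema plus programmatic evidence extraction.
from pydantic import BaseModel

class EvidencePointer(BaseModel):
    message_index: int  # index into the stored transcript (assumed pointer format)
    char_start: int
    char_end: int

class HealthSituation(BaseModel):
    summary: str
    evidence: list[EvidencePointer]
    confidence: float

class CareExperience(BaseModel):
    sentiment: str  # "good" | "bad" | "neutral"
    reasons: list[str]
    evidence: list[EvidencePointer]
    confidence: float

class ResourceAgentResult(BaseModel):
    health_situation: list[HealthSituation]
    care_experience: CareExperience

def evidence_snippet(utterances: list[str], p: EvidencePointer) -> str:
    # The app, not the model, produces the exact quoted snippet.
    return utterances[p.message_index][p.char_start:p.char_end]
```

If Pydantic v2 is in the stack, validation on receipt is then `ResourceAgentResult.model_validate_json(raw)`, and a `ValidationError` becomes a displayed error rather than a crash.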

### Milestone D — Minimal Configuration Panel (for #1)
- UI fetches available personas and allows selecting:
  - surveyor persona
  - patient persona
- Selection affects the next “Start Conversation” payload (illustrated below).
- Acceptance: changing persona selection changes the personas used in the conversation.
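
A hypothetical shape for that payload (only the two persona id fields come from this plan; the values are illustrative):

```python
# Hypothetical "Start Conversation" request body with persona selection.
start_payload = {
    "surveyor_persona_id": "surveyor-default",  # chosen in the config panel
    "patient_persona_id": "patient-copd-01",    # chosen in the config panel
}
```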

### Milestone E — Prompt/Model Overrides (for #1)
- UI supports optional overrides:
  - per-agent prompt additions
  - model id, temperature, max tokens (as supported)
- Acceptance: overrides are visible in logs and reflected in the conversation behavior.

## Technical Notes (Current Stack Reality)

- The deployed Space runs a single FastAPI server (`frontend/react_gradio_hybrid.py`) and mounts the backend under `/api` (minimal sketch after this list).
- Frontend communicates via `/ws/frontend/{conversation_id}` and bridges to `/api/ws/conversation/{conversation_id}` using `WebSocketManager`.
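
A minimal sketch of that single-server layout (`backend_app` stands in for the real backend application; names are illustrative):

```python
# Sketch of the mounting arrangement described above.
from fastapi import FastAPI

backend_app = FastAPI()  # stand-in for the real backend app

app = FastAPI()          # the server in frontend/react_gradio_hybrid.py
app.mount("/api", backend_app)  # backend routes appear under /api/*,
                                # e.g. /api/ws/conversation/{conversation_id}
```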

## Design Decisions to Make Early

1. **Where does the resource agent run?**
   - Decision (MVP): backend-side analysis function, so logic is not duplicated and can be reused by any UI.

2. **How do we store the transcript?**
   - Decision (MVP): in-memory per conversation (fast path), with a clear reset on new conversation.
   - Persistence (future): save transcript + analysis; start with a simple file/JSONL and move to a DB later (combined sketch after this list).

3. **How do we configure analysis vs conversation?**
   - Default: reuse `LLM_*` for model/provider and only vary prompt/context for the resource agent.
   - Optional (future): allow separate `RESOURCE_LLM_*` overrides to pick a different model/provider for analysis.

4. **How do we render partial results?**
   - Optional: stream analysis updates (later); the initial version can be single-shot.
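
A combined sketch for decisions 2 and 3 (the JSONL layout and helper names are assumptions; the env variable names come from this plan):

```python
# Sketch: future JSONL persistence (decision 2) and env fallback (decision 3).
import json
import os

def persist_analysis(conversation_id: str, transcript: list[dict], analysis: dict) -> None:
    # One appended record per analyzed conversation.
    record = {"conversation_id": conversation_id, "transcript": transcript, "analysis": analysis}
    with open("analyses.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def resource_llm_setting(name: str) -> str | None:
    # RESOURCE_LLM_<NAME> (future) falls back to the shared LLM_<NAME> value.
    return os.environ.get(f"RESOURCE_LLM_{name}", os.environ.get(f"LLM_{name}"))
```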

## Open Questions

- Do we want multiple “resource agent” passes (health vs care) or one combined prompt returning two sections?
- Should users be able to rerun analysis with different prompts/models from the config panel? (out of MVP scope)
- What’s the minimal on-disk persistence format we want first (JSON per conversation vs JSONL append)?
- Versioning (MVP): store `schema_version`, `analysis_prompt_version`, and `app_version` (git SHA) with each analysis record.

docs/hf.md
ADDED

# Hugging Face Spaces (Docker) — Deploy + Debug

This project is deployed as a Hugging Face Space using the Docker SDK.

## One-time setup (Space UI)

Space: `https://huggingface.co/spaces/MikelWL/ConverTA`

In Space → Settings → Variables and secrets:

**Secrets**
- `LLM_API_KEY`: OpenRouter API key

**Variables**
- `LLM_BACKEND`: `openrouter`
- `LLM_HOST`: `https://openrouter.ai/api/v1`
- `LLM_MODEL`: e.g. `google/gemini-3-flash-preview`
- `LLM_SITE_URL`: `https://huggingface.co/spaces/MikelWL/ConverTA` (optional)
- `LLM_APP_NAME`: `ConverTA` (optional)
- `FRONTEND_WEBSOCKET_URL`: `ws://127.0.0.1:7860/api/ws/conversation`
- `FRONTEND_BACKEND_BASE_URL`: `http://127.0.0.1:7860/api` (optional)

Restart the Space after changing secrets/variables.

## Local smoke test (HF-like)

Run the Docker image locally before pushing to HF:

```bash
./run_docker_local.sh
```

Then open `http://localhost:7860` and click **Start Conversation**.
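
`run_docker_local.sh` forwards a repo-root `.env` file to the container if one exists; an illustrative `.env` for local runs (values are examples, not defaults):

```bash
# .env: example values for the local smoke test (keep this file untracked)
LLM_BACKEND=openrouter
LLM_HOST=https://openrouter.ai/api/v1
LLM_MODEL=google/gemini-3-flash-preview
# Your OpenRouter key:
LLM_API_KEY=sk-or-...
FRONTEND_WEBSOCKET_URL=ws://127.0.0.1:7860/api/ws/conversation
```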

## Deploy (push to Space repo)

The Space is also configured as a git remote locally (`hf`).

```bash
git push hf main
```
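
If the `hf` remote is missing on a fresh clone, it can be added with the Space’s git URL:

```bash
git remote add hf https://huggingface.co/spaces/MikelWL/ConverTA
```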

If the Space repo ever gets reset/recreated and your push is rejected with “fetch first”, use:

```bash
git push --force hf main
```

## Troubleshooting

- **UI loads but QA Monitor shows “Failed to connect to backend”**
  - Ensure `FRONTEND_WEBSOCKET_URL` is set to `ws://127.0.0.1:7860/api/ws/conversation`.
- **Space crashes on startup**
  - Check Space → Logs for the Python traceback.
  - Confirm `PORT` is being respected (HF sets it automatically; we bind to `0.0.0.0:$PORT`; see the sketch after this list).
- **OpenRouter errors**
  - Confirm the `LLM_API_KEY` secret is set and `LLM_MODEL` is valid on OpenRouter.

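For reference, the `$PORT` binding pattern mentioned above looks roughly like this (the entrypoint module and app names are assumptions, not the actual code):

```python
# Sketch of PORT handling; the real entrypoint may differ.
import os

import uvicorn

if __name__ == "__main__":
    port = int(os.environ.get("PORT", "7860"))  # HF Spaces injects PORT
    uvicorn.run("frontend.react_gradio_hybrid:app", host="0.0.0.0", port=port)
```
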
run_docker_local.sh
ADDED

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

IMAGE_NAME="${IMAGE_NAME:-converta:local}"
HOST_PORT="${HOST_PORT:-7860}"
CONTAINER_PORT="${CONTAINER_PORT:-7860}"

# Resolve the repo root from this script's location.
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo "Building Docker image: ${IMAGE_NAME}"
docker build -t "${IMAGE_NAME}" "${ROOT_DIR}"

# Pass a repo-root .env to the container if one exists.
ENV_ARGS=()
if [[ -f "${ROOT_DIR}/.env" ]]; then
  ENV_ARGS+=(--env-file "${ROOT_DIR}/.env")
fi

echo "Running container on http://localhost:${HOST_PORT}"
# ${ENV_ARGS[@]+...} keeps `set -u` happy when the array is empty (pre-4.4 bash).
exec docker run --rm -it \
  -p "${HOST_PORT}:${CONTAINER_PORT}" \
  -e "PORT=${CONTAINER_PORT}" \
  ${ENV_ARGS[@]+"${ENV_ARGS[@]}"} \
  "${IMAGE_NAME}"
```
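
The defaults can be overridden per run via the environment variables the script reads:

```bash
HOST_PORT=8080 ./run_docker_local.sh     # serve on http://localhost:8080
IMAGE_NAME=converta:test ./run_docker_local.sh
```
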
start_hf_space.sh
CHANGED

File without changes