MikelWL committed
Commit d9fdc34 · 1 Parent(s): a4d5849

Add HF deploy docs and local Docker runner

Files changed (5)
  1. .gitignore +11 -1
  2. PLAN.md +101 -0
  3. docs/hf.md +58 -0
  4. run_docker_local.sh +24 -0
  5. start_hf_space.sh +0 -0
.gitignore CHANGED
@@ -1,2 +1,12 @@
  *.pyc
- .env
+ __pycache__/
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ .venv/
+
+ .DS_Store
+
+ .env
+ .env.*
+ !.env.example
PLAN.md ADDED
@@ -0,0 +1,101 @@
# PLAN.md — ConverTA Next Deliverables

This plan captures two deliverables requested after the PI demo. The current hosting target is Hugging Face Spaces (Docker).

## Deliverables

### 1) Configuration Panel (Personas + Prompt Tweaks)
**Goal:** Let a user select existing surveyor/patient personas and adjust the prompt/model parameters used to start a conversation.

**User-visible outcomes**
- UI panel to choose `surveyor_persona_id` and `patient_persona_id` from saved personas.
- Optional prompt overrides (system prompt additions/edits) and model settings (e.g., temperature, model id) that affect the conversation run.

**Primary risks**
- Keeping UI state in sync with backend state if a conversation is already running.
- Avoiding “configuration drift” between what the user selected and what was actually sent.

### 2) Resource Agent Panel (Post-Conversation Insights via Resource Agent Prompt)
**Goal (higher priority):** After a conversation completes, run a dedicated “resource agent” analysis (an additional LLM call) and render structured insights in the Resources panel.

**User-visible outcomes**
- Resources panel populates automatically at conversation end with:
  - Patient health situation(s) mentioned + supporting evidence snippets.
  - Care experience evaluation (good/bad/neutral) + reasons + evidence snippets.
- Clear status UI: `idle → running → complete` and error handling (see the sketch at the end of this section).

**Primary risks**
- Reliably detecting “conversation ended” (stop button, backend status, disconnect, timeout).
- Capturing a complete transcript (including persona metadata).
- Latency/cost of the extra LLM call and its failure modes.

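To keep that status progression honest, a tiny state machine is enough. A minimal sketch, assuming nothing about the real backend (`AnalysisStatus` and `advance` are hypothetical names, not existing code):

```python
from enum import Enum

class AnalysisStatus(str, Enum):
    """Lifecycle of the post-conversation resource-agent run."""
    IDLE = "idle"
    RUNNING = "running"
    COMPLETE = "complete"
    ERROR = "error"

# Legal transitions; anything else indicates a trigger bug (e.g. a double run).
_TRANSITIONS = {
    AnalysisStatus.IDLE: {AnalysisStatus.RUNNING},
    AnalysisStatus.RUNNING: {AnalysisStatus.COMPLETE, AnalysisStatus.ERROR},
}

def advance(current: AnalysisStatus, target: AnalysisStatus) -> AnalysisStatus:
    if target not in _TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```
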
32
+ ## Proposed Implementation Order
33
+
34
+ 1. **Deliverable #2 — Slice 1 (plumbing, end-to-end)**
35
+ 2. **Deliverable #1 — Minimal configuration UI (persona selection)**
36
+ 3. **Deliverable #2 — Slice 2 (quality, schema, robustness)**
37
+ 4. **Deliverable #1 — Advanced configuration (prompt edits + model params)**
38
+
39
+ Rationale: implement the highest-value path (#2) first but in thin slices, so we get a fast “works end-to-end” demo without blocking on perfect configuration UX.
40
+
41
+ ## Milestones and Acceptance Criteria
42
+
43
+ ### Milestone A — Transcript Capture + End-of-Conversation Trigger (for #2)
44
+ - Transcript stored per `conversation_id` with ordered messages and persona metadata.
45
+ - Trigger condition (MVP): run analysis on `conversation_status: completed` only.
46
+ - Transcript scope (MVP): include utterances only (no system prompts, no routing events).
47
+ - Acceptance: transcript matches the conversation shown in the Messages panel.
48
+
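A minimal sketch of that in-memory store; all names here are hypothetical, and the real message shape depends on the backend:

```python
from dataclasses import dataclass, field

@dataclass
class Utterance:
    speaker: str      # e.g. "surveyor" or "patient"
    persona_id: str   # persona metadata captured with every message
    text: str

@dataclass
class Transcript:
    conversation_id: str
    utterances: list[Utterance] = field(default_factory=list)  # insertion order

# In-memory store keyed by conversation_id (MVP fast path; reset per conversation).
_TRANSCRIPTS: dict[str, Transcript] = {}

def record(conversation_id: str, speaker: str, persona_id: str, text: str) -> None:
    t = _TRANSCRIPTS.setdefault(conversation_id, Transcript(conversation_id))
    t.utterances.append(Utterance(speaker, persona_id, text))
```
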
### Milestone B — Resource Agent LLM Call (for #2)
- On end-of-conversation, trigger the analysis request once per conversation (see the guard sketched below).
- Use the same underlying model as the conversation by default; the “resource agent” difference is the system prompt/context and desired output schema (not a different provider/model).
- Acceptance: Resources panel shows an analysis result for a completed conversation.

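The once-per-conversation rule is worth guarding explicitly, since end-of-conversation events can arrive twice (reconnects, races). A sketch building on the transcript store above; `run_resource_agent` is a hypothetical stand-in for the actual call:

```python
# Conversations whose analysis has already been triggered (run-once guard).
_ANALYZED: set[str] = set()

async def run_resource_agent(transcript: Transcript) -> None:
    """Stand-in for the single analysis LLM call (prompt differs, model does not)."""
    ...

async def on_conversation_status(conversation_id: str, status: str) -> None:
    # MVP trigger: fire only on "completed", and at most once per conversation.
    if status != "completed" or conversation_id in _ANALYZED:
        return
    _ANALYZED.add(conversation_id)
    transcript = _TRANSCRIPTS.get(conversation_id)
    if transcript is not None:
        await run_resource_agent(transcript)
```
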
### Milestone C — Structured Output + UI Rendering (for #2)
- Resource agent prompt requests a strict JSON schema (validated on receipt; sketched below).
- Evidence best practice (MVP): the model returns evidence pointers into the transcript and the app extracts exact evidence snippets programmatically.
- UI renders:
  - `health_situation`: list of items with `summary`, `evidence[]`, `confidence`.
  - `care_experience`: `sentiment`, `reasons[]`, `evidence[]`, `confidence`.
- Acceptance: output is stable across runs (no random placeholders), and errors are displayed without breaking the app.

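One way to pin the schema down is Pydantic, which a FastAPI stack already has. A sketch; only the field names in the bullets above come from the plan, everything else is an assumption:

```python
from typing import Literal

from pydantic import BaseModel

class Evidence(BaseModel):
    utterance_index: int  # pointer into the transcript, returned by the model
    quote: str = ""       # exact snippet, filled in by the app, not the model

class HealthSituation(BaseModel):
    summary: str
    evidence: list[Evidence]
    confidence: float

class CareExperience(BaseModel):
    sentiment: Literal["good", "bad", "neutral"]
    reasons: list[str]
    evidence: list[Evidence]
    confidence: float

class AnalysisResult(BaseModel):
    health_situation: list[HealthSituation]
    care_experience: CareExperience

def attach_quotes(result: AnalysisResult, utterances: list[str]) -> AnalysisResult:
    # The model only returns indices; the app extracts the exact snippets itself,
    # so a quoted evidence string can never be hallucinated.
    for item in result.health_situation:
        for ev in item.evidence:
            ev.quote = utterances[ev.utterance_index]
    for ev in result.care_experience.evidence:
        ev.quote = utterances[ev.utterance_index]
    return result
```

Validation on receipt is then `AnalysisResult.model_validate_json(raw)` (Pydantic v2), with a rendered error state on failure instead of a crash.
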
### Milestone D — Minimal Configuration Panel (for #1)
- UI fetches available personas and allows selecting:
  - surveyor persona
  - patient persona
- Selection affects the next “Start Conversation” payload (see the request sketch after Milestone E).
- Acceptance: changing the persona selection changes the personas used in the conversation.

### Milestone E — Prompt/Model Overrides (for #1)
- UI supports optional overrides:
  - per-agent prompt additions
  - model id, temperature, max tokens (as supported)
- Acceptance: overrides are visible in logs and reflected in the conversation behavior.

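Both milestones can share one request model, with the Milestone E fields optional so the minimal panel still works. A sketch; only the two persona ids appear in the plan, the override fields are illustrative:

```python
from pydantic import BaseModel

class StartConversationRequest(BaseModel):
    surveyor_persona_id: str
    patient_persona_id: str
    # Milestone E extras: all optional, so the Milestone D panel needs none of them.
    prompt_additions: dict[str, str] | None = None  # agent name -> extra system text
    model_id: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
```

Logging the parsed model verbatim when a conversation starts covers the “visible in logs” acceptance check and doubles as a guard against the configuration drift flagged under Deliverable #1.
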
## Technical Notes (Current Stack Reality)

- The deployed Space runs a single FastAPI server (`frontend/react_gradio_hybrid.py`) and mounts the backend under `/api`.
- Frontend communicates via `/ws/frontend/{conversation_id}` and bridges to `/api/ws/conversation/{conversation_id}` using `WebSocketManager` (a generic bridge sketch follows).

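For orientation, the general shape of such a bridge is below. This is a generic sketch, not the project's `WebSocketManager`; it assumes the `websockets` client library and a text-only protocol:

```python
import asyncio

import websockets
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/frontend/{conversation_id}")
async def frontend_ws(ws: WebSocket, conversation_id: str) -> None:
    await ws.accept()
    backend_url = f"ws://127.0.0.1:7860/api/ws/conversation/{conversation_id}"
    async with websockets.connect(backend_url) as backend:
        async def up() -> None:    # browser -> backend
            while True:
                await backend.send(await ws.receive_text())
        async def down() -> None:  # backend -> browser
            while True:
                await ws.send_text(await backend.recv())
        # A disconnect on either side raises out of gather; leaving the
        # async-with block then closes the backend socket as well.
        await asyncio.gather(up(), down())
```
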
## Design Decisions to Make Early

1. **Where does the resource agent run?**
   - Decision (MVP): backend-side analysis function, so the logic is not duplicated and can be reused by any UI.

2. **How do we store the transcript?**
   - Decision (MVP): in-memory per conversation (fast path), with a clear reset on each new conversation.
   - Persistence (future): save transcript + analysis; start with a simple file/JSONL and move to a DB later.

3. **How do we configure analysis vs. conversation?**
   - Default: reuse `LLM_*` for model/provider and vary only the prompt/context for the resource agent (see the sketch below).
   - Optional (future): allow separate `RESOURCE_LLM_*` overrides to pick a different model/provider for analysis.

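The fallback rule stays small. A sketch; the `RESOURCE_LLM_*` names are the future option named above, not something that exists yet:

```python
import os

def analysis_setting(name: str) -> str | None:
    # Analysis reuses the conversation's LLM_* settings by default; a
    # RESOURCE_LLM_* variable, if set, overrides it for the analysis call only.
    return os.environ.get(f"RESOURCE_LLM_{name}", os.environ.get(f"LLM_{name}"))

# analysis_setting("MODEL") -> RESOURCE_LLM_MODEL if set, else LLM_MODEL
```
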
4. **How do we render partial results?**
   - Optional: stream analysis updates (later); the initial version can be single-shot.

## Open Questions

- Do we want multiple “resource agent” passes (health vs. care) or one combined prompt returning two sections?
- Should users be able to rerun analysis with different prompts/models from the config panel? (Out of MVP scope.)
- What’s the minimal on-disk persistence format we want first (JSON per conversation vs. JSONL append)?
- Versioning (MVP): store `schema_version`, `analysis_prompt_version`, and `app_version` (git SHA) with each analysis record (see the sketch below).
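If JSONL append wins, a record carrying those version fields might look like this sketch (the file name and version values are placeholders):

```python
import json
import subprocess
import time

def analysis_record(conversation_id: str, analysis: dict) -> str:
    """Serialize one analysis as a single JSONL line with version metadata."""
    app_version = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return json.dumps({
        "conversation_id": conversation_id,
        "created_at": time.time(),
        "schema_version": 1,
        "analysis_prompt_version": 1,
        "app_version": app_version,
        "analysis": analysis,
    })

# Append-only writes stay trivial:
# with open("analyses.jsonl", "a") as f:
#     f.write(analysis_record(cid, data) + "\n")
```
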
docs/hf.md ADDED
@@ -0,0 +1,58 @@
# Hugging Face Spaces (Docker) — Deploy + Debug

This project is deployed as a Hugging Face Space using the Docker SDK.

## One-time setup (Space UI)

Space: `https://huggingface.co/spaces/MikelWL/ConverTA`

In Space → Settings → Variables and secrets:

**Secrets**
- `LLM_API_KEY`: OpenRouter API key

**Variables**
- `LLM_BACKEND`: `openrouter`
- `LLM_HOST`: `https://openrouter.ai/api/v1`
- `LLM_MODEL`: e.g. `google/gemini-3-flash-preview`
- `LLM_SITE_URL`: `https://huggingface.co/spaces/MikelWL/ConverTA` (optional)
- `LLM_APP_NAME`: `ConverTA` (optional)
- `FRONTEND_WEBSOCKET_URL`: `ws://127.0.0.1:7860/api/ws/conversation`
- `FRONTEND_BACKEND_BASE_URL`: `http://127.0.0.1:7860/api` (optional)

Restart the Space after changing secrets/variables.

## Local smoke test (HF-like)

Run the Docker image locally before pushing to HF:

```bash
./run_docker_local.sh
```

Then open `http://localhost:7860` and click **Start Conversation**.

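For a scripted check instead of the browser, a short poll works; this assumes the root page answers HTTP 200 once the app is up:

```python
import time
import urllib.request

# Retry for up to ~60s while the container starts.
for _ in range(30):
    try:
        with urllib.request.urlopen("http://localhost:7860/", timeout=2) as resp:
            print("up:", resp.status)
            break
    except OSError:
        time.sleep(2)
else:
    raise SystemExit("app never became reachable on :7860")
```
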
## Deploy (push to Space repo)

The Space is also configured as a local git remote (`hf`).

```bash
git push hf main
```

If the Space repo ever gets reset/recreated and your push is rejected with “fetch first”, use:

```bash
git push --force hf main
```

## Troubleshooting

- **UI loads but QA Monitor shows “Failed to connect to backend”**
  - Ensure `FRONTEND_WEBSOCKET_URL` is set to `ws://127.0.0.1:7860/api/ws/conversation`.
- **Space crashes on startup**
  - Check Space → Logs for the Python traceback.
  - Confirm `PORT` is being respected (HF sets it automatically; we bind to `0.0.0.0:$PORT`).
- **OpenRouter errors**
  - Confirm the `LLM_API_KEY` secret is set and `LLM_MODEL` is valid on OpenRouter.

run_docker_local.sh ADDED
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -Eeuo pipefail

IMAGE_NAME="${IMAGE_NAME:-converta:local}"
HOST_PORT="${HOST_PORT:-7860}"
CONTAINER_PORT="${CONTAINER_PORT:-7860}"

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo "Building Docker image: ${IMAGE_NAME}"
docker build -t "${IMAGE_NAME}" "${ROOT_DIR}"

ENV_ARGS=()
if [[ -f "${ROOT_DIR}/.env" ]]; then
  ENV_ARGS+=(--env-file "${ROOT_DIR}/.env")
fi

echo "Running container on http://localhost:${HOST_PORT}"
exec docker run --rm -it \
  -p "${HOST_PORT}:${CONTAINER_PORT}" \
  -e "PORT=${CONTAINER_PORT}" \
  "${ENV_ARGS[@]}" \
  "${IMAGE_NAME}"

start_hf_space.sh CHANGED
File without changes