Spaces:

CAPS-IDI
/

caps-chatbot-internal

Sleeping

App Files Files Community

caps-chatbot-internal / README.md

atwine

Switch LLM from Gemini to Llama-3.2-3B-Instruct via HF Inference API

a937e6c 14 days ago

preview code

raw

history blame contribute delete

5.27 kB

	---
	title: Caps Chatbot Internal
	emoji: 💬
	colorFrom: yellow
	colorTo: purple
	sdk: gradio
	sdk_version: 6.5.1
	app_file: app.py
	pinned: false
	hf_oauth: true
	hf_oauth_scopes:
	- inference-api
	license: apache-2.0
	short_description: CAPS Chatbot — Internal Review Portal Co-designed AI peer su
	---

	# CAPS Chatbot — Sanyu (Internal Review Portal)

	> Co-designed AI peer support for adolescents and young people living with HIV \| Expert safety review — not for clinical use.

	---

	## What this project is

	Sanyu is a co-designed AI peer support chatbot for adolescents and young people living with HIV (AYPLHIV) aged 15–24 in Uganda. Built by CAPS-IDI, this is an internal review/prototype portal — not yet approved for clinical or public use.

	---

	## Tech Stack

	\| Layer \| Choice \|
	\|---\|---\|
	\| Frontend/UI \| Gradio (`gr.ChatInterface`) \|
	\| LLM \| Google Gemini 2.5 Flash via `google-genai` SDK \|
	\| Auth \| Hugging Face OAuth (`hf_oauth: true`) \|
	\| Hosting \| Hugging Face Spaces (Gradio SDK) \|
	\| Python deps \| `gradio>=4.0.0`, `google-genai` \|

	---

	## How the App Works

	1. `META_PROMPT` — A detailed (~370-line) system prompt defining Sanyu's persona, tone, content knowledge, and behavioral rules.
	2. `extract_text(content)` — Utility to handle both plain strings and Gradio's structured `[{"type": "text", ...}]` message format.
	3. `respond(message, history)` — The chat handler. Converts Gradio's history (supports both dict-format and tuple-format) into Gemini `types.Content` objects, appends the new user message, then calls `client.models.generate_content()` with the system prompt injected via `GenerateContentConfig`.
	4. `gr.ChatInterface` — Wraps `respond` into a simple web UI with title and description.
	5. API key — Loaded from the `GOOGLE_API_KEY` environment variable (set as a Hugging Face Space secret).

	---

	## System Prompt Design

	The `META_PROMPT` is the intellectual core of the project. It was co-designed with AYPLHIV and health workers through modified Delphi consensus workshops. It encodes:

	### 12-Dimension Voice Matrix
	1. Empathy & Understanding First — acknowledge emotions before giving information
	2. Non-Judgmental Language — no blame, no "why didn't you…"
	3. User Agency — present options, not directives; user is the decision-maker
	4. Patience / No Time Pressure — never rush; let the user lead the pace
	5. Concise by Default — 2–4 sentences; no walls of text
	6. Warm but Not Frivolous — peer-like language, match the user's energy
	7. Empowerment & Capacity Building — build confidence and self-advocacy over time
	8. Comfort & Reassurance — affirming, hopeful, counter internalized stigma
	9. Structured Guidance When Requested — numbered steps for "how do I…" questions
	10. Evidence-Based with Conversational Delivery — factual but accessible; Uganda-specific context
	11. Progressive / Realistic Goals — graduated steps, not all-or-nothing advice
	12. Storytelling as Support Tool — anonymised vignettes to illustrate how others cope

	### Content Domains
	- Medication adherence — barriers, practical strategies, non-shaming approach
	- Disclosure strategies — multiple approaches, user-led, safety-first
	- Mental health & self-stigma — normalisation, affirmations, self-acceptance
	- Sexual & reproductive health — contraception, STIs, pregnancy, SRH rights
	- Relationships — romantic partners, family, peer dynamics
	- GBV safety protocols — crisis detection, escalation triggers, referral pathways

	### Safety & Limits
	- Hard boundary: no medical prescriptions
	- Crisis triggers (suicidal ideation, active abuse, safety risk) → immediate escalation prompt
	- Always refers complex/crisis cases to human counsellors and peer supporters
	- "Referral is a feature, not a failure."

	### Language & Accessibility
	- Default: English; Luganda code-switching accepted
	- Plain language targeting ~8 years of education
	- Age-adapted: different tone/content for 14–17 vs 18–24 year olds
	- Few-shot examples from real counselling dialogues embedded in the prompt

	---

	## Known Limitations

	\| Issue \| Detail \|
	\|---\|---\|
	\| No persistent memory \| The prompt requires remembering users across sessions, but there is no database or session storage — memory only lasts within a single Gradio session \|
	\| No streaming \| `generate_content()` is synchronous — users see nothing until the full response is ready \|
	\| No error handling \| Unhandled exceptions if the Gemini API fails (rate limit, network error, etc.) \|

	---

	## Setup

	1. Add your `HF_TOKEN` as a Hugging Face Space secret (Settings -> Variables and secrets).
	- Generate a token at https://huggingface.co/settings/tokens (read access is sufficient)
	- This is required to use the HF Inference API without rate limits
	2. Dependencies are in `requirements.txt`:
	```
	gradio>=4.0.0
	huggingface_hub>=0.33.0
	sentence-transformers>=2.7.0
	faiss-cpu>=1.8.0
	pdfplumber>=0.11.0
	```
	3. The Space will auto-launch `app.py` on startup.

	## LLM

	The app uses `meta-llama/Llama-3.2-3B-Instruct` via the Hugging Face Inference API (serverless).
	No GPU required — inference runs on HF hosted infrastructure.