---
title: Caps Chatbot Internal
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- inference-api
license: apache-2.0
short_description: CAPS Chatbot — Internal Review Portal Co-designed AI peer su
---

# CAPS Chatbot — Sanyu (Internal Review Portal)

> Co-designed AI peer support for adolescents and young people living with HIV | Expert safety review — not for clinical use.

---

## What this project is

**Sanyu** is a co-designed AI peer support chatbot for adolescents and young people living with HIV (AYPLHIV) aged 15–24 in Uganda. Built by **CAPS-IDI**, this is an internal review/prototype portal — not yet approved for clinical or public use.

---

## Tech Stack

| Layer | Choice |
|---|---|
| Frontend/UI | Gradio (`gr.ChatInterface`) |
| LLM | Google Gemini 2.5 Flash via `google-genai` SDK |
| Auth | Hugging Face OAuth (`hf_oauth: true`) |
| Hosting | Hugging Face Spaces (Gradio SDK) |
| Python deps | `gradio>=4.0.0`, `google-genai` |

---

## How the App Works

1. **`META_PROMPT`** — A detailed (~370-line) system prompt defining Sanyu's persona, tone, content knowledge, and behavioral rules.
2. **`extract_text(content)`** — Utility to handle both plain strings and Gradio's structured `[{"type": "text", ...}]` message format.
3. **`respond(message, history)`** — The chat handler. Converts Gradio's history (supports both dict-format and tuple-format) into Gemini `types.Content` objects, appends the new user message, then calls `client.models.generate_content()` with the system prompt injected via `GenerateContentConfig`.
4. **`gr.ChatInterface`** — Wraps `respond` into a simple web UI with title and description.
5. **API key** — Loaded from the `GOOGLE_API_KEY` environment variable (set as a Hugging Face Space secret).

---

## System Prompt Design

The `META_PROMPT` is the intellectual core of the project. It was co-designed with AYPLHIV and health workers through modified Delphi consensus workshops. It encodes:

### 12-Dimension Voice Matrix
1. **Empathy & Understanding First** — acknowledge emotions before giving information
2. **Non-Judgmental Language** — no blame, no "why didn't you…"
3. **User Agency** — present options, not directives; user is the decision-maker
4. **Patience / No Time Pressure** — never rush; let the user lead the pace
5. **Concise by Default** — 2–4 sentences; no walls of text
6. **Warm but Not Frivolous** — peer-like language, match the user's energy
7. **Empowerment & Capacity Building** — build confidence and self-advocacy over time
8. **Comfort & Reassurance** — affirming, hopeful, counter internalized stigma
9. **Structured Guidance When Requested** — numbered steps for "how do I…" questions
10. **Evidence-Based with Conversational Delivery** — factual but accessible; Uganda-specific context
11. **Progressive / Realistic Goals** — graduated steps, not all-or-nothing advice
12. **Storytelling as Support Tool** — anonymised vignettes to illustrate how others cope

### Content Domains
- **Medication adherence** — barriers, practical strategies, non-shaming approach
- **Disclosure strategies** — multiple approaches, user-led, safety-first
- **Mental health & self-stigma** — normalisation, affirmations, self-acceptance
- **Sexual & reproductive health** — contraception, STIs, pregnancy, SRH rights
- **Relationships** — romantic partners, family, peer dynamics
- **GBV safety protocols** — crisis detection, escalation triggers, referral pathways

### Safety & Limits
- Hard boundary: **no medical prescriptions**
- Crisis triggers (suicidal ideation, active abuse, safety risk) → immediate escalation prompt
- Always refers complex/crisis cases to human counsellors and peer supporters
- "Referral is a feature, not a failure."

### Language & Accessibility
- Default: English; Luganda code-switching accepted
- Plain language targeting ~8 years of education
- Age-adapted: different tone/content for 14–17 vs 18–24 year olds
- Few-shot examples from real counselling dialogues embedded in the prompt

---

## Known Limitations

| Issue | Detail |
|---|---|
| No persistent memory | The prompt requires remembering users across sessions, but there is no database or session storage — memory only lasts within a single Gradio session |
| No streaming | `generate_content()` is synchronous — users see nothing until the full response is ready |
| No error handling | Unhandled exceptions if the Gemini API fails (rate limit, network error, etc.) |

---

## Setup

1. Add your `HF_TOKEN` as a Hugging Face Space secret (Settings -> Variables and secrets).
   - Generate a token at https://huggingface.co/settings/tokens (read access is sufficient)
   - This is required to use the HF Inference API without rate limits
2. Dependencies are in `requirements.txt`:
   ```
   gradio>=4.0.0
   huggingface_hub>=0.33.0
   sentence-transformers>=2.7.0
   faiss-cpu>=1.8.0
   pdfplumber>=0.11.0
   ```
3. The Space will auto-launch `app.py` on startup.

## LLM

The app uses **`meta-llama/Llama-3.2-3B-Instruct`** via the Hugging Face Inference API (serverless).
No GPU required — inference runs on HF hosted infrastructure.