atwine's picture
Switch LLM from Gemini to Llama-3.2-3B-Instruct via HF Inference API
a937e6c
---
title: Caps Chatbot Internal
emoji: πŸ’¬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- inference-api
license: apache-2.0
short_description: CAPS Chatbot β€” Internal Review Portal Co-designed AI peer su
---
# CAPS Chatbot β€” Sanyu (Internal Review Portal)
> Co-designed AI peer support for adolescents and young people living with HIV | Expert safety review β€” not for clinical use.
---
## What this project is
**Sanyu** is a co-designed AI peer support chatbot for adolescents and young people living with HIV (AYPLHIV) aged 15–24 in Uganda. Built by **CAPS-IDI**, this is an internal review/prototype portal β€” not yet approved for clinical or public use.
---
## Tech Stack
| Layer | Choice |
|---|---|
| Frontend/UI | Gradio (`gr.ChatInterface`) |
| LLM | Google Gemini 2.5 Flash via `google-genai` SDK |
| Auth | Hugging Face OAuth (`hf_oauth: true`) |
| Hosting | Hugging Face Spaces (Gradio SDK) |
| Python deps | `gradio>=4.0.0`, `google-genai` |
---
## How the App Works
1. **`META_PROMPT`** β€” A detailed (~370-line) system prompt defining Sanyu's persona, tone, content knowledge, and behavioral rules.
2. **`extract_text(content)`** β€” Utility to handle both plain strings and Gradio's structured `[{"type": "text", ...}]` message format.
3. **`respond(message, history)`** β€” The chat handler. Converts Gradio's history (supports both dict-format and tuple-format) into Gemini `types.Content` objects, appends the new user message, then calls `client.models.generate_content()` with the system prompt injected via `GenerateContentConfig`.
4. **`gr.ChatInterface`** β€” Wraps `respond` into a simple web UI with title and description.
5. **API key** β€” Loaded from the `GOOGLE_API_KEY` environment variable (set as a Hugging Face Space secret).
---
## System Prompt Design
The `META_PROMPT` is the intellectual core of the project. It was co-designed with AYPLHIV and health workers through modified Delphi consensus workshops. It encodes:
### 12-Dimension Voice Matrix
1. **Empathy & Understanding First** β€” acknowledge emotions before giving information
2. **Non-Judgmental Language** β€” no blame, no "why didn't you…"
3. **User Agency** β€” present options, not directives; user is the decision-maker
4. **Patience / No Time Pressure** β€” never rush; let the user lead the pace
5. **Concise by Default** β€” 2–4 sentences; no walls of text
6. **Warm but Not Frivolous** β€” peer-like language, match the user's energy
7. **Empowerment & Capacity Building** β€” build confidence and self-advocacy over time
8. **Comfort & Reassurance** β€” affirming, hopeful, counter internalized stigma
9. **Structured Guidance When Requested** β€” numbered steps for "how do I…" questions
10. **Evidence-Based with Conversational Delivery** β€” factual but accessible; Uganda-specific context
11. **Progressive / Realistic Goals** β€” graduated steps, not all-or-nothing advice
12. **Storytelling as Support Tool** β€” anonymised vignettes to illustrate how others cope
### Content Domains
- **Medication adherence** β€” barriers, practical strategies, non-shaming approach
- **Disclosure strategies** β€” multiple approaches, user-led, safety-first
- **Mental health & self-stigma** β€” normalisation, affirmations, self-acceptance
- **Sexual & reproductive health** β€” contraception, STIs, pregnancy, SRH rights
- **Relationships** β€” romantic partners, family, peer dynamics
- **GBV safety protocols** β€” crisis detection, escalation triggers, referral pathways
### Safety & Limits
- Hard boundary: **no medical prescriptions**
- Crisis triggers (suicidal ideation, active abuse, safety risk) β†’ immediate escalation prompt
- Always refers complex/crisis cases to human counsellors and peer supporters
- "Referral is a feature, not a failure."
### Language & Accessibility
- Default: English; Luganda code-switching accepted
- Plain language targeting ~8 years of education
- Age-adapted: different tone/content for 14–17 vs 18–24 year olds
- Few-shot examples from real counselling dialogues embedded in the prompt
---
## Known Limitations
| Issue | Detail |
|---|---|
| No persistent memory | The prompt requires remembering users across sessions, but there is no database or session storage β€” memory only lasts within a single Gradio session |
| No streaming | `generate_content()` is synchronous β€” users see nothing until the full response is ready |
| No error handling | Unhandled exceptions if the Gemini API fails (rate limit, network error, etc.) |
---
## Setup
1. Add your `HF_TOKEN` as a Hugging Face Space secret (Settings -> Variables and secrets).
- Generate a token at https://huggingface.co/settings/tokens (read access is sufficient)
- This is required to use the HF Inference API without rate limits
2. Dependencies are in `requirements.txt`:
```
gradio>=4.0.0
huggingface_hub>=0.33.0
sentence-transformers>=2.7.0
faiss-cpu>=1.8.0
pdfplumber>=0.11.0
```
3. The Space will auto-launch `app.py` on startup.
## LLM
The app uses **`meta-llama/Llama-3.2-3B-Instruct`** via the Hugging Face Inference API (serverless).
No GPU required β€” inference runs on HF hosted infrastructure.