atwine's picture
Switch LLM from Gemini to Llama-3.2-3B-Instruct via HF Inference API
a937e6c

A newer version of the Gradio SDK is available: 6.16.0

Upgrade
metadata
title: Caps Chatbot Internal
emoji: πŸ’¬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
  - inference-api
license: apache-2.0
short_description: CAPS Chatbot β€” Internal Review Portal Co-designed AI peer su

CAPS Chatbot β€” Sanyu (Internal Review Portal)

Co-designed AI peer support for adolescents and young people living with HIV | Expert safety review β€” not for clinical use.


What this project is

Sanyu is a co-designed AI peer support chatbot for adolescents and young people living with HIV (AYPLHIV) aged 15–24 in Uganda. Built by CAPS-IDI, this is an internal review/prototype portal β€” not yet approved for clinical or public use.


Tech Stack

Layer Choice
Frontend/UI Gradio (gr.ChatInterface)
LLM Google Gemini 2.5 Flash via google-genai SDK
Auth Hugging Face OAuth (hf_oauth: true)
Hosting Hugging Face Spaces (Gradio SDK)
Python deps gradio>=4.0.0, google-genai

How the App Works

  1. META_PROMPT β€” A detailed (~370-line) system prompt defining Sanyu's persona, tone, content knowledge, and behavioral rules.
  2. extract_text(content) β€” Utility to handle both plain strings and Gradio's structured [{"type": "text", ...}] message format.
  3. respond(message, history) β€” The chat handler. Converts Gradio's history (supports both dict-format and tuple-format) into Gemini types.Content objects, appends the new user message, then calls client.models.generate_content() with the system prompt injected via GenerateContentConfig.
  4. gr.ChatInterface β€” Wraps respond into a simple web UI with title and description.
  5. API key β€” Loaded from the GOOGLE_API_KEY environment variable (set as a Hugging Face Space secret).

System Prompt Design

The META_PROMPT is the intellectual core of the project. It was co-designed with AYPLHIV and health workers through modified Delphi consensus workshops. It encodes:

12-Dimension Voice Matrix

  1. Empathy & Understanding First β€” acknowledge emotions before giving information
  2. Non-Judgmental Language β€” no blame, no "why didn't you…"
  3. User Agency β€” present options, not directives; user is the decision-maker
  4. Patience / No Time Pressure β€” never rush; let the user lead the pace
  5. Concise by Default β€” 2–4 sentences; no walls of text
  6. Warm but Not Frivolous β€” peer-like language, match the user's energy
  7. Empowerment & Capacity Building β€” build confidence and self-advocacy over time
  8. Comfort & Reassurance β€” affirming, hopeful, counter internalized stigma
  9. Structured Guidance When Requested β€” numbered steps for "how do I…" questions
  10. Evidence-Based with Conversational Delivery β€” factual but accessible; Uganda-specific context
  11. Progressive / Realistic Goals β€” graduated steps, not all-or-nothing advice
  12. Storytelling as Support Tool β€” anonymised vignettes to illustrate how others cope

Content Domains

  • Medication adherence β€” barriers, practical strategies, non-shaming approach
  • Disclosure strategies β€” multiple approaches, user-led, safety-first
  • Mental health & self-stigma β€” normalisation, affirmations, self-acceptance
  • Sexual & reproductive health β€” contraception, STIs, pregnancy, SRH rights
  • Relationships β€” romantic partners, family, peer dynamics
  • GBV safety protocols β€” crisis detection, escalation triggers, referral pathways

Safety & Limits

  • Hard boundary: no medical prescriptions
  • Crisis triggers (suicidal ideation, active abuse, safety risk) β†’ immediate escalation prompt
  • Always refers complex/crisis cases to human counsellors and peer supporters
  • "Referral is a feature, not a failure."

Language & Accessibility

  • Default: English; Luganda code-switching accepted
  • Plain language targeting ~8 years of education
  • Age-adapted: different tone/content for 14–17 vs 18–24 year olds
  • Few-shot examples from real counselling dialogues embedded in the prompt

Known Limitations

Issue Detail
No persistent memory The prompt requires remembering users across sessions, but there is no database or session storage β€” memory only lasts within a single Gradio session
No streaming generate_content() is synchronous β€” users see nothing until the full response is ready
No error handling Unhandled exceptions if the Gemini API fails (rate limit, network error, etc.)

Setup

  1. Add your HF_TOKEN as a Hugging Face Space secret (Settings -> Variables and secrets).
  2. Dependencies are in requirements.txt:
    gradio>=4.0.0
    huggingface_hub>=0.33.0
    sentence-transformers>=2.7.0
    faiss-cpu>=1.8.0
    pdfplumber>=0.11.0
    
  3. The Space will auto-launch app.py on startup.

LLM

The app uses meta-llama/Llama-3.2-3B-Instruct via the Hugging Face Inference API (serverless). No GPU required β€” inference runs on HF hosted infrastructure.