uhc-policy-chatbot / README.md
mxp1404's picture
Upload README.md with huggingface_hub
abdbe55 verified
---
title: UHC Medical Policy Chatbot
emoji: πŸ₯
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---
# UHC Medical Policy Chatbot
A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.
## Hosted Chatbot
**URL:** [https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot](https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot)
### How to Use β€” Step-by-Step
1. Open the link above in your browser.
2. Wait for the model to load (first visit takes ~30 seconds for MedEmbed to initialize).
3. Type your question in the chat input at the bottom β€” for example:
- *"Is bariatric surgery covered for BMI over 40?"*
- *"What documentation is needed for gender-affirming surgery?"*
- *"Are intrapulmonary percussive ventilation devices covered for home use?"*
4. The chatbot will search relevant policy chunks, then stream an answer with citations.
5. Click **"πŸ“š Sources"** below each answer to see the exact policy sections used.
6. Enable **"πŸ”Š Read answers aloud"** in the sidebar to hear answers via Kokoro TTS.
7. Use **"πŸ—‘οΈ Clear conversation"** in the sidebar to start a new session.
The chatbot only answers from official UHC policy documents β€” it will tell you if it doesn't have enough information rather than guessing.
---
## Architecture
### High-Level Design (HLD)
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Browser │────▢│ Streamlit App (HuggingFace Spaces) β”‚
β”‚ (User) │◀────│ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ MedEmbed β”‚ β”‚ Groq API β”‚ β”‚
β”‚ β”‚ (1024-dim) β”‚ β”‚ Llama 3.1 8B β”‚ β”‚
β”‚ β”‚ cached RAM β”‚ β”‚ 560 tok/s β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β–Ό β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” context + query β”‚
β”‚ β”‚ Qdrant Cloudβ”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ (vectors) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
**Data flow for each query:**
1. User types a question in the Streamlit chat interface
2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once, cached in memory)
3. The vector is sent to **Qdrant Cloud** for similarity search β€” returns top-K policy chunks with metadata
4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block
5. The context + query + system prompt are sent to **Groq API** (Llama 3.1 8B) for answer generation
6. The response is streamed token-by-token back to the user with source citations
7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in-browser
### Low-Level Design (LLD)
#### Project Structure
```
uhc/
β”œβ”€β”€ app.py # Streamlit web UI entry point
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ .env.example # Environment variable template
β”‚
β”œβ”€β”€ chatbot/ # Chatbot application layer
β”‚ β”œβ”€β”€ config.py # Centralized config (LLM, retrieval, env vars)
β”‚ β”œβ”€β”€ retriever.py # PolicyRetriever: MedEmbed + Qdrant wrapper
β”‚ β”œβ”€β”€ llm_groq.py # Groq API client (deployed)
β”‚ β”œβ”€β”€ llm.py # Ollama client (local dev)
β”‚ β”œβ”€β”€ prompts.py # System prompt, context formatting, deduplication
β”‚ β”œβ”€β”€ tts.py # Kokoro ONNX text-to-speech
β”‚ └── cli.py # CLI interface (local dev)
β”‚
β”œβ”€β”€ embedding/ # Embedding pipeline
β”‚ └── scripts/
β”‚ β”œβ”€β”€ config.py # Embedding model + Qdrant connection config
β”‚ β”œβ”€β”€ embed_chunks.py # Generate embeddings from RAG chunks
β”‚ β”œβ”€β”€ store_qdrant.py # Upsert embeddings into Qdrant with payload indexes
β”‚ β”œβ”€β”€ search.py # Standalone search CLI for testing
β”‚ └── test_retrieval.py # Batch retrieval evaluation (10 test cases)
β”‚
β”œβ”€β”€ tests/ # Evaluation suite
β”‚ └── eval_100.py # 100-prompt retrieval + LLM evaluation
β”‚
└── scraper/ # Data ingestion pipeline
β”œβ”€β”€ download_policies.py # Scrape PDFs from UHC website
β”œβ”€β”€ extract_pdf_text.py # PDF β†’ structured sections with metadata
β”œβ”€β”€ create_rag_chunks.py # Section-aware semantic chunking
└── data/processed/
β”œβ”€β”€ extracted_sections.json # Extracted text per policy/section
└── rag_chunks.json # Final RAG chunks with metadata
```
#### Module Design
**`chatbot/retriever.py` β€” PolicyRetriever**
- Loads `abhinand/MedEmbed-large-v0.1` (1024-dim medical embeddings) via `sentence-transformers`
- Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries β†’ cosine similarity search β†’ returns `ChunkResult` dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors
**`chatbot/prompts.py` β€” Prompt Engineering**
- System prompt enforces: answer from context only, 2–4 bullet points, cite sources, coverage-awareness
- `deduplicate_chunks()` keeps highest-scoring chunk per (policy, section) pair
- `format_context()` truncates each chunk to 800 chars at sentence boundaries, caps total at 6000 chars
- Coverage Rationale is explicitly marked as authoritative for coverage decisions
**`chatbot/llm_groq.py` β€” GroqClient**
- Uses `groq` Python SDK with streaming chat completions
- Graceful rate-limit handling (Groq free tier: 250K TPM)
- Same `chat_stream()` / `chat()` interface as the Ollama client for interchangeability
**`chatbot/tts.py` β€” Text-to-Speech**
- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M parameter model, ~300MB)
- Auto-downloads model files from HuggingFace Hub on first use
- Generates WAV audio from LLM response text, played in-browser via `st.audio`
- Toggleable via sidebar switch β€” disabled by default to save resources
**`scraper/extract_pdf_text.py` β€” PDF Extraction**
- Paragraph-level extraction using `pdfplumber` (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns
- Structured metadata parsing: policy number, effective date, plan type, document type
- Table extraction support; skips boilerplate sections and HTML-disguised files
**`scraper/create_rag_chunks.py` β€” Semantic Chunking**
- Section-aware chunking: different strategies per section type
- Coverage Rationale β†’ criteria-based splitting
- Applicable Codes β†’ table-aware chunking
- Clinical Evidence β†’ study-based splitting
- Others β†’ paragraph-aware with sentence-boundary overlap
- Rich metadata per chunk: policy name, section, plan type, page range, provider
- Deterministic chunk IDs for deduplication during re-indexing
**`embedding/scripts/embed_chunks.py` β€” Embedding Generation**
- Prepends metadata to chunk text before encoding for better retrieval
- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
- Saves to `.npz` for efficient storage and reloading
**`embedding/scripts/store_qdrant.py` β€” Vector Storage**
- Creates Qdrant collection with cosine distance
- Upserts embeddings with full metadata payloads
- Creates payload indexes on `section`, `policy_name`, `plan_type`, `doc_type`, `provider` for efficient filtered search
#### Edge Cases Handled
| Edge Case | Handling |
|---|---|
| Empty / whitespace query | Warning message, no API call |
| Qdrant connection failure | Retry with exponential backoff (3 attempts) |
| Groq rate limit (429) | Caught and shown as user-friendly message |
| No relevant chunks found | "I don't have enough policy information" |
| Coverage vs. evidence conflict | System prompt + Coverage Rationale boost ensures correct answer |
| Very long conversation | History trimmed to last 3 turns |
| Model loading on first visit | Spinner shown; cached with `st.cache_resource` |
---
## Extending for Other Insurance Providers
The system is designed for multi-provider extensibility:
1. **Data layer**: Each chunk in Qdrant has a `provider` field (currently `"UnitedHealthcare"`). Adding a new provider means running the same pipeline with a new provider slug β€” chunks coexist in the same collection.
2. **Scraper**: `scraper/download_policies.py` can be adapted for any provider's website. The extractor and chunker handle standard medical policy PDF structures.
3. **Embedding**: The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.
4. **Retrieval**: Add a `provider_filter` parameter to narrow results by provider, or query across all providers simultaneously.
5. **UI**: Add a provider selector dropdown in the Streamlit sidebar β€” one line change.
```python
# Example: adding Aetna
retriever.retrieve(query, provider_filter="aetna")
```
---
## Local Development Setup
```bash
# 1. Clone the repo
git clone https://github.com/<your-username>/uhc-policy-chatbot.git
cd uhc-policy-chatbot
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment variables
cp .env.example .env
# Edit .env with your Qdrant and Groq API keys
# 5. Run the Streamlit app
streamlit run app.py
# Or use the CLI with Ollama (local LLM)
ollama serve &
ollama pull phi3.5
python -m chatbot.cli
```
### Environment Variables
| Variable | Description | Required |
|---|---|---|
| `QDRANT_URL` | Qdrant Cloud cluster URL | Yes |
| `QDRANT_API_KEY` | Qdrant Cloud API key | Yes |
| `QDRANT_COLLECTION` | Collection name (default: `uhc_policies`) | No |
| `GROQ_API_KEY` | Groq API key ([get free](https://console.groq.com/keys)) | Yes (web) |
| `GROQ_MODEL` | Groq model (default: `llama-3.1-8b-instant`) | No |
---
## Tech Stack
| Component | Technology |
|---|---|
| Embedding Model | [MedEmbed-large-v0.1](https://huggingface.co/abhinand/MedEmbed-large-v0.1) (1024-dim) |
| Vector Database | [Qdrant Cloud](https://qdrant.tech/) |
| LLM (deployed) | [Llama 3.1 8B](https://console.groq.com/) via Groq (560 tok/s) |
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |