Spaces:

mxp1404
/

uhc-policy-chatbot

Sleeping

App Files Files Community

uhc-policy-chatbot / README.md

mxp1404

Upload README.md with huggingface_hub

abdbe55 verified 3 months ago

preview code

raw

history blame contribute delete

12.1 kB

	---
	title: UHC Medical Policy Chatbot
	emoji: 🏥
	colorFrom: blue
	colorTo: purple
	sdk: streamlit
	sdk_version: 1.44.1
	app_file: app.py
	pinned: false
	---

	# UHC Medical Policy Chatbot

	A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.

	## Hosted Chatbot

	URL: [https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot](https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot)

	### How to Use — Step-by-Step

	1. Open the link above in your browser.
	2. Wait for the model to load (first visit takes ~30 seconds for MedEmbed to initialize).
	3. Type your question in the chat input at the bottom — for example:
	- "Is bariatric surgery covered for BMI over 40?"
	- "What documentation is needed for gender-affirming surgery?"
	- "Are intrapulmonary percussive ventilation devices covered for home use?"
	4. The chatbot will search relevant policy chunks, then stream an answer with citations.
	5. Click "📚 Sources" below each answer to see the exact policy sections used.
	6. Enable "🔊 Read answers aloud" in the sidebar to hear answers via Kokoro TTS.
	7. Use "🗑️ Clear conversation" in the sidebar to start a new session.

	The chatbot only answers from official UHC policy documents — it will tell you if it doesn't have enough information rather than guessing.

	---

	## Architecture

	### High-Level Design (HLD)

	```
	┌─────────────┐ ┌──────────────────────────────────────────────┐
	│ Browser │────▶│ Streamlit App (HuggingFace Spaces) │
	│ (User) │◀────│ │
	└─────────────┘ │ ┌─────────────┐ ┌─────────────────────┐ │
	│ │ MedEmbed │ │ Groq API │ │
	│ │ (1024-dim) │ │ Llama 3.1 8B │ │
	│ │ cached RAM │ │ 560 tok/s │ │
	│ └──────┬──────┘ └──────▲──────────────┘ │
	│ │ │ │
	│ ▼ │ │
	│ ┌─────────────┐ context + query │
	│ │ Qdrant Cloud│────────────┘ │
	│ │ (vectors) │ │
	│ └─────────────┘ │
	└──────────────────────────────────────────────┘
	```

	Data flow for each query:

	1. User types a question in the Streamlit chat interface
	2. The query is encoded into a 1024-dimensional vector using MedEmbed (loaded once, cached in memory)
	3. The vector is sent to Qdrant Cloud for similarity search — returns top-K policy chunks with metadata
	4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block
	5. The context + query + system prompt are sent to Groq API (Llama 3.1 8B) for answer generation
	6. The response is streamed token-by-token back to the user with source citations
	7. If TTS is enabled, the response text is synthesized into audio using Kokoro ONNX and played in-browser

	### Low-Level Design (LLD)

	#### Project Structure

	```
	uhc/
	├── app.py # Streamlit web UI entry point
	├── requirements.txt # Python dependencies
	├── .env.example # Environment variable template
	│
	├── chatbot/ # Chatbot application layer
	│ ├── config.py # Centralized config (LLM, retrieval, env vars)
	│ ├── retriever.py # PolicyRetriever: MedEmbed + Qdrant wrapper
	│ ├── llm_groq.py # Groq API client (deployed)
	│ ├── llm.py # Ollama client (local dev)
	│ ├── prompts.py # System prompt, context formatting, deduplication
	│ ├── tts.py # Kokoro ONNX text-to-speech
	│ └── cli.py # CLI interface (local dev)
	│
	├── embedding/ # Embedding pipeline
	│ └── scripts/
	│ ├── config.py # Embedding model + Qdrant connection config
	│ ├── embed_chunks.py # Generate embeddings from RAG chunks
	│ ├── store_qdrant.py # Upsert embeddings into Qdrant with payload indexes
	│ ├── search.py # Standalone search CLI for testing
	│ └── test_retrieval.py # Batch retrieval evaluation (10 test cases)
	│
	├── tests/ # Evaluation suite
	│ └── eval_100.py # 100-prompt retrieval + LLM evaluation
	│
	└── scraper/ # Data ingestion pipeline
	├── download_policies.py # Scrape PDFs from UHC website
	├── extract_pdf_text.py # PDF → structured sections with metadata
	├── create_rag_chunks.py # Section-aware semantic chunking
	└── data/processed/
	├── extracted_sections.json # Extracted text per policy/section
	└── rag_chunks.json # Final RAG chunks with metadata
	```

	#### Module Design

	`chatbot/retriever.py` — PolicyRetriever
	- Loads `abhinand/MedEmbed-large-v0.1` (1024-dim medical embeddings) via `sentence-transformers`
	- Connects to Qdrant Cloud; supports both cloud and local Qdrant
	- Encodes queries → cosine similarity search → returns `ChunkResult` dataclasses
	- Filters out low-value sections (References, Application) that pollute results
	- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
	- Retry logic with exponential backoff for transient Qdrant errors

	`chatbot/prompts.py` — Prompt Engineering
	- System prompt enforces: answer from context only, 2–4 bullet points, cite sources, coverage-awareness
	- `deduplicate_chunks()` keeps highest-scoring chunk per (policy, section) pair
	- `format_context()` truncates each chunk to 800 chars at sentence boundaries, caps total at 6000 chars
	- Coverage Rationale is explicitly marked as authoritative for coverage decisions

	`chatbot/llm_groq.py` — GroqClient
	- Uses `groq` Python SDK with streaming chat completions
	- Graceful rate-limit handling (Groq free tier: 250K TPM)
	- Same `chat_stream()` / `chat()` interface as the Ollama client for interchangeability

	`chatbot/tts.py` — Text-to-Speech
	- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M parameter model, ~300MB)
	- Auto-downloads model files from HuggingFace Hub on first use
	- Generates WAV audio from LLM response text, played in-browser via `st.audio`
	- Toggleable via sidebar switch — disabled by default to save resources

	`scraper/extract_pdf_text.py` — PDF Extraction
	- Paragraph-level extraction using `pdfplumber` (not line-by-line)
	- Robust header/footer/sidebar removal with regex patterns
	- Structured metadata parsing: policy number, effective date, plan type, document type
	- Table extraction support; skips boilerplate sections and HTML-disguised files

	`scraper/create_rag_chunks.py` — Semantic Chunking
	- Section-aware chunking: different strategies per section type
	- Coverage Rationale → criteria-based splitting
	- Applicable Codes → table-aware chunking
	- Clinical Evidence → study-based splitting
	- Others → paragraph-aware with sentence-boundary overlap
	- Rich metadata per chunk: policy name, section, plan type, page range, provider
	- Deterministic chunk IDs for deduplication during re-indexing

	`embedding/scripts/embed_chunks.py` — Embedding Generation
	- Prepends metadata to chunk text before encoding for better retrieval
	- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
	- Saves to `.npz` for efficient storage and reloading

	`embedding/scripts/store_qdrant.py` — Vector Storage
	- Creates Qdrant collection with cosine distance
	- Upserts embeddings with full metadata payloads
	- Creates payload indexes on `section`, `policy_name`, `plan_type`, `doc_type`, `provider` for efficient filtered search

	#### Edge Cases Handled

	\| Edge Case \| Handling \|
	\|---\|---\|
	\| Empty / whitespace query \| Warning message, no API call \|
	\| Qdrant connection failure \| Retry with exponential backoff (3 attempts) \|
	\| Groq rate limit (429) \| Caught and shown as user-friendly message \|
	\| No relevant chunks found \| "I don't have enough policy information" \|
	\| Coverage vs. evidence conflict \| System prompt + Coverage Rationale boost ensures correct answer \|
	\| Very long conversation \| History trimmed to last 3 turns \|
	\| Model loading on first visit \| Spinner shown; cached with `st.cache_resource` \|

	---

	## Extending for Other Insurance Providers

	The system is designed for multi-provider extensibility:

	1. Data layer: Each chunk in Qdrant has a `provider` field (currently `"UnitedHealthcare"`). Adding a new provider means running the same pipeline with a new provider slug — chunks coexist in the same collection.

	2. Scraper: `scraper/download_policies.py` can be adapted for any provider's website. The extractor and chunker handle standard medical policy PDF structures.

	3. Embedding: The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.

	4. Retrieval: Add a `provider_filter` parameter to narrow results by provider, or query across all providers simultaneously.

	5. UI: Add a provider selector dropdown in the Streamlit sidebar — one line change.

	```python
	# Example: adding Aetna
	retriever.retrieve(query, provider_filter="aetna")
	```

	---

	## Local Development Setup

	```bash
	# 1. Clone the repo
	git clone https://github.com/<your-username>/uhc-policy-chatbot.git
	cd uhc-policy-chatbot

	# 2. Create virtual environment
	python3 -m venv venv
	source venv/bin/activate

	# 3. Install dependencies
	pip install -r requirements.txt

	# 4. Configure environment variables
	cp .env.example .env
	# Edit .env with your Qdrant and Groq API keys

	# 5. Run the Streamlit app
	streamlit run app.py

	# Or use the CLI with Ollama (local LLM)
	ollama serve &
	ollama pull phi3.5
	python -m chatbot.cli
	```

	### Environment Variables

	\| Variable \| Description \| Required \|
	\|---\|---\|---\|
	\| `QDRANT_URL` \| Qdrant Cloud cluster URL \| Yes \|
	\| `QDRANT_API_KEY` \| Qdrant Cloud API key \| Yes \|
	\| `QDRANT_COLLECTION` \| Collection name (default: `uhc_policies`) \| No \|
	\| `GROQ_API_KEY` \| Groq API key ([get free](https://console.groq.com/keys)) \| Yes (web) \|
	\| `GROQ_MODEL` \| Groq model (default: `llama-3.1-8b-instant`) \| No \|

	---

	## Tech Stack

	\| Component \| Technology \|
	\|---\|---\|
	\| Embedding Model \| [MedEmbed-large-v0.1](https://huggingface.co/abhinand/MedEmbed-large-v0.1) (1024-dim) \|
	\| Vector Database \| [Qdrant Cloud](https://qdrant.tech/) \|
	\| LLM (deployed) \| [Llama 3.1 8B](https://console.groq.com/) via Groq (560 tok/s) \|
	\| LLM (local dev) \| Phi-3.5 Mini via Ollama \|
	\| Web Framework \| Streamlit \|
	\| Hosting \| HuggingFace Spaces (free tier) \|
	\| Text-to-Speech \| [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) \|
	\| PDF Extraction \| pdfplumber + BeautifulSoup \|