---
title: UHC Medical Policy Chatbot
emoji: 🏥
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

# UHC Medical Policy Chatbot

A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.

## Hosted Chatbot

**URL:** [https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot](https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot)

### How to Use: Step-by-Step

1. Open the link above in your browser.
2. Wait for the model to load (on the first visit, MedEmbed takes about 30 seconds to initialize).
3. Type your question in the chat input at the bottom, for example:
   - *"Is bariatric surgery covered for BMI over 40?"*
   - *"What documentation is needed for gender-affirming surgery?"*
   - *"Are intrapulmonary percussive ventilation devices covered for home use?"*
4. The chatbot searches the relevant policy chunks, then streams an answer with citations.
5. Click **Sources** below each answer to see the exact policy sections used.
6. Enable **Read answers aloud** in the sidebar to hear answers via Kokoro TTS.
7. Use **Clear conversation** in the sidebar to start a new session.

The chatbot answers only from official UHC policy documents. If it doesn't have enough information, it says so rather than guessing.

---

## Architecture

### High-Level Design (HLD)

```
┌─────────────┐      ┌─────────────────────────────────────────────────┐
│   Browser   │─────▶│  Streamlit App (HuggingFace Spaces)             │
│   (User)    │◀─────│                                                 │
└─────────────┘      │  ┌─────────────┐      ┌─────────────────────┐   │
                     │  │  MedEmbed   │      │  Groq API           │   │
                     │  │  (1024-dim) │      │  Llama 3.1 8B       │   │
                     │  │  cached RAM │      │  560 tok/s          │   │
                     │  └──────┬──────┘      └──────▲──────────────┘   │
                     │         │                    │                  │
                     │         ▼                    │                  │
                     │  ┌─────────────┐             │ context + query  │
                     │  │ Qdrant Cloud│─────────────┘                  │
                     │  │  (vectors)  │                                │
                     │  └─────────────┘                                │
                     └─────────────────────────────────────────────────┘
```

**Data flow for each query:**

1. The user types a question in the Streamlit chat interface.
2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once and cached in memory).
3. The vector is sent to **Qdrant Cloud** for similarity search, which returns the top-K policy chunks with metadata.
4. Retrieved chunks are deduplicated, scored with section-priority boosts, and formatted into a context block.
5. The context, query, and system prompt are sent to the **Groq API** (Llama 3.1 8B) for answer generation.
6. The response is streamed token by token back to the user, with source citations.
7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in the browser.
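
The per-query flow above can be sketched as one function, with the external services passed in as callables. This is a hypothetical illustration, not the app's code. The function `answer_query`, its callable parameters, the `NO_INFO` text, and the `top_k=5` default are all assumptions. The callables stand in for MedEmbed, the Qdrant search, the prompt formatting, and the Groq stream.

```python
from typing import Callable, Dict, Iterator, List, Sequence

NO_INFO = "I don't have enough policy information to answer that."


def answer_query(
    query: str,
    embed: Callable[[str], Sequence[float]],               # MedEmbed stand-in
    search: Callable[[Sequence[float], int], List[Dict]],  # Qdrant stand-in
    build_context: Callable[[List[Dict]], str],            # dedupe + boost + format
    generate: Callable[[str, str], Iterator[str]],         # Groq streaming stand-in
    top_k: int = 5,                                        # placeholder value (assumption)
) -> Iterator[str]:
    """Hypothetical sketch of steps 2-6 for one query: embed, search, format, stream."""
    vector = embed(query)              # step 2: encode the query
    chunks = search(vector, top_k)     # step 3: similarity search
    if not chunks:
        return iter([NO_INFO])         # nothing relevant: decline instead of guessing
    context = build_context(chunks)    # step 4: build the context block
    return generate(context, query)    # steps 5-6: stream the answer tokens
```

Passing the services in keeps the orchestration testable without network access. In the app, `embed` would wrap the cached MedEmbed model and `generate` the Groq streaming call.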

### Low-Level Design (LLD)

#### Project Structure

```
uhc/
├── app.py                           # Streamlit web UI entry point
├── requirements.txt                 # Python dependencies
├── .env.example                     # Environment variable template
│
├── chatbot/                         # Chatbot application layer
│   ├── config.py                    # Centralized config (LLM, retrieval, env vars)
│   ├── retriever.py                 # PolicyRetriever: MedEmbed + Qdrant wrapper
│   ├── llm_groq.py                  # Groq API client (deployed)
│   ├── llm.py                       # Ollama client (local dev)
│   ├── prompts.py                   # System prompt, context formatting, deduplication
│   ├── tts.py                       # Kokoro ONNX text-to-speech
│   └── cli.py                       # CLI interface (local dev)
│
├── embedding/                       # Embedding pipeline
│   └── scripts/
│       ├── config.py                # Embedding model + Qdrant connection config
│       ├── embed_chunks.py          # Generate embeddings from RAG chunks
│       ├── store_qdrant.py          # Upsert embeddings into Qdrant with payload indexes
│       ├── search.py                # Standalone search CLI for testing
│       └── test_retrieval.py        # Batch retrieval evaluation (10 test cases)
│
├── tests/                           # Evaluation suite
│   └── eval_100.py                  # 100-prompt retrieval + LLM evaluation
│
└── scraper/                         # Data ingestion pipeline
    ├── download_policies.py         # Scrape PDFs from UHC website
    ├── extract_pdf_text.py          # PDF → structured sections with metadata
    ├── create_rag_chunks.py         # Section-aware semantic chunking
    └── data/processed/
        ├── extracted_sections.json  # Extracted text per policy/section
        └── rag_chunks.json          # Final RAG chunks with metadata
```

#### Module Design

**`chatbot/retriever.py`: PolicyRetriever**

- Loads `abhinand/MedEmbed-large-v0.1` (1024-dim medical embeddings) via `sentence-transformers`
- Connects to Qdrant Cloud; also supports a local Qdrant instance
- Encodes queries, runs a cosine-similarity search, and returns `ChunkResult` dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Applies section-priority boosts (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative coverage statements rank above clinical studies
- Retries transient Qdrant errors with exponential backoff
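
The filtering, boosting, and retry behavior can be sketched in plain Python. The boost values, the excluded sections, and the three attempts come from this README. Several other pieces are assumptions:

- the `ChunkResult` fields
- the exact section-name strings
- the helper names `rerank` and `with_retries`
- the 0.5 s base delay

The sketch also retries on any exception, whereas real code should catch only transient Qdrant errors.

```python
import time
from dataclasses import dataclass, replace
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Values from the README; the exact section-name strings are assumptions.
SECTION_BOOSTS = {"Coverage Rationale": 0.04, "Coverage Summary": 0.03}
EXCLUDED_SECTIONS = {"References", "Application"}


@dataclass(frozen=True)
class ChunkResult:  # hypothetical field layout
    policy_name: str
    section: str
    text: str
    score: float  # cosine similarity returned by the vector search


def rerank(results: List[ChunkResult]) -> List[ChunkResult]:
    """Drop low-value sections, add section boosts, and sort by boosted score."""
    kept = [r for r in results if r.section not in EXCLUDED_SECTIONS]
    boosted = [replace(r, score=r.score + SECTION_BOOSTS.get(r.section, 0.0)) for r in kept]
    return sorted(boosted, key=lambda r: r.score, reverse=True)


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call fn, retrying with exponential backoff (base_delay, 2x, 4x, ...) on failure."""
    if attempts <= 0:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # sketch only: real code should catch transient errors only
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Because the boost is a small additive term, it only reorders chunks whose similarity scores are already close. A strongly matching clinical-study chunk can still outrank a weakly matching Coverage Rationale chunk.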

**`chatbot/prompts.py`: Prompt Engineering**

- The system prompt enforces: answer from context only, 2–4 bullet points, cite sources, coverage-awareness
- `deduplicate_chunks()` keeps the highest-scoring chunk per (policy, section) pair
- `format_context()` truncates each chunk to 800 chars at sentence boundaries and caps the total at 6000 chars
- Coverage Rationale is explicitly marked as authoritative for coverage decisions
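
A minimal sketch of the two helpers follows. It assumes each chunk is a dict with `policy_name`, `section`, `text`, and `score` keys; the real code may use `ChunkResult` objects instead. The 800- and 6000-character limits come from this README. Two details are assumptions:

- **Boundary rule:** cut after the last `. `, `! `, or `? ` inside the limit, otherwise make a hard cut.
- **Header format:** each chunk starts with a `[policy | section]` line.

```python
from typing import Dict, List

MAX_CHUNK_CHARS = 800     # per-chunk limit (from the README)
MAX_CONTEXT_CHARS = 6000  # total context limit (from the README)


def deduplicate_chunks(chunks: List[Dict]) -> List[Dict]:
    """Keep only the highest-scoring chunk per (policy, section) pair."""
    best: Dict[tuple, Dict] = {}
    for c in chunks:
        key = (c["policy_name"], c["section"])
        if key not in best or c["score"] > best[key]["score"]:
            best[key] = c
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)


def truncate_at_sentence(text: str, limit: int = MAX_CHUNK_CHARS) -> str:
    """Cut text to at most `limit` chars, ending on a sentence boundary if possible."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    cut = max(head.rfind(". "), head.rfind("! "), head.rfind("? "))
    # Fall back to a hard cut when no sentence boundary exists inside the limit.
    return head[: cut + 1] if cut > 0 else head


def format_context(chunks: List[Dict]) -> str:
    """Render deduplicated chunks as one context block capped at MAX_CONTEXT_CHARS."""
    parts: List[str] = []
    total = 0
    for c in deduplicate_chunks(chunks):
        block = f"[{c['policy_name']} | {c['section']}]\n{truncate_at_sentence(c['text'])}"
        if total + len(block) > MAX_CONTEXT_CHARS:
            break  # stop before exceeding the total budget
        parts.append(block)
        total += len(block) + 2  # +2 for the blank-line separator added by join
    return "\n\n".join(parts)
```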

**`chatbot/llm_groq.py`: GroqClient**

- Uses the `groq` Python SDK with streaming chat completions
- Handles rate limits gracefully (Groq free tier: 250K TPM)
- Exposes the same `chat_stream()` / `chat()` interface as the Ollama client, so the two are interchangeable
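
A sketch of streaming with rate-limit handling follows. It rests on two assumptions about the client:

- It has the OpenAI-style `client.chat.completions.create(..., stream=True)` interface that the `groq` SDK provides.
- The SDK's HTTP errors carry a `status_code` attribute.

`stream_answer` and the fallback text are hypothetical. The client is passed in so the function can be exercised without network access.

```python
from typing import Dict, Iterator, List

RATE_LIMIT_MESSAGE = "The service is busy right now (rate limit reached). Please try again in a moment."


def stream_answer(client, model: str, messages: List[Dict[str, str]]) -> Iterator[str]:
    """Yield answer tokens from a streaming chat completion, handling HTTP 429."""
    try:
        stream = client.chat.completions.create(model=model, messages=messages, stream=True)
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:  # the final chunk's content can be None
                yield delta
    except Exception as exc:
        # Assumption: SDK status errors expose `status_code` (429 = rate limited).
        if getattr(exc, "status_code", None) == 429:
            yield RATE_LIMIT_MESSAGE
        else:
            raise
```

With the real SDK, `client` would be `groq.Groq()`, which reads `GROQ_API_KEY` from the environment. The generator can then be handed to `st.write_stream` to render tokens as they arrive.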

**`chatbot/tts.py`: Text-to-Speech**

- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M-parameter model, ~300 MB)
- Auto-downloads model files from the HuggingFace Hub on first use
- Generates WAV audio from the LLM response text, played in-browser via `st.audio`
- Toggled via a sidebar switch; disabled by default to save resources
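
`st.audio` accepts raw WAV bytes, so the bridge between the TTS model and the player can be a small encoder. This sketch assumes the model returns mono float samples in [-1, 1] together with a sample rate, as `kokoro_onnx`'s `Kokoro.create()` returns a `(samples, sample_rate)` pair. `samples_to_wav_bytes` is a hypothetical helper that uses only the standard library.

```python
import io
import wave
from array import array
from typing import Iterable


def samples_to_wav_bytes(samples: Iterable[float], sample_rate: int) -> bytes:
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    # Clamp each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```

In the app this could be called as `st.audio(samples_to_wav_bytes(samples, sample_rate), format="audio/wav")`.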

**`scraper/extract_pdf_text.py`: PDF Extraction**

- Paragraph-level extraction using `pdfplumber` (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns
- Structured metadata parsing: policy number, effective date, plan type, document type
- Table extraction support; skips boilerplate sections and HTML files disguised as PDFs
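
The header/footer removal and paragraph joining can be sketched on the text of a single page. The regex patterns are hypothetical examples of page furniture, not the extractor's actual patterns. Treating blank lines as paragraph breaks is also a simplification. With `pdfplumber`, the input would come from `page.extract_text()`.

```python
import re
from typing import List

# Hypothetical boilerplate patterns (page numbers, running headers/footers).
BOILERPLATE_PATTERNS = [
    re.compile(r"^Page \d+ of \d+$", re.IGNORECASE),
    re.compile(r"^Proprietary Information of UnitedHealthcare", re.IGNORECASE),
    re.compile(r"^Copyright \d{4} United HealthCare Services", re.IGNORECASE),
]


def clean_page_text(text: str) -> List[str]:
    """Drop boilerplate lines and join the remaining lines into paragraphs."""
    paragraphs: List[str] = []
    current: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if any(p.search(line) for p in BOILERPLATE_PATTERNS):
            continue  # page furniture: skip entirely
        if not line:  # a blank line ends the current paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```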

**`scraper/create_rag_chunks.py`: Semantic Chunking**

- Section-aware chunking, with a different strategy per section type:
  - Coverage Rationale: criteria-based splitting
  - Applicable Codes: table-aware chunking
  - Clinical Evidence: study-based splitting
  - Others: paragraph-aware chunking with sentence-boundary overlap
- Rich metadata per chunk: policy name, section, plan type, page range, provider
- Deterministic chunk IDs for deduplication during re-indexing
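
For the generic "Others" strategy, paragraph-aware chunking with sentence-boundary overlap and deterministic IDs might look like the following. Several details are assumptions for illustration:

- the 1000-character limit
- the one-sentence overlap
- the regex sentence splitter
- the ID recipe (SHA-1 over policy, section, and chunk text)

```python
import hashlib
import re
from typing import Dict, List

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_paragraphs(paragraphs: List[str], max_chars: int = 1000) -> List[str]:
    """Greedily pack paragraphs into chunks, carrying the last sentence over as overlap.

    A single paragraph longer than max_chars is not split further in this sketch.
    """
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + 1 + len(para) > max_chars:
            chunks.append(current)
            # Overlap: start the next chunk with the previous chunk's last sentence.
            current = SENTENCE_END.split(current)[-1]
        current = f"{current} {para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def chunk_id(policy: str, section: str, text: str) -> str:
    """Deterministic ID: the same input always yields the same ID across runs."""
    digest = hashlib.sha1(f"{policy}\x1f{section}\x1f{text}".encode("utf-8"))
    return digest.hexdigest()[:16]


def make_chunks(policy: str, section: str, paragraphs: List[str]) -> List[Dict[str, str]]:
    """Attach IDs and basic metadata to each chunk."""
    return [
        {"id": chunk_id(policy, section, t), "policy_name": policy, "section": section, "text": t}
        for t in chunk_paragraphs(paragraphs)
    ]
```

Hashing with `hashlib` rather than Python's built-in `hash()` matters here. `hash()` of a string is randomized per process, so it would not give stable IDs across re-indexing runs.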

**`embedding/scripts/embed_chunks.py`: Embedding Generation**

- Prepends metadata to the chunk text before encoding, for better retrieval
- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
- Saves to `.npz` for efficient storage and reloading
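
The metadata prepending and batching steps can be sketched without loading the model. The batch size of 32 comes from this README. The `Policy: | Section: | Plan:` header layout and the helper names are assumptions. Device auto-detection is omitted because it needs `torch`. With `sentence-transformers`, each batch would be passed to `model.encode(batch)`, and the stacked vectors could then be written with `numpy.savez`.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 32  # from the README


def text_for_embedding(chunk: Dict[str, str]) -> str:
    """Prefix chunk text with its metadata so the vector carries policy context."""
    header = " | ".join(
        f"{label}: {chunk[key]}"
        for label, key in (("Policy", "policy_name"), ("Section", "section"), ("Plan", "plan_type"))
        if chunk.get(key)  # skip metadata fields that are missing or empty
    )
    return f"{header}\n{chunk['text']}" if header else chunk["text"]


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]
```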

**`embedding/scripts/store_qdrant.py`: Vector Storage**

- Creates a Qdrant collection with cosine distance
- Upserts embeddings with full metadata payloads
- Creates payload indexes on `section`, `policy_name`, `plan_type`, `doc_type`, and `provider` for efficient filtered search
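
Qdrant point IDs must be unsigned integers or UUIDs, so a string chunk ID can be mapped to a stable UUID with `uuid.uuid5`. Re-running the upsert then overwrites existing points instead of duplicating them. The sketch builds points as plain dicts. The namespace UUID and the helpers `point_id` and `build_points` are hypothetical.

```python
import uuid
from typing import Dict, List, Sequence

# Hypothetical fixed namespace: every run maps the same chunk ID to the same UUID.
CHUNK_NAMESPACE = uuid.UUID("6f1c1a6e-3b1d-4c2e-9a6b-2f9b0c8d7e11")
VECTOR_SIZE = 1024  # MedEmbed-large output dimension
INDEXED_FIELDS = ["section", "policy_name", "plan_type", "doc_type", "provider"]


def point_id(chunk_id: str) -> str:
    """Map a string chunk ID to a stable UUID string (Qdrant accepts UUID or uint IDs)."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, chunk_id))


def build_points(chunks: List[Dict], vectors: Sequence[Sequence[float]]) -> List[Dict]:
    """Pair each chunk with its vector; the chunk's metadata becomes the payload."""
    if len(chunks) != len(vectors):
        raise ValueError("one vector per chunk is required")
    points = []
    for chunk, vec in zip(chunks, vectors):
        if len(vec) != VECTOR_SIZE:
            raise ValueError(f"expected {VECTOR_SIZE}-dim vector, got {len(vec)}")
        points.append({"id": point_id(chunk["id"]), "vector": list(vec), "payload": dict(chunk)})
    return points
```

With `qdrant-client`, each dict maps onto `models.PointStruct(id=..., vector=..., payload=...)` for `client.upsert`. Each entry of `INDEXED_FIELDS` maps onto a `client.create_payload_index` call with `field_schema=models.PayloadSchemaType.KEYWORD`.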

#### Edge Cases Handled

| Edge Case | Handling |
|---|---|
| Empty / whitespace query | Warning message, no API call |
| Qdrant connection failure | Retry with exponential backoff (3 attempts) |
| Groq rate limit (429) | Caught and shown as a user-friendly message |
| No relevant chunks found | Replies "I don't have enough policy information" |
| Coverage vs. evidence conflict | System prompt + Coverage Rationale boost prioritize the coverage statement |
| Very long conversation | History trimmed to the last 3 turns |
| Model loading on first visit | Spinner shown; model cached with `st.cache_resource` |
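
Two of the cheaper guards in the table, the empty-query check and history trimming, can be sketched as pure functions. The three-turn limit comes from the table. The following are assumptions:

- the OpenAI-style `role`/`content` message format
- treating a turn as one user message plus one assistant reply
- the warning text
- the function names

```python
from typing import Dict, List, Optional

MAX_HISTORY_TURNS = 3  # from the edge-case table


def validate_query(query: Optional[str]) -> Optional[str]:
    """Return a warning for empty/whitespace input, or None if the query is usable."""
    if query is None or not query.strip():
        return "Please enter a question about a UHC medical policy."
    return None


def trim_history(messages: List[Dict[str, str]], max_turns: int = MAX_HISTORY_TURNS) -> List[Dict[str, str]]:
    """Keep the system prompt (if any) plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + recent
```

Checking the query before any embedding or LLM call is what guarantees "no API call" for blank input.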

---

## Extending for Other Insurance Providers

The system is designed for multi-provider extensibility:

1. **Data layer**: Each chunk in Qdrant has a `provider` field (currently `"UnitedHealthcare"`). Adding a new provider means running the same pipeline with a new provider slug; the new chunks coexist with the existing ones in the same collection.
2. **Scraper**: `scraper/download_policies.py` can be adapted to any provider's website. The extractor and chunker handle standard medical policy PDF structures.
3. **Embedding**: The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.
4. **Retrieval**: Add a `provider_filter` parameter to narrow results by provider, or query across all providers at once.
5. **UI**: Add a provider selector dropdown in the Streamlit sidebar, a one-line change.

```python
# Example: adding Aetna. `provider_filter` is the parameter proposed in step 4;
# its value must match the `provider` payload stored on the Aetna chunks.
retriever.retrieve(query, provider_filter="aetna")
```
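
A sketch of what the proposed `provider_filter` could translate to: a payload filter on the already-indexed `provider` field. It is written in the JSON shape used by Qdrant's REST search API (`must` with a `match` condition). The helper name is hypothetical. With `qdrant-client`, the equivalent object is `models.Filter(must=[models.FieldCondition(key="provider", match=models.MatchValue(value=...))])`.

```python
from typing import Dict, List, Optional, Union


def provider_filter_to_qdrant(provider: Optional[Union[str, List[str]]]) -> Optional[Dict]:
    """Build a Qdrant filter restricting results to one or more providers.

    None means "search across all providers", so no filter is returned.
    """
    if provider is None:
        return None
    if isinstance(provider, str):
        match = {"value": provider}       # exact keyword match on one provider
    else:
        match = {"any": list(provider)}   # match any of several providers
    return {"must": [{"key": "provider", "match": match}]}
```

Keyword matches are exact, so the filter value has to equal the stored payload string character for character.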

---

## Local Development Setup

```bash
# 1. Clone the repo
git clone https://github.com/<your-username>/uhc-policy-chatbot.git
cd uhc-policy-chatbot

# 2. Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Edit .env with your Qdrant and Groq API keys

# 5. Run the Streamlit app
streamlit run app.py

# Or use the CLI with Ollama (local LLM)
ollama serve &
ollama pull phi3.5
python -m chatbot.cli
```

### Environment Variables

| Variable | Description | Required |
|---|---|---|
| `QDRANT_URL` | Qdrant Cloud cluster URL | Yes |
| `QDRANT_API_KEY` | Qdrant Cloud API key | Yes |
| `QDRANT_COLLECTION` | Collection name (default: `uhc_policies`) | No |
| `GROQ_API_KEY` | Groq API key ([get a free key](https://console.groq.com/keys)) | Yes (web app) |
| `GROQ_MODEL` | Groq model (default: `llama-3.1-8b-instant`) | No |
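
A sketch of how these variables might be read, using the defaults from the table. `load_settings` and the `Settings` dataclass are hypothetical; the real logic lives in `chatbot/config.py` and may differ. The function takes a mapping so it can be tested without touching the real environment. The `require_groq` flag reflects that the Groq key is needed only for the web app, not for the Ollama CLI.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    qdrant_api_key: str
    qdrant_collection: str
    groq_api_key: Optional[str]
    groq_model: str


def load_settings(env: Mapping[str, str] = os.environ, require_groq: bool = True) -> Settings:
    """Read configuration, applying defaults and failing fast on missing required keys."""
    required = ["QDRANT_URL", "QDRANT_API_KEY"] + (["GROQ_API_KEY"] if require_groq else [])
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        qdrant_url=env["QDRANT_URL"],
        qdrant_api_key=env["QDRANT_API_KEY"],
        qdrant_collection=env.get("QDRANT_COLLECTION", "uhc_policies"),
        groq_api_key=env.get("GROQ_API_KEY"),
        groq_model=env.get("GROQ_MODEL", "llama-3.1-8b-instant"),
    )
```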

---

## Tech Stack

| Component | Technology |
|---|---|
| Embedding Model | [MedEmbed-large-v0.1](https://huggingface.co/abhinand/MedEmbed-large-v0.1) (1024-dim) |
| Vector Database | [Qdrant Cloud](https://qdrant.tech/) |
| LLM (deployed) | [Llama 3.1 8B](https://console.groq.com/) via Groq (560 tok/s) |
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |