Commit b69b364 · Parent(s): c217741
First commit

Changed files:
- .gitattributes +1 -0
- Dockerfile +2 -1
- README.md +169 -10
- requirements.txt +5 -1
- src/config.py +36 -0
- src/query_rag.py +309 -0
- src/search_engine.py +46 -0
- src/streamlit_app.py +242 -38
.gitattributes CHANGED

```diff
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.db filter=lfs diff=lfs merge=lfs -text
```
Dockerfile CHANGED

```diff
@@ -5,8 +5,9 @@ WORKDIR /app
 RUN apt-get update && apt-get install -y \
     build-essential \
     curl \
+    software-properties-common \
     git \
-
+    && rm -rf /var/lib/apt/lists/*
 
 COPY requirements.txt ./
 COPY src/ ./src/
```
README.md CHANGED

```diff
@@ -1,20 +1,179 @@
 ---
-title: NHS
-emoji:
-colorFrom:
-colorTo:
+title: NHS Clinical Assistant
+emoji: 🩺
+colorFrom: blue
+colorTo: green
 sdk: docker
 app_port: 8501
 tags:
 - streamlit
+- healthcare
+- nhs
+- rag
+- llm
+- medical
 pinned: false
-short_description:
-license: agpl-3.0
+short_description: RAG-powered NHS health information chatbot
 ---
 
-#
+# NHS Clinical Assistant
```

The rest of the README is newly added content:
A RAG-based chatbot for querying NHS health condition information. The application uses Retrieval-Augmented Generation to provide accurate, evidence-based responses grounded in official NHS health documentation.

## 🌟 Features

- **NHS Health Information Search**: Semantic search over NHS health conditions, powered by Voyage AI embeddings
- **RAG-powered Chat**: Ask questions and get contextually relevant answers from NHS health information, with source citations
- **Multiple LLM Support**: Choose between Gemini models (2.5-flash, 2.5-flash-lite, 2.5-pro) for generating responses
- **Source Attribution**: All responses include links to the original NHS web pages
- **Streaming Responses**: Real-time response generation for a better user experience
- **Interactive Interface**: Clean Streamlit frontend optimized for healthcare information queries

## 📁 Project Structure

### Core Application Files

#### [`src/streamlit_app.py`](src/streamlit_app.py)
Main Streamlit application interface providing:
- A user-friendly web interface for NHS health information queries
- A chat interface with conversation history
- Model selection (Gemini variants)
- Source attribution display with NHS links
- Suggested queries for common health topics

#### [`src/query_rag.py`](src/query_rag.py)
RAG (Retrieval-Augmented Generation) system that handles:
- Query processing and validation
- Integration with the search engine and LLM clients
- Context generation from NHS health documents
- Streaming response generation
- Source extraction and formatting
- Standalone CLI usage for testing

#### [`src/search_engine.py`](src/search_engine.py)
Search functionality using the Pinecone vector database:
- Similarity search using Voyage AI embeddings (voyage-context-3 model)
- Integration with the Pinecone vector database
- NHS health information retrieval

### Configuration

#### [`src/config.py`](src/config.py)
Centralized configuration management:
- NHS source configuration
- System prompts and error messages
- Default search parameters

### Infrastructure

#### [`requirements.txt`](requirements.txt)
Python dependencies:
- `streamlit==1.40.1` - Web application framework
- `openai` - LLM client (used for Gemini API access)
- `voyageai` - Embedding generation
- `pinecone` - Vector database client
- `pandas` - Data manipulation
- `altair` - Visualization support

#### [`Dockerfile`](Dockerfile)
Container configuration for deployment:
- Python 3.9 base image
- Production-ready setup
- Health check configuration
- Streamlit server configuration
## 🚀 Getting Started

### Prerequisites
- Python 3.9+
- Gemini API key (for LLM responses)
- Voyage AI API key (for embeddings)
- Pinecone API key (for vector search)

### Environment Variables
Set the following environment variables:
```bash
export GEMINI_API_KEY=your_gemini_api_key
export VOYAGE_API_KEY=your_voyage_api_key
export PINECONE_API_KEY=your_pinecone_api_key
```

### Installation
1. Clone the repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```

### Run the application
```bash
streamlit run src/streamlit_app.py
```

The application will be available at `http://localhost:8501`.

### Docker Deployment
```bash
docker build -t nhs-clinical-assistant .
docker run -p 8501:8501 \
  -e GEMINI_API_KEY=your_gemini_api_key \
  -e VOYAGE_API_KEY=your_voyage_api_key \
  -e PINECONE_API_KEY=your_pinecone_api_key \
  nhs-clinical-assistant
```
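A missing key only surfaces at query time, so it can help to verify all three variables before launching. A minimal sketch (the key names follow this README; the helper itself is hypothetical, not part of the app):

```python
import os

# Required by src/query_rag.py and src/search_engine.py
REQUIRED_KEYS = ["GEMINI_API_KEY", "VOYAGE_API_KEY", "PINECONE_API_KEY"]


def missing_keys(environ=None):
    """Return the names of required API keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Calling `missing_keys()` before `streamlit run` and aborting when the list is non-empty gives a clearer failure than a downstream client error.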
## 🔧 Usage

### Web Interface
1. Open the application in your browser
2. Select your preferred Gemini model from the sidebar
3. Type your NHS health-related question in the chat input
4. View the response with source attribution
5. Click "View Sources" to see NHS page references

### CLI Usage
Test the RAG system directly:
```bash
python src/query_rag.py --query_text "What are the symptoms of ADHD in adults?" --llm_model "gemini-2.5-flash"
```

### Example Queries
- "What are the symptoms of ADHD in adults?"
- "How is type 2 diabetes diagnosed?"
- "What are the treatment options for depression?"
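Both interfaces receive `(chunk, sources)` pairs from the RAG stream, where each source dict carries a `metadata` block as built by `get_sources_from_results` in `src/query_rag.py`. A sketch of rendering that structure as a markdown source list (the sample data below is illustrative, not real index content):

```python
def format_sources(sources):
    """Render source dicts (shaped like get_sources_from_results output) as markdown links."""
    lines = []
    for i, source in enumerate(sources, 1):
        meta = source.get("metadata", {})
        label = meta.get("clean_section", "Unknown section")
        url = meta.get("url", "")
        # Link when a URL is present; fall back to the plain label otherwise
        lines.append(f"{i}. [{label}]({url})" if url else f"{i}. {label}")
    return "\n".join(lines)


# Hypothetical sample in the shape produced by the RAG system
sample = [{"metadata": {"clean_section": "Adhd Adults - Overview",
                        "url": "https://www.nhs.uk/conditions/adhd-adults/"}}]
# format_sources(sample) -> "1. [Adhd Adults - Overview](https://www.nhs.uk/conditions/adhd-adults/)"
```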
## 🏗️ Architecture

The system uses a simple but effective RAG architecture:

1. **Query Processing**: The user query is validated and processed
2. **Vector Search**: The query is embedded with Voyage AI and searched against a Pinecone vector database containing NHS health information
3. **Context Generation**: Retrieved NHS documents are formatted into context
4. **LLM Response**: Gemini generates a response based strictly on the NHS context
5. **Source Attribution**: Original NHS page links are provided with responses
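The five steps above can be sketched as one function with the search and generation stages injected. This is a simplified illustration of the control flow in `src/query_rag.py`, not the actual implementation (the app streams chunks and uses Voyage/Pinecone/Gemini where the stubs sit):

```python
from typing import Callable, Dict, List, Tuple


def answer_query(query: str,
                 search: Callable[[str], List[Dict]],
                 generate: Callable[[str, str], str]) -> Tuple[str, List[Dict]]:
    """Steps 1-5: validate, retrieve, build context, generate, attribute."""
    if not query.strip():                                  # 1. query processing
        raise ValueError("Query text cannot be empty")
    results = search(query)                                # 2. vector search
    if not results:
        return "No relevant NHS information found.", []
    context = "\n\n---\n\n".join(                          # 3. context generation
        r["metadata"]["document"] for r in results)
    answer = generate(context, query)                      # 4. LLM response
    sources = [{"url": r["metadata"].get("url", "")}       # 5. source attribution
               for r in results]
    return answer, sources
```

Injecting `search` and `generate` also makes the pipeline easy to exercise with stubs in tests.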
## 📊 Data Sources

The system is built on NHS health condition information stored in a Pinecone vector database under the namespace `nhs_guidelines_voyage_3_large`. All responses include proper attribution to NHS sources, with direct links to the official NHS web pages.

## ⚠️ Important Notes

- **Medical Disclaimer**: This tool provides information from NHS sources but should not replace professional medical advice
- **Data Accuracy**: Always consult official NHS sources for the most current information
- **Context Limitation**: The system only responds based on information available in the indexed NHS documents

## 📄 License

This project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.

### Code License
The source code of this application is released under AGPL-3.0, which means:
- You can freely use, modify, and distribute this software
- Any modifications or derivative works must also be released under AGPL-3.0
- If you run this software as a network service, you must provide the source code to users
- See the [LICENSE](LICENSE) file for full terms

### NHS Data Usage
This tool uses NHS health information under the [Open Government Licence](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). All NHS content remains subject to its original terms and conditions and is used for informational purposes in compliance with UK public sector information licensing.

**Note**: While the application code is AGPL-3.0 licensed, the NHS health information content accessed through this application remains under Crown Copyright and the Open Government Licence.
requirements.txt CHANGED

```diff
@@ -1,3 +1,7 @@
 altair
 pandas
-streamlit
+streamlit==1.40.1
+openai
+pandas
+voyageai
+pinecone
```
src/config.py ADDED

```python
import os
from enum import Enum
from typing import Dict, NamedTuple
from dataclasses import dataclass


class InfoSource(Enum):
    NHS = "nhs"


@dataclass
class SourceConfig:
    context_description: str
    not_found_message: str


class Config:
    """Configuration settings for the RAG system"""

    # Default similarity search parameters
    DEFAULT_SIMILARITY_K = 5

    SOURCE_CONFIGS = {
        InfoSource.NHS: SourceConfig(
            context_description="NHS health conditions and medical information",
            not_found_message="no relevant NHS health information is available to answer this question"
        )
    }

    @classmethod
    def get_source_config(cls, source: str) -> SourceConfig:
        """Get configuration for a source"""
        try:
            source_enum = InfoSource(source.lower())
            return cls.SOURCE_CONFIGS[source_enum]
        except ValueError:
            raise ValueError(f"Unknown source: {source}. Valid sources: {[s.value for s in InfoSource]}")
```
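`get_source_config` normalizes case by round-tripping through the Enum value, so `"NHS"`, `"nhs"`, and `"Nhs"` all resolve to the same config. A condensed, self-contained illustration of that pattern (names mirror the file above):

```python
from enum import Enum


class InfoSource(Enum):
    NHS = "nhs"


def normalize(source: str) -> InfoSource:
    """Map any casing of a source name onto its canonical enum member."""
    try:
        return InfoSource(source.lower())
    except ValueError:
        raise ValueError(f"Unknown source: {source}")
```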
src/query_rag.py ADDED

```python
import os
import time
import argparse
import logging
import re
from typing import Dict, List, Optional, Generator, Tuple
from openai import OpenAI
from config import Config, InfoSource
from search_engine import SearchEngine
import voyageai

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class RAGSystem:
    """Main RAG system class"""

    def __init__(self, shared_data=None):
        self.config = Config()

        # Initialize clients
        gemini_api_key = os.getenv("GEMINI_API_KEY")
        if gemini_api_key:
            self.gemini_client = OpenAI(
                api_key=gemini_api_key,
                base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
            )
        else:
            self.gemini_client = None

        openai_api_key = os.getenv("OPENAI_API_KEY")
        if openai_api_key:
            self.openai_client = OpenAI(api_key=openai_api_key)
        else:
            self.openai_client = None

        self.voyage_client = voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY"))
        self.search_engine = SearchEngine(self.voyage_client)

    def _validate_inputs(self, query_text: str, similarity_k: int, info_source: str):
        """Validate input parameters"""
        if not query_text or not query_text.strip():
            raise ValueError("Query text cannot be empty")

        if similarity_k <= 0:
            raise ValueError("similarity_k must be a positive integer")

        try:
            InfoSource(info_source.lower())
        except ValueError:
            valid_sources = [s.value for s in InfoSource]
            raise ValueError(f"Invalid info_source '{info_source}'. Must be one of: {valid_sources}")

    def _clean_section_id(self, section_id: str) -> str:
        """Clean section ID for display - NHS format: condition__section__part"""
        if not section_id or section_id == 'Unknown section':
            return section_id

        # Handle NHS format: "adhd-adults__Overview__Part_1"
        if '__' in section_id:
            parts = section_id.split('__')
            if len(parts) >= 2:
                # Get condition and section, ignore part number
                condition = parts[0].replace('-', ' ').replace('_', ' ').title()
                section = parts[1].replace('_', ' ').title()
                return f"{condition} - {section}"

        # Fallback: just clean up underscores and dashes
        clean_section = section_id.replace('_', ' ').replace('-', ' ').title()
        return clean_section

    def _get_context_text(self, results: List[Dict]) -> str:
        """Generate context text from search results"""
        context_text_sections = []

        for doc in results:
            section_id = doc['metadata'].get('original_id', 'Unknown section')
            url = doc['metadata'].get('url', '')
            document_text = doc['metadata'].get('document', '')

            # Clean up section_id for display
            clean_section_id = self._clean_section_id(section_id)

            # Create formatted section without showing URL explicitly
            # The URL will be available in the document_text if it was part of the original content
            formatted_section = (
                f"Source Information: [Section: {clean_section_id}]\n"
                f"Context: {document_text}"
                f"{f' Available at: {url}' if url else ''}"  # Include URL for LLM to use
            )
            context_text_sections.append(formatted_section)

        return "\n\n---\n\n".join(context_text_sections)

    def _create_system_prompt(self, context_text: str, context_description: str,
                              not_found_message: str, query_text: str) -> List[Dict]:
        """Create system prompt for LLM"""
        return [
            {
                "role": "system",
                "content": (
                    f"You are a medical AI assistant tasked with answering clinical questions strictly based on the provided {context_description} context. Follow the requirements below to ensure accurate, consistent, and professional responses.\n\n"
                    "# Response Rules\n\n"
                    "1. **Context Restriction**:\n"
                    "   - Only use information given in the provided NHS health information context.\n"
                    "   - Do not generate or speculate with information not explicitly found in the given context.\n\n"
                    "2. **Answer Format**:\n"
                    "   - Provide a clear and concise response based solely on the context.\n"
                    "   - When including a list, use standard markdown bullet points (`*` or `-`).\n"
                    "   - If a list follows introductory text, insert a line break before the first bullet point.\n"
                    "   - Each bullet point must be on its own line.\n\n"
                    "3. **Preserve Tables**:\n"
                    "   - If relevant markdown tables appear in the context, reproduce them in your answer.\n"
                    "   - Maintain the original structure, formatting, and content of any included tables.\n\n"
                    "4. **Links and URLs**:\n"
                    "   - Include any URLs or web links from the context directly in your response when relevant.\n"
                    "   - Integrate links naturally within sentences, using markdown syntax for clickable text links.\n"
                    "   - DO NOT generate or invent any URLs not explicitly present in the context.\n\n"
                    "5. **Markdown Link Formatting**:\n"
                    "   - In responses, only the descriptive text in brackets should be visible and clickable (e.g., `[NHS ADHD information](https://www.nhs.uk/conditions/attention-deficit-hyperactivity-disorder-adhd/)`).\n"
                    "   - Readers should never see raw URLs in the text.\n"
                    "   - Use descriptive link text like 'NHS ADHD information' or 'NHS depression guide' rather than generic terms.\n\n"
                    "6. **If No Relevant Information**:\n"
                    "   - If the context contains no relevant information, state clearly:\n"
                    f"     *\"{not_found_message}\"*\n\n"
                    "# Output Format\n\n"
                    "- All responses should be in plain text, using markdown formatting for lists and links as required.\n"
                    "- Do not use code blocks.\n"
                    "- Answers should be concise, accurate, and formatted according to the rules above.\n\n"
                    "# Examples\n\n"
                    "**Example 1: Integration of markdown link in context**\n"
                    "Question: \"What are the symptoms of ADHD?\"\n"
                    "Context snippet: ...see the NHS information on ADHD symptoms...\n"
                    "Output:\n"
                    "According to the [NHS ADHD information](https://www.nhs.uk/conditions/attention-deficit-hyperactivity-disorder-adhd/), symptoms include...\n\n"
                    "**Example 2: Multiple condition references**\n"
                    "According to NHS guidance:\n"
                    "* Initial symptoms may include difficulty concentrating.\n"
                    "* For detailed information, see the [NHS ADHD guide](https://www.nhs.uk/conditions/adhd/).\n\n"
                    "**Example 3: No relevant context**\n"
                    f"{not_found_message}\n\n"
                    "# Notes\n\n"
                    "- Never output information beyond what is provided in the supplied context.\n"
                    "- Always use markdown for lists and links.\n"
                    "- Make sure all markdown tables from context are preserved in your answer if relevant.\n"
                    "- Present links only as clickable text, not as bare URLs.\n"
                    "- Use descriptive link text that indicates the specific NHS condition or topic.\n\n"
                    "**REMINDER:**\n"
                    "Strictly adhere to all formatting and content rules above for every response."
                ),
            },
            {
                "role": "assistant",
                "content": (
                    f"Here is the context from {context_description} that you should use to answer the following question:\n\n{context_text}\n\n"
                ),
            },
            {
                "role": "user",
                "content": query_text,
            },
        ]

    def get_sources_from_results(self, results: List[Dict], info_source: str) -> List[Dict]:
        """Extract formatted sources from search results"""
        sources = []
        for doc in results:
            metadata = doc.get('metadata', {})
            section_id = metadata.get('original_id', 'Unknown section')
            source = metadata.get('source', 'Unknown')
            url = metadata.get('url', '')

            # Clean section ID for display
            clean_section_id = self._clean_section_id(section_id)

            source_info = {
                'metadata': {
                    'source': source,
                    'original_id': section_id,
                    'url': url,
                    'clean_section': clean_section_id
                }
            }
            sources.append(source_info)
        return sources

    def query_rag_stream(self, query_text: str, llm_model: str, similarity_k: int = 25, info_source: str = "NHS",
                         filename_filter: Optional[str] = None) -> Generator[Tuple[str, List[Dict]], None, None]:
        """Query RAG system with streaming response"""
        try:
            self._validate_inputs(query_text, similarity_k, info_source)
            source_config = self.config.get_source_config(info_source)

            # Use the namespace the index was built with
            namespace = "nhs_guidelines_voyage_3_large"

            # Get similar documents using only similarity search
            results = self.search_engine.similarity_search(
                query_text,
                namespace=namespace,
                top_k=similarity_k
            )

            if not results:
                yield "I couldn't find any relevant information to answer your question.", []
                return

            # Generate context and system prompt
            context_text = self._get_context_text(results)
            system_messages = self._create_system_prompt(
                context_text,
                source_config.context_description,
                source_config.not_found_message,
                query_text
            )

            # Get sources for response
            sources_data = self.get_sources_from_results(results, info_source)

            # Stream LLM response
            yield from self._stream_llm_response(system_messages, query_text, llm_model, sources_data)

        except Exception as e:
            logger.error(f"Error in query_rag_stream: {e}")
            yield f"An error occurred while processing your query: {str(e)}", []

    def _stream_llm_response(self, system_messages: List[Dict], query_text: str,
                             llm_model: str, sources_data: List[Dict]) -> Generator[Tuple[str, List[Dict]], None, None]:
        """Stream LLM response"""
        try:
            if "gemini" in llm_model.lower() and self.gemini_client:
                stream = self.gemini_client.chat.completions.create(
                    model=llm_model,
                    messages=system_messages,
                    temperature=0,
                    stream=True
                )

                for chunk in stream:
                    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
                        content = chunk.choices[0].delta.content
                        yield content, sources_data

            else:
                error_msg = f"Unsupported LLM model or client not available: {llm_model}"
                logger.error(error_msg)
                yield error_msg, []
                return

        except Exception as e:
            logger.error(f"Error in LLM completion: {e}")
            yield f"Error generating response: {str(e)}", []


def main():
    """Main function for CLI usage"""
    parser = argparse.ArgumentParser(description="RAG System Query Interface")
    parser.add_argument("--query_text", type=str, default="What are the symptoms of ADHD in adults?",
                        help="The query text.")
    parser.add_argument("--llm_model", type=str, default="gemini-2.0-flash",
                        help="The LLM model to use.")
    parser.add_argument("--similarity_k", type=int, default=5,
                        help="Number of results to retrieve in similarity search.")
    parser.add_argument("--info_source", type=str, default="NHS",
                        choices=["nhs", "NHS"],
                        help="Information source to query.")

    args = parser.parse_args()

    try:
        print("Initializing RAG system...")
        rag_system = RAGSystem()

        print(f"\n=== Query: {args.query_text} ===")
        print(f"Source: {args.info_source}")
        print(f"LLM Model: {args.llm_model}")
        print("\n=== LLM Response ===\n")

        response_text, sources_data = "", []

        for chunk, sources in rag_system.query_rag_stream(
            query_text=args.query_text,
            llm_model=args.llm_model,
            similarity_k=args.similarity_k,
            info_source=args.info_source
        ):
            print(chunk, end="", flush=True)
            response_text += chunk
            sources_data = sources

        print("\n\n=== Sources Data ===\n")
        for i, source in enumerate(sources_data, 1):
            metadata = source.get('metadata', {})
            print(f"Source {i}:")
            print(f"  Clean Section: {metadata.get('clean_section', 'Unknown')}")
            print(f"  URL: {metadata.get('url', 'No URL')}")
            print()

    except Exception as e:
        logger.error(f"Error in main: {e}")
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
```
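`_clean_section_id` drives the source labels shown in the UI. Extracted as a standalone function (logic copied from the file above), its behavior on the documented ID formats looks like this:

```python
def clean_section_id(section_id: str) -> str:
    """NHS format 'condition__Section__Part_n' -> 'Condition - Section'."""
    if not section_id or section_id == 'Unknown section':
        return section_id
    if '__' in section_id:
        parts = section_id.split('__')
        if len(parts) >= 2:
            # Keep condition and section, drop the part number
            condition = parts[0].replace('-', ' ').replace('_', ' ').title()
            section = parts[1].replace('_', ' ').title()
            return f"{condition} - {section}"
    # Fallback: just clean up underscores and dashes
    return section_id.replace('_', ' ').replace('-', ' ').title()

# clean_section_id("adhd-adults__Overview__Part_1") -> "Adhd Adults - Overview"
```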
src/search_engine.py ADDED

```python
import numpy as np
import pandas as pd
import voyageai
from typing import List, Dict, Tuple, Optional
from collections import defaultdict
import logging
import os
from pinecone import Pinecone

pinecone_api_key = os.getenv("PINECONE_API_KEY")


class SearchEngine:
    """Handles similarity search"""

    def __init__(self, voyage_client: voyageai.Client):
        self.vo = voyage_client
        self.logger = logging.getLogger(__name__)
        self.pc = Pinecone(api_key=pinecone_api_key)
        self.index = self.pc.Index("nhs-conditions")

    def similarity_search(self, query_text: str, namespace: str, top_k: int = 25) -> List[dict]:
        """Perform similarity search using Pinecone"""
        try:
            # Embed the query using the same model used at indexing time
            query_embedding = self.vo.contextualized_embed(
                inputs=[[query_text]],
                model="voyage-context-3",
                input_type="query",
                output_dimension=2048
            ).results[0].embeddings[0]

            # Search Pinecone
            results = self.index.query(
                vector=query_embedding,
                top_k=top_k,
                namespace=namespace,
                include_metadata=True
            )

            matches = results['matches']
            self.logger.info(f"Pinecone search found {len(matches)} results")
            return matches

        except Exception as e:
            self.logger.error(f"Error in Pinecone similarity search: {e}")
            return []
```
src/streamlit_app.py CHANGED

```diff
@@ -1,40 +1,244 @@
-import altair as alt
-import numpy as np
-import pandas as pd
 import streamlit as st
+from typing import Dict, List
 
-"""
-st.
+try:
+    from query_rag import RAGSystem
+except ImportError as e:
+    st.error(f"Import error: {e}. Please ensure all required modules are available.")
+    st.stop()
+
+
+# --- Page Configuration and Initialization ---
+st.set_page_config(page_title="NHS Clinical Assistant", layout="wide")
+
+
+# Initialize RAG System
+def get_rag_system():
+    """Initialize the RAG system"""
+    try:
+        return RAGSystem()
+    except Exception as e:
+        st.error(f"Failed to initialize RAG system: {e}")
+        return None
+
+# Initialize RAG system once at startup
+if 'rag_system' not in st.session_state:
+    st.session_state.rag_system = get_rag_system()
+
+rag_system = st.session_state.rag_system
+if rag_system is None:
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
import streamlit as st
from typing import Dict, List

try:
    from query_rag import RAGSystem
except ImportError as e:
    st.error(f"Import error: {e}. Please ensure all required modules are available.")
    st.stop()


# --- Page Configuration and Initialization ---
st.set_page_config(page_title="NHS Clinical Assistant", layout="wide")


# Initialize RAG System
def get_rag_system():
    """Initialize the RAG system"""
    try:
        return RAGSystem()
    except Exception as e:
        st.error(f"Failed to initialize RAG system: {e}")
        return None

# Initialize RAG system once at startup
if 'rag_system' not in st.session_state:
    st.session_state.rag_system = get_rag_system()

rag_system = st.session_state.rag_system
if rag_system is None:
    st.error("RAG system failed to initialize. Please check your configuration.")
    st.stop()

# --- Helper Functions ---
def display_sources(sources_data: List[Dict]):
    """Display sources with clean NHS formatting"""
    if not sources_data:
        st.markdown("No sources available for this response.")
        return

    for idx, source_info in enumerate(sources_data):
        # Get metadata from source_info
        metadata = source_info.get('metadata', {})
        clean_section = metadata.get('clean_section', 'Unknown Section')
        url = metadata.get('url', '')

        source_text = f"**Source {idx+1}:** {clean_section}"
        st.markdown(source_text)

        if url:
            st.markdown(f" 🔗 [View Online]({url})")

        st.markdown("---")


def initialize_session_state():
    # Common state
    if "app_mode" not in st.session_state:
        st.session_state.app_mode = "NHS Chat"

    # Chat-specific state
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = []
    if "query" not in st.session_state:
        st.session_state.query = ""
    if "processing_query" not in st.session_state:
        st.session_state.processing_query = False
    if "query_to_run_next" not in st.session_state:
        st.session_state.query_to_run_next = None
    if "similarity_k" not in st.session_state:
        st.session_state.similarity_k = 5
    if "llm_model" not in st.session_state:
        st.session_state.llm_model = "gemini-2.5-flash"


initialize_session_state()

# --- STYLING ---
st.markdown("""
<style>
.main {background-color: #f9f9f9; font-family: Arial, sans-serif;}
h1, h2, h3, h4, h5, h6 {color: #2b6777;}
h1 {font-weight: bold;}
[data-testid="stSidebar"] {background-color: #e8f0fe; padding: 10px;}
.result-box {
    border-left: 4px solid #4CAF50;
    padding: 10px;
    background-color: #fff;
    margin-bottom: 10px;
    border-radius: 4px;
    box-shadow: 0 1px 3px rgba(0,0,0,0.1);
}
div.stTextArea > div { border-radius: 8px; }
textarea { font-family: Arial, sans-serif; font-size: 16px; color: #333; resize: vertical; }
.stButton>button { border-radius: 5px; }
div.stSelectbox > label {
    font-size: 16px !important;
    font-weight: bold !important;
}
</style>
""", unsafe_allow_html=True)

# --- SIDEBAR ---
with st.sidebar:
    st.header("🩺 NHS Clinical Assistant")

    st.header("⚙️ Settings")

    llm_options = ["gemini-2.5-flash", "gemini-2.5-flash-lite", "gemini-2.5-pro"]
    try:
        current_llm_index = llm_options.index(st.session_state.llm_model)
    except ValueError:
        current_llm_index = 0
        st.session_state.llm_model = llm_options[0]

    selected_llm = st.selectbox(
        "LLM Model",
        options=llm_options,
        key="llm_model_selector",
        index=current_llm_index
    )
    if selected_llm != st.session_state.llm_model:
        st.session_state.llm_model = selected_llm

    st.markdown("---")

    def new_chat_callback():
        st.session_state.chat_history = []
        st.session_state.query = ""

    if st.button("🗑️ New Chat", key="new_chat", on_click=new_chat_callback):
        pass


# --- MAIN APPLICATION AREA ---
st.title("🩺 NHS Clinical Assistant")
st.markdown("Ask questions and get relevant information from trusted NHS health condition sources.")

def submit_and_process_query(query_to_send: str, display_query_text: str):
    st.session_state.processing_query = True

    try:
        with st.spinner("Retrieving relevant NHS information..."):
            response_chunks = []
            sources_data = []
            temp_response_placeholder = st.empty()

            for chunk, chunk_sources_data in rag_system.query_rag_stream(
                query_to_send,
                st.session_state.llm_model,
                info_source="NHS",
                similarity_k=st.session_state.similarity_k,
            ):
                response_chunks.append(chunk)
                sources_data = chunk_sources_data

                temp_response_placeholder.markdown(
                    f"<div style='border-left: 4px solid #4CAF50; padding-left: 10px;'>{''.join(response_chunks)}</div>",
                    unsafe_allow_html=True
                )

        final_response = ''.join(response_chunks)
        temp_response_placeholder.empty()

        st.session_state.chat_history.append({
            "query_sent": query_to_send,
            "display_query": display_query_text,
            "response": final_response,
            "sources_data": sources_data,
            "llm_model": st.session_state.llm_model
        })

    except Exception as e:
        st.error(f"Error processing query: {e}")
    finally:
        st.session_state.processing_query = False
        st.rerun()

# Display chat history
for i, chat_entry in enumerate(st.session_state.chat_history):
    st.markdown(f"👤 **You:** {chat_entry['display_query']}")

    response_info = f"(LLM: {chat_entry.get('llm_model', 'N/A')})"

    st.markdown(f"🤖 **Assistant** {response_info}:")
    st.markdown(
        f"<div style='border-left: 4px solid #4CAF50; padding-left: 10px; margin-bottom: 10px;'>{chat_entry['response']}</div>",
        unsafe_allow_html=True
    )

    st.subheader("📚 Sources:")
    with st.expander("View Sources", expanded=False):
        sources_data = chat_entry.get("sources_data", [])
        if sources_data:
            display_sources(sources_data)
        else:
            st.markdown("No sources available for this response.")
    st.markdown("---")

# Suggested queries
st.markdown("<h6>💡 Suggested Queries:</h6>", unsafe_allow_html=True)
suggested_queries_list = [
    "What are the symptoms of ADHD in adults?",
    "How is type 2 diabetes diagnosed?",
    "What are the treatment options for depression?"
]
sq_cols = st.columns(len(suggested_queries_list))
for idx, sq_text_item in enumerate(suggested_queries_list):
    if sq_cols[idx].button(
        sq_text_item,
        key=f"suggested_{idx}",
        disabled=st.session_state.processing_query
    ):
        st.session_state.processing_query = True
        st.session_state.query_to_run_next = sq_text_item
        st.rerun()


# User input section
user_query = st.chat_input(
    "e.g., What are the symptoms of ADHD?",
    max_chars=1000,
    disabled=st.session_state.processing_query
)

if user_query:
    st.session_state.processing_query = True
    st.session_state.query_to_run_next = user_query
    st.rerun()

# Process query if one is set to run next
if st.session_state.get("query_to_run_next"):
    query_to_process = st.session_state.query_to_run_next
    st.session_state.query_to_run_next = None  # Clear it so it doesn't run again
    submit_and_process_query(query_to_process, query_to_process)

# --- Footer with Licensing Information ---
st.markdown("---")
st.caption("""
**Data Usage and Licensing:**
This tool utilizes information from NHS sources, which is made available under their respective open licensing terms.
- **NHS:** Content is used under the terms of the Open Government Licence. For full details, please refer to the [NHS Terms and Conditions](https://www.nhs.uk/our-policies/terms-and-conditions/).

Always consult the official sources for the most accurate, complete, and up-to-date information.
""")
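The chat flow in `streamlit_app.py` relies on a two-step handoff: a suggested-query button or the chat input stages the query in `st.session_state.query_to_run_next` and triggers `st.rerun()`, and the next script run consumes the staged query and clears it so it executes exactly once. The state machine can be sketched independently of Streamlit, with a plain dict standing in for `st.session_state` (the helper names `submit` and `rerun` are illustrative, not part of the app):

```python
# Sketch of the app's query handoff: state machine only, no Streamlit required.
# A plain dict stands in for st.session_state.
def submit(state, query):
    """Simulates a button / chat_input press: stage the query, lock the UI."""
    state["processing_query"] = True
    state["query_to_run_next"] = query

def rerun(state, handler):
    """Simulates the top of the next script run: consume the staged query once."""
    query = state.get("query_to_run_next")
    if query:
        state["query_to_run_next"] = None  # clear so a later rerun is a no-op
        try:
            handler(query)
        finally:
            state["processing_query"] = False

state = {"processing_query": False, "query_to_run_next": None, "history": []}
submit(state, "What are the symptoms of ADHD?")
rerun(state, lambda q: state["history"].append(q))
rerun(state, lambda q: state["history"].append(q))  # no staged query: no-op
print(state["history"])  # ["What are the symptoms of ADHD?"]
```

Clearing `query_to_run_next` before calling the handler is the important detail: since `submit_and_process_query` ends with another `st.rerun()`, leaving the query staged would re-trigger the same request on every pass.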