# RAG Capstone Project
A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.
## Features
- **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- **Medical Embedding Models**:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- **Groq LLM Integration**: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- **TRACE Evaluation Metrics**:
  - **U**tilization: How well the system uses retrieved documents
  - **R**elevance: Relevance of retrieved documents to the query
  - **A**dherence: How well the response adheres to the retrieved context
  - **C**ompleteness: How complete the response is
- **Chat Interface**: Streamlit-based interactive chat with history
- **REST API**: FastAPI backend for integration
## Installation
### Prerequisites
- Python 3.8+
- pip
- Groq API key
### Setup
1. Clone the repository:
```bash
git clone <repository-url>
cd "RAG Capstone Project"
```
2. Create a virtual environment:
```bash
python -m venv venv
```
3. Activate the virtual environment:
**Windows:**
```bash
.\venv\Scripts\activate
```
**Linux/Mac:**
```bash
source venv/bin/activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Create a `.env` file from the example:
**Windows:**
```bash
copy .env.example .env
```
**Linux/Mac:**
```bash
cp .env.example .env
```
6. Edit `.env` and add your Groq API key:
```
GROQ_API_KEY=your_groq_api_key_here
```
## Usage
### Streamlit Application
Run the interactive Streamlit interface:
```bash
streamlit run streamlit_app.py
```
Then open your browser to `http://localhost:8501`
**Workflow:**
1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose chunking strategy
4. Select embedding model
5. Choose LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history
### FastAPI Backend
Run the REST API server:
```bash
python api.py
```
Or with uvicorn:
```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```
API documentation available at: `http://localhost:8000/docs`
#### API Endpoints
- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection
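As a quick smoke test, `POST /query` can be called with only the standard library. The request fields below (`question`, `collection_name`, `top_k`) are assumptions, not the confirmed schema; check the interactive docs at `/docs` for the actual request model:

```python
import json
import urllib.request

# Hypothetical request body: field names are assumptions and should be
# verified against the FastAPI docs at http://localhost:8000/docs.
payload = {
    "question": "What is the capital of France?",
    "collection_name": "my_collection",
    "top_k": 5,
}

def query_rag(base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /query and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Call query_rag() while the API server is running.
```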
### Python API
Use the components programmatically:
```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator
# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)
# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
collection_name="my_collection",
embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
chunking_strategy="hybrid",
dataset_data=dataset
)
# Initialize LLM
llm = GroqLLMClient(
api_key="your_api_key",
model_name="llama-3.1-8b-instant"
)
# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)
# Query
result = rag.query("What is the capital of France?")
print(result["response"])
# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...] # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```
## Project Structure
```
RAG Capstone Project/
├── __init__.py             # Package initialization
├── config.py               # Configuration management
├── dataset_loader.py       # RAG Bench dataset loader
├── chunking_strategies.py  # Document chunking strategies
├── embedding_models.py     # Embedding model implementations
├── vector_store.py         # ChromaDB integration
├── llm_client.py           # Groq LLM client with rate limiting
├── trace_evaluator.py      # TRACE evaluation metrics
├── streamlit_app.py        # Streamlit chat interface
├── api.py                  # FastAPI REST API
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variables template
├── .gitignore              # Git ignore file
└── README.md               # This file
```
## TRACE Metrics Explained
### Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.
### Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.
### Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.
### Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
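To make the lexical-overlap idea behind Relevance concrete, here is a minimal, illustrative sketch. The stopword list and scoring details in `trace_evaluator.py` will differ; this only shows the general shape of a keyword-coverage score:

```python
import re

# A tiny illustrative stopword list; the real evaluator likely uses a
# larger one (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "is", "of", "to", "in", "and", "or", "what", "how"}

def tokenize(text: str) -> set:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def relevance_score(query: str, documents: list) -> float:
    """Average fraction of query keywords covered by each document."""
    q = tokenize(query)
    if not q or not documents:
        return 0.0
    overlaps = [len(q & tokenize(d)) / len(q) for d in documents]
    return sum(overlaps) / len(overlaps)

score = relevance_score(
    "What is the capital of France?",
    ["Paris is the capital and largest city of France.",
     "Berlin is the capital of Germany."],
)  # → 0.75
```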
## Deployment Options
### Heroku
1. Create `Procfile`:
```
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
```
2. Deploy:
```bash
heroku create your-app-name
git push heroku main
```
### Docker
Create `Dockerfile`:
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501 8000
CMD ["streamlit", "run", "streamlit_app.py"]
```
Build and run:
```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```
### Cloud Run / AWS / Azure
The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.
## Configuration
Edit `config.py` or set environment variables in `.env`:
```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```
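For reference, these variables can be read in Python roughly as follows. This is an illustrative sketch with defaults mirroring the `.env` example above; the actual `config.py` may use a different mechanism (e.g. pydantic settings) and different validation:

```python
import os

# Names are taken from the .env example above; defaults and types here
# are assumptions, not the confirmed behaviour of config.py.
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")
GROQ_RPM_LIMIT = int(os.getenv("GROQ_RPM_LIMIT", "30"))
RATE_LIMIT_DELAY = float(os.getenv("RATE_LIMIT_DELAY", "2.0"))
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```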
## Rate Limiting
The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
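The behaviour described above can be sketched as a small sliding-window limiter. This is illustrative only; the actual implementation lives in `llm_client.py` and may differ in detail:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `rpm` calls per 60 s, plus a
    fixed minimum delay between consecutive calls."""

    def __init__(self, rpm: int = 30, delay: float = 2.0):
        self.rpm = rpm
        self.delay = delay
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait(self) -> None:
        """Block until another API call is allowed, then record it."""
        now = time.monotonic()
        # Drop timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Window is full: sleep until the oldest call expires.
            time.sleep(60.0 - (now - self.calls[0]))
            self.calls.popleft()
        elif self.calls and now - self.calls[-1] < self.delay:
            # Enforce the fixed delay between consecutive calls.
            time.sleep(self.delay - (now - self.calls[-1]))
        self.calls.append(time.monotonic())

limiter = RateLimiter(rpm=30, delay=2.0)
# Call limiter.wait() before each Groq API request.
```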
## Troubleshooting
### ChromaDB Issues
If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating collections.
### Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU
### API Key Errors
Ensure your Groq API key is correctly set in the `.env` file or passed to the application.
## License
MIT License
## Contributors
RAG Capstone Team
## Support
For issues and questions, please open an issue on the GitHub repository.