# RAG Capstone Project

A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.

## Features

- πŸ” **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- πŸ€– **Medical Embedding Models**:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- πŸ’Ύ **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- πŸ¦™ **Groq LLM Integration**: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- πŸ“Š **TRACE Evaluation Metrics**:
  - **Utilization**: how well the response uses the retrieved documents
  - **Relevance**: relevance of the retrieved documents to the query
  - **Adherence**: how well the response stays grounded in the retrieved context
  - **Completeness**: how completely the response answers the query
- πŸ’¬ **Chat Interface**: Streamlit-based interactive chat with history
- πŸ”Œ **REST API**: FastAPI backend for integration
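
As a rough illustration of the simplest chunking baseline (the project's actual strategies live in `chunking_strategies.py` and may differ), fixed-size chunking with overlap can be sketched as:

```python
from typing import List

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks with overlap.

    Illustrative baseline only; the dense/sparse/hybrid strategies in
    this project are more sophisticated.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.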

## Installation

### Prerequisites

- Python 3.8+
- pip
- Groq API key

### Setup

1. Clone the repository:
```bash
git clone <repository-url>
cd "RAG Capstone Project"
```

2. Create a virtual environment:
```bash
python -m venv venv
```

3. Activate the virtual environment:

**Windows:**
```bash
.\venv\Scripts\activate
```

**Linux/Mac:**
```bash
source venv/bin/activate
```

4. Install dependencies:
```bash
pip install -r requirements.txt
```

5. Create a `.env` file from the example:

**Windows:**
```bash
copy .env.example .env
```

**Linux/Mac:**
```bash
cp .env.example .env
```

6. Edit `.env` and add your Groq API key:
```
GROQ_API_KEY=your_groq_api_key_here
```
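
Downstream code can then read the key from the environment. Assuming `config.py` loads `.env` into process environment variables (e.g. via `python-dotenv`), a minimal accessor looks like:

```python
import os

def get_groq_api_key() -> str:
    """Fetch the Groq API key from the environment, failing loudly if absent."""
    key = os.environ.get("GROQ_API_KEY", "")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```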

## Usage

### Streamlit Application

Run the interactive Streamlit interface:

```bash
streamlit run streamlit_app.py
```

Then open your browser to `http://localhost:8501`.

**Workflow:**
1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose chunking strategy
4. Select embedding model
5. Choose LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history

### FastAPI Backend

Run the REST API server:

```bash
python api.py
```

Or with uvicorn:
```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```

API documentation is available at `http://localhost:8000/docs`.

#### API Endpoints

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection
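
As an example of calling the API from Python with only the standard library (the request/response field names below are assumptions; consult the generated API docs for the actual schema):

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"

def build_query_payload(question, collection_name, top_k=5):
    """Assemble a request body for POST /query (field names are assumed)."""
    return {"query": question, "collection_name": collection_name, "top_k": top_k}

def post_query(payload):
    """Send the payload to a running API server and return the parsed JSON."""
    req = urllib.request.Request(
        API_BASE + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `post_query(build_query_payload("What is sepsis?", "my_collection"))` would return the generated answer along with the retrieved documents.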

### Python API

Use the components programmatically:

```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator

# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)

# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)

# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)

# Query
result = rag.query("What is the capital of France?")
print(result["response"])

# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...]  # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```

## Project Structure

```
RAG Capstone Project/
β”œβ”€β”€ __init__.py                 # Package initialization
β”œβ”€β”€ config.py                   # Configuration management
β”œβ”€β”€ dataset_loader.py           # RAG Bench dataset loader
β”œβ”€β”€ chunking_strategies.py      # Document chunking strategies
β”œβ”€β”€ embedding_models.py         # Embedding model implementations
β”œβ”€β”€ vector_store.py             # ChromaDB integration
β”œβ”€β”€ llm_client.py               # Groq LLM client with rate limiting
β”œβ”€β”€ trace_evaluator.py          # TRACE evaluation metrics
β”œβ”€β”€ streamlit_app.py            # Streamlit chat interface
β”œβ”€β”€ api.py                      # FastAPI REST API
β”œβ”€β”€ requirements.txt            # Python dependencies
β”œβ”€β”€ .env.example                # Environment variables template
β”œβ”€β”€ .gitignore                  # Git ignore file
└── README.md                   # This file
```

## TRACE Metrics Explained

### Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.

### Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.

### Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.

### Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
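
To make the Relevance computation concrete, a minimal lexical-overlap scorer in the spirit described above (an illustration, not the actual `TRACEEvaluator` code) might look like:

```python
def lexical_relevance(query, documents):
    """Score retrieval relevance as the fraction of query keywords
    found in at least one retrieved document (illustrative only)."""
    stopwords = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "what", "how"}
    keywords = {w.strip("?.,!;:") for w in query.lower().split()} - stopwords
    keywords.discard("")
    if not keywords:
        return 0.0
    corpus = " ".join(documents).lower()
    return sum(1 for w in keywords if w in corpus) / len(keywords)
```

A score of 1.0 means every content word of the query appears somewhere in the retrieved set; 0.0 means none do.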

## Deployment Options

### Heroku

1. Create `Procfile`:
```
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
```

2. Deploy:
```bash
heroku create your-app-name
git push heroku main
```

### Docker

Create `Dockerfile`:
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501 8000

CMD ["streamlit", "run", "streamlit_app.py"]
```

Build and run:
```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```

### Cloud Run / AWS / Azure

The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.

## Configuration

Edit `config.py` or set environment variables in `.env`:

```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```

## Rate Limiting

The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
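
The behavior described above can be sketched as a sliding-window limiter (a simplified illustration; the actual logic lives in `llm_client.py`):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `rpm` calls per 60 s,
    with at least `min_delay` seconds between consecutive calls."""

    def __init__(self, rpm=30, min_delay=2.0):
        self.rpm = rpm
        self.min_delay = min_delay
        self.calls = deque()

    def wait(self):
        """Block until the next call is allowed, then record it."""
        now = time.monotonic()
        # Forget calls that fell outside the 60-second window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        # Enforce the fixed delay between consecutive requests.
        if self.calls and now - self.calls[-1] < self.min_delay:
            time.sleep(self.min_delay - (now - self.calls[-1]))
        # If the window is full, sleep until the oldest call expires.
        if len(self.calls) >= self.rpm:
            time.sleep(max(0.0, 60 - (time.monotonic() - self.calls[0])))
        self.calls.append(time.monotonic())
```

Calling `limiter.wait()` immediately before each Groq request keeps the client under the 30 RPM ceiling.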

## Troubleshooting

### ChromaDB Issues
If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating collections.

### Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU

### API Key Errors
Ensure your Groq API key is correctly set in the `.env` file or passed to the application.

## License

MIT License

## Contributors

RAG Capstone Team

## Support

For issues and questions, please open an issue on the GitHub repository.