# RAG Capstone Project

A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.

## Features

- πŸ” **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- πŸ€– **Medical Embedding Models**:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- πŸ’Ύ **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- πŸ¦™ **Groq LLM Integration**: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- πŸ“Š **TRACE Evaluation Metrics**:
  - **Utilization**: how well the response uses the retrieved documents
  - **Relevance**: relevance of the retrieved documents to the query
  - **Adherence**: how well the response stays grounded in the retrieved context
  - **Completeness**: how completely the response answers the query
- πŸ’¬ **Chat Interface**: Streamlit-based interactive chat with history
- πŸ”Œ **REST API**: FastAPI backend for integration
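
As a rough illustration of the simplest chunking baseline (the project's actual strategies live in `chunking_strategies.py` and may differ), fixed-size chunking with overlap can be sketched as:

```python
from typing import List

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks with overlap.

    Illustrative baseline only; the dense/sparse/hybrid strategies in
    this project are more sophisticated.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.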

## Installation

### Prerequisites

- Python 3.8+
- pip
- Groq API key

### Setup

1. Clone the repository:
```bash
git clone <repository-url>
cd "RAG Capstone Project"
```

2. Create a virtual environment:
```bash
python -m venv venv
```

3. Activate the virtual environment:

**Windows:**
```bash
.\venv\Scripts\activate
```

**Linux/Mac:**
```bash
source venv/bin/activate
```

4. Install dependencies:
```bash
pip install -r requirements.txt
```

5. Create a `.env` file from the example:

**Windows:**
```bash
copy .env.example .env
```

**Linux/Mac:**
```bash
cp .env.example .env
```

6. Edit `.env` and add your Groq API key:
```
GROQ_API_KEY=your_groq_api_key_here
```
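
Downstream code can then read the key from the environment. Assuming `config.py` loads `.env` into process environment variables (e.g. via `python-dotenv`), a minimal accessor looks like:

```python
import os

def get_groq_api_key() -> str:
    """Fetch the Groq API key from the environment, failing loudly if absent."""
    key = os.environ.get("GROQ_API_KEY", "")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```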

## Usage

### Streamlit Application

Run the interactive Streamlit interface:

```bash
streamlit run streamlit_app.py
```

Then open your browser to `http://localhost:8501`.

**Workflow:**
1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose chunking strategy
4. Select embedding model
5. Choose LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history

### FastAPI Backend

Run the REST API server:

```bash
python api.py
```

Or with uvicorn:
```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```

API documentation is available at `http://localhost:8000/docs`.

#### API Endpoints

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection
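
As an example of calling the API from Python with only the standard library (the request/response field names below are assumptions; consult the generated API docs for the actual schema):

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"

def build_query_payload(question, collection_name, top_k=5):
    """Assemble a request body for POST /query (field names are assumed)."""
    return {"query": question, "collection_name": collection_name, "top_k": top_k}

def post_query(payload):
    """Send the payload to a running API server and return the parsed JSON."""
    req = urllib.request.Request(
        API_BASE + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `post_query(build_query_payload("What is sepsis?", "my_collection"))` would return the generated answer along with the retrieved documents.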

### Python API

Use the components programmatically:

```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator

# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)

# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)

# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)

# Query
result = rag.query("What is the capital of France?")
print(result["response"])

# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...]  # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```

## Project Structure

```
RAG Capstone Project/
β”œβ”€β”€ __init__.py                 # Package initialization
β”œβ”€β”€ config.py                   # Configuration management
β”œβ”€β”€ dataset_loader.py           # RAG Bench dataset loader
β”œβ”€β”€ chunking_strategies.py      # Document chunking strategies
β”œβ”€β”€ embedding_models.py         # Embedding model implementations
β”œβ”€β”€ vector_store.py             # ChromaDB integration
β”œβ”€β”€ llm_client.py               # Groq LLM client with rate limiting
β”œβ”€β”€ trace_evaluator.py          # TRACE evaluation metrics
β”œβ”€β”€ streamlit_app.py            # Streamlit chat interface
β”œβ”€β”€ api.py                      # FastAPI REST API
β”œβ”€β”€ requirements.txt            # Python dependencies
β”œβ”€β”€ .env.example                # Environment variables template
β”œβ”€β”€ .gitignore                  # Git ignore file
└── README.md                   # This file
```

## TRACE Metrics Explained

### Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.

### Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.

### Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.

### Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
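
To make the Relevance computation concrete, a minimal lexical-overlap scorer in the spirit described above (an illustration, not the actual `TRACEEvaluator` code) might look like:

```python
def lexical_relevance(query, documents):
    """Score retrieval relevance as the fraction of query keywords
    found in at least one retrieved document (illustrative only)."""
    stopwords = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "what", "how"}
    keywords = {w.strip("?.,!;:") for w in query.lower().split()} - stopwords
    keywords.discard("")
    if not keywords:
        return 0.0
    corpus = " ".join(documents).lower()
    return sum(1 for w in keywords if w in corpus) / len(keywords)
```

A score of 1.0 means every content word of the query appears somewhere in the retrieved set; 0.0 means none do.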

## Deployment Options

### Heroku

1. Create `Procfile`:
```
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
```

2. Deploy:
```bash
heroku create your-app-name
git push heroku main
```

### Docker

Create `Dockerfile`:
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501 8000

CMD ["streamlit", "run", "streamlit_app.py"]
```

Build and run:
```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```

### Cloud Run / AWS / Azure

The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.

## Configuration

Edit `config.py` or set environment variables in `.env`:

```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```

## Rate Limiting

The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
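
The behavior described above can be sketched as a sliding-window limiter (a simplified illustration; the actual logic lives in `llm_client.py`):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `rpm` calls per 60 s,
    with at least `min_delay` seconds between consecutive calls."""

    def __init__(self, rpm=30, min_delay=2.0):
        self.rpm = rpm
        self.min_delay = min_delay
        self.calls = deque()

    def wait(self):
        """Block until the next call is allowed, then record it."""
        now = time.monotonic()
        # Forget calls that fell outside the 60-second window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        # Enforce the fixed delay between consecutive requests.
        if self.calls and now - self.calls[-1] < self.min_delay:
            time.sleep(self.min_delay - (now - self.calls[-1]))
        # If the window is full, sleep until the oldest call expires.
        if len(self.calls) >= self.rpm:
            time.sleep(max(0.0, 60 - (time.monotonic() - self.calls[0])))
        self.calls.append(time.monotonic())
```

Calling `limiter.wait()` immediately before each Groq request keeps the client under the 30 RPM ceiling.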

## Troubleshooting

### ChromaDB Issues
If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating collections.

### Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU

### API Key Errors
Ensure your Groq API key is correctly set in the `.env` file or passed to the application.

## License

MIT License

## Contributors

RAG Capstone Team

## Support

For issues and questions, please open an issue on the GitHub repository.