# RAG Capstone Project

A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.

## Features

- 🔍 **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- 🤖 **Medical Embedding Models**:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- 💾 **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- 🦙 **Groq LLM Integration**: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- 📊 **TRACE Evaluation Metrics**:
  - **U**tilization: how well the system uses retrieved documents
  - **R**elevance: relevance of retrieved documents to the query
  - **A**dherence: how well the response adheres to the retrieved context
  - **C**ompleteness: how complete the response is
- 💬 **Chat Interface**: Streamlit-based interactive chat with history
- 🔌 **REST API**: FastAPI backend for integration

## Installation

### Prerequisites

- Python 3.8+
- pip
- Groq API key

### Setup

1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd "RAG Capstone Project"
   ```
2. Create a virtual environment:
   ```bash
   python -m venv venv
   ```
3. Activate the virtual environment:

   **Windows:**
   ```bash
   .\venv\Scripts\activate
   ```
   **Linux/Mac:**
   ```bash
   source venv/bin/activate
   ```
4. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
5. Create a `.env` file from the example (`copy` on Windows, `cp` on Linux/Mac):
   ```bash
   copy .env.example .env
   ```
6. Edit `.env` and add your Groq API key:
   ```
   GROQ_API_KEY=your_groq_api_key_here
   ```

## Usage

### Streamlit Application

Run the interactive Streamlit interface:

```bash
streamlit run streamlit_app.py
```

Then open your browser to `http://localhost:8501`.

**Workflow:**

1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose a chunking strategy
4. Select an embedding model
5. Choose an LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history

### FastAPI Backend

Run the REST API server:

```bash
python api.py
```

Or with uvicorn:

```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```

API documentation is available at `http://localhost:8000/docs`.

#### API Endpoints

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection

### Python API

Use the components programmatically:

```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator

# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)

# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)

# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)

# Query
result = rag.query("What is the capital of France?")
print(result["response"])

# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...]  # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```

## Project Structure

```
RAG Capstone Project/
├── __init__.py              # Package initialization
├── config.py                # Configuration management
├── dataset_loader.py        # RAG Bench dataset loader
├── chunking_strategies.py   # Document chunking strategies
├── embedding_models.py      # Embedding model implementations
├── vector_store.py          # ChromaDB integration
├── llm_client.py            # Groq LLM client with rate limiting
├── trace_evaluator.py       # TRACE evaluation metrics
├── streamlit_app.py         # Streamlit chat interface
├── api.py                   # FastAPI REST API
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
├── .gitignore               # Git ignore file
└── README.md                # This file
```

## TRACE Metrics Explained

### Utilization (U)

Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.

### Relevance (R)

Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine whether the right documents were retrieved.

### Adherence (A)

Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.

### Completeness (C)

Evaluates how completely the response answers the query. Considers response length, question type, and comparison with ground truth if available.

## Deployment Options

### Heroku

1. Create a `Procfile`:
   ```
   web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
   api: uvicorn api:app --host=0.0.0.0 --port=$PORT
   ```
2. Deploy:
   ```bash
   heroku create your-app-name
   git push heroku main
   ```

### Docker

Create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501 8000
CMD ["streamlit", "run", "streamlit_app.py"]
```

Build and run:

```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```

### Cloud Run / AWS / Azure

The application can be deployed to any cloud platform that supports Python applications. See the respective platform's documentation for deployment instructions.

## Configuration

Edit `config.py` or set environment variables in `.env`:

```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```

## Rate Limiting

The application implements rate limiting for Groq API calls:

- Maximum of 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when the rate limit is reached

## Troubleshooting

### ChromaDB Issues

If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating the collections.

### Embedding Model Loading

Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:

- Using a smaller model
- Reducing the batch size
- Using CPU instead of GPU

### API Key Errors

Ensure your Groq API key is set correctly in the `.env` file or passed to the application.

## License

MIT License

## Contributors

RAG Capstone Team

## Support

For issues and questions, please open an issue on the GitHub repository.
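As a reference for the rate-limiting behaviour described above (a 30 RPM cap, a fixed delay between requests, and waiting when the window is full), here is a minimal sketch of such a limiter. The `RateLimiter` class and its internals are illustrative only, not the actual implementation in `llm_client.py`:

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window rate limiter: at most `rpm` calls per 60 seconds,
    with a fixed minimum delay between consecutive calls."""

    def __init__(self, rpm=30, min_delay=2.0):
        self.rpm = rpm
        self.min_delay = min_delay
        self.timestamps = deque()  # monotonic times of recent calls

    def wait(self):
        """Block until the next call is allowed, then record it."""
        now = time.monotonic()
        # Drop timestamps that have fallen out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] > 60:
            self.timestamps.popleft()
        # If the window is full, sleep until the oldest call expires.
        if len(self.timestamps) >= self.rpm:
            time.sleep(60 - (now - self.timestamps[0]))
        # Enforce the fixed delay between consecutive requests.
        if self.timestamps:
            elapsed = time.monotonic() - self.timestamps[-1]
            if elapsed < self.min_delay:
                time.sleep(self.min_delay - elapsed)
        self.timestamps.append(time.monotonic())
```

Calling `limiter.wait()` immediately before each Groq request is enough to respect both the per-minute budget and the inter-request delay.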
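To make the Relevance (R) metric concrete, the lexical-overlap idea it describes can be sketched as a toy scoring function. This is a simplified illustration under assumed tokenization (whitespace, lowercased), not the scoring code in `trace_evaluator.py`:

```python
def relevance_score(query, documents):
    """Toy lexical-overlap relevance: for each retrieved document,
    the fraction of query terms it contains; averaged over documents."""
    query_terms = set(query.lower().split())
    if not query_terms or not documents:
        return 0.0
    scores = []
    for doc in documents:
        doc_terms = set(doc.lower().split())
        scores.append(len(query_terms & doc_terms) / len(query_terms))
    return sum(scores) / len(scores)
```

A document containing every query term scores 1.0, while an unrelated document scores 0.0; real evaluators typically add stemming, stop-word removal, and keyword weighting on top of this baseline.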