RAG Capstone Project
A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.
Features
- Multiple RAG Bench Datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- Chunking Strategies: Dense, Sparse, Hybrid, Re-ranking
- Medical Embedding Models:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- ChromaDB Vector Storage: Persistent vector storage with efficient retrieval
- Groq LLM Integration: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- TRACE Evaluation Metrics:
  - Utilization: How well the system uses retrieved documents
  - Relevance: Relevance of retrieved documents to the query
  - Adherence: How well the response adheres to the retrieved context
  - Completeness: How complete the response is
- Chat Interface: Streamlit-based interactive chat with history
- REST API: FastAPI backend for integration
Installation
Prerequisites
- Python 3.8+
- pip
- Groq API key
Setup
- Clone the repository:
git clone <repository-url>
cd "RAG Capstone Project"
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
Windows:
.\venv\Scripts\activate
Linux/Mac:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file from the example:
copy .env.example .env
(On Linux/Mac, use cp .env.example .env)
- Edit .env and add your Groq API key:
GROQ_API_KEY=your_groq_api_key_here
Usage
Streamlit Application
Run the interactive Streamlit interface:
streamlit run streamlit_app.py
Then open your browser to http://localhost:8501
Workflow:
- Enter your Groq API key in the sidebar
- Select a dataset from RAG Bench
- Choose chunking strategy
- Select embedding model
- Choose LLM model
- Click "Load Data & Create Collection"
- Start chatting!
- View retrieved documents
- Run TRACE evaluation
- Export chat history
FastAPI Backend
Run the REST API server:
python api.py
Or with uvicorn:
uvicorn api:app --reload --host 0.0.0.0 --port 8000
API documentation available at: http://localhost:8000/docs
API Endpoints
- GET / - Root endpoint
- GET /health - Health check
- GET /datasets - List available datasets
- GET /models/embedding - List embedding models
- GET /models/llm - List LLM models
- GET /chunking-strategies - List chunking strategies
- GET /collections - List all collections
- GET /collections/{name} - Get collection info
- POST /load-dataset - Load dataset and create collection
- POST /query - Query the RAG system
- GET /chat-history - Get chat history
- DELETE /chat-history - Clear chat history
- POST /evaluate - Run TRACE evaluation
- DELETE /collections/{name} - Delete collection
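A minimal client call to the query endpoint might look like the sketch below. The payload field names (collection_name, question, top_k) are assumptions based on the Python API shown later; check the interactive docs at /docs for the actual request schema:

```python
import json

# Hypothetical request body for POST /query; the field names here are
# assumptions, not confirmed against the FastAPI route definitions.
payload = {
    "collection_name": "my_collection",
    "question": "What is the capital of France?",
    "top_k": 5,
}
print(json.dumps(payload))

# To send it with the server running on port 8000:
#   import requests
#   resp = requests.post("http://localhost:8000/query", json=payload)
#   print(resp.json()["response"])
```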
Python API
Use the components programmatically:
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator
# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)
# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
collection_name="my_collection",
embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
chunking_strategy="hybrid",
dataset_data=dataset
)
# Initialize LLM
llm = GroqLLMClient(
api_key="your_api_key",
model_name="llama-3.1-8b-instant"
)
# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)
# Query
result = rag.query("What is the capital of France?")
print(result["response"])
# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...] # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
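The structure of a test case is not shown above. A plausible shape, with key names assumed from the four TRACE metrics (query, retrieved documents, generated response, ground truth), might be:

```python
# Hypothetical test-case structure for TRACEEvaluator.evaluate_batch();
# the exact keys expected by the evaluator are assumptions.
test_case = {
    "query": "What is the capital of France?",
    "retrieved_documents": [
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
    ],
    "response": "The capital of France is Paris.",
    "ground_truth": "Paris",
}
test_cases = [test_case]
```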
Project Structure
RAG Capstone Project/
├── __init__.py            # Package initialization
├── config.py              # Configuration management
├── dataset_loader.py      # RAG Bench dataset loader
├── chunking_strategies.py # Document chunking strategies
├── embedding_models.py    # Embedding model implementations
├── vector_store.py        # ChromaDB integration
├── llm_client.py          # Groq LLM client with rate limiting
├── trace_evaluator.py     # TRACE evaluation metrics
├── streamlit_app.py       # Streamlit chat interface
├── api.py                 # FastAPI REST API
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variables template
├── .gitignore             # Git ignore file
└── README.md              # This file
TRACE Metrics Explained
Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.
Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.
Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.
Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
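As a rough illustration of how a lexical-overlap score like Relevance can be computed, here is a toy sketch: it averages, over the retrieved documents, the fraction of query keywords that appear in each document. This is an illustration only, not the project's actual implementation:

```python
import re

def lexical_relevance(query: str, documents: list[str]) -> float:
    """Toy Relevance sketch: mean keyword overlap between the query
    and each retrieved document (the real evaluator may weight or
    normalize differently)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    if not q_terms or not documents:
        return 0.0
    scores = []
    for doc in documents:
        d_terms = set(re.findall(r"\w+", doc.lower()))
        scores.append(len(q_terms & d_terms) / len(q_terms))
    return sum(scores) / len(scores)

score = lexical_relevance(
    "capital of France",
    ["Paris is the capital of France.", "France is in Europe."],
)
```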
Deployment Options
Heroku
- Create a Procfile:
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
- Deploy:
heroku create your-app-name
git push heroku main
Docker
Create Dockerfile:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501 8000
# Runs only the Streamlit UI; start the FastAPI server separately if needed
CMD ["streamlit", "run", "streamlit_app.py"]
Build and run:
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
Cloud Run / AWS / Azure
The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.
Configuration
Edit config.py or set environment variables in .env:
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
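For reference, these variables could be read in Python as sketched below; the project's config.py may use a different loader (for example, pydantic settings or python-dotenv), so treat this as an illustration:

```python
import os

# Read settings from the environment, falling back to the defaults
# listed above. The variable names match the .env example; the loading
# mechanism itself is an assumption.
GROQ_RPM_LIMIT = int(os.getenv("GROQ_RPM_LIMIT", "30"))
RATE_LIMIT_DELAY = float(os.getenv("RATE_LIMIT_DELAY", "2.0"))
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```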
Rate Limiting
The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
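The behavior described above can be sketched as a small rolling-window limiter: at most rpm calls per 60-second window, plus a fixed delay between consecutive calls. This is a minimal sketch, not the project's GroqLLMClient implementation:

```python
import time
from collections import deque

class RateLimiter:
    """Minimal rolling-window rate limiter sketch (assumed design)."""

    def __init__(self, rpm: int = 30, delay: float = 2.0):
        self.rpm = rpm
        self.delay = delay
        self.calls: deque = deque()  # timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than the 60 s window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Window is full: sleep until the oldest call expires.
            time.sleep(60 - (now - self.calls[0]))
        elif self.calls:
            # Fixed spacing between consecutive requests.
            time.sleep(self.delay)
        self.calls.append(time.monotonic())
```

Call limiter.wait() immediately before each Groq API request.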
Troubleshooting
ChromaDB Issues
If you encounter ChromaDB errors, try deleting the chroma_db directory and recreating collections.
Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU
API Key Errors
Ensure your Groq API key is correctly set in the .env file or passed to the application.
License
MIT License
Contributors
RAG Capstone Team
Support
For issues and questions, please open an issue on the GitHub repository.