# RAG Capstone Project

A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.

## Features

- 🔍 **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- 🤖 **Medical Embedding Models**:
  - `sentence-transformers/embeddinggemma-300m-medical`
  - `emilyalsentzer/Bio_ClinicalBERT`
  - `Simonlee711/Clinical_ModernBERT`
- 💾 **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- 🦙 **Groq LLM Integration**: With rate limiting (30 RPM)
  - `meta-llama/llama-4-maverick-17b-128e-instruct`
  - `llama-3.1-8b-instant`
  - `openai/gpt-oss-120b`
- 📊 **TRACE Evaluation Metrics**:
  - **Utilization**: How well the system uses retrieved documents
  - **Relevance**: Relevance of retrieved documents to the query
  - **Adherence**: How well the response adheres to the retrieved context
  - **Completeness**: How complete the response is
- 💬 **Chat Interface**: Streamlit-based interactive chat with history
- 🔌 **REST API**: FastAPI backend for integration

## Installation

### Prerequisites

- Python 3.8+
- pip
- Groq API key

### Setup

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd "RAG Capstone Project"
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   ```

3. Activate the virtual environment:

   Windows:

   ```bash
   .\venv\Scripts\activate
   ```

   Linux/Mac:

   ```bash
   source venv/bin/activate
   ```

4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

5. Create a `.env` file from the example (`copy` on Windows, `cp` on Linux/Mac):

   ```bash
   copy .env.example .env
   ```

6. Edit `.env` and add your Groq API key:

   ```
   GROQ_API_KEY=your_groq_api_key_here
   ```

## Usage

### Streamlit Application

Run the interactive Streamlit interface:

```bash
streamlit run streamlit_app.py
```

Then open your browser to http://localhost:8501.

**Workflow:**

1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose a chunking strategy
4. Select an embedding model
5. Choose an LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history

### FastAPI Backend

Run the REST API server:

```bash
python api.py
```

Or with uvicorn:

```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```

API documentation is available at http://localhost:8000/docs.

### API Endpoints

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `DELETE /collections/{name}` - Delete a collection
- `POST /load-dataset` - Load a dataset and create a collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation

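As a quick sketch, the `/query` endpoint can be called from Python. The request body fields used here (`query`, `top_k`) are assumptions based on the endpoint list above, not a documented schema — check the live schema at http://localhost:8000/docs before relying on them.

```python
import json


def build_query_payload(question: str, top_k: int = 5) -> str:
    """Serialize a hypothetical /query request body to JSON.

    Field names are assumptions; verify them against the OpenAPI docs.
    """
    return json.dumps({"query": question, "top_k": top_k})


if __name__ == "__main__":
    # Requires the API server running locally and `requests` installed.
    import requests

    resp = requests.post(
        "http://localhost:8000/query",
        data=build_query_payload("What is the capital of France?"),
        headers={"Content-Type": "application/json"},
        timeout=60,
    )
    print(resp.json())
```
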
### Python API

Use the components programmatically:

```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator

# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)

# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)

# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)

# Query
result = rag.query("What is the capital of France?")
print(result["response"])

# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...]  # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```
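
For the evaluation step, the exact test-case schema is defined in `trace_evaluator.py`; the dictionary below is a hypothetical shape (all field names are assumptions for illustration) showing what one entry of a batch might look like:

```python
# Hypothetical test-case shape; the real field names are defined in
# trace_evaluator.py — adjust the keys to match that module.
def make_test_case(query, retrieved_documents, response, ground_truth=None):
    """Bundle one RAG interaction for TRACE scoring."""
    return {
        "query": query,
        "retrieved_documents": retrieved_documents,
        "response": response,
        "ground_truth": ground_truth,
    }


test_cases = [
    make_test_case(
        query="What is the capital of France?",
        retrieved_documents=["Paris is the capital and largest city of France."],
        response="The capital of France is Paris.",
        ground_truth="Paris",
    )
]
```
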

## Project Structure

```
RAG Capstone Project/
├── __init__.py             # Package initialization
├── config.py               # Configuration management
├── dataset_loader.py       # RAG Bench dataset loader
├── chunking_strategies.py  # Document chunking strategies
├── embedding_models.py     # Embedding model implementations
├── vector_store.py         # ChromaDB integration
├── llm_client.py           # Groq LLM client with rate limiting
├── trace_evaluator.py      # TRACE evaluation metrics
├── streamlit_app.py        # Streamlit chat interface
├── api.py                  # FastAPI REST API
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variables template
├── .gitignore              # Git ignore file
└── README.md               # This file
```

## TRACE Metrics Explained

### Utilization (U)

Measures how well the system uses the retrieved documents when generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.

### Relevance (R)

Evaluates the relevance of the retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine whether the right documents were retrieved.

### Adherence (A)

Assesses how well the generated response adheres to the retrieved context, ensuring the response is grounded in the provided documents rather than hallucinated.

### Completeness (C)

Evaluates how completely the response answers the query, considering response length, question type, and comparison with the ground truth when available.
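
To make the lexical-overlap idea behind the R metric concrete, here is a toy scoring function. It is a sketch of the general technique only, not the project's actual `TRACEEvaluator` implementation, and the stopword list is an arbitrary illustration:

```python
import re

# Small illustrative stopword list; a real evaluator would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "in", "and", "what"}


def lexical_relevance(query: str, documents: list[str]) -> float:
    """Toy relevance score: for each retrieved document, the fraction of
    query keywords it contains; averaged over all documents."""
    keywords = set(re.findall(r"[a-z0-9]+", query.lower())) - STOPWORDS
    if not keywords or not documents:
        return 0.0
    scores = []
    for doc in documents:
        doc_words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        scores.append(len(keywords & doc_words) / len(keywords))
    return sum(scores) / len(scores)
```

A fully on-topic document scores 1.0, an unrelated one 0.0; real evaluators typically combine such lexical signals with embedding similarity.
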

## Deployment Options

### Heroku

1. Create a `Procfile`:

   ```
   web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
   api: uvicorn api:app --host=0.0.0.0 --port=$PORT
   ```

2. Deploy:

   ```bash
   heroku create your-app-name
   git push heroku main
   ```

### Docker

Create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501 8000

CMD ["streamlit", "run", "streamlit_app.py"]
```

Note that the default command starts only the Streamlit app; launch `uvicorn` separately if you also need the API on port 8000.

Build and run:

```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```
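
If you want both the UI and the API running from the same image, a Compose file is one option. This is a sketch assuming the `Dockerfile` above; the service names and the API command override are illustrative:

```yaml
# docker-compose.yml — illustrative sketch, not shipped with the project.
services:
  ui:
    build: .
    ports:
      - "8501:8501"   # Streamlit (default CMD of the image)
  api:
    build: .
    command: ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
    ports:
      - "8000:8000"   # FastAPI
```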

### Cloud Run / AWS / Azure

The application can be deployed to any cloud platform that supports Python applications; see the respective platform's documentation for deployment instructions.

## Configuration

Edit `config.py` or set environment variables in `.env`:

```
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```
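
As a minimal sketch of how these variables might be read (the project's actual logic lives in `config.py`; `python-dotenv` usage here is an assumption):

```python
import os

# Pull variables from .env into the environment if python-dotenv is
# available; otherwise fall back to whatever the shell already exports.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")
GROQ_RPM_LIMIT = int(os.getenv("GROQ_RPM_LIMIT", "30"))
RATE_LIMIT_DELAY = float(os.getenv("RATE_LIMIT_DELAY", "2.0"))
```
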

## Rate Limiting

The application implements rate limiting for Groq API calls:

- Maximum of 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when the rate limit is reached
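
The behaviour described above can be sketched as a sliding-window limiter. This is an illustration of the technique, not the internals of `GroqLLMClient`:

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `rpm` calls per 60 s, with a fixed delay between
    consecutive calls; sleeps ("smart waiting") when the window is full."""

    def __init__(self, rpm: int = 30, delay: float = 2.0):
        self.rpm = rpm
        self.delay = delay
        self.calls: deque[float] = deque()  # monotonic timestamps

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that fell out of the 60-second window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Window full: sleep until the oldest call expires.
            time.sleep(60 - (now - self.calls[0]))
        elif self.calls:
            # Enforce the fixed inter-request delay.
            elapsed = now - self.calls[-1]
            if elapsed < self.delay:
                time.sleep(self.delay - elapsed)
        self.calls.append(time.monotonic())
```

Call `limiter.wait()` immediately before each API request; it blocks only when either limit would be exceeded.
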

## Troubleshooting

### ChromaDB Issues

If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating the collections.

### Embedding Model Loading

Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:

- Using a smaller model
- Reducing the batch size
- Using CPU instead of GPU

### API Key Errors

Ensure your Groq API key is correctly set in the `.env` file or passed to the application.

## License

MIT License

## Contributors

RAG Capstone Team

## Support

For issues and questions, please open an issue on the GitHub repository.