RAG Capstone Project
A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.
Features
- Multiple RAG Bench Datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- Chunking Strategies: Dense, Sparse, Hybrid, Re-ranking
- Medical Embedding Models:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- ChromaDB Vector Storage: Persistent vector storage with efficient retrieval
- Groq LLM Integration: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- TRACE Evaluation Metrics:
  - Utilization: How well the system uses retrieved documents
  - Relevance: Relevance of retrieved documents to the query
  - Adherence: How well the response adheres to the retrieved context
  - Completeness: How complete the response is
- Chat Interface: Streamlit-based interactive chat with history
- REST API: FastAPI backend for integration
Installation
Prerequisites
- Python 3.8+
- pip
- Groq API key
Setup
- Clone the repository:
git clone <repository-url>
cd "RAG Capstone Project"
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
Windows:
.\venv\Scripts\activate
Linux/Mac:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file from the example:
copy .env.example .env
(On Linux/Mac, use cp .env.example .env)
- Edit .env and add your Groq API key:
GROQ_API_KEY=your_groq_api_key_here
Usage
Streamlit Application
Run the interactive Streamlit interface:
streamlit run streamlit_app.py
Then open your browser to http://localhost:8501
Workflow:
- Enter your Groq API key in the sidebar
- Select a dataset from RAG Bench
- Choose chunking strategy
- Select embedding model
- Choose LLM model
- Click "Load Data & Create Collection"
- Start chatting!
- View retrieved documents
- Run TRACE evaluation
- Export chat history
FastAPI Backend
Run the REST API server:
python api.py
Or with uvicorn:
uvicorn api:app --reload --host 0.0.0.0 --port 8000
API documentation available at: http://localhost:8000/docs
API Endpoints
- GET / - Root endpoint
- GET /health - Health check
- GET /datasets - List available datasets
- GET /models/embedding - List embedding models
- GET /models/llm - List LLM models
- GET /chunking-strategies - List chunking strategies
- GET /collections - List all collections
- GET /collections/{name} - Get collection info
- POST /load-dataset - Load dataset and create collection
- POST /query - Query the RAG system
- GET /chat-history - Get chat history
- DELETE /chat-history - Clear chat history
- POST /evaluate - Run TRACE evaluation
- DELETE /collections/{name} - Delete collection
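A minimal client call to the query endpoint might look like the sketch below. The payload field names (collection_name, question, top_k) are assumptions based on the Python API shown later; check the interactive docs at /docs for the actual request schema:

```python
import json

# Hypothetical request body for POST /query; the field names here are
# assumptions, not confirmed against the FastAPI route definitions.
payload = {
    "collection_name": "my_collection",
    "question": "What is the capital of France?",
    "top_k": 5,
}
print(json.dumps(payload))

# To send it with the server running on port 8000:
#   import requests
#   resp = requests.post("http://localhost:8000/query", json=payload)
#   print(resp.json()["response"])
```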
Python API
Use the components programmatically:
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator
# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)
# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
collection_name="my_collection",
embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
chunking_strategy="hybrid",
dataset_data=dataset
)
# Initialize LLM
llm = GroqLLMClient(
api_key="your_api_key",
model_name="llama-3.1-8b-instant"
)
# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)
# Query
result = rag.query("What is the capital of France?")
print(result["response"])
# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...] # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
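The structure of a test case is not shown above. A plausible shape, with key names assumed from the four TRACE metrics (query, retrieved documents, generated response, ground truth), might be:

```python
# Hypothetical test-case structure for TRACEEvaluator.evaluate_batch();
# the exact keys expected by the evaluator are assumptions.
test_case = {
    "query": "What is the capital of France?",
    "retrieved_documents": [
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
    ],
    "response": "The capital of France is Paris.",
    "ground_truth": "Paris",
}
test_cases = [test_case]
```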
Project Structure
RAG Capstone Project/
├── __init__.py            # Package initialization
├── config.py              # Configuration management
├── dataset_loader.py      # RAG Bench dataset loader
├── chunking_strategies.py # Document chunking strategies
├── embedding_models.py    # Embedding model implementations
├── vector_store.py        # ChromaDB integration
├── llm_client.py          # Groq LLM client with rate limiting
├── trace_evaluator.py     # TRACE evaluation metrics
├── streamlit_app.py       # Streamlit chat interface
├── api.py                 # FastAPI REST API
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variables template
├── .gitignore             # Git ignore file
└── README.md              # This file
TRACE Metrics Explained
Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.
Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.
Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.
Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
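As a rough illustration of how a lexical-overlap score like Relevance can be computed, here is a toy sketch: it averages, over the retrieved documents, the fraction of query keywords that appear in each document. This is an illustration only, not the project's actual implementation:

```python
import re

def lexical_relevance(query: str, documents: list[str]) -> float:
    """Toy Relevance sketch: mean keyword overlap between the query
    and each retrieved document (the real evaluator may weight or
    normalize differently)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    if not q_terms or not documents:
        return 0.0
    scores = []
    for doc in documents:
        d_terms = set(re.findall(r"\w+", doc.lower()))
        scores.append(len(q_terms & d_terms) / len(q_terms))
    return sum(scores) / len(scores)

score = lexical_relevance(
    "capital of France",
    ["Paris is the capital of France.", "France is in Europe."],
)
```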
Deployment Options
Heroku
- Create a Procfile:
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
- Deploy:
heroku create your-app-name
git push heroku main
Docker
Create Dockerfile:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501 8000
# Runs only the Streamlit UI; start the FastAPI server separately if needed
CMD ["streamlit", "run", "streamlit_app.py"]
Build and run:
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
Cloud Run / AWS / Azure
The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.
Configuration
Edit config.py or set environment variables in .env:
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
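For reference, these variables could be read in Python as sketched below; the project's config.py may use a different loader (for example, pydantic settings or python-dotenv), so treat this as an illustration:

```python
import os

# Read settings from the environment, falling back to the defaults
# listed above. The variable names match the .env example; the loading
# mechanism itself is an assumption.
GROQ_RPM_LIMIT = int(os.getenv("GROQ_RPM_LIMIT", "30"))
RATE_LIMIT_DELAY = float(os.getenv("RATE_LIMIT_DELAY", "2.0"))
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```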
Rate Limiting
The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
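The behavior described above can be sketched as a small rolling-window limiter: at most rpm calls per 60-second window, plus a fixed delay between consecutive calls. This is a minimal sketch, not the project's GroqLLMClient implementation:

```python
import time
from collections import deque

class RateLimiter:
    """Minimal rolling-window rate limiter sketch (assumed design)."""

    def __init__(self, rpm: int = 30, delay: float = 2.0):
        self.rpm = rpm
        self.delay = delay
        self.calls: deque = deque()  # timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than the 60 s window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Window is full: sleep until the oldest call expires.
            time.sleep(60 - (now - self.calls[0]))
        elif self.calls:
            # Fixed spacing between consecutive requests.
            time.sleep(self.delay)
        self.calls.append(time.monotonic())
```

Call limiter.wait() immediately before each Groq API request.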
Troubleshooting
ChromaDB Issues
If you encounter ChromaDB errors, try deleting the chroma_db directory and recreating collections.
Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU
API Key Errors
Ensure your Groq API key is correctly set in the .env file or passed to the application.
License
MIT License
Contributors
RAG Capstone Team
Support
For issues and questions, please open an issue on the GitHub repository.