# RAG Capstone Project
A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.
## Features
- πŸ” **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- πŸ€– **Medical Embedding Models**:
- sentence-transformers/embeddinggemma-300m-medical
- emilyalsentzer/Bio_ClinicalBERT
- Simonlee711/Clinical_ModernBERT
- πŸ’Ύ **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- πŸ¦™ **Groq LLM Integration**: With rate limiting (30 RPM)
- meta-llama/llama-4-maverick-17b-128e-instruct
- llama-3.1-8b-instant
- openai/gpt-oss-120b
- πŸ“Š **TRACE Evaluation Metrics**:
- **U**tilization: How well the system uses retrieved documents
- **R**elevance: Relevance of retrieved documents to the query
- **A**dherence: How well the response adheres to the retrieved context
- **C**ompleteness: How complete the response is
- πŸ’¬ **Chat Interface**: Streamlit-based interactive chat with history
- πŸ”Œ **REST API**: FastAPI backend for integration
## Installation
### Prerequisites
- Python 3.8+
- pip
- Groq API key
### Setup
1. Clone the repository:
```bash
git clone <repository-url>
cd "RAG Capstone Project"
```
2. Create a virtual environment:
```bash
python -m venv venv
```
3. Activate the virtual environment:
**Windows:**
```bash
.\venv\Scripts\activate
```
**Linux/Mac:**
```bash
source venv/bin/activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Create a `.env` file from the example:
**Windows:**
```bash
copy .env.example .env
```
**Linux/Mac:**
```bash
cp .env.example .env
```
6. Edit `.env` and add your Groq API key:
```env
GROQ_API_KEY=your_groq_api_key_here
```
## Usage
### Streamlit Application
Run the interactive Streamlit interface:
```bash
streamlit run streamlit_app.py
```
Then open your browser to `http://localhost:8501`.
**Workflow:**
1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose chunking strategy
4. Select embedding model
5. Choose LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history
### FastAPI Backend
Run the REST API server:
```bash
python api.py
```
Or with uvicorn:
```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```
API documentation available at: `http://localhost:8000/docs`
#### API Endpoints
- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection
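As a starting point, the `/query` endpoint can be called from Python with the standard library. The request and response field names below (`question`, `top_k`) are assumptions for illustration; check the interactive schema at `http://localhost:8000/docs` for the actual contract.

```python
import json
import urllib.request

def build_query_payload(question, top_k=5):
    """Assemble the JSON body for POST /query (field names are assumed)."""
    return json.dumps({"question": question, "top_k": top_k}).encode("utf-8")

def query_rag(question, base_url="http://localhost:8000", top_k=5):
    """POST a question to a running FastAPI server and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=build_query_payload(question, top_k),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Run the API server first (`python api.py`), then call `query_rag("What is the capital of France?")`.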
### Python API
Use the components programmatically:
```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator
# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)
# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)
# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)
# Query
result = rag.query("What is the capital of France?")
print(result["response"])
# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...] # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```
## Project Structure
```
RAG Capstone Project/
├── __init__.py              # Package initialization
├── config.py                # Configuration management
├── dataset_loader.py        # RAG Bench dataset loader
├── chunking_strategies.py   # Document chunking strategies
├── embedding_models.py      # Embedding model implementations
├── vector_store.py          # ChromaDB integration
├── llm_client.py            # Groq LLM client with rate limiting
├── trace_evaluator.py       # TRACE evaluation metrics
├── streamlit_app.py         # Streamlit chat interface
├── api.py                   # FastAPI REST API
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
├── .gitignore               # Git ignore file
└── README.md                # This file
```
## TRACE Metrics Explained
### Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.
### Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.
### Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.
### Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
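To make the lexical-overlap idea behind the Relevance metric concrete, here is a minimal sketch that scores each retrieved document by the fraction of query keywords it contains. This is an illustration only; the actual implementation in `trace_evaluator.py` may weight terms differently.

```python
import re

def lexical_relevance(query, documents):
    """Score each document 0-1 by the fraction of query keywords it contains."""
    # Treat words longer than 3 characters as keywords (a simplifying assumption)
    keywords = {w for w in re.findall(r"\w+", query.lower()) if len(w) > 3}
    if not keywords:
        return [0.0] * len(documents)
    scores = []
    for doc in documents:
        doc_words = set(re.findall(r"\w+", doc.lower()))
        scores.append(len(keywords & doc_words) / len(keywords))
    return scores
```

For example, `lexical_relevance("capital of France", docs)` gives a document mentioning both "capital" and "France" a score of 1.0.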
## Deployment Options
### Heroku
1. Create `Procfile`:
```
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
```
2. Deploy:
```bash
heroku create your-app-name
git push heroku main
```
### Docker
Create `Dockerfile`:
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501 8000
CMD ["streamlit", "run", "streamlit_app.py"]
```
Build and run:
```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```
### Cloud Run / AWS / Azure
The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.
## Configuration
Edit `config.py` or set environment variables in `.env`:
```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```
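A sketch of how `config.py` might read these variables with sensible defaults is shown below. The field names mirror the `.env` keys above; the actual `settings` object may be built differently (for example with a settings library).

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Environment-backed configuration with the defaults documented above."""
    groq_api_key: str = os.getenv("GROQ_API_KEY", "")
    chroma_persist_directory: str = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")
    groq_rpm_limit: int = int(os.getenv("GROQ_RPM_LIMIT", "30"))
    rate_limit_delay: float = float(os.getenv("RATE_LIMIT_DELAY", "2.0"))
    log_level: str = os.getenv("LOG_LEVEL", "INFO")

settings = Settings()
```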
## Rate Limiting
The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
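The behavior above can be sketched as a sliding-window limiter: track recent call timestamps, enforce the fixed delay between consecutive calls, and block when the per-minute budget is exhausted. Class and parameter names here are illustrative, not the actual `llm_client.py` API.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `rpm` calls per rolling 60-second window,
    with a minimum `delay` between consecutive calls."""

    def __init__(self, rpm=30, delay=2.0):
        self.rpm = rpm
        self.delay = delay
        self.calls = deque()  # timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps older than the 60-second window
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if self.calls:
            # Enforce the minimum spacing between requests
            sleep_for = max(0.0, self.delay - (now - self.calls[-1]))
            # If the window is full, also wait until the oldest call expires
            if len(self.calls) >= self.rpm:
                sleep_for = max(sleep_for, 60 - (now - self.calls[0]))
            if sleep_for > 0:
                time.sleep(sleep_for)
        self.calls.append(time.monotonic())
```

Calling `limiter.wait()` before each Groq request reproduces the documented behavior: 30 RPM with a 2-second floor between calls.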
## Troubleshooting
### ChromaDB Issues
If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating collections.
### Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU
### API Key Errors
Ensure your Groq API key is correctly set in the `.env` file or passed to the application.
## License
MIT License
## Contributors
RAG Capstone Team
## Support
For issues and questions, please open an issue on the GitHub repository.