# RAG Capstone Project
A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.
## Features
- πŸ” **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- πŸ€– **Medical Embedding Models**:
- sentence-transformers/embeddinggemma-300m-medical
- emilyalsentzer/Bio_ClinicalBERT
- Simonlee711/Clinical_ModernBERT
- πŸ’Ύ **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- πŸ¦™ **Groq LLM Integration**: With rate limiting (30 RPM)
- meta-llama/llama-4-maverick-17b-128e-instruct
- llama-3.1-8b-instant
- openai/gpt-oss-120b
- πŸ“Š **TRACE Evaluation Metrics**:
- **U**tilization: How well the system uses retrieved documents
- **R**elevance: Relevance of retrieved documents to the query
- **A**dherence: How well the response adheres to the retrieved context
- **C**ompleteness: How complete the response is
- πŸ’¬ **Chat Interface**: Streamlit-based interactive chat with history
- πŸ”Œ **REST API**: FastAPI backend for integration
## Installation
### Prerequisites
- Python 3.8+
- pip
- Groq API key
### Setup
1. Clone the repository:
```bash
git clone <repository-url>
cd "RAG Capstone Project"
```
2. Create a virtual environment:
```bash
python -m venv venv
```
3. Activate the virtual environment:
**Windows:**
```bash
.\venv\Scripts\activate
```
**Linux/Mac:**
```bash
source venv/bin/activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Create a `.env` file from the example:
**Windows:**
```bash
copy .env.example .env
```
**Linux/Mac:**
```bash
cp .env.example .env
```
6. Edit `.env` and add your Groq API key:
```env
GROQ_API_KEY=your_groq_api_key_here
```
## Usage
### Streamlit Application
Run the interactive Streamlit interface:
```bash
streamlit run streamlit_app.py
```
Then open your browser to `http://localhost:8501`.
**Workflow:**
1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose chunking strategy
4. Select embedding model
5. Choose LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history
### FastAPI Backend
Run the REST API server:
```bash
python api.py
```
Or with uvicorn:
```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```
API documentation available at: `http://localhost:8000/docs`
#### API Endpoints
- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection
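As a starting point, the `/query` endpoint can be called from Python with the standard library. The request and response field names below (`question`, `top_k`) are assumptions for illustration; check the interactive schema at `http://localhost:8000/docs` for the actual contract.

```python
import json
import urllib.request

def build_query_payload(question, top_k=5):
    """Assemble the JSON body for POST /query (field names are assumed)."""
    return json.dumps({"question": question, "top_k": top_k}).encode("utf-8")

def query_rag(question, base_url="http://localhost:8000", top_k=5):
    """POST a question to a running FastAPI server and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=build_query_payload(question, top_k),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Run the API server first (`python api.py`), then call `query_rag("What is the capital of France?")`.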
### Python API
Use the components programmatically:
```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator
# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)
# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)
# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)
# Query
result = rag.query("What is the capital of France?")
print(result["response"])
# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...] # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```
## Project Structure
```
RAG Capstone Project/
├── __init__.py              # Package initialization
├── config.py                # Configuration management
├── dataset_loader.py        # RAG Bench dataset loader
├── chunking_strategies.py   # Document chunking strategies
├── embedding_models.py      # Embedding model implementations
├── vector_store.py          # ChromaDB integration
├── llm_client.py            # Groq LLM client with rate limiting
├── trace_evaluator.py       # TRACE evaluation metrics
├── streamlit_app.py         # Streamlit chat interface
├── api.py                   # FastAPI REST API
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
├── .gitignore               # Git ignore file
└── README.md                # This file
```
## TRACE Metrics Explained
### Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.
### Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.
### Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.
### Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
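To make the lexical-overlap idea behind the Relevance metric concrete, here is a minimal sketch that scores each retrieved document by the fraction of query keywords it contains. This is an illustration only; the actual implementation in `trace_evaluator.py` may weight terms differently.

```python
import re

def lexical_relevance(query, documents):
    """Score each document 0-1 by the fraction of query keywords it contains."""
    # Treat words longer than 3 characters as keywords (a simplifying assumption)
    keywords = {w for w in re.findall(r"\w+", query.lower()) if len(w) > 3}
    if not keywords:
        return [0.0] * len(documents)
    scores = []
    for doc in documents:
        doc_words = set(re.findall(r"\w+", doc.lower()))
        scores.append(len(keywords & doc_words) / len(keywords))
    return scores
```

For example, `lexical_relevance("capital of France", docs)` gives a document mentioning both "capital" and "France" a score of 1.0.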
## Deployment Options
### Heroku
1. Create `Procfile`:
```
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
```
2. Deploy:
```bash
heroku create your-app-name
git push heroku main
```
### Docker
Create `Dockerfile`:
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501 8000
CMD ["streamlit", "run", "streamlit_app.py"]
```
Build and run:
```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```
### Cloud Run / AWS / Azure
The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.
## Configuration
Edit `config.py` or set environment variables in `.env`:
```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```
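A sketch of how `config.py` might read these variables with sensible defaults is shown below. The field names mirror the `.env` keys above; the actual `settings` object may be built differently (for example with a settings library).

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Environment-backed configuration with the defaults documented above."""
    groq_api_key: str = os.getenv("GROQ_API_KEY", "")
    chroma_persist_directory: str = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")
    groq_rpm_limit: int = int(os.getenv("GROQ_RPM_LIMIT", "30"))
    rate_limit_delay: float = float(os.getenv("RATE_LIMIT_DELAY", "2.0"))
    log_level: str = os.getenv("LOG_LEVEL", "INFO")

settings = Settings()
```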
## Rate Limiting
The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
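The behavior above can be sketched as a sliding-window limiter: track recent call timestamps, enforce the fixed delay between consecutive calls, and block when the per-minute budget is exhausted. Class and parameter names here are illustrative, not the actual `llm_client.py` API.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `rpm` calls per rolling 60-second window,
    with a minimum `delay` between consecutive calls."""

    def __init__(self, rpm=30, delay=2.0):
        self.rpm = rpm
        self.delay = delay
        self.calls = deque()  # timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps older than the 60-second window
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if self.calls:
            # Enforce the minimum spacing between requests
            sleep_for = max(0.0, self.delay - (now - self.calls[-1]))
            # If the window is full, also wait until the oldest call expires
            if len(self.calls) >= self.rpm:
                sleep_for = max(sleep_for, 60 - (now - self.calls[0]))
            if sleep_for > 0:
                time.sleep(sleep_for)
        self.calls.append(time.monotonic())
```

Calling `limiter.wait()` before each Groq request reproduces the documented behavior: 30 RPM with a 2-second floor between calls.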
## Troubleshooting
### ChromaDB Issues
If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating collections.
### Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU
### API Key Errors
Ensure your Groq API key is correctly set in the `.env` file or passed to the application.
## License
MIT License
## Contributors
RAG Capstone Team
## Support
For issues and questions, please open an issue on the GitHub repository.