---
title: RAG10
emoji: 🔥
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.31.0
app_file: streamlit_app.py
pinned: false
short_description: Capstone project for IIITH
---

# RAG Capstone Project

A comprehensive Retrieval-Augmented Generation (RAG) system with TRACE evaluation metrics for medical/clinical domains.

## Features

- 📚 **Multiple RAG Bench Datasets**: HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, TriviaQA
- 🧩 **Chunking Strategies**: Dense, Sparse, Hybrid, Re-ranking
- 🤖 **Medical Embedding Models**:
  - sentence-transformers/embeddinggemma-300m-medical
  - emilyalsentzer/Bio_ClinicalBERT
  - Simonlee711/Clinical_ModernBERT
- 💾 **ChromaDB Vector Storage**: Persistent vector storage with efficient retrieval
- 🚦 **Groq LLM Integration**: With rate limiting (30 RPM)
  - meta-llama/llama-4-maverick-17b-128e-instruct
  - llama-3.1-8b-instant
  - openai/gpt-oss-120b
- 📊 **TRACE Evaluation Metrics**:
  - **U**tilization: How well the system uses retrieved documents
  - **R**elevance: Relevance of retrieved documents to the query
  - **A**dherence: How well the response adheres to the retrieved context
  - **C**ompleteness: How complete the response is
- 💬 **Chat Interface**: Streamlit-based interactive chat with history
- 🌐 **REST API**: FastAPI backend for integration

## Installation

### Prerequisites

- Python 3.8+
- pip
- Groq API key

### Setup

1. Clone the repository:
```bash
git clone <repository-url>
cd "RAG Capstone Project"
```

2. Create a virtual environment:
```bash
python -m venv venv
```

3. Activate the virtual environment:

**Windows:**
```bash
.\venv\Scripts\activate
```

**Linux/Mac:**
```bash
source venv/bin/activate
```

4. Install dependencies:
```bash
pip install -r requirements.txt
```

5. Create a `.env` file from the example:

**Windows:**
```bash
copy .env.example .env
```

**Linux/Mac:**
```bash
cp .env.example .env
```

6. Edit `.env` and add your Groq API key:
```
GROQ_API_KEY=your_groq_api_key_here
```

## Usage

### Streamlit Application

Run the interactive Streamlit interface:

```bash
streamlit run streamlit_app.py
```

Then open your browser to `http://localhost:8501`.

**Workflow:**
1. Enter your Groq API key in the sidebar
2. Select a dataset from RAG Bench
3. Choose a chunking strategy
4. Select an embedding model
5. Choose an LLM model
6. Click "Load Data & Create Collection"
7. Start chatting!
8. View retrieved documents
9. Run TRACE evaluation
10. Export chat history

### FastAPI Backend

Run the REST API server:

```bash
python api.py
```

Or with uvicorn:
```bash
uvicorn api:app --reload --host 0.0.0.0 --port 8000
```

API documentation available at: `http://localhost:8000/docs`

#### API Endpoints

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /datasets` - List available datasets
- `GET /models/embedding` - List embedding models
- `GET /models/llm` - List LLM models
- `GET /chunking-strategies` - List chunking strategies
- `GET /collections` - List all collections
- `GET /collections/{name}` - Get collection info
- `POST /load-dataset` - Load dataset and create collection
- `POST /query` - Query the RAG system
- `GET /chat-history` - Get chat history
- `DELETE /chat-history` - Clear chat history
- `POST /evaluate` - Run TRACE evaluation
- `DELETE /collections/{name}` - Delete collection
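The endpoints above can be called from any HTTP client. A minimal sketch using `requests` — note that the payload field names (`question`, `collection_name`, `top_k`) are assumptions for illustration; check the live schema at `http://localhost:8000/docs` for the actual request body:

```python
import requests

BASE_URL = "http://localhost:8000"


def build_query_payload(question: str, collection: str, top_k: int = 5) -> dict:
    # Field names here are assumptions; consult /docs for the real schema.
    return {"question": question, "collection_name": collection, "top_k": top_k}


def query_rag(question: str, collection: str) -> dict:
    """POST a question to the /query endpoint and return the JSON response."""
    resp = requests.post(
        f"{BASE_URL}/query",
        json=build_query_payload(question, collection),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

With the server running, `query_rag("What is the capital of France?", "my_collection")` returns the parsed JSON response.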

### Python API

Use the components programmatically:

```python
from config import settings
from dataset_loader import RAGBenchLoader
from vector_store import ChromaDBManager
from llm_client import GroqLLMClient, RAGPipeline
from trace_evaluator import TRACEEvaluator

# Load dataset
loader = RAGBenchLoader()
dataset = loader.load_dataset("hotpotqa", max_samples=100)

# Create vector store
vector_store = ChromaDBManager()
vector_store.load_dataset_into_collection(
    collection_name="my_collection",
    embedding_model_name="emilyalsentzer/Bio_ClinicalBERT",
    chunking_strategy="hybrid",
    dataset_data=dataset
)

# Initialize LLM
llm = GroqLLMClient(
    api_key="your_api_key",
    model_name="llama-3.1-8b-instant"
)

# Create RAG pipeline
rag = RAGPipeline(llm, vector_store)

# Query
result = rag.query("What is the capital of France?")
print(result["response"])

# Evaluate
evaluator = TRACEEvaluator()
test_cases = [...]  # Your test cases
results = evaluator.evaluate_batch(test_cases)
print(results)
```

## Project Structure

```
RAG Capstone Project/
├── __init__.py              # Package initialization
├── config.py                # Configuration management
├── dataset_loader.py        # RAG Bench dataset loader
├── chunking_strategies.py   # Document chunking strategies
├── embedding_models.py      # Embedding model implementations
├── vector_store.py          # ChromaDB integration
├── llm_client.py            # Groq LLM client with rate limiting
├── trace_evaluator.py       # TRACE evaluation metrics
├── streamlit_app.py         # Streamlit chat interface
├── api.py                   # FastAPI REST API
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
├── .gitignore               # Git ignore file
└── README.md                # This file
```

## TRACE Metrics Explained

### Utilization (U)
Measures how well the system uses the retrieved documents in generating the response. Higher scores indicate that the system effectively incorporates information from multiple retrieved documents.

### Relevance (R)
Evaluates the relevance of retrieved documents to the user's query. Uses lexical overlap and keyword matching to determine if the right documents were retrieved.

### Adherence (A)
Assesses how well the generated response adheres to the retrieved context. Ensures the response is grounded in the provided documents rather than hallucinated.

### Completeness (C)
Evaluates how complete the response is in answering the query. Considers response length, question type, and comparison with ground truth if available.
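To illustrate the lexical-overlap idea behind the Relevance score, here is a toy scorer. It is a sketch only, not the project's actual `trace_evaluator.py` implementation, which may tokenize and weight keywords differently:

```python
def lexical_relevance(query: str, documents: list) -> float:
    """Approximate retrieval relevance as the mean Jaccard overlap
    between the query's tokens and each retrieved document's tokens.

    Returns a score in [0, 1]; higher means more lexical overlap.
    """
    q_tokens = set(query.lower().split())
    if not q_tokens or not documents:
        return 0.0
    scores = []
    for doc in documents:
        d_tokens = set(doc.lower().split())
        union = q_tokens | d_tokens
        # Jaccard similarity: shared tokens over all distinct tokens.
        scores.append(len(q_tokens & d_tokens) / len(union) if union else 0.0)
    return sum(scores) / len(scores)
```

For example, `lexical_relevance("paris capital", ["paris is the capital"])` scores 0.5, while a fully unrelated document scores 0.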

## Deployment Options

### Heroku

1. Create `Procfile`:
```
web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0
api: uvicorn api:app --host=0.0.0.0 --port=$PORT
```

2. Deploy:
```bash
heroku create your-app-name
git push heroku main
```

### Docker

Create `Dockerfile`:
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501 8000

CMD ["streamlit", "run", "streamlit_app.py"]
```

Build and run:
```bash
docker build -t rag-capstone .
docker run -p 8501:8501 -p 8000:8000 rag-capstone
```

### Cloud Run / AWS / Azure

The application can be deployed to any cloud platform that supports Python applications. See the respective platform documentation for deployment instructions.

## Configuration

Edit `config.py` or set environment variables in `.env`:

```env
GROQ_API_KEY=your_api_key
CHROMA_PERSIST_DIRECTORY=./chroma_db
GROQ_RPM_LIMIT=30
RATE_LIMIT_DELAY=2.0
LOG_LEVEL=INFO
```
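A settings module along these lines would pick the values up from the environment. This is a minimal sketch of the pattern, not the project's actual `config.py`; the attribute names mirror the `.env` keys above and the defaults shown are assumptions:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Read configuration from environment variables, with fallbacks.

    Defaults here are illustrative assumptions, not necessarily
    those used by the project's config.py.
    """
    groq_api_key: str = field(
        default_factory=lambda: os.getenv("GROQ_API_KEY", ""))
    chroma_persist_directory: str = field(
        default_factory=lambda: os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db"))
    groq_rpm_limit: int = field(
        default_factory=lambda: int(os.getenv("GROQ_RPM_LIMIT", "30")))
    rate_limit_delay: float = field(
        default_factory=lambda: float(os.getenv("RATE_LIMIT_DELAY", "2.0")))


settings = Settings()
```

Because the fields use `default_factory`, each `Settings()` instantiation reads the environment at that moment, so values set in `.env` (and loaded into the environment) take effect without code changes.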

## Rate Limiting

The application implements rate limiting for Groq API calls:
- Maximum 30 requests per minute (configurable)
- Automatic delay of 2 seconds between requests
- Smart waiting when rate limit is reached
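A sliding-window limiter captures the strategy described above. This is a sketch of the idea, not the exact code in `llm_client.py`:

```python
import time
from collections import deque


class RateLimiter:
    """Block until a request is allowed under a requests-per-minute cap,
    also enforcing a fixed minimum delay between consecutive requests.
    """

    def __init__(self, rpm_limit: int = 30, min_delay: float = 2.0):
        self.rpm_limit = rpm_limit
        self.min_delay = min_delay
        self._timestamps = deque()  # send times within the last 60 s

    def wait(self) -> None:
        now = time.monotonic()
        # Discard timestamps that fell out of the 60-second window.
        while self._timestamps and now - self._timestamps[0] > 60:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.rpm_limit:
            # Sleep until the oldest request leaves the window, then retry.
            time.sleep(60 - (now - self._timestamps[0]) + 0.01)
            return self.wait()
        if self._timestamps:
            # Enforce the fixed gap between consecutive requests.
            gap = now - self._timestamps[-1]
            if gap < self.min_delay:
                time.sleep(self.min_delay - gap)
        self._timestamps.append(time.monotonic())
```

Calling `limiter.wait()` immediately before each Groq API request is enough to stay under the cap; the client code never has to track timing itself.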

## Troubleshooting

### ChromaDB Issues
If you encounter ChromaDB errors, try deleting the `chroma_db` directory and recreating collections.

### Embedding Model Loading
Medical embedding models may require significant memory. If you encounter out-of-memory errors, try:
- Using a smaller model
- Reducing batch size
- Using CPU instead of GPU

### API Key Errors
Ensure your Groq API key is correctly set in the `.env` file or passed to the application.

## License

MIT License

## Contributors

RAG Capstone Team

## Support

For issues and questions, please open an issue on the GitHub repository.