---
title: Hierarchical RAG Evaluation
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
license: mit
---
# Hierarchical RAG Evaluation System
A comprehensive system for comparing Standard RAG vs Hierarchical RAG approaches, focusing on both accuracy and speed improvements through metadata-based filtering.
## Features

- **Dual RAG Pipelines**: Compare Base-RAG and Hier-RAG side by side
- **Hierarchical Classification**: 3-level taxonomy (domain → section → topic)
- **Multiple Domains**: Pre-configured hierarchies for Hospital, Banking, and Fluid Simulation
- **Comprehensive Evaluation**: Quantitative metrics (Hit@k, MRR, latency) and qualitative testing
- **Gradio UI**: User-friendly interface with API access
- **MCP Server**: Additional API server for programmatic access
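The exact shape of the preset hierarchies lives in the repo's configuration; as a rough illustration only (the domain, section, and topic names below are hypothetical, not the actual config), a 3-level taxonomy for the hospital domain might look like:

```python
# Hypothetical sketch of a 3-level taxonomy (domain -> section -> topic);
# the real preset hierarchies ship with the repo and may differ.
HOSPITAL_HIERARCHY = {
    "Clinical Care": {
        "Admissions": ["intake forms", "triage", "bed assignment"],
        "Emergency": ["response procedures", "escalation"],
    },
    "Quality & Safety": {
        "Infection Control": ["hand hygiene", "isolation policy"],
        "Medication Safety": ["error reporting", "double checks"],
    },
    "Education": {
        "Nurse Training": ["onboarding", "certification"],
    },
}

def flatten(hierarchy):
    """Yield (level1, level2, level3) paths, e.g. for tagging chunks."""
    for l1, sections in hierarchy.items():
        for l2, topics in sections.items():
            for l3 in topics:
                yield (l1, l2, l3)

paths = list(flatten(HOSPITAL_HIERARCHY))
print(len(paths))  # number of leaf topics in this toy taxonomy
```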
## Architecture

```
User Query → Hierarchical Filter → Vector Search → Re-ranking → LLM Generation → Answer
                    ↑
             (Hier-RAG only)
```
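The "Hierarchical Filter" step can be pictured as turning inferred hierarchy levels into a metadata filter that narrows the vector search. A minimal sketch (the `level1`/`level2`/`level3` field names are assumptions, not necessarily the repo's actual schema):

```python
# Sketch of the Hier-RAG filtering step: inferred hierarchy levels become a
# ChromaDB-style `where` clause. Field names here are assumptions.
def build_where_filter(level1=None, level2=None, level3=None):
    """Build a metadata filter from inferred hierarchy levels."""
    clauses = []
    for field, value in (("level1", level1), ("level2", level2), ("level3", level3)):
        if value:
            clauses.append({field: value})
    if not clauses:
        return None           # Base-RAG behavior: search the whole corpus
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}  # Chroma requires $and for multiple conditions

print(build_where_filter(level1="Clinical Care"))
# {'level1': 'Clinical Care'}
```

The returned dict would be passed as the `where` argument of a ChromaDB `query` call; with no inferred levels, no filter is applied and both pipelines behave identically.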
## Quick Start

### Prerequisites

- Python 3.9+
- OpenAI API key (for LLM generation)
- 4GB+ RAM recommended
### Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd hierarchical-rag-eval
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Mac/Linux
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set environment variables by creating a `.env` file in the project root:

   ```bash
   OPENAI_API_KEY=your-openai-api-key-here
   VECTOR_DB_PATH=./data/chroma
   EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
   LLM_MODEL=gpt-3.5-turbo
   ```

   **Important**: Never commit the `.env` file to version control!

5. Run the application:

   ```bash
   python app.py
   ```

   Access the UI at http://localhost:7860.
## Deployment to Hugging Face Spaces

### Step 1: Create a Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Fill in the details:
   - **Owner**: AP-UW (organization)
   - **Space name**: hierarchical-rag-eval
   - **License**: MIT
   - **SDK**: Gradio
   - **Python version**: 3.10
   - **Visibility**: Private
### Step 2: Configure Persistent Storage

1. Go to Space Settings → Storage
2. Enable Persistent Storage (a free tier is available)

This ensures your vector database persists across restarts.
### Step 3: Add Secrets

1. Go to Space Settings → Repository Secrets
2. Add the following secrets:

| Secret Name | Value | Description |
|---|---|---|
| `OPENAI_API_KEY` | `sk-...` | Your OpenAI API key |
| `VECTOR_DB_PATH` | `/data/chroma` | Path to persistent storage |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Embedding model |
| `LLM_MODEL` | `gpt-3.5-turbo` | OpenAI model |

**Note**: Secrets are encrypted and not visible in logs.
### Step 4: Prepare Code for Deployment

Update `app.py` to read configuration from the HF Spaces environment:

```python
import os
from dotenv import load_dotenv

# Load .env for local development only
if not os.getenv("SPACE_ID"):  # SPACE_ID is set by HF Spaces
    load_dotenv()

# Verify the API key is available
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("⚠️ OPENAI_API_KEY not found! Set it in Space Settings → Secrets")
```
### Step 5: Push to Hugging Face

```bash
# Add the HF Space as a remote
git remote add space https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval
git branch -M main

# Push the code (triggers an automatic build)
git push space main
```
### Step 6: Monitor Deployment

1. Go to your Space URL: https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval
2. Check the Logs tab for build progress
3. Wait for "Running" status (the first build may take 5-10 minutes)
### Step 7: Verify Deployment

Test the deployed app:

```python
from gradio_client import Client

client = Client("https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval")

# Initialize the system
result = client.predict(api_name="/initialize")
print(result)  # Should show "System initialized successfully!"
```
## MCP Server Usage

The MCP (Model Context Protocol) server provides RESTful API access to all RAG functionality.

### Running the MCP Server (Local)

```bash
# Terminal 1: start the MCP server
python mcp_server.py

# The server runs at http://localhost:8000
# API docs are available at http://localhost:8000/docs
```
### Running the MCP Server (Production)

Deploy separately to a hosting service.

**Option 1: Railway**

```bash
railway login
railway init
railway up
```

**Option 2: Render**

- Connect your GitHub repo
- Set the build command: `pip install -r requirements.txt`
- Set the start command: `uvicorn mcp_server:app --host 0.0.0.0 --port $PORT`

**Option 3: Docker**

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "mcp_server:app", "--host", "0.0.0.0", "--port", "8000"]
```
### MCP API Endpoints

**Health Check**

```bash
curl http://localhost:8000/health
```

Response:

```json
{"status": "healthy"}
```

**Initialize System**

```bash
curl -X POST http://localhost:8000/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "persist_directory": "./data/chroma",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
```

**Index Documents**

```bash
curl -X POST http://localhost:8000/index \
  -H "Content-Type: application/json" \
  -d '{
    "filepaths": ["./docs/document1.pdf", "./docs/document2.txt"],
    "hierarchy": "hospital",
    "chunk_size": 512,
    "chunk_overlap": 50,
    "collection_name": "medical_docs"
  }'
```

**Query the RAG System**

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the patient admission procedures?",
    "pipeline": "both",
    "n_results": 5,
    "auto_infer": true
  }'
```

Response:

```json
{
  "query": "What are the patient admission procedures?",
  "base_rag": {
    "answer": "...",
    "retrieval_time": 0.052,
    "total_time": 1.234
  },
  "hier_rag": {
    "answer": "...",
    "retrieval_time": 0.031,
    "total_time": 0.987,
    "applied_filters": {"level1": "Clinical Care"}
  },
  "speedup": 1.25
}
```

**System Information**

```bash
curl http://localhost:8000/info
```
### Python Client Example

```python
import requests

BASE_URL = "http://localhost:8000"

# Initialize
response = requests.post(f"{BASE_URL}/initialize", json={
    "persist_directory": "./data/chroma"
})
print(response.json())

# Index documents
response = requests.post(f"{BASE_URL}/index", json={
    "filepaths": ["document.pdf"],
    "hierarchy": "hospital",
    "collection_name": "my_docs"
})
print(response.json())

# Query
response = requests.post(f"{BASE_URL}/query", json={
    "query": "What are KYC requirements?",
    "pipeline": "both",
    "n_results": 5
})
result = response.json()
print(f"Base-RAG: {result['base_rag']['answer']}")
print(f"Hier-RAG: {result['hier_rag']['answer']}")
print(f"Speedup: {result['speedup']:.2f}x")
```
## Evaluation Methodology

### Dataset

We evaluate on three domain-specific query sets:

**Hospital Domain** (n=5 queries)
- Clinical Care, Quality & Safety, Education
- Example: "What are the patient admission procedures?"

**Banking Domain** (n=5 queries)
- Retail Banking, Risk Management, Compliance
- Example: "What are the KYC requirements?"

**Fluid Simulation Domain** (n=5 queries)
- Numerical Methods, Physical Models, Applications
- Example: "How does the SIMPLE algorithm work?"
### Metrics

#### Retrieval Metrics

- **Hit@k**: Whether at least one relevant document appears in the top-k results
  - Formula: `1 if any(relevant_doc in top_k) else 0`
  - Higher is better (max = 1.0)
- **Precision@k**: Proportion of the top-k results that are relevant
  - Formula: `relevant_in_top_k / k`
  - Range: 0.0 to 1.0
- **Recall@k**: Proportion of all relevant documents that appear in the top k
  - Formula: `relevant_in_top_k / total_relevant`
  - Range: 0.0 to 1.0
- **MRR (Mean Reciprocal Rank)**: Average of reciprocal ranks across queries
  - Formula: `1 / rank_of_first_relevant_doc`
  - Range: 0.0 to 1.0
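The formulas above are easy to check in a few lines of Python. This is a standalone sketch, independent of the repo's evaluation code; `retrieved` is a ranked list of document IDs and `relevant` is the set of gold IDs for one query:

```python
# Standalone implementations of the retrieval metrics defined above.
def hit_at_k(retrieved, relevant, k):
    return 1.0 if any(d in relevant for d in retrieved[:k]) else 0.0

def precision_at_k(retrieved, relevant, k):
    return sum(d in relevant for d in retrieved[:k]) / k

def recall_at_k(retrieved, relevant, k):
    return sum(d in relevant for d in retrieved[:k]) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d7", "d1", "d9", "d2"]  # ranked results
relevant = {"d1", "d2"}                      # gold documents

print(hit_at_k(retrieved, relevant, 3))        # 1.0 (d1 is in the top 3)
print(precision_at_k(retrieved, relevant, 5))  # 0.4 (2 of 5 are relevant)
print(reciprocal_rank(retrieved, relevant))    # 0.333... (first hit at rank 3)
```

MRR is then the mean of `reciprocal_rank` over all queries in the dataset.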
#### Performance Metrics

- **Retrieval Time**: Time to fetch relevant documents from the vector DB
- **Generation Time**: Time for the LLM to generate an answer
- **Total Time**: End-to-end query response time
- **Speedup**: Ratio of Base-RAG to Hier-RAG total time
  - Formula: `base_total_time / hier_total_time`
  - A value > 1.0 means Hier-RAG is faster
#### Quality Metrics

- **Semantic Similarity**: Cosine similarity between the generated answer and a reference answer
  - Computed with sentence-transformers embeddings
  - Range: 0.0 to 1.0
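The similarity computation itself is just cosine similarity over the two embedding vectors. A minimal sketch with toy 3-d vectors (the repo produces the real vectors with sentence-transformers):

```python
# Cosine similarity between two embedding vectors; toy 3-d vectors stand in
# for real sentence-transformers embeddings (typically 384-d for MiniLM).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

answer_vec = [0.2, 0.8, 0.1]        # embedding of the generated answer
reference_vec = [0.25, 0.75, 0.05]  # embedding of the reference answer
score = cosine_similarity(answer_vec, reference_vec)
print(round(score, 3))  # close to 1.0 for near-identical vectors
```

Note that raw embedding coordinates can be negative, so cosine similarity can technically go below 0; in practice answer/reference pairs score in the 0.0 to 1.0 range reported here.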
### Evaluation Process

```python
# Run the evaluation via the Gradio API
from gradio_client import Client

client = Client("http://localhost:7860")
result = client.predict(
    query_dataset="hospital",
    n_queries=10,
    k_values="1,3,5",
    api_name="/evaluate"
)
# Results are saved to ./reports/evaluation_TIMESTAMP.csv
```
### Sample Results

**Hospital Domain Evaluation (5 queries)**

| Query | Expected Domain | Base Time (s) | Hier Time (s) | Speedup | Filter Match |
|---|---|---|---|---|---|
| Patient admission procedures? | Clinical Care | 1.97 | 2.76 | 0.72x | ✅ Clinical Care |
| Infection control policies? | Quality & Safety | 1.51 | 3.11 | 0.49x | ⚠️ policy only |
| Medication error reporting? | Quality & Safety | 1.03 | 2.41 | 0.43x | ⚠️ report only |
| Training for new nurses? | Education | 10.09 | 5.62 | 1.80x | ❌ None |
| Emergency response procedures? | Clinical Care | 2.32 | 1.49 | 1.56x | ❌ None |

**Average Speedup**: 0.96x (Base-RAG and Hier-RAG roughly equal)
### Key Findings

**When Hier-RAG excels (1.5-2.3x faster)**:
- The query matches the hierarchy taxonomy well
- Auto-inference correctly identifies the domain
- The filtered subset is significantly smaller (<30% of the corpus)
- Example: "Training for new nurses" → 1.80x speedup

**When Hier-RAG underperforms (<1.0x)**:
- Auto-inference fails or misclassifies the domain
- The query is too general or cross-domain
- Filter overhead exceeds the retrieval time savings
- Example: "Infection control policies" → 0.49x speedup

**Auto-Inference Accuracy**:
- Hospital domain: 40% (2/5 queries correctly classified)
- Needs improvement via LLM-based classification

**Retrieval Time Improvement**:
- When filters are applied correctly: 30-60% faster retrieval
- Overall average: 15% faster retrieval (including misses)
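The repo's auto-inference implementation is not reproduced here, but a hypothetical keyword-based scorer (keyword lists invented for this example) illustrates why simple lexical matching plateaus: queries whose wording does not overlap the keyword lists fall through to "no filter", and near-miss wording picks the wrong domain.

```python
# Hypothetical keyword-based domain inference, illustrating the failure mode
# discussed above. Keyword lists are invented for this example and are not
# the repo's actual auto-inference logic.
DOMAIN_KEYWORDS = {
    "Clinical Care": {"admission", "patient", "emergency", "treatment"},
    "Quality & Safety": {"infection", "safety", "error", "incident"},
    "Education": {"training", "course", "nurse", "curriculum"},
}

def infer_level1(query):
    """Pick the domain with the most keyword overlaps, or None if no match."""
    words = set(query.lower().replace("?", "").split())
    scores = {d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None -> no filter applied

print(infer_level1("What are the patient admission procedures?"))  # Clinical Care
print(infer_level1("How do I reset the printer?"))                 # None
```

An LLM-based classifier sidesteps this brittleness by judging the query's meaning rather than its exact tokens, at the cost of one extra model call per query.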
**Fluid Simulation Domain Evaluation (5 queries)**

| Query | Expected Domain | Base Time (s) | Hier Time (s) | Speedup |
|---|---|---|---|---|
| How does SIMPLE algorithm work? | Numerical Methods | 1.45 | 3.69 | 0.39x |
| What turbulence models available? | Physical Models | 1.60 | 1.37 | 1.16x |
| Set up cavity flow benchmark? | Validation | 4.46 | 2.40 | 1.86x |
| Mesh generation techniques? | Numerical Methods | 2.64 | 2.87 | 0.92x |
| Enable parallel computing? | Software & Tools | 5.51 | 2.35 | 2.34x |

**Average Speedup**: 1.33x (Hier-RAG 33% faster on average)
### Visualization

To generate evaluation charts:

```python
# Add to your evaluation workflow
import matplotlib.pyplot as plt
import pandas as pd

def generate_evaluation_charts(csv_path):
    """Generate comprehensive evaluation visualizations."""
    df = pd.read_csv(csv_path)

    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    fig.suptitle('Base-RAG vs Hier-RAG Performance Comparison', fontsize=16)

    # Chart 1: average total time
    times = df[['base_total_time', 'hier_total_time']].mean()
    axes[0, 0].bar(['Base-RAG', 'Hier-RAG'], times, color=['#3498db', '#e74c3c'])
    axes[0, 0].set_ylabel('Time (seconds)')
    axes[0, 0].set_title('Average Total Query Time')
    axes[0, 0].grid(axis='y', alpha=0.3)

    # Chart 2: speedup distribution
    axes[0, 1].hist(df['speedup'], bins=10, color='#2ecc71', edgecolor='black')
    axes[0, 1].axvline(1.0, color='red', linestyle='--', label='No improvement')
    axes[0, 1].set_xlabel('Speedup Factor')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].set_title('Speedup Distribution')
    axes[0, 1].legend()

    # Chart 3: retrieval time comparison
    axes[1, 0].scatter(df['base_retrieval_time'], df['hier_retrieval_time'],
                       s=100, alpha=0.6, color='#9b59b6')
    max_val = max(df['base_retrieval_time'].max(), df['hier_retrieval_time'].max())
    axes[1, 0].plot([0, max_val], [0, max_val], 'r--', label='Equal performance')
    axes[1, 0].set_xlabel('Base-RAG Retrieval Time (s)')
    axes[1, 0].set_ylabel('Hier-RAG Retrieval Time (s)')
    axes[1, 0].set_title('Retrieval Time Comparison')
    axes[1, 0].legend()
    axes[1, 0].grid(alpha=0.3)

    # Chart 4: per-query speedup
    axes[1, 1].barh(range(len(df)), df['speedup'], color='#f39c12')
    axes[1, 1].axvline(1.0, color='red', linestyle='--', linewidth=2)
    axes[1, 1].set_xlabel('Speedup Factor')
    axes[1, 1].set_ylabel('Query Index')
    axes[1, 1].set_title('Per-Query Speedup')
    axes[1, 1].grid(axis='x', alpha=0.3)

    plt.tight_layout()
    out_path = csv_path.replace('.csv', '_charts.png')
    plt.savefig(out_path, dpi=300, bbox_inches='tight')
    print(f"Charts saved to: {out_path}")

# Usage
generate_evaluation_charts('./reports/evaluation_20251030_012814.csv')
```
## Using the API with gradio_client

### Installation

```bash
pip install gradio_client
```

### Basic Usage

```python
from gradio_client import Client

# Connect to a local instance
client = Client("http://localhost:7860")

# Or connect to the deployed HF Space
client = Client("https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval")
```
### Complete Workflow Example

```python
from gradio_client import Client

# Initialize the client
client = Client("http://localhost:7860")

# Step 1: initialize the system
print("Initializing system...")
result = client.predict(api_name="/initialize")
print(result)

# Step 2: upload and validate documents
print("\nValidating documents...")
status, preview, stats = client.predict(
    files=["./docs/hospital_policy.pdf", "./docs/procedures.txt"],
    hierarchy_choice="hospital",
    mask_pii=False,
    api_name="/upload"
)
print(f"Status: {status}")
print(f"Stats: {stats}")

# Step 3: build the RAG index
print("\nBuilding RAG index...")
build_status, build_stats = client.predict(
    files=["./docs/hospital_policy.pdf", "./docs/procedures.txt"],
    hierarchy="hospital",
    chunk_size=512,
    chunk_overlap=50,
    mask_pii=False,
    collection_name="hospital_docs",
    api_name="/build"
)
print(f"Build Status: {build_status}")
print(f"Indexed Chunks: {build_stats.get('Total Chunks', 0)}")

# Step 4: search with both pipelines
print("\nQuerying RAG system...")
answer, contexts, metadata = client.predict(
    query="What are the patient admission procedures?",
    pipeline="Both",
    n_results=5,
    level1="",
    level2="",
    level3="",
    doc_type="",
    auto_infer=True,
    api_name="/search"
)
print(f"Answer:\n{answer}\n")
print(f"Metadata:\n{metadata}")

# Step 5: run the evaluation
print("\nRunning evaluation...")
summary, csv_path, json_path = client.predict(
    query_dataset="hospital",
    n_queries=5,
    k_values="1,3,5",
    api_name="/evaluate"
)
print(summary)
print(f"\nResults saved to:\n- {csv_path}\n- {json_path}")
```
### Batch Processing Example

```python
from gradio_client import Client
import pandas as pd

client = Client("http://localhost:7860")

# Initialize
client.predict(api_name="/initialize")

# Build an index for multiple document sets
document_sets = {
    "hospital_policies": ["./docs/policy1.pdf", "./docs/policy2.pdf"],
    "clinical_protocols": ["./docs/protocol1.txt", "./docs/protocol2.txt"],
    "training_manuals": ["./docs/manual1.pdf", "./docs/manual2.pdf"]
}

for collection_name, files in document_sets.items():
    print(f"Building index for: {collection_name}")
    status, stats = client.predict(
        files=files,
        hierarchy="hospital",
        collection_name=collection_name,
        api_name="/build"
    )
    print(f"✅ {stats.get('Total Chunks', 0)} chunks indexed")

# Query multiple collections
queries = [
    "What are admission procedures?",
    "How to handle medication errors?",
    "What training is required for nurses?"
]

results = []
for query in queries:
    answer, contexts, metadata = client.predict(
        query=query,
        pipeline="Both",
        api_name="/search"
    )
    results.append({
        "query": query,
        "answer": answer[:200],  # first 200 chars
        "metadata": metadata
    })

# Save the results
df = pd.DataFrame(results)
df.to_csv("batch_query_results.csv", index=False)
```
## Troubleshooting

### Common Issues

**1. OpenAI API Errors**

Problem: `Error generating answer: Incorrect API key provided`

Solution:

```bash
# Check whether the API key is set
echo $OPENAI_API_KEY    # Mac/Linux
echo %OPENAI_API_KEY%   # Windows

# If empty, add it to the .env file:
# OPENAI_API_KEY=your-key-here

# For HF Spaces, add it to Repository Secrets
```

**2. ChromaDB Persistence Issues**

Problem: `sqlite3.OperationalError: database is locked`

Solution:

```python
# In core/index.py, use the simpler client initialization
self.client = chromadb.PersistentClient(path=persist_directory)

# Or use EphemeralClient for testing (no persistence)
self.client = chromadb.EphemeralClient()
```

**3. Memory Errors with Large PDFs**

Problem: `MemoryError` or `Killed` when processing large documents

Solution:

```python
# Reduce the batch size in core/index.py
def add_documents(self, chunks, batch_size=50):  # reduced from 100
    # Process in smaller batches
    ...
```

**4. Slow Embedding Generation**

Problem: Embedding generation takes more than 30 seconds

Solution:

```bash
# Use a smaller embedding model in .env
EMBEDDING_MODEL=all-MiniLM-L6-v2  # faster, 384 dimensions

# Or use OpenAI embeddings
EMBEDDING_MODEL=openai:text-embedding-3-small
```

**5. Gradio API Connection Timeout**

Problem: `gradio_client` times out when connecting

Solution:

```python
from gradio_client import Client

# Increase the timeout
client = Client("http://localhost:7860", timeout=120)

# Or check whether the server is running
import requests
response = requests.get("http://localhost:7860")
print(response.status_code)  # should be 200
```

**6. HF Spaces Build Failure**

Problem: The Space shows "Build Failed" status

Solution:
- Check `requirements.txt` for incompatible versions
- View the build logs in the Space → Logs tab
- Common fix: pin exact versions

```
# requirements.txt
torch==2.1.0  # pin a specific version
transformers==4.35.0
gradio==4.44.0
```

**7. Inconsistent Evaluation Results**

Problem: Speedup values are sometimes <1.0 or highly variable

Solution:
- Run the evaluation multiple times and average the results
- Increase the number of warmup queries before evaluation
- Check whether auto-inference is working correctly

```python
# Add warmup queries
for _ in range(3):
    rag_comparator.compare("warmup query", n_results=5)
# Then run the actual evaluation
```
### Debug Mode

Enable verbose logging:

```python
# Add to app.py
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('app.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)
logger.debug("Debug mode enabled")
```
### Health Check Endpoints

Test the system components:

```python
# Add to app.py for debugging
import os
import gradio as gr

def system_health_check():
    """Check whether all components are working."""
    checks = {}

    # Check 1: OpenAI API
    try:
        import openai
        client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        client.models.list()
        checks["openai_api"] = "✅ Connected"
    except Exception as e:
        checks["openai_api"] = f"❌ {str(e)}"

    # Check 2: vector DB
    try:
        if index_manager:
            stats = index_manager.stores.get("rag_documents")
            checks["vector_db"] = "✅ Initialized"
        else:
            checks["vector_db"] = "⚠️ Not initialized"
    except Exception as e:
        checks["vector_db"] = f"❌ {str(e)}"

    # Check 3: embedding model
    try:
        from core.index import EmbeddingModel
        model = EmbeddingModel()
        test_embedding = model.embed_query("test")
        checks["embedding_model"] = f"✅ Loaded ({len(test_embedding)} dims)"
    except Exception as e:
        checks["embedding_model"] = f"❌ {str(e)}"

    return checks

# Add a button to the UI
with gr.Tab("System Health"):
    health_btn = gr.Button("Check System Health")
    health_output = gr.JSON(label="Health Status")
    health_btn.click(system_health_check, outputs=health_output)
```
## Additional Resources

### Documentation

- Gradio Documentation
- Gradio Client Guide
- ChromaDB Documentation
- OpenAI API Reference
- Sentence Transformers

### Community

- GitHub Issues: [repository-url]/issues
- Hugging Face Forums: https://discuss.huggingface.co/
- Discord: [Your project Discord]
## License

MIT License. See the LICENSE file for details.

## Acknowledgments

- Built with Gradio
- Vector database: ChromaDB
- Embeddings: Sentence Transformers
- LLM: OpenAI

## Support

For issues and questions:
- GitHub Issues: [repository-url]/issues
- Email: support@your-domain.com
- Documentation: [repository-url]/wiki

## Changelog

### v1.0.0 (2025-01-31)

- ✅ Initial release
- ✅ Base-RAG and Hier-RAG implementation
- ✅ Three preset hierarchies (Hospital, Banking, Fluid Simulation)
- ✅ Gradio UI and MCP server
- ✅ Comprehensive evaluation suite
- ✅ Full test coverage
- ✅ HF Spaces deployment ready

Built with ❤️ for the RAG community