---
title: Hierarchical RAG Evaluation
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
license: mit
---

Hierarchical RAG Evaluation System

A comprehensive system for comparing Standard RAG vs Hierarchical RAG approaches, focusing on both accuracy and speed improvements through metadata-based filtering.

Features

  • Dual RAG Pipelines: Compare Base-RAG and Hier-RAG side-by-side
  • Hierarchical Classification: 3-level taxonomy (domain → section → topic)
  • Multiple Domains: Pre-configured hierarchies for Hospital, Banking, and Fluid Simulation
  • Comprehensive Evaluation: Quantitative metrics (Hit@k, MRR, latency) and qualitative testing
  • Gradio UI: User-friendly interface with API access
  • MCP Server: Additional API server for programmatic access

Architecture

User Query → Hierarchical Filter → Vector Search → Re-ranking → LLM Generation → Answer
                    ↓
            (Hier-RAG only)
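
The filter step is what distinguishes Hier-RAG: it narrows the candidate set before vector search. A toy sketch of that idea with a keyword-based classifier (the names `HIERARCHY`, `infer_filter`, and `filter_chunks` are illustrative, not the project's actual API):

```python
# Toy sketch of the Hier-RAG pre-filter (illustrative, not the project's
# real implementation): classify the query into a level-1 domain via
# keyword matching, then restrict the search space to matching chunks.

HIERARCHY = {
    "Clinical Care": ["admission", "patient", "emergency"],
    "Quality & Safety": ["infection", "medication", "error"],
    "Education": ["training", "nurse", "curriculum"],
}

def infer_filter(query):
    """Guess the level-1 domain from query keywords; None means no filter."""
    q = query.lower()
    for domain, keywords in HIERARCHY.items():
        if any(kw in q for kw in keywords):
            return domain
    return None

def filter_chunks(chunks, domain):
    """Keep only chunks whose metadata matches the inferred domain."""
    if domain is None:
        return chunks  # no filter inferred: fall back to full-corpus search
    return [c for c in chunks if c["metadata"].get("level1") == domain]

chunks = [
    {"text": "Admission checklist", "metadata": {"level1": "Clinical Care"}},
    {"text": "Hand hygiene policy", "metadata": {"level1": "Quality & Safety"}},
]
domain = infer_filter("What are the patient admission procedures?")
candidates = filter_chunks(chunks, domain)  # only the Clinical Care chunk remains
```

In the real system this metadata filter is passed to the vector store (e.g. as a ChromaDB `where` clause) so the similarity search itself runs over the smaller subset.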

Quick Start

Prerequisites

  • Python 3.9+
  • OpenAI API key (for LLM generation)
  • 4GB+ RAM recommended

Installation

  1. Clone the repository:

git clone <repository-url>
cd hierarchical-rag-eval

  2. Create a virtual environment:

python -m venv venv

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate

  3. Install dependencies:

pip install -r requirements.txt

  4. Set environment variables:

Create a .env file in the project root:

OPENAI_API_KEY=your-openai-api-key-here
VECTOR_DB_PATH=./data/chroma
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
LLM_MODEL=gpt-3.5-turbo

Important: Never commit the .env file to version control!

  5. Run the application:

python app.py

Access at http://localhost:7860


🚀 Deployment to Hugging Face Spaces

Step 1: Create Space

  1. Go to https://huggingface.co/spaces
  2. Click "Create new Space"
  3. Fill in details:
    • Owner: AP-UW (organization)
    • Space name: hierarchical-rag-eval
    • License: MIT
    • SDK: Gradio
    • Python version: 3.10
    • Visibility: Private

Step 2: Configure Persistent Storage

  1. Go to Space Settings → Storage
  2. Enable Persistent Storage (FREE tier available)
  3. This ensures your vector database persists across restarts

Step 3: Add Secrets

  1. Go to Space Settings → Repository Secrets
  2. Add the following secrets:
| Secret Name | Value | Description |
|---|---|---|
| OPENAI_API_KEY | sk-... | Your OpenAI API key |
| VECTOR_DB_PATH | /data/chroma | Path to persistent storage |
| EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Embedding model |
| LLM_MODEL | gpt-3.5-turbo | OpenAI model |

Note: Secrets are encrypted and not visible in logs.

Step 4: Prepare Code for Deployment

Update app.py to read from HF Spaces environment:

import os
from dotenv import load_dotenv

# Load .env for local development only
if not os.getenv("SPACE_ID"):  # SPACE_ID is set by HF Spaces
    load_dotenv()

# Verify API key
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("⚠️ OPENAI_API_KEY not found! Set it in Space Settings → Secrets")

Step 5: Push to Hugging Face

# Add HF Space as remote
git remote add space https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval
git branch -M main

# Push code (will trigger automatic build)
git push space main

Step 6: Monitor Deployment

  1. Go to your Space URL: https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval
  2. Check Logs tab for build progress
  3. Wait for "Running" status (may take 5-10 minutes on first build)

Step 7: Verify Deployment

Test the deployed app:

from gradio_client import Client

client = Client("https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval")

# Initialize system
result = client.predict(api_name="/initialize")
print(result)  # Should show "System initialized successfully!"

🔌 MCP Server Usage

The MCP (Model Context Protocol) Server provides RESTful API access to all RAG functionalities.

Running MCP Server (Local)

# Terminal 1: Start MCP Server
python mcp_server.py

# Server will run at http://localhost:8000
# API docs available at http://localhost:8000/docs

Running MCP Server (Production)

Deploy separately to a hosting service:

Option 1: Railway

railway login
railway init
railway up

Option 2: Render

  1. Connect GitHub repo
  2. Set build command: pip install -r requirements.txt
  3. Set start command: uvicorn mcp_server:app --host 0.0.0.0 --port $PORT

Option 3: Docker

FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "mcp_server:app", "--host", "0.0.0.0", "--port", "8000"]

MCP API Endpoints

Health Check

curl http://localhost:8000/health

Response:

{"status": "healthy"}

Initialize System

curl -X POST http://localhost:8000/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "persist_directory": "./data/chroma",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
  }'

Index Documents

curl -X POST http://localhost:8000/index \
  -H "Content-Type: application/json" \
  -d '{
    "filepaths": ["./docs/document1.pdf", "./docs/document2.txt"],
    "hierarchy": "hospital",
    "chunk_size": 512,
    "chunk_overlap": 50,
    "collection_name": "medical_docs"
  }'

Query RAG System

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the patient admission procedures?",
    "pipeline": "both",
    "n_results": 5,
    "auto_infer": true
  }'

Response:

{
  "query": "What are the patient admission procedures?",
  "base_rag": {
    "answer": "...",
    "retrieval_time": 0.052,
    "total_time": 1.234
  },
  "hier_rag": {
    "answer": "...",
    "retrieval_time": 0.031,
    "total_time": 0.987,
    "applied_filters": {"level1": "Clinical Care"}
  },
  "speedup": 1.25
}

System Information

curl http://localhost:8000/info

Python Client Example

import requests

# Base URL
BASE_URL = "http://localhost:8000"

# Initialize
response = requests.post(f"{BASE_URL}/initialize", json={
    "persist_directory": "./data/chroma"
})
print(response.json())

# Index documents
response = requests.post(f"{BASE_URL}/index", json={
    "filepaths": ["document.pdf"],
    "hierarchy": "hospital",
    "collection_name": "my_docs"
})
print(response.json())

# Query
response = requests.post(f"{BASE_URL}/query", json={
    "query": "What are KYC requirements?",
    "pipeline": "both",
    "n_results": 5
})
result = response.json()
print(f"Base-RAG: {result['base_rag']['answer']}")
print(f"Hier-RAG: {result['hier_rag']['answer']}")
print(f"Speedup: {result['speedup']:.2f}x")

📊 Evaluation Methodology

Dataset

We evaluate on three domain-specific query sets:

  1. Hospital Domain (n=5 queries)

    • Clinical Care, Quality & Safety, Education
    • Example: "What are the patient admission procedures?"
  2. Banking Domain (n=5 queries)

    • Retail Banking, Risk Management, Compliance
    • Example: "What are the KYC requirements?"
  3. Fluid Simulation Domain (n=5 queries)

    • Numerical Methods, Physical Models, Applications
    • Example: "How does the SIMPLE algorithm work?"

Metrics

Retrieval Metrics

  • Hit@k: Presence of at least one relevant document in top-k results

    • Formula: 1 if any(relevant_doc in top_k) else 0
    • Higher is better (max = 1.0)
  • Precision@k: Proportion of relevant documents in top-k

    • Formula: relevant_in_top_k / k
    • Range: 0.0 to 1.0
  • Recall@k: Proportion of relevant documents retrieved

    • Formula: relevant_in_top_k / total_relevant
    • Range: 0.0 to 1.0
  • MRR (Mean Reciprocal Rank): Average over queries of the reciprocal rank of the first relevant document

    • Formula: mean(1 / rank_of_first_relevant_doc), contributing 0 for queries with no relevant result
    • Range: 0.0 to 1.0
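
These formulas translate directly into a few lines of plain Python. Reference implementations operating on ranked lists of document IDs (function names are illustrative, not the project's evaluation module):

```python
# Reference implementations of the retrieval metrics above.
# `retrieved` is a ranked list of doc IDs; `relevant` is a set of doc IDs.

def hit_at_k(retrieved, relevant, k):
    """1.0 if any relevant doc appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k results."""
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean over queries of 1 / rank of the first relevant doc (0 if none)."""
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

retrieved = ["d3", "d1", "d7"]   # ranked retrieval result for one query
relevant = {"d1", "d2"}          # ground-truth relevant docs
```

For example, `hit_at_k(retrieved, relevant, 3)` is 1.0 (d1 is present), precision@3 is 1/3, and the MRR contribution is 1/2 (first relevant doc at rank 2).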

Performance Metrics

  • Retrieval Time: Time to fetch relevant documents from vector DB
  • Generation Time: Time for LLM to generate answer
  • Total Time: End-to-end query response time
  • Speedup: Ratio of Base-RAG to Hier-RAG total time
    • Formula: base_total_time / hier_total_time
    • Values above 1.0 mean Hier-RAG is faster
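
A minimal sketch of how such wall-clock timings can be collected with `time.perf_counter` (the two callables below are stand-ins, not the project's actual pipelines):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage: time both pipelines on the same query, then compute
# speedup = base_total_time / hier_total_time as defined above.
_, base_total = timed(lambda: sum(range(1000)))  # stand-in for a Base-RAG query
_, hier_total = timed(lambda: sum(range(500)))   # stand-in for a Hier-RAG query
speedup = base_total / hier_total
```

`perf_counter` is preferred over `time.time` here because it is monotonic and high-resolution, which matters for sub-100 ms retrieval timings.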

Quality Metrics

  • Semantic Similarity: Cosine similarity between generated answer and reference
    • Uses sentence-transformers embeddings
    • Range: 0.0 to 1.0
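
Assuming the generated and reference answers have already been embedded (e.g. with the sentence-transformers model from `.env`), the metric itself is a plain cosine. A self-contained sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With sentence-transformers (assumed setup, mirroring EMBEDDING_MODEL in
# .env), the two answers would first be encoded, e.g.:
#   model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
#   sim = cosine_similarity(model.encode(generated), model.encode(reference))
sim = cosine_similarity([1.0, 0.0], [1.0, 0.0])  # identical vectors
```

Strictly, cosine similarity ranges over [-1, 1]; the 0.0 to 1.0 range above reflects that sentence-embedding pairs of natural-language answers rarely produce negative values.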

Evaluation Process

# Run evaluation via Gradio API
from gradio_client import Client

client = Client("http://localhost:7860")

result = client.predict(
    query_dataset="hospital",
    n_queries=10,
    k_values="1,3,5",
    api_name="/evaluate"
)

# Results saved to ./reports/evaluation_TIMESTAMP.csv

Sample Results

Hospital Domain Evaluation (5 queries)

| Query | Expected Domain | Base Time (s) | Hier Time (s) | Speedup | Filter Match |
|---|---|---|---|---|---|
| Patient admission procedures? | Clinical Care | 1.97 | 2.76 | 0.72x | ✅ Clinical Care |
| Infection control policies? | Quality & Safety | 1.51 | 3.11 | 0.49x | ⚠️ policy only |
| Medication error reporting? | Quality & Safety | 1.03 | 2.41 | 0.43x | ⚠️ report only |
| Training for new nurses? | Education | 10.09 | 5.62 | 1.80x | ❌ None |
| Emergency response procedures? | Clinical Care | 2.32 | 1.49 | 1.56x | ❌ None |

Average Speedup: 0.96x (Base-RAG and Hier-RAG roughly equal)

Key Findings

  1. When Hier-RAG Excels (1.5-2.3x faster):

    • ✅ Query matches hierarchy taxonomy well
    • ✅ Auto-inference correctly identifies domain
    • ✅ Filtered subset is significantly smaller (<30% of corpus)
    • Example: "Training for new nurses" → 1.80x speedup
  2. When Hier-RAG Underperforms (<1.0x):

    • ❌ Auto-inference fails or misclassifies domain
    • ❌ Query is too general/cross-domain
    • ❌ Filter overhead exceeds retrieval time savings
    • Example: "Infection control policies" → 0.49x speedup
  3. Auto-Inference Accuracy:

    • Hospital domain: 40% (2/5 queries correctly classified)
    • Needs improvement via LLM-based classification
  4. Retrieval Time Improvement:

    • When filters applied correctly: 30-60% faster retrieval
    • Overall average: 15% faster retrieval (including misses)

Fluid Simulation Domain Evaluation (5 queries)

| Query | Expected Domain | Base Time (s) | Hier Time (s) | Speedup |
|---|---|---|---|---|
| How does SIMPLE algorithm work? | Numerical Methods | 1.45 | 3.69 | 0.39x |
| What turbulence models available? | Physical Models | 1.60 | 1.37 | 1.16x |
| Set up cavity flow benchmark? | Validation | 4.46 | 2.40 | 1.86x |
| Mesh generation techniques? | Numerical Methods | 2.64 | 2.87 | 0.92x |
| Enable parallel computing? | Software & Tools | 5.51 | 2.35 | 2.34x |

Average Speedup: 1.33x (Hier-RAG 33% faster on average)

Visualization

To generate evaluation charts:

# Add to your evaluation workflow
import matplotlib.pyplot as plt
import pandas as pd

def generate_evaluation_charts(csv_path):
    """Generate comprehensive evaluation visualizations."""
    df = pd.read_csv(csv_path)
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    fig.suptitle('Base-RAG vs Hier-RAG Performance Comparison', fontsize=16)
    
    # Chart 1: Average Total Time
    times = df[['base_total_time', 'hier_total_time']].mean()
    axes[0, 0].bar(['Base-RAG', 'Hier-RAG'], times, color=['#3498db', '#e74c3c'])
    axes[0, 0].set_ylabel('Time (seconds)')
    axes[0, 0].set_title('Average Total Query Time')
    axes[0, 0].grid(axis='y', alpha=0.3)
    
    # Chart 2: Speedup Distribution
    axes[0, 1].hist(df['speedup'], bins=10, color='#2ecc71', edgecolor='black')
    axes[0, 1].axvline(1.0, color='red', linestyle='--', label='No improvement')
    axes[0, 1].set_xlabel('Speedup Factor')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].set_title('Speedup Distribution')
    axes[0, 1].legend()
    
    # Chart 3: Retrieval Time Comparison
    axes[1, 0].scatter(df['base_retrieval_time'], df['hier_retrieval_time'], 
                       s=100, alpha=0.6, color='#9b59b6')
    max_val = max(df['base_retrieval_time'].max(), df['hier_retrieval_time'].max())
    axes[1, 0].plot([0, max_val], [0, max_val], 'r--', label='Equal performance')
    axes[1, 0].set_xlabel('Base-RAG Retrieval Time (s)')
    axes[1, 0].set_ylabel('Hier-RAG Retrieval Time (s)')
    axes[1, 0].set_title('Retrieval Time Comparison')
    axes[1, 0].legend()
    axes[1, 0].grid(alpha=0.3)
    
    # Chart 4: Query-wise Speedup
    axes[1, 1].barh(range(len(df)), df['speedup'], color='#f39c12')
    axes[1, 1].axvline(1.0, color='red', linestyle='--', linewidth=2)
    axes[1, 1].set_xlabel('Speedup Factor')
    axes[1, 1].set_ylabel('Query Index')
    axes[1, 1].set_title('Per-Query Speedup')
    axes[1, 1].grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(csv_path.replace('.csv', '_charts.png'), dpi=300, bbox_inches='tight')
    print(f"📊 Charts saved to: {csv_path.replace('.csv', '_charts.png')}")

# Usage
generate_evaluation_charts('./reports/evaluation_20251030_012814.csv')

🔧 Using the API with gradio_client

Installation

pip install gradio_client

Basic Usage

from gradio_client import Client

# Connect to local instance
client = Client("http://localhost:7860")

# Or connect to deployed HF Space
client = Client("https://huggingface.co/spaces/AP-UW/hierarchical-rag-eval")

Complete Workflow Example

from gradio_client import Client
import time

# Initialize client
client = Client("http://localhost:7860")

# Step 1: Initialize system
print("1️⃣ Initializing system...")
result = client.predict(api_name="/initialize")
print(result)

# Step 2: Upload and validate documents
print("\n2️⃣ Validating documents...")
status, preview, stats = client.predict(
    files=["./docs/hospital_policy.pdf", "./docs/procedures.txt"],
    hierarchy_choice="hospital",
    mask_pii=False,
    api_name="/upload"
)
print(f"Status: {status}")
print(f"Stats: {stats}")

# Step 3: Build RAG index
print("\n3️⃣ Building RAG index...")
build_status, build_stats = client.predict(
    files=["./docs/hospital_policy.pdf", "./docs/procedures.txt"],
    hierarchy="hospital",
    chunk_size=512,
    chunk_overlap=50,
    mask_pii=False,
    collection_name="hospital_docs",
    api_name="/build"
)
print(f"Build Status: {build_status}")
print(f"Indexed Chunks: {build_stats.get('Total Chunks', 0)}")

# Step 4: Search with both pipelines
print("\n4️⃣ Querying RAG system...")
answer, contexts, metadata = client.predict(
    query="What are the patient admission procedures?",
    pipeline="Both",
    n_results=5,
    level1="",
    level2="",
    level3="",
    doc_type="",
    auto_infer=True,
    api_name="/search"
)
print(f"Answer:\n{answer}\n")
print(f"Metadata:\n{metadata}")

# Step 5: Run evaluation
print("\n5️⃣ Running evaluation...")
summary, csv_path, json_path = client.predict(
    query_dataset="hospital",
    n_queries=5,
    k_values="1,3,5",
    api_name="/evaluate"
)
print(summary)
print(f"\nResults saved to:\n- {csv_path}\n- {json_path}")

Batch Processing Example

from gradio_client import Client
import pandas as pd

client = Client("http://localhost:7860")

# Initialize
client.predict(api_name="/initialize")

# Build index for multiple document sets
document_sets = {
    "hospital_policies": ["./docs/policy1.pdf", "./docs/policy2.pdf"],
    "clinical_protocols": ["./docs/protocol1.txt", "./docs/protocol2.txt"],
    "training_manuals": ["./docs/manual1.pdf", "./docs/manual2.pdf"]
}

for collection_name, files in document_sets.items():
    print(f"Building index for: {collection_name}")
    status, stats = client.predict(
        files=files,
        hierarchy="hospital",
        collection_name=collection_name,
        api_name="/build"
    )
    print(f"✅ {stats.get('Total Chunks', 0)} chunks indexed")

# Query multiple collections
queries = [
    "What are admission procedures?",
    "How to handle medication errors?",
    "What training is required for nurses?"
]

results = []
for query in queries:
    answer, contexts, metadata = client.predict(
        query=query,
        pipeline="Both",
        api_name="/search"
    )
    results.append({
        "query": query,
        "answer": answer[:200],  # First 200 chars
        "metadata": metadata
    })

# Save results
df = pd.DataFrame(results)
df.to_csv("batch_query_results.csv", index=False)

πŸ› Troubleshooting

Common Issues

1. OpenAI API Errors

Problem: Error generating answer: Incorrect API key provided

Solution:

# Check if API key is set
echo $OPENAI_API_KEY  # Mac/Linux
echo %OPENAI_API_KEY%  # Windows

# If empty, add to .env file
OPENAI_API_KEY=your-key-here

# For HF Spaces, add to Repository Secrets

2. ChromaDB Persistence Issues

Problem: sqlite3.OperationalError: database is locked

Solution:

# In core/index.py - use simpler client initialization
self.client = chromadb.PersistentClient(path=persist_directory)

# Or use EphemeralClient for testing (no persistence)
self.client = chromadb.EphemeralClient()

3. Memory Errors with Large PDFs

Problem: MemoryError or Killed when processing large documents

Solution:

# Reduce batch size in core/index.py
def add_documents(self, chunks, batch_size=50):  # Reduced from 100
    # Process in smaller batches

4. Slow Embedding Generation

Problem: Embedding generation takes >30 seconds

Solution:

# Use smaller embedding model in .env
EMBEDDING_MODEL=all-MiniLM-L6-v2  # Faster, 384 dimensions

# Or use OpenAI embeddings
EMBEDDING_MODEL=openai:text-embedding-3-small

5. Gradio API Connection Timeout

Problem: gradio_client times out when connecting

Solution:

from gradio_client import Client

# Increase timeout
client = Client("http://localhost:7860", timeout=120)

# Or check if server is running
import requests
response = requests.get("http://localhost:7860")
print(response.status_code)  # Should be 200

6. HF Spaces Build Failure

Problem: Space shows "Build Failed" status

Solution:

  1. Check requirements.txt for incompatible versions
  2. View build logs in Space β†’ Logs tab
  3. Common fix: Pin exact versions
# requirements.txt
torch==2.1.0  # Pin specific version
transformers==4.35.0
gradio==4.44.0

7. Evaluation Results Inconsistent

Problem: Speedup values sometimes <1.0 or highly variable

Solution:

  • Run evaluation multiple times and average results
  • Increase warmup queries before evaluation
  • Check if auto-inference is working correctly
# Add warmup queries
for _ in range(3):
    rag_comparator.compare("warmup query", n_results=5)

# Then run actual evaluation

Debug Mode

Enable verbose logging:

# Add to app.py
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('app.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)
logger.debug("Debug mode enabled")

Health Check Endpoints

Test system components:

# Add to app.py for debugging
def system_health_check():
    """Check if all components are working."""
    checks = {}
    
    # Check 1: OpenAI API
    try:
        import openai
        client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        client.models.list()
        checks["openai_api"] = "✅ Connected"
    except Exception as e:
        checks["openai_api"] = f"❌ {str(e)}"
    
    # Check 2: Vector DB
    try:
        if index_manager:
            stats = index_manager.stores.get("rag_documents")
            checks["vector_db"] = "✅ Initialized"
        else:
            checks["vector_db"] = "⚠️ Not initialized"
    except Exception as e:
        checks["vector_db"] = f"❌ {str(e)}"
    
    # Check 3: Embedding Model
    try:
        from core.index import EmbeddingModel
        model = EmbeddingModel()
        test_embedding = model.embed_query("test")
        checks["embedding_model"] = f"✅ Loaded ({len(test_embedding)} dims)"
    except Exception as e:
        checks["embedding_model"] = f"❌ {str(e)}"
    
    return checks

# Add button to UI
with gr.Tab("System Health"):
    health_btn = gr.Button("Check System Health")
    health_output = gr.JSON(label="Health Status")
    health_btn.click(system_health_check, outputs=health_output)


📄 License

MIT License - see LICENSE file for details


πŸ™ Acknowledgments


πŸ“ž Support

For issues and questions:


📈 Changelog

v1.0.0 (2025-01-31)

  • ✅ Initial release
  • ✅ Base-RAG and Hier-RAG implementation
  • ✅ Three preset hierarchies (Hospital, Bank, Fluid Simulation)
  • ✅ Gradio UI and MCP server
  • ✅ Comprehensive evaluation suite
  • ✅ Full test coverage
  • ✅ HF Spaces deployment ready

Built with ❤️ for the RAG community