---
title: GraphRAG Doctor
emoji: πŸ•ΈοΈ
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
---

πŸ•ΈοΈ GraphRAG Doctor


Enterprise-grade hallucination detection and forensic analysis for GraphRAG (Graph Retrieval-Augmented Generation) systems.

GraphRAG Doctor analyzes LLM-generated responses against retrieved knowledge graph contexts (community summaries, entity triples, and structural metadata) to detect hallucinations, quantify groundedness, and provide actionable diagnostics for pipeline optimization.


## πŸš€ Features

- πŸ” **Multi-Granularity Analysis:** Evaluates evidence at sentence level with semantic similarity scoring
- πŸ“Š **Visual Heatmaps:** Interactive color-coded response highlighting (Green/Yellow/Red evidence zones)
- 🧠 **Dual Model Support:** Toggle between Speed (MiniLM-L6, 30MB) and Accuracy (MPNet-Base, 400MB) embeddings
- ⚑ **Production Optimized:** Async processing, model caching, and memory-efficient batch encoding
- πŸ“ˆ **Comprehensive Metrics:** Graph relevance, groundedness scores, entity drift detection
- πŸ”’ **Enterprise Security:** Input sanitization, HTML escaping, and structured logging
- ☁️ **HF Spaces Optimized:** Defaults to CPU (auto-detects GPU), memory-safe concurrency limits, Secrets integration


## ⚑ Quick Start

### Option A: Deploy to Hugging Face Spaces (Recommended)

1. Click **"Duplicate this Space"** or create a new Space with SDK `gradio`
2. Upload `app.py` and `requirements.txt`
3. Set hardware to CPU (default) or GPU (Settings β†’ Hardware)
4. Your app is live at `https://[username]-[space-name].hf.space`

### Option B: Local Development

```bash
# Clone repository
git clone https://github.com/your-org/graphrag-doctor.git
cd graphrag-doctor

# Install dependencies (see requirements.txt)
pip install gradio sentence-transformers numpy pydantic-settings

# Run locally
python app.py
```

Access at http://localhost:7860

## πŸ€— Hugging Face Spaces

### Secrets Configuration

In your Space Settings β†’ Secrets (not .env files):

| Secret | Value | Description |
|---|---|---|
| `DEBUG` | `false` | Enable verbose JSON logging |
| `DEVICE` | `cpu` | Force CPU (auto-detected if GPU available) |
| `MAX_CONCURRENT_REQUESTS` | `3` | Limit parallel requests (memory safety) |

### Hardware Requirements

- **Free Tier (CPU):** Works well with the "speed" model. First model load takes ~30s (cached thereafter).
- **GPU Tier:** Auto-detected; enables the "accuracy" model for better quality.

### File Structure

```text
β”œβ”€β”€ app.py              # Main application
β”œβ”€β”€ requirements.txt    # Dependencies
└── README.md           # This file
```

## πŸ’» Usage

### Basic Workflow

1. **Input Question:** The original user query sent to your GraphRAG system
2. **Input Graph Context:** Paste the retrieved context (community summaries, entity descriptions, or knowledge triples)
3. **Input Generated Answer:** The LLM response to analyze
4. **Configure Thresholds:**
   - **Grounded Threshold** (default 0.6): Minimum cosine similarity for a "GROUNDED" classification
   - **Weak Threshold** (default 0.4): Minimum similarity for an "UNCERTAIN" classification
   - Below 0.4 = "HALLUCINATION"
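The two thresholds define a simple three-way classifier over per-sentence similarity scores. A minimal sketch (illustrative only; `classify_sentence` is not a function exported by `app.py`):

```python
def classify_sentence(similarity: float,
                      grounded_threshold: float = 0.6,
                      weak_threshold: float = 0.4) -> str:
    """Map a sentence's best cosine similarity against the graph
    context to an evidence label, using the default thresholds above."""
    if similarity >= grounded_threshold:
        return "GROUNDED"
    if similarity >= weak_threshold:
        return "UNCERTAIN"
    return "HALLUCINATION"
```

Raising either threshold trades recall for precision; see the Threshold Tuning Guide below.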

### Input Formats

**Community Summaries** (Microsoft GraphRAG format):

```text
### Community 1
The entity SpaceX is related to Elon Musk through founded_by relationship...

### Community 2
Tesla Inc. operates in the automotive sector with focus on electric vehicles...
```

**Entity/Triple Format:**

```text
Entity: SpaceX (Organization)
Entity: Elon Musk (Person)
[SpaceX] -FOUNDED_BY-> [Elon Musk]
[SpaceX] -HEADQUARTERS-> [Hawthorne, CA]
```

**Raw Text:** Paragraphs separated by double newlines
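The three formats above can be split into evidence units with simple heuristics. A sketch of that dispatch, assuming the community-header, triple-line, and blank-line conventions shown above (`split_graph_context` is illustrative; `app.py`'s parser may differ):

```python
import re

def split_graph_context(context: str) -> list[str]:
    """Split retrieved graph context into evidence units, handling the
    three documented formats: '### Community N' sections, one-per-line
    entity/triple records, and paragraphs separated by blank lines."""
    context = context.strip()
    if re.search(r"^### Community", context, flags=re.MULTILINE):
        # One unit per community section, with the header stripped.
        parts = re.split(r"^### Community \d+\s*$", context, flags=re.MULTILINE)
        return [p.strip() for p in parts if p.strip()]
    if re.search(r"^(Entity:|\[.+\] -)", context, flags=re.MULTILINE):
        # One unit per entity/triple line.
        return [line.strip() for line in context.splitlines() if line.strip()]
    # Raw text: one unit per paragraph (double-newline separated).
    return [p.strip() for p in re.split(r"\n\s*\n", context) if p.strip()]
```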

### Interpreting Results

| Metric | Description | Healthy Range |
|---|---|---|
| Graph Relevance | Max similarity between question and graph context | > 0.6 |
| Groundedness | % of sentences with evidence support | > 80% |
| System Health | Overall pipeline status | Healthy |
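Given cosine-similarity scores, the first two metrics reduce to a max and a thresholded mean. A sketch with assumed names and shapes (`pipeline_metrics`, `question_sims`, `sentence_sims` are illustrative, not `app.py` identifiers):

```python
import numpy as np

def pipeline_metrics(question_sims: np.ndarray,
                     sentence_sims: np.ndarray,
                     grounded_threshold: float = 0.6) -> dict:
    """Compute the headline metrics from cosine similarities.

    question_sims: similarity of the question to each graph unit.
    sentence_sims: (n_sentences, n_units) matrix of answer-sentence
    vs. graph-unit similarities.
    """
    # Graph Relevance: best match between the question and any unit.
    graph_relevance = float(question_sims.max())
    # Groundedness: fraction of sentences whose best-matching unit
    # clears the grounded threshold.
    best_per_sentence = sentence_sims.max(axis=1)
    groundedness = float((best_per_sentence >= grounded_threshold).mean())
    return {"graph_relevance": graph_relevance, "groundedness": groundedness}
```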

**Evidence Status Colors:**

- 🟒 **Green:** Grounded (similarity β‰₯ 0.6). Evidence found in the graph context.
- 🟑 **Yellow:** Uncertain (0.4 ≀ similarity < 0.6). Weak evidence, possible paraphrasing.
- πŸ”΄ **Red:** Hallucination (similarity < 0.4). No supporting evidence detected.

πŸ—οΈ Architecture

```mermaid
graph TD
    A[User Input] --> B{Input Validation}
    B -->|Sanitize| C[Text Normalizer]
    C --> D[Graph Parser]
    D -->|Extract| E[Graph Elements]
    E --> F[Embedding Engine]
    F -->|Async Load| G[Model Manager]
    G -->|Cache| H[SentenceTransformers]
    H --> I[Similarity Matrix]
    I --> J[Evidence Analyzer]
    J -->|Score| K[Sentence Analysis]
    K --> L[Diagnosis Engine]
    L -->|Generate| M[Diagnostics]
    M --> N[HTML Renderer]
    N --> O[Gradio UI]

    style A fill:#e1f5ff
    style O fill:#d4edda
    style G fill:#fff3cd
```

### Component Overview

| Component | Responsibility |
|---|---|
| `ModelManager` | Singleton model cache with LRU eviction (max 1 model for HF Spaces memory) |
| `TextNormalizer` | Input validation, sentence segmentation, graph parsing |
| `EvidenceAnalyzer` | Async embedding generation, similarity computation, scoring logic |
| `HTMLRenderer` | Secure HTML generation, XSS prevention, responsive layouts |
| `GraphRAGDoctorApp` | Request orchestration, semaphore concurrency, error handling |
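The semaphore-based concurrency in the orchestration layer amounts to gating each analysis on a bounded pool of slots. A minimal sketch in the spirit of `GraphRAGDoctorApp` (class and function names here are illustrative, not taken from `app.py`):

```python
import asyncio

class AnalysisGate:
    """Bound concurrent analyses so embedding memory stays predictable."""

    def __init__(self, max_concurrent: int = 3):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, analyze, *args):
        # Requests beyond the limit wait here until a slot frees up.
        async with self._sem:
            return await analyze(*args)

async def demo():
    gate = AnalysisGate(max_concurrent=2)

    async def fake_analysis(i):
        await asyncio.sleep(0.01)  # stand-in for embedding + scoring
        return i * 2

    # Five requests, but at most two run at any moment.
    return await asyncio.gather(*(gate.run(fake_analysis, i) for i in range(5)))
```

The `max_concurrent` default mirrors the `MAX_CONCURRENT_REQUESTS=3` setting recommended for HF Spaces CPU.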

### Tech Stack

- **Backend:** Python 3.9+, asyncio, Pydantic Settings v2
- **ML:** SentenceTransformers, PyTorch (CPU/GPU auto-detect)
- **Frontend:** Gradio 4.0+
- **Validation:** Pydantic with environment variable support

βš™οΈ Configuration

### Environment Variables

Set via Hugging Face Space Secrets (or .env locally):

| Variable | Default | Description |
|---|---|---|
| `DEBUG` | `false` | Enable verbose JSON logging |
| `DEVICE` | `cpu`/`cuda` | Compute device (auto-detected) |
| `MAX_CONCURRENT_REQUESTS` | `3` | Parallel analysis limit (set to 3 for HF Spaces CPU) |
| `MAX_INPUT_LENGTH` | `50000` | Character limit per input field |
| `BATCH_SIZE` | `32` | Encoding batch size |
| `DEFAULT_MODEL` | `speed` | Default embedding model |
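The app reads these variables through Pydantic Settings; the loading behavior can be sketched with the stdlib alone. The `Settings` dataclass and `load_settings` function below are a stand-in, not the actual `app.py` classes:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Each field falls back to its documented default when unset."""
    debug: bool
    device: str
    max_concurrent_requests: int
    max_input_length: int
    batch_size: int
    default_model: str

def load_settings(env=os.environ) -> Settings:
    return Settings(
        debug=env.get("DEBUG", "false").lower() == "true",
        device=env.get("DEVICE", "cpu"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "3")),
        max_input_length=int(env.get("MAX_INPUT_LENGTH", "50000")),
        batch_size=int(env.get("BATCH_SIZE", "32")),
        default_model=env.get("DEFAULT_MODEL", "speed"),
    )
```

On HF Spaces these values come from Space Secrets rather than a `.env` file, but the precedence (explicit value over default) is the same.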

### Threshold Tuning Guide

**High Precision Mode** (minimize false positives):

```python
GREEN_THRESHOLD = 0.75
YELLOW_THRESHOLD = 0.50
```

**High Recall Mode** (catch all potential hallucinations):

```python
GREEN_THRESHOLD = 0.50
YELLOW_THRESHOLD = 0.30
```

**HF Spaces Note:** On CPU, stick with the "speed" model (MiniLM-L6) for <2s response times. Use "accuracy" (MPNet) only on the GPU tier.

πŸ› Troubleshooting

### Model Loading on HF Spaces

**Symptom:** Timeout on first request

**Solution:**

- First model download takes 30-60s on Spaces (cached for subsequent requests)
- If the timeout persists, upgrade to the CPU Upgrade tier or use the speed model only

### Low GPU Memory on Spaces

**Symptom:** `CUDA out of memory`

**Solution:**

1. Switch to CPU: set `DEVICE=cpu` in Secrets
2. Or reduce `BATCH_SIZE` to 16 in code
3. Ensure only one model is loaded at a time (the default behavior)

### High Latency on Free Tier

**Optimization:**

- Use the speed model (default): ~100-300ms per analysis
- Avoid long inputs (>1000 characters), which increase embedding time
- Enable Persistent Storage in Space Settings to cache downloaded models

### Import Error: pydantic_settings

**Symptom:** `ModuleNotFoundError: No module named 'pydantic_settings'`

**Solution:** Ensure `requirements.txt` includes:

```text
pydantic>=2.0.0
pydantic-settings>=2.0.0
```

### Low Graph Relevance Scores

**Symptom:** Graph Relevance consistently < 0.4

**Diagnostics:**

- Verify that GraphRAG retrieval is working (check community detection levels)
- Ensure the context includes entity summaries, not just raw text
- Tune similarity thresholds based on your domain's semantic similarity distribution

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## πŸ“„ License

Distributed under the MIT License. See LICENSE for more information.

πŸ™ Acknowledgments


Built with ❀️ for the GraphRAG community