---
title: GraphRAG Doctor
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
---
# 🕸️ GraphRAG Doctor
Enterprise-grade hallucination detection and forensic analysis for GraphRAG (Graph Retrieval-Augmented Generation) systems.
GraphRAG Doctor analyzes LLM-generated responses against retrieved knowledge graph contexts (community summaries, entity triples, and structural metadata) to detect hallucinations, quantify groundedness, and provide actionable diagnostics for pipeline optimization.
## 🚀 Features
- 🔍 **Multi-Granularity Analysis**: Evaluates evidence at the sentence level with semantic similarity scoring
- 📊 **Visual Heatmaps**: Interactive color-coded response highlighting (Green/Yellow/Red evidence zones)
- 🧠 **Dual Model Support**: Toggle between Speed (MiniLM-L6, 30MB) and Accuracy (MPNet-Base, 400MB) embeddings
- ⚡ **Production Optimized**: Async processing, model caching, and memory-efficient batch encoding
- 📈 **Comprehensive Metrics**: Graph relevance, groundedness scores, entity drift detection
- 🔒 **Enterprise Security**: Input sanitization, HTML escaping, and structured logging
- ☁️ **HF Spaces Optimized**: Defaults to CPU (auto-detects GPU), memory-safe concurrency limits, Secrets integration
## 📋 Table of Contents

- Features
- Quick Start
- Hugging Face Spaces
- Usage
- Architecture
- Configuration
- Troubleshooting
- Contributing
- License
- Acknowledgments
## ⚡ Quick Start

### Option A: Deploy to Hugging Face Spaces (Recommended)

1. Click "Duplicate this Space" or create a new Space with SDK `gradio`
2. Upload `app.py` and `requirements.txt`
3. Set hardware to CPU (default) or GPU (Settings → Hardware)
4. Your app is live at `https://[username]-[space-name].hf.space`
### Option B: Local Development

```bash
# Clone repository
git clone https://github.com/your-org/graphrag-doctor.git
cd graphrag-doctor

# Install dependencies (see requirements.txt)
pip install gradio sentence-transformers numpy pydantic-settings

# Run locally
python app.py
```

Access at `http://localhost:7860`.
## 🤗 Hugging Face Spaces

### Secrets Configuration

In your Space Settings → Secrets (not `.env` files):

| Secret | Value | Description |
|---|---|---|
| `DEBUG` | `false` | Enable verbose JSON logging |
| `DEVICE` | `cpu` | Force CPU (auto-detected if GPU available) |
| `MAX_CONCURRENT_REQUESTS` | `3` | Limit parallel requests (memory safety) |
### Hardware Requirements

- **Free Tier (CPU)**: Works perfectly with the "speed" model. First model load takes ~30s (cached thereafter).
- **GPU Tier**: Auto-detected; enables the "accuracy" model for better quality.
### File Structure

```
├── app.py             # Main application
├── requirements.txt   # Dependencies
└── README.md          # This file
```
## 💻 Usage

### Basic Workflow

1. **Input Question**: The original user query sent to your GraphRAG system
2. **Input Graph Context**: Paste the retrieved context (community summaries, entity descriptions, or knowledge triples)
3. **Input Generated Answer**: The LLM response to analyze
4. **Configure Thresholds**:
   - **Grounded Threshold** (default 0.6): Minimum cosine similarity for "GROUNDED" classification
   - **Weak Threshold** (default 0.4): Minimum similarity for "UNCERTAIN" classification
   - Below 0.4 = "HALLUCINATION"
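The threshold logic above can be sketched in a few lines. This is a minimal illustration using the README's default values; the function name `classify_sentence` is ours, not necessarily the app's API:

```python
# Defaults from this README; tune per the Threshold Tuning Guide below.
GROUNDED_THRESHOLD = 0.6
WEAK_THRESHOLD = 0.4

def classify_sentence(similarity: float) -> str:
    """Map a sentence's best cosine similarity against the graph
    context to one of the three evidence statuses."""
    if similarity >= GROUNDED_THRESHOLD:
        return "GROUNDED"
    if similarity >= WEAK_THRESHOLD:
        return "UNCERTAIN"
    return "HALLUCINATION"

print(classify_sentence(0.72))  # GROUNDED
print(classify_sentence(0.45))  # UNCERTAIN
print(classify_sentence(0.12))  # HALLUCINATION
```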
### Input Formats

**Community Summaries** (Microsoft GraphRAG format):

```
### Community 1
The entity SpaceX is related to Elon Musk through founded_by relationship...

### Community 2
Tesla Inc. operates in the automotive sector with focus on electric vehicles...
```

**Entity/Triple Format:**

```
Entity: SpaceX (Organization)
Entity: Elon Musk (Person)
[SpaceX] -FOUNDED_BY-> [Elon Musk]
[SpaceX] -HEADQUARTERS-> [Hawthorne, CA]
```

**Raw Text**: Paragraphs separated by double newlines
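For the Entity/Triple format, a parser along these lines would recover entities and relations. This is a hypothetical sketch whose regexes match the example syntax above; the actual parser in `app.py` may differ:

```python
import re

# Matches lines like: Entity: SpaceX (Organization)
ENTITY_RE = re.compile(r"^Entity:\s*(?P<name>.+?)\s*\((?P<type>\w+)\)$")
# Matches lines like: [SpaceX] -FOUNDED_BY-> [Elon Musk]
TRIPLE_RE = re.compile(r"^\[(?P<head>.+?)\]\s*-(?P<rel>\w+)->\s*\[(?P<tail>.+?)\]$")

def parse_graph_context(text: str):
    """Split a pasted context into (entities, triples) lists."""
    entities, triples = [], []
    for line in text.splitlines():
        line = line.strip()
        if m := ENTITY_RE.match(line):
            entities.append((m["name"], m["type"]))
        elif m := TRIPLE_RE.match(line):
            triples.append((m["head"], m["rel"], m["tail"]))
    return entities, triples

ctx = """Entity: SpaceX (Organization)
Entity: Elon Musk (Person)
[SpaceX] -FOUNDED_BY-> [Elon Musk]
[SpaceX] -HEADQUARTERS-> [Hawthorne, CA]"""
entities, triples = parse_graph_context(ctx)
print(entities)  # [('SpaceX', 'Organization'), ('Elon Musk', 'Person')]
print(triples[0])  # ('SpaceX', 'FOUNDED_BY', 'Elon Musk')
```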
### Interpreting Results
| Metric | Description | Healthy Range |
|---|---|---|
| Graph Relevance | Max similarity between question and graph context | > 0.6 |
| Groundedness | % of sentences with evidence support | > 80% |
| System Health | Overall pipeline status | Healthy |
**Evidence Status Colors:**

- 🟢 **Green**: Grounded (similarity ≥ 0.6) - Evidence found in graph context
- 🟡 **Yellow**: Uncertain (0.4 ≤ similarity < 0.6) - Weak evidence, possible paraphrasing
- 🔴 **Red**: Hallucination (< 0.4) - No supporting evidence detected
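The Groundedness metric above (% of sentences with evidence support) can be illustrated with a small similarity computation. The sketch assumes unit-normalized sentence and context embeddings, so cosine similarity reduces to a dot product; the function name is ours:

```python
import numpy as np

def groundedness(answer_emb: np.ndarray, context_emb: np.ndarray,
                 threshold: float = 0.6) -> float:
    """Fraction of answer sentences whose best match against any
    graph-context chunk clears the grounded threshold.
    Assumes rows are unit-normalized embeddings."""
    sims = answer_emb @ context_emb.T        # (n_sentences, n_chunks)
    best = sims.max(axis=1)                  # best evidence per sentence
    return float((best >= threshold).mean()) # share of grounded sentences

# Toy example: 2 answer "sentences", 1 context chunk identical to the first.
answer = np.eye(3)[:2]
context = np.eye(3)[:1]
print(groundedness(answer, context))  # 0.5
```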
## 🏗️ Architecture

```mermaid
graph TD
    A[User Input] --> B{Input Validation}
    B -->|Sanitize| C[Text Normalizer]
    C --> D[Graph Parser]
    D -->|Extract| E[Graph Elements]
    E --> F[Embedding Engine]
    F -->|Async Load| G[Model Manager]
    G -->|Cache| H[SentenceTransformers]
    H --> I[Similarity Matrix]
    I --> J[Evidence Analyzer]
    J -->|Score| K[Sentence Analysis]
    K --> L[Diagnosis Engine]
    L -->|Generate| M[Diagnostics]
    M --> N[HTML Renderer]
    N --> O[Gradio UI]
    style A fill:#e1f5ff
    style O fill:#d4edda
    style G fill:#fff3cd
```
### Component Overview

| Component | Responsibility |
|---|---|
| `ModelManager` | Singleton model cache with LRU eviction (max 1 model for HF Spaces memory) |
| `TextNormalizer` | Input validation, sentence segmentation, graph parsing |
| `EvidenceAnalyzer` | Async embedding generation, similarity computation, scoring logic |
| `HTMLRenderer` | Secure HTML generation, XSS prevention, responsive layouts |
| `GraphRAGDoctorApp` | Request orchestration, semaphore concurrency, error handling |
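The `ModelManager` pattern from the table — a thread-safe cache with LRU eviction capped at one model on HF Spaces — can be sketched as follows. This is an illustration, not the actual implementation; `load_fn` stands in for `SentenceTransformer(name)`:

```python
from collections import OrderedDict
from threading import Lock

class ModelManager:
    """Cache loaded models, evicting the least-recently-used one
    once max_models is exceeded (1 keeps HF Spaces within memory)."""

    def __init__(self, load_fn, max_models: int = 1):
        self._load_fn = load_fn
        self._max = max_models
        self._cache = OrderedDict()
        self._lock = Lock()

    def get(self, name: str):
        with self._lock:
            if name in self._cache:
                self._cache.move_to_end(name)    # mark as recently used
                return self._cache[name]
            model = self._load_fn(name)          # e.g. SentenceTransformer(name)
            self._cache[name] = model
            if len(self._cache) > self._max:
                self._cache.popitem(last=False)  # evict LRU entry
            return model

# Usage sketch with a stand-in loader:
mgr = ModelManager(load_fn=lambda name: object(), max_models=1)
speed = mgr.get("speed")
mgr.get("accuracy")  # loading a second model evicts "speed"
```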
### Tech Stack

- **Backend**: Python 3.9+, Asyncio, Pydantic Settings v2
- **ML**: SentenceTransformers, PyTorch (CPU/GPU auto-detect)
- **Frontend**: Gradio 4.0+
- **Validation**: Pydantic with environment variable support
## ⚙️ Configuration

### Environment Variables

Set via Hugging Face Space Secrets (or `.env` locally):

| Variable | Default | Description |
|---|---|---|
| `DEBUG` | `false` | Enable verbose JSON logging |
| `DEVICE` | `cpu`/`cuda` | Compute device (auto-detected) |
| `MAX_CONCURRENT_REQUESTS` | `3` | Parallel analysis limit (set to 3 for HF Spaces CPU) |
| `MAX_INPUT_LENGTH` | `50000` | Character limit per input field |
| `BATCH_SIZE` | `32` | Encoding batch size |
| `DEFAULT_MODEL` | `speed` | Default embedding model |
### Threshold Tuning Guide

**High Precision Mode** (minimize false positives):

```python
GREEN_THRESHOLD = 0.75
YELLOW_THRESHOLD = 0.50
```

**High Recall Mode** (catch all potential hallucinations):

```python
GREEN_THRESHOLD = 0.50
YELLOW_THRESHOLD = 0.30
```

**HF Spaces Note**: On CPU, stick with the "speed" model (MiniLM-L6) for <2s response times. Use "accuracy" (MPNet) only on the GPU tier.
## 🐛 Troubleshooting

### Model Loading on HF Spaces

**Symptom**: Timeout on first request

**Solution**:
- The first model download takes 30-60s on Spaces (cached for subsequent requests)
- If the timeout persists, upgrade to the CPU Upgrade tier or use the `speed` model only
### Low GPU Memory on Spaces

**Symptom**: CUDA out of memory

**Solution**:
- Switch to CPU: set `DEVICE=cpu` in Secrets
- Or reduce `BATCH_SIZE` to 16 in code
- Ensure only one model is loaded at a time (default behavior)
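The `DEVICE` auto-detection mentioned above is typically done with a standard PyTorch availability check, with the Secrets value taking precedence. A sketch; the exact logic in `app.py` may differ:

```python
import os

# Fall back to CPU when torch is absent or reports no CUDA device.
try:
    import torch
    detected = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    detected = "cpu"

# An explicit DEVICE secret/env var overrides auto-detection.
device = os.environ.get("DEVICE", detected)
print(device)
```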
### High Latency on Free Tier

**Optimization**:
- Use the "speed" model (default): ~100-300ms per analysis
- Avoid long inputs (>1000 characters), which increase embedding time
- Enable Persistent Storage in Space Settings to cache downloaded models
### Import Error: pydantic_settings

**Symptom**: `ModuleNotFoundError: No module named 'pydantic_settings'`

**Solution**: Ensure `requirements.txt` includes:

```
pydantic>=2.0.0
pydantic-settings>=2.0.0
```
### Low Graph Relevance Scores

**Symptom**: Graph Relevance consistently < 0.4

**Diagnostics**:
- Verify GraphRAG retrieval is working (check community detection levels)
- Ensure context includes entity summaries, not just raw text
- Tune similarity thresholds based on your domain's semantic similarity distributions
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License

Distributed under the MIT License. See `LICENSE` for more information.
## 🙏 Acknowledgments
- Sentence-Transformers for embedding models
- Gradio for the interactive UI framework
- Microsoft GraphRAG for the inspiration
- Hugging Face Spaces for free hosting
Built with ❤️ for the GraphRAG community