title: GraphResearcher
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit

GraphResearcher


Citation-grounded document intelligence platform combining RAG with graph-based retrieval.

Live demo: yugbirla-graphresearcher.hf.space · App UI · Admin


Problem

PDF chatbots typically generate answers without showing where the information came from. Users cannot verify whether the output is grounded in the document. GraphResearcher addresses this by combining chunk-based retrieval, a document knowledge graph, source-level citation, and a verification UI that lets users trace every claim back to a specific page and chunk.


Architecture

User Browser
    │  Upload / Ask / Compare / Feedback
    ▼
FastAPI Backend
    ├── Ingestion: PDF parsing → chunking → metadata preservation
    ├── Retrieval: hybrid (BM25 + vector) → cross-encoder reranking → graph fusion
    ├── GraphRAG: entity extraction → relation extraction → graph storage → graph-guided retrieval
    ├── Generation: evidence extraction → grounded prompt → LLM answer → citation attachment
    ├── Product: app UI, source viewer, document comparison, feedback, admin monitoring
    └── Storage: runtime filesystem, SQLite, optional HF Dataset backup

GraphRAG Algorithm

Entities are extracted from processed document chunks using rule-based pattern matching: capitalized multi-word phrases and uppercase acronyms are identified, normalized by stripping punctuation and deduplicating via lowercased entity IDs, and classified as CONCEPT, ACRONYM, ORGANIZATION, or TECHNICAL_TERM. Noisy candidates (stopwords, overly short/long strings) are filtered using a quality module. Relations are constructed by co-occurrence: when two or more entities appear in the same sentence, an edge is created between each pair, with the relation type inferred from verb phrases (e.g., "uses" → USES, "reduces" → REDUCES) or defaulting to RELATED_TO. Edge weights increment with repeated co-occurrence across chunks.
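The extraction and co-occurrence steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual module: the regex, the stopword set, the length limits, the verb map, and the function names `extract_entities` and `build_graph` are all assumptions, and only two of the four entity types are distinguished.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stopword list and verb map; the project's own lists are not shown here.
STOPWORDS = {"the", "this", "that", "these", "those"}
VERB_RELATIONS = {"uses": "USES", "reduces": "REDUCES"}

# Capitalized multi-word phrases ("Graph Fusion") or uppercase acronyms ("RAG").
ENTITY_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Z]{2,})\b")


def extract_entities(sentence):
    """Return {entity_id: (name, type)} for candidates that pass simple filters."""
    entities = {}
    for match in ENTITY_PATTERN.finditer(sentence):
        name = match.group(0).strip(".,;:()")
        entity_id = name.lower()  # lowercased ID deduplicates case variants
        if entity_id in STOPWORDS or not 2 <= len(name) <= 60:
            continue  # assumed quality filter: stopwords and length bounds
        entity_type = "ACRONYM" if name.isupper() else "CONCEPT"
        entities[entity_id] = (name, entity_type)
    return entities


def build_graph(chunks):
    """chunks: {chunk_id: text}. Returns edges keyed by (id_a, id_b, relation)."""
    edges = defaultdict(lambda: {"weight": 0, "chunk_ids": set()})
    for chunk_id, text in chunks.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            ids = sorted(extract_entities(sentence))
            # Infer the relation type from the first known verb, else RELATED_TO.
            relation = "RELATED_TO"
            for verb, label in VERB_RELATIONS.items():
                if re.search(rf"\b{verb}\b", sentence, re.IGNORECASE):
                    relation = label
                    break
            for a, b in combinations(ids, 2):
                edge = edges[(a, b, relation)]
                edge["weight"] += 1  # repeated co-occurrence raises the weight
                edge["chunk_ids"].add(chunk_id)
    return dict(edges)
```

A pattern like this also matches sentence-initial phrases such as "The Knowledge Graph", which is one source of the noisy candidates that the quality filter is meant to catch.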

During answering, query terms are tokenized and matched against graph entity names and IDs using exact and substring scoring. The top-k matched entities and their neighboring relations are retrieved, and every chunk ID linked to those entities/relations receives a graph score based on entity mention count and relation weight. These graph-scored chunks are fused with normal hybrid retrieval results: chunks appearing in both lists get a score boost; graph-only chunks are appended. The fused set is re-sorted by score and truncated to the final top-k. The LLM prompt then includes both the standard evidence context and a structured graph context block listing matched entities (with types, mention counts, and pages) and relations (with types and weights).
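The query-time matching and fusion described above could look something like the sketch below. The match weights (2.0 for exact, 1.0 for substring), the `boost` factor, the normalization of graph scores, and all function names are assumptions chosen for illustration; the README does not specify them.

```python
import re
from collections import defaultdict


def match_entities(query, entities, top_k=5):
    """Rank entity IDs by exact (2.0) and substring (1.0) token matches."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", query.lower()) if len(t) > 2]
    scored = []
    for entity_id in entities:
        score = sum(2.0 if t == entity_id else 1.0 if t in entity_id else 0.0
                    for t in tokens)
        if score > 0:
            scored.append((score, entity_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entity_id for _, entity_id in scored[:top_k]]


def graph_scores(matched_ids, entities, edges):
    """Score chunks by entity mention counts plus weights of neighboring relations."""
    scores = defaultdict(float)
    for entity_id in matched_ids:
        info = entities[entity_id]
        for chunk_id in info["chunk_ids"]:
            scores[chunk_id] += info["mentions"]
    for (a, b, _relation), edge in edges.items():
        if a in matched_ids or b in matched_ids:
            for chunk_id in edge["chunk_ids"]:
                scores[chunk_id] += edge["weight"]
    return dict(scores)


def fuse(hybrid, graph, top_k=5, boost=0.1):
    """Boost chunks found by both retrievers, add graph-only chunks, re-sort, truncate."""
    max_graph = max(graph.values(), default=0) or 1  # avoid division by zero
    fused = dict(hybrid)
    for chunk_id, score in graph.items():
        bonus = boost * score / max_graph
        # Chunks in both lists get a boost; graph-only chunks enter with the bonus alone.
        fused[chunk_id] = fused.get(chunk_id, 0.0) + bonus
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

With a small `boost`, graph-only chunks land near the bottom of the fused list and are often cut by the final top-k truncation, so a graph layer can leave the retrieved set unchanged.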


Evaluation

The project includes an ablation evaluation framework comparing RAG only vs. RAG + Graph retrieval.

Metrics computed:

  • Recall@K (K=3,5,10): fraction of manually labeled gold chunk IDs retrieved in top K
  • Estimated faithfulness: automatic heuristic checking whether answer sentences are supported by retrieved source text (not human judgment)
  • Answer completeness: fraction of expected gold terms present in the answer
  • Latency: end-to-end request time
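As a rough illustration of the first and third metrics, the sketch below computes Recall@K and answer completeness for a single question. The function names and the case-insensitive substring test for gold terms are assumptions; the project's evaluation code may differ.

```python
def recall_at_k(retrieved_ids, gold_ids, k):
    """Fraction of gold chunk IDs that appear among the top-k retrieved IDs."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(set(retrieved_ids[:k]) & gold) / len(gold)


def answer_completeness(answer, gold_terms):
    """Fraction of expected gold terms found in the answer (case-insensitive)."""
    if not gold_terms:
        return 0.0
    text = answer.lower()
    hits = sum(1 for term in gold_terms if term.lower() in text)
    return hits / len(gold_terms)


retrieved = ["c4", "c1", "c9", "c2"]
gold = ["c1", "c2"]
print(recall_at_k(retrieved, gold, 3))  # 0.5: only c1 is in the top 3
print(recall_at_k(retrieved, gold, 5))  # 1.0: both gold chunks are retrieved
```

Per-question values like these would then be averaged over the question set to produce the figures reported for each mode.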

15-Question Starter Evaluation (Vectorless RAG Master Guide)

Generated: 2026-06-18 · QA file: eval/qa_15_starter.jsonl

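| Mode | Recall@3 | Recall@5 | Recall@10 | Faithfulness | Completeness | Avg Latency (ms) | Avg Answer Words | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RAG | 0.2944 | 0.4722 | 0.6389 | 0.9333 | 0.7167 | 7388.7 | 238.5 | 0 |
| RAG + Graph | 0.2944 | 0.4722 | 0.6389 | 0.9333 | 0.7167 | 7114.3 | 238.5 | 0 |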

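Conclusion: On this specific 15-question dataset, the rule-based GraphRAG implementation had no measurable effect: Recall@K, estimated faithfulness, completeness, and average answer length were identical to standard hybrid retrieval with reranking. Average latency was about 4% lower with graph fusion (7114.3 ms vs. 7388.7 ms), a gap too small to interpret from a single run.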


Setup

git clone https://github.com/yug-birla/Graph-Researcher.git
cd Graph-Researcher/backend

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/Mac

pip install -r requirements.txt
cp .env.example .env         # edit with your keys (Windows cmd: copy .env.example .env)
uvicorn app.main:app --reload

Open: http://127.0.0.1:8000/app


Environment Variables

See .env.example for the full list. Key variables:

| Variable | Purpose |
| --- | --- |
| LLM_PROVIDER | huggingface, local, or disabled |
| HF_API_TOKEN | Hugging Face API token for inference |
| HF_INFERENCE_MODEL | Model name (default: Qwen/Qwen3-4B-Instruct-2507) |
| ADMIN_EMAILS | Comma-separated admin email allowlist |
| ADMIN_DASHBOARD_KEY | Password for /admin/secure |
| SESSION_SECRET_KEY | Session middleware secret |
| HF_FEEDBACK_DATASET | Optional HF dataset for permanent feedback backup |
| HF_FEEDBACK_TOKEN | HF write token for feedback backup |
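For orientation, here is one way a backend might read a few of these variables. The `Settings` class, the `load_settings` helper, and the fallback of `disabled` for an unset `LLM_PROVIDER` are assumptions for this sketch; only the variable names and the model default come from the table above.

```python
import os
from dataclasses import dataclass

ALLOWED_PROVIDERS = {"huggingface", "local", "disabled"}


@dataclass
class Settings:
    llm_provider: str
    hf_inference_model: str
    admin_emails: list


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Assumed fallback: treat an unset provider as "disabled".
    provider = env.get("LLM_PROVIDER", "disabled").strip().lower()
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    # ADMIN_EMAILS is a comma-separated allowlist; blank entries are dropped.
    emails = [e.strip().lower() for e in env.get("ADMIN_EMAILS", "").split(",") if e.strip()]
    return Settings(
        llm_provider=provider,
        hf_inference_model=env.get("HF_INFERENCE_MODEL", "Qwen/Qwen3-4B-Instruct-2507"),
        admin_emails=emails,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.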

Limitations

  • Temporary storage on HF Spaces. Uploaded documents may disappear after runtime restart unless persistent storage is configured.
  • Rule-based entity extraction. The graph layer uses regex patterns, not a trained NER model. It may miss entities that don't follow capitalization conventions and may extract noisy candidates.
  • Heuristic faithfulness. The automatic faithfulness metric is a bag-of-words overlap check, not a semantic entailment model. It can overestimate faithfulness for topically relevant but factually incorrect answers.
  • No proven GraphRAG improvement yet. The ablation has only been run on the 15-question starter set, where graph fusion produced metrics identical to RAG only. Whether graph-guided retrieval improves results on real documents is an open question until a larger evaluation with verified gold labels is completed.
  • OCR. Scanned PDFs require stronger OCR support than currently provided.
  • Single-user storage. Production-level per-user persistent document workspaces are future work.
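To make the heuristic-faithfulness limitation concrete, the sketch below implements a simple bag-of-words support check. The tokenization, the 0.6 overlap threshold, and the function names are assumptions, not the project's actual metric.

```python
import re


def token_set(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def estimated_faithfulness(answer, sources, threshold=0.6):
    """Fraction of answer sentences whose tokens mostly appear in the source text."""
    source_tokens = set()
    for source in sources:
        source_tokens |= token_set(source)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        tokens = token_set(sentence)
        # A sentence counts as supported when enough of its tokens occur in the sources.
        if tokens and len(tokens & source_tokens) / len(tokens) >= threshold:
            supported += 1
    return supported / len(sentences)


sources = ["Reranking reduces latency by 20 percent."]
# The figure is wrong (80 instead of 20), yet 5 of 6 tokens overlap with the source.
print(estimated_faithfulness("Reranking reduces latency by 80 percent.", sources))  # 1.0
```

The example scores a factually incorrect sentence as fully supported, which is the overestimation the limitation describes; an entailment-based check would be needed to catch it.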

License

MIT