---
title: DocChat Backend
emoji: 📄
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
---

Querify

RAG-powered document chat. Upload a PDF, TXT, or CSV file and ask questions about its content through a conversational interface.

Features

  • Chunked document ingestion with configurable size and overlap
  • Knowledge graph indexing (entities + relations) for GraphRAG retrieval
  • Retrieval-augmented generation using Qwen, Llama, or Mistral via the Hugging Face Inference API
  • Auto-recommended chunking parameters based on document size and type
  • Asynchronous graph-index builds with progress tracking
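The real GraphRAG pipeline lives in `rag.py` and is not reproduced here. As a rough illustration of the idea only, the sketch below indexes hypothetical (subject, relation, object) triples per chunk, finds entities named in the question, expands one hop through the graph, and returns the `top_k` best-scoring chunk ids. The triples, the substring-based entity matching, and the 1.0/0.5 scoring weights are all assumptions; a production pipeline would typically extract entities and relations with a model.

```python
from collections import defaultdict

# A (subject, relation, object) triple, e.g. ("Querify", "uses", "FastAPI").
Triple = tuple[str, str, str]


def build_graph(chunk_triples: dict[int, list[Triple]]):
    """Return (neighbors, mentions): entity adjacency and entity -> chunk ids."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    mentions: dict[str, set[int]] = defaultdict(set)
    for chunk_id, triples in chunk_triples.items():
        for subj, _rel, obj in triples:
            # Relations are treated as undirected edges for traversal.
            neighbors[subj].add(obj)
            neighbors[obj].add(subj)
            mentions[subj].add(chunk_id)
            mentions[obj].add(chunk_id)
    return neighbors, mentions


def retrieve(query: str, neighbors, mentions, top_k: int = 3) -> list[int]:
    """Score chunks by seed entities found in the query plus their 1-hop neighbors."""
    q = query.lower()
    # Naive entity linking (assumption): an entity is a seed if its name appears in the query.
    seeds = {e for e in mentions if e.lower() in q}
    expanded = set(seeds)
    for e in seeds:
        expanded |= neighbors[e]
    scores: dict[int, float] = defaultdict(float)
    for e in expanded:
        weight = 1.0 if e in seeds else 0.5  # direct hits outrank neighbors
        for chunk_id in mentions[e]:
            scores[chunk_id] += weight
    # Highest score first; ties broken by chunk id for deterministic output.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:top_k]


if __name__ == "__main__":
    demo = {
        0: [("Querify", "uses", "FastAPI")],
        1: [("FastAPI", "serves", "query endpoint")],
        2: [("Streamlit", "renders", "chat UI")],
    }
    neighbors, mentions = build_graph(demo)
    print(retrieve("What does Querify use?", neighbors, mentions, top_k=2))  # [0, 1]
```

Chunk 0 mentions the seed entity directly and also its neighbor `FastAPI`, so it outranks chunk 1, which is reached only through the one-hop expansion.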

Project Structure

doc_ingestion/
├── backend/
│   ├── main.py          # FastAPI routes
│   ├── ingestion.py     # File handling and metadata
│   ├── parser.py        # Text extraction (PDF, TXT, CSV)
│   └── rag.py           # Chunking, graph indexing, retrieval, generation
├── frontend/
│   └── app.py           # Streamlit UI
├── Dockerfile
└── requirements.txt

Quick Start

cd doc_ingestion
pip install -r requirements.txt

Set the required environment variable:

export HF_TOKEN=your_hf_token           # Hugging Face token used for LLM inference

Start the backend:

uvicorn backend.main:app --reload --port 8000

Start the frontend (separate terminal):

cd frontend
streamlit run app.py

Open http://localhost:8501, upload a document, and start chatting.

API Reference

| Method | Endpoint            | Description                   |
|--------|---------------------|-------------------------------|
| POST   | /upload             | Upload and ingest a file      |
| POST   | /ingest_text        | Ingest raw text directly      |
| GET    | /ingest_status      | Poll graph-index progress     |
| GET    | /graph_store_status | Graph index health and stats  |
| POST   | /query              | Submit a question             |

Interactive API docs are available at http://localhost:8000/docs.
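A minimal standard-library client sketch, assuming the backend is running on port 8000 as started in Quick Start. The JSON field `question` for `/query` and the boolean `done` key in the `/ingest_status` response are hypothetical; check the interactive docs for the real request and response schemas before relying on them.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # assumes the uvicorn command from Quick Start


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /query request. The "question" field name is hypothetical."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def fetch_ingest_status(base_url: str = BASE_URL) -> dict:
    """GET /ingest_status and return the decoded JSON payload."""
    return send(urllib.request.Request(f"{base_url}/ingest_status"))


def wait_for_index(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the payload reports completion (hypothetical boolean "done" key)."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("done"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"graph index not ready after {timeout} s")
        sleep(interval)
        waited += interval
```

With the backend running, a session might call `wait_for_index(fetch_ingest_status)` and then `send(build_query_request("What is this document about?"))`. Passing `fetch_status` and `sleep` in as arguments keeps the polling loop testable without a live server.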

Configuration

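| Parameter     | Default | Description                                   |
|---------------|---------|-----------------------------------------------|
| chunk_size    | 180     | Characters per chunk                          |
| chunk_overlap | 40      | Overlap between adjacent chunks (characters)  |
| top_k         | 3       | Chunks retrieved per query                    |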
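With the defaults above, consecutive chunks start 180 - 40 = 140 characters apart. The sketch below is a plain character-based sliding window under that assumption; the backend's own chunker in `rag.py` may split differently (for example on sentence or token boundaries).

```python
def chunk_text(text: str, chunk_size: int = 180, chunk_overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap  # 140 with the defaults
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks
```

For a 400-character text this yields three chunks of 180, 180, and 120 characters, each sharing its first 40 characters with the end of the previous one.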