title: Pluto Pipeline
emoji: 📄
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false

Pluto: Real Mode-Switching Extraction Pipeline

Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.

The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.

Why Pluto

Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.

Key goals:

  • query only the relevant parts of a document corpus
  • switch model behavior by chunk type and task difficulty
  • keep document processing reusable across multiple questions
  • surface evidence, agent activity, and confidence to the user
  • support scoped queries to one selected corpus document or the full corpus

What The App Does

  • uploads PDF, DOCX/DOC, TXT, and MD files into a local corpus
  • converts uploaded files to Markdown and chunks them for retrieval
  • classifies chunks as text, table, figure, code, references, and more
  • runs a staged pipeline: Route -> Extract -> Merge -> EvidenceCheck
  • streams live status updates through Server-Sent Events
  • returns a final answer with sections, evidence, trace, confidence, and gaps
  • compares Pluto against a simpler single-model baseline in the benchmark panel
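The staged flow with live status updates can be pictured with a small runner that calls each stage in order and reports progress through a callback. This is only a sketch: `run_stages`, the shared `state` dict, and the placeholder stage bodies are hypothetical names chosen for illustration and are not taken from the Pluto codebase.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage signature: each stage receives the shared state and returns it.
State = Dict[str, Any]
Stage = Callable[[State], State]


def run_stages(
    stages: List[Tuple[str, Stage]],
    state: State,
    on_progress: Callable[[str, str], None],
) -> State:
    """Run stages in order, reporting 'started' and 'finished' for each one."""
    for name, stage in stages:
        on_progress(name, "started")
        state = stage(state)
        on_progress(name, "finished")
    return state


# Placeholder stages standing in for the real Route/Extract/Merge/EvidenceCheck logic.
def route(state: State) -> State:
    query = state["query"].lower()
    state["chunks"] = [c for c in state["corpus"] if query in c.lower()]
    return state


def extract(state: State) -> State:
    state["claims"] = [{"text": chunk} for chunk in state["chunks"]]
    return state


def merge(state: State) -> State:
    state["answer"] = " ".join(claim["text"] for claim in state["claims"])
    return state


def evidence_check(state: State) -> State:
    state["supported"] = all(c["text"] in state["corpus"] for c in state["claims"])
    return state


PIPELINE = [
    ("Route", route),
    ("Extract", extract),
    ("Merge", merge),
    ("EvidenceCheck", evidence_check),
]

events: List[Tuple[str, str]] = []
result = run_stages(
    PIPELINE,
    {"query": "cache", "corpus": ["The cache stores claims.", "Tables are parsed."]},
    lambda stage, status: events.append((stage, status)),
)
```

In a server setting the callback would typically push each `(stage, status)` pair onto the progress stream instead of appending to a list.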

Architecture

flowchart LR
    A["Frontend Dashboard"] --> B["FastAPI Server"]
    B --> C["Upload + Corpus APIs"]
    B --> D["PipelineRunner"]
    D --> E["S0 Route"]
    D --> F["S1 Extract"]
    D --> G["S2 Merge"]
    D --> H["S3 EvidenceCheck"]
    C --> I["DocIndex"]
    C --> J["Corpus Files"]
    F --> K["ExtractionCache"]
    D --> L["Tracer + MessageBus"]
    B --> M["SSE Progress Stream"]

Pipeline Overview

Pluto operates in two broad phases:

  1. Document understanding
  2. Query-time extraction and answer synthesis

At query time the main flow is:

  1. S0 Route: picks relevant chunks, applies document scope, and assigns a processing mode.
  2. S1 Extract: extracts structured claims from selected chunks and reuses cached extraction results when possible.
  3. S2 Merge: combines claims into answer sections, open gaps, and key claims.
  4. S3 EvidenceCheck: checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
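To make the token-overlap part of S3 concrete, here is a minimal sketch of how such a check might look. The tokenizer, the function names, and the 0.7 threshold are all assumptions made for illustration; the actual stage may tokenize and score differently, and the optional LLM confirmation call is not shown.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk: str) -> float:
    """Fraction of the claim's tokens that also appear in the chunk."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(chunk)) / len(claim_tokens)


def is_supported(claim: str, chunks: List[str], threshold: float = 0.7) -> bool:
    """A claim counts as supported if any chunk covers enough of its tokens.

    The 0.7 default is an illustrative assumption, not Pluto's setting.
    """
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)
```

Token overlap is cheap and needs no model call, but it cannot tell a paraphrase from a contradiction, which is one reason a second confirmation step is useful for borderline scores.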

Tech Stack

  • Backend: FastAPI, Uvicorn, Pydantic
  • Frontend: custom HTML + CSS + vanilla JavaScript
  • Document parsing: pdfplumber, python-docx
  • Runtime config: python-dotenv
  • Testing: pytest
  • Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime

Repo Layout

mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py

Important directories:

  • mp1/frontend/: dashboard UI
  • mp1/pluto/: backend server, pipeline, stages, routing, caching, tracing
  • mp1/benchmark/: Pluto vs baseline comparison logic
  • mp1/corpus/: local document corpus and generated corpus state
  • mp1/scripts/: utility scripts such as the one-page PDF generator

Quick Start

1. Install dependencies

pip install -r mp1/requirements.txt

2. Create your environment file

Copy mp1/.env.example to mp1/.env and fill in your API keys.

Minimum practical setup:

  • set NVIDIA_API_KEY for the NVIDIA-backed stack
  • or set GROQ_API_KEY for the fallback stack

3. Run the dashboard

python mp1/main.py --serve --port 8000

Open http://127.0.0.1:8000.

4. Optional CLI run

python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output

Environment Variables

Runtime code in the repo references these variables:

  • NVIDIA_API_KEY
  • NVIDIA_API_KEY_NANO
  • NVIDIA_API_KEY_SUPER
  • NVIDIA_API_KEY_VL
  • NVIDIA_API_KEY_EMBED
  • NVIDIA_API_KEY_RERANK
  • NVIDIA_API_KEY_ULTRA
  • GROQ_API_KEY
  • MISTRAL_API_KEY

In practice, the simplest starting point is either:

  • one NVIDIA key through NVIDIA_API_KEY
  • or one Groq key through GROQ_API_KEY
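A quick way to check which stack a given environment would land on is a small helper like the one below. It is a sketch under stated assumptions: `pick_provider` is a hypothetical helper, and the order Groq before Mistral is a guess, since the README only says NVIDIA is preferred and the other two are fallbacks.

```python
import os
from typing import Mapping, Optional

# Assumed priority: NVIDIA first, then the fallbacks. The relative order of
# Groq and Mistral is an assumption for this example.
PROVIDER_KEYS = [
    ("nvidia", "NVIDIA_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
]


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for provider, variable in PROVIDER_KEYS:
        if env.get(variable, "").strip():
            return provider
    return None
```

Calling `pick_provider()` with no argument reads the current process environment; passing a plain dict makes the helper easy to test without touching real keys.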

Useful Endpoints

  • POST /api/run
  • GET /api/stream
  • POST /api/upload
  • GET /api/corpus
  • GET /api/doc-status/{doc_id}
  • POST /api/compare
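For clients that consume GET /api/stream outside the dashboard, the Server-Sent Events wire format is simple enough to parse by hand. The sketch below handles only the `event:` and `data:` fields and comment lines. The event name `stage` and the JSON payload in the sample are assumptions for illustration, not the documented shape of Pluto's stream.

```python
from typing import Dict, Iterable, Iterator, List


def _field_value(line: str, prefix: str) -> str:
    """Strip the field name and, per the SSE format, one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event; a blank line terminates an event."""
    event = "message"  # default event type when no `event:` field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))


# Hypothetical stream content; the real payload shape may differ.
sample = [
    "event: stage\n",
    'data: {"stage": "S0 Route", "status": "started"}\n',
    "\n",
    ": keep-alive\n",
    "data: done\n",
    "\n",
]
parsed = list(parse_sse(sample))
```

The parser accepts any iterable of text lines, so it can be fed from a streaming HTTP response that is decoded line by line.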

Tests

A focused local suite used during development:

pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q

Notes

  • generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through .gitignore
  • mp1/output/ is treated as generated output, not source code
  • corpus metadata such as mp1/corpus/.doc_index.json and mp1/corpus/.extraction_cache.json is runtime state