Spaces: ayushKishor / plutoV2_miniProject_3rd-yr

metadata

title: Pluto Pipeline
emoji: 📄
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false

Pluto: Real Mode-Switching Extraction Pipeline

Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.

The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.

Why Pluto

Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.

Key goals:

query only the relevant parts of a document corpus
switch model behavior by chunk type and task difficulty
keep document processing reusable across multiple questions
surface evidence, agent activity, and confidence to the user
support scoped queries to one selected corpus document or the full corpus

What The App Does

uploads PDF, DOCX/DOC, TXT, and MD files into a local corpus
converts uploaded files to Markdown and chunks them for retrieval
classifies chunks as text, table, figure, code, references, and more
runs a staged pipeline: Route -> Extract -> Merge -> EvidenceCheck
streams live status updates through Server-Sent Events
returns a final answer with sections, evidence, trace, confidence, and gaps
compares Pluto against a simpler single-model baseline in the benchmark panel
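
The chunking and classification steps listed above could look roughly like the following sketch. It is an illustration only: the function names `split_into_chunks` and `classify_chunk` and the surface heuristics are assumptions, not Pluto's actual code, which may use models or richer rules.

```python
import re


def split_into_chunks(markdown: str) -> list[str]:
    """Split Markdown on blank lines, keeping fenced code blocks whole."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            # A blank line outside a fence closes the current chunk.
            if current:
                chunks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def classify_chunk(chunk: str) -> str:
    """Return a coarse chunk type from simple surface cues (hypothetical rules)."""
    lines = chunk.splitlines()
    first = lines[0].strip() if lines else ""
    if first.startswith("```"):
        return "code"
    if len(lines) >= 2 and all(ln.strip().startswith("|") for ln in lines):
        return "table"
    if first.startswith("![") or re.match(r"^(figure|fig\.)\s*\d+", first, re.IGNORECASE):
        return "figure"
    if re.match(r"^\[\d+\]", first):
        return "references"
    return "text"
```

Keeping fenced blocks whole avoids splitting a code listing at its internal blank lines, so the classifier sees the full block.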

Architecture

flowchart LR
    A["Frontend Dashboard"] --> B["FastAPI Server"]
    B --> C["Upload + Corpus APIs"]
    B --> D["PipelineRunner"]
    D --> E["S0 Route"]
    D --> F["S1 Extract"]
    D --> G["S2 Merge"]
    D --> H["S3 EvidenceCheck"]
    C --> I["DocIndex"]
    C --> J["Corpus Files"]
    F --> K["ExtractionCache"]
    D --> L["Tracer + MessageBus"]
    B --> M["SSE Progress Stream"]
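
As a rough illustration of how a runner might drive the four stages in the diagram and report progress, here is a minimal sketch. The `PipelineState` shape, the stage function bodies, and the `on_progress` callback are all placeholders invented for this example; the real `PipelineRunner`, `Tracer`, and `MessageBus` are not shown in this README.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Data handed from one stage to the next (hypothetical shape)."""
    query: str
    chunks: list[str]
    selected: list[str] = field(default_factory=list)
    claims: list[str] = field(default_factory=list)
    answer: str = ""
    supported: bool = False


def route(state: PipelineState) -> PipelineState:
    # Placeholder routing: keep chunks sharing at least one word with the query.
    words = set(state.query.lower().split())
    state.selected = [c for c in state.chunks if words & set(c.lower().split())]
    return state


def extract(state: PipelineState) -> PipelineState:
    # Placeholder extraction: treat each selected chunk as one claim.
    state.claims = list(state.selected)
    return state


def merge(state: PipelineState) -> PipelineState:
    # Placeholder merge: join the claims into a single answer string.
    state.answer = " ".join(state.claims)
    return state


def evidence_check(state: PipelineState) -> PipelineState:
    # Placeholder check: every claim must appear verbatim in a selected chunk.
    state.supported = bool(state.claims) and all(
        any(claim in chunk for chunk in state.selected) for claim in state.claims
    )
    return state


STAGES: list[tuple[str, Callable[[PipelineState], PipelineState]]] = [
    ("S0 Route", route),
    ("S1 Extract", extract),
    ("S2 Merge", merge),
    ("S3 EvidenceCheck", evidence_check),
]


def run_pipeline(state: PipelineState,
                 on_progress: Callable[[str], None] = lambda name: None) -> PipelineState:
    """Run the stages in order, reporting each completed stage name."""
    for name, stage in STAGES:
        state = stage(state)
        on_progress(name)
    return state
```

A callback such as `on_progress` is one simple way to feed a progress stream: the server could forward each stage name to connected clients as it arrives.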

Pipeline Overview

Pluto operates in two broad phases:

1. Document understanding
2. Query-time extraction and answer synthesis

At query time the main flow is:

S0 Route: Picks relevant chunks, applies document scope, and assigns a processing mode.
S1 Extract: Extracts structured claims from selected chunks and reuses cached extraction results when possible.
S2 Merge: Combines claims into answer sections, open gaps, and key claims.
S3 EvidenceCheck: Checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
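
The token-overlap part of the evidence check can be sketched as below. This is a minimal version under stated assumptions: the tokenizer, the function names, and the 0.8 threshold are illustrative choices, not values taken from Pluto, and the optional LLM confirmation call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk: str) -> float:
    """Fraction of the claim's tokens that also occur in the chunk."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(chunk)) / len(claim_tokens)


def is_supported(claim: str, chunks: list[str], threshold: float = 0.8) -> bool:
    """A claim counts as supported if any chunk covers enough of its tokens.

    The threshold is a hypothetical default chosen for this example.
    """
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)
```

Token overlap is cheap but blunt: it cannot tell a negated statement from its original, which is one reason a second confirmation step is useful for borderline scores.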

Tech Stack

Backend: FastAPI, Uvicorn, Pydantic
Frontend: custom HTML + CSS + vanilla JavaScript
Document parsing: pdfplumber, python-docx
Runtime config: python-dotenv
Testing: pytest
Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime

Repo Layout

mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py

Important directories:

mp1/frontend/: dashboard UI
mp1/pluto/: backend server, pipeline, stages, routing, caching, tracing
mp1/benchmark/: Pluto vs baseline comparison logic
mp1/corpus/: local document corpus and generated corpus state
mp1/scripts/: utility scripts such as the one-page PDF generator

Quick Start

1. Install dependencies

pip install -r mp1/requirements.txt

2. Create your environment file

Copy the example file mp1/.env.example to mp1/.env, then fill in your API keys.

Minimum practical setup:

set NVIDIA_API_KEY for the NVIDIA-backed stack
or set GROQ_API_KEY for the fallback stack

3. Run the dashboard

python mp1/main.py --serve --port 8000

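Open http://127.0.0.1:8000 in a browser. The Docker-based Space is configured separately and serves on port 7860 (app_port in the Space metadata).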

4. Optional CLI run

python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
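
For readers who want to see how the flags used in the commands above fit together, here is a small `argparse` sketch. Only the flag names (`--serve`, `--port`, `--query`, `--corpus`, `--output`) come from this README; the defaults, help strings, and the `build_parser` name are assumptions, and the actual mp1/main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the Quick Start commands."""
    parser = argparse.ArgumentParser(description="Pluto pipeline (illustrative CLI sketch)")
    parser.add_argument("--serve", action="store_true",
                        help="start the dashboard server instead of a one-off run")
    parser.add_argument("--port", type=int, default=8000,
                        help="port for the server (default is an assumption)")
    parser.add_argument("--query", help="question to ask in a one-off CLI run")
    parser.add_argument("--corpus", default="mp1/corpus",
                        help="corpus directory (default is an assumption)")
    parser.add_argument("--output", default="mp1/output",
                        help="output directory (default is an assumption)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    mode = "serve" if args.serve else "query"
    print(f"mode={mode} port={args.port} corpus={args.corpus}")
```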

Environment Variables

Runtime code in the repo references these variables:

NVIDIA_API_KEY
NVIDIA_API_KEY_NANO
NVIDIA_API_KEY_SUPER
NVIDIA_API_KEY_VL
NVIDIA_API_KEY_EMBED
NVIDIA_API_KEY_RERANK
NVIDIA_API_KEY_ULTRA
GROQ_API_KEY
MISTRAL_API_KEY

In practice, the simplest starting point is either:

one NVIDIA key through NVIDIA_API_KEY
or one Groq key through GROQ_API_KEY
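
One way to express the "NVIDIA when available, otherwise a fallback" choice is a small lookup over the environment, sketched below. The function name `pick_provider` and the exact fallback order (Groq before Mistral) are assumptions made for this example; the repository's runtime may select providers differently.

```python
import os
from typing import Mapping, Optional

# Provider order is an assumption: NVIDIA first, then the fallback providers.
PROVIDER_KEYS = [
    ("nvidia", "NVIDIA_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
]


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose API key is set and non-blank, else None."""
    for provider, variable in PROVIDER_KEYS:
        if env.get(variable, "").strip():
            return provider
    return None
```

Passing the environment as a mapping keeps the function easy to test, for example `pick_provider({"GROQ_API_KEY": "..."})` without touching the real process environment.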

Useful Endpoints

POST /api/run
GET /api/stream
POST /api/upload
GET /api/corpus
GET /api/doc-status/{doc_id}
POST /api/compare
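
A client reading GET /api/stream has to parse the Server-Sent Events wire format. The sketch below parses SSE lines following the standard `event:` / `data:` field rules; the payloads Pluto actually sends are not documented in this README, so the event names in any usage are hypothetical.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {"event", "data"} dict per Server-Sent Event in the line stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)
```

The function accepts any iterable of text lines, so it can be fed from an HTTP client's streaming response or from a list of strings in a test.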

Tests

Run the focused local suite used during development:

pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q

Notes

generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through .gitignore
mp1/output/ is treated as generated output, not source code
corpus metadata such as mp1/corpus/.doc_index.json and mp1/corpus/.extraction_cache.json is runtime state
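
To illustrate why a file like .extraction_cache.json is runtime state rather than source, here is a minimal sketch of a JSON-backed cache keyed by a hash of the chunk text. The class name `JsonCache`, the `task` component of the key, and the stored value shape are assumptions for this example, not the format Pluto's ExtractionCache actually writes.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


class JsonCache:
    """Tiny JSON-file cache mapping (task, chunk text) to extracted claims."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Load existing entries if the cache file is already on disk.
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    @staticmethod
    def key(chunk_text: str, task: str) -> str:
        # Hash the task and the content so identical chunks map to the same entry.
        return hashlib.sha256(f"{task}\n{chunk_text}".encode("utf-8")).hexdigest()

    def get(self, chunk_text: str, task: str) -> Optional[list]:
        return self.data.get(self.key(chunk_text, task))

    def put(self, chunk_text: str, task: str, claims: list) -> None:
        self.data[self.key(chunk_text, task)] = claims
        # Persist after every write so later runs can reuse the result.
        self.path.write_text(json.dumps(self.data), encoding="utf-8")
```

Because the file is rebuilt from the corpus as questions are asked, deleting it only costs recomputation, which is why it belongs outside version control.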