title: Pluto Pipeline
emoji: π
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
Pluto: Real Mode-Switching Extraction Pipeline
Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.
The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.
Why Pluto
Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.
Key goals:
- query only the relevant parts of a document corpus
- switch model behavior by chunk type and task difficulty
- keep document processing reusable across multiple questions
- surface evidence, agent activity, and confidence to the user
- support scoped queries to one selected corpus document or the full corpus
What The App Does
- uploads PDF, DOCX/DOC, TXT, and MD files into a local corpus
- converts uploaded files to Markdown and chunks them for retrieval
- classifies chunks as text, table, figure, code, references, and more
- runs a staged pipeline: Route -> Extract -> Merge -> EvidenceCheck
- streams live status updates through Server-Sent Events
- returns a final answer with sections, evidence, trace, confidence, and gaps
- compares Pluto against a simpler single-model baseline in the benchmark panel
Architecture
flowchart LR
A["Frontend Dashboard"] --> B["FastAPI Server"]
B --> C["Upload + Corpus APIs"]
B --> D["PipelineRunner"]
D --> E["S0 Route"]
D --> F["S1 Extract"]
D --> G["S2 Merge"]
D --> H["S3 EvidenceCheck"]
C --> I["DocIndex"]
C --> J["Corpus Files"]
F --> K["ExtractionCache"]
D --> L["Tracer + MessageBus"]
B --> M["SSE Progress Stream"]
Pipeline Overview
Pluto operates in two broad phases:
- Document understanding
- Query-time extraction and answer synthesis
At query time the main flow is:
- S0 Route: Picks relevant chunks, applies document scope, and assigns a processing mode.
- S1 Extract: Extracts structured claims from selected chunks and reuses cached extraction results when possible.
- S2 Merge: Combines claims into answer sections, open gaps, and key claims.
- S3 EvidenceCheck: Checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
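The token-overlap part of S3 EvidenceCheck can be illustrated with a minimal sketch. This is not the project's actual implementation: the function names (`tokenize`, `overlap_score`, `is_supported`), the tokenization rule, and the `0.6` threshold are all assumptions chosen for illustration, and the optional LLM confirmation call is left out.

```python
import re
from typing import Iterable


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk_text: str) -> float:
    """Return the fraction of the claim's tokens that also occur in the chunk."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0  # an empty claim cannot be supported
    return len(claim_tokens & tokenize(chunk_text)) / len(claim_tokens)


def is_supported(claim: str, chunks: Iterable[str], threshold: float = 0.6) -> bool:
    """Treat a claim as supported if any chunk reaches the (assumed) threshold."""
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)
```

A claim that falls below the threshold for every retrieved chunk would be a natural candidate for the optional LLM confirmation step or for the gaps list.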
Tech Stack
- Backend: FastAPI, Uvicorn, Pydantic
- Frontend: custom HTML + CSS + vanilla JavaScript
- Document parsing: pdfplumber, python-docx
- Runtime config: python-dotenv
- Testing: pytest
pytest - Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime
Repo Layout
mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py
Important directories:
- mp1/frontend/: dashboard UI
- mp1/pluto/: backend server, pipeline, stages, routing, caching, tracing
- mp1/benchmark/: Pluto vs baseline comparison logic
- mp1/corpus/: local document corpus and generated corpus state
- mp1/scripts/: utility scripts such as the one-page PDF generator
Quick Start
1. Install dependencies
pip install -r mp1/requirements.txt
2. Create your environment file
Use mp1/.env.example as a template and create mp1/.env.
Minimum practical setup:
- set NVIDIA_API_KEY for the NVIDIA-backed stack
- or set GROQ_API_KEY for the fallback stack
3. Run the dashboard
python mp1/main.py --serve --port 8000
Open http://127.0.0.1:8000.
4. Optional CLI run
python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
Environment Variables
Runtime code in the repo references these variables:
- NVIDIA_API_KEY
- NVIDIA_API_KEY_NANO
- NVIDIA_API_KEY_SUPER
- NVIDIA_API_KEY_VL
- NVIDIA_API_KEY_EMBED
- NVIDIA_API_KEY_RERANK
- NVIDIA_API_KEY_ULTRA
- GROQ_API_KEY
- MISTRAL_API_KEY
In practice, the simplest starting point is either:
- one NVIDIA key through NVIDIA_API_KEY
- or one Groq key through GROQ_API_KEY
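As a rough illustration of how a runtime could choose a provider from these variables, the sketch below checks the keys in a fixed order. The function name `pick_provider`, the provider labels, and the priority order (NVIDIA first, then Groq, then Mistral) are assumptions based on the description in this README, not the project's actual selection logic.

```python
import os
from typing import Mapping, Optional

# Assumed priority: NVIDIA when available, then the Groq and Mistral fallbacks.
PROVIDER_KEYS = (
    ("nvidia", "NVIDIA_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
)


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose API key is set and non-blank, else None."""
    env = os.environ if env is None else env
    for name, variable in PROVIDER_KEYS:
        if env.get(variable, "").strip():
            return name
    return None
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to test without touching the real environment.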
Useful Endpoints
- POST /api/run
- GET /api/stream
- POST /api/upload
- GET /api/corpus
- GET /api/doc-status/{doc_id}
- POST /api/compare
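Since progress is streamed as Server-Sent Events, a client reading GET /api/stream needs to split the response into events. The sketch below parses the standard SSE wire format from any iterable of text lines. The helper name `iter_sse_data` is hypothetical, and the shape of Pluto's actual event payloads is not documented here, so the parser only returns the raw `data` strings.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE line stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Comment lines (starting with ":") and other fields are ignored here.
```

Any HTTP client that exposes the response body line by line can feed this generator. An event that is not yet terminated by a blank line when the stream ends is dropped, which matches how the SSE format treats incomplete events.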
Tests
A focused local suite used during development:
pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
Notes
- generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through .gitignore
- mp1/output/ is treated as generated output, not source code
- corpus metadata such as mp1/corpus/.doc_index.json and mp1/corpus/.extraction_cache.json is runtime state