---
title: Pluto Pipeline
emoji: "📄"
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
---

# Pluto: Real Mode-Switching Extraction Pipeline

Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to a single model in one pass, Pluto separates document understanding from query-time reasoning: it routes only the relevant chunks, extracts structured claims, merges them into an answer, and verifies that the answer is supported before returning it.

The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.

## Why Pluto

Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.

Key goals:

- query only the relevant parts of a document corpus
- switch model behavior by chunk type and task difficulty
- keep document processing reusable across multiple questions
- surface evidence, agent activity, and confidence to the user
- support scoped queries to one selected corpus document or the full corpus

## What The App Does

- uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus
- converts uploaded files to Markdown and chunks them for retrieval
- classifies chunks by type, including text, table, figure, code, and references
- runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck`
- streams live status updates through Server-Sent Events
- returns a final answer with sections, evidence, trace, confidence, and gaps
- compares Pluto against a simpler single-model baseline in the benchmark panel

## Architecture

```mermaid
flowchart LR
    A["Frontend Dashboard"] --> B["FastAPI Server"]
    B --> C["Upload + Corpus APIs"]
    B --> D["PipelineRunner"]
    D --> E["S0 Route"]
    D --> F["S1 Extract"]
    D --> G["S2 Merge"]
    D --> H["S3 EvidenceCheck"]
    C --> I["DocIndex"]
    C --> J["Corpus Files"]
    F --> K["ExtractionCache"]
    D --> L["Tracer + MessageBus"]
    B --> M["SSE Progress Stream"]
```

## Pipeline Overview

Pluto operates in two broad phases:

1. Document understanding
2. Query-time extraction and answer synthesis

At query time the main flow is:

1. `S0 Route`
   Picks relevant chunks, applies document scope, and assigns a processing mode.
2. `S1 Extract`
   Extracts structured claims from selected chunks and reuses cached extraction results when possible.
3. `S2 Merge`
   Combines claims into answer sections, open gaps, and key claims.
4. `S3 EvidenceCheck`
   Checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
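
The token-overlap part of `S3 EvidenceCheck` can be illustrated with a minimal sketch. This is not the code in `mp1/pluto/`; the function names, the tokenizer, and the `0.6` threshold are assumptions chosen for illustration, and the optional LLM confirmation call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk_text: str) -> float:
    """Fraction of the claim's tokens that also appear in the chunk."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(chunk_text)) / len(claim_tokens)


def is_supported(claim: str, chunks: list[str], threshold: float = 0.6) -> bool:
    """Treat a claim as supported if any chunk covers enough of its tokens.

    The threshold is a hypothetical default, not a value taken from Pluto.
    """
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)


chunks = ["The model reaches 92 percent accuracy on the test set."]
print(is_supported("The model reaches 92 percent accuracy", chunks))
print(is_supported("Training took three weeks on eight GPUs", chunks))
```

A purely lexical check like this is cheap but misses paraphrases, which is one plausible reason to pair it with an LLM confirmation step for borderline scores.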

## Tech Stack

- Backend: `FastAPI`, `Uvicorn`, `Pydantic`
- Frontend: custom `HTML + CSS + vanilla JavaScript`
- Document parsing: `pdfplumber`, `python-docx`
- Runtime config: `python-dotenv`
- Testing: `pytest`
- Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime

## Repo Layout

```text
mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py
```

Important directories:

- `mp1/frontend/`: dashboard UI
- `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing
- `mp1/benchmark/`: Pluto vs baseline comparison logic
- `mp1/corpus/`: local document corpus and generated corpus state
- `mp1/scripts/`: utility scripts such as the one-page PDF generator

## Quick Start

### 1. Install dependencies

```bash
pip install -r mp1/requirements.txt
```

### 2. Create your environment file

Copy [`mp1/.env.example`](mp1/.env.example) to `mp1/.env` and fill in your keys.

Minimum practical setup:

- set `NVIDIA_API_KEY` for the NVIDIA-backed stack
- or set `GROQ_API_KEY` for the fallback stack

### 3. Run the dashboard

```bash
python mp1/main.py --serve --port 8000
```

Open `http://127.0.0.1:8000`.

### 4. Optional CLI run

```bash
python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
```

## Environment Variables

Runtime code in the repo references these variables:

- `NVIDIA_API_KEY`
- `NVIDIA_API_KEY_NANO`
- `NVIDIA_API_KEY_SUPER`
- `NVIDIA_API_KEY_VL`
- `NVIDIA_API_KEY_EMBED`
- `NVIDIA_API_KEY_RERANK`
- `NVIDIA_API_KEY_ULTRA`
- `GROQ_API_KEY`
- `MISTRAL_API_KEY`

In practice, the simplest starting point is either:

- one NVIDIA key through `NVIDIA_API_KEY`
- or one Groq key through `GROQ_API_KEY`
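
As a rough illustration of how a single key can select a provider stack, the sketch below checks the variables in a fixed order. The function name `pick_provider`, the returned labels, and the ordering of Groq before Mistral are assumptions; the actual runtime may resolve providers differently, including per-model keys such as `NVIDIA_API_KEY_NANO`.

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a provider label based on which API key is set.

    Hypothetical helper: prefers NVIDIA, then falls back to Groq, then Mistral.
    """
    env = os.environ if env is None else env
    if env.get("NVIDIA_API_KEY"):
        return "nvidia"
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("MISTRAL_API_KEY"):
        return "mistral"
    raise RuntimeError(
        "No provider key found: set NVIDIA_API_KEY or GROQ_API_KEY in mp1/.env"
    )


# Passing a dict instead of os.environ keeps the example deterministic.
print(pick_provider({"GROQ_API_KEY": "example-key"}))
```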

## Useful Endpoints

- `POST /api/run`
- `GET /api/stream`
- `POST /api/upload`
- `GET /api/corpus`
- `GET /api/doc-status/{doc_id}`
- `POST /api/compare`
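
`GET /api/stream` delivers progress as Server-Sent Events. The sketch below shows how a client could parse that wire format using only the standard `event:` / `data:` framing. The event name `stage` and the JSON payload in the sample are hypothetical; the README does not document what Pluto actually emits.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            # Comment line, commonly used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Sample stream with made-up event names and payloads.
sample = [
    "event: stage\n",
    'data: {"stage": "S0 Route"}\n',
    "\n",
    ": keep-alive\n",
    "data: done\n",
    "\n",
]
for evt in parse_sse(sample):
    print(evt["event"], evt["data"])
```

In a real client the `lines` argument would come from a streaming HTTP response decoded line by line; the parser itself does not depend on any HTTP library.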

## Tests

To run the focused local suite used during development:

```bash
pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
```

## Notes

- generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore`
- `mp1/output/` is treated as generated output, not source code
- corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state