---
title: Pluto Pipeline
emoji: "π"
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
---
# Pluto: Real Mode-Switching Extraction Pipeline
Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.
The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.
## Why Pluto
Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.
Key goals:
- query only the relevant parts of a document corpus
- switch model behavior by chunk type and task difficulty
- keep document processing reusable across multiple questions
- surface evidence, agent activity, and confidence to the user
- support scoped queries to one selected corpus document or the full corpus
## What The App Does
- uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus
- converts uploaded files to Markdown and chunks them for retrieval
- classifies chunks as text, table, figure, code, references, and more
- runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck`
- streams live status updates through Server-Sent Events
- returns a final answer with sections, evidence, trace, confidence, and gaps
- compares Pluto against a simpler single-model baseline in the benchmark panel
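The live status updates mentioned above use the Server-Sent Events wire format: each message is a block of `field: value` lines terminated by a blank line. The sketch below shows how one progress message could be serialized. The helper names `format_sse` and `stage_events`, the event names, and the payload fields are illustrative assumptions, not Pluto's actual code.

```python
import json
from typing import Iterator


def format_sse(event: str, payload: dict) -> str:
    """Serialize one SSE message (hypothetical helper, not from the repo)."""
    data = json.dumps(payload, separators=(",", ":"))
    # An SSE message ends with a blank line, hence the double newline.
    return f"event: {event}\ndata: {data}\n\n"


def stage_events(stages: list[str]) -> Iterator[str]:
    """Yield one message per stage, then a final 'done' message."""
    for step, name in enumerate(stages, start=1):
        yield format_sse("stage", {"stage": name, "step": step, "total": len(stages)})
    yield format_sse("done", {"ok": True})
```

A generator like this can be wrapped in a streaming response with the `text/event-stream` media type so the dashboard receives each stage update as it happens.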
## Architecture
```mermaid
flowchart LR
A["Frontend Dashboard"] --> B["FastAPI Server"]
B --> C["Upload + Corpus APIs"]
B --> D["PipelineRunner"]
D --> E["S0 Route"]
D --> F["S1 Extract"]
D --> G["S2 Merge"]
D --> H["S3 EvidenceCheck"]
C --> I["DocIndex"]
C --> J["Corpus Files"]
F --> K["ExtractionCache"]
D --> L["Tracer + MessageBus"]
B --> M["SSE Progress Stream"]
```
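As a rough illustration of the diagram, a runner can call the stages in order and report progress around each one. This is a minimal sketch under stated assumptions: `run_pipeline`, the dict-based state, and the `on_progress` callback are hypothetical and do not describe the real `PipelineRunner` interface.

```python
from typing import Callable

# Assumption: each stage takes the current state dict and returns a new one.
Stage = Callable[[dict], dict]


def run_pipeline(
    stages: list[tuple[str, Stage]],
    state: dict,
    on_progress: Callable[[str, str], None],
) -> dict:
    """Run stages sequentially, notifying a listener before and after each."""
    for name, stage in stages:
        on_progress(name, "started")
        state = stage(state)
        on_progress(name, "finished")
    return state
```

Keeping progress reporting in the runner rather than inside each stage means a tracer or an SSE stream can subscribe without the stages knowing about either.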
## Pipeline Overview
Pluto operates in two broad phases:
1. Document understanding
2. Query-time extraction and answer synthesis
At query time the main flow is:
1. `S0 Route`
Picks relevant chunks, applies document scope, and assigns a processing mode.
2. `S1 Extract`
Extracts structured claims from selected chunks and reuses cached extraction results when possible.
3. `S2 Merge`
Combines claims into answer sections, open gaps, and key claims.
4. `S3 EvidenceCheck`
Checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
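The token-overlap part of `S3 EvidenceCheck` can be pictured as the fraction of a claim's tokens that also appear in a retrieved chunk. The sketch below is one simple way to compute that. The function names, the tokenizer, and the `0.6` threshold are assumptions for illustration, and the optional LLM confirmation call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk_text: str) -> float:
    """Fraction of the claim's tokens that occur in the chunk text."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(chunk_text)) / len(claim_tokens)


def is_supported(claim: str, chunks: list[str], threshold: float = 0.6) -> bool:
    """Treat a claim as supported if any chunk meets the (assumed) threshold."""
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)
```

Token overlap is cheap but blind to paraphrase and negation, which is a plausible reason to pair it with a model-based confirmation for borderline claims.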
## Tech Stack
- Backend: `FastAPI`, `Uvicorn`, `Pydantic`
- Frontend: custom `HTML + CSS + vanilla JavaScript`
- Document parsing: `pdfplumber`, `python-docx`
- Runtime config: `python-dotenv`
- Testing: `pytest`
- Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime
## Repo Layout
```text
mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py
```
Important directories:
- `mp1/frontend/`: dashboard UI
- `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing
- `mp1/benchmark/`: Pluto vs baseline comparison logic
- `mp1/corpus/`: local document corpus and generated corpus state
- `mp1/scripts/`: utility scripts such as the one-page PDF generator
## Quick Start
### 1. Install dependencies
```bash
pip install -r mp1/requirements.txt
```
### 2. Create your environment file
Copy the example file [`mp1/.env.example`](mp1/.env.example) to `mp1/.env` and fill in your keys.
Minimum practical setup:
- set `NVIDIA_API_KEY` for the NVIDIA-backed stack
- or set `GROQ_API_KEY` for the fallback stack
### 3. Run the dashboard
```bash
python mp1/main.py --serve --port 8000
```
Open `http://127.0.0.1:8000`.
### 4. Optional CLI run
```bash
python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
```
## Environment Variables
Runtime code in the repo references these variables:
- `NVIDIA_API_KEY`
- `NVIDIA_API_KEY_NANO`
- `NVIDIA_API_KEY_SUPER`
- `NVIDIA_API_KEY_VL`
- `NVIDIA_API_KEY_EMBED`
- `NVIDIA_API_KEY_RERANK`
- `NVIDIA_API_KEY_ULTRA`
- `GROQ_API_KEY`
- `MISTRAL_API_KEY`
In practice, the simplest starting point is either:
- one NVIDIA key through `NVIDIA_API_KEY`
- or one Groq key through `GROQ_API_KEY`
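To make the fallback idea concrete, the sketch below picks the first provider whose key is set. The function `pick_provider` and the exact priority order between Groq and Mistral are assumptions; the runtime's real selection logic may differ.

```python
import os
from typing import Mapping, Optional

# Assumed priority: NVIDIA first, then the fallback providers.
PROVIDER_KEYS = (
    ("nvidia", "NVIDIA_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
)


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider with a non-empty key, or None if none is set."""
    env = os.environ if env is None else env
    for provider, variable in PROVIDER_KEYS:
        if env.get(variable, "").strip():
            return provider
    return None
```

Passing a plain dict instead of `os.environ` makes the selection easy to check in a test without touching the real environment.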
## Useful Endpoints
- `POST /api/run`
- `GET /api/stream`
- `POST /api/upload`
- `GET /api/corpus`
- `GET /api/doc-status/{doc_id}`
- `POST /api/compare`
## Tests
Run the focused local suite used during development:
```bash
pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
```
## Notes
- generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore`
- `mp1/output/` is treated as generated output, not source code
- corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state