---
title: Pluto Pipeline
emoji: "📄"
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
---
# Pluto: Real Mode-Switching Extraction Pipeline
Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.
The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.
## Why Pluto
Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.
Key goals:
- query only the relevant parts of a document corpus
- switch model behavior by chunk type and task difficulty
- keep document processing reusable across multiple questions
- surface evidence, agent activity, and confidence to the user
- support scoped queries to one selected corpus document or the full corpus
## What The App Does
- uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus
- converts uploaded files to Markdown and chunks them for retrieval
- classifies chunks as text, table, figure, code, references, and more
- runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck`
- streams live status updates through Server-Sent Events
- returns a final answer with sections, evidence, trace, confidence, and gaps
- compares Pluto against a simpler single-model baseline in the benchmark panel
## Architecture
```mermaid
flowchart LR
A["Frontend Dashboard"] --> B["FastAPI Server"]
B --> C["Upload + Corpus APIs"]
B --> D["PipelineRunner"]
D --> E["S0 Route"]
D --> F["S1 Extract"]
D --> G["S2 Merge"]
D --> H["S3 EvidenceCheck"]
C --> I["DocIndex"]
C --> J["Corpus Files"]
F --> K["ExtractionCache"]
D --> L["Tracer + MessageBus"]
B --> M["SSE Progress Stream"]
```
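The diagram above can be read as a runner that calls the four stages in order and reports progress around each one. The following is a minimal sketch of that idea, not Pluto's actual `PipelineRunner`: the `PipelineState` fields, the toy stage functions, and the `on_event` callback are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Hypothetical state object handed from one stage to the next."""
    query: str
    corpus: list[str]
    chunks: list[str] = field(default_factory=list)
    claims: list[str] = field(default_factory=list)
    answer: str = ""
    verified: bool = False


Stage = Callable[[PipelineState], PipelineState]


def route(state: PipelineState) -> PipelineState:
    # Toy routing: keep chunks sharing at least one word with the query.
    words = set(state.query.lower().split())
    state.chunks = [c for c in state.corpus if words & set(c.lower().split())]
    return state


def extract(state: PipelineState) -> PipelineState:
    # Toy extraction: treat each routed chunk as a single claim.
    state.claims = list(state.chunks)
    return state


def merge(state: PipelineState) -> PipelineState:
    # Toy merge: concatenate the claims into one answer string.
    state.answer = " ".join(state.claims)
    return state


def evidence_check(state: PipelineState) -> PipelineState:
    # Toy check: every claim must appear verbatim among the routed chunks.
    state.verified = bool(state.claims) and all(
        c in state.chunks for c in state.claims
    )
    return state


STAGES: list[tuple[str, Stage]] = [
    ("S0 Route", route),
    ("S1 Extract", extract),
    ("S2 Merge", merge),
    ("S3 EvidenceCheck", evidence_check),
]


def run_pipeline(
    state: PipelineState, on_event: Callable[[str], None]
) -> PipelineState:
    """Run the stages in order, reporting progress before and after each one."""
    for name, stage in STAGES:
        on_event(f"{name}: started")
        state = stage(state)
        on_event(f"{name}: finished")
    return state
```

Passing `print` as `on_event` logs progress to the console. In a server setting the same callback could push messages onto whatever queue feeds the progress stream.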
## Pipeline Overview
Pluto operates in two broad phases:
1. Document understanding
2. Query-time extraction and answer synthesis
At query time the main flow is:
1. `S0 Route`: picks relevant chunks, applies document scope, and assigns a processing mode.
2. `S1 Extract`: extracts structured claims from selected chunks and reuses cached extraction results when possible.
3. `S2 Merge`: combines claims into answer sections, open gaps, and key claims.
4. `S3 EvidenceCheck`: checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
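To illustrate the token-overlap part of the evidence check, here is a minimal sketch. It assumes a simple set-based overlap ratio and a threshold of `0.6`; both are illustrative choices, and the function names are hypothetical rather than taken from Pluto's code. The optional LLM confirmation call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk: str) -> float:
    """Fraction of the claim's tokens that also occur in the chunk."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(chunk)) / len(claim_tokens)


def is_supported(claim: str, chunks: list[str], threshold: float = 0.6) -> bool:
    """A claim counts as supported if any chunk reaches the threshold.

    The 0.6 default is an assumption for this sketch, not a value from Pluto.
    """
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)
```

A pure overlap score is cheap but cannot tell a claim from its negation, which is one reason a second, model-based confirmation step can be useful for borderline scores.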
## Tech Stack
- Backend: `FastAPI`, `Uvicorn`, `Pydantic`
- Frontend: custom `HTML + CSS + vanilla JavaScript`
- Document parsing: `pdfplumber`, `python-docx`
- Runtime config: `python-dotenv`
- Testing: `pytest`
- Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime
## Repo Layout
```text
mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py
```
Important directories:
- `mp1/frontend/`: dashboard UI
- `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing
- `mp1/benchmark/`: Pluto vs baseline comparison logic
- `mp1/corpus/`: local document corpus and generated corpus state
- `mp1/scripts/`: utility scripts such as the one-page PDF generator
## Quick Start
### 1. Install dependencies
```bash
pip install -r mp1/requirements.txt
```
### 2. Create your environment file
Copy the example file [`mp1/.env.example`](mp1/.env.example) to `mp1/.env` and fill in your keys.
Minimum practical setup:
- set `NVIDIA_API_KEY` for the NVIDIA-backed stack
- or set `GROQ_API_KEY` for the fallback stack
### 3. Run the dashboard
```bash
python mp1/main.py --serve --port 8000
```
Open `http://127.0.0.1:8000`.
### 4. Optional CLI run
```bash
python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
```
## Environment Variables
Runtime code in the repo references these variables:
- `NVIDIA_API_KEY`
- `NVIDIA_API_KEY_NANO`
- `NVIDIA_API_KEY_SUPER`
- `NVIDIA_API_KEY_VL`
- `NVIDIA_API_KEY_EMBED`
- `NVIDIA_API_KEY_RERANK`
- `NVIDIA_API_KEY_ULTRA`
- `GROQ_API_KEY`
- `MISTRAL_API_KEY`
In practice, the simplest starting point is either:
- one NVIDIA key through `NVIDIA_API_KEY`
- or one Groq key through `GROQ_API_KEY`
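As a rough illustration of key-based provider fallback, the sketch below picks the first provider whose key is set. The `pick_provider` helper and the exact order between Groq and Mistral are assumptions for this example. The README only states that NVIDIA is preferred, with Groq and Mistral as fallbacks.

```python
import os
from typing import Mapping, Optional

# Checked in order. NVIDIA first follows the README; the relative order of
# Groq and Mistral is an assumption made for this sketch.
PROVIDER_KEYS = [
    ("nvidia", "NVIDIA_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
]


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose key is set and non-empty, else None."""
    for provider, var in PROVIDER_KEYS:
        if env.get(var, "").strip():
            return provider
    return None
```

Calling `pick_provider()` with no argument reads the current process environment, so it reflects whatever was loaded from `mp1/.env` before the call.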
## Useful Endpoints
- `POST /api/run`
- `GET /api/stream`
- `POST /api/upload`
- `GET /api/corpus`
- `GET /api/doc-status/{doc_id}`
- `POST /api/compare`
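A client of `GET /api/stream` receives Server-Sent Events, which use a line-based text format. The sketch below parses such lines into events using only the standard library. The event names and JSON payload in the usage example are assumptions, since this README does not document the stream's payload.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per event with its `event` name and joined `data`."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format allows one optional space
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)
```

For example, feeding it the lines `event: stage`, `data: {"stage": "S0 Route"}`, and a blank line yields a single event named `stage` whose `data` is that JSON string (a hypothetical payload).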
## Tests
A focused local suite used during development:
```bash
pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
```
## Notes
- generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore`
- `mp1/output/` is treated as generated output, not source code
- corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state