	---
	title: Pluto Pipeline
	emoji: "📄"
	colorFrom: gray
	colorTo: yellow
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# Pluto: Real Mode-Switching Extraction Pipeline

	Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model in a single pass, Pluto separates document understanding from query-time reasoning: it routes only the relevant chunks to extraction, extracts structured claims from them, merges those claims into an answer, and checks that the answer is supported by the retrieved text before returning it.

	The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.

	## Why Pluto

	Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.

	Key goals:

	- query only the relevant parts of a document corpus
	- switch model behavior by chunk type and task difficulty
	- keep document processing reusable across multiple questions
	- surface evidence, agent activity, and confidence to the user
	- support scoped queries to one selected corpus document or the full corpus

	## What The App Does

	- uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus
	- converts uploaded files to Markdown and chunks them for retrieval
	- classifies chunks as text, table, figure, code, references, and more
	- runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck`
	- streams live status updates through Server-Sent Events
	- returns a final answer with sections, evidence, trace, confidence, and gaps
	- compares Pluto against a simpler single-model baseline in the benchmark panel
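The upload path above converts files to Markdown, chunks them, and labels each chunk by type. Below is a minimal sketch of that step, assuming blank-line chunking and simple regex heuristics. The function names `chunk_markdown` and `classify`, and every pattern here, are illustrative and are not taken from Pluto's source.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # Markdown code fence, built this way so it never closes an outer fence

# Illustrative heuristics only; Pluto's real classification rules may differ.
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$")
FIGURE_LINE = re.compile(r"^\s*(!\[|fig(ure)?\.?\s*\d)", re.IGNORECASE)
REFERENCES_HEADING = re.compile(r"^#+\s*(references|bibliography)\b", re.IGNORECASE)


def classify(block: str) -> str:
    """Label one Markdown block as code, references, table, figure, or text."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    if not lines:
        return "text"
    if lines[0].lstrip().startswith(FENCE):
        return "code"
    if REFERENCES_HEADING.match(lines[0]):
        return "references"
    if sum(1 for ln in lines if TABLE_ROW.match(ln)) >= 2:
        return "table"
    if FIGURE_LINE.match(lines[0]):
        return "figure"
    return "text"


def chunk_markdown(markdown: str) -> List[Tuple[str, str]]:
    """Split on blank lines (never inside a fenced code block) and classify each block."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            # A blank line outside a fence ends the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return [(classify(b), b) for b in blocks]
```

A production chunker would more likely split by section and cap chunk size; the point here is only that a type label is attached to every chunk so later stages can route on it.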

	## Architecture

	```mermaid
	flowchart LR
	A["Frontend Dashboard"] --> B["FastAPI Server"]
	B --> C["Upload + Corpus APIs"]
	B --> D["PipelineRunner"]
	D --> E["S0 Route"]
	D --> F["S1 Extract"]
	D --> G["S2 Merge"]
	D --> H["S3 EvidenceCheck"]
	C --> I["DocIndex"]
	C --> J["Corpus Files"]
	F --> K["ExtractionCache"]
	D --> L["Tracer + MessageBus"]
	B --> M["SSE Progress Stream"]
	```
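The diagram shows `PipelineRunner` driving the four stages and reporting progress through the tracer and the SSE stream. Below is a hypothetical skeleton of that control flow. `run_pipeline`, `RunState`, the `MODES` table, and the keyword-overlap routing are assumptions. The extract, merge, and check stages are placeholders marking where real model calls would go.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Chunk:
    doc_id: str
    kind: str  # e.g. "text", "table", "figure", "code"
    text: str


@dataclass
class RunState:
    query: str
    scope: Optional[str] = None  # one doc_id, or None for the whole corpus
    routed: List[Tuple[Chunk, str]] = field(default_factory=list)  # (chunk, mode)
    claims: List[str] = field(default_factory=list)
    answer: str = ""
    supported: List[str] = field(default_factory=list)


# Hypothetical mapping from chunk type to processing mode.
MODES = {"table": "structured", "code": "structured", "figure": "vision"}


def route(state: RunState, corpus: List[Chunk]) -> None:
    """S0: keep in-scope chunks that share a word with the query, and pick a mode."""
    query_words = set(state.query.lower().split())
    for chunk in corpus:
        if state.scope is not None and chunk.doc_id != state.scope:
            continue
        if query_words & set(chunk.text.lower().split()):
            state.routed.append((chunk, MODES.get(chunk.kind, "text")))


def extract(state: RunState, corpus: List[Chunk]) -> None:
    """S1 placeholder: a real stage would call a model chosen by each chunk's mode."""
    state.claims = [chunk.text for chunk, _mode in state.routed]


def merge(state: RunState, corpus: List[Chunk]) -> None:
    """S2 placeholder: join claims into a single answer string."""
    state.answer = " ".join(state.claims)


def evidence_check(state: RunState, corpus: List[Chunk]) -> None:
    """S3 placeholder: keep claims that appear verbatim in a routed chunk."""
    texts = [chunk.text for chunk, _mode in state.routed]
    state.supported = [c for c in state.claims if any(c in t for t in texts)]


Stage = Callable[[RunState, List[Chunk]], None]
STAGES: List[Tuple[str, Stage]] = [
    ("S0 Route", route),
    ("S1 Extract", extract),
    ("S2 Merge", merge),
    ("S3 EvidenceCheck", evidence_check),
]


def run_pipeline(
    query: str,
    corpus: List[Chunk],
    scope: Optional[str] = None,
    on_progress: Callable[[str, str], None] = lambda stage, status: None,
) -> RunState:
    """Run each stage in order, reporting (stage, status) before and after it."""
    state = RunState(query=query, scope=scope)
    for name, stage in STAGES:
        on_progress(name, "started")
        stage(state, corpus)
        on_progress(name, "done")
    return state
```

Passing an `on_progress` callback keeps the stage loop independent of the transport. A server could forward each `(stage, status)` pair to its SSE stream, and a CLI could simply print it.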

	## Pipeline Overview

	Pluto operates in two broad phases:

	1. Document understanding
	2. Query-time extraction and answer synthesis

	At query time the main flow is:

	1. `S0 Route`: picks relevant chunks, applies the document scope, and assigns a processing mode.
	2. `S1 Extract`: extracts structured claims from the selected chunks, reusing cached extraction results when possible.
	3. `S2 Merge`: combines claims into answer sections, open gaps, and key claims.
	4. `S3 EvidenceCheck`: checks whether each synthesized claim appears in the retrieved chunk text, using token overlap and an optional LLM confirmation call.
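The evidence check combines token overlap with an optional LLM confirmation. Below is a minimal sketch of how those two signals could fit together. The stopword list, the 0.6 threshold, the verdict strings, and the `confirm` callback (which stands in for the LLM call) are all assumptions.

```python
import re
from typing import Callable, Iterable, Optional, Set, Tuple

TOKEN = re.compile(r"[a-z0-9]+")
# Small illustrative stopword list; a real system would tune this.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "are", "for", "on", "with"}


def content_tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {t for t in TOKEN.findall(text.lower()) if t not in STOPWORDS}


def overlap(claim: str, chunk: str) -> float:
    """Fraction of the claim's content tokens that also occur in the chunk."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & content_tokens(chunk)) / len(claim_tokens)


def check_claim(
    claim: str,
    chunks: Iterable[str],
    threshold: float = 0.6,
    confirm: Optional[Callable[[str, str], bool]] = None,
) -> Tuple[str, float]:
    """Return (verdict, best overlap score) for one synthesized claim."""
    best_score, best_chunk = max(((overlap(claim, c), c) for c in chunks), default=(0.0, ""))
    if best_score >= threshold:
        return "supported", best_score
    # Only claims below the overlap threshold trigger the (hypothetical) LLM call.
    if confirm is not None and best_chunk and confirm(claim, best_chunk):
        return "confirmed_by_llm", best_score
    return "unsupported", best_score
```

The overlap check runs first because it is cheap. Only claims below the threshold reach the optional confirmation call, which limits how many extra model calls verification adds per answer.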

	## Tech Stack

	- Backend: `FastAPI`, `Uvicorn`, `Pydantic`
	- Frontend: custom `HTML + CSS + vanilla JavaScript`
	- Document parsing: `pdfplumber`, `python-docx`
	- Runtime config: `python-dotenv`
	- Testing: `pytest`
	- Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime

	## Repo Layout

	```text
	mini-project_3rd_yr-main/
	├─ Dockerfile
	├─ README.md
	├─ pytest.ini
	├─ hf_space/
	└─ mp1/
	   ├─ main.py
	   ├─ requirements.txt
	   ├─ frontend/
	   ├─ pluto/
	   ├─ benchmark/
	   ├─ scripts/
	   ├─ corpus/
	   └─ test_*.py
	```

	Important directories:

	- `mp1/frontend/`: dashboard UI
	- `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing
	- `mp1/benchmark/`: Pluto vs baseline comparison logic
	- `mp1/corpus/`: local document corpus and generated corpus state
	- `mp1/scripts/`: utility scripts such as the one-page PDF generator

	## Quick Start

	### 1. Install dependencies

	```bash
	pip install -r mp1/requirements.txt
	```

	### 2. Create your environment file

	Copy [`mp1/.env.example`](mp1/.env.example) to `mp1/.env` and fill in your API keys.

	At minimum, set one of:

	- `NVIDIA_API_KEY` for the NVIDIA-backed stack
	- `GROQ_API_KEY` for the fallback stack

	### 3. Run the dashboard

	```bash
	python mp1/main.py --serve --port 8000
	```

	Open `http://127.0.0.1:8000`.

	### 4. Optional CLI run

	```bash
	python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
	```

	## Environment Variables

	Runtime code in the repo references these variables:

	- `NVIDIA_API_KEY`
	- `NVIDIA_API_KEY_NANO`
	- `NVIDIA_API_KEY_SUPER`
	- `NVIDIA_API_KEY_VL`
	- `NVIDIA_API_KEY_EMBED`
	- `NVIDIA_API_KEY_RERANK`
	- `NVIDIA_API_KEY_ULTRA`
	- `GROQ_API_KEY`
	- `MISTRAL_API_KEY`

	In practice, the simplest starting point is either:

	- one NVIDIA key through `NVIDIA_API_KEY`
	- or one Groq key through `GROQ_API_KEY`
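Below is a small sketch of how a runtime might choose among these keys. Two behaviors are assumptions inferred from the list above, not confirmed behavior: the precedence (NVIDIA, then Groq, then Mistral) and the fallback from role-specific keys such as `NVIDIA_API_KEY_NANO` to `NVIDIA_API_KEY`. `pick_provider` and `nvidia_key` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose key is set (hypothetical precedence)."""
    if env.get("NVIDIA_API_KEY"):
        return "nvidia"
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("MISTRAL_API_KEY"):
        return "mistral"
    return None  # no usable key: the caller should fail with a clear message


def nvidia_key(role: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look up e.g. NVIDIA_API_KEY_NANO, falling back to the shared NVIDIA_API_KEY."""
    return env.get(f"NVIDIA_API_KEY_{role.upper()}") or env.get("NVIDIA_API_KEY")
```

Taking the environment as a parameter keeps the helpers easy to test without touching the real process environment.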

	## Useful Endpoints

	- `POST /api/run`
	- `GET /api/stream`
	- `POST /api/upload`
	- `GET /api/corpus`
	- `GET /api/doc-status/{doc_id}`
	- `POST /api/compare`
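`GET /api/stream` delivers live pipeline progress as Server-Sent Events. The sketch below parses the standard SSE wire format using only the standard library: `event:` and `data:` fields, events terminated by a blank line, and lines starting with `:` ignored. The default URL assumes the dashboard is running on port 8000 as in Quick Start. The event names and payload shape are not documented here, so the hypothetical `follow_stream` helper decodes each `data` field as JSON when it parses and prints it as plain text otherwise.

```python
import json
import urllib.request
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {"event", "data"} dict per SSE event; other fields are ignored."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


def follow_stream(url: str = "http://127.0.0.1:8000/api/stream") -> None:
    """Print events from a running server (payload format is an assumption)."""
    with urllib.request.urlopen(url) as resp:
        for ev in parse_sse(b.decode("utf-8") for b in resp):
            try:
                print(ev["event"], json.loads(ev["data"]))
            except json.JSONDecodeError:
                print(ev["event"], ev["data"])
```

Keeping the parser separate from the network call makes it testable with plain lists of strings. In a browser, the dashboard would typically use `EventSource`, which implements the same format.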

	## Tests

	A focused local suite used during development:

	```bash
	pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
	```

	## Notes

	- generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore`
	- `mp1/output/` is treated as generated output, not source code
	- corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state
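Because `.extraction_cache.json` is regenerable runtime state, a cache of that kind can be keyed on chunk content and safely deleted at any time. The sketch below shows one hypothetical shape for such a file. `JsonExtractionCache` is illustrative and is not the repo's `ExtractionCache`. Keying on (chunk text, mode) is an assumption: if extraction also depends on the query, the query would need to be part of the key.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class JsonExtractionCache:
    """Hypothetical JSON-backed cache mapping (chunk text, mode) to extracted claims."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.data: Dict[str, Any] = {}
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))

    @staticmethod
    def key(chunk_text: str, mode: str) -> str:
        # Hash the content so identical chunks hit the cache across runs.
        return hashlib.sha256(f"{mode}\x00{chunk_text}".encode("utf-8")).hexdigest()

    def get(self, chunk_text: str, mode: str) -> Optional[Any]:
        return self.data.get(self.key(chunk_text, mode))

    def put(self, chunk_text: str, mode: str, claims: Any) -> None:
        self.data[self.key(chunk_text, mode)] = claims
        # Write to a temp file in the same directory, then swap it in with
        # os.replace so an interrupted write never leaves truncated JSON behind.
        fd, tmp_path = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)
```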