---
title: Pluto Pipeline
emoji: "📄"
colorFrom: gray
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
---

# Pluto: Real Mode-Switching Extraction Pipeline

Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.

The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.

## Why Pluto

Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.

Key goals:

- query only the relevant parts of a document corpus
- switch model behavior by chunk type and task difficulty
- keep document processing reusable across multiple questions
- surface evidence, agent activity, and confidence to the user
- support scoped queries to one selected corpus document or the full corpus

## What The App Does

- uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus
- converts uploaded files to Markdown and chunks them for retrieval
- classifies chunks as text, table, figure, code, references, and more
- runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck`
- streams live status updates through Server-Sent Events
- returns a final answer with sections, evidence, trace, confidence, and gaps
- compares Pluto against a simpler single-model baseline in the benchmark panel

## Architecture

```mermaid
flowchart LR
    A["Frontend Dashboard"] --> B["FastAPI Server"]
    B --> C["Upload + Corpus APIs"]
    B --> D["PipelineRunner"]
    D --> E["S0 Route"]
    D --> F["S1 Extract"]
    D --> G["S2 Merge"]
    D --> H["S3 EvidenceCheck"]
    C --> I["DocIndex"]
    C --> J["Corpus Files"]
    F --> K["ExtractionCache"]
    D --> L["Tracer + MessageBus"]
    B --> M["SSE Progress Stream"]
```

## Pipeline Overview

Pluto operates in two broad phases:

1. Document understanding
2. Query-time extraction and answer synthesis

At query time the main flow is:

1. `S0 Route`
   Picks relevant chunks, applies document scope, and assigns a processing mode.
2. `S1 Extract`
   Extracts structured claims from selected chunks and reuses cached extraction results when possible.
3. `S2 Merge`
   Combines claims into answer sections, open gaps, and key claims.
4. `S3 EvidenceCheck`
   Checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.

## Tech Stack

- Backend: `FastAPI`, `Uvicorn`, `Pydantic`
- Frontend: custom `HTML + CSS + vanilla JavaScript`
- Document parsing: `pdfplumber`, `python-docx`
- Runtime config: `python-dotenv`
- Testing: `pytest`
- Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime

## Repo Layout

```text
mini-project_3rd_yr-main/
├─ Dockerfile
├─ README.md
├─ pytest.ini
├─ hf_space/
└─ mp1/
   ├─ main.py
   ├─ requirements.txt
   ├─ frontend/
   ├─ pluto/
   ├─ benchmark/
   ├─ scripts/
   ├─ corpus/
   └─ test_*.py
```

Important directories:

- `mp1/frontend/`: dashboard UI
- `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing
- `mp1/benchmark/`: Pluto vs baseline comparison logic
- `mp1/corpus/`: local document corpus and generated corpus state
- `mp1/scripts/`: utility scripts such as the one-page PDF generator

## Quick Start

### 1. Install dependencies

```bash
pip install -r mp1/requirements.txt
```

### 2. Create your environment file

Use the example file in [`mp1/.env.example`](mp1/.env.example) and create `mp1/.env`.

Minimum practical setup:

- set `NVIDIA_API_KEY` for the NVIDIA-backed stack
- or set `GROQ_API_KEY` for the fallback stack

### 3. Run the dashboard

```bash
python mp1/main.py --serve --port 8000
```

Open `http://127.0.0.1:8000`.

### 4. Optional CLI run

```bash
python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
```

## Environment Variables

Runtime code in the repo references these variables:

- `NVIDIA_API_KEY`
- `NVIDIA_API_KEY_NANO`
- `NVIDIA_API_KEY_SUPER`
- `NVIDIA_API_KEY_VL`
- `NVIDIA_API_KEY_EMBED`
- `NVIDIA_API_KEY_RERANK`
- `NVIDIA_API_KEY_ULTRA`
- `GROQ_API_KEY`
- `MISTRAL_API_KEY`

In practice, the simplest starting point is either:

- one NVIDIA key through `NVIDIA_API_KEY`
- or one Groq key through `GROQ_API_KEY`

## Useful Endpoints

- `POST /api/run`
- `GET /api/stream`
- `POST /api/upload`
- `GET /api/corpus`
- `GET /api/doc-status/{doc_id}`
- `POST /api/compare`

## Tests

A focused local suite used during development:

```bash
pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
```

## Notes

- generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore`
- `mp1/output/` is treated as generated output, not source code
- corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state
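
## Evidence Check Sketch

The `S3 EvidenceCheck` stage is described above as checking claims against retrieved chunk text using token overlap. The snippet below is a minimal sketch of that idea only, not Pluto's actual implementation: the function names `tokenize`, `overlap_score`, and `is_supported`, the tokenization rule, and the `0.6` threshold are all assumptions made for illustration, and the optional LLM confirmation call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk_text: str) -> float:
    """Fraction of the claim's distinct tokens that also occur in the chunk."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    shared = claim_tokens & tokenize(chunk_text)
    return len(shared) / len(claim_tokens)


def is_supported(claim: str, chunks: list[str], threshold: float = 0.6) -> bool:
    """Treat a claim as supported if any chunk meets the overlap threshold.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)


if __name__ == "__main__":
    chunks = ["The model uses cached extraction results when possible."]
    print(is_supported("The model uses cached results", chunks))
    print(is_supported("Accuracy improved by 12 percent", chunks))
```

A pure token-overlap score is cheap but ignores word order and paraphrase, which is presumably why the pipeline pairs it with an optional LLM confirmation step for borderline claims.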