| --- |
| title: Pluto Pipeline |
| emoji: "π" |
| colorFrom: gray |
| colorTo: yellow |
| sdk: docker |
| app_port: 7860 |
| pinned: false |
| --- |
| |
| # Pluto: Real Mode-Switching Extraction Pipeline |
|
|
| Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result. |
|
|
| The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view. |
|
|
| ## Why Pluto |
|
|
| Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA. |
|
|
| Key goals: |
|
|
| - query only the relevant parts of a document corpus |
| - switch model behavior by chunk type and task difficulty |
| - keep document processing reusable across multiple questions |
| - surface evidence, agent activity, and confidence to the user |
| - support scoped queries to one selected corpus document or the full corpus |
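The routing-related goals above (relevant chunks only, mode switching by chunk type, and document scoping) can be illustrated with a small sketch. This is not the routing code from `mp1/pluto/`; the `Chunk` dataclass, the mode names (`fast`, `structured`, `vision`), and the keyword-overlap scoring are all assumptions chosen to keep the example self-contained.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    doc_id: str
    chunk_type: str  # e.g. "text", "table", "figure", "code", "references"
    text: str


# Hypothetical mapping from chunk type to processing mode (names are assumptions).
MODE_BY_TYPE = {"table": "structured", "code": "structured", "figure": "vision"}


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route(query: str, chunks: List[Chunk], doc_id: Optional[str] = None,
          top_k: int = 3) -> List[Tuple[Chunk, str]]:
    """Return up to top_k relevant chunks, each paired with a processing mode."""
    query_tokens = tokenize(query)
    # Scope: keep one document's chunks, or the full corpus when doc_id is None.
    scoped = [c for c in chunks if doc_id is None or c.doc_id == doc_id]
    # Relevance: number of query tokens that also appear in the chunk.
    scored = [(len(query_tokens & tokenize(c.text)), c) for c in scoped]
    relevant = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    # Mode switching: the chunk type decides the mode, defaulting to "fast".
    return [(c, MODE_BY_TYPE.get(c.chunk_type, "fast")) for _, c in relevant[:top_k]]
```

Passing `doc_id` restricts routing to a single corpus document; leaving it as `None` searches every chunk. A real router would likely use embeddings or a reranker rather than token overlap.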
|
|
| ## What The App Does |
|
|
| - uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus |
| - converts uploaded files to Markdown and chunks them for retrieval |
| - classifies chunks as text, table, figure, code, references, and more |
| - runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck` |
| - streams live status updates through Server-Sent Events |
| - returns a final answer with sections, evidence, trace, confidence, and gaps |
| - compares Pluto against a simpler single-model baseline in the benchmark panel |
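To make the chunk classification step more concrete, here is a heuristic sketch that labels a Markdown chunk by its surface features. It is an assumption about how such a classifier could look, not Pluto's actual implementation; the function name `classify_chunk` and the specific rules are hypothetical, and only a subset of the chunk types is covered.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker


def _is_table_separator(line: str) -> bool:
    """True for Markdown table separator rows such as '| --- | --- |'."""
    return "|" in line and "-" in line and set(line) <= set("|:- ")


def classify_chunk(markdown: str) -> str:
    """Label a Markdown chunk as code, table, figure, references, or text."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    if not lines:
        return "text"
    if any(ln.startswith(FENCE) for ln in lines):
        return "code"
    if any(_is_table_separator(ln) for ln in lines):
        return "table"
    # Image syntax or a caption such as "Figure 2:" suggests a figure chunk.
    if any(ln.startswith("![") or re.match(r"(figure|fig\.)\s*\d+", ln, re.IGNORECASE)
           for ln in lines):
        return "figure"
    # A "References" heading, or mostly "[n] ..." entries, suggests a bibliography.
    heading = lines[0].lstrip("#").strip().lower()
    numbered = sum(1 for ln in lines if re.match(r"\[\d+\]", ln))
    if heading in {"references", "bibliography"} or numbered * 2 > len(lines):
        return "references"
    return "text"
```

The order of the checks matters: a fenced block that happens to contain a table row is still treated as code.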
|
|
| ## Architecture |
|
|
| ```mermaid |
| flowchart LR |
| A["Frontend Dashboard"] --> B["FastAPI Server"] |
| B --> C["Upload + Corpus APIs"] |
| B --> D["PipelineRunner"] |
| D --> E["S0 Route"] |
| D --> F["S1 Extract"] |
| D --> G["S2 Merge"] |
| D --> H["S3 EvidenceCheck"] |
| C --> I["DocIndex"] |
| C --> J["Corpus Files"] |
| F --> K["ExtractionCache"] |
| D --> L["Tracer + MessageBus"] |
| B --> M["SSE Progress Stream"] |
| ``` |
|
|
| ## Pipeline Overview |
|
|
| Pluto operates in two broad phases: |
|
|
| 1. Document understanding |
| 2. Query-time extraction and answer synthesis |
|
|
| At query time the main flow is: |
|
|
| 1. `S0 Route`: picks relevant chunks, applies document scope, and assigns a processing mode. |
| 2. `S1 Extract`: extracts structured claims from selected chunks and reuses cached extraction results when possible. |
| 3. `S2 Merge`: combines claims into answer sections, open gaps, and key claims. |
| 4. `S3 EvidenceCheck`: checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call. |
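The token-overlap part of the evidence check can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list, the `0.6` threshold, the result dictionary, and the `confirm` callback (standing in for the optional LLM confirmation call) are all hypothetical and not taken from the Pluto source.

```python
import re
from typing import Callable, List, Optional

# Small illustrative stopword list (an assumption, not Pluto's).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is",
             "are", "was", "were", "for", "with", "by", "that", "this"}


def content_tokens(text: str) -> set:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def support_score(claim: str, chunk_text: str) -> float:
    """Fraction of the claim's content tokens that appear in the chunk."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & content_tokens(chunk_text)) / len(claim_tokens)


def check_evidence(claim: str, chunks: List[str], threshold: float = 0.6,
                   confirm: Optional[Callable[[str, List[str]], bool]] = None) -> dict:
    """Mark a claim as supported if some chunk covers enough of its tokens."""
    best = max((support_score(claim, c) for c in chunks), default=0.0)
    supported = best >= threshold
    # Optional second opinion, e.g. an LLM call, only for claims that pass overlap.
    if supported and confirm is not None:
        supported = confirm(claim, chunks)
    return {"claim": claim, "score": round(best, 2), "supported": supported}
```

Token overlap is cheap but shallow: it cannot detect negation or paraphrase, which is one reason an optional confirmation step is useful.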
|
|
| ## Tech Stack |
|
|
| - Backend: `FastAPI`, `Uvicorn`, `Pydantic` |
| - Frontend: custom `HTML + CSS + vanilla JavaScript` |
| - Document parsing: `pdfplumber`, `python-docx` |
| - Runtime config: `python-dotenv` |
| - Testing: `pytest` |
| - Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime |
|
|
| ## Repo Layout |
|
|
| ```text |
| mini-project_3rd_yr-main/ |
| ├── Dockerfile |
| ├── README.md |
| ├── pytest.ini |
| ├── hf_space/ |
| └── mp1/ |
|     ├── main.py |
|     ├── requirements.txt |
|     ├── frontend/ |
|     ├── pluto/ |
|     ├── benchmark/ |
|     ├── scripts/ |
|     ├── corpus/ |
|     └── test_*.py |
| ``` |
|
|
| Important directories: |
|
|
| - `mp1/frontend/`: dashboard UI |
| - `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing |
| - `mp1/benchmark/`: Pluto vs baseline comparison logic |
| - `mp1/corpus/`: local document corpus and generated corpus state |
| - `mp1/scripts/`: utility scripts such as the one-page PDF generator |
|
|
| ## Quick Start |
|
|
| ### 1. Install dependencies |
|
|
| ```bash |
| pip install -r mp1/requirements.txt |
| ``` |
|
|
| ### 2. Create your environment file |
|
|
| Copy [`mp1/.env.example`](mp1/.env.example) to `mp1/.env`, then fill in your API keys. |
|
|
| Minimum practical setup: |
|
|
| - set `NVIDIA_API_KEY` for the NVIDIA-backed stack |
| - or set `GROQ_API_KEY` for the fallback stack |
|
|
| ### 3. Run the dashboard |
|
|
| ```bash |
| python mp1/main.py --serve --port 8000 |
| ``` |
|
|
| Open `http://127.0.0.1:8000`. |
|
|
| ### 4. Optional CLI run |
|
|
| ```bash |
| python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output |
| ``` |
|
|
| ## Environment Variables |
|
|
| Runtime code in the repo references these variables: |
|
|
| - `NVIDIA_API_KEY` |
| - `NVIDIA_API_KEY_NANO` |
| - `NVIDIA_API_KEY_SUPER` |
| - `NVIDIA_API_KEY_VL` |
| - `NVIDIA_API_KEY_EMBED` |
| - `NVIDIA_API_KEY_RERANK` |
| - `NVIDIA_API_KEY_ULTRA` |
| - `GROQ_API_KEY` |
| - `MISTRAL_API_KEY` |
|
|
| In practice, the simplest starting point is either: |
|
|
| - one NVIDIA key through `NVIDIA_API_KEY` |
| - or one Groq key through `GROQ_API_KEY` |
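The fallback idea can be sketched as a small provider-selection helper. The variable names come from the list above, but the function `select_provider`, the provider labels, and the ordering of Groq before Mistral are assumptions for illustration; the actual runtime may resolve providers differently, for example per model tier.

```python
import os
from typing import Mapping, Optional

# Preference order: NVIDIA first, then the fallback providers.
# The relative order of Groq and Mistral is an assumption.
PROVIDER_KEYS = (
    ("nvidia", "NVIDIA_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
)


def select_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider whose API key is set and non-blank."""
    env = os.environ if env is None else env
    for provider, variable in PROVIDER_KEYS:
        if env.get(variable, "").strip():
            return provider
    raise RuntimeError("No provider key found; set NVIDIA_API_KEY or GROQ_API_KEY.")
```

If the keys live in `mp1/.env`, calling `load_dotenv("mp1/.env")` from `python-dotenv` before `select_provider()` would populate `os.environ` first.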
|
|
| ## Useful Endpoints |
|
|
| - `POST /api/run` |
| - `GET /api/stream` |
| - `POST /api/upload` |
| - `GET /api/corpus` |
| - `GET /api/doc-status/{doc_id}` |
| - `POST /api/compare` |
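Because progress updates are delivered as Server-Sent Events, a client of `GET /api/stream` needs to split the response into events. The sketch below parses the standard SSE wire format (`data:` lines terminated by a blank line) from any iterable of text lines. The JSON payload fields in the usage example (`stage`, `status`) are hypothetical; the real event schema is defined by the server code.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    Minimal parser: handles `data:` fields and comment lines, and ignores
    other fields such as `event:`, `id:`, and `retry:`.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded, as in the SSE spec.


if __name__ == "__main__":
    sample = [
        ": keep-alive\n",
        'data: {"stage": "S0 Route", "status": "running"}\n',
        "\n",
    ]
    for payload in iter_sse_data(sample):
        print(json.loads(payload))
```

The same function works on a live response if the HTTP client exposes a line iterator over the decoded body.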
|
|
| ## Tests |
|
|
| Run the focused local suite used during development: |
|
|
| ```bash |
| pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q |
| ``` |
|
|
| ## Notes |
|
|
| - generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore` |
| - `mp1/output/` is treated as generated output, not source code |
| - corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state |
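To show why a file such as `.extraction_cache.json` counts as runtime state, here is a sketch of a JSON-backed extraction cache. It is an assumption about the general technique, not the format Pluto actually writes: the class name `ExtractionCacheSketch`, the SHA-256 key over chunk text plus query, and the `extract` callback are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


class ExtractionCacheSketch:
    """Persist extraction results in a JSON file, keyed by chunk text and query."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Load earlier results if the cache file already exists.
        if self.path.exists():
            self._data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self._data = {}

    @staticmethod
    def key(chunk_text: str, query: str) -> str:
        """Stable key: changes whenever the chunk content or the query changes."""
        digest = hashlib.sha256()
        digest.update(chunk_text.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
        digest.update(query.encode("utf-8"))
        return digest.hexdigest()

    def get_or_compute(self, chunk_text: str, query: str,
                       extract: Callable[[str, str], Any]) -> Any:
        """Return a cached result, or call extract() once and store its result."""
        cache_key = self.key(chunk_text, query)
        if cache_key not in self._data:
            self._data[cache_key] = extract(chunk_text, query)
            self.path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")
        return self._data[cache_key]
```

Since the file is derived entirely from corpus content and past queries, it can be deleted at any time at the cost of recomputing extractions, which is why it does not belong in version control.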
|
|