ayushKishor committed on
Commit
23cdeed
·
1 Parent(s): 691e458

Add Pluto memory layer and pipeline fixes

.dockerignore ADDED
@@ -0,0 +1,16 @@
1
+ __pycache__/
2
+ *.pyc
3
+ *.pyo
4
+ *.pyd
5
+ .venv/
6
+ env/
7
+ .env
8
+ .git/
9
+ .gitignore
10
+ .pytest_cache/
11
+ debug.txt
12
+ output_log.txt
13
+ verify_dump.txt
14
+ mp1/debug.txt
15
+ mp1/output_log.txt
16
+ mp1/verify_dump.txt
.gitignore ADDED
@@ -0,0 +1,46 @@
1
+ # Secrets and local environments
2
+ .env
3
+ .env.*
4
+ !.env.example
5
+ !.env.sample
6
+ .venv/
7
+ venv/
8
+ env/
9
+ ENV/
10
+
11
+ # Python bytecode and tool caches
12
+ __pycache__/
13
+ *.py[cod]
14
+ *.pyo
15
+ .pytest_cache/
16
+ .mypy_cache/
17
+ .ruff_cache/
18
+ .coverage
19
+ htmlcov/
20
+
21
+ # Editor and OS noise
22
+ .DS_Store
23
+ Thumbs.db
24
+ .idea/
25
+ .vscode/
26
+
27
+ # Runtime logs and debug dumps
28
+ *.log
29
+ *.out
30
+ *.err
31
+ debug.txt
32
+ output_log.txt
33
+ verify_dump.txt
34
+ mp1/server_log*.txt
35
+ mp1/server_ui_*.log
36
+ mp1/test_out.json
37
+ mp1/test_out.txt
38
+
39
+ # Generated runtime data
40
+ mp1/output/**
41
+ mp1/tmp/
42
+ mp1/tmp*/
43
+ mp1/pytest-cache-files-*/
44
+ mp1/corpus/.doc_index.json
45
+ mp1/corpus/.extraction_cache.json
46
+ mp1/nvidia_models.json
README.md CHANGED
@@ -1,11 +1,182 @@
1
  ---
2
- title: PlutoV2 MiniProject 3rd-yr
3
- emoji: 📉
4
- colorFrom: green
5
- colorTo: gray
6
  sdk: docker
 
7
  pinned: false
8
- short_description: pluto_v2
9
  ---
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Pluto Pipeline
3
+ emoji: "📄"
4
+ colorFrom: gray
5
+ colorTo: yellow
6
  sdk: docker
7
+ app_port: 7860
8
  pinned: false
 
9
  ---
10
 
11
+ # Pluto: Real Mode-Switching Extraction Pipeline
12
+
13
+ Pluto is a document question-answering system built for research and technical documents. Instead of sending an entire paper to one model and hoping for the best, Pluto separates document understanding from query-time reasoning, routes only relevant chunks, extracts structured claims, merges them into an answer, and verifies support before returning the result.
14
+
15
+ The project includes a FastAPI backend, a one-page dashboard, scoped corpus selection, live pipeline progress streaming, evidence-backed answers, confidence reporting, trace summaries, and a baseline comparison view.
16
+
17
+ ## Why Pluto
18
+
19
+ Traditional one-shot PDF chat often struggles with long documents, tables, figures, and answer traceability. Pluto is designed to make that workflow more inspectable and more efficient for project-scale document QA.
20
+
21
+ Key goals:
22
+
23
+ - query only the relevant parts of a document corpus
24
+ - switch model behavior by chunk type and task difficulty
25
+ - keep document processing reusable across multiple questions
26
+ - surface evidence, agent activity, and confidence to the user
27
+ - support scoped queries to one selected corpus document or the full corpus
28
+
29
+ ## What The App Does
30
+
31
+ - uploads `PDF`, `DOCX/DOC`, `TXT`, and `MD` files into a local corpus
32
+ - converts uploaded files to Markdown and chunks them for retrieval
33
+ - classifies chunks as text, table, figure, code, references, and more
34
+ - runs a staged pipeline: `Route -> Extract -> Merge -> EvidenceCheck`
35
+ - streams live status updates through Server-Sent Events
36
+ - returns a final answer with sections, evidence, trace, confidence, and gaps
37
+ - compares Pluto against a simpler single-model baseline in the benchmark panel
38
+
39
+ ## Architecture
40
+
41
+ ```mermaid
42
+ flowchart LR
43
+ A["Frontend Dashboard"] --> B["FastAPI Server"]
44
+ B --> C["Upload + Corpus APIs"]
45
+ B --> D["PipelineRunner"]
46
+ D --> E["S0 Route"]
47
+ D --> F["S1 Extract"]
48
+ D --> G["S2 Merge"]
49
+ D --> H["S3 EvidenceCheck"]
50
+ C --> I["DocIndex"]
51
+ C --> J["Corpus Files"]
52
+ F --> K["ExtractionCache"]
53
+ D --> L["Tracer + MessageBus"]
54
+ B --> M["SSE Progress Stream"]
55
+ ```
56
+
57
+ ## Pipeline Overview
58
+
59
+ Pluto operates in two broad phases:
60
+
61
+ 1. Document understanding
62
+ 2. Query-time extraction and answer synthesis
63
+
64
+ At query time the main flow is:
65
+
66
+ 1. `S0 Route`
67
+ Picks relevant chunks, applies document scope, and assigns a processing mode.
68
+ 2. `S1 Extract`
69
+ Extracts structured claims from selected chunks and reuses cached extraction results when possible.
70
+ 3. `S2 Merge`
71
+ Combines claims into answer sections, open gaps, and key claims.
72
+ 4. `S3 EvidenceCheck`
73
+ Checks whether synthesized claims are present in retrieved chunk text using token overlap and an optional LLM confirmation call.
74
+
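The token-overlap part of `S3 EvidenceCheck` described above can be pictured roughly as follows. This is a minimal sketch, not Pluto's actual implementation: the names `tokenize`, `overlap_score`, and `is_supported`, the tokenization rule, and the `0.6` threshold are all assumptions made for illustration, and the optional LLM confirmation call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, chunk_text: str) -> float:
    """Fraction of the claim's tokens that also occur in the chunk."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(chunk_text)) / len(claim_tokens)


def is_supported(claim: str, chunks: list[str], threshold: float = 0.6) -> bool:
    """Treat a claim as supported if any chunk covers enough of its tokens.

    The threshold is a hypothetical value, not one taken from the repo.
    """
    return any(overlap_score(claim, chunk) >= threshold for chunk in chunks)
```

A claim that falls below the threshold for every retrieved chunk would be a natural candidate for the reported gaps, or for the optional LLM confirmation step.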
75
+ ## Tech Stack
76
+
77
+ - Backend: `FastAPI`, `Uvicorn`, `Pydantic`
78
+ - Frontend: custom `HTML + CSS + vanilla JavaScript`
79
+ - Document parsing: `pdfplumber`, `python-docx`
80
+ - Runtime config: `python-dotenv`
81
+ - Testing: `pytest`
82
+ - Providers: NVIDIA-hosted models when available, with Groq and Mistral fallback paths in the runtime
83
+
84
+ ## Repo Layout
85
+
86
+ ```text
87
+ mini-project_3rd_yr-main/
88
+ ├─ Dockerfile
89
+ ├─ README.md
90
+ ├─ pytest.ini
91
+ ├─ hf_space/
92
+ └─ mp1/
93
+ ├─ main.py
94
+ ├─ requirements.txt
95
+ ├─ frontend/
96
+ ├─ pluto/
97
+ ├─ benchmark/
98
+ ├─ scripts/
99
+ ├─ corpus/
100
+ └─ test_*.py
101
+ ```
102
+
103
+ Important directories:
104
+
105
+ - `mp1/frontend/`: dashboard UI
106
+ - `mp1/pluto/`: backend server, pipeline, stages, routing, caching, tracing
107
+ - `mp1/benchmark/`: Pluto vs baseline comparison logic
108
+ - `mp1/corpus/`: local document corpus and generated corpus state
109
+ - `mp1/scripts/`: utility scripts such as the one-page PDF generator
110
+
111
+ ## Quick Start
112
+
113
+ ### 1. Install dependencies
114
+
115
+ ```bash
116
+ pip install -r mp1/requirements.txt
117
+ ```
118
+
119
+ ### 2. Create your environment file
120
+
121
+ Use the example file in [`mp1/.env.example`](mp1/.env.example) and create `mp1/.env`.
122
+
123
+ Minimum practical setup:
124
+
125
+ - set `NVIDIA_API_KEY` for the NVIDIA-backed stack
126
+ - or set `GROQ_API_KEY` for the fallback stack
127
+
128
+ ### 3. Run the dashboard
129
+
130
+ ```bash
131
+ python mp1/main.py --serve --port 8000
132
+ ```
133
+
134
+ Open `http://127.0.0.1:8000`.
135
+
136
+ ### 4. Optional CLI run
137
+
138
+ ```bash
139
+ python mp1/main.py --query "What is this paper about?" --corpus mp1/corpus --output mp1/output
140
+ ```
141
+
142
+ ## Environment Variables
143
+
144
+ Runtime code in the repo references these variables:
145
+
146
+ - `NVIDIA_API_KEY`
147
+ - `NVIDIA_API_KEY_NANO`
148
+ - `NVIDIA_API_KEY_SUPER`
149
+ - `NVIDIA_API_KEY_VL`
150
+ - `NVIDIA_API_KEY_EMBED`
151
+ - `NVIDIA_API_KEY_RERANK`
152
+ - `NVIDIA_API_KEY_ULTRA`
153
+ - `GROQ_API_KEY`
154
+ - `MISTRAL_API_KEY`
155
+
156
+ In practice, the simplest starting point is either:
157
+
158
+ - one NVIDIA key through `NVIDIA_API_KEY`
159
+ - or one Groq key through `GROQ_API_KEY`
160
+
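One way to read the variable list above is that each role-specific NVIDIA key falls back to the shared `NVIDIA_API_KEY` when it is unset. The sketch below illustrates that idea only; the helper name `resolve_nvidia_key` and the exact lookup order are assumptions, and the real runtime may resolve keys differently.

```python
import os
from typing import Mapping, Optional


def resolve_nvidia_key(role: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the role-specific key if set, else the shared NVIDIA_API_KEY.

    `role` is a suffix such as "nano", "super", "vl", "embed", "rerank", or "ultra".
    This fallback order is an assumption for illustration.
    """
    env = os.environ if env is None else env
    specific = env.get(f"NVIDIA_API_KEY_{role.upper()}")
    # Empty strings (as in a blank .env.example entry) count as unset.
    return specific or env.get("NVIDIA_API_KEY") or None
```

With only `NVIDIA_API_KEY` set, every role resolves to the same key, which matches the "one NVIDIA key" starting point described above.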
161
+ ## Useful Endpoints
162
+
163
+ - `POST /api/run`
164
+ - `GET /api/stream`
165
+ - `POST /api/upload`
166
+ - `GET /api/corpus`
167
+ - `GET /api/doc-status/{doc_id}`
168
+ - `POST /api/compare`
169
+
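As a rough illustration of calling `POST /api/run` from a script, the sketch below builds the request with the standard library only. The payload field names (`query`, `session_id`, `selected_doc_ids`, `detail_level`) follow what the dashboard sends in this commit; the helper name `build_run_request` is hypothetical, and the accepted `detail_level` values are not listed here, so the caller has to supply one.

```python
import json
import uuid
from urllib import request


def build_run_request(base_url, query, detail_level, selected_doc_ids=None, session_id=None):
    """Build (but do not send) a JSON POST request for /api/run."""
    payload = {
        "query": query,
        # The dashboard generates a fresh id per run; a UUID is used here too.
        "session_id": session_id or str(uuid.uuid4()),
        "selected_doc_ids": list(selected_doc_ids or []),
        "detail_level": detail_level,
    }
    return request.Request(
        url=f"{base_url.rstrip('/')}/api/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the server from the Quick Start is running, the request can be sent with `urllib.request.urlopen(req)` and the JSON body decoded from the response.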
170
+ ## Tests
171
+
172
+ A focused local suite used during development:
173
+
174
+ ```bash
175
+ pytest mp1/test_server.py mp1/test_route.py mp1/test_merge.py mp1/test_verify.py mp1/test_doc_index.py -q
176
+ ```
177
+
178
+ ## Notes
179
+
180
+ - generated runtime artifacts, logs, temp folders, local caches, and secret files are intentionally excluded through `.gitignore`
181
+ - `mp1/output/` is treated as generated output, not source code
182
+ - corpus metadata such as `mp1/corpus/.doc_index.json` and `mp1/corpus/.extraction_cache.json` is runtime state
app.py CHANGED
Binary files a/app.py and b/app.py differ
 
mp1/.env DELETED
@@ -1,15 +0,0 @@
1
- # NVIDIA NIM Multi-model Keys
2
- NVIDIA_API_KEY_NANO=<redacted>
3
- NVIDIA_API_KEY_SUPER=<redacted>
4
- NVIDIA_API_KEY_VL=<redacted>
5
- NVIDIA_API_KEY_EMBED=<redacted>
6
- NVIDIA_API_KEY_RERANK=<redacted>
7
- NVIDIA_API_KEY_ULTRA=<redacted>
8
-
9
- # Global fallback (defaults to Super if specific not found)
10
- NVIDIA_API_KEY=<redacted>
11
-
12
- # Keep Groq as fallback
13
- GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxx
14
- MISTRAL_API_KEY=...
15
- GOOGLE_API_KEY=<redacted>
mp1/.env.example CHANGED
@@ -15,3 +15,5 @@ NVIDIA_API_KEY_ULTRA=
15
 
16
  GROQ_API_KEY=
17
  MISTRAL_API_KEY=
 
 
 
15
 
16
  GROQ_API_KEY=
17
  MISTRAL_API_KEY=
18
+
19
+ DATABASE_URL=postgresql://user:password@localhost:5432/pluto
mp1/benchmark/compare.py CHANGED
@@ -30,7 +30,7 @@ def _normalize_detail_level(detail_level: str | None) -> str:
30
  class SimpleRunner:
31
  """
32
  Single-model baseline: one LLM call over top keyword-matched chunks.
33
- No routing, no extraction schema, no verification.
34
  """
35
 
36
  def __init__(self, corpus_dir: str, doc_index=None):
@@ -145,7 +145,7 @@ class ComparisonRunner:
145
  selected_doc_ids=selected_doc_ids,
146
  detail_level=detail_level,
147
  ),
148
- verified=True,
149
  )
150
  baseline_metrics = self._run_side(
151
  "Baseline",
@@ -154,7 +154,7 @@ class ComparisonRunner:
154
  selected_doc_ids=selected_doc_ids,
155
  detail_level=detail_level,
156
  ),
157
- verified=False,
158
  )
159
 
160
  winner = "Unavailable"
@@ -174,7 +174,7 @@ class ComparisonRunner:
174
  "winner": winner,
175
  }
176
 
177
- def _run_side(self, label: str, runner, verified: bool) -> dict:
178
  start_time = time.time()
179
  try:
180
  result = runner()
@@ -183,10 +183,10 @@ class ComparisonRunner:
183
  "confidence": round(result.confidence, 2),
184
  "evidence_count": len(result.evidence),
185
  "chunks_processed": result.trace_summary.chunks_processed,
186
- "verified": verified,
187
  "answer_preview": (result.final_answer.response or "")[:300],
188
  "models_used": result.trace_summary.models_used,
189
- "real_switching": result.trace_summary.real_switching if verified else False,
190
  "error": None,
191
  }
192
  except Exception as exc:
@@ -195,7 +195,7 @@ class ComparisonRunner:
195
  "confidence": 0.0,
196
  "evidence_count": 0,
197
  "chunks_processed": 0,
198
- "verified": verified,
199
  "answer_preview": f"{label} failed: {exc}"[:300],
200
  "models_used": [],
201
  "real_switching": False,
 
30
  class SimpleRunner:
31
  """
32
  Single-model baseline: one LLM call over top keyword-matched chunks.
33
+ No routing, no extraction schema, no evidence check.
34
  """
35
 
36
  def __init__(self, corpus_dir: str, doc_index=None):
 
145
  selected_doc_ids=selected_doc_ids,
146
  detail_level=detail_level,
147
  ),
148
+ evidence_checked=True,
149
  )
150
  baseline_metrics = self._run_side(
151
  "Baseline",
 
154
  selected_doc_ids=selected_doc_ids,
155
  detail_level=detail_level,
156
  ),
157
+ evidence_checked=False,
158
  )
159
 
160
  winner = "Unavailable"
 
174
  "winner": winner,
175
  }
176
 
177
+ def _run_side(self, label: str, runner, evidence_checked: bool) -> dict:
178
  start_time = time.time()
179
  try:
180
  result = runner()
 
183
  "confidence": round(result.confidence, 2),
184
  "evidence_count": len(result.evidence),
185
  "chunks_processed": result.trace_summary.chunks_processed,
186
+ "evidence_checked": evidence_checked,
187
  "answer_preview": (result.final_answer.response or "")[:300],
188
  "models_used": result.trace_summary.models_used,
189
+ "real_switching": result.trace_summary.real_switching if evidence_checked else False,
190
  "error": None,
191
  }
192
  except Exception as exc:
 
195
  "confidence": 0.0,
196
  "evidence_count": 0,
197
  "chunks_processed": 0,
198
+ "evidence_checked": evidence_checked,
199
  "answer_preview": f"{label} failed: {exc}"[:300],
200
  "models_used": [],
201
  "real_switching": False,
mp1/corpus/.doc_index.json DELETED
The diff for this file is too large to render. See raw diff
 
mp1/corpus/.extraction_cache.json DELETED
The diff for this file is too large to render. See raw diff
 
mp1/frontend/app.js CHANGED
@@ -6,7 +6,7 @@
6
  detailLevel: 'pluto.detailLevel',
7
  };
8
 
9
- const stages = ['route', 'extract', 'merge', 'verify'];
10
  const stageEls = {};
11
  const statusEls = {};
12
  const connectors = document.querySelectorAll('.stage-rail__connector');
@@ -37,6 +37,10 @@
37
  let uploadProcessingActive = false;
38
  let pipelineRunning = false;
39
  let activeEventSource = null;
 
 
 
 
40
  let latestCorpusDocs = [];
41
  let pendingCorpusDocIds = [];
42
  let selectedDocIds = loadStoredDocIds();
@@ -155,28 +159,36 @@
155
  }
156
 
157
  pipelineRunning = true;
 
 
 
158
  syncControls();
159
  runBtn.innerHTML = '<span class="spinner"></span> Running...';
160
  resetUI();
161
 
162
  try {
163
- await listenSSE();
164
  const response = await fetch('/api/run', {
165
  method: 'POST',
166
  headers: { 'Content-Type': 'application/json' },
167
- body: JSON.stringify(buildQueryPayload(query)),
168
  });
169
  const data = await parseJsonResponse(response, 'Server returned an invalid response');
 
170
  if (!response.ok || data.error) {
171
  throw new Error(data.error || `Server error: ${response.status}`);
172
  }
173
  renderResult(data);
 
 
 
174
  } catch (error) {
175
  answerBody.innerHTML = renderErrorCard('Pipeline Error', error.message);
176
  console.error(error);
177
  } finally {
178
  closeActiveStream();
179
  pipelineRunning = false;
 
180
  runBtn.innerHTML = '<span class="btn-icon">&#9654;</span> Run Pipeline';
181
  syncControls();
182
  }
@@ -214,19 +226,24 @@
214
  }
215
  }
216
 
217
- function buildQueryPayload(query) {
218
  return {
219
  query,
 
 
 
 
 
220
  selected_doc_ids: [...selectedDocIds],
221
  detail_level: detailLevel,
222
  };
223
  }
224
 
225
- function listenSSE() {
226
  closeActiveStream();
227
 
228
  return new Promise((resolve, reject) => {
229
- const eventSource = new EventSource('/api/stream');
230
  let opened = false;
231
  activeEventSource = eventSource;
232
 
@@ -334,8 +351,8 @@
334
  info = `done (${data.extractions} facts)`;
335
  } else if (stage === 'merge' && data.key_claims) {
336
  info = `done (${data.key_claims} claims)`;
337
- } else if (stage === 'verify' && data.checked) {
338
- info = `done (${data.checked} verified)`;
339
  }
340
 
341
  statusEls[stage].innerHTML = `<span class="status-dot status-dot--complete"></span>${esc(info)}`;
@@ -374,9 +391,9 @@
374
  const gaps = Array.isArray(data.missing_info) ? data.missing_info : [];
375
  const nextActions = Array.isArray(data.next_actions) ? data.next_actions : [];
376
  if (gaps.length) {
377
- const gapTitle = nextActions.length ? 'Verification / Coverage Gaps Found' : 'Coverage Gaps Noted';
378
  const gapIntro = nextActions.length
379
- ? 'Some answer points could not be fully verified from the extracted evidence.'
380
  : 'The detailed answer asked for coverage beyond what the document clearly supports in the selected scope.';
381
  const gapPrefix = nextActions.length ? 'Need support:' : 'Not clearly covered:';
382
  html += `
@@ -530,8 +547,8 @@
530
  <div class="${yesNoClass(stats.real_switching)}">${stats.real_switching ? 'Yes' : 'No'}</div>
531
  </div>
532
  <div class="bench-stat">
533
- <span class="bench-stat__label">Verified Claims</span>
534
- <div class="${yesNoClass(stats.verified)}">${stats.verified ? 'Enabled' : 'Disabled'}</div>
535
  </div>
536
  <div class="bench-stat">
537
  <span class="bench-stat__label">Evidence Count</span>
@@ -808,6 +825,13 @@
808
  });
809
  }
810
 
 
 
 
 
 
 
 
811
  function toggleDocSelection(docId) {
812
  if (!docId) {
813
  return;
 
6
  detailLevel: 'pluto.detailLevel',
7
  };
8
 
9
+ const stages = ['route', 'extract', 'merge', 'evidence_check'];
10
  const stageEls = {};
11
  const statusEls = {};
12
  const connectors = document.querySelectorAll('.stage-rail__connector');
 
37
  let uploadProcessingActive = false;
38
  let pipelineRunning = false;
39
  let activeEventSource = null;
40
+ let activeSessionId = null;
41
+ let previousQuery = '';
42
+ let previousQueryTimestamp = null;
43
+ let previousSessionId = null;
44
  let latestCorpusDocs = [];
45
  let pendingCorpusDocIds = [];
46
  let selectedDocIds = loadStoredDocIds();
 
159
  }
160
 
161
  pipelineRunning = true;
162
+ const sessionId = createSessionId();
163
+ const queryTimestamp = Date.now();
164
+ activeSessionId = sessionId;
165
  syncControls();
166
  runBtn.innerHTML = '<span class="spinner"></span> Running...';
167
  resetUI();
168
 
169
  try {
170
+ await listenSSE(sessionId);
171
  const response = await fetch('/api/run', {
172
  method: 'POST',
173
  headers: { 'Content-Type': 'application/json' },
174
+ body: JSON.stringify(buildQueryPayload(query, sessionId, queryTimestamp)),
175
  });
176
  const data = await parseJsonResponse(response, 'Server returned an invalid response');
177
+ activeSessionId = data.session_id || sessionId;
178
  if (!response.ok || data.error) {
179
  throw new Error(data.error || `Server error: ${response.status}`);
180
  }
181
  renderResult(data);
182
+ previousQuery = query;
183
+ previousQueryTimestamp = queryTimestamp;
184
+ previousSessionId = data.session_id || sessionId;
185
  } catch (error) {
186
  answerBody.innerHTML = renderErrorCard('Pipeline Error', error.message);
187
  console.error(error);
188
  } finally {
189
  closeActiveStream();
190
  pipelineRunning = false;
191
+ activeSessionId = null;
192
  runBtn.innerHTML = '<span class="btn-icon">&#9654;</span> Run Pipeline';
193
  syncControls();
194
  }
 
226
  }
227
  }
228
 
229
+ function buildQueryPayload(query, sessionId = activeSessionId, queryTimestamp = Date.now()) {
230
  return {
231
  query,
232
+ session_id: sessionId,
233
+ query_timestamp: queryTimestamp,
234
+ prev_query: previousQuery,
235
+ prev_query_timestamp: previousQueryTimestamp,
236
+ prev_session_id: previousSessionId,
237
  selected_doc_ids: [...selectedDocIds],
238
  detail_level: detailLevel,
239
  };
240
  }
241
 
242
+ function listenSSE(sessionId) {
243
  closeActiveStream();
244
 
245
  return new Promise((resolve, reject) => {
246
+ const eventSource = new EventSource(`/api/stream?session_id=${encodeURIComponent(sessionId)}`);
247
  let opened = false;
248
  activeEventSource = eventSource;
249
 
 
351
  info = `done (${data.extractions} facts)`;
352
  } else if (stage === 'merge' && data.key_claims) {
353
  info = `done (${data.key_claims} claims)`;
354
+ } else if (stage === 'evidence_check' && data.checked) {
355
+ info = `done (${data.checked} checked)`;
356
  }
357
 
358
  statusEls[stage].innerHTML = `<span class="status-dot status-dot--complete"></span>${esc(info)}`;
 
391
  const gaps = Array.isArray(data.missing_info) ? data.missing_info : [];
392
  const nextActions = Array.isArray(data.next_actions) ? data.next_actions : [];
393
  if (gaps.length) {
394
+ const gapTitle = nextActions.length ? 'Evidence Check / Coverage Gaps Found' : 'Coverage Gaps Noted';
395
  const gapIntro = nextActions.length
396
+ ? 'Some answer points could not be fully supported from the extracted evidence.'
397
  : 'The detailed answer asked for coverage beyond what the document clearly supports in the selected scope.';
398
  const gapPrefix = nextActions.length ? 'Need support:' : 'Not clearly covered:';
399
  html += `
 
547
  <div class="${yesNoClass(stats.real_switching)}">${stats.real_switching ? 'Yes' : 'No'}</div>
548
  </div>
549
  <div class="bench-stat">
550
+ <span class="bench-stat__label">Evidence Check</span>
551
+ <div class="${yesNoClass(stats.evidence_checked)}">${stats.evidence_checked ? 'Enabled' : 'Disabled'}</div>
552
  </div>
553
  <div class="bench-stat">
554
  <span class="bench-stat__label">Evidence Count</span>
 
825
  });
826
  }
827
 
828
+ function createSessionId() {
829
+ if (window.crypto && typeof window.crypto.randomUUID === 'function') {
830
+ return window.crypto.randomUUID();
831
+ }
832
+ return `session-${Date.now()}-${Math.random().toString(16).slice(2)}`;
833
+ }
834
+
835
  function toggleDocSelection(docId) {
836
  if (!docId) {
837
  return;
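The frontend change above tags both the SSE stream URL and the `/api/run` payload with the same `session_id`. How the server uses that id is not shown in this diff. One plausible arrangement, sketched here purely as an assumption, is an in-memory router that keeps a separate event queue per session so that concurrent runs do not see each other's progress events. The class name `SessionEventBus` and its methods are hypothetical.

```python
import queue
from collections import defaultdict


class SessionEventBus:
    """Hypothetical in-memory router: one event queue per session id."""

    def __init__(self) -> None:
        self._queues = defaultdict(queue.Queue)

    def publish(self, session_id: str, event: dict) -> None:
        """Queue an event for the stream that belongs to this session."""
        self._queues[session_id].put(event)

    def drain(self, session_id: str) -> list:
        """Return and remove all pending events for one session."""
        pending = self._queues[session_id]
        events = []
        while True:
            try:
                events.append(pending.get_nowait())
            except queue.Empty:
                return events
```

In such a design, the stream handler for a given `session_id` would only ever drain its own queue, which is the isolation the per-run id appears intended to provide.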
mp1/frontend/index.html CHANGED
@@ -110,10 +110,10 @@
110
  <div class="stage-card__status" id="status-merge">idle</div>
111
  </div>
112
  <div class="stage-rail__connector"></div>
113
- <div class="stage-card" data-stage="verify" id="stage-verify">
114
  <div class="stage-card__number">S3</div>
115
- <div class="stage-card__label">VERIFY</div>
116
- <div class="stage-card__status" id="status-verify">idle</div>
117
  </div>
118
  </div>
119
  </section>
@@ -182,7 +182,7 @@
182
  </main>
183
 
184
  <footer class="footer">
185
- <span>Pluto v2 Pipeline | Deterministic routing | Real model switching | Evidence verification</span>
186
  </footer>
187
 
188
  <script src="/static/app.js?v=5"></script>
 
110
  <div class="stage-card__status" id="status-merge">idle</div>
111
  </div>
112
  <div class="stage-rail__connector"></div>
113
+ <div class="stage-card" data-stage="evidence_check" id="stage-evidence_check">
114
  <div class="stage-card__number">S3</div>
115
+ <div class="stage-card__label">EVIDENCE CHECK</div>
116
+ <div class="stage-card__status" id="status-evidence_check">idle</div>
117
  </div>
118
  </div>
119
  </section>
 
182
  </main>
183
 
184
  <footer class="footer">
185
+ <span>Pluto v2 Pipeline | Deterministic routing | Real model switching | Evidence checking</span>
186
  </footer>
187
 
188
  <script src="/static/app.js?v=5"></script>
mp1/main.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  main.py — CLI entry point for the Pluto pipeline.
3
 
@@ -90,7 +91,7 @@ def _start_server(port: int):
90
 
91
 
92
  def _stage_num(stage: str) -> int:
93
- return {"route": 0, "extract": 1, "merge": 2, "verify": 3, "finish": 4}.get(stage, -1)
94
 
95
 
96
  if __name__ == "__main__":
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  main.py — CLI entry point for the Pluto pipeline.
4
 
 
91
 
92
 
93
  def _stage_num(stage: str) -> int:
94
+ return {"route": 0, "extract": 1, "merge": 2, "evidence_check": 3, "finish": 4}.get(stage, -1)
95
 
96
 
97
  if __name__ == "__main__":
mp1/nvidia_models.json DELETED
@@ -1,1133 +0,0 @@
1
- {
2
- "object": "list",
3
- "data": [
4
- {
5
- "id": "01-ai/yi-large",
6
- "object": "model",
7
- "created": 735790403,
8
- "owned_by": "01-ai"
9
- },
10
- {
11
- "id": "abacusai/dracarys-llama-3.1-70b-instruct",
12
- "object": "model",
13
- "created": 735790403,
14
- "owned_by": "abacusai"
15
- },
16
- {
17
- "id": "adept/fuyu-8b",
18
- "object": "model",
19
- "created": 735790403,
20
- "owned_by": "adept"
21
- },
22
- {
23
- "id": "ai21labs/jamba-1.5-large-instruct",
24
- "object": "model",
25
- "created": 735790403,
26
- "owned_by": "ai21labs"
27
- },
28
- {
29
- "id": "ai21labs/jamba-1.5-mini-instruct",
30
- "object": "model",
31
- "created": 735790403,
32
- "owned_by": "ai21labs"
33
- },
34
- {
35
- "id": "aisingapore/sea-lion-7b-instruct",
36
- "object": "model",
37
- "created": 735790403,
38
- "owned_by": "aisingapore"
39
- },
40
- {
41
- "id": "baai/bge-m3",
42
- "object": "model",
43
- "created": 735790403,
44
- "owned_by": "baai"
45
- },
46
- {
47
- "id": "baichuan-inc/baichuan2-13b-chat",
48
- "object": "model",
49
- "created": 735790403,
50
- "owned_by": "baichuan-inc"
51
- },
52
- {
53
- "id": "bigcode/starcoder2-15b",
54
- "object": "model",
55
- "created": 735790403,
56
- "owned_by": "bigcode"
57
- },
58
- {
59
- "id": "bigcode/starcoder2-7b",
60
- "object": "model",
61
- "created": 735790403,
62
- "owned_by": "bigcode"
63
- },
64
- {
65
- "id": "bytedance/seed-oss-36b-instruct",
66
- "object": "model",
67
- "created": 735790403,
68
- "owned_by": "bytedance"
69
- },
70
- {
71
- "id": "databricks/dbrx-instruct",
72
- "object": "model",
73
- "created": 735790403,
74
- "owned_by": "databricks"
75
- },
76
- {
77
- "id": "deepseek-ai/deepseek-coder-6.7b-instruct",
78
- "object": "model",
79
- "created": 735790403,
80
- "owned_by": "deepseek-ai"
81
- },
82
- {
83
- "id": "deepseek-ai/deepseek-r1-distill-llama-8b",
84
- "object": "model",
85
- "created": 735790403,
86
- "owned_by": "deepseek-ai"
87
- },
88
- {
89
- "id": "deepseek-ai/deepseek-r1-distill-qwen-14b",
90
- "object": "model",
91
- "created": 735790403,
92
- "owned_by": "deepseek-ai"
93
- },
94
- {
95
- "id": "deepseek-ai/deepseek-r1-distill-qwen-32b",
96
- "object": "model",
97
- "created": 735790403,
98
- "owned_by": "deepseek-ai"
99
- },
100
- {
101
- "id": "deepseek-ai/deepseek-r1-distill-qwen-7b",
102
- "object": "model",
103
- "created": 735790403,
104
- "owned_by": "deepseek-ai"
105
- },
106
- {
107
- "id": "deepseek-ai/deepseek-v3.1",
108
- "object": "model",
109
- "created": 735790403,
110
- "owned_by": "deepseek-ai"
111
- },
112
- {
113
- "id": "deepseek-ai/deepseek-v3.1-terminus",
114
- "object": "model",
115
- "created": 735790403,
116
- "owned_by": "deepseek-ai"
117
- },
118
- {
119
- "id": "deepseek-ai/deepseek-v3.2",
120
- "object": "model",
121
- "created": 735790403,
122
- "owned_by": "deepseek-ai"
123
- },
124
- {
125
- "id": "google/codegemma-1.1-7b",
126
- "object": "model",
127
- "created": 735790403,
128
- "owned_by": "google"
129
- },
130
- {
131
- "id": "google/codegemma-7b",
132
- "object": "model",
133
- "created": 735790403,
134
- "owned_by": "google"
135
- },
136
- {
137
- "id": "google/deplot",
138
- "object": "model",
139
- "created": 735790403,
140
- "owned_by": "google"
141
- },
142
- {
143
- "id": "google/gemma-2-27b-it",
144
- "object": "model",
145
- "created": 735790403,
146
- "owned_by": "google"
147
- },
148
- {
149
- "id": "google/gemma-2-2b-it",
150
- "object": "model",
151
- "created": 735790403,
152
- "owned_by": "google"
153
- },
154
- {
155
- "id": "google/gemma-2-9b-it",
156
- "object": "model",
157
- "created": 735790403,
158
- "owned_by": "google"
159
- },
160
- {
161
- "id": "google/gemma-2b",
162
- "object": "model",
163
- "created": 735790403,
164
- "owned_by": "google"
165
- },
166
- {
167
- "id": "google/gemma-3-12b-it",
168
- "object": "model",
169
- "created": 735790403,
170
- "owned_by": "google"
171
- },
172
- {
173
- "id": "google/gemma-3-1b-it",
174
- "object": "model",
175
- "created": 735790403,
176
- "owned_by": "google"
177
- },
178
- {
179
- "id": "google/gemma-3-27b-it",
180
- "object": "model",
181
- "created": 735790403,
182
- "owned_by": "google"
183
- },
184
- {
185
- "id": "google/gemma-3-4b-it",
186
- "object": "model",
187
- "created": 735790403,
188
- "owned_by": "google"
189
- },
190
- {
191
- "id": "google/gemma-3n-e2b-it",
192
- "object": "model",
193
- "created": 735790403,
194
- "owned_by": "google"
195
- },
196
- {
197
- "id": "google/gemma-3n-e4b-it",
198
- "object": "model",
199
- "created": 735790403,
200
- "owned_by": "google"
201
- },
202
- {
203
- "id": "google/gemma-7b",
204
- "object": "model",
205
- "created": 735790403,
206
- "owned_by": "google"
207
- },
208
- {
209
- "id": "google/paligemma",
210
- "object": "model",
211
- "created": 735790403,
212
- "owned_by": "google"
213
- },
214
- {
215
- "id": "google/recurrentgemma-2b",
216
- "object": "model",
217
- "created": 735790403,
218
- "owned_by": "google"
219
- },
220
- {
221
- "id": "google/shieldgemma-9b",
222
- "object": "model",
223
- "created": 735790403,
224
- "owned_by": "google"
225
- },
226
- {
227
- "id": "gotocompany/gemma-2-9b-cpt-sahabatai-instruct",
228
- "object": "model",
229
- "created": 735790403,
230
- "owned_by": "gotocompany"
231
- },
232
- {
233
- "id": "ibm/granite-3.0-3b-a800m-instruct",
234
- "object": "model",
235
- "created": 735790403,
236
- "owned_by": "ibm"
237
- },
238
- {
239
- "id": "ibm/granite-3.0-8b-instruct",
240
- "object": "model",
241
- "created": 735790403,
242
- "owned_by": "ibm"
243
- },
244
- {
245
- "id": "ibm/granite-3.3-8b-instruct",
246
- "object": "model",
247
- "created": 735790403,
248
- "owned_by": "ibm"
249
- },
250
- {
251
- "id": "ibm/granite-34b-code-instruct",
252
- "object": "model",
253
- "created": 735790403,
254
- "owned_by": "ibm"
255
- },
256
- {
257
- "id": "ibm/granite-8b-code-instruct",
258
- "object": "model",
259
- "created": 735790403,
260
- "owned_by": "ibm"
261
- },
262
- {
263
- "id": "ibm/granite-guardian-3.0-8b",
264
- "object": "model",
265
- "created": 735790403,
266
- "owned_by": "ibm"
267
- },
268
- {
269
- "id": "igenius/colosseum_355b_instruct_16k",
270
- "object": "model",
271
- "created": 735790403,
272
- "owned_by": "igenius"
273
- },
274
- {
275
- "id": "igenius/italia_10b_instruct_16k",
276
- "object": "model",
277
- "created": 735790403,
278
- "owned_by": "igenius"
279
- },
280
- {
281
- "id": "institute-of-science-tokyo/llama-3.1-swallow-70b-instruct-v0.1",
282
- "object": "model",
283
- "created": 735790403,
284
- "owned_by": "institute-of-science-tokyo"
285
- },
286
- {
287
- "id": "institute-of-science-tokyo/llama-3.1-swallow-8b-instruct-v0.1",
288
- "object": "model",
289
- "created": 735790403,
290
- "owned_by": "institute-of-science-tokyo"
291
- },
292
- {
293
- "id": "marin/marin-8b-instruct",
294
- "object": "model",
295
- "created": 735790403,
296
- "owned_by": "marin"
297
- },
298
- {
299
- "id": "mediatek/breeze-7b-instruct",
300
- "object": "model",
301
- "created": 735790403,
302
- "owned_by": "mediatek"
303
- },
304
- {
305
- "id": "meta/codellama-70b",
306
- "object": "model",
307
- "created": 735790403,
308
- "owned_by": "meta"
309
- },
310
- {
311
- "id": "meta/llama-3.1-405b-instruct",
312
- "object": "model",
313
- "created": 735790403,
314
- "owned_by": "meta"
315
- },
316
- {
317
- "id": "meta/llama-3.1-70b-instruct",
318
- "object": "model",
319
- "created": 735790403,
320
- "owned_by": "meta"
321
- },
322
- {
323
- "id": "meta/llama-3.1-8b-instruct",
324
- "object": "model",
325
- "created": 735790403,
326
- "owned_by": "meta"
327
- },
328
- {
329
- "id": "meta/llama-3.2-11b-vision-instruct",
330
- "object": "model",
331
- "created": 735790403,
332
- "owned_by": "meta"
333
- },
334
- {
335
- "id": "meta/llama-3.2-1b-instruct",
336
- "object": "model",
337
- "created": 735790403,
338
- "owned_by": "meta"
339
- },
340
- {
341
- "id": "meta/llama-3.2-3b-instruct",
342
- "object": "model",
343
- "created": 735790403,
344
- "owned_by": "meta"
345
- },
346
- {
347
- "id": "meta/llama-3.2-90b-vision-instruct",
348
- "object": "model",
349
- "created": 735790403,
350
- "owned_by": "meta"
351
- },
352
- {
353
- "id": "meta/llama-3.3-70b-instruct",
354
- "object": "model",
355
- "created": 735790403,
356
- "owned_by": "meta"
357
- },
358
- {
359
- "id": "meta/llama-4-maverick-17b-128e-instruct",
360
- "object": "model",
361
- "created": 735790403,
362
- "owned_by": "meta"
363
- },
364
- {
365
- "id": "meta/llama-4-scout-17b-16e-instruct",
366
- "object": "model",
367
- "created": 735790403,
368
- "owned_by": "meta"
369
- },
370
- {
371
- "id": "meta/llama-guard-4-12b",
372
- "object": "model",
373
- "created": 735790403,
374
- "owned_by": "meta"
375
- },
376
- {
377
- "id": "meta/llama2-70b",
378
- "object": "model",
379
- "created": 735790403,
380
- "owned_by": "meta"
381
- },
382
- {
383
- "id": "meta/llama3-70b-instruct",
384
- "object": "model",
385
- "created": 735790403,
386
- "owned_by": "meta"
387
- },
388
- {
389
- "id": "meta/llama3-8b-instruct",
390
- "object": "model",
391
- "created": 735790403,
392
- "owned_by": "meta"
393
- },
394
- {
395
- "id": "microsoft/kosmos-2",
396
- "object": "model",
397
- "created": 735790403,
398
- "owned_by": "microsoft"
399
- },
400
- {
401
- "id": "microsoft/phi-3-medium-128k-instruct",
402
- "object": "model",
403
- "created": 735790403,
404
- "owned_by": "microsoft"
405
- },
406
- {
407
- "id": "microsoft/phi-3-medium-4k-instruct",
408
- "object": "model",
409
- "created": 735790403,
410
- "owned_by": "microsoft"
411
- },
412
- {
413
- "id": "microsoft/phi-3-mini-128k-instruct",
414
- "object": "model",
415
- "created": 735790403,
416
- "owned_by": "microsoft"
417
- },
418
- {
419
- "id": "microsoft/phi-3-mini-4k-instruct",
420
- "object": "model",
421
- "created": 735790403,
422
- "owned_by": "microsoft"
423
- },
424
- {
425
- "id": "microsoft/phi-3-small-128k-instruct",
426
- "object": "model",
427
- "created": 735790403,
428
- "owned_by": "microsoft"
429
- },
430
- {
431
- "id": "microsoft/phi-3-small-8k-instruct",
432
- "object": "model",
433
- "created": 735790403,
434
- "owned_by": "microsoft"
435
- },
436
- {
437
- "id": "microsoft/phi-3-vision-128k-instruct",
438
- "object": "model",
439
- "created": 735790403,
440
- "owned_by": "microsoft"
441
- },
442
- {
443
- "id": "microsoft/phi-3.5-mini-instruct",
444
- "object": "model",
445
- "created": 735790403,
446
- "owned_by": "microsoft"
447
- },
448
- {
449
- "id": "microsoft/phi-3.5-moe-instruct",
450
- "object": "model",
451
- "created": 735790403,
452
- "owned_by": "microsoft"
453
- },
454
- {
455
- "id": "microsoft/phi-3.5-vision-instruct",
456
- "object": "model",
457
- "created": 735790403,
458
- "owned_by": "microsoft"
459
- },
460
- {
461
- "id": "microsoft/phi-4-mini-flash-reasoning",
462
- "object": "model",
463
- "created": 735790403,
464
- "owned_by": "microsoft"
465
- },
466
- {
467
- "id": "microsoft/phi-4-mini-instruct",
468
- "object": "model",
469
- "created": 735790403,
470
- "owned_by": "microsoft"
471
- },
472
- {
473
- "id": "microsoft/phi-4-multimodal-instruct",
474
- "object": "model",
475
- "created": 735790403,
476
- "owned_by": "microsoft"
477
- },
478
- {
479
- "id": "minimaxai/minimax-m2.5",
480
- "object": "model",
481
- "created": 735790403,
482
- "owned_by": "minimaxai"
483
- },
484
- {
485
- "id": "mistralai/codestral-22b-instruct-v0.1",
486
- "object": "model",
487
- "created": 735790403,
488
- "owned_by": "mistralai"
489
- },
490
- {
491
- "id": "mistralai/devstral-2-123b-instruct-2512",
492
- "object": "model",
493
- "created": 735790403,
494
- "owned_by": "mistralai"
495
- },
496
- {
497
- "id": "mistralai/magistral-small-2506",
498
- "object": "model",
499
- "created": 735790403,
500
- "owned_by": "mistralai"
501
- },
502
- {
503
- "id": "mistralai/mamba-codestral-7b-v0.1",
504
- "object": "model",
505
- "created": 735790403,
506
- "owned_by": "mistralai"
507
- },
508
- {
509
- "id": "mistralai/mathstral-7b-v0.1",
510
- "object": "model",
511
- "created": 735790403,
512
- "owned_by": "mistralai"
513
- },
514
- {
515
- "id": "mistralai/ministral-14b-instruct-2512",
516
- "object": "model",
517
- "created": 735790403,
518
- "owned_by": "mistralai"
519
- },
520
- {
521
- "id": "mistralai/mistral-7b-instruct-v0.2",
522
- "object": "model",
523
- "created": 735790403,
524
- "owned_by": "mistralai"
525
- },
526
- {
527
- "id": "mistralai/mistral-7b-instruct-v0.3",
528
- "object": "model",
529
- "created": 735790403,
530
- "owned_by": "mistralai"
531
- },
532
- {
533
- "id": "mistralai/mistral-large",
534
- "object": "model",
535
- "created": 735790403,
536
- "owned_by": "mistralai"
537
- },
538
- {
539
- "id": "mistralai/mistral-large-2-instruct",
540
- "object": "model",
541
- "created": 735790403,
542
- "owned_by": "mistralai"
543
- },
544
- {
545
- "id": "mistralai/mistral-large-3-675b-instruct-2512",
546
- "object": "model",
547
- "created": 735790403,
548
- "owned_by": "mistralai"
549
- },
550
- {
551
- "id": "mistralai/mistral-medium-3-instruct",
552
- "object": "model",
553
- "created": 735790403,
554
- "owned_by": "mistralai"
555
- },
556
- {
557
- "id": "mistralai/mistral-nemotron",
558
- "object": "model",
559
- "created": 735790403,
560
- "owned_by": "mistralai"
561
- },
562
- {
563
- "id": "mistralai/mistral-small-24b-instruct",
564
- "object": "model",
565
- "created": 735790403,
566
- "owned_by": "mistralai"
567
- },
568
- {
569
- "id": "mistralai/mistral-small-3.1-24b-instruct-2503",
570
- "object": "model",
571
- "created": 735790403,
572
- "owned_by": "mistralai"
573
- },
574
- {
575
- "id": "mistralai/mistral-small-4-119b-2603",
576
- "object": "model",
577
- "created": 735790403,
578
- "owned_by": "mistralai"
579
- },
580
- {
581
- "id": "mistralai/mixtral-8x22b-instruct-v0.1",
582
- "object": "model",
583
- "created": 735790403,
584
- "owned_by": "mistralai"
585
- },
586
- {
587
- "id": "mistralai/mixtral-8x22b-v0.1",
588
- "object": "model",
589
- "created": 735790403,
590
- "owned_by": "mistralai"
591
- },
592
- {
593
- "id": "mistralai/mixtral-8x7b-instruct-v0.1",
594
- "object": "model",
595
- "created": 735790403,
596
- "owned_by": "mistralai"
597
- },
598
- {
599
- "id": "moonshotai/kimi-k2-instruct",
600
- "object": "model",
601
- "created": 735790403,
602
- "owned_by": "moonshotai"
603
- },
604
- {
605
- "id": "moonshotai/kimi-k2-instruct-0905",
606
- "object": "model",
607
- "created": 735790403,
608
- "owned_by": "moonshotai"
609
- },
610
- {
611
- "id": "moonshotai/kimi-k2-thinking",
612
- "object": "model",
613
- "created": 735790403,
614
- "owned_by": "moonshotai"
615
- },
616
- {
617
- "id": "moonshotai/kimi-k2.5",
618
- "object": "model",
619
- "created": 735790403,
620
- "owned_by": "moonshotai"
621
- },
622
- {
623
- "id": "nv-mistralai/mistral-nemo-12b-instruct",
624
- "object": "model",
625
- "created": 735790403,
626
- "owned_by": "nv-mistralai"
627
- },
628
- {
629
- "id": "nvidia/cosmos-reason2-8b",
630
- "object": "model",
631
- "created": 735790403,
632
- "owned_by": "nvidia"
633
- },
634
- {
635
- "id": "nvidia/embed-qa-4",
636
- "object": "model",
637
- "created": 735790403,
638
- "owned_by": "nvidia"
639
- },
640
- {
641
- "id": "nvidia/gliner-pii",
642
- "object": "model",
643
- "created": 735790403,
644
- "owned_by": "nvidia"
645
- },
646
- {
647
- "id": "nvidia/llama-3.1-nemoguard-8b-content-safety",
648
- "object": "model",
649
- "created": 735790403,
650
- "owned_by": "nvidia"
651
- },
652
- {
653
- "id": "nvidia/llama-3.1-nemoguard-8b-topic-control",
654
- "object": "model",
655
- "created": 735790403,
656
- "owned_by": "nvidia"
657
- },
658
- {
659
- "id": "nvidia/llama-3.1-nemotron-51b-instruct",
660
- "object": "model",
661
- "created": 735790403,
662
- "owned_by": "nvidia"
663
- },
664
- {
665
- "id": "nvidia/llama-3.1-nemotron-70b-instruct",
666
- "object": "model",
667
- "created": 735790403,
668
- "owned_by": "nvidia"
669
- },
670
- {
671
- "id": "nvidia/llama-3.1-nemotron-70b-reward",
672
- "object": "model",
673
- "created": 735790403,
674
- "owned_by": "nvidia"
675
- },
676
- {
677
- "id": "nvidia/llama-3.1-nemotron-nano-4b-v1.1",
678
- "object": "model",
679
- "created": 735790403,
680
- "owned_by": "nvidia"
681
- },
682
- {
683
- "id": "nvidia/llama-3.1-nemotron-nano-8b-v1",
684
- "object": "model",
685
- "created": 735790403,
686
- "owned_by": "nvidia"
687
- },
688
- {
689
- "id": "nvidia/llama-3.1-nemotron-nano-vl-8b-v1",
690
- "object": "model",
691
- "created": 735790403,
692
- "owned_by": "nvidia"
693
- },
694
- {
695
- "id": "nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
696
- "object": "model",
697
- "created": 735790403,
698
- "owned_by": "nvidia"
699
- },
700
- {
701
- "id": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
702
- "object": "model",
703
- "created": 735790403,
704
- "owned_by": "nvidia"
705
- },
706
- {
707
- "id": "nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1",
708
- "object": "model",
709
- "created": 735790403,
710
- "owned_by": "nvidia"
711
- },
712
- {
713
- "id": "nvidia/llama-3.2-nemoretriever-300m-embed-v1",
714
- "object": "model",
715
- "created": 735790403,
716
- "owned_by": "nvidia"
717
- },
718
- {
719
- "id": "nvidia/llama-3.2-nv-embedqa-1b-v1",
720
- "object": "model",
721
- "created": 735790403,
722
- "owned_by": "nvidia"
723
- },
724
- {
725
- "id": "nvidia/llama-3.2-nv-embedqa-1b-v2",
726
- "object": "model",
727
- "created": 735790403,
728
- "owned_by": "nvidia"
729
- },
730
- {
731
- "id": "nvidia/llama-3.3-nemotron-super-49b-v1",
732
- "object": "model",
733
- "created": 735790403,
734
- "owned_by": "nvidia"
735
- },
736
- {
737
- "id": "nvidia/llama-3.3-nemotron-super-49b-v1.5",
738
- "object": "model",
739
- "created": 735790403,
740
- "owned_by": "nvidia"
741
- },
742
- {
743
- "id": "nvidia/llama-nemotron-embed-1b-v2",
744
- "object": "model",
745
- "created": 735790403,
746
- "owned_by": "nvidia"
747
- },
748
- {
749
- "id": "nvidia/llama-nemotron-embed-vl-1b-v2",
750
- "object": "model",
751
- "created": 735790403,
752
- "owned_by": "nvidia"
753
- },
754
- {
755
- "id": "nvidia/llama3-chatqa-1.5-70b",
756
- "object": "model",
757
- "created": 735790403,
758
- "owned_by": "nvidia"
759
- },
760
- {
761
- "id": "nvidia/llama3-chatqa-1.5-8b",
762
- "object": "model",
763
- "created": 735790403,
764
- "owned_by": "nvidia"
765
- },
766
- {
767
- "id": "nvidia/mistral-nemo-minitron-8b-8k-instruct",
768
- "object": "model",
769
- "created": 735790403,
770
- "owned_by": "nvidia"
771
- },
772
- {
773
- "id": "nvidia/mistral-nemo-minitron-8b-base",
774
- "object": "model",
775
- "created": 735790403,
776
- "owned_by": "nvidia"
777
- },
778
- {
779
- "id": "nvidia/nemoretriever-parse",
780
- "object": "model",
781
- "created": 735790403,
782
- "owned_by": "nvidia"
783
- },
784
- {
785
- "id": "nvidia/nemotron-3-nano-30b-a3b",
786
- "object": "model",
787
- "created": 735790403,
788
- "owned_by": "nvidia"
789
- },
790
- {
791
- "id": "nvidia/nemotron-3-super-120b-a12b",
792
- "object": "model",
793
- "created": 735790403,
794
- "owned_by": "nvidia"
795
- },
796
- {
797
- "id": "nvidia/nemotron-4-340b-instruct",
798
- "object": "model",
799
- "created": 735790403,
800
- "owned_by": "nvidia"
801
- },
802
- {
803
- "id": "nvidia/nemotron-4-340b-reward",
804
- "object": "model",
805
- "created": 735790403,
806
- "owned_by": "nvidia"
807
- },
808
- {
809
- "id": "nvidia/nemotron-4-mini-hindi-4b-instruct",
810
- "object": "model",
811
- "created": 735790403,
812
- "owned_by": "nvidia"
813
- },
814
- {
815
- "id": "nvidia/nemotron-content-safety-reasoning-4b",
816
- "object": "model",
817
- "created": 735790403,
818
- "owned_by": "nvidia"
819
- },
820
- {
821
- "id": "nvidia/nemotron-mini-4b-instruct",
822
- "object": "model",
823
- "created": 735790403,
824
- "owned_by": "nvidia"
825
- },
826
- {
827
- "id": "nvidia/nemotron-nano-12b-v2-vl",
828
- "object": "model",
829
- "created": 735790403,
830
- "owned_by": "nvidia"
831
- },
832
- {
833
- "id": "nvidia/nemotron-nano-3-30b-a3b",
834
- "object": "model",
835
- "created": 735790403,
836
- "owned_by": "nvidia"
837
- },
838
- {
839
- "id": "nvidia/nemotron-parse",
840
- "object": "model",
841
- "created": 735790403,
842
- "owned_by": "nvidia"
843
- },
844
- {
845
- "id": "nvidia/neva-22b",
846
- "object": "model",
847
- "created": 735790403,
848
- "owned_by": "nvidia"
849
- },
850
- {
851
- "id": "nvidia/nv-embed-v1",
852
- "object": "model",
853
- "created": 735790403,
854
- "owned_by": "nvidia"
855
- },
856
- {
857
- "id": "nvidia/nv-embedcode-7b-v1",
858
- "object": "model",
859
- "created": 735790403,
860
- "owned_by": "nvidia"
861
- },
862
- {
863
- "id": "nvidia/nv-embedqa-e5-v5",
864
- "object": "model",
865
- "created": 735790403,
866
- "owned_by": "nvidia"
867
- },
868
- {
869
- "id": "nvidia/nv-embedqa-mistral-7b-v2",
870
- "object": "model",
871
- "created": 735790403,
872
- "owned_by": "nvidia"
873
- },
874
- {
875
- "id": "nvidia/nvclip",
876
- "object": "model",
877
- "created": 735790403,
878
- "owned_by": "nvidia"
879
- },
880
- {
881
- "id": "nvidia/nvidia-nemotron-nano-9b-v2",
882
- "object": "model",
883
- "created": 735790403,
884
- "owned_by": "nvidia"
885
- },
886
- {
887
- "id": "nvidia/riva-translate-4b-instruct",
888
- "object": "model",
889
- "created": 735790403,
890
- "owned_by": "nvidia"
891
- },
892
- {
893
- "id": "nvidia/riva-translate-4b-instruct-v1.1",
894
- "object": "model",
895
- "created": 735790403,
896
- "owned_by": "nvidia"
897
- },
898
- {
899
- "id": "nvidia/streampetr",
900
- "object": "model",
901
- "created": 735790403,
902
- "owned_by": "nvidia"
903
- },
904
- {
905
- "id": "nvidia/usdcode-llama-3.1-70b-instruct",
906
- "object": "model",
907
- "created": 735790403,
908
- "owned_by": "nvidia"
909
- },
910
- {
911
- "id": "nvidia/vila",
912
- "object": "model",
913
- "created": 735790403,
914
- "owned_by": "nvidia"
915
- },
916
- {
917
- "id": "openai/gpt-oss-120b",
918
- "object": "model",
919
- "created": 735790403,
920
- "owned_by": "openai"
921
- },
922
- {
923
- "id": "openai/gpt-oss-120b",
924
- "object": "model",
925
- "created": 735790403,
926
- "owned_by": "openai"
927
- },
928
- {
929
- "id": "openai/gpt-oss-20b",
930
- "object": "model",
931
- "created": 735790403,
932
- "owned_by": "openai"
933
- },
934
- {
935
- "id": "openai/gpt-oss-20b",
936
- "object": "model",
937
- "created": 735790403,
938
- "owned_by": "openai"
939
- },
940
- {
941
- "id": "opengpt-x/teuken-7b-instruct-commercial-v0.4",
942
- "object": "model",
943
- "created": 735790403,
944
- "owned_by": "opengpt-x"
945
- },
946
- {
947
- "id": "qwen/qwen2-7b-instruct",
948
- "object": "model",
949
- "created": 735790403,
950
- "owned_by": "qwen"
951
- },
952
- {
953
- "id": "qwen/qwen2.5-7b-instruct",
954
- "object": "model",
955
- "created": 735790403,
956
- "owned_by": "qwen"
957
- },
958
- {
959
- "id": "qwen/qwen2.5-coder-32b-instruct",
960
- "object": "model",
961
- "created": 735790403,
962
- "owned_by": "qwen"
963
- },
964
- {
965
- "id": "qwen/qwen2.5-coder-7b-instruct",
966
- "object": "model",
967
- "created": 735790403,
968
- "owned_by": "qwen"
969
- },
970
- {
971
- "id": "qwen/qwen3-coder-480b-a35b-instruct",
972
- "object": "model",
973
- "created": 735790403,
974
- "owned_by": "qwen"
975
- },
976
- {
977
- "id": "qwen/qwen3-next-80b-a3b-instruct",
978
- "object": "model",
979
- "created": 735790403,
980
- "owned_by": "qwen"
981
- },
982
- {
983
- "id": "qwen/qwen3-next-80b-a3b-thinking",
984
- "object": "model",
985
- "created": 735790403,
986
- "owned_by": "qwen"
987
- },
988
- {
989
- "id": "qwen/qwen3.5-122b-a10b",
990
- "object": "model",
991
- "created": 735790403,
992
- "owned_by": "qwen"
993
- },
994
- {
995
- "id": "qwen/qwen3.5-397b-a17b",
996
- "object": "model",
997
- "created": 735790403,
998
- "owned_by": "qwen"
999
- },
1000
- {
1001
- "id": "qwen/qwq-32b",
1002
- "object": "model",
1003
- "created": 735790403,
1004
- "owned_by": "qwen"
1005
- },
1006
- {
1007
- "id": "rakuten/rakutenai-7b-chat",
1008
- "object": "model",
1009
- "created": 735790403,
1010
- "owned_by": "rakuten"
1011
- },
1012
- {
1013
- "id": "rakuten/rakutenai-7b-instruct",
1014
- "object": "model",
1015
- "created": 735790403,
1016
- "owned_by": "rakuten"
1017
- },
1018
- {
1019
- "id": "sarvamai/sarvam-m",
1020
- "object": "model",
1021
- "created": 735790403,
1022
- "owned_by": "sarvamai"
1023
- },
1024
- {
1025
- "id": "snowflake/arctic-embed-l",
1026
- "object": "model",
1027
- "created": 735790403,
1028
- "owned_by": "snowflake"
1029
- },
1030
- {
1031
- "id": "speakleash/bielik-11b-v2.3-instruct",
1032
- "object": "model",
1033
- "created": 735790403,
1034
- "owned_by": "speakleash"
1035
- },
1036
- {
1037
- "id": "speakleash/bielik-11b-v2.6-instruct",
1038
- "object": "model",
1039
- "created": 735790403,
1040
- "owned_by": "speakleash"
1041
- },
1042
- {
1043
- "id": "stepfun-ai/step-3.5-flash",
1044
- "object": "model",
1045
- "created": 735790403,
1046
- "owned_by": "stepfun-ai"
1047
- },
1048
- {
1049
- "id": "stockmark/stockmark-2-100b-instruct",
1050
- "object": "model",
1051
- "created": 735790403,
1052
- "owned_by": "stockmark"
1053
- },
1054
- {
1055
- "id": "thudm/chatglm3-6b",
1056
- "object": "model",
1057
- "created": 735790403,
1058
- "owned_by": "thudm"
1059
- },
1060
- {
1061
- "id": "tiiuae/falcon3-7b-instruct",
1062
- "object": "model",
1063
- "created": 735790403,
1064
- "owned_by": "tiiuae"
1065
- },
1066
- {
1067
- "id": "tokyotech-llm/llama-3-swallow-70b-instruct-v0.1",
1068
- "object": "model",
1069
- "created": 735790403,
1070
- "owned_by": "tokyotech-llm"
1071
- },
1072
- {
1073
- "id": "upstage/solar-10.7b-instruct",
1074
- "object": "model",
1075
- "created": 735790403,
1076
- "owned_by": "upstage"
1077
- },
1078
- {
1079
- "id": "utter-project/eurollm-9b-instruct",
1080
- "object": "model",
1081
- "created": 735790403,
1082
- "owned_by": "utter-project"
1083
- },
1084
- {
1085
- "id": "writer/palmyra-creative-122b",
1086
- "object": "model",
1087
- "created": 735790403,
1088
- "owned_by": "writer"
1089
- },
1090
- {
1091
- "id": "writer/palmyra-fin-70b-32k",
1092
- "object": "model",
1093
- "created": 735790403,
1094
- "owned_by": "writer"
1095
- },
1096
- {
1097
- "id": "writer/palmyra-med-70b",
1098
- "object": "model",
1099
- "created": 735790403,
1100
- "owned_by": "writer"
1101
- },
1102
- {
1103
- "id": "writer/palmyra-med-70b-32k",
1104
- "object": "model",
1105
- "created": 735790403,
1106
- "owned_by": "writer"
1107
- },
1108
- {
1109
- "id": "yentinglin/llama-3-taiwan-70b-instruct",
1110
- "object": "model",
1111
- "created": 735790403,
1112
- "owned_by": "yentinglin"
1113
- },
1114
- {
1115
- "id": "z-ai/glm4.7",
1116
- "object": "model",
1117
- "created": 735790403,
1118
- "owned_by": "z-ai"
1119
- },
1120
- {
1121
- "id": "z-ai/glm5",
1122
- "object": "model",
1123
- "created": 735790403,
1124
- "owned_by": "z-ai"
1125
- },
1126
- {
1127
- "id": "zyphra/zamba2-7b-instruct",
1128
- "object": "model",
1129
- "created": 735790403,
1130
- "owned_by": "zyphra"
1131
- }
1132
- ]
1133
- }
mp1/pluto/__init__.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """Pluto — Real Mode-Switching Pipeline."""
  
  __version__ = "1.0.0"
mp1/pluto/bus.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """
  pluto/bus.py — Lightweight in-memory message bus for agent communication.
  
mp1/pluto/chunker.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """
  pluto/chunker.py — Chunk classifier (spec §4).
  
mp1/pluto/db.py ADDED
@@ -0,0 +1,66 @@
+ # -*- coding: utf-8 -*-
+ """
+ Shared lazy PostgreSQL helpers.
+
+ Importing this module does not require PostgreSQL or psycopg2. A connection is
+ attempted only when a caller explicitly asks for one.
+ """
+
+ from __future__ import annotations
+
+ import os
+
+
+ def _get_connection():
+     """Return a PostgreSQL connection, creating schema on first use."""
+     database_url = os.getenv("DATABASE_URL", "").strip()
+     if not database_url:
+         raise EnvironmentError("DATABASE_URL is not set")
+
+     try:
+         import psycopg2
+     except Exception as exc:
+         raise EnvironmentError("psycopg2 is required for PostgreSQL session memory") from exc
+
+     conn = psycopg2.connect(database_url)
+     _ensure_schema(conn)
+     return conn
+
+
+ def _ensure_schema(conn) -> None:
+     with conn.cursor() as cur:
+         cur.execute(
+             """
+             CREATE TABLE IF NOT EXISTS session_memory (
+                 session_id TEXT PRIMARY KEY,
+                 doc_id TEXT NOT NULL,
+                 created_at TIMESTAMP DEFAULT NOW(),
+                 compressed_json JSONB NOT NULL,
+                 raw_path TEXT
+             );
+             """
+         )
+         cur.execute(
+             """
+             CREATE TABLE IF NOT EXISTS response_signals (
+                 id SERIAL PRIMARY KEY,
+                 session_id TEXT,
+                 query_hash TEXT,
+                 signal_type TEXT,
+                 created_at TIMESTAMP DEFAULT NOW()
+             );
+             """
+         )
+         cur.execute(
+             """
+             CREATE TABLE IF NOT EXISTS session_graph (
+                 source_session TEXT,
+                 target_session TEXT,
+                 confidence FLOAT,
+                 reason TEXT,
+                 created_at TIMESTAMP DEFAULT NOW(),
+                 PRIMARY KEY (source_session, target_session)
+             );
+             """
+         )
+     conn.commit()
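Reviewer sketch, not part of the commit: the helper in `db.py` raises `EnvironmentError` when `DATABASE_URL` is missing or `psycopg2` cannot be imported, so a caller that treats session memory as optional could wrap it as below. The names `get_connection`, `try_get_connection`, and the `connect` injection parameter are hypothetical and exist only to keep the example self-contained and testable without a database.

```python
import os


def get_connection(connect=None):
    """Mirror of the lazy pattern in the diff; `connect` is a hypothetical test hook."""
    database_url = os.getenv("DATABASE_URL", "").strip()
    if not database_url:
        raise EnvironmentError("DATABASE_URL is not set")
    if connect is None:
        try:
            import psycopg2  # imported lazily, only when a connection is requested
        except Exception as exc:
            raise EnvironmentError("psycopg2 is required") from exc
        connect = psycopg2.connect
    return connect(database_url)


def try_get_connection(connect=None):
    """Return a connection, or None when the configuration check fails."""
    try:
        return get_connection(connect)
    except EnvironmentError:
        return None
```

With this shape, code paths that only optionally persist session data can check for `None` instead of handling the exception at every call site. Errors raised by the driver while connecting are not caught here and would still propagate.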
mp1/pluto/dispatcher.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """
  pluto/dispatcher.py — Provider dispatch + NVIDIA helper utilities.
  
@@ -226,7 +227,7 @@ def _call_nvidia(cfg: ModeConfig, prompt: str) -> str:
      prefix = str(prompt)[:120]
      use_reasoning = any(
          kw in prefix
-         for kw in ["CRITIC:", "JUDGE:", "You are an evidence verification", "challenge each"]
+         for kw in ["CRITIC:", "JUDGE:", "You are an evidence checking", "challenge each"]
      )
  
      payload = {
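Reviewer sketch, not part of the commit: the hunk above decides whether to use reasoning by looking for marker strings in the first 120 characters of the prompt. The standalone function below reproduces that check so the effect of the renamed marker can be tried in isolation. `REASONING_MARKERS` and `needs_reasoning` are hypothetical names; the marker list and the 120-character window are taken from the new side of the diff.

```python
# Marker strings copied from the new side of the hunk.
REASONING_MARKERS = ["CRITIC:", "JUDGE:", "You are an evidence checking", "challenge each"]


def needs_reasoning(prompt, window=120):
    """Return True when any marker appears within the first `window` characters."""
    prefix = str(prompt)[:window]
    return any(kw in prefix for kw in REASONING_MARKERS)
```

One consequence worth noting: a prompt template that still begins with "You are an evidence verification" would no longer match after this change, and a marker that appears later than the window is ignored.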
mp1/pluto/doc_index.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """
  pluto/doc_index.py — In-memory document index with disk persistence.
  
mp1/pluto/doc_summary.py ADDED
@@ -0,0 +1,173 @@
+ # -*- coding: utf-8 -*-
+ """
+ Document-level summary storage and context prefix helpers.
+
+ This module is deliberately lazy: importing it does not require provider keys or
+ database/network availability. LLM/provider errors are handled inside
+ generate_doc_summary with a fallback summary.
+ """
+
+ from __future__ import annotations
+
+ from datetime import datetime, timezone
+ import json
+ import logging
+ from pathlib import Path
+ from typing import Any
+
+ from pydantic import BaseModel, Field
+
+ from pluto.utils import extract_json_from_response
+
+
+ logger = logging.getLogger("pluto")
+ SUMMARY_FILENAME = ".doc_summaries.json"
+
+
+ class DocSummary(BaseModel):
+     doc_id: str
+     title: str = ""
+     domain: str = ""
+     key_claims: list[str] = Field(default_factory=list)
+     structure: list[str] = Field(default_factory=list)
+     open_questions: list[str] = Field(default_factory=list)
+     created_at: str
+
+
+ def generate_doc_summary(doc_id: str, corpus_dir: str | Path) -> DocSummary:
+     """Generate and persist a document summary, falling back on failure."""
+     corpus_path = Path(corpus_dir)
+     doc_text = _read_document_text(doc_id, corpus_path)
+     created_at = _utc_now()
+
+     try:
+         raw = _call_summary_llm(doc_id=doc_id, doc_text=doc_text)
+         summary = _parse_summary(doc_id=doc_id, raw=raw, created_at=created_at)
+     except Exception as exc:
+         logger.warning("Failed to generate document summary for %s: %s", doc_id, exc)
+         summary = _fallback_summary(doc_id=doc_id, created_at=created_at)
+
+     summaries = load_doc_summaries(corpus_path)
+     summaries[doc_id] = summary
+     save_doc_summaries(corpus_path, summaries)
+     return summary
+
+
+ def load_doc_summary(doc_id: str, corpus_dir: str | Path) -> DocSummary | None:
+     """Load one stored document summary if present."""
+     return load_doc_summaries(corpus_dir).get(doc_id)
+
+
+ def load_doc_summaries(corpus_dir: str | Path) -> dict[str, DocSummary]:
+     """Load all document summaries from disk."""
+     path = _summary_path(corpus_dir)
+     if not path.exists():
+         return {}
+     try:
+         raw = path.read_text(encoding="utf-8")
+         data = json.loads(raw)
+         return {
+             str(doc_id): DocSummary(**summary_data)
+             for doc_id, summary_data in data.items()
+             if isinstance(summary_data, dict)
+         }
+     except Exception as exc:
+         logger.warning("Failed to load document summaries from %s: %s", path, exc)
+         return {}
+
+
+ def save_doc_summaries(corpus_dir: str | Path, summaries: dict[str, DocSummary]) -> None:
+     """Persist all document summaries as JSON."""
+     path = _summary_path(corpus_dir)
+     path.parent.mkdir(parents=True, exist_ok=True)
+     data = {doc_id: summary.model_dump() for doc_id, summary in summaries.items()}
+     path.write_text(json.dumps(data, ensure_ascii=False, indent=1), encoding="utf-8")
+
+
+ def apply_doc_summary_context(chunk_text: str, doc_id: str, corpus_dir: str | Path) -> str:
+     """Prepend stored document context to a chunk, if available."""
+     summary = load_doc_summary(doc_id, corpus_dir)
+     if not summary:
+         logger.warning("No document summary found for %s", doc_id)
+         return chunk_text
+
+     key_claims = "; ".join(summary.key_claims)
+     prefix = (
+         f"[Document context: {summary.title} | Domain: {summary.domain} | "
+         f"Key claims: {key_claims}]"
+     )
+     return f"{prefix}\n\n{chunk_text}"
+
+
+ def _call_summary_llm(doc_id: str, doc_text: str) -> str:
+     """Call the configured quick model for summary JSON."""
+     from pluto.dispatcher import dispatch
+     from pluto.modes import get_mode
+
+     get_mode("MODE_QUICK")
+     prompt = f"""Summarize this document as JSON only.
+
+ Schema:
+ {{
+   "title": "short title",
+   "domain": "subject/domain",
+   "key_claims": ["claim1", "claim2"],
+   "structure": ["intro", "methodology", "results", "conclusion"],
+   "open_questions": ["question1"]
+ }}
+
+ Document id: {doc_id}
+
+ Document text:
+ ---
+ {doc_text[:14000]}
+ ---
+ """
+     return dispatch("MODE_QUICK", prompt)
+
+
+ def _parse_summary(doc_id: str, raw: str, created_at: str) -> DocSummary:
+     data = json.loads(extract_json_from_response(raw))
+     return DocSummary(
+         doc_id=doc_id,
+         title=str(data.get("title", "")),
+         domain=str(data.get("domain", "")),
+         key_claims=_string_list(data.get("key_claims")),
+         structure=_string_list(data.get("structure")),
+         open_questions=_string_list(data.get("open_questions")),
+         created_at=created_at,
+     )
+
+
+ def _fallback_summary(doc_id: str, created_at: str) -> DocSummary:
+     return DocSummary(
+         doc_id=doc_id,
+         title=doc_id,
+         domain="",
+         key_claims=[],
+         structure=[],
+         open_questions=[],
+         created_at=created_at,
+     )
+
+
+ def _read_document_text(doc_id: str, corpus_dir: Path) -> str:
+     for ext in (".md", ".txt"):
+         path = corpus_dir / f"{doc_id}{ext}"
+         if path.exists():
+             return path.read_text(encoding="utf-8", errors="replace")
+     return ""
+
+
+ def _summary_path(corpus_dir: str | Path) -> Path:
+     return Path(corpus_dir) / SUMMARY_FILENAME
+
+
+ def _string_list(value: Any) -> list[str]:
+     if not isinstance(value, list):
+         return []
+     return [str(item) for item in value if str(item).strip()]
+
+
+ def _utc_now() -> str:
+     return datetime.now(timezone.utc).isoformat()
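Reviewer sketch, not part of the commit: to see what `apply_doc_summary_context` hands to later stages, the prefix construction can be exercised without pydantic or any files on disk. The `Summary` dataclass and `with_context` function below are hypothetical stand-ins; only the prefix format is copied from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Summary:
    """Hypothetical stand-in for the three DocSummary fields the prefix uses."""
    title: str = ""
    domain: str = ""
    key_claims: list[str] = field(default_factory=list)


def with_context(chunk_text: str, summary: Summary | None) -> str:
    """Prepend the document-context line, or return the chunk unchanged."""
    if summary is None:
        return chunk_text
    key_claims = "; ".join(summary.key_claims)
    prefix = (
        f"[Document context: {summary.title} | Domain: {summary.domain} | "
        f"Key claims: {key_claims}]"
    )
    return f"{prefix}\n\n{chunk_text}"
```

The prefix is a single bracketed line followed by a blank line, so with the fallback summary (title set to the document id, no claims) a chunk would still gain a line ending in an empty `Key claims:` field.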
mp1/pluto/embedder.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """
  pluto/embedder.py — Semantic chunking via NVIDIA NIM embedding endpoint.
  
mp1/pluto/extraction_cache.py CHANGED
@@ -1,3 +1,4 @@
+ # -*- coding: utf-8 -*-
  """
  pluto/extraction_cache.py — Persistent cache for S1 EXTRACT results.
  
mp1/pluto/ingest.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/ingest.py — File ingestion: convert uploaded files to corpus Markdown.
3
 
@@ -95,15 +96,30 @@ def ingest_file(
95
 
96
 
97
  def _extract_pdf(path: Path) -> str:
98
- """Extract text from PDF using PyPDF2."""
99
- from PyPDF2 import PdfReader
100
 
101
- reader = PdfReader(str(path))
 
 
102
  pages = []
103
- for i, page in enumerate(reader.pages):
104
- text = page.extract_text() or ""
105
- if text.strip():
106
- pages.append(f"## Page {i + 1}\n\n{text.strip()}")
 
 
 
 
 
 
 
 
 
 
 
 
 
107
  return "\n\n".join(pages)
108
 
109
 
@@ -193,4 +209,3 @@ def _classify_and_tag_chunks(chunks: list[str]) -> list[dict]:
193
  })
194
 
195
  return result
196
-
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/ingest.py — File ingestion: convert uploaded files to corpus Markdown.
4
 
 
 def _extract_pdf(path: Path) -> str:
+    """Extract text and tables from PDF using pdfplumber."""
+    import logging
 
+    import pdfplumber
+
+    logger = logging.getLogger("pluto")
     pages = []
+    with pdfplumber.open(str(path)) as pdf:
+        for i, page in enumerate(pdf.pages):
+            page_parts = []
+            text = page.extract_text(x_tolerance=2, y_tolerance=2)
+            if text and text.strip():
+                page_parts.append(text.strip())
+
+            tables = page.extract_tables()
+            for table in tables:
+                if table:
+                    rows = [" | ".join(cell or "" for cell in row) for row in table]
+                    page_parts.append("\n".join(rows))
+
+            if page_parts:
+                pages.append(f"## Page {i + 1}\n\n" + "\n\n".join(page_parts))
+            else:
+                logger.warning("pdfplumber returned empty text for page %s in %s", i + 1, path.name)
     return "\n\n".join(pages)
 
209
  })
210
 
211
  return result
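
The table handling added to `_extract_pdf` can be tried without a PDF. The sketch below assumes the shape the diff relies on for `page.extract_tables()` output (a list of rows, each a list of cells that may be `None`). `table_to_text` and the sample data are hypothetical and not part of the commit.

```python
from __future__ import annotations


def table_to_text(table: list[list[str | None]]) -> str:
    """Join cells with ' | ' and rows with newlines, treating None as empty."""
    rows = [" | ".join(cell or "" for cell in row) for row in table]
    return "\n".join(rows)


# Made-up table with one missing cell, as an extractor might return it.
sample = [["Year", "Revenue"], ["2023", None], ["2024", "12.5"]]
flattened = table_to_text(sample)
print(flattened)
```

A missing cell becomes an empty string, so every row keeps the same number of separators and the columns stay aligned for downstream chunking.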
 
mp1/pluto/models.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/models.py — Pydantic schemas for all 4 pipeline stages + final output.
3
 
@@ -10,9 +11,7 @@ import hashlib
10
  from enum import Enum
11
  from typing import Optional
12
 
13
- from pydantic import BaseModel, Field, field_validator
14
-
15
- from pluto.utils import coerce_string, coerce_string_list, ensure_list
16
 
17
 
18
  # ── Enums ──────────────────────────────────────────────────────────────────────
@@ -65,11 +64,6 @@ class Evidence(BaseModel):
65
  where: str = ""
66
  quote: str = Field(default="", max_length=200)
67
 
68
- @field_validator("doc_id", "chunk_id", "where", "quote", mode="before")
69
- @classmethod
70
- def _normalize_text_fields(cls, value):
71
- return coerce_string(value, default="")
72
-
73
 
74
  # ── S0 ROUTE ───────────────────────────────────────────────────────────────────
75
 
@@ -77,11 +71,6 @@ class DocScope(BaseModel):
77
  doc_id: str
78
  reason: str
79
 
80
- @field_validator("doc_id", "reason", mode="before")
81
- @classmethod
82
- def _normalize_doc_scope_fields(cls, value):
83
- return coerce_string(value, default="")
84
-
85
 
86
  class ChunkPlan(BaseModel):
87
  doc_id: str
@@ -92,11 +81,6 @@ class ChunkPlan(BaseModel):
92
  priority: Priority = Priority.MEDIUM
93
  task: str = ""
94
 
95
- @field_validator("doc_id", "chunk_id", "where", "task", mode="before")
96
- @classmethod
97
- def _normalize_chunk_plan_text_fields(cls, value):
98
- return coerce_string(value, default="")
99
-
100
 
101
  class Budgets(BaseModel):
102
  max_chunks_to_read: int = 200
@@ -123,27 +107,12 @@ class Claim(BaseModel):
123
  dependencies: list[str] = Field(default_factory=list)
124
  evidence: Evidence | None = None
125
 
126
- @field_validator("claim_id", "text", mode="before")
127
- @classmethod
128
- def _normalize_claim_text_fields(cls, value):
129
- return coerce_string(value, default="")
130
-
131
- @field_validator("numbers", "entities", "dependencies", mode="before")
132
- @classmethod
133
- def _normalize_claim_lists(cls, value):
134
- return coerce_string_list(value)
135
-
136
 
137
  class MathItem(BaseModel):
138
  expression: str
139
  interpretation: str = ""
140
  evidence: Evidence | None = None
141
 
142
- @field_validator("expression", "interpretation", mode="before")
143
- @classmethod
144
- def _normalize_math_fields(cls, value):
145
- return coerce_string(value, default="")
146
-
147
 
148
  class TableItem(BaseModel):
149
  caption: str = ""
@@ -151,35 +120,12 @@ class TableItem(BaseModel):
151
  rows: list[list[str]] = Field(default_factory=list)
152
  evidence: Evidence | None = None
153
 
154
- @field_validator("caption", mode="before")
155
- @classmethod
156
- def _normalize_table_caption(cls, value):
157
- return coerce_string(value, default="")
158
-
159
- @field_validator("headers", mode="before")
160
- @classmethod
161
- def _normalize_table_headers(cls, value):
162
- return coerce_string_list(value)
163
-
164
- @field_validator("rows", mode="before")
165
- @classmethod
166
- def _normalize_table_rows(cls, value):
167
- rows = []
168
- for row in ensure_list(value):
169
- rows.append(coerce_string_list(row))
170
- return [row for row in rows if row]
171
-
172
 
173
  class FigureItem(BaseModel):
174
  caption: str = ""
175
  description: str = ""
176
  evidence: Evidence | None = None
177
 
178
- @field_validator("caption", "description", mode="before")
179
- @classmethod
180
- def _normalize_figure_fields(cls, value):
181
- return coerce_string(value, default="")
182
-
183
 
184
  class CodeItem(BaseModel):
185
  language: str = ""
@@ -187,11 +133,6 @@ class CodeItem(BaseModel):
187
  description: str = ""
188
  evidence: Evidence | None = None
189
 
190
- @field_validator("language", "snippet", "description", mode="before")
191
- @classmethod
192
- def _normalize_code_fields(cls, value):
193
- return coerce_string(value, default="")
194
-
195
 
196
  class ExtractedContent(BaseModel):
197
  claims: list[Claim] = Field(default_factory=list)
@@ -202,11 +143,6 @@ class ExtractedContent(BaseModel):
202
  code: list[CodeItem] = Field(default_factory=list)
203
  chunk_summary: str = ""
204
 
205
- @field_validator("chunk_summary", mode="before")
206
- @classmethod
207
- def _normalize_chunk_summary(cls, value):
208
- return coerce_string(value, default="")
209
-
210
 
211
  class ExtractOutput(BaseModel):
212
  stage: str = "extract"
@@ -225,71 +161,41 @@ class SectionPoint(BaseModel):
225
  section: str
226
  points: list[str] = Field(default_factory=list)
227
 
228
- @field_validator("section", mode="before")
229
- @classmethod
230
- def _normalize_section_name(cls, value):
231
- return coerce_string(value, default="")
232
-
233
- @field_validator("points", mode="before")
234
- @classmethod
235
- def _normalize_section_points(cls, value):
236
- return coerce_string_list(value)
237
-
238
 
239
  class KeyClaim(BaseModel):
240
  claim: str
241
  support: ClaimStatus = ClaimStatus.SUPPORTED
242
  evidence_refs: list[Evidence] = Field(default_factory=list)
243
 
244
- @field_validator("claim", mode="before")
245
- @classmethod
246
- def _normalize_key_claim(cls, value):
247
- return coerce_string(value, default="")
248
-
249
 
250
  class Synthesis(BaseModel):
251
  answer_outline: list[SectionPoint] = Field(default_factory=list)
252
  key_claims: list[KeyClaim] = Field(default_factory=list)
253
  open_gaps: list[str] = Field(default_factory=list)
254
 
255
- @field_validator("open_gaps", mode="before")
256
- @classmethod
257
- def _normalize_open_gap_list(cls, value):
258
- return coerce_string_list(value)
259
-
260
 
261
  class MergeOutput(BaseModel):
262
  stage: str = "merge"
263
  synthesis: Synthesis = Field(default_factory=Synthesis)
264
 
265
 
266
- # ── S3 VERIFY ──────────────────────────────────────────────────────────────────
267
 
268
  class CheckedClaim(BaseModel):
269
  claim: str
270
  status: ClaimStatus
271
  evidence: list[Evidence] = Field(default_factory=list)
272
 
273
- @field_validator("claim", mode="before")
274
- @classmethod
275
- def _normalize_checked_claim(cls, value):
276
- return coerce_string(value, default="")
277
-
278
 
279
- class Verification(BaseModel):
280
  checked_claims: list[CheckedClaim] = Field(default_factory=list)
281
  unsupported_claims: list[str] = Field(default_factory=list)
282
  required_followups: list[str] = Field(default_factory=list)
283
 
284
- @field_validator("unsupported_claims", "required_followups", mode="before")
285
- @classmethod
286
- def _normalize_verification_lists(cls, value):
287
- return coerce_string_list(value)
288
-
289
 
290
- class VerifyOutput(BaseModel):
291
- stage: str = "verify"
292
- verification: Verification = Field(default_factory=Verification)
293
 
294
 
295
  # ── FINAL OUTPUT ───────────────────────────────────────────────────────────────
@@ -298,21 +204,11 @@ class Section(BaseModel):
298
  title: str
299
  content: str
300
 
301
- @field_validator("title", "content", mode="before")
302
- @classmethod
303
- def _normalize_section_fields(cls, value):
304
- return coerce_string(value, default="")
305
-
306
 
307
  class FinalAnswer(BaseModel):
308
  response: str
309
  sections: list[Section] = Field(default_factory=list)
310
 
311
- @field_validator("response", mode="before")
312
- @classmethod
313
- def _normalize_response(cls, value):
314
- return coerce_string(value, default="")
315
-
316
 
317
  class FinalEvidence(BaseModel):
318
  doc_id: str
@@ -321,11 +217,6 @@ class FinalEvidence(BaseModel):
321
  supports: str = ""
322
  quote: str = Field(default="", max_length=200)
323
 
324
- @field_validator("doc_id", "chunk_id", "where", "supports", "quote", mode="before")
325
- @classmethod
326
- def _normalize_final_evidence_fields(cls, value):
327
- return coerce_string(value, default="")
328
-
329
 
330
  class TraceSummary(BaseModel):
331
  real_switching: bool = False
@@ -336,16 +227,6 @@ class TraceSummary(BaseModel):
336
  search_queries: list[str] = Field(default_factory=list)
337
  budget_notes: str = ""
338
 
339
- @field_validator("models_used", "docs_opened", "search_queries", mode="before")
340
- @classmethod
341
- def _normalize_trace_lists(cls, value):
342
- return coerce_string_list(value)
343
-
344
- @field_validator("budget_notes", mode="before")
345
- @classmethod
346
- def _normalize_budget_notes(cls, value):
347
- return coerce_string(value, default="")
348
-
349
 
350
  class FinalOutput(BaseModel):
351
  final_answer: FinalAnswer = Field(default_factory=FinalAnswer)
@@ -356,11 +237,6 @@ class FinalOutput(BaseModel):
356
  next_actions: list[str] = Field(default_factory=list)
357
  bus_messages: list[dict] = Field(default_factory=list)
358
 
359
- @field_validator("missing_info", "next_actions", mode="before")
360
- @classmethod
361
- def _normalize_final_output_lists(cls, value):
362
- return coerce_string_list(value)
363
-
364
 
365
  # ── Helpers ────────────────────────────────────────────────────────────────────
366
 
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/models.py — Pydantic schemas for all 4 pipeline stages + final output.
4
 
 
11
  from enum import Enum
12
  from typing import Optional
13
 
14
+ from pydantic import BaseModel, Field
 
 
15
 
16
 
17
  # ── Enums ──────────────────────────────────────────────────────────────────────
 
64
  where: str = ""
65
  quote: str = Field(default="", max_length=200)
66
 
 
 
 
 
 
67
 
68
  # ── S0 ROUTE ───────────────────────────────────────────────────────────────────
69
 
 
71
  doc_id: str
72
  reason: str
73
 
 
 
 
 
 
74
 
75
  class ChunkPlan(BaseModel):
76
  doc_id: str
 
81
  priority: Priority = Priority.MEDIUM
82
  task: str = ""
83
 
 
 
 
 
 
84
 
85
  class Budgets(BaseModel):
86
  max_chunks_to_read: int = 200
 
107
  dependencies: list[str] = Field(default_factory=list)
108
  evidence: Evidence | None = None
109
 
 
 
 
 
 
 
 
 
 
 
110
 
111
  class MathItem(BaseModel):
112
  expression: str
113
  interpretation: str = ""
114
  evidence: Evidence | None = None
115
 
 
 
 
 
 
116
 
117
  class TableItem(BaseModel):
118
  caption: str = ""
 
120
  rows: list[list[str]] = Field(default_factory=list)
121
  evidence: Evidence | None = None
122
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
123
 
124
  class FigureItem(BaseModel):
125
  caption: str = ""
126
  description: str = ""
127
  evidence: Evidence | None = None
128
 
 
 
 
 
 
129
 
130
  class CodeItem(BaseModel):
131
  language: str = ""
 
133
  description: str = ""
134
  evidence: Evidence | None = None
135
 
 
 
 
 
 
136
 
137
  class ExtractedContent(BaseModel):
138
  claims: list[Claim] = Field(default_factory=list)
 
143
  code: list[CodeItem] = Field(default_factory=list)
144
  chunk_summary: str = ""
145
 
 
 
 
 
 
146
 
147
  class ExtractOutput(BaseModel):
148
  stage: str = "extract"
 
161
  section: str
162
  points: list[str] = Field(default_factory=list)
163
 
 
 
 
 
 
 
 
 
 
 
164
 
165
  class KeyClaim(BaseModel):
166
  claim: str
167
  support: ClaimStatus = ClaimStatus.SUPPORTED
168
  evidence_refs: list[Evidence] = Field(default_factory=list)
169
 
 
 
 
 
 
170
 
171
  class Synthesis(BaseModel):
172
  answer_outline: list[SectionPoint] = Field(default_factory=list)
173
  key_claims: list[KeyClaim] = Field(default_factory=list)
174
  open_gaps: list[str] = Field(default_factory=list)
175
 
 
 
 
 
 
176
 
177
  class MergeOutput(BaseModel):
178
  stage: str = "merge"
179
  synthesis: Synthesis = Field(default_factory=Synthesis)
180
 
181
 
182
+ # ── S3 EvidenceCheck ──────────────────────────────────────────────────────────────────
183
 
184
  class CheckedClaim(BaseModel):
185
  claim: str
186
  status: ClaimStatus
187
  evidence: list[Evidence] = Field(default_factory=list)
188
 
 
 
 
 
 
189
 
190
+ class EvidenceCheck(BaseModel):
191
  checked_claims: list[CheckedClaim] = Field(default_factory=list)
192
  unsupported_claims: list[str] = Field(default_factory=list)
193
  required_followups: list[str] = Field(default_factory=list)
194
 
 
 
 
 
 
195
 
196
+ class EvidenceCheckOutput(BaseModel):
197
+ stage: str = "evidence_check"
198
+ evidence_check: EvidenceCheck = Field(default_factory=EvidenceCheck)
199
 
200
 
201
  # ── FINAL OUTPUT ───────────────────────────────────────────────────────────────
 
204
  title: str
205
  content: str
206
 
 
 
 
 
 
207
 
208
  class FinalAnswer(BaseModel):
209
  response: str
210
  sections: list[Section] = Field(default_factory=list)
211
 
 
 
 
 
 
212
 
213
  class FinalEvidence(BaseModel):
214
  doc_id: str
 
217
  supports: str = ""
218
  quote: str = Field(default="", max_length=200)
219
 
 
 
 
 
 
220
 
221
  class TraceSummary(BaseModel):
222
  real_switching: bool = False
 
227
  search_queries: list[str] = Field(default_factory=list)
228
  budget_notes: str = ""
229
 
 
 
 
 
 
 
 
 
 
 
230
 
231
  class FinalOutput(BaseModel):
232
  final_answer: FinalAnswer = Field(default_factory=FinalAnswer)
 
237
  next_actions: list[str] = Field(default_factory=list)
238
  bus_messages: list[dict] = Field(default_factory=list)
239
 
 
 
 
 
 
240
 
241
  # ── Helpers ────────────────────────────────────────────────────────────────────
242
 
mp1/pluto/modes.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/modes.py — Real mode switching engine.
3
 
@@ -156,15 +157,78 @@ def _build_registry() -> dict[str, ModeConfig]:
156
  provider="groq",
157
  ),
158
  }
159
- else:
160
- raise EnvironmentError("Neither NVIDIA_API_KEY nor GROQ_API_KEY is set.")
161
 
162
 
163
  MODE_REGISTRY: dict[str, ModeConfig] = _build_registry()
164
 
165
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
166
  def is_real_switching() -> bool:
167
  """True if MODE_QUICK and MODE_REASONING use DIFFERENT model_ids."""
 
 
 
 
168
  quick = MODE_REGISTRY["MODE_QUICK"].model_id
169
  reasoning = MODE_REGISTRY["MODE_REASONING"].model_id
170
  return quick != reasoning
@@ -174,4 +238,12 @@ def get_mode(mode_name: str) -> ModeConfig:
174
  """Look up a mode config by name."""
175
  if mode_name not in MODE_REGISTRY:
176
  raise ValueError(f"Unknown mode: {mode_name}. Valid: {list(MODE_REGISTRY)}")
177
- return MODE_REGISTRY[mode_name]
 
 
 
 
 
 
 
 
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/modes.py — Real mode switching engine.
4
 
 
157
  provider="groq",
158
  ),
159
  }
+    return _build_unconfigured_registry()
+
+
+def _build_unconfigured_registry() -> dict[str, ModeConfig]:
+    """Return placeholder modes so imports work without provider credentials."""
+    return {
+        "MODE_QUICK": ModeConfig(
+            mode_name="MODE_QUICK",
+            model_id="unconfigured/MODE_QUICK",
+            temperature=0.1,
+            max_tokens=1024,
+            compute_profile="unconfigured",
+            provider="unconfigured",
+        ),
+        "MODE_REASONING": ModeConfig(
+            mode_name="MODE_REASONING",
+            model_id="unconfigured/MODE_REASONING",
+            temperature=0.3,
+            max_tokens=4096,
+            compute_profile="unconfigured",
+            provider="unconfigured",
+        ),
+        "MODE_VISION": ModeConfig(
+            mode_name="MODE_VISION",
+            model_id="unconfigured/MODE_VISION",
+            temperature=0.1,
+            max_tokens=4096,
+            compute_profile="unconfigured",
+            provider="unconfigured",
+        ),
+        "MODE_ULTRA": ModeConfig(
+            mode_name="MODE_ULTRA",
+            model_id="unconfigured/MODE_ULTRA",
+            temperature=0.2,
+            max_tokens=4096,
+            compute_profile="unconfigured",
+            provider="unconfigured",
+        ),
+        "MODE_GEMINI": ModeConfig(
+            mode_name="MODE_GEMINI",
+            model_id="unconfigured/MODE_GEMINI",
+            temperature=0.0,
+            max_tokens=4096,
+            compute_profile="unconfigured",
+            provider="unconfigured",
+        ),
+    }
 
 
 MODE_REGISTRY: dict[str, ModeConfig] = _build_registry()
 
 
+def _missing_provider_error() -> EnvironmentError:
+    return EnvironmentError("Neither NVIDIA_API_KEY nor GROQ_API_KEY is set.")
+
+
+def _is_unconfigured() -> bool:
+    return any(mode.provider == "unconfigured" for mode in MODE_REGISTRY.values())
+
+
+def _refresh_mode_registry() -> None:
+    """Refresh mode config in place so imported MODE_REGISTRY references stay valid."""
+    MODE_REGISTRY.clear()
+    MODE_REGISTRY.update(_build_registry())
+
+
 def is_real_switching() -> bool:
     """True if MODE_QUICK and MODE_REASONING use DIFFERENT model_ids."""
+    if _is_unconfigured():
+        _refresh_mode_registry()
+    if _is_unconfigured():
+        return False
     quick = MODE_REGISTRY["MODE_QUICK"].model_id
     reasoning = MODE_REGISTRY["MODE_REASONING"].model_id
     return quick != reasoning
 
     """Look up a mode config by name."""
     if mode_name not in MODE_REGISTRY:
         raise ValueError(f"Unknown mode: {mode_name}. Valid: {list(MODE_REGISTRY)}")
+    mode = MODE_REGISTRY[mode_name]
+    if mode.provider == "unconfigured":
+        _refresh_mode_registry()
+        mode = MODE_REGISTRY.get(mode_name)
+        if mode is None:
+            raise ValueError(f"Unknown mode: {mode_name}. Valid: {list(MODE_REGISTRY)}")
+        if mode.provider == "unconfigured":
+            raise _missing_provider_error()
+    return mode
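
The change above defers the missing-credentials error from import time to first use and refreshes the registry in place. A minimal, self-contained sketch of that pattern follows. It is not the project's code: the trimmed `ModeConfig`, the `build_registry` helper, and the `DEMO_API_KEY` variable are stand-ins invented for illustration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    # Trimmed stand-in for the real ModeConfig; only the fields used here.
    mode_name: str
    model_id: str
    provider: str


def build_registry() -> dict[str, ModeConfig]:
    # DEMO_API_KEY is a made-up variable for this sketch.
    provider = "demo" if os.environ.get("DEMO_API_KEY") else "unconfigured"
    return {"MODE_QUICK": ModeConfig("MODE_QUICK", f"{provider}/quick", provider)}


# Importing this module never raises, even without credentials.
MODE_REGISTRY: dict[str, ModeConfig] = build_registry()


def get_mode(mode_name: str) -> ModeConfig:
    if mode_name not in MODE_REGISTRY:
        raise ValueError(f"Unknown mode: {mode_name}")
    mode = MODE_REGISTRY[mode_name]
    if mode.provider == "unconfigured":
        # Mutate the same dict so modules that imported MODE_REGISTRY see the update.
        MODE_REGISTRY.clear()
        MODE_REGISTRY.update(build_registry())
        mode = MODE_REGISTRY[mode_name]
        if mode.provider == "unconfigured":
            raise EnvironmentError("DEMO_API_KEY is not set.")
    return mode
```

Clearing and updating the existing dict, rather than rebinding the name, is what keeps earlier `from module import MODE_REGISTRY` references valid.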
mp1/pluto/pipeline.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/pipeline.py - Orchestrator for document understanding and query answering.
3
 
@@ -23,7 +24,7 @@ from pluto.modes import is_real_switching
23
  from pluto.stages.extract import run_extract
24
  from pluto.stages.merge import run_merge
25
  from pluto.stages.route import run_route
26
- from pluto.stages.verify import run_verify
27
  from pluto.tools import CorpusTools
28
  from pluto.tracer import Tracer
29
 
@@ -31,9 +32,16 @@ from pluto.tracer import Tracer
31
  class PipelineRunner:
32
  """Two-phase pipeline: understand documents, then answer queries."""
33
 
34
- def __init__(self, corpus_dir: str, output_dir: str = "./output", doc_index=None) -> None:
 
 
 
 
 
 
35
  self.tracer = Tracer()
36
  self.doc_index = doc_index
 
37
  self.tools = CorpusTools(corpus_dir, output_dir, self.tracer, doc_index=doc_index)
38
  self.cache = ExtractionCache(corpus_dir)
39
  self._progress_callback: Any = None
@@ -72,9 +80,11 @@ class PipelineRunner:
72
 
73
  self._ensure_docs_understood(selected_doc_ids=selected_doc_ids)
74
 
 
 
75
  self._emit("route", {"status": "running", "query": query})
76
  route_out = run_route(
77
- query,
78
  self.tools,
79
  self.tracer,
80
  bus=self.bus,
@@ -137,22 +147,22 @@ class PipelineRunner:
137
  },
138
  )
139
 
140
- self._emit("verify", {"status": "running"})
141
- verify_out = run_verify(merge_out, extractions, self.tracer, bus=self.bus)
142
  self._emit(
143
- "verify",
144
  {
145
  "status": "complete",
146
- "checked": len(verify_out.verification.checked_claims),
147
- "unsupported": len(verify_out.verification.unsupported_claims),
148
- "gaps": len(verify_out.verification.required_followups),
149
  },
150
  )
151
 
152
  final = self._build_final(
153
  query,
154
  merge_out,
155
- verify_out,
156
  extractions,
157
  overview=overview,
158
  bus=self.bus,
@@ -187,7 +197,7 @@ class PipelineRunner:
187
  self,
188
  query,
189
  merge_out,
190
- verify_out,
191
  extractions,
192
  overview="",
193
  bus: MessageBus | None = None,
@@ -203,12 +213,12 @@ class PipelineRunner:
203
  sections.append(Section(title=section_point.section, content=content))
204
 
205
  section_parts = [f"**{section.title}**\n{section.content}" for section in sections if section.content]
206
- verified_claims = [
207
  checked
208
- for checked in verify_out.verification.checked_claims
209
  if checked.status == ClaimStatus.SUPPORTED
210
  ]
211
- claim_parts = [checked.claim for checked in verified_claims]
212
 
213
  if section_parts:
214
  response = "\n\n".join(section_parts)
@@ -232,15 +242,15 @@ class PipelineRunner:
232
  )
233
  )
234
 
235
- total = len(verify_out.verification.checked_claims)
236
  supported = sum(
237
  1
238
- for checked in verify_out.verification.checked_claims
239
  if checked.status == ClaimStatus.SUPPORTED
240
  )
241
  uncertain = sum(
242
  1
243
- for checked in verify_out.verification.checked_claims
244
  if checked.status == ClaimStatus.UNCERTAIN
245
  )
246
 
@@ -269,8 +279,8 @@ class PipelineRunner:
269
  evidence=evidence,
270
  trace_summary=trace,
271
  confidence=confidence,
272
- missing_info=merge_out.synthesis.open_gaps + verify_out.verification.required_followups,
273
- next_actions=verify_out.verification.required_followups,
274
  bus_messages=bus_messages,
275
  )
276
 
@@ -289,3 +299,24 @@ def _normalize_selected_doc_ids(selected_doc_ids: list[str] | None) -> list[str]
289
 
290
  def _normalize_detail_level(detail_level: str | None) -> str:
291
  return "detailed" if str(detail_level or "").strip().lower() == "detailed" else "standard"
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/pipeline.py - Orchestrator for document understanding and query answering.
4
 
 
24
  from pluto.stages.extract import run_extract
25
  from pluto.stages.merge import run_merge
26
  from pluto.stages.route import run_route
27
+ from pluto.stages.evidence_check import run_evidence_check
28
  from pluto.tools import CorpusTools
29
  from pluto.tracer import Tracer
30
 
 
32
  class PipelineRunner:
33
  """Two-phase pipeline: understand documents, then answer queries."""
34
 
35
+ def __init__(
36
+ self,
37
+ corpus_dir: str,
38
+ output_dir: str = "./output",
39
+ doc_index=None,
40
+ prior_session_context: list[dict] | None = None,
41
+ ) -> None:
42
  self.tracer = Tracer()
43
  self.doc_index = doc_index
44
+ self.prior_session_context = prior_session_context or []
45
  self.tools = CorpusTools(corpus_dir, output_dir, self.tracer, doc_index=doc_index)
46
  self.cache = ExtractionCache(corpus_dir)
47
  self._progress_callback: Any = None
 
80
 
81
  self._ensure_docs_understood(selected_doc_ids=selected_doc_ids)
82
 
83
+ route_query = _prepend_prior_session_context(query, self.prior_session_context)
84
+
85
  self._emit("route", {"status": "running", "query": query})
86
  route_out = run_route(
87
+ route_query,
88
  self.tools,
89
  self.tracer,
90
  bus=self.bus,
 
147
  },
148
  )
149
 
150
+ self._emit("evidence_check", {"status": "running"})
151
+ evidence_check_out = run_evidence_check(merge_out, extractions, self.tracer, bus=self.bus)
152
  self._emit(
153
+ "evidence_check",
154
  {
155
  "status": "complete",
156
+ "checked": len(evidence_check_out.evidence_check.checked_claims),
157
+ "unsupported": len(evidence_check_out.evidence_check.unsupported_claims),
158
+ "gaps": len(evidence_check_out.evidence_check.required_followups),
159
  },
160
  )
161
 
162
  final = self._build_final(
163
  query,
164
  merge_out,
165
+ evidence_check_out,
166
  extractions,
167
  overview=overview,
168
  bus=self.bus,
 
197
  self,
198
  query,
199
  merge_out,
200
+ evidence_check_out,
201
  extractions,
202
  overview="",
203
  bus: MessageBus | None = None,
 
213
  sections.append(Section(title=section_point.section, content=content))
214
 
215
  section_parts = [f"**{section.title}**\n{section.content}" for section in sections if section.content]
216
+ supported_checked_claims = [
217
  checked
218
+ for checked in evidence_check_out.evidence_check.checked_claims
219
  if checked.status == ClaimStatus.SUPPORTED
220
  ]
221
+ claim_parts = [checked.claim for checked in supported_checked_claims]
222
 
223
  if section_parts:
224
  response = "\n\n".join(section_parts)
 
242
  )
243
  )
244
 
245
+ total = len(evidence_check_out.evidence_check.checked_claims)
246
  supported = sum(
247
  1
248
+ for checked in evidence_check_out.evidence_check.checked_claims
249
  if checked.status == ClaimStatus.SUPPORTED
250
  )
251
  uncertain = sum(
252
  1
253
+ for checked in evidence_check_out.evidence_check.checked_claims
254
  if checked.status == ClaimStatus.UNCERTAIN
255
  )
256
 
 
279
  evidence=evidence,
280
  trace_summary=trace,
281
  confidence=confidence,
282
+ missing_info=merge_out.synthesis.open_gaps + evidence_check_out.evidence_check.required_followups,
283
+ next_actions=evidence_check_out.evidence_check.required_followups,
284
  bus_messages=bus_messages,
285
  )
286
 
 
299
 
 def _normalize_detail_level(detail_level: str | None) -> str:
     return "detailed" if str(detail_level or "").strip().lower() == "detailed" else "standard"
+
+
+def _prepend_prior_session_context(query: str, prior_session_context: list[dict]) -> str:
+    key_findings: list[str] = []
+    open_questions: list[str] = []
+    for session in prior_session_context or []:
+        key_findings.extend(str(item) for item in session.get("key_findings", []) if str(item).strip())
+        open_questions.extend(str(item) for item in session.get("open_questions", []) if str(item).strip())
+
+    if not key_findings and not open_questions:
+        return query
+
+    findings_block = "\n".join(f"- {finding}" for finding in key_findings[:10])
+    questions_block = "\n".join(f"- {question}" for question in open_questions[:10])
+    return (
+        "[Prior session findings for this document:\n"
+        f"{findings_block}\n"
+        "Open questions from prior sessions:\n"
+        f"{questions_block}]\n\n"
+        f"{query}"
+    )
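
To see what the router receives once prior-session memory is attached, the snippet below restates the helper's logic in compact form so it runs on its own, then applies it to made-up session data. The function name `prepend_context` and the sample findings are assumptions for illustration; only the `key_findings` and `open_questions` keys come from the diff.

```python
from __future__ import annotations


def prepend_context(query: str, sessions: list[dict]) -> str:
    """Compact restatement of the diff's helper: prefix the query with prior findings."""
    findings = [str(f) for s in sessions for f in s.get("key_findings", []) if str(f).strip()]
    questions = [str(q) for s in sessions for q in s.get("open_questions", []) if str(q).strip()]
    if not findings and not questions:
        return query
    findings_block = "\n".join(f"- {f}" for f in findings[:10])
    questions_block = "\n".join(f"- {q}" for q in questions[:10])
    return (
        "[Prior session findings for this document:\n"
        f"{findings_block}\n"
        "Open questions from prior sessions:\n"
        f"{questions_block}]\n\n"
        f"{query}"
    )


# Hypothetical session summary; blank entries are dropped.
sessions = [{"key_findings": ["Revenue grew 12%", "  "],
             "open_questions": ["Which region drove growth?"]}]
routed = prepend_context("What changed in Q3?", sessions)
print(routed)
```

Only the routing stage sees the prefixed text in the diff; the progress event still reports the user's original `query`.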
mp1/pluto/server.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/server.py — FastAPI server bridging pipeline <-> web UI.
3
 
@@ -17,6 +18,7 @@ import json
17
  import os
18
  import shutil
19
  import tempfile
 
20
  from pathlib import Path
21
  from typing import Any
22
 
@@ -33,8 +35,10 @@ app = FastAPI(title="Pluto Pipeline", version="1.0.0")
33
 
34
  # ── State ─────────────────────────────────────────────────────────────────────
35
 
36
- _progress_queue: asyncio.Queue = asyncio.Queue() # Always exists — reset per run
37
- _latest_result: dict | None = None
 
 
38
 
39
  FRONTEND_DIR = Path(__file__).parent.parent / "frontend"
40
  CORPUS_DIR = Path(__file__).parent.parent / "corpus"
@@ -85,6 +89,69 @@ def _json_safe(value: Any) -> Any:
85
  return jsonable_encoder(value)
86
 
87
 
88
  # ── Startup: re-index existing corpus files ─────────────────────────────────
89
 
90
  @app.on_event("startup")
@@ -142,16 +209,34 @@ async def index():
142
  @app.post("/api/run")
143
  async def run_pipeline(request: Request):
144
  """Run the full pipeline for a user query."""
145
- global _latest_result, _progress_queue
146
-
147
  body = await request.json()
148
  query = body.get("query", "")
149
  corpus_dir = body.get("corpus_dir", str(CORPUS_DIR))
150
  selected_doc_ids = _normalize_selected_doc_ids(body.get("selected_doc_ids"))
151
  detail_level = _normalize_detail_level(body.get("detail_level"))
 
 
 
 
 
 
 
 
 
 
 
152
 
153
  if not query:
154
- return JSONResponse({"error": "No query provided"}, status_code=400)
 
 
 
 
 
 
 
 
 
155
 
156
  processing_docs = _processing_docs_for_scope(_doc_index, selected_doc_ids)
157
  if processing_docs:
@@ -159,26 +244,28 @@ async def run_pipeline(request: Request):
159
  {
160
  "error": "Please wait for document understanding to finish before running a query.",
161
  "processing_docs": processing_docs,
 
162
  },
163
  status_code=409,
164
  headers={"Cache-Control": "no-store"},
165
  )
166
 
167
  # Reset queue for this run (drain any leftover events without replacing the object)
168
- while not _progress_queue.empty():
169
  try:
170
- _progress_queue.get_nowait()
171
  except asyncio.QueueEmpty:
172
  break
173
 
174
  def progress_callback(stage: str, data: dict):
175
- _progress_queue.put_nowait(_json_safe({"stage": stage, **data}))
176
 
177
  # Run pipeline in a thread to avoid blocking
178
  loop = asyncio.get_event_loop()
179
  runner = PipelineRunner(
180
  corpus_dir=corpus_dir, output_dir=str(OUTPUT_DIR),
181
  doc_index=_doc_index,
 
182
  )
183
  runner.on_progress(progress_callback)
184
 
@@ -192,17 +279,20 @@ async def run_pipeline(request: Request):
192
  detail_level=detail_level,
193
  ),
194
  )
195
- _latest_result = result.model_dump()
196
 
197
  # Include cache stats in the response
198
  cache_stats = runner.cache.stats()
199
- _latest_result["cache_hits"] = cache_stats["hits"]
200
- _latest_result["cache_misses"] = cache_stats["misses"]
 
 
 
201
 
202
  # Signal completion
203
- await _progress_queue.put({"stage": "done", "status": "complete"})
204
 
205
- return JSONResponse(_latest_result)
206
 
207
  except Exception as e:
208
  import traceback
@@ -211,31 +301,39 @@ async def run_pipeline(request: Request):
211
 
212
  # Always signal error to SSE stream
213
  try:
214
- await _progress_queue.put({"stage": "error", "status": "failed", "detail": err_msg})
 
 
215
  except Exception:
216
  pass
217
 
218
  # ALWAYS return valid JSON — never let FastAPI return HTML 500
219
  return JSONResponse(
220
- {"error": f"Pipeline error: {err_msg}"},
221
  status_code=200 # Return 200 so browser can parse the JSON body
222
  )
223
 
224
 
225
  @app.get("/api/stream")
226
- async def stream_progress():
227
  """SSE stream of pipeline progress events."""
 
228
 
229
  async def event_generator():
230
  # Wait for events from the pipeline — keep connection open
231
- while True:
232
- try:
233
- event = await asyncio.wait_for(_progress_queue.get(), timeout=120.0)
234
- yield f"data: {json.dumps(_json_safe(event))}\n\n"
235
- if event.get("stage") in ("done", "error"):
236
- break
237
- except asyncio.TimeoutError:
238
- yield f"data: {json.dumps({'stage': 'heartbeat'})}\n\n"
 
 
 
 
 
239
 
240
  return StreamingResponse(
241
  event_generator(),
@@ -249,11 +347,21 @@ async def stream_progress():
249
 
250
 
251
  @app.get("/api/result")
252
- async def get_result():
253
- """Return the latest pipeline result."""
254
- if _latest_result:
255
- return JSONResponse(_latest_result)
256
- return JSONResponse({"error": "No result yet"}, status_code=404)
 
 
 
 
 
 
 
 
 
 
257
 
258
 
259
  @app.post("/api/compare")
@@ -296,6 +404,30 @@ async def benchmark_compare(request: Request):
296
  )
297
 
298
 
299
  # ── File upload ───────────────────────────────────────────────────────────────
300
 
301
  ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".markdown"}
@@ -337,6 +469,8 @@ async def upload_files(files: list[UploadFile] = File(...)):
337
  tracer = Tracer()
338
  print(f" [SERVER] Starting background Phase A for {did}...")
339
  run_understand(did, _doc_index, tracer)
 
 
340
  print(f" [SERVER] Background Phase A COMPLETE for {did}")
341
  except BaseException as e:
342
  import traceback
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/server.py — FastAPI server bridging pipeline <-> web UI.
4
 
 
18
  import os
19
  import shutil
20
  import tempfile
21
+ from uuid import uuid4
22
  from pathlib import Path
23
  from typing import Any
24
 
 
35
 
36
  # ── State ─────────────────────────────────────────────────────────────────────
37
 
38
+ session_queues: dict[str, asyncio.Queue] = {}
39
+ session_results: dict[str, dict] = {}
40
+ session_cleanup_tasks: dict[str, asyncio.Task] = {}
41
+ SESSION_CLEANUP_DELAY_SECONDS = 300
42
 
43
  FRONTEND_DIR = Path(__file__).parent.parent / "frontend"
44
  CORPUS_DIR = Path(__file__).parent.parent / "corpus"
 
89
  return jsonable_encoder(value)
90
 
91
 
92
+ def _normalize_session_id(raw_value: Any) -> str:
93
+ session_id = str(raw_value or "").strip()
94
+ return session_id or str(uuid4())
95
+
96
+
97
+ def _get_session_queue(session_id: str) -> asyncio.Queue:
98
+ cleanup_task = session_cleanup_tasks.pop(session_id, None)
99
+ if cleanup_task:
100
+ cleanup_task.cancel()
101
+
102
+ queue = session_queues.get(session_id)
103
+ if queue is None:
104
+ queue = asyncio.Queue()
105
+ session_queues[session_id] = queue
106
+ return queue
107
+
108
+
109
+ def _schedule_session_cleanup(session_id: str, queue: asyncio.Queue) -> None:
110
+ cleanup_task = session_cleanup_tasks.pop(session_id, None)
111
+ if cleanup_task:
112
+ cleanup_task.cancel()
113
+
114
+ async def cleanup_later() -> None:
115
+ try:
116
+ await asyncio.sleep(SESSION_CLEANUP_DELAY_SECONDS)
117
+ if session_queues.get(session_id) is queue:
118
+ session_queues.pop(session_id, None)
119
+ session_results.pop(session_id, None)
120
+ except asyncio.CancelledError:
121
+ pass
122
+ finally:
123
+ if session_cleanup_tasks.get(session_id) is task:
124
+ session_cleanup_tasks.pop(session_id, None)
125
+
126
+ task = asyncio.create_task(cleanup_later())
127
+ session_cleanup_tasks[session_id] = task
128
+
129
+
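The helper above uses a cancel-and-reschedule pattern: each call cancels any pending cleanup for the session, then starts a new delayed task that removes the queue only if it is still the same object. A minimal standalone sketch of that pattern, assuming a shortened delay and hypothetical names (`queues`, `cleanup_tasks`, `schedule_cleanup`) rather than the server's actual globals:

```python
import asyncio

# Hypothetical stand-ins for the module-level dicts in server.py.
queues: dict[str, asyncio.Queue] = {}
cleanup_tasks: dict[str, asyncio.Task] = {}


def schedule_cleanup(session_id: str, queue: asyncio.Queue, delay: float) -> None:
    """Drop the session's queue after `delay` seconds unless rescheduled."""
    previous = cleanup_tasks.pop(session_id, None)
    if previous:
        previous.cancel()

    async def cleanup_later() -> None:
        try:
            await asyncio.sleep(delay)
            # Only remove the queue if it is still the one we were asked to clean.
            if queues.get(session_id) is queue:
                queues.pop(session_id, None)
        except asyncio.CancelledError:
            pass
        finally:
            if cleanup_tasks.get(session_id) is task:
                cleanup_tasks.pop(session_id, None)

    task = asyncio.create_task(cleanup_later())
    cleanup_tasks[session_id] = task


async def demo() -> tuple[bool, bool]:
    queues["a"] = asyncio.Queue()
    schedule_cleanup("a", queues["a"], delay=0.05)
    # A reconnect before the delay elapses cancels the pending cleanup.
    cleanup_tasks.pop("a").cancel()
    await asyncio.sleep(0.1)
    survived = "a" in queues

    schedule_cleanup("a", queues["a"], delay=0.05)
    await asyncio.sleep(0.1)
    removed = "a" not in queues
    return survived, removed
```

The identity check (`is queue`) matters when a session id is reused: a stale cleanup task must not delete a queue that a newer request has since created.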
130
+ def _session_doc_id(selected_doc_ids: list[str], result_data: dict | None = None) -> str:
131
+ if selected_doc_ids:
132
+ return selected_doc_ids[0]
133
+ trace = (result_data or {}).get("trace_summary", {})
134
+ docs_opened = trace.get("docs_opened", []) if isinstance(trace, dict) else []
135
+ if docs_opened:
136
+ return str(docs_opened[0])
137
+ return "corpus"
138
+
139
+
140
+ def _schedule_session_compression(session_id: str) -> None:
141
+ result_data = session_results.get(session_id)
142
+ if not result_data:
143
+ return
144
+
145
+ doc_id = str(result_data.get("doc_id") or "corpus")
146
+
147
+ async def compress_later() -> None:
148
+ from pluto.session_memory import compress_session
149
+
150
+ await asyncio.to_thread(compress_session, session_id, doc_id, result_data, CORPUS_DIR)
151
+
152
+ asyncio.create_task(compress_later())
153
+
154
+
155
  # ── Startup: re-index existing corpus files ─────────────────────────────────
156
 
157
  @app.on_event("startup")
 
209
  @app.post("/api/run")
210
  async def run_pipeline(request: Request):
211
  """Run the full pipeline for a user query."""
 
 
212
  body = await request.json()
213
  query = body.get("query", "")
214
  corpus_dir = body.get("corpus_dir", str(CORPUS_DIR))
215
  selected_doc_ids = _normalize_selected_doc_ids(body.get("selected_doc_ids"))
216
  detail_level = _normalize_detail_level(body.get("detail_level"))
217
+ session_id = _normalize_session_id(body.get("session_id"))
218
+ query_timestamp = body.get("query_timestamp")
219
+ prev_query = body.get("prev_query", "")
220
+ prev_query_timestamp = body.get("prev_query_timestamp")
221
+ prev_session_id = str(body.get("prev_session_id") or "").strip()
222
+ progress_queue = _get_session_queue(session_id)
223
+ doc_id = _session_doc_id(selected_doc_ids)
224
+ prior_session_context = []
225
+ if selected_doc_ids:
226
+ from pluto.session_memory import list_session_context
227
+ prior_session_context = list_session_context(doc_id, CORPUS_DIR)
228
 
229
  if not query:
230
+ return JSONResponse({"error": "No query provided", "session_id": session_id}, status_code=400)
231
+
232
+ _capture_behavioral_signals(
233
+ query=query,
234
+ query_timestamp=query_timestamp,
235
+ prev_query=prev_query,
236
+ prev_query_timestamp=prev_query_timestamp,
237
+ prev_session_id=prev_session_id,
238
+ fallback_session_id=session_id,
239
+ )
240
 
241
  processing_docs = _processing_docs_for_scope(_doc_index, selected_doc_ids)
242
  if processing_docs:
 
244
  {
245
  "error": "Please wait for document understanding to finish before running a query.",
246
  "processing_docs": processing_docs,
247
+ "session_id": session_id,
248
  },
249
  status_code=409,
250
  headers={"Cache-Control": "no-store"},
251
  )
252
 
253
  # Reset queue for this run (drain any leftover events without replacing the object)
254
+ while not progress_queue.empty():
255
  try:
256
+ progress_queue.get_nowait()
257
  except asyncio.QueueEmpty:
258
  break
259
 
260
  def progress_callback(stage: str, data: dict):
261
+ progress_queue.put_nowait(_json_safe({"stage": stage, **data}))
262
 
263
  # Run pipeline in a thread to avoid blocking
264
  loop = asyncio.get_event_loop()
265
  runner = PipelineRunner(
266
  corpus_dir=corpus_dir, output_dir=str(OUTPUT_DIR),
267
  doc_index=_doc_index,
268
+ prior_session_context=prior_session_context,
269
  )
270
  runner.on_progress(progress_callback)
271
 
 
279
  detail_level=detail_level,
280
  ),
281
  )
282
+ session_results[session_id] = result.model_dump()
283
 
284
  # Include cache stats in the response
285
  cache_stats = runner.cache.stats()
286
+ session_results[session_id]["cache_hits"] = cache_stats["hits"]
287
+ session_results[session_id]["cache_misses"] = cache_stats["misses"]
288
+ session_results[session_id]["session_id"] = session_id
289
+ session_results[session_id]["query"] = query
290
+ session_results[session_id]["doc_id"] = _session_doc_id(selected_doc_ids, session_results[session_id])
291
 
292
  # Signal completion
293
+ await progress_queue.put({"stage": "done", "status": "complete", "session_id": session_id})
294
 
295
+ return JSONResponse(session_results[session_id])
296
 
297
  except Exception as e:
298
  import traceback
 
301
 
302
  # Always signal error to SSE stream
303
  try:
304
+ await progress_queue.put(
305
+ {"stage": "error", "status": "failed", "detail": err_msg, "session_id": session_id}
306
+ )
307
  except Exception:
308
  pass
309
 
310
  # ALWAYS return valid JSON — never let FastAPI return HTML 500
311
  return JSONResponse(
312
+ {"error": f"Pipeline error: {err_msg}", "session_id": session_id},
313
  status_code=200 # Return 200 so browser can parse the JSON body
314
  )
315
 
316
 
317
  @app.get("/api/stream")
318
+ async def stream_progress(session_id: str):
319
  """SSE stream of pipeline progress events."""
320
+ progress_queue = _get_session_queue(session_id)
321
 
322
  async def event_generator():
323
  # Wait for events from the pipeline — keep connection open
324
+ try:
325
+ while True:
326
+ try:
327
+ event = await asyncio.wait_for(progress_queue.get(), timeout=120.0)
328
+ yield f"data: {json.dumps(_json_safe(event))}\n\n"
329
+ if event.get("stage") in ("done", "error"):
330
+ if event.get("stage") == "done":
331
+ _schedule_session_compression(session_id)
332
+ break
333
+ except asyncio.TimeoutError:
334
+ yield f"data: {json.dumps({'stage': 'heartbeat', 'session_id': session_id})}\n\n"
335
+ finally:
336
+ _schedule_session_cleanup(session_id, progress_queue)
337
 
338
  return StreamingResponse(
339
  event_generator(),
 
347
 
348
 
349
  @app.get("/api/result")
350
+ async def get_result(session_id: str):
351
+ """Return the latest pipeline result for a session."""
352
+ result = session_results.get(session_id)
353
+ if result:
354
+ return JSONResponse(result)
355
+ return JSONResponse({"error": "No result yet", "session_id": session_id}, status_code=404)
356
+
357
+
358
+ @app.get("/api/session-context/{doc_id}")
359
+ async def get_session_context(doc_id: str):
360
+ """Return recent compressed session context for a document."""
361
+ from pluto.session_memory import list_session_context
362
+
363
+ sessions = list_session_context(doc_id, CORPUS_DIR, limit=10)
364
+ return JSONResponse({"doc_id": doc_id, "sessions": sessions}, headers={"Cache-Control": "no-store"})
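The `stream_progress` generator shown earlier waits on the session queue with a timeout, emits a heartbeat frame when nothing arrives, and stops on a terminal stage. A reduced sketch of that loop without FastAPI, assuming a shortened timeout and a hypothetical `sse_events` helper:

```python
import asyncio
import json


async def sse_events(queue: asyncio.Queue, timeout: float):
    """Yield SSE `data:` frames until a terminal stage arrives; heartbeat on idle."""
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            yield f"data: {json.dumps({'stage': 'heartbeat'})}\n\n"
            continue
        yield f"data: {json.dumps(event)}\n\n"
        if event.get("stage") in ("done", "error"):
            break


async def demo() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        # Stay silent long enough to force at least one heartbeat.
        await asyncio.sleep(0.1)
        await queue.put({"stage": "done"})

    task = asyncio.create_task(producer())
    frames = [frame async for frame in sse_events(queue, timeout=0.02)]
    await task
    return frames
```

Each frame ends with a blank line, which is how a Server-Sent Events client detects the end of an event.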
365
 
366
 
367
  @app.post("/api/compare")
 
404
  )
405
 
406
 
407
+ def _capture_behavioral_signals(
408
+ query: str,
409
+ query_timestamp: Any,
410
+ prev_query: str,
411
+ prev_query_timestamp: Any,
412
+ prev_session_id: str,
413
+ fallback_session_id: str,
414
+ ) -> None:
415
+ from pluto.signal_logger import check_prior_reference, check_rephrase, log_signal, query_hash
416
+
417
+ referenced_session_id = prev_session_id or fallback_session_id
418
+
419
+ if prev_query and prev_query_timestamp is not None and query_timestamp is not None:
420
+ try:
421
+ delta_seconds = (float(query_timestamp) - float(prev_query_timestamp)) / 1000.0
422
+ except (TypeError, ValueError):
423
+ delta_seconds = -1
424
+ if check_rephrase(query, prev_query, delta_seconds):
425
+ log_signal(referenced_session_id, query_hash(prev_query), "rephrase_fail")
426
+
427
+ if check_prior_reference(query):
428
+ log_signal(referenced_session_id, query_hash(query), "prior_reference")
429
+
430
+
431
  # ── File upload ───────────────────────────────────────────────────────────────
432
 
433
  ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".markdown"}
 
469
  tracer = Tracer()
470
  print(f" [SERVER] Starting background Phase A for {did}...")
471
  run_understand(did, _doc_index, tracer)
472
+ from pluto.doc_summary import generate_doc_summary
473
+ generate_doc_summary(did, CORPUS_DIR)
474
  print(f" [SERVER] Background Phase A COMPLETE for {did}")
475
  except BaseException as e:
476
  import traceback
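In `run_pipeline` above, `progress_callback` calls `progress_queue.put_nowait(...)`, and the "Run pipeline in a thread" comment suggests the callback is invoked from a worker thread. `asyncio.Queue` is not thread-safe, so one possible hardening, which is not part of this commit, is to hand events to the event loop with `loop.call_soon_threadsafe`. A sketch under that assumption, with a hypothetical `fake_pipeline` standing in for the real runner:

```python
import asyncio


def fake_pipeline(on_progress) -> str:
    """Hypothetical stand-in for the blocking pipeline; runs on a worker thread."""
    for stage in ("route", "extract", "merge"):
        on_progress(stage, {"status": "running"})
    return "ok"


async def run_with_progress() -> list[str]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def progress_callback(stage: str, data: dict) -> None:
        # Schedule the put on the event loop thread instead of touching
        # the queue directly from the worker thread.
        loop.call_soon_threadsafe(queue.put_nowait, {"stage": stage, **data})

    result = await loop.run_in_executor(None, fake_pipeline, progress_callback)
    await queue.put({"stage": "done", "status": result})

    stages = []
    while True:
        event = await queue.get()
        stages.append(event["stage"])
        if event["stage"] == "done":
            break
    return stages
```

Because the executor also reports completion through the loop, the three progress events are queued before the awaiting coroutine resumes, so the consumer sees them in order ahead of `done`.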
mp1/pluto/session_memory.py ADDED
@@ -0,0 +1,230 @@
+ # -*- coding: utf-8 -*-
+ """
+ Session memory compression and retrieval.
+
+ PostgreSQL is initialized lazily. If it is not configured or unavailable, writes
+ fall back to local JSON files under the corpus directory.
+ """
+
+ from __future__ import annotations
+
+ from datetime import datetime, timezone
+ import json
+ import logging
+ from pathlib import Path
+ from typing import Any
+
+ from pydantic import BaseModel, Field
+
+ from pluto.db import _get_connection
+ from pluto.utils import extract_json_from_response
+
+
+ logger = logging.getLogger("pluto")
+ LOCAL_MEMORY_DIR = ".session_memory"
+ RAW_ARCHIVE_DIR = ".session_archive"
+
+
+ class CompressedSession(BaseModel):
+     session_id: str
+     doc_id: str
+     timestamp: str
+     queries_resolved: list[dict] = Field(default_factory=list)
+     key_findings: list[str] = Field(default_factory=list)
+     open_questions: list[str] = Field(default_factory=list)
+     links_to_prior_sessions: list[str] = Field(default_factory=list)
+
+
+ def compress_session(
+     session_id: str,
+     doc_id: str,
+     session_result: dict,
+     corpus_dir: str | Path,
+ ) -> CompressedSession:
+     """Compress and store a session result without raising on storage failure."""
+     corpus_path = Path(corpus_dir)
+     raw_path = _write_raw_session(corpus_path, session_id, session_result)
+
+     try:
+         raw = _call_compression_llm(session_id=session_id, doc_id=doc_id, session_result=session_result)
+         compressed = _parse_compressed_session(session_id, doc_id, raw)
+     except Exception as exc:
+         logger.warning("Session compression LLM failed for %s: %s", session_id, exc)
+         compressed = _fallback_compressed_session(session_id, doc_id, session_result)
+
+     try:
+         _store_postgres(compressed, raw_path)
+     except Exception as exc:
+         logger.warning("PostgreSQL session memory unavailable; writing local fallback: %s", exc)
+         _store_local(corpus_path, compressed)
+
+     return compressed
+
+
+ def list_session_context(
+     doc_id: str,
+     corpus_dir: str | Path,
+     limit: int = 10,
+ ) -> list[dict]:
+     """Return compressed sessions for one document, newest first."""
+     try:
+         return _list_postgres(doc_id, limit)
+     except Exception as exc:
+         logger.warning("PostgreSQL session memory unavailable; reading local fallback: %s", exc)
+         return _list_local(Path(corpus_dir), doc_id, limit)
+
+
+ def _call_compression_llm(session_id: str, doc_id: str, session_result: dict) -> str:
+     from pluto.dispatcher import dispatch
+     from pluto.modes import get_mode
+
+     get_mode("MODE_QUICK")
+     prompt = f"""Compress this QA session as JSON only.
+
+ Schema:
+ {{
+   "queries_resolved": [
+     {{"query": "...", "answer_summary": "...", "chunks_used": 0, "confidence": 0.0}}
+   ],
+   "key_findings": ["finding"],
+   "open_questions": ["question"],
+   "links_to_prior_sessions": []
+ }}
+
+ Session id: {session_id}
+ Document id: {doc_id}
+ Session result:
+ {json.dumps(session_result, ensure_ascii=False)[:14000]}
+ """
+     return dispatch("MODE_QUICK", prompt)
+
+
+ def _parse_compressed_session(session_id: str, doc_id: str, raw: str) -> CompressedSession:
+     data = json.loads(extract_json_from_response(raw))
+     return CompressedSession(
+         session_id=session_id,
+         doc_id=doc_id,
+         timestamp=_utc_now(),
+         queries_resolved=data.get("queries_resolved", []) if isinstance(data.get("queries_resolved"), list) else [],
+         key_findings=_string_list(data.get("key_findings")),
+         open_questions=_string_list(data.get("open_questions")),
+         links_to_prior_sessions=_string_list(data.get("links_to_prior_sessions")),
+     )
+
+
+ def _fallback_compressed_session(session_id: str, doc_id: str, session_result: dict) -> CompressedSession:
+     final_answer = session_result.get("final_answer", {}) if isinstance(session_result, dict) else {}
+     trace = session_result.get("trace_summary", {}) if isinstance(session_result, dict) else {}
+     query = session_result.get("query", "") if isinstance(session_result, dict) else ""
+     answer = final_answer.get("response", "") if isinstance(final_answer, dict) else ""
+     return CompressedSession(
+         session_id=session_id,
+         doc_id=doc_id,
+         timestamp=_utc_now(),
+         queries_resolved=[
+             {
+                 "query": query,
+                 "answer_summary": str(answer)[:500],
+                 "chunks_used": trace.get("chunks_processed", 0) if isinstance(trace, dict) else 0,
+                 "confidence": session_result.get("confidence", 0.0) if isinstance(session_result, dict) else 0.0,
+             }
+         ],
+         key_findings=[],
+         open_questions=session_result.get("missing_info", []) if isinstance(session_result, dict) else [],
+         links_to_prior_sessions=[],
+     )
+
+
+ def _store_postgres(compressed: CompressedSession, raw_path: str) -> None:
+     conn = _get_connection()
+     try:
+         with conn.cursor() as cur:
+             cur.execute(
+                 """
+                 INSERT INTO session_memory (session_id, doc_id, compressed_json, raw_path)
+                 VALUES (%s, %s, %s::jsonb, %s)
+                 ON CONFLICT (session_id) DO UPDATE SET
+                     doc_id = EXCLUDED.doc_id,
+                     compressed_json = EXCLUDED.compressed_json,
+                     raw_path = EXCLUDED.raw_path
+                 """,
+                 (
+                     compressed.session_id,
+                     compressed.doc_id,
+                     json.dumps(compressed.model_dump(), ensure_ascii=False),
+                     raw_path,
+                 ),
+             )
+         conn.commit()
+     finally:
+         conn.close()
+
+
+ def _list_postgres(doc_id: str, limit: int) -> list[dict]:
+     conn = _get_connection()
+     try:
+         with conn.cursor() as cur:
+             cur.execute(
+                 """
+                 SELECT compressed_json
+                 FROM session_memory
+                 WHERE doc_id = %s
+                 ORDER BY created_at DESC
+                 LIMIT %s
+                 """,
+                 (doc_id, limit),
+             )
+             rows = cur.fetchall()
+     finally:
+         conn.close()
+
+     results = []
+     for row in rows:
+         value = row[0]
+         if isinstance(value, str):
+             value = json.loads(value)
+         results.append(value)
+     return results
+
+
+ def _store_local(corpus_dir: Path, compressed: CompressedSession) -> None:
+     memory_dir = corpus_dir / LOCAL_MEMORY_DIR
+     memory_dir.mkdir(parents=True, exist_ok=True)
+     path = memory_dir / f"{compressed.session_id}.json"
+     path.write_text(json.dumps(compressed.model_dump(), ensure_ascii=False, indent=1), encoding="utf-8")
+
+
+ def _list_local(corpus_dir: Path, doc_id: str, limit: int) -> list[dict]:
+     memory_dir = corpus_dir / LOCAL_MEMORY_DIR
+     if not memory_dir.exists():
+         return []
+
+     sessions = []
+     for path in memory_dir.glob("*.json"):
+         try:
+             data = json.loads(path.read_text(encoding="utf-8"))
+         except Exception:
+             continue
+         if data.get("doc_id") == doc_id:
+             sessions.append(data)
+
+     sessions.sort(key=lambda item: item.get("timestamp", ""), reverse=True)
+     return sessions[:limit]
+
+
+ def _write_raw_session(corpus_dir: Path, session_id: str, session_result: dict) -> str:
+     archive_dir = corpus_dir / RAW_ARCHIVE_DIR
+     archive_dir.mkdir(parents=True, exist_ok=True)
+     path = archive_dir / f"{session_id}.json"
+     path.write_text(json.dumps(session_result, ensure_ascii=False, indent=1), encoding="utf-8")
+     return str(path)
+
+
+ def _string_list(value: Any) -> list[str]:
+     if not isinstance(value, list):
+         return []
+     return [str(item) for item in value if str(item).strip()]
+
+
+ def _utc_now() -> str:
+     return datetime.now(timezone.utc).isoformat()
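The storage path in `session_memory.py` tries PostgreSQL first and falls back to per-session JSON files, and the local listing relies on ISO-8601 timestamps sorting correctly as strings. A self-contained sketch of that fallback, assuming a hypothetical `unavailable_store` in place of the database and illustrative helper names that are not the module's own:

```python
import json
import tempfile
from pathlib import Path


def store_with_fallback(record: dict, primary_store, fallback_dir: Path) -> str:
    """Try the primary store first; on any failure write a local JSON file."""
    try:
        primary_store(record)
        return "primary"
    except Exception:
        fallback_dir.mkdir(parents=True, exist_ok=True)
        path = fallback_dir / f"{record['session_id']}.json"
        path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
        return "local"


def list_local(fallback_dir: Path, doc_id: str, limit: int = 10) -> list[dict]:
    """Return records for one document, newest first, skipping unreadable files."""
    records = []
    for path in fallback_dir.glob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue
        if data.get("doc_id") == doc_id:
            records.append(data)
    # ISO-8601 UTC timestamps in one format sort chronologically as strings.
    records.sort(key=lambda item: item.get("timestamp", ""), reverse=True)
    return records[:limit]


def unavailable_store(record: dict) -> None:
    """Hypothetical primary store that is always down."""
    raise ConnectionError("database not configured")


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        memory_dir = Path(tmp) / ".session_memory"
        for sid, ts in (("s1", "2024-01-01T00:00:00+00:00"), ("s2", "2024-01-02T00:00:00+00:00")):
            store_with_fallback(
                {"session_id": sid, "doc_id": "doc-a", "timestamp": ts},
                unavailable_store,
                memory_dir,
            )
        return [item["session_id"] for item in list_local(memory_dir, "doc-a")]
```

The string sort only stays chronological while every timestamp uses the same offset and precision, which holds when all of them come from a single UTC helper such as `_utc_now()`.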
mp1/pluto/signal_logger.py ADDED
@@ -0,0 +1,93 @@
+ # -*- coding: utf-8 -*-
+ """
+ Behavioral response signal capture.
+
+ This module is lazy: importing it does not require PostgreSQL, embeddings, or
+ provider credentials. Missing resources cause the specific signal operation to
+ skip or log a warning rather than crashing request handling.
+ """
+
+ from __future__ import annotations
+
+ import hashlib
+ import logging
+
+ from pluto.db import _get_connection
+
+
+ logger = logging.getLogger("pluto")
+ REPHRASE_SECONDS = 90
+ REPHRASE_SIMILARITY = 0.75
+ PRIOR_REFERENCE_PHRASES = (
+     "as you said",
+     "earlier you mentioned",
+     "based on your answer",
+     "you mentioned",
+     "from your previous",
+ )
+
+
+ def query_hash(query: str) -> str:
+     return hashlib.sha256(str(query or "").encode("utf-8")).hexdigest()
+
+
+ def log_signal(session_id: str, query_hash: str, signal_type: str) -> None:
+     """Write one response signal row if PostgreSQL is available."""
+     if not session_id or not query_hash or not signal_type:
+         return
+
+     try:
+         conn = _get_connection()
+         try:
+             with conn.cursor() as cur:
+                 cur.execute(
+                     """
+                     INSERT INTO response_signals (session_id, query_hash, signal_type)
+                     VALUES (%s, %s, %s)
+                     """,
+                     (session_id, query_hash, signal_type),
+                 )
+             conn.commit()
+         finally:
+             conn.close()
+     except Exception as exc:
+         logger.warning("Failed to log response signal %s for %s: %s", signal_type, session_id, exc)
+
+
+ def check_rephrase(current_query: str, prev_query: str, time_delta_seconds: float) -> bool:
+     """Return True when a near-repeat query arrives soon after a prior response."""
+     if not current_query or not prev_query:
+         return False
+     if time_delta_seconds < 0 or time_delta_seconds > REPHRASE_SECONDS:
+         return False
+
+     try:
+         current_embedding = _embed_query(current_query)
+         prev_embedding = _embed_query(prev_query)
+         return _cosine_similarity(current_embedding, prev_embedding) > REPHRASE_SIMILARITY
+     except Exception:
+         return False
+
+
+ def check_prior_reference(query: str) -> bool:
+     """Return True when the query explicitly refers to an earlier answer."""
+     lowered = str(query or "").lower()
+     return any(phrase in lowered for phrase in PRIOR_REFERENCE_PHRASES)
+
+
+ def _embed_query(query: str) -> list[float]:
+     from pluto.embedder import embed_texts
+
+     embeddings = embed_texts([query])
+     return embeddings[0] if embeddings else []
+
+
+ def _cosine_similarity(a: list[float], b: list[float]) -> float:
+     if not a or not b:
+         return 0.0
+     dot = sum(x * y for x, y in zip(a, b))
+     mag_a = sum(x * x for x in a) ** 0.5
+     mag_b = sum(y * y for y in b) ** 0.5
+     if mag_a == 0 or mag_b == 0:
+         return 0.0
+     return dot / (mag_a * mag_b)
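The rephrase signal combines a time window (`REPHRASE_SECONDS = 90`) with a cosine-similarity threshold (`REPHRASE_SIMILARITY = 0.75`) over query embeddings. The sketch below reproduces that decision with a hypothetical bag-of-letters `toy_embed`, since the real `pluto.embedder.embed_texts` is not shown here; only the two constants and the cosine formula come from the module.

```python
REPHRASE_SECONDS = 90
REPHRASE_SIMILARITY = 0.75


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 for empty or zero vectors."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = sum(x * x for x in a) ** 0.5
    mag_b = sum(y * y for y in b) ** 0.5
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / (mag_a * mag_b)


def toy_embed(text: str) -> list[float]:
    """Hypothetical bag-of-letters embedding; a stand-in for a real embedder."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def is_rephrase(current: str, previous: str, delta_seconds: float) -> bool:
    """True when a similar query arrives within the rephrase window."""
    if not current or not previous:
        return False
    if delta_seconds < 0 or delta_seconds > REPHRASE_SECONDS:
        return False
    return cosine_similarity(toy_embed(current), toy_embed(previous)) > REPHRASE_SIMILARITY
```

In the server, the window input is derived from client-side millisecond timestamps as `(query_timestamp - prev_query_timestamp) / 1000.0`, and a negative value (used when parsing fails) short-circuits the check.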
mp1/pluto/stages/__init__.py CHANGED
@@ -1 +1,2 @@
1
- """Pipeline stages: route → extract → merge → verify."""
 
 
1
+ # -*- coding: utf-8 -*-
2
+ """Pipeline stages: route -> extract -> merge -> evidence_check."""
mp1/pluto/stages/{verify.py → evidence_check.py} RENAMED
@@ -1,7 +1,8 @@
 
1
  """
2
- pluto/stages/verify.py — S3 VERIFY stage.
3
 
4
- Cross-checks merged claims against extracted evidence. The verifier now ranks
5
  candidate evidence first, uses direct support when the match is obvious, and
6
  falls back to an LLM only for ambiguous cases. This improves both speed and
7
  confidence stability.
@@ -20,11 +21,11 @@ from pluto.models import (
20
  Evidence,
21
  ExtractOutput,
22
  MergeOutput,
23
- Verification,
24
- VerifyOutput,
25
  )
26
  from pluto.tracer import Tracer
27
- from pluto.utils import coerce_string, ensure_list, extract_json_from_response, pair_string_lists
28
 
29
  DIRECT_SUPPORT_THRESHOLD = 0.72
30
  LLM_CHECK_THRESHOLD = 0.18
@@ -95,14 +96,14 @@ METHOD_HINTS = {
95
  }
96
 
97
 
98
- def run_verify(
99
  merge_output: MergeOutput,
100
  extractions: list[ExtractOutput],
101
  tracer: Tracer,
102
  bus: MessageBus | None = None,
103
- ) -> VerifyOutput:
104
- """S3 — Verify: cross-check merged claims against extraction evidence."""
105
- tracer.log("stage_start", {"stage": "verify"})
106
 
107
  claims_to_check = [kc.claim.strip() for kc in merge_output.synthesis.key_claims if kc.claim.strip()]
108
  evidence_pool = _build_evidence_pool(extractions)
@@ -123,16 +124,16 @@ def run_verify(
123
  else:
124
  shortlisted = candidates[:MAX_EVIDENCE_CANDIDATES]
125
  if top["score"] >= LLM_CHECK_THRESHOLD:
126
- prompt = _VERIFY_PROMPT.format(
127
  claims_json=json.dumps([{"claim": claim}], indent=1),
128
  evidence_json=json.dumps([_prompt_evidence(item) for item in shortlisted], indent=1),
129
  )
130
 
131
- verdict = _parse_verify_json(dispatch("MODE_QUICK", prompt, tracer=tracer))
132
  status, evidence = _extract_single_verdict(verdict, shortlisted)
133
 
134
  if status == ClaimStatus.UNCERTAIN and top["score"] >= UNCERTAIN_THRESHOLD:
135
- verdict = _parse_verify_json(dispatch("MODE_REASONING", prompt, tracer=tracer))
136
  status, evidence = _extract_single_verdict(verdict, shortlisted)
137
 
138
  if status is not None:
@@ -157,10 +158,10 @@ def run_verify(
157
  if _should_generate_followups(checked_results):
158
  gaps = _build_followups(unsupported)
159
  if bus and gaps:
160
- bus.post("verifier", "gap_report", {"gaps": gaps})
161
 
162
- result = VerifyOutput(
163
- verification=Verification(
164
  checked_claims=checked_results,
165
  unsupported_claims=[item.claim for item in checked_results if item.status == ClaimStatus.UNSUPPORTED],
166
  required_followups=gaps,
@@ -170,7 +171,7 @@ def run_verify(
170
  tracer.log(
171
  "stage_complete",
172
  {
173
- "stage": "verify",
174
  "checked": len(checked_results),
175
  "supported": sum(1 for item in checked_results if item.status == ClaimStatus.SUPPORTED),
176
  "uncertain": sum(1 for item in checked_results if item.status == ClaimStatus.UNCERTAIN),
@@ -179,7 +180,7 @@ def run_verify(
179
  return result
180
 
181
 
182
- _VERIFY_PROMPT = """You are an evidence verification engine. Check each claim below against the source evidence provided.
183
 
184
  For EACH claim, determine if it is:
185
  - "supported": the evidence directly or clearly supports the same factual meaning, even if phrased as a paraphrase
@@ -306,14 +307,24 @@ def _extract_single_verdict(v_data: dict, candidates: list[dict]) -> tuple[Claim
306
  except ValueError:
307
  return None, []
308
 
309
- evidence = _parse_evidence_items(item)
310
- if not evidence and candidates and status != ClaimStatus.UNSUPPORTED:
311
  evidence.append(_candidate_to_evidence(candidates[0]))
312
 
313
  return status, evidence
314
 
315
 
316
- def _parse_verify_json(raw: str) -> dict:
317
  try:
318
  return json.loads(extract_json_from_response(raw))
319
  except Exception:
@@ -326,12 +337,12 @@ def _parse_verify_json(raw: str) -> dict:
326
  return {}
327
 
328
 
329
- def _parse_verify(raw: str) -> VerifyOutput:
330
- """Backward-compatible parser for verifier dumps used by local tests/tools."""
331
- data = _parse_verify_json(raw)
332
 
333
  checked_claims = []
334
- for item in ensure_list(data.get("checked_claims", [])):
335
  if not isinstance(item, dict):
336
  continue
337
  status_raw = str(item.get("status", "unsupported")).lower()
@@ -340,7 +351,17 @@ def _parse_verify(raw: str) -> VerifyOutput:
340
  except ValueError:
341
  status = ClaimStatus.UNSUPPORTED
342
 
343
- evidence = _parse_evidence_items(item)
344
 
345
  checked_claims.append(
346
  CheckedClaim(
@@ -358,8 +379,8 @@ def _parse_verify(raw: str) -> VerifyOutput:
358
  if not isinstance(required_followups, list):
359
  required_followups = []
360
 
361
- return VerifyOutput(
362
- verification=Verification(
363
  checked_claims=checked_claims,
364
  unsupported_claims=unsupported_claims,
365
  required_followups=required_followups,
@@ -367,46 +388,6 @@ def _parse_verify(raw: str) -> VerifyOutput:
367
  )
368
 
369
 
370
- def _parse_evidence_items(raw_item: dict) -> list[Evidence]:
371
- """Normalize verifier evidence from nested refs or scalar/list doc/chunk ids."""
372
- evidence: list[Evidence] = []
373
-
374
- raw_refs = raw_item.get("evidence") or raw_item.get("evidence_refs") or []
375
- for ref in ensure_list(raw_refs):
376
- if not isinstance(ref, dict):
377
- continue
378
- for doc_id, chunk_id in pair_string_lists(
379
- ref.get("doc_id") or ref.get("evidence_doc_id") or ref.get("doc_ids"),
380
- ref.get("chunk_id") or ref.get("evidence_chunk_id") or ref.get("chunk_ids"),
381
- ):
382
- evidence.append(
383
- Evidence(
384
- doc_id=doc_id,
385
- chunk_id=chunk_id,
386
- where=coerce_string(ref.get("where", ""), default=""),
387
- quote=coerce_string(ref.get("quote", ""), default="")[:200],
388
- )
389
- )
390
-
391
- if evidence:
392
- return evidence
393
-
394
- for doc_id, chunk_id in pair_string_lists(
395
- raw_item.get("evidence_doc_id") or raw_item.get("evidence_doc_ids"),
396
- raw_item.get("evidence_chunk_id") or raw_item.get("evidence_chunk_ids"),
397
- ):
398
- evidence.append(
399
- Evidence(
400
- doc_id=doc_id,
401
- chunk_id=chunk_id,
402
- where=coerce_string(raw_item.get("where", ""), default=""),
403
- quote=coerce_string(raw_item.get("quote", ""), default="")[:200],
404
- )
405
- )
406
-
407
- return evidence
408
-
409
-
410
  def _should_generate_followups(checked_results: list[CheckedClaim]) -> bool:
411
  unsupported_count = sum(1 for item in checked_results if item.status == ClaimStatus.UNSUPPORTED)
412
  if unsupported_count == 0:
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
+ pluto/stages/evidence_check.py — S3 EvidenceCheck stage.
4
 
5
+ Cross-checks merged claims against extracted evidence. The evidence_checker now ranks
6
  candidate evidence first, uses direct support when the match is obvious, and
7
  falls back to an LLM only for ambiguous cases. This improves both speed and
8
  confidence stability.
 
21
  Evidence,
22
  ExtractOutput,
23
  MergeOutput,
24
+ EvidenceCheck,
25
+ EvidenceCheckOutput,
26
  )
27
  from pluto.tracer import Tracer
28
+ from pluto.utils import extract_json_from_response
29
 
30
  DIRECT_SUPPORT_THRESHOLD = 0.72
31
  LLM_CHECK_THRESHOLD = 0.18
 
96
  }
97
 
98
 
99
+ def run_evidence_check(
100
  merge_output: MergeOutput,
101
  extractions: list[ExtractOutput],
102
  tracer: Tracer,
103
  bus: MessageBus | None = None,
104
+ ) -> EvidenceCheckOutput:
105
+ """S3 — EvidenceCheck: cross-check merged claims against extraction evidence."""
106
+ tracer.log("stage_start", {"stage": "evidence_check"})
107
 
108
  claims_to_check = [kc.claim.strip() for kc in merge_output.synthesis.key_claims if kc.claim.strip()]
109
  evidence_pool = _build_evidence_pool(extractions)
 
124
  else:
125
  shortlisted = candidates[:MAX_EVIDENCE_CANDIDATES]
126
  if top["score"] >= LLM_CHECK_THRESHOLD:
127
+ prompt = _EVIDENCE_CHECK_PROMPT.format(
128
  claims_json=json.dumps([{"claim": claim}], indent=1),
129
  evidence_json=json.dumps([_prompt_evidence(item) for item in shortlisted], indent=1),
130
  )
131
 
132
+ verdict = _parse_evidence_check_json(dispatch("MODE_QUICK", prompt, tracer=tracer))
133
  status, evidence = _extract_single_verdict(verdict, shortlisted)
134
 
135
  if status == ClaimStatus.UNCERTAIN and top["score"] >= UNCERTAIN_THRESHOLD:
136
+ verdict = _parse_evidence_check_json(dispatch("MODE_REASONING", prompt, tracer=tracer))
137
  status, evidence = _extract_single_verdict(verdict, shortlisted)
138
 
139
  if status is not None:
 
158
  if _should_generate_followups(checked_results):
159
  gaps = _build_followups(unsupported)
160
  if bus and gaps:
161
+ bus.post("evidence_checker", "gap_report", {"gaps": gaps})
162
 
163
+ result = EvidenceCheckOutput(
164
+ evidence_check=EvidenceCheck(
165
  checked_claims=checked_results,
166
  unsupported_claims=[item.claim for item in checked_results if item.status == ClaimStatus.UNSUPPORTED],
167
  required_followups=gaps,
 
171
  tracer.log(
172
  "stage_complete",
173
  {
174
+ "stage": "evidence_check",
175
  "checked": len(checked_results),
176
  "supported": sum(1 for item in checked_results if item.status == ClaimStatus.SUPPORTED),
177
  "uncertain": sum(1 for item in checked_results if item.status == ClaimStatus.UNCERTAIN),
 
180
  return result
181
 
182
 
183
+ _EVIDENCE_CHECK_PROMPT = """You are an evidence checking engine. Check each claim below against the source evidence provided.
184
 
185
  For EACH claim, determine if it is:
186
  - "supported": the evidence directly or clearly supports the same factual meaning, even if phrased as a paraphrase
 
307
  except ValueError:
308
  return None, []
309
 
310
+ evidence = []
311
+ doc_id = item.get("evidence_doc_id")
312
+ chunk_id = item.get("evidence_chunk_id")
313
+ if doc_id:
314
+ evidence.append(
315
+ Evidence(
316
+ doc_id=doc_id,
317
+ chunk_id=chunk_id or "",
318
+ quote=item.get("quote", ""),
319
+ )
320
+ )
321
+ elif candidates and status != ClaimStatus.UNSUPPORTED:
322
  evidence.append(_candidate_to_evidence(candidates[0]))
323
 
324
  return status, evidence
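The rewritten `_extract_single_verdict` prefers the `evidence_doc_id` / `evidence_chunk_id` named in the LLM verdict and otherwise falls back to the top-ranked candidate, unless the claim is unsupported. A standalone sketch of that selection rule, assuming a dataclass stand-in for `pluto.models.Evidence` and hypothetical candidate keys (`doc_id`, `chunk_id`, `quote`):

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical stand-in for pluto.models.Evidence."""
    doc_id: str
    chunk_id: str = ""
    quote: str = ""


def evidence_from_verdict(item: dict, candidates: list[dict], status: str) -> list[Evidence]:
    """Prefer the ids named in the verdict; otherwise fall back to the top candidate."""
    doc_id = item.get("evidence_doc_id")
    if doc_id:
        return [
            Evidence(
                doc_id=str(doc_id),
                chunk_id=str(item.get("evidence_chunk_id") or ""),
                quote=str(item.get("quote") or ""),
            )
        ]
    if candidates and status != "unsupported":
        top = candidates[0]  # candidates are assumed to be sorted by score
        return [
            Evidence(
                doc_id=str(top["doc_id"]),
                chunk_id=str(top.get("chunk_id", "")),
                quote=str(top.get("quote", "")),
            )
        ]
    return []
```

Unlike the removed `_parse_evidence_items`, this form reads only scalar ids, so a verdict that returns lists of document or chunk ids would need extra handling.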
325
 
326
 
327
+ def _parse_evidence_check_json(raw: str) -> dict:
328
  try:
329
  return json.loads(extract_json_from_response(raw))
330
  except Exception:
 
337
  return {}
338
 
339
 
340
+ def _parse_evidence_check(raw: str) -> EvidenceCheckOutput:
341
+ """Backward-compatible parser for evidence_checker dumps used by local tests/tools."""
342
+ data = _parse_evidence_check_json(raw)
343
 
344
  checked_claims = []
345
+ for item in data.get("checked_claims", []):
346
  if not isinstance(item, dict):
347
  continue
348
  status_raw = str(item.get("status", "unsupported")).lower()
 
351
  except ValueError:
352
  status = ClaimStatus.UNSUPPORTED
353
 
354
+ evidence = []
355
+ doc_id = item.get("evidence_doc_id")
356
+ if doc_id:
357
+ evidence.append(
358
+ Evidence(
359
+ doc_id=doc_id,
360
+ chunk_id=item.get("evidence_chunk_id", ""),
361
+ where=item.get("where", ""),
362
+ quote=item.get("quote", ""),
363
+ )
364
+ )
365
 
366
  checked_claims.append(
367
  CheckedClaim(
 
379
  if not isinstance(required_followups, list):
380
  required_followups = []
381
 
382
+ return EvidenceCheckOutput(
383
+ evidence_check=EvidenceCheck(
384
  checked_claims=checked_claims,
385
  unsupported_claims=unsupported_claims,
386
  required_followups=required_followups,
 
388
  )
389
 
390
 
391
  def _should_generate_followups(checked_results: list[CheckedClaim]) -> bool:
392
  unsupported_count = sum(1 for item in checked_results if item.status == ClaimStatus.UNSUPPORTED)
393
  if unsupported_count == 0:
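The parser hunks above coerce a model-supplied status string into `ClaimStatus` and fall back to `UNSUPPORTED` on `ValueError`. A minimal sketch of that pattern follows. It assumes a stand-in `ClaimStatus` enum with only two members (the real enum in `pluto.models` is not shown in this diff and may define more) and a hypothetical helper name, `coerce_status`, that does not appear in the commit.

```python
from enum import Enum


class ClaimStatus(str, Enum):
    # Stand-in for pluto.models.ClaimStatus; the member set is an assumption.
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"


def coerce_status(item: dict) -> ClaimStatus:
    """Map a raw status value to ClaimStatus, defaulting to UNSUPPORTED."""
    # str() guards against None or non-string values emitted by the model.
    status_raw = str(item.get("status", "unsupported")).lower()
    try:
        return ClaimStatus(status_raw)
    except ValueError:
        # Unknown labels are treated as unsupported rather than raising.
        return ClaimStatus.UNSUPPORTED
```

Defaulting to the most conservative status means a malformed model response can lower confidence in a claim but cannot silently promote it.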
mp1/pluto/stages/extract.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/stages/extract.py — S1 EXTRACT stage.
3
 
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/stages/extract.py — S1 EXTRACT stage.
4
 
mp1/pluto/stages/merge.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/stages/merge.py — S2 MERGE stage.
3
 
@@ -27,7 +28,6 @@ from pluto.models import (
27
  Synthesis,
28
  )
29
  from pluto.tracer import Tracer
30
- from pluto.utils import coerce_string, coerce_string_list, ensure_list, pair_string_lists
31
 
32
 
33
  _BATCH_PROMPT = """You are synthesizing extracted facts from a document chunk batch. Produce a focused sub-summary for the user's question.
@@ -315,18 +315,20 @@ def _parse_merge(raw: str) -> MergeOutput:
315
  section=sec.get("section", ""),
316
  points=sec.get("points", []),
317
  )
318
- for sec in ensure_list(data.get("answer_outline", []))
319
  if isinstance(sec, dict)
320
  if sec.get("section") or sec.get("points")
321
  ]
322
 
323
  key_claims: list[KeyClaim] = []
324
- for kc in ensure_list(data.get("key_claims", [])):
325
  if not isinstance(kc, dict):
326
  continue
327
- evidence_refs = _parse_evidence_refs(kc)
 
 
328
 
329
- support_str = coerce_string(kc.get("support", "supported"), default="supported").lower()
330
  try:
331
  support = ClaimStatus(support_str)
332
  except ValueError:
@@ -368,8 +370,6 @@ def _stabilize_merge(result: MergeOutput, query: str = "", detail_level: str = "
368
  outline = _synthesize_outline_from_claims(key_claims, query=query, detail_level=detail_level)
369
  elif outline:
370
  outline = _top_up_outline(outline, key_claims, detail_level=detail_level)
371
- if detail_level == "detailed" and key_claims:
372
- outline = _enrich_detailed_outline(outline, key_claims, query=query)
373
 
374
  return MergeOutput(
375
  synthesis=Synthesis(
@@ -559,73 +559,6 @@ def _top_up_outline(
559
  return outline
560
 
561
 
562
- def _enrich_detailed_outline(
563
- outline: list[SectionPoint],
564
- key_claims: list[KeyClaim],
565
- query: str = "",
566
- ) -> list[SectionPoint]:
567
- """Guarantee richer structure for detailed mode when evidence is available."""
568
- synthesized = _synthesize_outline_from_claims(key_claims, query=query, detail_level="detailed")
569
- if not synthesized:
570
- return outline
571
- if not outline:
572
- return synthesized
573
- return _merge_outline_variants(outline, synthesized, point_cap=7, section_cap=5)
574
-
575
-
576
- def _merge_outline_variants(
577
- primary: list[SectionPoint],
578
- secondary: list[SectionPoint],
579
- point_cap: int,
580
- section_cap: int,
581
- ) -> list[SectionPoint]:
582
- """Merge outline variants while preserving order and deduplicating points."""
583
- merged: list[SectionPoint] = []
584
- title_to_index: dict[str, int] = {}
585
-
586
- def add_section(section: SectionPoint) -> None:
587
- title = _clean_text(section.section)
588
- if not title:
589
- return
590
-
591
- title_key = _fingerprint(title)
592
- clean_points: list[str] = []
593
- seen_local: set[str] = set()
594
- for point in section.points:
595
- text = _clean_text(point)
596
- fingerprint = _fingerprint(text)
597
- if not text or fingerprint in seen_local:
598
- continue
599
- seen_local.add(fingerprint)
600
- clean_points.append(text)
601
- if not clean_points:
602
- return
603
-
604
- if title_key in title_to_index:
605
- existing = merged[title_to_index[title_key]]
606
- seen_existing = {_fingerprint(point) for point in existing.points}
607
- for point in clean_points:
608
- fingerprint = _fingerprint(point)
609
- if fingerprint in seen_existing or len(existing.points) >= point_cap:
610
- continue
611
- existing.points.append(point)
612
- seen_existing.add(fingerprint)
613
- return
614
-
615
- if len(merged) >= section_cap:
616
- return
617
-
618
- title_to_index[title_key] = len(merged)
619
- merged.append(SectionPoint(section=title, points=clean_points[:point_cap]))
620
-
621
- for section in primary:
622
- add_section(section)
623
- for section in secondary:
624
- add_section(section)
625
-
626
- return merged or primary or secondary
627
-
628
-
629
  def _normalize_detail_level(detail_level: str | None) -> str:
630
  return "detailed" if str(detail_level or "").strip().lower() == "detailed" else "standard"
631
 
@@ -706,43 +639,3 @@ def _normalize_open_gaps(raw_open_gaps) -> list[str]:
706
  if text:
707
  normalized.append(text)
708
  return normalized
709
-
710
-
711
- def _parse_evidence_refs(raw_item: dict) -> list[Evidence]:
712
- """Normalize evidence refs from scalar, list, or nested-object shapes."""
713
- evidence_refs: list[Evidence] = []
714
-
715
- raw_refs = raw_item.get("evidence_refs") or raw_item.get("evidence") or []
716
- for ref in ensure_list(raw_refs):
717
- if not isinstance(ref, dict):
718
- continue
719
- for doc_id, chunk_id in pair_string_lists(
720
- ref.get("doc_id") or ref.get("evidence_doc_id") or ref.get("doc_ids"),
721
- ref.get("chunk_id") or ref.get("evidence_chunk_id") or ref.get("chunk_ids"),
722
- ):
723
- evidence_refs.append(
724
- Evidence(
725
- doc_id=doc_id,
726
- chunk_id=chunk_id,
727
- where=coerce_string(ref.get("where", ""), default=""),
728
- quote=coerce_string(ref.get("quote", ""), default="")[:200],
729
- )
730
- )
731
-
732
- if evidence_refs:
733
- return _dedupe_evidence_refs(evidence_refs)
734
-
735
- for doc_id, chunk_id in pair_string_lists(
736
- raw_item.get("evidence_doc_ids") or raw_item.get("evidence_doc_id"),
737
- raw_item.get("evidence_chunk_ids") or raw_item.get("evidence_chunk_id"),
738
- ):
739
- evidence_refs.append(Evidence(doc_id=doc_id, chunk_id=chunk_id))
740
-
741
- # Last-resort fallback when the model emits one combined evidence object.
742
- if not evidence_refs:
743
- chunk_ids = coerce_string_list(raw_item.get("chunk_ids") or raw_item.get("chunk_id"))
744
- doc_ids = coerce_string_list(raw_item.get("doc_ids") or raw_item.get("doc_id"))
745
- for doc_id, chunk_id in pair_string_lists(doc_ids, chunk_ids):
746
- evidence_refs.append(Evidence(doc_id=doc_id, chunk_id=chunk_id))
747
-
748
- return _dedupe_evidence_refs(evidence_refs)
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/stages/merge.py — S2 MERGE stage.
4
 
 
28
  Synthesis,
29
  )
30
  from pluto.tracer import Tracer
 
31
 
32
 
33
  _BATCH_PROMPT = """You are synthesizing extracted facts from a document chunk batch. Produce a focused sub-summary for the user's question.
 
315
  section=sec.get("section", ""),
316
  points=sec.get("points", []),
317
  )
318
+ for sec in data.get("answer_outline", [])
319
  if isinstance(sec, dict)
320
  if sec.get("section") or sec.get("points")
321
  ]
322
 
323
  key_claims: list[KeyClaim] = []
324
+ for kc in data.get("key_claims", []):
325
  if not isinstance(kc, dict):
326
  continue
327
+ evidence_refs = []
328
+ for doc_id, chunk_id in zip(kc.get("evidence_doc_ids") or [], kc.get("evidence_chunk_ids") or []):
329
+ evidence_refs.append(Evidence(doc_id=doc_id or "", chunk_id=chunk_id or ""))
330
 
331
+ support_str = str(kc.get("support", "supported")).lower()
332
  try:
333
  support = ClaimStatus(support_str)
334
  except ValueError:
 
370
  outline = _synthesize_outline_from_claims(key_claims, query=query, detail_level=detail_level)
371
  elif outline:
372
  outline = _top_up_outline(outline, key_claims, detail_level=detail_level)
 
 
373
 
374
  return MergeOutput(
375
  synthesis=Synthesis(
 
559
  return outline
560
 
561
 
562
  def _normalize_detail_level(detail_level: str | None) -> str:
563
  return "detailed" if str(detail_level or "").strip().lower() == "detailed" else "standard"
564
 
 
639
  if text:
640
  normalized.append(text)
641
  return normalized
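The rewritten `_parse_merge` pairs `evidence_doc_ids` with `evidence_chunk_ids` using a plain `zip`, whereas the removed `pair_string_lists` helper broadcast a single id across the other list. The sketch below illustrates the behavioural difference. `pair_ids` is a hypothetical helper, not part of the commit; it only mirrors what the new loop does with list and scalar inputs.

```python
def pair_ids(doc_ids, chunk_ids) -> list[tuple[str, str]]:
    """Pair ids the way the new loop does: zip, with None treated as empty."""
    return [(d or "", c or "") for d, c in zip(doc_ids or [], chunk_ids or [])]


# Equal-length lists pair one-to-one.
print(pair_ids(["paper_a", "paper_b"], ["C1", "C2"]))

# zip stops at the shorter input, so the extra chunk ids are dropped.
print(pair_ids(["paper_a"], ["C18", "C46", "C81"]))

# A bare string is iterated character by character, so a scalar doc id
# no longer yields ("paper_a", "C18").
print(pair_ids("paper_a", ["C18"]))
```

If the model can still emit scalar ids, wrapping them in a list before pairing would avoid the character-wise split. Whether that input shape still occurs is not shown in this diff.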
mp1/pluto/stages/route.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/stages/route.py — S0 ROUTE stage (Phase B).
3
 
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/stages/route.py — S0 ROUTE stage (Phase B).
4
 
mp1/pluto/stages/understand.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/stages/understand.py — Phase A: Document Understanding.
3
 
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/stages/understand.py — Phase A: Document Understanding.
4
 
mp1/pluto/tools.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/tools.py — Corpus access tools (spec §3).
3
 
@@ -115,6 +116,8 @@ class CorpusTools:
115
  return ""
116
  if 0 <= idx < len(chunks):
117
  raw = chunks[idx]
 
 
118
  # Inject context header so extraction agents know where this chunk sits
119
  from pluto.embedder import inject_context_headers
120
  with_header = inject_context_headers([raw], doc_id, self.doc_index)
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/tools.py — Corpus access tools (spec §3).
4
 
 
116
  return ""
117
  if 0 <= idx < len(chunks):
118
  raw = chunks[idx]
119
+ from pluto.doc_summary import apply_doc_summary_context
120
+ raw = apply_doc_summary_context(raw, doc_id, self.corpus_dir)
121
  # Inject context header so extraction agents know where this chunk sits
122
  from pluto.embedder import inject_context_headers
123
  with_header = inject_context_headers([raw], doc_id, self.doc_index)
mp1/pluto/tracer.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  pluto/tracer.py — Logging & trace system for pipeline execution.
3
 
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/tracer.py — Logging & trace system for pipeline execution.
4
 
mp1/pluto/utils.py CHANGED
@@ -1,28 +1,11 @@
 
1
  """
2
  pluto/utils.py — Shared utilities for response parsing.
3
  """
4
 
5
  from __future__ import annotations
6
 
7
- import json
8
  import re
9
- from itertools import zip_longest
10
-
11
-
12
- _PREFERRED_TEXT_KEYS = (
13
- "chunk_id",
14
- "doc_id",
15
- "value",
16
- "text",
17
- "title",
18
- "label",
19
- "name",
20
- "id",
21
- "where",
22
- "quote",
23
- "claim",
24
- "section",
25
- )
26
 
27
 
28
  def strip_think_block(text: str) -> str:
@@ -46,81 +29,3 @@ def extract_json_from_response(raw: str) -> str:
46
  return brace_match.group(0).strip()
47
 
48
  return cleaned.strip()
49
-
50
-
51
- def ensure_list(value):
52
- """Return *value* as a list while preserving existing lists."""
53
- if value is None:
54
- return []
55
- if isinstance(value, list):
56
- return value
57
- if isinstance(value, (tuple, set)):
58
- return list(value)
59
- return [value]
60
-
61
-
62
- def flatten_string_values(value) -> list[str]:
63
- """Flatten nested scalars/collections into a list of non-empty strings."""
64
- values: list[str] = []
65
-
66
- def _walk(item) -> None:
67
- if item is None:
68
- return
69
- if isinstance(item, dict):
70
- for key in _PREFERRED_TEXT_KEYS:
71
- if key in item and item[key] not in (None, ""):
72
- _walk(item[key])
73
- return
74
- dumped = json.dumps(item, ensure_ascii=False, sort_keys=True).strip()
75
- if dumped:
76
- values.append(dumped)
77
- return
78
- if isinstance(item, (list, tuple, set)):
79
- for part in item:
80
- _walk(part)
81
- return
82
-
83
- text = str(item).strip()
84
- if text:
85
- values.append(text)
86
-
87
- _walk(value)
88
- return values
89
-
90
-
91
- def coerce_string(value, default: str = "") -> str:
92
- """Normalize mixed scalar/list inputs into one printable string."""
93
- parts = flatten_string_values(value)
94
- return ", ".join(parts) if parts else default
95
-
96
-
97
- def coerce_string_list(value) -> list[str]:
98
- """Normalize mixed scalar/list inputs into a deduplicated string list."""
99
- seen: set[str] = set()
100
- normalized: list[str] = []
101
- for item in flatten_string_values(value):
102
- if item in seen:
103
- continue
104
- seen.add(item)
105
- normalized.append(item)
106
- return normalized
107
-
108
-
109
- def pair_string_lists(left, right) -> list[tuple[str, str]]:
110
- """Broadcast or zip mixed scalar/list inputs into string pairs."""
111
- left_items = coerce_string_list(left)
112
- right_items = coerce_string_list(right)
113
-
114
- if not left_items and not right_items:
115
- return []
116
- if not left_items:
117
- left_items = [""]
118
- if not right_items:
119
- right_items = [""]
120
-
121
- if len(left_items) == 1 and len(right_items) > 1:
122
- return [(left_items[0], item) for item in right_items]
123
- if len(right_items) == 1 and len(left_items) > 1:
124
- return [(item, right_items[0]) for item in left_items]
125
-
126
- return list(zip_longest(left_items, right_items, fillvalue=""))
 
1
+ # -*- coding: utf-8 -*-
2
  """
3
  pluto/utils.py — Shared utilities for response parsing.
4
  """
5
 
6
  from __future__ import annotations
7
 
 
8
  import re
9
 
10
 
11
  def strip_think_block(text: str) -> str:
 
29
  return brace_match.group(0).strip()
30
 
31
  return cleaned.strip()
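Only fragments of `extract_json_from_response` are visible in this hunk: a brace match and a final `cleaned.strip()`. The sketch below is one rough way such a helper could work. The `<think>` tag format, the regular expressions, and the names `strip_think` and `extract_json` are all assumptions for illustration, not the code in `pluto/utils.py`.

```python
import json
import re


def strip_think(text: str) -> str:
    """Drop <think>...</think> reasoning blocks (tag format is an assumption)."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)


def extract_json(raw: str) -> str:
    """Return the outermost {...} span of a model response, if any."""
    cleaned = strip_think(raw)
    # Greedy match: from the first "{" to the last "}" in the response.
    brace_match = re.search(r"\{.*\}", cleaned, flags=re.DOTALL)
    if brace_match:
        return brace_match.group(0).strip()
    return cleaned.strip()


payload = extract_json('<think>draft {x}</think> Sure: {"a": 1} done')
print(json.loads(payload))
```

Because the match is greedy, a response containing two separate JSON objects would be returned as one span that `json.loads` rejects, so callers still need an exception guard like the one in `_parse_evidence_check_json`.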
mp1/requirements.txt CHANGED
@@ -6,8 +6,9 @@ uvicorn>=0.27.0
6
  python-dotenv>=1.0.0
7
  pytest>=8.0.0
8
  python-multipart>=0.0.5
9
- PyPDF2>=3.0.0
10
  python-docx>=1.1.0
11
  requests>=2.31.0
12
  openai>=1.0.0
13
  numpy>=1.24.0
 
 
6
  python-dotenv>=1.0.0
7
  pytest>=8.0.0
8
  python-multipart>=0.0.5
9
+ pdfplumber>=0.10.0
10
  python-docx>=1.1.0
11
  requests>=2.31.0
12
  openai>=1.0.0
13
  numpy>=1.24.0
14
+ psycopg2-binary>=2.9.0
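This hunk swaps `PyPDF2` for `pdfplumber`. How the ingest code calls it is not shown here, so the following is only a minimal sketch of page-by-page text extraction with pdfplumber's `open`, `pages`, and `extract_text`. The function names `join_page_texts` and `pdf_to_text` are hypothetical.

```python
from pathlib import Path


def join_page_texts(page_texts) -> str:
    """Join per-page text, skipping pages with no extractable text."""
    return "\n\n".join(t.strip() for t in page_texts if t and t.strip())


def pdf_to_text(path: Path) -> str:
    """Extract text from every page of a PDF with pdfplumber."""
    import pdfplumber  # imported lazily so join_page_texts works without it

    with pdfplumber.open(str(path)) as pdf:
        # extract_text() may yield None or "" for image-only pages.
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Keeping the joining logic separate from the library call makes the empty-page handling testable without a PDF fixture.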
mp1/scripts/generate_app_summary_pdf.py ADDED
@@ -0,0 +1,367 @@
1
+ from __future__ import annotations
2
+
3
+ from dataclasses import dataclass
4
+ from pathlib import Path
5
+ from xml.sax.saxutils import escape
6
+
7
+ import fitz
8
+ from pypdf import PdfReader
9
+ from reportlab.lib import colors
10
+ from reportlab.lib.pagesizes import A4
11
+ from reportlab.lib.styles import ParagraphStyle
12
+ from reportlab.pdfgen import canvas
13
+ from reportlab.platypus import Paragraph
14
+
15
+
16
+ ROOT = Path(__file__).resolve().parents[1]
17
+ OUTPUT_DIR = ROOT / "output" / "pdf"
18
+ TMP_DIR = ROOT / "tmp" / "pdfs"
19
+ PDF_PATH = OUTPUT_DIR / "pluto_app_summary_one_page.pdf"
20
+ PNG_PATH = TMP_DIR / "pluto_app_summary_one_page-1.png"
21
+
22
+ PAGE_WIDTH, PAGE_HEIGHT = A4
23
+ MARGIN_X = 34
24
+ MARGIN_TOP = 34
25
+ MARGIN_BOTTOM = 28
26
+ GUTTER = 16
27
+ COLUMN_WIDTH = (PAGE_WIDTH - (2 * MARGIN_X) - GUTTER) / 2
28
+
29
+ NAVY = colors.HexColor("#17324D")
30
+ TEAL = colors.HexColor("#2A7F8C")
31
+ INK = colors.HexColor("#1D2430")
32
+ MUTED = colors.HexColor("#5C6773")
33
+ CARD_BG = colors.HexColor("#F5F8FB")
34
+ CARD_BORDER = colors.HexColor("#D6E1EA")
35
+ ACCENT_BG = colors.HexColor("#E8F4F4")
36
+ WHITE = colors.white
37
+
38
+
39
+ @dataclass
40
+ class SectionBlock:
41
+ title: str
42
+ items: list[tuple[str, str]]
43
+
44
+
45
+ def build_blocks() -> tuple[list[SectionBlock], list[SectionBlock]]:
46
+ left = [
47
+ SectionBlock(
48
+ title="What it is",
49
+ items=[
50
+ (
51
+ "body",
52
+ "Pluto is an AI-powered document extraction and question-answering app. "
53
+ "It lets a user upload documents into a corpus, run a multi-stage pipeline, "
54
+ "and inspect the answer with evidence, trace, and confidence signals.",
55
+ ),
56
+ ],
57
+ ),
58
+ SectionBlock(
59
+ title="Who it's for",
60
+ items=[
61
+ ("body", "Primary user/persona: Not found in repo."),
62
+ (
63
+ "body",
64
+ "Closest repo evidence: a person asking research-style questions "
65
+ "over uploaded documents and reviewing evidence-backed results.",
66
+ ),
67
+ ],
68
+ ),
69
+ SectionBlock(
70
+ title="What it does",
71
+ items=[
72
+ ("bullet", "Uploads PDF, DOCX/DOC, TXT, and Markdown files into a corpus."),
73
+ ("bullet", "Converts uploads to Markdown, chunks them, classifies them, and tracks readiness."),
74
+ ("bullet", "Runs a 4-stage pipeline: route, extract, merge, evidence_check."),
75
+ ("bullet", "Streams live progress and upload status to the dashboard."),
76
+ ("bullet", "Queries the full corpus or selected ready documents."),
77
+ ("bullet", "Shows final sections, evidence, trace, confidence, and a benchmark view."),
78
+ ],
79
+ ),
80
+ ]
81
+
82
+ right = [
83
+ SectionBlock(
84
+ title="How it works",
85
+ items=[
86
+ (
87
+ "bullet",
88
+ "Frontend: `frontend/index.html` + `app.js` call `/api/upload`, `/api/corpus`, "
89
+ "`/api/run`, `/api/stream`, and `/api/compare`.",
90
+ ),
91
+ (
92
+ "bullet",
93
+ "Server: `pluto/server.py` serves the UI, handles uploads, streams SSE progress, "
94
+ "and runs `PipelineRunner` in a worker thread.",
95
+ ),
96
+ (
97
+ "bullet",
98
+ "Ingest path: uploaded file -> Markdown in `corpus/` -> chunk split/classification "
99
+ "-> `DocIndex` registration; background Phase A stores overview/status in "
100
+ "`corpus/.doc_index.json`.",
101
+ ),
102
+ (
103
+ "bullet",
104
+ "Query path: selected docs -> S0 route -> S1 extract -> S2 merge -> S3 evidence_check "
105
+ "-> JSON result + cache stats -> UI panels; final JSON also writes to "
106
+ "`output/final_output.json`.",
107
+ ),
108
+ (
109
+ "bullet",
110
+ "Support layers: `ExtractionCache` reuses extractions; `CorpusTools` reads/searches chunks. "
111
+ "NVIDIA embedding/rerank code paths exist, and chunking falls back when NVIDIA keys are absent.",
112
+ ),
113
+ ],
114
+ ),
115
+ SectionBlock(
116
+ title="How to run",
117
+ items=[
118
+ ("bullet", "From `mp1/`: `pip install -r requirements.txt`"),
119
+ (
120
+ "bullet",
121
+ "Create `.env` and set `GROQ_API_KEY` (explicitly named in `README.md`).",
122
+ ),
123
+ ("bullet", "Run `python main.py --serve`"),
124
+ ("bullet", "Open `http://localhost:8000`"),
125
+ (
126
+ "bullet",
127
+ "Upload docs, wait for Understanding to finish, then submit a query. "
128
+ "Other required provider keys: Not found in repo.",
129
+ ),
130
+ ],
131
+ ),
132
+ ]
133
+ return left, right
134
+
135
+
136
+ def make_styles(scale: float) -> dict[str, ParagraphStyle]:
137
+ return {
138
+ "title": ParagraphStyle(
139
+ "title",
140
+ fontName="Helvetica-Bold",
141
+ fontSize=21 * scale,
142
+ leading=25 * scale,
143
+ textColor=WHITE,
144
+ spaceAfter=0,
145
+ ),
146
+ "subtitle": ParagraphStyle(
147
+ "subtitle",
148
+ fontName="Helvetica",
149
+ fontSize=9.6 * scale,
150
+ leading=12 * scale,
151
+ textColor=colors.HexColor("#DCE7F3"),
152
+ ),
153
+ "eyebrow": ParagraphStyle(
154
+ "eyebrow",
155
+ fontName="Helvetica-Bold",
156
+ fontSize=7.4 * scale,
157
+ leading=9 * scale,
158
+ textColor=colors.HexColor("#B9D6DA"),
159
+ ),
160
+ "section_title": ParagraphStyle(
161
+ "section_title",
162
+ fontName="Helvetica-Bold",
163
+ fontSize=10.6 * scale,
164
+ leading=12.5 * scale,
165
+ textColor=NAVY,
166
+ ),
167
+ "body": ParagraphStyle(
168
+ "body",
169
+ fontName="Helvetica",
170
+ fontSize=8.6 * scale,
171
+ leading=11 * scale,
172
+ textColor=INK,
173
+ ),
174
+ "bullet": ParagraphStyle(
175
+ "bullet",
176
+ fontName="Helvetica",
177
+ fontSize=8.5 * scale,
178
+ leading=10.7 * scale,
179
+ textColor=INK,
180
+ leftIndent=10 * scale,
181
+ firstLineIndent=-7 * scale,
182
+ ),
183
+ "footer": ParagraphStyle(
184
+ "footer",
185
+ fontName="Helvetica",
186
+ fontSize=7.1 * scale,
187
+ leading=8.5 * scale,
188
+ textColor=MUTED,
189
+ ),
190
+ }
191
+
192
+
193
+ def escape_inline(text: str) -> str:
194
+ escaped = escape(text)
195
+ return escaped.replace("`", "<font name='Courier'>").replace("</font><font name='Courier'>", "")
196
+
197
+
198
+ def format_text(text: str) -> str:
199
+ parts = text.split("`")
200
+ if len(parts) == 1:
201
+ return escape(text)
202
+
203
+ result: list[str] = []
204
+ code = False
205
+ for part in parts:
206
+ if code:
207
+ result.append(f"<font name='Courier'>{escape(part)}</font>")
208
+ else:
209
+ result.append(escape(part))
210
+ code = not code
211
+ return "".join(result)
212
+
213
+
214
+ def paragraph_for(kind: str, text: str, styles: dict[str, ParagraphStyle]) -> Paragraph:
215
+ style_name = "bullet" if kind == "bullet" else "body"
216
+ content = f"- {text}" if kind == "bullet" else text
217
+ return Paragraph(format_text(content), styles[style_name])
218
+
219
+
220
+ def measure_section(block: SectionBlock, styles: dict[str, ParagraphStyle], width: float) -> tuple[float, list[Paragraph]]:
221
+ title = Paragraph(format_text(block.title), styles["section_title"])
222
+ rendered_items = [paragraph_for(kind, text, styles) for kind, text in block.items]
223
+
224
+ title_height = title.wrap(width - 20, 1000)[1]
225
+ items_height = 0.0
226
+ for para in rendered_items:
227
+ items_height += para.wrap(width - 20, 1000)[1]
228
+ items_height += 5
229
+
230
+ total = 14 + title_height + 8 + items_height + 10
231
+ return total, [title, *rendered_items]
232
+
233
+
234
+ def choose_scale(left: list[SectionBlock], right: list[SectionBlock]) -> tuple[float, dict[str, ParagraphStyle], float]:
235
+ header_space = 114
236
+ footer_space = 18
237
+ available = PAGE_HEIGHT - MARGIN_TOP - MARGIN_BOTTOM - header_space - footer_space
238
+
239
+ for scale in (1.0, 0.97, 0.94, 0.91, 0.88, 0.85):
240
+ styles = make_styles(scale)
241
+ left_height = total_column_height(left, styles)
242
+ right_height = total_column_height(right, styles)
243
+ if max(left_height, right_height) <= available:
244
+ return scale, styles, available
245
+
246
+ raise RuntimeError("Content did not fit on a single page.")
247
+
248
+
249
+ def total_column_height(blocks: list[SectionBlock], styles: dict[str, ParagraphStyle]) -> float:
250
+ total = 0.0
251
+ for index, block in enumerate(blocks):
252
+ section_height, _ = measure_section(block, styles, COLUMN_WIDTH)
253
+ total += section_height
254
+ if index < len(blocks) - 1:
255
+ total += 10
256
+ return total
257
+
258
+
259
+ def draw_header(pdf: canvas.Canvas, styles: dict[str, ParagraphStyle]) -> float:
260
+ header_height = 94
261
+ header_y = PAGE_HEIGHT - MARGIN_TOP - header_height
262
+
263
+ pdf.setFillColor(NAVY)
264
+ pdf.roundRect(MARGIN_X, header_y, PAGE_WIDTH - (2 * MARGIN_X), header_height, 14, stroke=0, fill=1)
265
+ pdf.setFillColor(TEAL)
266
+ pdf.roundRect(PAGE_WIDTH - MARGIN_X - 110, header_y, 110, header_height, 14, stroke=0, fill=1)
267
+
268
+ eyebrow = Paragraph("ONE-PAGE APP SUMMARY", styles["eyebrow"])
269
+ title = Paragraph("Pluto", styles["title"])
270
+ subtitle = Paragraph(
271
+ "Repo-backed overview of the document extraction and question-answering dashboard.",
272
+ styles["subtitle"],
273
+ )
274
+
275
+ x = MARGIN_X + 18
276
+ y = PAGE_HEIGHT - MARGIN_TOP - 16
277
+
278
+ for para, width in ((eyebrow, 210), (title, 260), (subtitle, PAGE_WIDTH - (2 * MARGIN_X) - 150)):
279
+ _, height = para.wrap(width, 1000)
280
+ para.drawOn(pdf, x, y - height)
281
+ y -= height + 4
282
+
283
+ note = Paragraph("Evidence source: README + app server, pipeline, ingest, index, and UI files.", styles["subtitle"])
284
+ note_width = 92
285
+ _, note_height = note.wrap(note_width, 1000)
286
+ note.drawOn(pdf, PAGE_WIDTH - MARGIN_X - 98, header_y + header_height - 18 - note_height)
287
+
288
+ return header_y - 12
289
+
290
+
291
+ def draw_column(
292
+ pdf: canvas.Canvas,
293
+ blocks: list[SectionBlock],
294
+ x: float,
295
+ top_y: float,
296
+ styles: dict[str, ParagraphStyle],
297
+ ) -> None:
298
+ y = top_y
299
+ for block in blocks:
300
+ section_height, items = measure_section(block, styles, COLUMN_WIDTH)
301
+
302
+ pdf.setFillColor(CARD_BG if block.title != "How to run" else ACCENT_BG)
303
+ pdf.setStrokeColor(CARD_BORDER)
304
+ pdf.roundRect(x, y - section_height, COLUMN_WIDTH, section_height, 12, stroke=1, fill=1)
305
+
306
+ cursor = y - 14
307
+ title = items[0]
308
+ _, title_height = title.wrap(COLUMN_WIDTH - 20, 1000)
309
+ title.drawOn(pdf, x + 10, cursor - title_height)
310
+ cursor -= title_height + 8
311
+
312
+ for para in items[1:]:
313
+ _, para_height = para.wrap(COLUMN_WIDTH - 20, 1000)
314
+ para.drawOn(pdf, x + 10, cursor - para_height)
315
+ cursor -= para_height + 5
316
+
317
+ y -= section_height + 10
318
+
319
+
320
+ def build_pdf() -> None:
321
+ OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
322
+ TMP_DIR.mkdir(parents=True, exist_ok=True)
323
+
324
+ left, right = build_blocks()
325
+ _, styles, _ = choose_scale(left, right)
326
+
327
+ pdf = canvas.Canvas(str(PDF_PATH), pagesize=A4)
328
+ pdf.setTitle("Pluto App Summary")
329
+ pdf.setAuthor("OpenAI Codex")
330
+ pdf.setSubject("One-page summary generated from repository evidence")
331
+
332
+ top_y = draw_header(pdf, styles)
333
+ draw_column(pdf, left, MARGIN_X, top_y, styles)
334
+ draw_column(pdf, right, MARGIN_X + COLUMN_WIDTH + GUTTER, top_y, styles)
335
+
336
+ footer = Paragraph(
337
+ "Not found in repo items are labeled explicitly. Output generated as a single-page PDF.",
338
+ styles["footer"],
339
+ )
340
+ _, footer_height = footer.wrap(PAGE_WIDTH - (2 * MARGIN_X), 1000)
341
+ footer.drawOn(pdf, MARGIN_X, MARGIN_BOTTOM - 4)
342
+
343
+ pdf.showPage()
344
+ pdf.save()
345
+
346
+
347
+ def validate_outputs() -> None:
348
+ reader = PdfReader(str(PDF_PATH))
349
+ if len(reader.pages) != 1:
350
+ raise RuntimeError(f"Expected 1 page, found {len(reader.pages)}")
351
+
352
+ document = fitz.open(PDF_PATH)
353
+ page = document.load_page(0)
354
+ pix = page.get_pixmap(matrix=fitz.Matrix(2.0, 2.0), alpha=False)
355
+ pix.save(PNG_PATH)
356
+ document.close()
357
+
358
+
359
+ def main() -> None:
360
+ build_pdf()
361
+ validate_outputs()
362
+ print(f"PDF_PATH={PDF_PATH}")
363
+ print(f"PNG_PATH={PNG_PATH}")
364
+
365
+
366
+ if __name__ == "__main__":
367
+ main()
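`choose_scale` in the script above tries a descending list of font scales and keeps the first one whose tallest column fits the page. The same idea can be sketched without ReportLab. `first_fitting_scale` and its `measure` callback are hypothetical names introduced for illustration; only the scale list and the error message are taken from the script.

```python
from typing import Callable, Iterable


def first_fitting_scale(
    measure: Callable[[float], float],
    available: float,
    scales: Iterable[float] = (1.0, 0.97, 0.94, 0.91, 0.88, 0.85),
) -> float:
    """Return the first (largest) scale whose measured height fits."""
    for scale in scales:
        if measure(scale) <= available:
            return scale
    raise RuntimeError("Content did not fit on a single page.")


# Toy measure: content height grows linearly with the font scale.
print(first_fitting_scale(lambda s: 700 * s, available=650))
```

Raising when no scale fits, as the script does, fails the build loudly instead of producing a page with clipped content.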
mp1/test_doc_summary.py ADDED
@@ -0,0 +1,75 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ from pathlib import Path
4
+
5
+ from pluto.doc_summary import (
6
+ DocSummary,
7
+ apply_doc_summary_context,
8
+ generate_doc_summary,
9
+ save_doc_summaries,
10
+ )
11
+
12
+
13
+ def test_generate_doc_summary_returns_valid_summary_with_mocked_llm(monkeypatch, tmp_path):
14
+ corpus = tmp_path / "corpus"
15
+ corpus.mkdir()
16
+ (corpus / "paper.md").write_text("# Paper\n\nThis is about retrieval.", encoding="utf-8")
17
+
18
+ monkeypatch.setattr(
19
+ "pluto.doc_summary._call_summary_llm",
20
+ lambda **kwargs: """
21
+ {
22
+ "title": "Retrieval Paper",
23
+ "domain": "information retrieval",
24
+ "key_claims": ["Chunk context improves retrieval"],
25
+ "structure": ["intro", "methodology", "results"],
26
+ "open_questions": ["How robust is it?"]
27
+ }
28
+ """,
29
+ )
30
+
31
+ summary = generate_doc_summary("paper", corpus)
32
+
33
+ assert isinstance(summary, DocSummary)
34
+ assert summary.doc_id == "paper"
35
+ assert summary.title == "Retrieval Paper"
36
+ assert summary.domain == "information retrieval"
37
+ assert summary.key_claims == ["Chunk context improves retrieval"]
38
+
39
+
40
+ def test_generate_doc_summary_falls_back_when_llm_fails(monkeypatch, tmp_path):
41
+ corpus = tmp_path / "corpus"
42
+ corpus.mkdir()
43
+ (corpus / "paper.md").write_text("# Paper\n\nBody.", encoding="utf-8")
44
+
45
+ def fail(**kwargs):
46
+ raise RuntimeError("model unavailable")
47
+
48
+ monkeypatch.setattr("pluto.doc_summary._call_summary_llm", fail)
49
+
50
+ summary = generate_doc_summary("paper", corpus)
51
+
52
+ assert summary.doc_id == "paper"
53
+ assert summary.title == "paper"
54
+ assert summary.key_claims == []
55
+ assert summary.open_questions == []
56
+
57
+
58
+ def test_context_prefix_is_prepended_to_chunk_text(tmp_path):
59
+ corpus = tmp_path / "corpus"
60
+ corpus.mkdir()
61
+ summary = DocSummary(
62
+ doc_id="paper",
63
+ title="Retrieval Paper",
64
+ domain="AI",
65
+ key_claims=["Claim A", "Claim B"],
66
+ structure=[],
67
+ open_questions=[],
68
+ created_at="2026-01-01T00:00:00+00:00",
69
+ )
70
+ save_doc_summaries(corpus, {"paper": summary})
71
+
72
+ result = apply_doc_summary_context("Original chunk", "paper", corpus)
73
+
74
+ assert result.startswith("[Document context: Retrieval Paper | Domain: AI | Key claims: Claim A; Claim B]")
75
+ assert result.endswith("Original chunk")
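The last test pins the header format that `apply_doc_summary_context` prepends to a chunk. The sketch below is one function that would satisfy those assertions. It is not the code in `pluto/doc_summary.py`: the reduced `DocSummary` stand-in, the name `with_doc_context`, the pass-through for a missing summary, and the newline separator are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DocSummary:
    # Reduced stand-in for pluto.doc_summary.DocSummary (fields taken from the tests).
    doc_id: str
    title: str
    domain: str
    key_claims: list[str] = field(default_factory=list)


def with_doc_context(chunk: str, summary: DocSummary | None) -> str:
    """Prepend a one-line document context header to a chunk."""
    if summary is None:
        return chunk  # assumption: chunks without a summary pass through unchanged
    claims = "; ".join(summary.key_claims)
    header = (
        f"[Document context: {summary.title} | Domain: {summary.domain} "
        f"| Key claims: {claims}]"
    )
    # The separator between header and chunk is an assumption.
    return f"{header}\n{chunk}"
```

The real function takes a `doc_id` and a corpus directory and loads the saved summary itself. Passing the summary in directly keeps this sketch free of file I/O.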
mp1/test_merge.py CHANGED
@@ -9,7 +9,7 @@ from pluto.models import (
9
  Synthesis,
10
  )
11
  from pluto.stages import merge as merge_stage
12
- from pluto.stages.merge import _parse_merge, run_merge
13
  from pluto.tracer import Tracer
14
 
15
 
@@ -78,117 +78,3 @@ def test_merge_synthesizes_outline_when_model_returns_only_key_claims(monkeypatc
78
  for section in result.synthesis.answer_outline
79
  for point in section.points
80
  )
81
-
82
-
83
- def test_parse_merge_normalizes_scalar_doc_and_multi_chunk_evidence():
84
- raw = """
85
- {
86
- "answer_outline": [
87
- {
88
- "section": "Overview",
89
- "points": "The method uses evidence from multiple chunks."
90
- }
91
- ],
92
- "key_claims": [
93
- {
94
- "claim": "The method is supported across several chunks.",
95
- "support": "supported",
96
- "evidence_doc_ids": "paper_a",
97
- "evidence_chunk_ids": [["C18", "C46", "C81"]]
98
- }
99
- ],
100
- "open_gaps": []
101
- }
102
- """
103
-
104
- out = _parse_merge(raw)
105
-
106
- assert out.synthesis.answer_outline[0].points == ["The method uses evidence from multiple chunks."]
107
- refs = out.synthesis.key_claims[0].evidence_refs
108
- assert len(refs) == 3
109
- assert [ref.doc_id for ref in refs] == ["paper_a", "paper_a", "paper_a"]
110
- assert [ref.chunk_id for ref in refs] == ["C18", "C46", "C81"]
111
-
112
-
113
- def test_merge_detailed_mode_produces_richer_answer_structure(monkeypatch):
114
- raw_merge = """
115
- {
116
- "answer_outline": [
117
- {
118
- "section": "Overview",
119
- "points": [
120
- "The paper introduces a multi-agent defense coordinator.",
121
- "The system reports strong defended-scenario performance."
122
- ]
123
- }
124
- ],
125
- "key_claims": [
126
- {
127
- "claim": "The paper introduces a multi-agent defense coordinator for prompt-injection mitigation.",
128
- "support": "supported",
129
- "evidence_doc_ids": ["multi_agent"],
130
- "evidence_chunk_ids": ["C1"]
131
- },
132
- {
133
- "claim": "The evaluation reports 0% ASR across defended scenarios.",
134
- "support": "supported",
135
- "evidence_doc_ids": ["multi_agent"],
136
- "evidence_chunk_ids": ["C2"]
137
- },
138
- {
139
- "claim": "The method routes adversarial prompts through a defense worker.",
140
- "support": "supported",
141
- "evidence_doc_ids": ["multi_agent"],
142
- "evidence_chunk_ids": ["C3"]
143
- },
144
- {
145
- "claim": "The architecture includes a recovery worker for post-attack repair.",
146
- "support": "supported",
147
- "evidence_doc_ids": ["multi_agent"],
148
- "evidence_chunk_ids": ["C4"]
149
- },
150
- {
151
- "claim": "The paper discusses limitations and future work for the coordinator pipeline.",
152
- "support": "supported",
153
- "evidence_doc_ids": ["multi_agent"],
154
- "evidence_chunk_ids": ["C5"]
155
- },
156
- {
157
- "claim": "The benchmark comparison highlights gains over baselines.",
158
- "support": "supported",
159
- "evidence_doc_ids": ["multi_agent"],
160
- "evidence_chunk_ids": ["C6"]
161
- }
162
- ],
163
- "open_gaps": []
164
- }
165
- """
166
-
167
- monkeypatch.setattr(merge_stage, "dispatch", lambda *args, **kwargs: raw_merge)
168
-
169
- extraction = ExtractOutput(
170
- doc_id="multi_agent",
171
- chunk_id="C1",
172
- chunk_type=ChunkType.TEXT,
173
- mode_used=ModeName.MODE_REASONING,
174
- extracted=ExtractedContent(
175
- claims=[
176
- Claim(
177
- claim_id="cl1",
178
- text="The paper introduces a multi-agent defense coordinator for prompt-injection mitigation.",
179
- importance=Importance.HIGH,
180
- evidence=Evidence(doc_id="multi_agent", chunk_id="C1", where="overview", quote="multi-agent defense coordinator"),
181
- )
182
- ],
183
- chunk_summary="Coordinator overview and results.",
184
- ),
185
- )
186
-
187
- standard = run_merge("Summarize the paper.", [extraction], Tracer(), detail_level="standard")
188
- detailed = run_merge("Summarize the paper.", [extraction], Tracer(), detail_level="detailed")
189
-
190
- standard_points = sum(len(section.points) for section in standard.synthesis.answer_outline)
191
- detailed_points = sum(len(section.points) for section in detailed.synthesis.answer_outline)
192
-
193
- assert len(detailed.synthesis.answer_outline) >= len(standard.synthesis.answer_outline)
194
- assert detailed_points > standard_points
 
9
  Synthesis,
10
  )
11
  from pluto.stages import merge as merge_stage
12
+ from pluto.stages.merge import run_merge
13
  from pluto.tracer import Tracer
14
 
15
 
 
78
  for section in result.synthesis.answer_outline
79
  for point in section.points
80
  )
mp1/test_schema.py DELETED
@@ -1,41 +0,0 @@
1
- from pluto.models import Evidence, FinalEvidence, SectionPoint, Verification
2
-
3
-
4
- def test_schema_coerces_mixed_scalar_and_list_inputs():
5
- evidence = Evidence(
6
- doc_id=["paper_a"],
7
- chunk_id=["C1", "C2"],
8
- where={"text": "results"},
9
- quote=["alpha", "beta"],
10
- )
11
-
12
- assert evidence.doc_id == "paper_a"
13
- assert evidence.chunk_id == "C1, C2"
14
- assert evidence.where == "results"
15
- assert evidence.quote == "alpha, beta"
16
-
17
- final_evidence = FinalEvidence(
18
- doc_id="paper_a",
19
- chunk_id=["C4", "C5"],
20
- where=["method"],
21
- supports=["Main claim"],
22
- quote=["quoted", "support"],
23
- )
24
-
25
- assert final_evidence.chunk_id == "C4, C5"
26
- assert final_evidence.where == "method"
27
- assert final_evidence.supports == "Main claim"
28
- assert final_evidence.quote == "quoted, support"
29
-
30
-
31
- def test_schema_coerces_outline_and_followup_lists():
32
- section = SectionPoint(section=["Overview"], points="Single normalized point")
33
- verification = Verification(
34
- unsupported_claims="Missing metric support",
35
- required_followups={"text": "Where is the metric reported?"},
36
- )
37
-
38
- assert section.section == "Overview"
39
- assert section.points == ["Single normalized point"]
40
- assert verification.unsupported_claims == ["Missing metric support"]
41
- assert verification.required_followups == ["Where is the metric reported?"]
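Reviewer note: the deleted tests above exercised coercion of mixed scalar, list, and dict inputs into the schema's string and list fields. The sketch below illustrates that kind of normalization in plain Python. The helper names `coerce_to_str` and `coerce_to_list` are hypothetical and are not taken from `pluto.models`; the real models presumably do this inside Pydantic validators, which this sketch does not attempt to reproduce.

```python
from typing import Any, List


def coerce_to_str(value: Any) -> str:
    """Flatten a scalar, list, or {"text": ...} dict into one string (hypothetical helper)."""
    if value is None:
        return ""
    if isinstance(value, dict):
        # Assumption: dict inputs carry their content under a "text" key, as in the tests.
        return coerce_to_str(value.get("text"))
    if isinstance(value, (list, tuple)):
        # Lists are joined with ", " to match the expected "C1, C2" form.
        return ", ".join(coerce_to_str(item) for item in value)
    return str(value)


def coerce_to_list(value: Any) -> List[str]:
    """Wrap a scalar or dict into a one-item list; normalize each item of a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [coerce_to_str(item) for item in value]
    return [coerce_to_str(value)]
```

With these helpers, `coerce_to_str(["C1", "C2"])` yields a comma-joined string and `coerce_to_list("Single normalized point")` yields a one-element list, mirroring the assertions in the removed tests.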
mp1/test_server.py CHANGED
@@ -87,6 +87,7 @@ def test_server_run_forwards_selected_docs_and_detail_level(monkeypatch):
87
  )
88
 
89
  assert response.status_code == 200
 
90
  assert recorded["progress_callback_registered"] is True
91
  assert recorded["query"] == "summarize this"
92
  assert recorded["selected_doc_ids"] == ["paper_a"]
@@ -161,8 +162,12 @@ def test_server_exposes_processed_docs_as_ready_even_if_status_is_stale(monkeypa
161
 
162
 
163
  def test_stream_progress_serializes_pydantic_payloads(monkeypatch):
164
- monkeypatch.setattr(server, "_progress_queue", asyncio.Queue())
165
- server._progress_queue.put_nowait({
 
 
 
 
166
  "stage": "done",
167
  "status": "complete",
168
  "payload": {
@@ -181,7 +186,7 @@ def test_stream_progress_serializes_pydantic_payloads(monkeypatch):
181
  })
182
 
183
  client = TestClient(server.app)
184
- with client.stream("GET", "/api/stream") as response:
185
  body = b"".join(response.iter_raw()).decode("utf-8")
186
 
187
  assert response.status_code == 200
@@ -189,27 +194,46 @@ def test_stream_progress_serializes_pydantic_payloads(monkeypatch):
189
  payload = json.loads(body.removeprefix("data: ").strip())
190
  assert payload["payload"]["plan"][0]["doc_id"] == "paper"
191
  assert payload["payload"]["plan"][0]["chunk_type"] == "text"
 
 
192
 
193
 
194
- def test_server_cache_stats_route_returns_json(monkeypatch):
195
- class FakeCache:
196
- def stats(self):
197
- return {"hits": 7, "misses": 3, "entries": 10}
198
-
199
- monkeypatch.setattr(server, "_extraction_cache", FakeCache())
 
 
200
 
201
  client = TestClient(server.app)
202
- response = client.get("/api/cache/stats")
 
203
 
204
- assert response.status_code == 200
205
- assert response.json() == {"hits": 7, "misses": 3, "entries": 10}
 
 
206
 
207
 
208
- def test_server_result_route_returns_404_when_empty(monkeypatch):
209
- monkeypatch.setattr(server, "_latest_result", None)
 
 
 
 
 
 
210
 
211
- client = TestClient(server.app)
212
- response = client.get("/api/result")
 
 
 
 
 
 
 
213
 
214
- assert response.status_code == 404
215
- assert response.json()["error"] == "No result yet"
 
87
  )
88
 
89
  assert response.status_code == 200
90
+ assert response.json()["session_id"]
91
  assert recorded["progress_callback_registered"] is True
92
  assert recorded["query"] == "summarize this"
93
  assert recorded["selected_doc_ids"] == ["paper_a"]
 
162
 
163
 
164
  def test_stream_progress_serializes_pydantic_payloads(monkeypatch):
165
+ session_id = "test-session"
166
+ queue = asyncio.Queue()
167
+ monkeypatch.setattr(server, "session_queues", {session_id: queue})
168
+ monkeypatch.setattr(server, "session_results", {session_id: {"ok": True}})
169
+ monkeypatch.setattr(server, "session_cleanup_tasks", {})
170
+ queue.put_nowait({
171
  "stage": "done",
172
  "status": "complete",
173
  "payload": {
 
186
  })
187
 
188
  client = TestClient(server.app)
189
+ with client.stream("GET", f"/api/stream?session_id={session_id}") as response:
190
  body = b"".join(response.iter_raw()).decode("utf-8")
191
 
192
  assert response.status_code == 200
 
194
  payload = json.loads(body.removeprefix("data: ").strip())
195
  assert payload["payload"]["plan"][0]["doc_id"] == "paper"
196
  assert payload["payload"]["plan"][0]["chunk_type"] == "text"
197
+ assert session_id in server.session_queues
198
+ assert session_id in server.session_results
199
 
200
 
201
+ def test_stream_progress_is_session_scoped(monkeypatch):
202
+ first = asyncio.Queue()
203
+ second = asyncio.Queue()
204
+ first.put_nowait({"stage": "done", "status": "complete", "session_id": "first"})
205
+ second.put_nowait({"stage": "done", "status": "complete", "session_id": "second"})
206
+ monkeypatch.setattr(server, "session_queues", {"first": first, "second": second})
207
+ monkeypatch.setattr(server, "session_results", {"first": {}, "second": {}})
208
+ monkeypatch.setattr(server, "session_cleanup_tasks", {})
209
 
210
  client = TestClient(server.app)
211
+ with client.stream("GET", "/api/stream?session_id=second") as response:
212
+ body = b"".join(response.iter_raw()).decode("utf-8")
213
 
214
+ payload = json.loads(body.removeprefix("data: ").strip())
215
+ assert payload["session_id"] == "second"
216
+ assert "first" in server.session_queues
217
+ assert "second" in server.session_queues
218
 
219
 
220
+ def test_session_cleanup_is_delayed(monkeypatch):
221
+ async def run_check():
222
+ session_id = "cleanup-session"
223
+ queue = asyncio.Queue()
224
+ monkeypatch.setattr(server, "SESSION_CLEANUP_DELAY_SECONDS", 0.01)
225
+ monkeypatch.setattr(server, "session_queues", {session_id: queue})
226
+ monkeypatch.setattr(server, "session_results", {session_id: {"ok": True}})
227
+ monkeypatch.setattr(server, "session_cleanup_tasks", {})
228
 
229
+ server._schedule_session_cleanup(session_id, queue)
230
+
231
+ assert session_id in server.session_queues
232
+ assert session_id in server.session_results
233
+
234
+ await asyncio.sleep(0.05)
235
+
236
+ assert session_id not in server.session_queues
237
+ assert session_id not in server.session_results
238
 
239
+ asyncio.run(run_check())
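Reviewer note: `test_session_cleanup_is_delayed` expects session state to survive the call to `_schedule_session_cleanup` and to disappear only after the configured delay. The following is a minimal, self-contained sketch of that pattern, assuming the cleanup is an asyncio task stored per session. The function `schedule_session_cleanup` and its body are illustrative guesses, not the code in `pluto.server`; only the dictionary names and the delay constant are taken from the tests.

```python
import asyncio
from typing import Dict, Tuple

SESSION_CLEANUP_DELAY_SECONDS = 0.01  # assumed value; the test patches the real constant to 0.01

session_queues: Dict[str, asyncio.Queue] = {}
session_results: Dict[str, dict] = {}
session_cleanup_tasks: Dict[str, asyncio.Task] = {}


def schedule_session_cleanup(session_id: str, queue: asyncio.Queue) -> None:
    """Hypothetical stand-in for server._schedule_session_cleanup."""

    async def _cleanup() -> None:
        await asyncio.sleep(SESSION_CLEANUP_DELAY_SECONDS)
        # Only drop state if the registered queue is still the one we were given.
        if session_queues.get(session_id) is queue:
            session_queues.pop(session_id, None)
            session_results.pop(session_id, None)
        session_cleanup_tasks.pop(session_id, None)

    # Replace any cleanup that is already pending for this session.
    previous = session_cleanup_tasks.get(session_id)
    if previous is not None:
        previous.cancel()
    session_cleanup_tasks[session_id] = asyncio.get_running_loop().create_task(_cleanup())


async def demo() -> Tuple[bool, bool]:
    queue: asyncio.Queue = asyncio.Queue()
    session_queues["s"] = queue
    session_results["s"] = {"ok": True}
    schedule_session_cleanup("s", queue)
    present_before = "s" in session_queues  # still present right after scheduling
    await asyncio.sleep(0.05)               # wait past the cleanup delay
    present_after = "s" in session_queues
    return present_before, present_after
```

Deferring the removal this way lets a client that reconnects shortly after the `done` event still read the queue and the stored result, which is what the added assertions in `test_stream_progress_serializes_pydantic_payloads` appear to rely on.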
 
mp1/test_session_memory.py ADDED
@@ -0,0 +1,112 @@
+ # -*- coding: utf-8 -*-
+
+ import asyncio
+ import json
+
+ from fastapi.testclient import TestClient
+
+ from pluto.session_memory import (
+     CompressedSession,
+     compress_session,
+     list_session_context,
+ )
+ import pluto.server as server
+
+
+ def test_compression_produces_valid_compressed_session(monkeypatch, tmp_path):
+     corpus = tmp_path / "corpus"
+     corpus.mkdir()
+     monkeypatch.setattr(
+         "pluto.session_memory._call_compression_llm",
+         lambda **kwargs: """
+         {
+           "queries_resolved": [{"query": "q", "answer_summary": "a", "chunks_used": 2, "confidence": 0.8}],
+           "key_findings": ["Finding A"],
+           "open_questions": ["Question A"],
+           "links_to_prior_sessions": []
+         }
+         """,
+     )
+     monkeypatch.setattr("pluto.session_memory._store_postgres", lambda compressed, raw_path: None)
+
+     compressed = compress_session("s1", "doc_a", {"query": "q", "confidence": 0.8}, corpus)
+
+     assert isinstance(compressed, CompressedSession)
+     assert compressed.session_id == "s1"
+     assert compressed.doc_id == "doc_a"
+     assert compressed.key_findings == ["Finding A"]
+     assert (corpus / ".session_archive" / "s1.json").exists()
+
+
+ def test_postgres_unavailable_falls_back_to_local_file(monkeypatch, tmp_path):
+     corpus = tmp_path / "corpus"
+     corpus.mkdir()
+     monkeypatch.setattr("pluto.session_memory._call_compression_llm", lambda **kwargs: "{}")
+
+     def fail_store(compressed, raw_path):
+         raise EnvironmentError("no database")
+
+     monkeypatch.setattr("pluto.session_memory._store_postgres", fail_store)
+
+     compressed = compress_session("s2", "doc_b", {"query": "q"}, corpus)
+
+     path = corpus / ".session_memory" / "s2.json"
+     assert path.exists()
+     assert json.loads(path.read_text(encoding="utf-8"))["session_id"] == compressed.session_id
+
+
+ def test_warm_start_endpoint_returns_sessions_in_order(monkeypatch):
+     sessions = [
+         {"session_id": "new", "doc_id": "paper", "timestamp": "2026-01-02T00:00:00+00:00"},
+         {"session_id": "old", "doc_id": "paper", "timestamp": "2026-01-01T00:00:00+00:00"},
+     ]
+     monkeypatch.setattr("pluto.session_memory.list_session_context", lambda doc_id, corpus_dir, limit=10: sessions)
+
+     client = TestClient(server.app)
+     response = client.get("/api/session-context/paper")
+
+     assert response.status_code == 200
+     payload = response.json()
+     assert [item["session_id"] for item in payload["sessions"]] == ["new", "old"]
+
+
+ def test_list_session_context_local_fallback_orders_by_timestamp(monkeypatch, tmp_path):
+     corpus = tmp_path / "corpus"
+     memory = corpus / ".session_memory"
+     memory.mkdir(parents=True)
+     (memory / "old.json").write_text(
+         json.dumps({"session_id": "old", "doc_id": "paper", "timestamp": "2026-01-01T00:00:00+00:00"}),
+         encoding="utf-8",
+     )
+     (memory / "new.json").write_text(
+         json.dumps({"session_id": "new", "doc_id": "paper", "timestamp": "2026-01-02T00:00:00+00:00"}),
+         encoding="utf-8",
+     )
+     monkeypatch.setattr("pluto.session_memory._list_postgres", lambda doc_id, limit: (_ for _ in ()).throw(EnvironmentError("no db")))
+
+     sessions = list_session_context("paper", corpus)
+
+     assert [item["session_id"] for item in sessions] == ["new", "old"]
+
+
+ def test_compression_is_scheduled_async_without_blocking_sse(monkeypatch):
+     calls = []
+
+     async def run_check():
+         session_id = "sse-session"
+         queue = asyncio.Queue()
+         await queue.put({"stage": "done", "status": "complete", "session_id": session_id})
+         monkeypatch.setattr(server, "session_queues", {session_id: queue})
+         monkeypatch.setattr(server, "session_results", {session_id: {"doc_id": "paper"}})
+         monkeypatch.setattr(server, "session_cleanup_tasks", {})
+         monkeypatch.setattr(server, "_schedule_session_compression", lambda sid: calls.append(sid))
+
+         client = TestClient(server.app)
+         with client.stream("GET", f"/api/stream?session_id={session_id}") as response:
+             body = b"".join(response.iter_raw()).decode("utf-8")
+
+         assert response.status_code == 200
+         assert '"stage": "done"' in body
+         assert calls == [session_id]
+
+     asyncio.run(run_check())
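Reviewer note: two of the tests above describe a fallback path in which a failed Postgres write leaves a JSON file under `.session_memory`, and listing returns the newest session first. A minimal sketch of that behavior follows, assuming the fallback is triggered by any exception from the remote store. The names `store_with_fallback` and `list_local_sessions` are hypothetical; the real module exposes `compress_session` and `list_session_context`, whose internals are not shown in this diff.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Callable, List, Optional


def store_with_fallback(record: dict, corpus_dir: Path,
                        store_remote: Callable[[dict], None]) -> Optional[Path]:
    """Try the remote store; on failure write a local JSON file (hypothetical helper)."""
    try:
        store_remote(record)
        return None
    except Exception:  # assumption: any storage error triggers the local fallback
        memory_dir = corpus_dir / ".session_memory"
        memory_dir.mkdir(parents=True, exist_ok=True)
        path = memory_dir / f"{record['session_id']}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
        return path


def list_local_sessions(doc_id: str, corpus_dir: Path, limit: int = 10) -> List[dict]:
    """Return locally stored sessions for doc_id, newest timestamp first."""
    memory_dir = corpus_dir / ".session_memory"
    if not memory_dir.is_dir():
        return []
    records = [json.loads(p.read_text(encoding="utf-8")) for p in memory_dir.glob("*.json")]
    matching = [r for r in records if r.get("doc_id") == doc_id]
    # Parse ISO 8601 timestamps so ordering does not depend on string comparison.
    matching.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]), reverse=True)
    return matching[:limit]


def demo() -> List[str]:
    def failing_store(record: dict) -> None:
        raise EnvironmentError("no database")

    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp)
        for sid, ts in [("old", "2026-01-01T00:00:00+00:00"),
                        ("new", "2026-01-02T00:00:00+00:00")]:
            store_with_fallback({"session_id": sid, "doc_id": "paper", "timestamp": ts},
                                corpus, failing_store)
        return [r["session_id"] for r in list_local_sessions("paper", corpus)]
```

Sorting on parsed timestamps rather than file names keeps the ordering stable even when session IDs are not chronological.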
mp1/test_signal_logger.py ADDED
@@ -0,0 +1,56 @@
+ # -*- coding: utf-8 -*-
+
+ from pluto.signal_logger import check_prior_reference, check_rephrase, log_signal
+
+
+ def test_rephrase_detection_triggers_with_high_similarity_within_window(monkeypatch):
+     embeddings = {
+         "How does it work?": [1.0, 0.0],
+         "Explain how it works": [0.9, 0.1],
+     }
+     monkeypatch.setattr("pluto.signal_logger._embed_query", lambda query: embeddings[query])
+
+     assert check_rephrase("Explain how it works", "How does it work?", 30) is True
+
+
+ def test_rephrase_detection_does_not_trigger_outside_window(monkeypatch):
+     monkeypatch.setattr("pluto.signal_logger._embed_query", lambda query: [1.0, 0.0])
+
+     assert check_rephrase("same", "same", 91) is False
+
+
+ def test_prior_reference_detection_on_trigger_phrases():
+     assert check_prior_reference("Based on your answer, what is the limitation?") is True
+     assert check_prior_reference("Tell me the limitation") is False
+
+
+ def test_signal_logging_with_mocked_postgres(monkeypatch):
+     calls = []
+
+     class FakeCursor:
+         def __enter__(self):
+             return self
+
+         def __exit__(self, exc_type, exc, tb):
+             return False
+
+         def execute(self, sql, params=None):
+             calls.append((sql, params))
+
+     class FakeConnection:
+         def cursor(self):
+             return FakeCursor()
+
+         def commit(self):
+             calls.append(("commit", None))
+
+         def close(self):
+             calls.append(("close", None))
+
+     monkeypatch.setattr("pluto.signal_logger._get_connection", lambda: FakeConnection())
+
+     log_signal("session-a", "hash-a", "prior_reference")
+
+     assert calls[0][1] == ("session-a", "hash-a", "prior_reference")
+     assert ("commit", None) in calls
+     assert ("close", None) in calls
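Reviewer note: the signal-logger tests imply that a rephrase is flagged when two query embeddings are highly similar and the queries arrive within a time window, and that a prior reference is flagged by trigger phrases. The sketch below shows one way this could work, under stated assumptions: the 0.85 threshold, the 90-second window, and the phrase list are guesses (the tests only show that 30 seconds passes and 91 seconds fails), and the embedding function is passed explicitly here, whereas the real `check_rephrase` takes three arguments and calls `_embed_query` internally.

```python
import math
from typing import Callable, Sequence

SIMILARITY_THRESHOLD = 0.85  # assumed; not visible in the diff
WINDOW_SECONDS = 90          # assumed; tests only show 30 s passes and 91 s fails
TRIGGER_PHRASES = ("based on your answer", "as you said", "you mentioned")  # hypothetical list


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_rephrase(query: str, previous_query: str, seconds_since_previous: float,
                   embed: Callable[[str], Sequence[float]]) -> bool:
    """Flag a rephrase when the queries are close in time and in embedding space."""
    if seconds_since_previous > WINDOW_SECONDS:
        return False
    return cosine_similarity(embed(query), embed(previous_query)) >= SIMILARITY_THRESHOLD


def check_prior_reference(query: str) -> bool:
    """Flag queries that explicitly refer back to an earlier answer."""
    lowered = query.lower()
    return any(phrase in lowered for phrase in TRIGGER_PHRASES)


# Toy embeddings copied from the test above.
EXAMPLE_EMBEDDINGS = {
    "How does it work?": [1.0, 0.0],
    "Explain how it works": [0.9, 0.1],
}
```

Checking the time window before computing embeddings avoids an embedding call for queries that could never count as rephrases.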
mp1/test_verify.py CHANGED
@@ -12,12 +12,12 @@ from pluto.models import (
12
  Synthesis,
13
  )
14
  from pluto.bus import MessageBus
15
- from pluto.stages import verify as verify_stage
16
- from pluto.stages.verify import _parse_verify, run_verify
17
  from pluto.tracer import Tracer
18
 
19
 
20
- def test_parse_verify_dump():
21
  raw = """
22
  Here is the result:
23
  {
@@ -40,45 +40,20 @@ def test_parse_verify_dump():
40
  }
41
  """
42
 
43
- out = _parse_verify(raw)
44
 
45
- assert len(out.verification.checked_claims) == 2
46
- assert out.verification.checked_claims[0].status.value == "supported"
47
- assert out.verification.checked_claims[0].evidence[0].doc_id == "paper_a"
48
- assert out.verification.unsupported_claims == ["The training set contains 2 million images."]
49
- assert out.verification.required_followups == ["Upload the appendix for dataset details."]
50
 
51
 
52
- def test_parse_verify_handles_multi_chunk_evidence_ids():
53
- raw = """
54
- {
55
- "checked_claims": [
56
- {
57
- "claim": "The results are supported across multiple chunks.",
58
- "status": "supported",
59
- "evidence_doc_id": "paper_a",
60
- "evidence_chunk_id": ["C18", "C46", "C81"],
61
- "quote": "results are supported"
62
- }
63
- ],
64
- "unsupported_claims": [],
65
- "required_followups": []
66
- }
67
- """
68
-
69
- out = _parse_verify(raw)
70
-
71
- evidence = out.verification.checked_claims[0].evidence
72
- assert len(evidence) == 3
73
- assert [item.doc_id for item in evidence] == ["paper_a", "paper_a", "paper_a"]
74
- assert [item.chunk_id for item in evidence] == ["C18", "C46", "C81"]
75
-
76
-
77
- def test_verify_directly_supports_matching_claim_without_dispatch(monkeypatch):
78
  def fail_dispatch(*args, **kwargs):
79
  raise AssertionError("dispatch should not be called for an obvious direct evidence match")
80
 
81
- monkeypatch.setattr(verify_stage, "dispatch", fail_dispatch)
82
 
83
  merge_output = MergeOutput(
84
  synthesis=Synthesis(
@@ -111,19 +86,19 @@ def test_verify_directly_supports_matching_claim_without_dispatch(monkeypatch):
111
  )
112
  ]
113
 
114
- result = run_verify(merge_output, extractions, Tracer())
115
 
116
- assert len(result.verification.checked_claims) == 1
117
- assert result.verification.checked_claims[0].status == ClaimStatus.SUPPORTED
118
- assert result.verification.checked_claims[0].evidence[0].doc_id == "paper_a"
119
- assert result.verification.unsupported_claims == []
120
 
121
 
122
- def test_verify_suppresses_followups_for_single_unsupported_outlier(monkeypatch):
123
  def fail_dispatch(*args, **kwargs):
124
  raise AssertionError("dispatch should not be called for direct matches or suppressed followups")
125
 
126
- monkeypatch.setattr(verify_stage, "dispatch", fail_dispatch)
127
 
128
  merge_output = MergeOutput(
129
  synthesis=Synthesis(
@@ -170,18 +145,18 @@ def test_verify_suppresses_followups_for_single_unsupported_outlier(monkeypatch)
170
  ]
171
 
172
  bus = MessageBus()
173
- result = run_verify(merge_output, extractions, Tracer(), bus=bus)
174
 
175
- assert result.verification.unsupported_claims == ["The appendix reports a 12% latency reduction on unseen workloads."]
176
- assert result.verification.required_followups == []
177
  assert bus.read(msg_type="gap_report") == []
178
 
179
 
180
- def test_verify_generates_specific_followups_when_answer_is_unverified(monkeypatch):
181
  def fail_dispatch(*args, **kwargs):
182
  raise AssertionError("dispatch should not be called when no evidence candidates exist")
183
 
184
- monkeypatch.setattr(verify_stage, "dispatch", fail_dispatch)
185
 
186
  merge_output = MergeOutput(
187
  synthesis=Synthesis(
@@ -193,16 +168,16 @@ def test_verify_generates_specific_followups_when_answer_is_unverified(monkeypat
193
  )
194
 
195
  bus = MessageBus()
196
- result = run_verify(merge_output, [], Tracer(), bus=bus)
197
 
198
- assert result.verification.unsupported_claims == [
199
  "The appendix reports a 12% latency reduction on unseen workloads.",
200
  "The architecture introduces a separate recovery worker for post-attack repair.",
201
  ]
202
- assert result.verification.required_followups == [
203
  "Which result or metric in the document directly supports: The appendix reports a 12% latency reduction on unseen workloads?",
204
  "Where does the document explicitly describe: The architecture introduces a separate recovery worker for post-attack repair?",
205
  ]
206
  latest = bus.latest("gap_report")
207
  assert latest is not None
208
- assert latest.payload["gaps"] == result.verification.required_followups
 
12
  Synthesis,
13
  )
14
  from pluto.bus import MessageBus
15
+ from pluto.stages import evidence_check as evidence_check_stage
16
+ from pluto.stages.evidence_check import _parse_evidence_check, run_evidence_check
17
  from pluto.tracer import Tracer
18
 
19
 
20
+ def test_parse_evidence_check_dump():
21
  raw = """
22
  Here is the result:
23
  {
 
40
  }
41
  """
42
 
43
+ out = _parse_evidence_check(raw)
44
 
45
+ assert len(out.evidence_check.checked_claims) == 2
46
+ assert out.evidence_check.checked_claims[0].status.value == "supported"
47
+ assert out.evidence_check.checked_claims[0].evidence[0].doc_id == "paper_a"
48
+ assert out.evidence_check.unsupported_claims == ["The training set contains 2 million images."]
49
+ assert out.evidence_check.required_followups == ["Upload the appendix for dataset details."]
50
 
51
 
52
+ def test_evidence_check_directly_supports_matching_claim_without_dispatch(monkeypatch):
53
  def fail_dispatch(*args, **kwargs):
54
  raise AssertionError("dispatch should not be called for an obvious direct evidence match")
55
 
56
+ monkeypatch.setattr(evidence_check_stage, "dispatch", fail_dispatch)
57
 
58
  merge_output = MergeOutput(
59
  synthesis=Synthesis(
 
86
  )
87
  ]
88
 
89
+ result = run_evidence_check(merge_output, extractions, Tracer())
90
 
91
+ assert len(result.evidence_check.checked_claims) == 1
92
+ assert result.evidence_check.checked_claims[0].status == ClaimStatus.SUPPORTED
93
+ assert result.evidence_check.checked_claims[0].evidence[0].doc_id == "paper_a"
94
+ assert result.evidence_check.unsupported_claims == []
95
 
96
 
97
+ def test_evidence_check_suppresses_followups_for_single_unsupported_outlier(monkeypatch):
98
  def fail_dispatch(*args, **kwargs):
99
  raise AssertionError("dispatch should not be called for direct matches or suppressed followups")
100
 
101
+ monkeypatch.setattr(evidence_check_stage, "dispatch", fail_dispatch)
102
 
103
  merge_output = MergeOutput(
104
  synthesis=Synthesis(
 
145
  ]
146
 
147
  bus = MessageBus()
148
+ result = run_evidence_check(merge_output, extractions, Tracer(), bus=bus)
149
 
150
+ assert result.evidence_check.unsupported_claims == ["The appendix reports a 12% latency reduction on unseen workloads."]
151
+ assert result.evidence_check.required_followups == []
152
  assert bus.read(msg_type="gap_report") == []
153
 
154
 
155
+ def test_evidence_check_generates_specific_followups_when_answer_is_unsupported(monkeypatch):
156
  def fail_dispatch(*args, **kwargs):
157
  raise AssertionError("dispatch should not be called when no evidence candidates exist")
158
 
159
+ monkeypatch.setattr(evidence_check_stage, "dispatch", fail_dispatch)
160
 
161
  merge_output = MergeOutput(
162
  synthesis=Synthesis(
 
168
  )
169
 
170
  bus = MessageBus()
171
+ result = run_evidence_check(merge_output, [], Tracer(), bus=bus)
172
 
173
+ assert result.evidence_check.unsupported_claims == [
174
  "The appendix reports a 12% latency reduction on unseen workloads.",
175
  "The architecture introduces a separate recovery worker for post-attack repair.",
176
  ]
177
+ assert result.evidence_check.required_followups == [
178
  "Which result or metric in the document directly supports: The appendix reports a 12% latency reduction on unseen workloads?",
179
  "Where does the document explicitly describe: The architecture introduces a separate recovery worker for post-attack repair?",
180
  ]
181
  latest = bus.latest("gap_report")
182
  assert latest is not None
183
+ assert latest.payload["gaps"] == result.evidence_check.required_followups
pytest.ini ADDED
@@ -0,0 +1,7 @@
+ [pytest]
+ testpaths = mp1
+ python_files = test_*.py
+ addopts = -p no:doctest -p no:cacheprovider
+ norecursedirs = .* __pycache__ output mp1/output pytest-cache-files-* mp1/pytest-cache-files-*
+ markers =
+     live_api: hits external provider APIs and requires network plus valid credentials
requirements.txt CHANGED
@@ -6,8 +6,9 @@ uvicorn>=0.27.0
  python-dotenv>=1.0.0
  pytest>=8.0.0
  python-multipart>=0.0.5
- PyPDF2>=3.0.0
+ pdfplumber>=0.10.0
  python-docx>=1.1.0
  requests>=2.31.0
  openai>=1.0.0
  numpy>=1.24.0
+ psycopg2-binary>=2.9.0