handoff stuff
- AGENTS.md +1 -0
- HANDOFF.md +34 -0
- README.md +1 -1
- USAGE.md +2 -2
- scripts/gold_preview.mjs +26 -0
- src/generator/generator_core.mjs +46 -0
- src/pipeline/batch.mjs +1 -0
- src/pipeline/step.mjs +19 -0
- state_of_project.md +1 -0
- tests/generator_core.test.mjs +19 -2
AGENTS.md
CHANGED
|
@@ -17,6 +17,7 @@
|
|
| 17 |
- `REAL_ES=1 npm test` – exercise retrieval against a live Elasticsearch + embedding endpoint.
|
| 18 |
- Red/green pathway: use `*_PROVIDER=mock` plus JSONL chunk source to dry-run (green) without models; switch to real providers for red runs and the cache will skip already-completed stages.
|
| 19 |
- Verifier contract: models return JSON `{"REASONING": [...], "SCORE": <number|\"PASS\"|\"FAIL\">}`; SCORE >=0.5 or PASS → accepted. Prompt must remain unchanged; parsing is tolerant of the PASS/FAIL token format.
|
|
|
|
| 20 |
|
| 21 |
## Coding Style & Naming Conventions
|
| 22 |
- ECMAScript modules (`type: "module"`); prefer `.mjs` for shared code.
|
|
|
|
| 17 |
- `REAL_ES=1 npm test` – exercise retrieval against a live Elasticsearch + embedding endpoint.
|
| 18 |
- Red/green pathway: use `*_PROVIDER=mock` plus JSONL chunk source to dry-run (green) without models; switch to real providers for red runs and the cache will skip already-completed stages.
|
| 19 |
- Verifier contract: models return JSON `{"REASONING": [...], "SCORE": <number|\"PASS\"|\"FAIL\">}`; SCORE >=0.5 or PASS → accepted. Prompt must remain unchanged; parsing is tolerant of the PASS/FAIL token format.
|
| 20 |
+
- Generator output/logging: verbose runs show parsed `thought`, raw provider `thinking`, answer, confidence, evidence, limitations, and raw response (pretty-printed if JSON). Gold stores `answer`, `thought`, `raw`, `confidence`, `evidence`, `limitations`, `thinking`.
|
| 21 |
|
| 22 |
## Coding Style & Naming Conventions
|
| 23 |
- ECMAScript modules (`type: "module"`); prefer `.mjs` for shared code.
|
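Review note: the verifier contract quoted above (SCORE >= 0.5 or PASS is accepted, with tolerance for noisy output such as `PROMPT = PASS`) can be illustrated with a small sketch. The verifier's real parser is not part of this diff, so `isAccepted` and its fallback order are assumptions made only for illustration.

```javascript
// Hypothetical helper; the repository's actual verifier parsing is not shown in this diff.
function isAccepted(rawOutput) {
  const text = String(rawOutput ?? '');
  // Assumption: the JSON object may be wrapped in extra text, so take the outermost braces.
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  let score;
  if (start !== -1 && end > start) {
    try {
      score = JSON.parse(text.slice(start, end + 1)).SCORE;
    } catch {
      score = undefined; // not valid JSON: fall through to the token scan
    }
  }
  // Numeric scores are accepted at 0.5 and above.
  if (typeof score === 'number') return score >= 0.5;
  // Token fallback: covers "PASS"/"FAIL" values and noisy lines like "PROMPT = PASS".
  const token = String(score ?? text).match(/\b(PASS|FAIL)\b/);
  return token ? token[1] === 'PASS' : false;
}
```

This sketch treats anything unparseable as a rejection and takes the first PASS/FAIL token it sees when no JSON is found; whether the real parser makes the same choices is not stated in the diff.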
HANDOFF.md
ADDED
|
@@ -0,0 +1,34 @@
|
| 1 |
+
# Handoff Notes
|
| 2 |
+
|
| 3 |
+
## Quick status
|
| 4 |
+
- Pipeline: question-first by default (JSONL chunks), deterministic chunk IDs, cache-backed (questions/gens/verifications/rewards).
|
| 5 |
+
- Generator parsing/logging: preserves provider `.thinking` plus parsed thought/answer/confidence/evidence/limitations; verbose logs show both parsed and raw (pretty-printed if JSON). Gold stores all of these fields.
|
| 6 |
+
- Verifier: expects distributor prompt output; accepts `SCORE` as number or `PASS/FAIL` (even noisy `PROMPT = PASS`); raw transcript logged in verbose.
|
| 7 |
+
- Scripts: `gold_preview.mjs` (shows thought/thinking/raw), `cache_report.mjs`, `regenerate_gold_from_cache.mjs`, `purge_mock_gold.mjs`.
|
| 8 |
+
- Tests: passing in writable env (read-only sandboxes block Vitest temp/cache writes).
|
| 9 |
+
|
| 10 |
+
## Running
|
| 11 |
+
- Default verbose CLI: `npm run pipeline -- --limit N --verbose`.
|
| 12 |
+
- Question-first mode: `PIPELINE_SEED_MODE=question-first npm run pipeline -- --limit N --verbose`.
|
| 13 |
+
- Random walk over chunks: `PIPELINE_RANDOM_WALK=1` (or `PIPELINE_CHUNK_ORDER=random`) + optional `--chunk-limit`.
|
| 14 |
+
- Question cap per chunk: `QUESTION_MAX_PER_CHUNK` (e.g., 3).
|
| 15 |
+
- Preview gold: `node scripts/gold_preview.mjs --max-answer 2000 --limit 5`.
|
| 16 |
+
- Regenerate gold from cache: `node scripts/regenerate_gold_from_cache.mjs`.
|
| 17 |
+
- Purge mock Q1? gold: `node scripts/purge_mock_gold.mjs`.
|
| 18 |
+
|
| 19 |
+
## What to watch
|
| 20 |
+
- Cache/gold should stay out of git; regenerate gold via cache script.
|
| 21 |
+
- Question provider must be reachable; otherwise question-first will spin through chunks with errors.
|
| 22 |
+
- Verifier prompt locked; parsing tolerates PASS/FAIL tokens and logs raw output.
|
| 23 |
+
- Generator prompt locked; parsing handles `.thinking` and Qwen-style answer blocks; thought/answer stay separate for verifier/reward.
|
| 24 |
+
|
| 25 |
+
## Files of note
|
| 26 |
+
- `src/generator/generator_core.mjs`: parses provider responses; carries `.thinking`, thought, answer, confidence, evidence, limitations.
|
| 27 |
+
- `src/pipeline/step.mjs`: verbose logging of thought, thinking, answer, raw.
|
| 28 |
+
- `prompts/*`: locked prompts (do not edit generator/verifier without intent).
|
| 29 |
+
- `data/cache/*.jsonl`: intermediate cache (questions/gens/verifications/rewards); use `PIPELINE_CACHE_DIR` to redirect.
|
| 30 |
+
- `gold/pipeline_gold.jsonl`: output; rebuild via cache if needed.
|
| 31 |
+
|
| 32 |
+
## Caveats
|
| 33 |
+
- Read-only environments will fail `npm test` due to /tmp and `.vite` writes; run tests where writes are allowed.
|
| 34 |
+
- Long generator outputs can bloat verifier context; consider truncation or smaller verifier model if needed.
|
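Review note: the last caveat suggests truncating long generator outputs before they reach the verifier. One possible shape for that is sketched below. `truncateMiddle` and the 4000-character default are hypothetical and do not appear in the repository; keeping both the head and the tail is a design choice here, on the assumption that the final answer tends to sit at the end of the output.

```javascript
// Hypothetical helper: keeps the start and end of a long text and drops the middle.
function truncateMiddle(text, maxChars = 4000) {
  // Non-string values are pretty-printed first; undefined becomes an empty string.
  const s = typeof text === 'string' ? text : (JSON.stringify(text, null, 2) ?? '');
  if (s.length <= maxChars) return s;
  const marker = '\n[... truncated ...]\n';
  const keep = Math.max(0, maxChars - marker.length);
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  // Guard tail === 0 because slice(-0) would return the whole string.
  return s.slice(0, head) + marker + (tail > 0 ? s.slice(-tail) : '');
}
```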
README.md
CHANGED
|
@@ -82,7 +82,7 @@ All pure modules include Vitest coverage:
|
|
| 82 |
* question generation
|
| 83 |
* provider router
|
| 84 |
* pipeline integration (mock)
|
| 85 |
-
* JSONL cache, PASS/FAIL verifier parsing
|
| 86 |
|
| 87 |
---
|
| 88 |
|
|
|
|
| 82 |
* question generation
|
| 83 |
* provider router
|
| 84 |
* pipeline integration (mock)
|
| 85 |
+
* JSONL cache, PASS/FAIL verifier parsing, generator parsing (thought/thinking/answer)
|
| 86 |
|
| 87 |
---
|
| 88 |
|
USAGE.md
CHANGED
|
@@ -39,7 +39,7 @@ Run the default pipeline (static seeds):
|
|
| 39 |
npm run pipeline -- --limit 10
|
| 40 |
```
|
| 41 |
|
| 42 |
-
Verbose run:
|
| 43 |
|
| 44 |
```bash
|
| 45 |
npm run pipeline -- --limit 10 --verbose
|
|
@@ -77,7 +77,7 @@ All accepted gold samples are written to:
|
|
| 77 |
gold/pipeline_gold.jsonl
|
| 78 |
```
|
| 79 |
|
| 80 |
-
Each entry includes:
|
| 81 |
|
| 82 |
```json
|
| 83 |
{
|
|
|
|
| 39 |
npm run pipeline -- --limit 10
|
| 40 |
```
|
| 41 |
|
| 42 |
+
Verbose run (shows generator thought/thinking/answer/confidence/evidence/limitations and raw response):
|
| 43 |
|
| 44 |
```bash
|
| 45 |
npm run pipeline -- --limit 10 --verbose
|
|
|
|
| 77 |
gold/pipeline_gold.jsonl
|
| 78 |
```
|
| 79 |
|
| 80 |
+
Each entry includes (generator sample contains answer, thought, raw, confidence, evidence, limitations, thinking):
|
| 81 |
|
| 82 |
```json
|
| 83 |
{
|
scripts/gold_preview.mjs
CHANGED
|
@@ -107,6 +107,12 @@ async function main() {
|
|
| 107 |
|
| 108 |
const q = obj.question || '[no question]';
|
| 109 |
const ans = obj.sample?.answer || obj.sample?.raw || '[no answer]';
|
| 110 |
const chunkId = obj.sourceChunkId || obj.context?.[0]?.id || '[unknown chunk]';
|
| 111 |
const ctxSnippet = obj.context?.[0]?.content || obj.sourceChunk || '';
|
| 112 |
const rew = obj.reward?.score ?? obj.reward?.ok;
|
|
@@ -117,6 +123,26 @@ async function main() {
|
|
| 117 |
console.log(`Chunk: ${chunkId}`);
|
| 118 |
console.log(`Q: ${preview(q, maxQuestion, full)}`);
|
| 119 |
console.log(`A: ${preview(ans, maxAnswer, full)}`);
|
| 120 |
if (ctxSnippet) console.log(`Ctx: ${preview(ctxSnippet, maxContext, full)}`);
|
| 121 |
if (verOk !== undefined) console.log(`Verifier ok: ${verOk}${verScore !== undefined ? ` (score: ${verScore})` : ''}`);
|
| 122 |
if (rew !== undefined) console.log(`Reward: ${rew}`);
|
|
|
|
| 107 |
|
| 108 |
const q = obj.question || '[no question]';
|
| 109 |
const ans = obj.sample?.answer || obj.sample?.raw || '[no answer]';
|
| 110 |
+
const rawGen = obj.sample?.raw;
|
| 111 |
+
const thought = obj.sample?.thought;
|
| 112 |
+
const thinking = obj.sample?.thinking;
|
| 113 |
+
const confidence = obj.sample?.confidence ?? obj.sample?.confidence_level;
|
| 114 |
+
const evidence = obj.sample?.evidence;
|
| 115 |
+
const limitations = obj.sample?.limitations;
|
| 116 |
const chunkId = obj.sourceChunkId || obj.context?.[0]?.id || '[unknown chunk]';
|
| 117 |
const ctxSnippet = obj.context?.[0]?.content || obj.sourceChunk || '';
|
| 118 |
const rew = obj.reward?.score ?? obj.reward?.ok;
|
|
|
|
| 123 |
console.log(`Chunk: ${chunkId}`);
|
| 124 |
console.log(`Q: ${preview(q, maxQuestion, full)}`);
|
| 125 |
console.log(`A: ${preview(ans, maxAnswer, full)}`);
|
| 126 |
+
if (thought !== undefined) {
|
| 127 |
+
const tVal =
|
| 128 |
+
typeof thought === 'string'
|
| 129 |
+
? thought
|
| 130 |
+
: JSON.stringify(thought, null, 2);
|
| 131 |
+
console.log(`Thought: ${preview(tVal, maxAnswer, full)}`);
|
| 132 |
+
}
|
| 133 |
+
if (rawGen !== undefined) {
|
| 134 |
+
console.log(`Raw: ${preview(rawGen, maxAnswer, full)}`);
|
| 135 |
+
}
|
| 136 |
+
if (confidence !== undefined) console.log(`Gen confidence: ${confidence}`);
|
| 137 |
+
if (evidence) console.log(`Evidence: ${preview(Array.isArray(evidence) ? evidence.join(' | ') : evidence, 400, full)}`);
|
| 138 |
+
if (limitations) console.log(`Limitations: ${preview(limitations, 200, full)}`);
|
| 139 |
+
if (thinking !== undefined) {
|
| 140 |
+
const tVal =
|
| 141 |
+
typeof thinking === 'string'
|
| 142 |
+
? thinking
|
| 143 |
+
: JSON.stringify(thinking, null, 2);
|
| 144 |
+
console.log(`Thinking: ${preview(tVal, maxAnswer, full)}`);
|
| 145 |
+
}
|
| 146 |
if (ctxSnippet) console.log(`Ctx: ${preview(ctxSnippet, maxContext, full)}`);
|
| 147 |
if (verOk !== undefined) console.log(`Verifier ok: ${verOk}${verScore !== undefined ? ` (score: ${verScore})` : ''}`);
|
| 148 |
if (rew !== undefined) console.log(`Reward: ${rew}`);
|
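Review note: the `Thought` and `Thinking` branches added above repeat the same "string or pretty-printed JSON" conversion. A possible consolidation is sketched below. `toDisplayString` and `formatSampleFields` are hypothetical names, and the `preview` shown here is only a stand-in, since the script's own `preview(text, max, full)` body is not part of this diff.

```javascript
// Stand-in for the script's preview(); the real implementation is not shown in the diff.
function preview(text, max, full = false) {
  const s = String(text ?? '');
  return full || s.length <= max ? s : s.slice(0, max) + '...';
}

// Hypothetical helper: strings pass through, anything else is pretty-printed as JSON.
function toDisplayString(value) {
  return typeof value === 'string' ? value : JSON.stringify(value, null, 2);
}

// Builds the "Thought:" and "Thinking:" lines for one gold sample.
function formatSampleFields(sample, maxAnswer = 2000, full = false) {
  const lines = [];
  for (const [label, key] of [['Thought', 'thought'], ['Thinking', 'thinking']]) {
    const value = sample?.[key];
    if (value !== undefined) {
      lines.push(`${label}: ${preview(toDisplayString(value), maxAnswer, full)}`);
    }
  }
  return lines;
}
```

Returning the lines instead of logging them keeps the helper easy to test; the caller can pass each line to `console.log`.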
src/generator/generator_core.mjs
CHANGED
|
@@ -58,6 +58,17 @@ export async function runGenerator(question, contextChunks, provider) {
|
|
| 58 |
thought = thinkingObj;
|
| 59 |
}
|
| 60 |
|
| 61 |
// Try parsing Qwen-style answer block first
|
| 62 |
const parseAnswerBlock = (txt) => {
|
| 63 |
if (!txt || typeof txt !== 'string') return null;
|
|
@@ -65,6 +76,29 @@ export async function runGenerator(question, contextChunks, provider) {
|
|
| 65 |
const body = blockMatch ? blockMatch[1] : txt;
|
| 66 |
const lines = body.split('\n').map((l) => l.trim()).filter(Boolean);
|
| 67 |
const result = {};
|
| 68 |
for (const line of lines) {
|
| 69 |
if (/^confidence:/i.test(line)) {
|
| 70 |
const val = line.split(':')[1]?.trim();
|
|
@@ -98,6 +132,10 @@ export async function runGenerator(question, contextChunks, provider) {
|
|
| 98 |
confidence = blockParsed.confidence ?? confidence;
|
| 99 |
evidence = blockParsed.evidence ?? evidence;
|
| 100 |
limitations = blockParsed.limitations ?? limitations;
|
| 101 |
} else {
|
| 102 |
// fallback: parse JSON if it's actually JSON
|
| 103 |
const parsed = safeParse(raw);
|
|
@@ -127,6 +165,9 @@ export async function runGenerator(question, contextChunks, provider) {
|
|
| 127 |
if (parsed.evidence) evidence = parsed.evidence;
|
| 128 |
if (parsed.limitations) limitations = parsed.limitations;
|
| 129 |
} else {
|
| 130 |
// fallback: extract visible chain-of-thought tags if present
|
| 131 |
const thinkMatch = typeof raw === 'string'
|
| 132 |
? raw.match(/<think>([\s\S]*?)<\/think>/i)
|
|
@@ -138,8 +179,13 @@ export async function runGenerator(question, contextChunks, provider) {
|
|
| 138 |
}
|
| 139 |
}
|
| 140 |
|
| 141 |
return {
|
| 142 |
raw,
|
|
|
|
| 143 |
thought,
|
| 144 |
answer,
|
| 145 |
confidence,
|
|
|
|
| 58 |
thought = thinkingObj;
|
| 59 |
}
|
| 60 |
|
| 61 |
+
const extractThoughtBlock = (txt) => {
|
| 62 |
+
if (!txt || typeof txt !== 'string') return null;
|
| 63 |
+
const thoughtMatch = txt.match(/<\|thought\|>([\s\S]*?)<\|end_of_thought\|>/i);
|
| 64 |
+
if (thoughtMatch) return thoughtMatch[1].trim();
|
| 65 |
+
|
| 66 |
+
const understandingMatch = txt.match(/<understanding>[\s\S]*?(?=<\|answer\||<answer>|$)/i);
|
| 67 |
+
if (understandingMatch) return understandingMatch[0].trim();
|
| 68 |
+
|
| 69 |
+
return null;
|
| 70 |
+
};
|
| 71 |
+
|
| 72 |
// Try parsing Qwen-style answer block first
|
| 73 |
const parseAnswerBlock = (txt) => {
|
| 74 |
if (!txt || typeof txt !== 'string') return null;
|
|
|
|
| 76 |
const body = blockMatch ? blockMatch[1] : txt;
|
| 77 |
const lines = body.split('\n').map((l) => l.trim()).filter(Boolean);
|
| 78 |
const result = {};
|
| 79 |
+
// line-based fallbacks even without tags
|
| 80 |
+
const answerLine = txt.match(/^answer:\s*(.+)$/im);
|
| 81 |
+
if (answerLine) result.answer = answerLine[1].trim();
|
| 82 |
+
const confLine = txt.match(/^confidence:\s*(.+)$/im);
|
| 83 |
+
if (confLine) result.confidence = confLine[1].trim();
|
| 84 |
+
const evidenceLine = txt.match(/^evidence:\s*(.+)$/im);
|
| 85 |
+
if (evidenceLine) {
|
| 86 |
+
const evLine = evidenceLine[1].trim();
|
| 87 |
+
let ev = [];
|
| 88 |
+
const arrMatch = evLine.match(/\[(.*)\]/);
|
| 89 |
+
if (arrMatch) {
|
| 90 |
+
ev = arrMatch[1]
|
| 91 |
+
.split(/,(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
|
| 92 |
+
.map((s) => s.replace(/^["'\s]+|["'\s]+$/g, ''))
|
| 93 |
+
.filter(Boolean);
|
| 94 |
+
} else {
|
| 95 |
+
ev = evLine.split(',').map((s) => s.replace(/^["'\s]+|["'\s]+$/g, '')).filter(Boolean);
|
| 96 |
+
}
|
| 97 |
+
result.evidence = ev;
|
| 98 |
+
}
|
| 99 |
+
const limLine = txt.match(/^limitations?:\s*(.+)$/im);
|
| 100 |
+
if (limLine) result.limitations = limLine[1].trim();
|
| 101 |
+
|
| 102 |
for (const line of lines) {
|
| 103 |
if (/^confidence:/i.test(line)) {
|
| 104 |
const val = line.split(':')[1]?.trim();
|
|
|
|
| 132 |
confidence = blockParsed.confidence ?? confidence;
|
| 133 |
evidence = blockParsed.evidence ?? evidence;
|
| 134 |
limitations = blockParsed.limitations ?? limitations;
|
| 135 |
+
if (!thought) {
|
| 136 |
+
const t = extractThoughtBlock(raw);
|
| 137 |
+
if (t) thought = t;
|
| 138 |
+
}
|
| 139 |
} else {
|
| 140 |
// fallback: parse JSON if it's actually JSON
|
| 141 |
const parsed = safeParse(raw);
|
|
|
|
| 165 |
if (parsed.evidence) evidence = parsed.evidence;
|
| 166 |
if (parsed.limitations) limitations = parsed.limitations;
|
| 167 |
} else {
|
| 168 |
+
// fallback: extract thought block or <think>
|
| 169 |
+
const tBlock = extractThoughtBlock(raw);
|
| 170 |
+
if (tBlock) thought = tBlock;
|
| 171 |
// fallback: extract visible chain-of-thought tags if present
|
| 172 |
const thinkMatch = typeof raw === 'string'
|
| 173 |
? raw.match(/<think>([\s\S]*?)<\/think>/i)
|
|
|
|
| 179 |
}
|
| 180 |
}
|
| 181 |
|
| 182 |
+
if (!thought && raw) {
|
| 183 |
+
thought = raw;
|
| 184 |
+
}
|
| 185 |
+
|
| 186 |
return {
|
| 187 |
raw,
|
| 188 |
+
thinking: thinkingObj,
|
| 189 |
thought,
|
| 190 |
answer,
|
| 191 |
confidence,
|
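Review note: the line-based `Evidence:` fallback added in this file is easier to follow in isolation. The sketch below reuses the regular expressions from the diff inside a standalone function; the name `parseEvidenceLine` and the `null` return for a missing line are assumptions for illustration, not part of the change.

```javascript
// Standalone sketch of the evidence-line parsing; regexes are copied from the diff.
function parseEvidenceLine(text) {
  if (typeof text !== 'string') return null;
  // Find a line that starts with "Evidence:" (case-insensitive, any line of the text).
  const m = text.match(/^evidence:\s*(.+)$/im);
  if (!m) return null;
  const line = m[1].trim();
  // Strip surrounding quotes and whitespace from each item.
  const stripQuotes = (s) => s.replace(/^["'\s]+|["'\s]+$/g, '');
  const arr = line.match(/\[(.*)\]/);
  const parts = arr
    // Bracketed list: split only on commas that sit outside quoted strings.
    ? arr[1].split(/,(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
    // Bare list: plain comma split.
    : line.split(',');
  return parts.map(stripQuotes).filter(Boolean);
}
```

The lookahead in the bracketed branch is what keeps a comma inside a quoted item from splitting that item in two; the bare branch has no such protection.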
src/pipeline/batch.mjs
CHANGED
|
@@ -131,6 +131,7 @@ export async function runPipelineBatch({
|
|
| 131 |
confidence: result.gen?.confidence,
|
| 132 |
evidence: result.gen?.evidence,
|
| 133 |
limitations: result.gen?.limitations,
|
|
|
|
| 134 |
},
|
| 135 |
verifier: result.ver,
|
| 136 |
reward: result.rew,
|
|
|
|
| 131 |
confidence: result.gen?.confidence,
|
| 132 |
evidence: result.gen?.evidence,
|
| 133 |
limitations: result.gen?.limitations,
|
| 134 |
+
thinking: result.gen?.thinking,
|
| 135 |
},
|
| 136 |
verifier: result.ver,
|
| 137 |
reward: result.rew,
|
src/pipeline/step.mjs
CHANGED
|
@@ -165,6 +165,14 @@ export async function runPipelineStep({
|
|
| 165 |
log(' [generator] thought:');
|
| 166 |
log(' ' + preview(thoughtPreview, 500).replace(/\n/g, '\n '));
|
| 167 |
}
|
| 168 |
log(' [generator] answer:');
|
| 169 |
log(' ' + preview(gen?.answer ?? '', 400).replace(/\n/g, '\n '));
|
| 170 |
if (gen?.confidence) {
|
|
@@ -184,6 +192,17 @@ export async function runPipelineStep({
|
|
| 184 |
if (gen?.limitations) {
|
| 185 |
log(' [generator] limitations: ' + preview(gen.limitations, 200));
|
| 186 |
}
|
| 187 |
}
|
| 188 |
} catch (e) {
|
| 189 |
const msg = e?.message || String(e);
|
|
|
|
| 165 |
log(' [generator] thought:');
|
| 166 |
log(' ' + preview(thoughtPreview, 500).replace(/\n/g, '\n '));
|
| 167 |
}
|
| 168 |
+
if (gen?.thinking) {
|
| 169 |
+
const thinkingPreview =
|
| 170 |
+
typeof gen.thinking === 'string'
|
| 171 |
+
? gen.thinking
|
| 172 |
+
: JSON.stringify(gen.thinking, null, 2);
|
| 173 |
+
log(' [generator] thinking (raw from provider):');
|
| 174 |
+
log(' ' + preview(thinkingPreview, 500).replace(/\n/g, '\n '));
|
| 175 |
+
}
|
| 176 |
log(' [generator] answer:');
|
| 177 |
log(' ' + preview(gen?.answer ?? '', 400).replace(/\n/g, '\n '));
|
| 178 |
if (gen?.confidence) {
|
|
|
|
| 192 |
if (gen?.limitations) {
|
| 193 |
log(' [generator] limitations: ' + preview(gen.limitations, 200));
|
| 194 |
}
|
| 195 |
+
if (gen?.raw) {
|
| 196 |
+
let rawDisplay = gen.raw;
|
| 197 |
+
try {
|
| 198 |
+
const parsed = JSON.parse(gen.raw);
|
| 199 |
+
rawDisplay = JSON.stringify(parsed, null, 2);
|
| 200 |
+
} catch {
|
| 201 |
+
// leave as string
|
| 202 |
+
}
|
| 203 |
+
log(' [generator] raw response (JSON if parsable):');
|
| 204 |
+
log(' ' + preview(rawDisplay, 2000).replace(/\n/g, '\n '));
|
| 205 |
+
}
|
| 206 |
}
|
| 207 |
} catch (e) {
|
| 208 |
const msg = e?.message || String(e);
|
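Review note: the raw-response logging above pretty-prints `gen.raw` when it parses as JSON and otherwise leaves it as a string. The same idea as a small helper is sketched below. `prettyIfJson` is a hypothetical name, and the non-string branch is an assumption added for the case where a provider returns an object instead of text.

```javascript
// Hypothetical helper mirroring the "pretty-print if JSON" logging step.
function prettyIfJson(raw) {
  if (typeof raw !== 'string') {
    // Assumption: a provider might return an object; stringify it rather than call JSON.parse.
    return JSON.stringify(raw, null, 2) ?? String(raw);
  }
  try {
    return JSON.stringify(JSON.parse(raw), null, 2);
  } catch {
    return raw; // not JSON: leave the text untouched
  }
}
```

The result is always a string, so it can be passed straight to a preview or `.replace()` call.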
state_of_project.md
CHANGED
|
@@ -6,6 +6,7 @@
|
|
| 6 |
- Verifier parsing tolerates distributor format (`SCORE` as number or `PASS`/`FAIL` with noisy prefixes); caching and retry logic in place.
|
| 7 |
- Tests: 42 passing (retrieval mock/real, generator, verifier, reward, pipeline behaviour, cache, full mock pipeline).
|
| 8 |
- CLI defaults: verbose on, question-first, JSONL chunks; chunk/question limits respected.
|
|
|
|
| 9 |
|
| 10 |
## What needs attention
|
| 11 |
- Real pipeline currently fails at question generation when Ollama/question model is unreachable; run requires a live Ollama with the specified model pulled.
|
|
|
|
| 6 |
- Verifier parsing tolerates distributor format (`SCORE` as number or `PASS`/`FAIL` with noisy prefixes); caching and retry logic in place.
|
| 7 |
- Tests: 42 passing (retrieval mock/real, generator, verifier, reward, pipeline behaviour, cache, full mock pipeline).
|
| 8 |
- CLI defaults: verbose on, question-first, JSONL chunks; chunk/question limits respected.
|
| 9 |
+
- Generator parsing/logging: preserves provider `.thinking` (structured) and parsed thought/answer/confidence/evidence/limitations; verbose mode prints both parsed and raw (JSON pretty if parsable). Gold stores all generator fields.
|
| 10 |
|
| 11 |
## What needs attention
|
| 12 |
- Real pipeline currently fails at question generation when Ollama/question model is unreachable; run requires a live Ollama with the specified model pulled.
|
tests/generator_core.test.mjs
CHANGED
|
@@ -80,8 +80,8 @@ The final answer derived from the context.`;
|
|
| 80 |
);
|
| 81 |
|
| 82 |
expect(result.raw).toBe('Just a direct answer with no visible reasoning.');
|
| 83 |
-
// No JSON or think tags means thought
|
| 84 |
-
expect(result.thought).
|
| 85 |
expect(result.answer).toBe('Just a direct answer with no visible reasoning.');
|
| 86 |
});
|
| 87 |
|
|
@@ -102,4 +102,21 @@ The final answer derived from the context.`;
|
|
| 102 |
expect(result.evidence).toEqual(['quote1 (loc1)', 'quote2 (loc2)']);
|
| 103 |
expect(result.limitations).toBe('None');
|
| 104 |
});
|
| 105 |
});
|
|
|
|
| 80 |
);
|
| 81 |
|
| 82 |
expect(result.raw).toBe('Just a direct answer with no visible reasoning.');
|
| 83 |
+
// No JSON or think tags means thought falls back to raw
|
| 84 |
+
expect(result.thought).toBe('Just a direct answer with no visible reasoning.');
|
| 85 |
expect(result.answer).toBe('Just a direct answer with no visible reasoning.');
|
| 86 |
});
|
| 87 |
|
|
|
|
| 102 |
expect(result.evidence).toEqual(['quote1 (loc1)', 'quote2 (loc2)']);
|
| 103 |
expect(result.limitations).toBe('None');
|
| 104 |
});
|
| 105 |
+
|
| 106 |
+
it('parses legacy reasoning tags without answer block', async () => {
|
| 107 |
+
const fakeContext = [{ content: 'ctx' }];
|
| 108 |
+
const provider = {
|
| 109 |
+
generate: vi.fn(async () =>
|
| 110 |
+
`<understanding>step A</understanding>\n<reasoning_chain>step B</reasoning_chain>\nConfidence: Medium\nAnswer: Legacy answer\nEvidence: ["ev1 (loc)"]\nLimitations: None`
|
| 111 |
+
),
|
| 112 |
+
};
|
| 113 |
+
|
| 114 |
+
const result = await runGenerator('Test?', fakeContext, provider);
|
| 115 |
+
|
| 116 |
+
expect(typeof result.thought).toBe('string');
|
| 117 |
+
expect(result.answer).toBe('Legacy answer');
|
| 118 |
+
expect(result.confidence).toBe('Medium');
|
| 119 |
+
expect(result.evidence).toEqual(['ev1 (loc)']);
|
| 120 |
+
expect(result.limitations).toBe('None');
|
| 121 |
+
});
|
| 122 |
});
|
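Review note: the new legacy-tag test only asserts that `result.thought` is a string. To see what the thought extraction returns for that input, the sketch below copies `extractThoughtBlock` from the `generator_core.mjs` diff and runs it on its own. This exercises the helper in isolation, not the full `runGenerator` flow, and the sample strings are illustrative.

```javascript
// Copied from the generator_core.mjs diff so it can be run standalone.
const extractThoughtBlock = (txt) => {
  if (!txt || typeof txt !== 'string') return null;
  const thoughtMatch = txt.match(/<\|thought\|>([\s\S]*?)<\|end_of_thought\|>/i);
  if (thoughtMatch) return thoughtMatch[1].trim();

  const understandingMatch = txt.match(/<understanding>[\s\S]*?(?=<\|answer\||<answer>|$)/i);
  if (understandingMatch) return understandingMatch[0].trim();

  return null;
};

// Same shape as the legacy test input: reasoning tags followed by plain "Key: value" lines.
const legacy =
  '<understanding>step A</understanding>\n<reasoning_chain>step B</reasoning_chain>\n' +
  'Confidence: Medium\nAnswer: Legacy answer\nEvidence: ["ev1 (loc)"]\nLimitations: None';

const legacyThought = extractThoughtBlock(legacy);
const taggedThought = extractThoughtBlock('<|thought|> reason <|end_of_thought|><|answer|>done');
```

Because the second regex has no `m` flag and the legacy input contains no `<answer>` or `<|answer|` tag, the match runs to the end of the string, so `legacyThought` also contains the `Confidence:`, `Answer:`, `Evidence:`, and `Limitations:` lines. If thought and answer are meant to stay separate for legacy outputs, a stricter assertion in the test may be worth considering.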