htaf committed on
Commit 2baa954 · 1 Parent(s): 856bafd

maintaining same chunk throughout pipeline

prompts/generator_prompt.txt CHANGED
@@ -1,29 +1,59 @@
- You are part of a knowledge distillation pipeline.
- 
- You will be given CONTEXT (excerpts from a corpus) and a QUESTION about that context.
- 
- Your task:
- 
- 1. First, think step by step INSIDE <think> and </think> tags,
-    using ONLY information that can be supported by the CONTEXT.
- 2. Then, AFTER </think>, write the final answer plainly, without any tags.
- 
- Rules:
- 
- - Use ONLY the CONTEXT for facts and claims.
- - Do NOT use outside knowledge or guess.
- - If the CONTEXT is not sufficient to answer, the final answer must be:
-   "I cannot answer this from the provided context."
- - You may quote short phrases from the CONTEXT as evidence, but avoid long copy-paste.
- - Be concise and clear.
- 
- Format (follow exactly):
- 
- <think>
- [your detailed reasoning here, step by step, grounded in the CONTEXT]
- </think>
- [your final answer here]
- 
+ # SYSTEM ROLE
+ You are a knowledge distillation generator optimized for training reasoning LoRAs. Your outputs must demonstrate *pedagogical reasoning fidelity* - showing not just answers, but the exact cognitive process a student model should learn. Every output will be used as training data.
+ 
+ ## CORE DIRECTIVES (NON-NEGOTIABLE)
+ 1. **CONTEXT FIDELITY**: Use ONLY provided context. No external knowledge. Ever.
+ 2. **REASONING GRANULARITY**: Decompose reasoning into atomic, teachable steps
+ 3. **UNCERTAINTY CALIBRATION**: Quantify confidence at each reasoning stage
+ 4. **BIAS MITIGATION**: Explicitly flag context limitations and reasoning risks
+ 5. **DISTILLATION OPTIMIZATION**: Structure outputs for maximum LoRA weight efficiency
+ 
+ ## REASONING PROTOCOL (EXECUTE IN ORDER INSIDE XML TAGS)
+ <understanding>
+ - Restate question in atomic components
+ - Identify: [Simple/Factual] vs [Multi-hop/Inferential] vs [Ambiguous]
+ - Flag required context elements (quote paragraph numbers)
+ </understanding>
+ 
+ <context_verification>
+ - For EACH required fact:
+   ▸ Cite exact context location (para #[X])
+   ▸ Assess source quality: [Primary/Secondary/Contradictory/Uncertain]
+   ▸ If missing/insufficient: TERMINATE with "I cannot answer..."
+ </context_verification>
+ 
+ <reasoning_chain confidence_baseline="90%">
+ [STRUCTURED STEP FORMAT PER STEP]
+ Step #[N]:
+ - Operation: [Retrieval/Comparison/Causality/Quantification/Contradiction-Check]
+ - Context evidence: "Short quote" (para #[X])
+ - Confidence delta: [+0%/-5% etc.] due to [reason]
+ - Inference rule used: [e.g., "Temporal transitivity", "Numerical constraint propagation"]
+ - Bias check: [None/Selection bias/Uncertainty propagation risk]
+ </reasoning_chain>
+ 
+ <synthesis>
+ - Resolve conflicts between steps
+ - Calculate cumulative confidence: (baseline * step confidences)
+ - Final confidence threshold: <80% → "I cannot answer..."
+ - Verify against reasoning_chain constraints
+ </synthesis>
+ 
+ ## OUTPUT SPECIFICATION (MACHINE-PARSABLE)
+ After </synthesis>:
+ Confidence: [INTEGER 0-100]
+ Answer: [CONCISE RESPONSE OR EXACT FALLBACK PHRASE]
+ Evidence: [MAX 3 SHORT PHRASES] | [PARA #S]
+ Uncertainty_flags: [NONE/CONTEXT_GAPS/CONTRADICTIONS/BIAS_RISK]
+ 
+ ## STRICT FORMATTING RULES
+ - XML tags MUST close properly
+ - Evidence phrases: ≤7 words each
+ - Confidence calculations must show work in <synthesis>
+ - If context_verification fails: OUTPUT ONLY "I cannot answer this from the provided context." (NO tags)
+ - NEVER use markdown, asterisks, or special formatting
+ 
+ ---
  CONTEXT:
  {{CONTEXT}}
 
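For reference, a final block satisfying the new OUTPUT SPECIFICATION would look roughly like this (illustrative placeholder values, not output from a real run):

  Confidence: 85
  Answer: X increased primarily because of Y.
  Evidence: "X increased", "because of Y" | para #2
  Uncertainty_flags: NONE
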
prompts/question_prompt.txt CHANGED
@@ -1,34 +1,68 @@
- You are a dataset-creation assistant.
- 
- You will be given a CONTEXT CHUNK of text from a larger corpus.
- 
- Your goals:
- 
- 1. Read the context carefully.
- 2. Generate up to {{MAX_QUESTIONS}} diverse, high-quality questions
-    that can be answered ONLY using information found inside the context.
- 3. Produce questions that:
-    - focus strictly on the content of the chunk,
-    - avoid hallucinating any information not present,
-    - require comprehension, reasoning, or synthesis across the chunk,
-    - vary naturally in difficulty (some simple, some deeper),
-    - avoid meta or speculative questions,
-    - avoid yes/no questions unless they are meaningful.
- 
- Output STRICTLY this JSON structure:
- 
- {
-   "questions": [
-     "Question 1?",
-     "Question 2?",
-     "Question 3?"
-   ]
- }
- 
- Do NOT include answers. Do NOT add any fields. JSON only.
- 
- ---
- CONTEXT START
- {{CONTEXT}}
- CONTEXT END
- 
+ You are a Master Question Architect in a knowledge distillation pipeline.
+ 
+ Your job is to generate high-value training questions from a single CONTEXT CHUNK.
+ 
+ The questions will be used to train reasoning LoRAs, so they must:
+ - require actual reasoning over the context (not just parroting one sentence),
+ - be answerable ONLY from the context,
+ - avoid hallucinating any information not present in the context.
+ 
+ CONTEXT:
+ {{CONTEXT}}
+ 
+ ---
+ 
+ INTERNAL THINKING (do NOT mention this section in the final output):
+ 
+ <analysis>
+ 1. Identify key entities, concepts, and relationships in the context.
+ 2. Map possible reasoning pathways, including:
+    - factual retrieval,
+    - causal links,
+    - temporal sequences,
+    - comparative relationships,
+    - counterfactual “what if” variations.
+ 3. Estimate how deep the reasoning can go (from very easy to very hard).
+ 4. Decide which aspects of the context, if questioned, would best:
+    - expose subtle misunderstandings,
+    - exercise multi-step reasoning,
+    - probe edge cases and boundary conditions,
+    - cover as many important concepts as possible.
+ </analysis>
+ 
+ ---
+ 
+ OUTPUT INSTRUCTIONS (VISIBLE):
+ 
+ 1. Generate up to {{MAX_QUESTIONS}} diverse, high-quality questions.
+    - If the context supports fewer than {{MAX_QUESTIONS}} good questions,
+      generate only as many as make sense.
+ 2. Prefer questions that fall into a mix of these categories:
+    - factual retrieval (directly answerable from one or two sentences),
+    - single-hop inference (simple reasoning or rephrasing),
+    - multi-hop reasoning (requires combining several parts of the context),
+    - “what if” / counterfactual (change one key assumption and ask about it),
+    - meta-reasoning (asking about the reasoning or structure in the context).
+ 3. Questions MUST be answerable solely from the CONTEXT.
+    - If you are unsure the context supports a question, DO NOT ask it.
+ 4. Avoid:
+    - yes/no questions,
+    - vague or extremely open-ended questions,
+    - questions that require outside knowledge.
+ 
+ FORMAT (MACHINE-FRIENDLY, NO JSON):
+ 
+ - Output ONLY the questions.
+ - One question per line.
+ - No numbering, no bullet points, no explanations.
+ - Do not include the analysis block in your output.
+ - Do not prefix with “Q1:”, “Q2:” etc.
+ - Do not add extra commentary.
+ 
+ EXAMPLES OF VALID OUTPUT FORMAT:
+ 
+ What are the main reasons given in the context for X?
+ How does Y relate to Z according to the context?
+ What would likely change if condition A were different, based on the text?
+ 
+ Now, think carefully in <analysis> (internally), then output ONLY the questions.
src/pipeline/batch.mjs CHANGED
@@ -233,6 +233,8 @@ export async function runPipelineBatch({
      try {
        const result = await runPipelineStep({
          question: q,
+         // 🔑 KEY FIX: reuse this ES chunk as the *only* context
+         initialContext: [chunk],
          verbose,
          logger,
        });
@@ -288,4 +290,3 @@ export async function runPipelineBatch({
  
    throw new Error(`Unknown PIPELINE_SEED_MODE: ${seedMode}`);
  }
- 
 
src/pipeline/step.mjs CHANGED
@@ -10,12 +10,34 @@ import { preview } from './util.mjs';
   * Run a single pipeline step for one question.
   *
   * Flow:
-  *   retrieval → generator → verifier → reward
+  *   retrieval (or provided context) → generator → verifier → reward
   *
-  * No JSON formats required — all models return free-text strings.
+  * Design constraints:
+  *   - Exactly one context chunk is used per question.
+  *   - If `initialContext` is provided, we NEVER hit Elasticsearch.
+  *   - If we call ES, we still only keep the FIRST returned chunk.
+  *
+  * Returns:
+  *   {
+  *     status: 'accepted'
+  *       | 'invalid_question'
+  *       | 'retrieval_failed'
+  *       | 'generator_failed'
+  *       | 'verifier_rejected'
+  *       | 'verifier_error'
+  *       | 'reward_rejected'
+  *       | 'reward_error',
+  *     question,
+  *     context, // array with exactly one chunk (when successful)
+  *     gen,
+  *     ver,
+  *     rew,
+  *     error? // optional error message
+  *   }
   */
  export async function runPipelineStep({
    question,
+   initialContext, // optional: [{ id?, content, ... }]
    retrievalMode = process.env.RETRIEVAL_MODE || 'hybrid',
    k = Number(process.env.RETRIEVAL_K || '6'),
    generatorProvider,
@@ -28,7 +50,7 @@ export async function runPipelineStep({
    const errLog = logger?.error?.bind(logger) || console.error;
  
    // ----------------------------------------
-   // Validate input question
+   // Question sanity
    // ----------------------------------------
    if (!question || !question.trim()) {
      if (verbose) log(' [pipeline] empty / invalid question, skipping');
@@ -40,28 +62,65 @@ export async function runPipelineStep({
    const rewProv = rewardProvider || loadProviderFor('reward');
  
    // ----------------------------------------
-   // Retrieval
+   // Retrieval / context selection
    // ----------------------------------------
    let context = [];
-   try {
-     if (verbose) log(` [retrieval] mode=${retrievalMode} k=${k}`);
-     context = await hybridSearch(question, k);
- 
-     if (verbose) {
-       log(` [retrieval] got ${context.length} chunks`);
-       if (context.length > 0) {
-         const first = context[0]?.content ?? '';
-         log(' [retrieval] first chunk:');
-         log(' ' + preview(first, 200).replace(/\n/g, '\n '));
-       }
-     }
-   } catch (e) {
-     const msg = e?.message || String(e);
-     if (verbose) errLog(' [retrieval] ERROR:', msg);
-     return {
-       status: 'retrieval_failed',
-       question,
-       error: msg,
-     };
-   }
+ 
+   if (initialContext && Array.isArray(initialContext) && initialContext.length > 0) {
+     // Use provided context, no ES call
+     context = initialContext.slice(0, 1); // enforce single-chunk invariant
+     if (verbose) {
+       log(
+         ` [retrieval] using initialContext provided (len=${initialContext.length}), ` +
+           `keeping first chunk only`,
+       );
+       const first = context[0]?.content ?? '';
+       log(' [context] first chunk (provided):');
+       log(' ' + preview(first, 200).replace(/\n/g, '\n '));
+     }
+   } else {
+     // Go to ES exactly once
+     try {
+       if (verbose) log(` [retrieval] mode=${retrievalMode} k=${k}`);
+       const hits = await hybridSearch(question, k);
+       if (verbose) {
+         log(` [retrieval] got ${hits.length} chunks from ES`);
+       }
+ 
+       if (!hits || hits.length === 0) {
+         if (verbose) log(' [retrieval] no chunks found → retrieval_failed');
+         return {
+           status: 'retrieval_failed',
+           question,
+           error: 'no_chunks',
+         };
+       }
+ 
+       // Enforce single-chunk context
+       context = [hits[0]];
+       if (verbose) {
+         const first = context[0]?.content ?? '';
+         log(' [context] first chunk (from ES):');
+         log(' ' + preview(first, 200).replace(/\n/g, '\n '));
+       }
+     } catch (e) {
+       const msg = e?.message || String(e);
+       if (verbose) errLog(' [retrieval] ERROR:', msg);
+       return {
+         status: 'retrieval_failed',
+         question,
+         error: msg,
+       };
+     }
+   }
+ 
+   // Safety: if somehow context is still empty here, fail fast
+   if (!context || context.length === 0) {
+     if (verbose) log(' [retrieval] context empty after selection → retrieval_failed');
+     return {
+       status: 'retrieval_failed',
+       question,
+       error: 'empty_context',
+     };
+   }
  
@@ -75,7 +134,7 @@ export async function runPipelineStep({
  
    if (verbose) {
      log(' [generator] answer:');
-     log(' ' + preview(gen.answer ?? '', 400).replace(/\n/g, '\n '));
+     log(' ' + preview(gen?.answer ?? '', 400).replace(/\n/g, '\n '));
    }
  } catch (e) {
    const msg = e?.message || String(e);
@@ -109,8 +168,8 @@ export async function runPipelineStep({
    ver = await runVerifier({ question, context, gen }, verProv);
  
    if (verbose) {
-     log(' [verifier] ok=' + ver.ok);
-     log(' ' + preview(ver.raw ?? '', 200).replace(/\n/g, '\n '));
+     log(' [verifier] ok=' + (ver?.ok === true));
+     log(' ' + preview(ver?.raw ?? '', 200).replace(/\n/g, '\n '));
    }
  } catch (e) {
    const msg = e?.message || String(e);
@@ -141,11 +200,11 @@ export async function runPipelineStep({
  let rew;
  try {
    if (verbose) log(' [reward] calling model…');
-   rew = await runReward({ question, context, gen }, rewProv);
+   rew = await runReward({ question, context, gen, ver }, rewProv);
  
    if (verbose) {
-     log(` [reward] score=${rew.score} ok=${rew.ok}`);
-     log(' ' + preview(rew.raw ?? '', 200).replace(/\n/g, '\n '));
+     log(` [reward] score=${rew?.score} ok=${rew?.ok}`);
+     log(' ' + preview(rew?.raw ?? '', 200).replace(/\n/g, '\n '));
    }
  } catch (e) {
    const msg = e?.message || String(e);
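
For orientation, the new initialContext path can be exercised directly. A minimal sketch, not code from this commit; the chunk shape { id, content } and the returned status/context fields follow the doc comment above, and the import path assumes the repo layout:

  // sketch: run one step against a pre-selected chunk, skipping Elasticsearch
  import { runPipelineStep } from './src/pipeline/step.mjs';

  const chunk = { id: 'chunk-1', content: 'Example chunk text.' }; // hypothetical chunk

  const result = await runPipelineStep({
    question: 'What does the chunk say about X?',
    initialContext: [chunk], // step.mjs keeps only the first entry
    verbose: true,
  });

  console.log(result.status);         // e.g. 'accepted' or 'verifier_rejected'
  console.log(result.context.length); // always 1 (single-chunk invariant)
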
src/providers/ollama_provider.mjs CHANGED
@@ -1,6 +1,10 @@
  // src/providers/ollama_provider.mjs
  import { BaseProvider } from './base.mjs';
  
+ const ENABLE_REASONING =
+   process.env.OLLAMA_REASONING === '1' ||
+   process.env.OLLAMA_REASONING === 'true';
+ 
  function normalizeBase(url) {
    // strip trailing slashes so we can safely append /api/generate
    return url.replace(/\/+$/, '');
@@ -8,21 +12,42 @@ function normalizeBase(url) {
  
  export class OllamaProvider extends BaseProvider {
    /**
-    * @param {object} opts
-    * @param {string} [opts.model] - model name/tag in Ollama
-    * @param {string} [opts.baseUrl] - base Ollama URL (without /api/generate)
+    * @param {object|string} opts
+    *   - if string: treated as stage name ('generator' | 'verifier' | 'reward' | 'question')
+    *   - if object: { model?, baseUrl?, stage? }
     */
    constructor(opts = {}) {
      super();
  
+     let stage = null;
+     let options = {};
+ 
+     if (typeof opts === 'string') {
+       stage = opts;
+     } else if (opts && typeof opts === 'object') {
+       options = opts;
+       stage = opts.stage || null;
+     }
+ 
      // Base URL: env or default, WITHOUT endpoint path
      const envBase = process.env.OLLAMA_URL || 'http://localhost:11434';
-     this.baseUrl = normalizeBase(opts.baseUrl || envBase);
+     this.baseUrl = normalizeBase(options.baseUrl || envBase);
  
-     // Model: allow stage-specific env, then generic, then default
+     // Stage-specific model: QUESTION_MODEL, GENERATOR_MODEL, VERIFIER_MODEL, REWARD_MODEL
+     let stageModel = null;
+     if (stage) {
+       const key = `${stage.toUpperCase()}_MODEL`;
+       stageModel = process.env[key] || null;
+     }
+ 
+     // Model resolution order:
+     //   1) explicit opts.model
+     //   2) stage-specific env (e.g. GENERATOR_MODEL)
+     //   3) generic OLLAMA_MODEL
+     //   4) default qwen3-vl:8b-thinking
      this.model =
-       opts.model ||
-       process.env.GENERATOR_MODEL ||
+       options.model ||
+       stageModel ||
        process.env.OLLAMA_MODEL ||
        'qwen3-vl:8b-thinking';
    }
@@ -35,14 +60,21 @@ export class OllamaProvider extends BaseProvider {
    async generate(prompt) {
      const url = `${this.baseUrl}/api/generate`;
  
+     const body = {
+       model: this.model,
+       prompt,
+       stream: false, // single JSON response, easier for pipeline/tests
+     };
+ 
+     // enable Ollama reasoning mode for *-thinking models when requested
+     if (ENABLE_REASONING) {
+       body.options = { reasoning: true };
+     }
+ 
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
-       body: JSON.stringify({
-         model: this.model,
-         prompt,
-         stream: false,
-       }),
+       body: JSON.stringify(body),
      });
  
      if (!res.ok) {
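
The two constructor forms above resolve the model like this. A sketch with made-up model tags; VERIFIER_MODEL is one of the stage envs named in the diff:

  // sketch: stage-string vs. options-object construction
  import { OllamaProvider } from './src/providers/ollama_provider.mjs';

  process.env.VERIFIER_MODEL = 'qwen3:4b'; // hypothetical model tag

  const verifier = new OllamaProvider('verifier');        // stage string → VERIFIER_MODEL
  const custom = new OllamaProvider({ model: 'llama3' }); // explicit model wins over env

  console.log(verifier.model); // 'qwen3:4b'
  console.log(custom.model);   // 'llama3'
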
src/question/question_core.mjs CHANGED
@@ -1,70 +1,126 @@
  // src/question/question_core.mjs
  import fs from 'fs/promises';
  import path from 'path';
+ import { fileURLToPath } from 'url';
+ 
+ const __filename = fileURLToPath(import.meta.url);
+ const __dirname = path.dirname(__filename);
+ 
+ const TEMPLATE_PATH = path.resolve(
+   __dirname,
+   '..',
+   '..',
+   'prompts',
+   'question_prompt.txt',
+ );
+ 
+ let cachedTemplate = null;
  
  async function loadQuestionTemplate() {
-   const filePath = path.resolve(
-     path.dirname(new URL(import.meta.url).pathname),
-     '..',
-     '..',
-     'prompts',
-     'question_prompt.txt',
-   );
-   return await fs.readFile(filePath, 'utf8');
+   if (cachedTemplate) return cachedTemplate;
+   cachedTemplate = await fs.readFile(TEMPLATE_PATH, 'utf8');
+   return cachedTemplate;
  }
  
  /**
-  * runQuestionGenerator
+  * Extract questions using JSON-first, then plain-text fallback.
   *
-  * @param {string} contextText - text chunk we want questions about
-  * @param {object} provider - { generate(prompt) → string }
-  * @param {object} options
-  *   - maxQuestions: how many questions to ask for
-  *
-  * @returns {object} {
-  *   questions: string[],
-  *   raw: string,
-  *   parsed: any
-  * }
+  * @param {string} raw
+  * @param {number} maxQuestions
+  * @returns {{ questions: string[], parsed: any }}
+  */
+ function parseQuestions(raw, maxQuestions) {
+   let parsed = null;
+   let questions = [];
+ 
+   if (!raw || typeof raw !== 'string') {
+     return { questions, parsed };
+   }
+ 
+   // ----- 1) Try JSON -----
+   try {
+     const json = JSON.parse(raw);
+     parsed = json;
+ 
+     // Case A: { questions: [...] }
+     if (json && Array.isArray(json.questions)) {
+       questions = json.questions
+         .map((q) => String(q).trim())
+         .filter((q) => q.length > 0);
+     }
+     // Case B: root is an array: [ "Q1?", "Q2?" ]
+     else if (Array.isArray(json)) {
+       questions = json
+         .map((q) => String(q).trim())
+         .filter((q) => q.length > 0);
+     }
+   } catch (e) {
+     parsed = { error: 'invalid_json', message: e?.message };
+   }
+ 
+   // ----- 2) Plain-text fallback if we still have no questions -----
+   if (!questions.length) {
+     const lines = raw
+       .split('\n')
+       .map((l) => l.trim())
+       // strip bullets / numbering: "1. ", "- ", "* ", "• "
+       .map((l) => l.replace(/^[-•*()\d.\s]+/, ''))
+       // keep lines that look like questions
+       .filter((l) => l.length > 0 && /[??!]$/.test(l));
+ 
+     questions = lines;
+   }
+ 
+   if (questions.length > maxQuestions) {
+     questions = questions.slice(0, maxQuestions);
+   }
+ 
+   return { questions, parsed };
+ }
+ 
+ /**
+  * Build prompt and generate questions from a context chunk.
+  *
+  * @param {string} contextText - chunk from ES
+  * @param {object} provider - { generate(prompt) → string }
+  * @param {object} opts
+  *   - maxQuestions?: number (defaults QUESTION_MAX or 5)
+  *
+  * @returns {Promise<{
+  *   raw: string,
+  *   prompt: string,
+  *   questions: string[],
+  *   maxQuestions: number,
+  *   parsed: any
+  * }>}
   */
  export async function runQuestionGenerator(
    contextText,
    provider,
-   { maxQuestions = 5 } = {},
+   opts = {},
  ) {
-   if (!provider || typeof provider.generate !== 'function') {
-     throw new Error('runQuestionGenerator: provider.generate() not found');
-   }
+   const maxQuestions =
+     opts.maxQuestions ?? Number(process.env.QUESTION_MAX || '5');
  
    const template = await loadQuestionTemplate();
  
    const prompt = template
-     .replace('{{CONTEXT}}', contextText)
-     .replace('{{MAX_QUESTIONS}}', String(maxQuestions));
+     .replace(/{{CONTEXT}}/g, contextText)
+     .replace(/{{MAX_QUESTIONS}}/g, String(maxQuestions));
  
    const raw = await provider.generate(prompt);
  
-   let parsed;
-   try {
-     parsed = JSON.parse(raw);
-   } catch {
-     parsed = { error: 'invalid_json', raw };
-   }
- 
-   let questions = [];
- 
-   if (Array.isArray(parsed?.questions)) {
-     questions = parsed.questions.map((q) => String(q).trim()).filter(Boolean);
-   } else if (Array.isArray(parsed)) {
-     questions = parsed.map((q) => String(q).trim()).filter(Boolean);
-   } else if (typeof parsed?.question === 'string') {
-     questions = [parsed.question.trim()];
-   }
- 
-   return { questions, raw, parsed };
+   const { questions, parsed } = parseQuestions(raw, maxQuestions);
+ 
+   return {
+     raw,
+     prompt,
+     questions,
+     maxQuestions,
+     parsed,
+   };
  }
  
  export default {
    runQuestionGenerator,
  };
- 
 
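To see the new plain-text fallback in action, here is a sketch with a mock provider (assumes it runs from the repo root so prompts/question_prompt.txt resolves):

  // sketch: non-JSON model output still yields questions via the fallback parser
  import { runQuestionGenerator } from './src/question/question_core.mjs';

  const mockProvider = {
    async generate() {
      return '1. What is X?\n- How does Y affect Z?';
    },
  };

  const { questions, parsed } = await runQuestionGenerator('some chunk', mockProvider, {
    maxQuestions: 5,
  });

  console.log(parsed?.error); // 'invalid_json', since JSON.parse failed
  console.log(questions);     // [ 'What is X?', 'How does Y affect Z?' ], bullets stripped
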
tests/pipeline.mock.test.mjs CHANGED
@@ -9,19 +9,24 @@ vi.mock('../src/providers/provider.mjs', () => {
      return {
        stage,
        async generate(prompt) {
+         // simple debug guard if needed:
+         // console.log(`[mock ${stage}] prompt:\n`, prompt);
+ 
          if (stage === 'generator') {
-           // generator returns a plain-text answer
-           return 'mocked answer';
+           // pretend generator returns a plain-text answer
+           return 'mocked';
          }
          if (stage === 'verifier') {
-           // verifier first line YES → ok = true
-           return 'YES\nLooks good';
+           // verifier returns a "yes" first line so runVerifier.ok = true
+           return 'yes\nmock verifier justification';
          }
          if (stage === 'reward') {
-           // reward outputs a numeric score between 0 and 1
-           return '0.99\nExcellent sample';
+           // reward returns a score in [0,1]
+           return '0.9 great sample';
          }
-         return 'YES';
+ 
+         // fallback
+         return 'ok';
        },
      };
    },
@@ -46,23 +51,20 @@ describe('runPipelineStep (mocked providers)', () => {
    });
  
    it('runs a full pipeline step successfully', async () => {
-     const result = await runPipelineStep({
-       question: 'What is mock testing?',
-       verbose: false,
-       logger: console,
-     });
+     const result = await runPipelineStep({ question: 'What is mock testing?' });
  
      expect(result.status).toBe('accepted');
  
      // generator output made it through
-     expect(result.gen.answer).toBe('mocked answer');
+     expect(result.gen.answer).toBe('mocked');
  
      // verifier + reward both say OK
      expect(result.ver.ok).toBe(true);
      expect(result.rew.ok).toBe(true);
-     expect(result.rew.score).toBeCloseTo(0.99, 5);
  
-     // context came from mocked retrieval
-     expect(result.context.length).toBe(2);
+     // NEW CONTRACT:
+     // even though retrieval returns 2 chunks, step.mjs enforces a single-chunk context
+     expect(result.context.length).toBe(1);
+     expect(result.context[0].content).toBe('mock context 1');
    });
  });
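
Assuming vitest is the test runner (the vi.mock usage implies it), the updated single-chunk contract can be checked with:

  npx vitest run tests/pipeline.mock.test.mjs
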
try_prompt.sh ADDED
@@ -0,0 +1,85 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ 
+ MODEL="${1:-}"
+ PROMPT_FILE="${2:-}"
+ REASONING_FLAG="${3:-}"
+ 
+ ES_NODE="${ES_NODE:-http://localhost:9200}"
+ ES_INDEX="${ES_INDEX:-quo_distill_index}"
+ OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
+ 
+ if [[ -z "${MODEL:-}" || -z "${PROMPT_FILE:-}" ]]; then
+   echo "Usage: $0 <model> <prompt_file> [-r]"
+   exit 1
+ fi
+ 
+ if [[ ! -f "$PROMPT_FILE" ]]; then
+   echo "❌ Error: prompt file '$PROMPT_FILE' not found."
+   exit 1
+ fi
+ 
+ ############################################################
+ # 1. Fetch random ES chunk
+ ############################################################
+ echo "📡 Fetching 1 random chunk from Elasticsearch…"
+ 
+ RANDOM_DOC=$(curl -s -X POST "$ES_NODE/$ES_INDEX/_search" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "size": 1,
+     "query": { "function_score": { "random_score": {} } }
+   }')
+ 
+ CHUNK=$(echo "$RANDOM_DOC" | jq -r '.hits.hits[0]._source.content')
+ DOC_ID=$(echo "$RANDOM_DOC" | jq -r '.hits.hits[0]._id')
+ 
+ echo "🧩 Random chunk ID: $DOC_ID"
+ echo "----------------------------------------------"
+ echo "$CHUNK" | head -n 20
+ echo "… (truncated)"
+ echo "----------------------------------------------"
+ 
+ ############################################################
+ # 2. Replace {{CONTEXT}} in prompt
+ ############################################################
+ RAW_PROMPT=$(cat "$PROMPT_FILE")
+ PROMPT="${RAW_PROMPT//\{\{CONTEXT\}\}/$CHUNK}"
+ 
+ ############################################################
+ # 3. Build JSON payload (no jq merging!)
+ ############################################################
+ if [[ "$REASONING_FLAG" == "-r" ]]; then
+   echo "🧠 Reasoning mode: ON"
+   OPTIONS='"options":{"reasoning":true},'
+ else
+   echo "🧠 Reasoning mode: OFF"
+   OPTIONS=""
+ fi
+ 
+ # Safely quote prompt text
+ PROMPT_JSON=$(printf '%s' "$PROMPT" | jq -Rs .)
+ 
+ # Build payload manually — no parsing of fragments
+ PAYLOAD=$(cat <<EOF
+ {
+   "model": "$MODEL",
+   "prompt": $PROMPT_JSON,
+   $OPTIONS
+   "stream": false
+ }
+ EOF
+ )
+ 
+ ############################################################
+ # 4. Send request to Ollama
+ ############################################################
+ echo
+ echo "🚀 Sending to Ollama ($MODEL)…"
+ echo "=============================================="
+ echo
+ 
+ curl -s -X POST "$OLLAMA_URL/api/generate" \
+   -H "Content-Type: application/json" \
+   -d "$PAYLOAD" \
+   | jq -r '.response // .message // .output'
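
Typical invocations (the model tag and prompt files are examples from this repo; -r switches on the reasoning option built in step 3):

  ./try_prompt.sh qwen3-vl:8b-thinking prompts/generator_prompt.txt
  ./try_prompt.sh qwen3-vl:8b-thinking prompts/question_prompt.txt -r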