htaf committed on
Commit 2baa954 · 1 Parent(s): 856bafd

maintaining same chunk throughout pipeline

prompts/generator_prompt.txt CHANGED
@@ -1,29 +1,59 @@
- You are part of a knowledge distillation pipeline.
- 
- You will be given CONTEXT (excerpts from a corpus) and a QUESTION about that context.
- 
- Your task:
- 
- 1. First, think step by step INSIDE <think> and </think> tags,
-    using ONLY information that can be supported by the CONTEXT.
- 2. Then, AFTER </think>, write the final answer plainly, without any tags.
- 
- Rules:
- 
- - Use ONLY the CONTEXT for facts and claims.
- - Do NOT use outside knowledge or guess.
- - If the CONTEXT is not sufficient to answer, the final answer must be:
-   "I cannot answer this from the provided context."
- - You may quote short phrases from the CONTEXT as evidence, but avoid long copy-paste.
- - Be concise and clear.
- 
- Format (follow exactly):
- 
- <think>
- [your detailed reasoning here, step by step, grounded in the CONTEXT]
- </think>
- [your final answer here]
- 
+ # SYSTEM ROLE
+ You are a knowledge distillation generator optimized for training reasoning LoRAs. Your outputs must demonstrate *pedagogical reasoning fidelity* - showing not just answers, but the exact cognitive process a student model should learn. Every output will be used as training data.
+ 
+ ## CORE DIRECTIVES (NON-NEGOTIABLE)
+ 1. **CONTEXT FIDELITY**: Use ONLY provided context. No external knowledge. Ever.
+ 2. **REASONING GRANULARITY**: Decompose reasoning into atomic, teachable steps
+ 3. **UNCERTAINTY CALIBRATION**: Quantify confidence at each reasoning stage
+ 4. **BIAS MITIGATION**: Explicitly flag context limitations and reasoning risks
+ 5. **DISTILLATION OPTIMIZATION**: Structure outputs for maximum LoRA weight efficiency
+ 
+ ## REASONING PROTOCOL (EXECUTE IN ORDER INSIDE XML TAGS)
+ <understanding>
+ - Restate question in atomic components
+ - Identify: [Simple/Factual] vs [Multi-hop/Inferential] vs [Ambiguous]
+ - Flag required context elements (quote paragraph numbers)
+ </understanding>
+ 
+ <context_verification>
+ - For EACH required fact:
+   ▸ Cite exact context location (para #[X])
+   ▸ Assess source quality: [Primary/Secondary/Contradictory/Uncertain]
+   ▸ If missing/insufficient: TERMINATE with "I cannot answer..."
+ </context_verification>
+ 
+ <reasoning_chain confidence_baseline="90%">
+ [STRUCTURED STEP FORMAT PER STEP]
+ Step #[N]:
+ - Operation: [Retrieval/Comparison/Causality/Quantification/Contradiction-Check]
+ - Context evidence: "Short quote" (para #[X])
+ - Confidence delta: [+0%/-5% etc.] due to [reason]
+ - Inference rule used: [e.g., "Temporal transitivity", "Numerical constraint propagation"]
+ - Bias check: [None/Selection bias/Uncertainty propagation risk]
+ </reasoning_chain>
+ 
+ <synthesis>
+ - Resolve conflicts between steps
+ - Calculate cumulative confidence: (baseline * step confidences)
+ - Final confidence threshold: <80% → "I cannot answer..."
+ - Verify against reasoning_chain constraints
+ </synthesis>
+ 
+ ## OUTPUT SPECIFICATION (MACHINE-PARSABLE)
+ After </synthesis>:
+ Confidence: [INTEGER 0-100]
+ Answer: [CONCISE RESPONSE OR EXACT FALLBACK PHRASE]
+ Evidence: [MAX 3 SHORT PHRASES] | [PARA #S]
+ Uncertainty_flags: [NONE/CONTEXT_GAPS/CONTRADICTIONS/BIAS_RISK]
+ 
+ ## STRICT FORMATTING RULES
+ - XML tags MUST close properly
+ - Evidence phrases: ≤7 words each
+ - Confidence calculations must show work in <synthesis>
+ - If context_verification fails: OUTPUT ONLY "I cannot answer this from the provided context." (NO tags)
+ - NEVER use markdown, asterisks, or special formatting
+ 
+ ---
  CONTEXT:
  {{CONTEXT}}
 
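For reference, a final block satisfying the new OUTPUT SPECIFICATION would look roughly like this (illustrative placeholder values, not output from a real run):

  Confidence: 85
  Answer: X increased primarily because of Y.
  Evidence: "X increased", "because of Y" | para #2
  Uncertainty_flags: NONE
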
prompts/question_prompt.txt CHANGED
@@ -1,34 +1,68 @@
- You are a dataset-creation assistant.
- 
- You will be given a CONTEXT CHUNK of text from a larger corpus.
- 
- Your goals:
- 
- 1. Read the context carefully.
- 2. Generate up to {{MAX_QUESTIONS}} diverse, high-quality questions
-    that can be answered ONLY using information found inside the context.
- 3. Produce questions that:
-    - focus strictly on the content of the chunk,
-    - avoid hallucinating any information not present,
-    - require comprehension, reasoning, or synthesis across the chunk,
-    - vary naturally in difficulty (some simple, some deeper),
-    - avoid meta or speculative questions,
-    - avoid yes/no questions unless they are meaningful.
- 
- Output STRICTLY this JSON structure:
- 
- {
-   "questions": [
-     "Question 1?",
-     "Question 2?",
-     "Question 3?"
-   ]
- }
- 
- Do NOT include answers. Do NOT add any fields. JSON only.
- 
- ---
- CONTEXT START
- {{CONTEXT}}
- CONTEXT END
- 
+ You are a Master Question Architect in a knowledge distillation pipeline.
+ 
+ Your job is to generate high-value training questions from a single CONTEXT CHUNK.
+ 
+ The questions will be used to train reasoning LoRAs, so they must:
+ - require actual reasoning over the context (not just parroting one sentence),
+ - be answerable ONLY from the context,
+ - avoid hallucinating any information not present in the context.
+ 
+ CONTEXT:
+ {{CONTEXT}}
+ 
+ ---
+ 
+ INTERNAL THINKING (do NOT mention this section in the final output):
+ 
+ <analysis>
+ 1. Identify key entities, concepts, and relationships in the context.
+ 2. Map possible reasoning pathways, including:
+    - factual retrieval,
+    - causal links,
+    - temporal sequences,
+    - comparative relationships,
+    - counterfactual “what if” variations.
+ 3. Estimate how deep the reasoning can go (from very easy to very hard).
+ 4. Decide which aspects of the context, if questioned, would best:
+    - expose subtle misunderstandings,
+    - exercise multi-step reasoning,
+    - probe edge cases and boundary conditions,
+    - cover as many important concepts as possible.
+ </analysis>
+ 
+ ---
+ 
+ OUTPUT INSTRUCTIONS (VISIBLE):
+ 
+ 1. Generate up to {{MAX_QUESTIONS}} diverse, high-quality questions.
+    - If the context supports fewer than {{MAX_QUESTIONS}} good questions,
+      generate only as many as make sense.
+ 2. Prefer questions that fall into a mix of these categories:
+    - factual retrieval (directly answerable from one or two sentences),
+    - single-hop inference (simple reasoning or rephrasing),
+    - multi-hop reasoning (requires combining several parts of the context),
+    - “what if” / counterfactual (change one key assumption and ask about it),
+    - meta-reasoning (asking about the reasoning or structure in the context).
+ 3. Questions MUST be answerable solely from the CONTEXT.
+    - If you are unsure the context supports a question, DO NOT ask it.
+ 4. Avoid:
+    - yes/no questions,
+    - vague or extremely open-ended questions,
+    - questions that require outside knowledge.
+ 
+ FORMAT (MACHINE-FRIENDLY, NO JSON):
+ 
+ - Output ONLY the questions.
+ - One question per line.
+ - No numbering, no bullet points, no explanations.
+ - Do not include the analysis block in your output.
+ - Do not prefix with “Q1:”, “Q2:” etc.
+ - Do not add extra commentary.
+ 
+ EXAMPLES OF VALID OUTPUT FORMAT:
+ 
+ What are the main reasons given in the context for X?
+ How does Y relate to Z according to the context?
+ What would likely change if condition A were different, based on the text?
+ 
+ Now, think carefully in <analysis> (internally), then output ONLY the questions.
src/pipeline/batch.mjs CHANGED
@@ -233,6 +233,8 @@ export async function runPipelineBatch({
      try {
        const result = await runPipelineStep({
          question: q,
+         // 🔑 KEY FIX: reuse this ES chunk as the *only* context
+         initialContext: [chunk],
          verbose,
          logger,
        });
@@ -288,4 +290,3 @@ export async function runPipelineBatch({
  
    throw new Error(`Unknown PIPELINE_SEED_MODE: ${seedMode}`);
  }
- 
 
src/pipeline/step.mjs CHANGED
@@ -10,12 +10,34 @@ import { preview } from './util.mjs';
   * Run a single pipeline step for one question.
   *
   * Flow:
-  *   retrieval → generator → verifier → reward
+  *   retrieval (or provided context) → generator → verifier → reward
   *
-  * No JSON formats required — all models return free-text strings.
+  * Design constraints:
+  *   - Exactly one context chunk is used per question.
+  *   - If `initialContext` is provided, we NEVER hit Elasticsearch.
+  *   - If we call ES, we still only keep the FIRST returned chunk.
+  *
+  * Returns:
+  *   {
+  *     status: 'accepted'
+  *       | 'invalid_question'
+  *       | 'retrieval_failed'
+  *       | 'generator_failed'
+  *       | 'verifier_rejected'
+  *       | 'verifier_error'
+  *       | 'reward_rejected'
+  *       | 'reward_error',
+  *     question,
+  *     context, // array with exactly one chunk (when successful)
+  *     gen,
+  *     ver,
+  *     rew,
+  *     error? // optional error message
+  *   }
   */
  export async function runPipelineStep({
    question,
+   initialContext, // optional: [{ id?, content, ... }]
    retrievalMode = process.env.RETRIEVAL_MODE || 'hybrid',
    k = Number(process.env.RETRIEVAL_K || '6'),
    generatorProvider,
@@ -28,7 +50,7 @@ export async function runPipelineStep({
    const errLog = logger?.error?.bind(logger) || console.error;
  
    // ----------------------------------------
-   // Validate input question
+   // Question sanity
    // ----------------------------------------
    if (!question || !question.trim()) {
      if (verbose) log(' [pipeline] empty / invalid question, skipping');
@@ -40,28 +62,65 @@ export async function runPipelineStep({
    const rewProv = rewardProvider || loadProviderFor('reward');
  
    // ----------------------------------------
-   // Retrieval
+   // Retrieval / context selection
    // ----------------------------------------
    let context = [];
-   try {
-     if (verbose) log(` [retrieval] mode=${retrievalMode} k=${k}`);
-     context = await hybridSearch(question, k);
- 
-     if (verbose) {
-       log(` [retrieval] got ${context.length} chunks`);
-       if (context.length > 0) {
-         const first = context[0]?.content ?? '';
-         log(' [retrieval] first chunk:');
-         log(' ' + preview(first, 200).replace(/\n/g, '\n '));
-       }
-     }
-   } catch (e) {
-     const msg = e?.message || String(e);
-     if (verbose) errLog(' [retrieval] ERROR:', msg);
-     return {
-       status: 'retrieval_failed',
-       question,
-       error: msg,
-     };
-   }
+ 
+   if (initialContext && Array.isArray(initialContext) && initialContext.length > 0) {
+     // Use provided context, no ES call
+     context = initialContext.slice(0, 1); // enforce single-chunk invariant
+     if (verbose) {
+       log(
+         ` [retrieval] using initialContext provided (len=${initialContext.length}), ` +
+           `keeping first chunk only`,
+       );
+       const first = context[0]?.content ?? '';
+       log(' [context] first chunk (provided):');
+       log(' ' + preview(first, 200).replace(/\n/g, '\n '));
+     }
+   } else {
+     // Go to ES exactly once
+     try {
+       if (verbose) log(` [retrieval] mode=${retrievalMode} k=${k}`);
+       const hits = await hybridSearch(question, k);
+       if (verbose) {
+         log(` [retrieval] got ${hits.length} chunks from ES`);
+       }
+ 
+       if (!hits || hits.length === 0) {
+         if (verbose) log(' [retrieval] no chunks found → retrieval_failed');
+         return {
+           status: 'retrieval_failed',
+           question,
+           error: 'no_chunks',
+         };
+       }
+ 
+       // Enforce single-chunk context
+       context = [hits[0]];
+       if (verbose) {
+         const first = context[0]?.content ?? '';
+         log(' [context] first chunk (from ES):');
+         log(' ' + preview(first, 200).replace(/\n/g, '\n '));
+       }
+     } catch (e) {
+       const msg = e?.message || String(e);
+       if (verbose) errLog(' [retrieval] ERROR:', msg);
+       return {
+         status: 'retrieval_failed',
+         question,
+         error: msg,
+       };
+     }
+   }
+ 
+   // Safety: if somehow context is still empty here, fail fast
+   if (!context || context.length === 0) {
+     if (verbose) log(' [retrieval] context empty after selection → retrieval_failed');
+     return {
+       status: 'retrieval_failed',
+       question,
+       error: 'empty_context',
+     };
+   }
  
@@ -75,7 +134,7 @@ export async function runPipelineStep({
  
    if (verbose) {
      log(' [generator] answer:');
-     log(' ' + preview(gen.answer ?? '', 400).replace(/\n/g, '\n '));
+     log(' ' + preview(gen?.answer ?? '', 400).replace(/\n/g, '\n '));
    }
  } catch (e) {
    const msg = e?.message || String(e);
@@ -109,8 +168,8 @@ export async function runPipelineStep({
    ver = await runVerifier({ question, context, gen }, verProv);
  
    if (verbose) {
-     log(' [verifier] ok=' + ver.ok);
-     log(' ' + preview(ver.raw ?? '', 200).replace(/\n/g, '\n '));
+     log(' [verifier] ok=' + (ver?.ok === true));
+     log(' ' + preview(ver?.raw ?? '', 200).replace(/\n/g, '\n '));
    }
  } catch (e) {
    const msg = e?.message || String(e);
@@ -141,11 +200,11 @@ export async function runPipelineStep({
  let rew;
  try {
    if (verbose) log(' [reward] calling model…');
-   rew = await runReward({ question, context, gen }, rewProv);
+   rew = await runReward({ question, context, gen, ver }, rewProv);
  
    if (verbose) {
-     log(` [reward] score=${rew.score} ok=${rew.ok}`);
-     log(' ' + preview(rew.raw ?? '', 200).replace(/\n/g, '\n '));
+     log(` [reward] score=${rew?.score} ok=${rew?.ok}`);
+     log(' ' + preview(rew?.raw ?? '', 200).replace(/\n/g, '\n '));
    }
  } catch (e) {
    const msg = e?.message || String(e);
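
For orientation, the new initialContext path can be exercised directly. A minimal sketch, not code from this commit; the chunk shape { id, content } and the returned status/context fields follow the doc comment above, and the import path assumes the repo layout:

  // sketch: run one step against a pre-selected chunk, skipping Elasticsearch
  import { runPipelineStep } from './src/pipeline/step.mjs';

  const chunk = { id: 'chunk-1', content: 'Example chunk text.' }; // hypothetical chunk

  const result = await runPipelineStep({
    question: 'What does the chunk say about X?',
    initialContext: [chunk], // step.mjs keeps only the first entry
    verbose: true,
  });

  console.log(result.status);         // e.g. 'accepted' or 'verifier_rejected'
  console.log(result.context.length); // always 1 (single-chunk invariant)
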
src/providers/ollama_provider.mjs CHANGED
@@ -1,6 +1,10 @@
  // src/providers/ollama_provider.mjs
  import { BaseProvider } from './base.mjs';
  
+ const ENABLE_REASONING =
+   process.env.OLLAMA_REASONING === '1' ||
+   process.env.OLLAMA_REASONING === 'true';
+ 
  function normalizeBase(url) {
    // strip trailing slashes so we can safely append /api/generate
    return url.replace(/\/+$/, '');
@@ -8,21 +12,42 @@ function normalizeBase(url) {
  
  export class OllamaProvider extends BaseProvider {
    /**
-    * @param {object} opts
-    * @param {string} [opts.model] - model name/tag in Ollama
-    * @param {string} [opts.baseUrl] - base Ollama URL (without /api/generate)
+    * @param {object|string} opts
+    *   - if string: treated as stage name ('generator' | 'verifier' | 'reward' | 'question')
+    *   - if object: { model?, baseUrl?, stage? }
     */
    constructor(opts = {}) {
      super();
  
+     let stage = null;
+     let options = {};
+ 
+     if (typeof opts === 'string') {
+       stage = opts;
+     } else if (opts && typeof opts === 'object') {
+       options = opts;
+       stage = opts.stage || null;
+     }
+ 
      // Base URL: env or default, WITHOUT endpoint path
      const envBase = process.env.OLLAMA_URL || 'http://localhost:11434';
-     this.baseUrl = normalizeBase(opts.baseUrl || envBase);
+     this.baseUrl = normalizeBase(options.baseUrl || envBase);
  
-     // Model: allow stage-specific env, then generic, then default
+     // Stage-specific model: QUESTION_MODEL, GENERATOR_MODEL, VERIFIER_MODEL, REWARD_MODEL
+     let stageModel = null;
+     if (stage) {
+       const key = `${stage.toUpperCase()}_MODEL`;
+       stageModel = process.env[key] || null;
+     }
+ 
+     // Model resolution order:
+     //   1) explicit opts.model
+     //   2) stage-specific env (e.g. GENERATOR_MODEL)
+     //   3) generic OLLAMA_MODEL
+     //   4) default qwen3-vl:8b-thinking
      this.model =
-       opts.model ||
-       process.env.GENERATOR_MODEL ||
+       options.model ||
+       stageModel ||
        process.env.OLLAMA_MODEL ||
        'qwen3-vl:8b-thinking';
    }
@@ -35,14 +60,21 @@ export class OllamaProvider extends BaseProvider {
    async generate(prompt) {
      const url = `${this.baseUrl}/api/generate`;
  
+     const body = {
+       model: this.model,
+       prompt,
+       stream: false, // single JSON response, easier for pipeline/tests
+     };
+ 
+     // enable Ollama reasoning mode for *-thinking models when requested
+     if (ENABLE_REASONING) {
+       body.options = { reasoning: true };
+     }
+ 
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
-       body: JSON.stringify({
-         model: this.model,
-         prompt,
-         stream: false,
-       }),
+       body: JSON.stringify(body),
      });
  
      if (!res.ok) {
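
The two constructor forms above resolve the model like this. A sketch with made-up model tags; VERIFIER_MODEL is one of the stage envs named in the diff:

  // sketch: stage-string vs. options-object construction
  import { OllamaProvider } from './src/providers/ollama_provider.mjs';

  process.env.VERIFIER_MODEL = 'qwen3:4b'; // hypothetical model tag

  const verifier = new OllamaProvider('verifier');        // stage string → VERIFIER_MODEL
  const custom = new OllamaProvider({ model: 'llama3' }); // explicit model wins over env

  console.log(verifier.model); // 'qwen3:4b'
  console.log(custom.model);   // 'llama3'
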
src/question/question_core.mjs CHANGED
@@ -1,70 +1,126 @@
  // src/question/question_core.mjs
  import fs from 'fs/promises';
  import path from 'path';
+ import { fileURLToPath } from 'url';
+ 
+ const __filename = fileURLToPath(import.meta.url);
+ const __dirname = path.dirname(__filename);
+ 
+ const TEMPLATE_PATH = path.resolve(
+   __dirname,
+   '..',
+   '..',
+   'prompts',
+   'question_prompt.txt',
+ );
+ 
+ let cachedTemplate = null;
  
  async function loadQuestionTemplate() {
-   const filePath = path.resolve(
-     path.dirname(new URL(import.meta.url).pathname),
-     '..',
-     '..',
-     'prompts',
-     'question_prompt.txt',
-   );
-   return await fs.readFile(filePath, 'utf8');
+   if (cachedTemplate) return cachedTemplate;
+   cachedTemplate = await fs.readFile(TEMPLATE_PATH, 'utf8');
+   return cachedTemplate;
  }
  
  /**
-  * runQuestionGenerator
+  * Extract questions using JSON-first, then plain-text fallback.
   *
-  * @param {string} contextText - text chunk we want questions about
-  * @param {object} provider - { generate(prompt) → string }
-  * @param {object} options
-  *   - maxQuestions: how many questions to ask for
-  *
-  * @returns {object} {
-  *   questions: string[],
-  *   raw: string,
-  *   parsed: any
-  * }
+  * @param {string} raw
+  * @param {number} maxQuestions
+  * @returns {{ questions: string[], parsed: any }}
+  */
+ function parseQuestions(raw, maxQuestions) {
+   let parsed = null;
+   let questions = [];
+ 
+   if (!raw || typeof raw !== 'string') {
+     return { questions, parsed };
+   }
+ 
+   // ----- 1) Try JSON -----
+   try {
+     const json = JSON.parse(raw);
+     parsed = json;
+ 
+     // Case A: { questions: [...] }
+     if (json && Array.isArray(json.questions)) {
+       questions = json.questions
+         .map((q) => String(q).trim())
+         .filter((q) => q.length > 0);
+     }
+     // Case B: root is an array: [ "Q1?", "Q2?" ]
+     else if (Array.isArray(json)) {
+       questions = json
+         .map((q) => String(q).trim())
+         .filter((q) => q.length > 0);
+     }
+   } catch (e) {
+     parsed = { error: 'invalid_json', message: e?.message };
+   }
+ 
+   // ----- 2) Plain-text fallback if we still have no questions -----
+   if (!questions.length) {
+     const lines = raw
+       .split('\n')
+       .map((l) => l.trim())
+       // strip bullets / numbering: "1. ", "- ", "* ", "• "
+       .map((l) => l.replace(/^[-•*()\d.\s]+/, ''))
+       // keep lines that look like questions
+       .filter((l) => l.length > 0 && /[??!]$/.test(l));
+ 
+     questions = lines;
+   }
+ 
+   if (questions.length > maxQuestions) {
+     questions = questions.slice(0, maxQuestions);
+   }
+ 
+   return { questions, parsed };
+ }
+ 
+ /**
+  * Build prompt and generate questions from a context chunk.
+  *
+  * @param {string} contextText - chunk from ES
+  * @param {object} provider - { generate(prompt) → string }
+  * @param {object} opts
+  *   - maxQuestions?: number (defaults QUESTION_MAX or 5)
+  *
+  * @returns {Promise<{
+  *   raw: string,
+  *   prompt: string,
+  *   questions: string[],
+  *   maxQuestions: number,
+  *   parsed: any
+  * }>}
   */
  export async function runQuestionGenerator(
    contextText,
    provider,
-   { maxQuestions = 5 } = {},
+   opts = {},
  ) {
-   if (!provider || typeof provider.generate !== 'function') {
-     throw new Error('runQuestionGenerator: provider.generate() not found');
-   }
+   const maxQuestions =
+     opts.maxQuestions ?? Number(process.env.QUESTION_MAX || '5');
  
    const template = await loadQuestionTemplate();
  
    const prompt = template
-     .replace('{{CONTEXT}}', contextText)
-     .replace('{{MAX_QUESTIONS}}', String(maxQuestions));
+     .replace(/{{CONTEXT}}/g, contextText)
+     .replace(/{{MAX_QUESTIONS}}/g, String(maxQuestions));
  
    const raw = await provider.generate(prompt);
  
-   let parsed;
-   try {
-     parsed = JSON.parse(raw);
-   } catch {
-     parsed = { error: 'invalid_json', raw };
-   }
- 
-   let questions = [];
- 
-   if (Array.isArray(parsed?.questions)) {
-     questions = parsed.questions.map((q) => String(q).trim()).filter(Boolean);
-   } else if (Array.isArray(parsed)) {
-     questions = parsed.map((q) => String(q).trim()).filter(Boolean);
-   } else if (typeof parsed?.question === 'string') {
-     questions = [parsed.question.trim()];
-   }
- 
-   return { questions, raw, parsed };
+   const { questions, parsed } = parseQuestions(raw, maxQuestions);
+ 
+   return {
+     raw,
+     prompt,
+     questions,
+     maxQuestions,
+     parsed,
+   };
  }
  
  export default {
    runQuestionGenerator,
  };
- 
 
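To see the new plain-text fallback in action, here is a sketch with a mock provider (assumes it runs from the repo root so prompts/question_prompt.txt resolves):

  // sketch: non-JSON model output still yields questions via the fallback parser
  import { runQuestionGenerator } from './src/question/question_core.mjs';

  const mockProvider = {
    async generate() {
      return '1. What is X?\n- How does Y affect Z?';
    },
  };

  const { questions, parsed } = await runQuestionGenerator('some chunk', mockProvider, {
    maxQuestions: 5,
  });

  console.log(parsed?.error); // 'invalid_json', since JSON.parse failed
  console.log(questions);     // [ 'What is X?', 'How does Y affect Z?' ], bullets stripped
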
tests/pipeline.mock.test.mjs CHANGED
@@ -9,19 +9,24 @@ vi.mock('../src/providers/provider.mjs', () => {
      return {
        stage,
        async generate(prompt) {
+         // simple debug guard if needed:
+         // console.log(`[mock ${stage}] prompt:\n`, prompt);
+ 
          if (stage === 'generator') {
-           // generator returns a plain-text answer
-           return 'mocked answer';
+           // pretend generator returns a plain-text answer
+           return 'mocked';
          }
          if (stage === 'verifier') {
-           // verifier first line YES → ok = true
-           return 'YES\nLooks good';
+           // verifier returns a "yes" first line so runVerifier.ok = true
+           return 'yes\nmock verifier justification';
          }
          if (stage === 'reward') {
-           // reward outputs a numeric score between 0 and 1
-           return '0.99\nExcellent sample';
+           // reward returns a score in [0,1]
+           return '0.9 great sample';
          }
-         return 'YES';
+ 
+         // fallback
+         return 'ok';
        },
      };
    },
@@ -46,23 +51,20 @@ describe('runPipelineStep (mocked providers)', () => {
    });
  
    it('runs a full pipeline step successfully', async () => {
-     const result = await runPipelineStep({
-       question: 'What is mock testing?',
-       verbose: false,
-       logger: console,
-     });
+     const result = await runPipelineStep({ question: 'What is mock testing?' });
  
      expect(result.status).toBe('accepted');
  
      // generator output made it through
-     expect(result.gen.answer).toBe('mocked answer');
+     expect(result.gen.answer).toBe('mocked');
  
      // verifier + reward both say OK
      expect(result.ver.ok).toBe(true);
      expect(result.rew.ok).toBe(true);
-     expect(result.rew.score).toBeCloseTo(0.99, 5);
  
-     // context came from mocked retrieval
-     expect(result.context.length).toBe(2);
+     // NEW CONTRACT:
+     // even though retrieval returns 2 chunks, step.mjs enforces a single-chunk context
+     expect(result.context.length).toBe(1);
+     expect(result.context[0].content).toBe('mock context 1');
    });
  });
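
Assuming vitest is the test runner (the vi.mock usage implies it), the updated single-chunk contract can be checked with:

  npx vitest run tests/pipeline.mock.test.mjs
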
try_prompt.sh ADDED
@@ -0,0 +1,85 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ 
+ MODEL="${1:-}"
+ PROMPT_FILE="${2:-}"
+ REASONING_FLAG="${3:-}"
+ 
+ ES_NODE="${ES_NODE:-http://localhost:9200}"
+ ES_INDEX="${ES_INDEX:-quo_distill_index}"
+ OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
+ 
+ if [[ -z "${MODEL:-}" || -z "${PROMPT_FILE:-}" ]]; then
+   echo "Usage: $0 <model> <prompt_file> [-r]"
+   exit 1
+ fi
+ 
+ if [[ ! -f "$PROMPT_FILE" ]]; then
+   echo "❌ Error: prompt file '$PROMPT_FILE' not found."
+   exit 1
+ fi
+ 
+ ############################################################
+ # 1. Fetch random ES chunk
+ ############################################################
+ echo "📡 Fetching 1 random chunk from Elasticsearch…"
+ 
+ RANDOM_DOC=$(curl -s -X POST "$ES_NODE/$ES_INDEX/_search" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "size": 1,
+     "query": { "function_score": { "random_score": {} } }
+   }')
+ 
+ CHUNK=$(echo "$RANDOM_DOC" | jq -r '.hits.hits[0]._source.content')
+ DOC_ID=$(echo "$RANDOM_DOC" | jq -r '.hits.hits[0]._id')
+ 
+ echo "🧩 Random chunk ID: $DOC_ID"
+ echo "----------------------------------------------"
+ echo "$CHUNK" | head -n 20
+ echo "… (truncated)"
+ echo "----------------------------------------------"
+ 
+ ############################################################
+ # 2. Replace {{CONTEXT}} in prompt
+ ############################################################
+ RAW_PROMPT=$(cat "$PROMPT_FILE")
+ PROMPT="${RAW_PROMPT//\{\{CONTEXT\}\}/$CHUNK}"
+ 
+ ############################################################
+ # 3. Build JSON payload (no jq merging!)
+ ############################################################
+ if [[ "$REASONING_FLAG" == "-r" ]]; then
+   echo "🧠 Reasoning mode: ON"
+   OPTIONS='"options":{"reasoning":true},'
+ else
+   echo "🧠 Reasoning mode: OFF"
+   OPTIONS=""
+ fi
+ 
+ # Safely quote prompt text
+ PROMPT_JSON=$(printf '%s' "$PROMPT" | jq -Rs .)
+ 
+ # Build payload manually — no parsing of fragments
+ PAYLOAD=$(cat <<EOF
+ {
+   "model": "$MODEL",
+   "prompt": $PROMPT_JSON,
+   $OPTIONS
+   "stream": false
+ }
+ EOF
+ )
+ 
+ ############################################################
+ # 4. Send request to Ollama
+ ############################################################
+ echo
+ echo "🚀 Sending to Ollama ($MODEL)…"
+ echo "=============================================="
+ echo
+ 
+ curl -s -X POST "$OLLAMA_URL/api/generate" \
+   -H "Content-Type: application/json" \
+   -d "$PAYLOAD" \
+   | jq -r '.response // .message // .output'
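
Typical invocations (the model tag and prompt files are examples from this repo; -r switches on the reasoning option built in step 3):

  ./try_prompt.sh qwen3-vl:8b-thinking prompts/generator_prompt.txt
  ./try_prompt.sh qwen3-vl:8b-thinking prompts/question_prompt.txt -r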