tags:
- question-generation
- reward-modeling
---

# distill-pipeline — modular synthetic data engine (thinking + instruct)

`distill-pipeline` is a modular **Node.js synthetic data pipeline** that reads JSONL inputs, runs generation + verification + reward, and writes JSONL outputs. It supports both:

- **“Thinking” generators** that produce visible reasoning, and
- **“Instruct” generators** that produce direct answers,

with **separate caches and outputs** so you can compare styles without mixing artefacts.

Rather than owning retrieval, `distill-pipeline` is designed as the **middle layer** in a stack: you feed it JSONL chunks or questions (for example, from [`distill-rag`](https://huggingface.co/htaf/distill-rag) or your own tooling), and it orchestrates the LLM stages to produce clean, reusable synthetic data.
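
For orientation, an input chunk and an accepted output sample might look like this. The field names here are illustrative assumptions, not the project's documented schema:

```javascript
// Hypothetical record shapes only — check the repo's actual schema before relying on these.
const inputChunk = { id: "doc-001#3", text: "JSONL stores one JSON object per line." };

const acceptedSample = {
  question: "How does JSONL store records?",
  answer: "One JSON object per line.",
  reward: 0.92,   // score assigned by the reward stage
  verified: true, // passed the verification stage
};

// Each record is serialized to a single line of the output .jsonl file.
console.log(JSON.stringify(acceptedSample));
```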

---

## What it does

- **JSONL-first pipeline**
  - Reads JSONL chunks (default `data/rag_chunks.jsonl`) or static question seeds (`test_samples/seed_questions.jsonl`).
  - Writes accepted samples as JSONL into `gold/*.jsonl`.

- **Two pipeline modes**
  - **Thinking mode:** question → reasoning-style answer → verification → reward.
  - **Instruct mode:** instruction → direct answer pairs, for fine-tuning assistants.
  - Each mode has its own cache + output paths so you can run them independently.
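
The separation could be pictured like this; the directory names are made up for illustration, so check the repo for the real layout:

```javascript
// Illustrative only: derive per-mode cache/output paths so "thinking" and
// "instruct" runs never overwrite each other's artefacts.
function modePaths(mode) {
  if (mode !== "thinking" && mode !== "instruct") {
    throw new Error(`unknown mode: ${mode}`);
  }
  return {
    cacheDir: `cache/${mode}`,     // hypothetical cache location
    outFile: `gold/${mode}.jsonl`, // hypothetical output file
  };
}
```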

- **Retrieval-agnostic, RAG-friendly**
  - Works with plain JSONL; any RAG stack or pre-processing step that can emit JSONL chunks can plug in.
  - Optional “question-first” mode uses context chunks (or Elasticsearch) to generate questions from your corpus.

- **Stage-based and cache-heavy**
  - Questions, generations, verifications, and rewards are **cached on disk** (JSONL).
  - You can change prompts or models and reuse existing work instead of re-running everything.

- **Local-first providers**
  - Built to run locally with **Ollama** as the default provider for all stages.
  - Also supports OpenAI/HTTP-style providers, plus mock providers for tests/benchmarks.
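
As a rough idea of what a local provider call looks like, here is a request builder for Ollama's `/api/generate` endpoint. The endpoint and body shape follow Ollama's public HTTP API; the model name is just an example, and this is not necessarily how `distill-pipeline` wires its providers:

```javascript
// Build a request for Ollama's local HTTP API; pass the result to fetch().
function buildOllamaRequest(prompt, model = "llama3") {
  return {
    url: "http://localhost:11434/api/generate",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false returns a single JSON object instead of a token stream
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Usage (requires a running Ollama server):
//   const { url, init } = buildOllamaRequest("Say hi");
//   const data = await (await fetch(url, init)).json();
//   console.log(data.response);
```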

- **Monitoring and benchmarks**
  - Live HUD (`scripts/live_bench.mjs`) for real-time throughput/accept-rate monitoring.
  - Benchmark script (`scripts/bench_pipeline.mjs`) to measure pipeline speed without burning GPU on real models.

---

## Quickstart