Absolutely — here is a **fully updated, successor-ready `ARCHITECTURE.md`** that reflects your *current* codebase, including:

* the **new question generator subsystem**,
* the **multi-provider architecture**,
* the **modular pipeline** (generator/verifier/reward),
* the **retrieval stage**,
* the **batch orchestrator**,
* and the **modularization plan** you're moving toward.

It will not clobber anything. It is aligned with your present repo after the latest commits and tests. You can paste this directly as:

```
distill-pipeline/ARCHITECTURE.md
```

---

# **ARCHITECTURE.md**

*Distill-Pipeline — System Architecture & Successor Notes*
*(Node.js, ESM, Ollama/vLLM/OpenAI providers, Vitest-tested)*

---

# **1. Purpose**

`distill-pipeline` is a modular, retrieval-augmented LLM distillation engine. It produces high-quality *gold data* by running each question through:

1. **retrieval** (hybrid RAG via distill-rag)
2. **generator** (teacher model)
3. **verifier** (alignment/format checker)
4. **reward model** (scoring)
5. **gold writer** (JSONL builder)

It also includes a **question generation** module that extracts questions directly from RAG chunks, enabling true content-first distillation.

The system is built for offline, local distillation on consumer GPUs (your 3090 + 3060).

---

# **2. High-Level Flow**

```
┌────────────────┐
│  Chunk Source  │ ← distill-rag index
└──────┬─────────┘
       ▼
 (optional) Question Generation
       ▼
┌───────────────┐
│   Retrieval   │ (hybrid BM25 + dense)
└──────┬────────┘
       ▼
┌───────────────┐
│   Generator   │ (LLM teacher)
└──────┬────────┘
       ▼
┌───────────────┐
│   Verifier    │ (LLM)
└──────┬────────┘
       ▼
┌───────────────┐
│ Reward Model  │ (LLM critic)
└──────┬────────┘
       ▼
┌───────────────┐
│  Gold Writer  │
└───────────────┘
```

---

# **3. Directory Layout**

Your repo structure (as of now, after modularization):

```
distill-pipeline/
  prompts/
    generator_prompt.txt
    verifier_prompt.txt
    reward_prompt.txt
    question_prompt.txt
  src/
    pipeline/
      pipeline.mjs
      pipeline_cli.mjs
    providers/
      provider.mjs
      ollama_provider.mjs
      openai_provider.mjs
      http_provider.mjs
    retrieval/
      retrieval.mjs
    generator/
      generator_core.mjs
    verifier/
      verifier_core.mjs
    reward/
      reward_core.mjs
    question/
      question_core.mjs
      question_cli.mjs
  gold/
    (generated JSONL files)
  test_samples/
    seed_questions.jsonl   ← for static mode
  tests/
    generator_core.test.mjs
    verifier_core.test.mjs
    reward_core.test.mjs
    provider.mock.test.mjs
    pipeline.mock.test.mjs
    retrieval.real.test.mjs
    retrieval.mock.test.mjs
    gold_core.test.mjs
    question_core.test.mjs
  .env
  package.json
  ARCHITECTURE.md
  ROADMAP.md
```

Everything is now properly separated into **pure core modules**, each with **Vitest tests**.

---

# **4. Core Modules**

Below is a top-down view.

---

## **4.1 Provider System (src/providers/)**

This system routes each pipeline stage to a backend:

* `OllamaProvider`
* `OpenAIProvider`
* `HttpProvider`
* future: `vLLMProvider`

All providers expose:

```js
async generate(prompt, options?)
```

The dispatcher:

```js
loadProviderFor("generator" | "verifier" | "reward" | "question")
```

selects a backend using env:

```
GENERATOR_PROVIDER=ollama
VERIFIER_PROVIDER=ollama
REWARD_PROVIDER=ollama
QUESTION_PROVIDER=ollama
```

and uses stage-specific model names:

```
GENERATOR_MODEL=qwen3-vl:8b-thinking
VERIFIER_MODEL=patronus:8b
REWARD_MODEL=patronus:8b
QUESTION_MODEL=qwen2.5-7b-instruct
```

This architecture is clean, extensible, and fully testable.

---

## **4.2 Retrieval (src/retrieval/retrieval.mjs)**

Your retrieval layer connects to the **distill-rag** Elasticsearch index.

Supports:

* BM25
* dense vector KNN
* hybrid RRF
* optional future HyDE

The key export:

```js
export async function hybridSearch(query, k)
```

You already have real + mock tests for this module.
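As a rough illustration of the "hybrid RRF" step above, here is a minimal reciprocal-rank-fusion merge over ranked id lists. This is a sketch, not the module's real code: `rrfMerge` is a hypothetical helper name, and the conventional constant `k0 = 60` is an assumption.

```js
// Hypothetical sketch of an RRF merge such as hybridSearch might use.
// rankedLists: arrays of doc ids, best-first (e.g. one from BM25, one from KNN).
// Each id scores 1 / (k0 + rank) per list; higher fused score wins.
export function rrfMerge(rankedLists, k, k0 = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k0 + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, k)
    .map(([id]) => id);
}
```

The nice property here is that RRF needs no score normalization between BM25 and dense retrieval — only ranks matter.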
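Looking back at §4.1, the stage → backend dispatch can be sketched roughly as follows. Only `loadProviderFor` and the env-variable names come from this document; the `MockProvider` class and the `BACKENDS` table are illustrative stand-ins for the real provider classes.

```js
// Sketch (assumptions marked): a provider exposes async generate(prompt, options).
class MockProvider {
  constructor(model) { this.model = model; }
  async generate(prompt, options = {}) {
    // A real provider would call Ollama / OpenAI / a raw HTTP endpoint here.
    return `[${this.model}] ${prompt}`;
  }
}

// Hypothetical backend registry; the repo presumably maps to real provider classes.
const BACKENDS = {
  ollama: (model) => new MockProvider(model),
  openai: (model) => new MockProvider(model),
  http:   (model) => new MockProvider(model),
};

// Reads <STAGE>_PROVIDER and <STAGE>_MODEL, e.g. GENERATOR_PROVIDER / GENERATOR_MODEL.
export function loadProviderFor(stage) {
  const name  = process.env[`${stage.toUpperCase()}_PROVIDER`] ?? "ollama";
  const model = process.env[`${stage.toUpperCase()}_MODEL`];
  const make  = BACKENDS[name];
  if (!make) throw new Error(`unknown provider "${name}" for stage "${stage}"`);
  return make(model);
}
```

This shape is what makes the stages swappable: a stage never imports a concrete provider, it only asks the dispatcher.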
✔ This module is stable.

---

## **4.3 Generator (src/generator/generator_core.mjs)**

Pure function:

```js
async function runGenerator(query, contextChunks, provider)
```

Pipeline:

* loads the generator prompt template
* merges context chunks into a context string
* invokes `provider.generate`
* JSON-parses the output
* returns:

```js
{ query, context, raw, parsed }
```

✓ fully test-covered
✓ easy to replace provider/model

---

## **4.4 Verifier (src/verifier/verifier_core.mjs)**

Pure function:

```js
async function runVerifier(sample, provider)
```

Applies:

* structural JSON check
* alignment/tone check
* error-correction fallback

Returns:

```js
{ ok: boolean, raw, parsed, sample }
```

✓ test-covered

---

## **4.5 Reward Model (src/reward/reward_core.mjs)**

Pure scoring function:

```js
async function runReward(sample, provider)
```

* loads the reward prompt
* calls the provider
* ensures `score` is numeric
* computes `ok` based on positivity

✓ test-covered

(This will eventually be replaced with your Skywork or Nemotron reward server.)

---

## **4.6 Question Generation (src/question/question_core.mjs)**

Your newest subsystem.

```js
async function runQuestionGeneration(chunk, provider, maxQuestions)
```

Flow:

1. Take a raw content chunk (from distill-rag)
2. Prompt an LLM to extract 1–N questions
3. Parse/repair the JSON
4. Return an array of questions

Used when:

```
PIPELINE_SEED_MODE=question-first
```

So the pipeline becomes:

```
chunk → questions → retrieval → generator → ...
```

✓ test-covered
✓ modular
✓ will become core for bootstrap distillation

---

## **4.7 Pipeline Orchestrator (src/pipeline/pipeline.mjs)**

This is the master controller.

Key functions:

### `runPipelineStep({ question, verbose })`

Performs:

1. retrieval
2. generator
3. verifier
4. reward

and returns:

```
{
  status: 'accepted' | 'generator_failed' | ...,
  question, context, gen, ver, rew
}
```

Extensive verbose logging is built in:

```
[retrieval] ...
[generator] ...
[verifier] ...
[reward] ...
```

### `runPipelineBatch({ seedsPath, limit, verbose })`

Iterates over seeds:

* static seed mode (default)
* or question-first mode (pending)

Writes accepted samples via:

### `appendGoldRecord(outPath, record)`

---

# **5. Seed Modes**

There are two entry strategies:

---

## **5.1 Static Question Mode**

```
PIPELINE_SEED_MODE=static
```

Loads:

```
test_samples/seed_questions.jsonl
```

Simple and deterministic.

---

## **5.2 Question-First Mode** *(recommended)*

```
PIPELINE_SEED_MODE=question-first
```

Pipeline:

```
for each chunk:
    questions = runQuestionGeneration(chunk)
    for each question:
        runPipelineStep(question)
```

This is the correct mode for massive bootstrap distillation, because not every chunk answers the same static seed questions.

This mode uses:

* `QUESTION_PROVIDER`
* `QUESTION_MODEL`

---

# **6. Modularization Status**

Already modular:

* generator_core.mjs
* verifier_core.mjs
* reward_core.mjs
* provider.mjs
* question_core.mjs
* retrieval.mjs

Partially modular:

* pipeline.mjs (big but structured)
* pipeline_cli.mjs (needs handling for the dynamic seed mode)

Planned:

```
pipeline/
  retrieval_stage.mjs
  generator_stage.mjs
  verifier_stage.mjs
  reward_stage.mjs
  gold_writer.mjs
```

This matches the ROADMAP.

---

# **7. What Can Be Tested**

All pure modules have unit tests:

| Module              | Tested?  | Notes          |
| ------------------- | -------- | -------------- |
| generator_core      | ✓        | mock provider  |
| verifier_core       | ✓        | mock provider  |
| reward_core         | ✓        | mock provider  |
| question_core       | ✓        | mock provider  |
| provider dispatcher | ✓        | dispatch logic |
| retrieval           | ✓✓       | mock + real ES |
| pipeline (mock)     | ✓        | integration    |
| pipeline (real)     | optional | can add later  |

Your test suite is healthy:

```
9 files, 27 tests → all pass
```

---

# **8. Logging & Verbose Mode**

All stages print diagnostics when `verbose` is passed:

```
npm run pipeline -- --verbose
```

Output includes:

* first-chunk preview
* raw LLM output
* parsed JSON
* acceptance status
* error messages

---

# **9. Future Extensions**

As per the ROADMAP:

* split the pipeline into smaller modules
* improved QG (HyDE, retries, JSON repair)
* dedupe (minhash)
* gold dataset quality metrics
* full distillation cycle (generator → verifier → reward → training → new generator)

---

# **10. Successor Notes**

This project is:

* entirely Node.js ESM
* fully testable end-to-end
* GPU-agnostic
* provider-agnostic
* prompt-driven
* safe to modify once modularized

Golden rule:

> Never mix CLI code with pipeline logic.
> Put everything pure into `*_core.mjs`, test it, then wrap it in CLI tools.

---

If you'd like, I can also:

✓ generate the **next version** of pipeline modularization
✓ implement `PIPELINE_SEED_MODE=question-first` fully
✓ add a **chunk loader** so QG works immediately
✓ produce a **Mermaid architecture diagram**
✓ produce a **successor prompt** to embed in the repo

Just tell me.
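P.S. — for reference, the question-first loop from §5.2 reduces to a small control-flow skeleton. This is a sketch under stated assumptions: the two injected async functions stand in for the repo's real `runQuestionGeneration` and `runPipelineStep`, and the `questionFirstBatch` name is hypothetical.

```js
// Hypothetical skeleton of the §5.2 loop: chunk → questions → one pipeline
// step per question. Dependencies are injected so the skeleton stays pure
// and unit-testable, per the "*_core.mjs first" golden rule.
export async function questionFirstBatch(chunks, generateQuestions, runStep) {
  const results = [];
  for (const chunk of chunks) {
    for (const question of await generateQuestions(chunk)) {
      // runStep is expected to return { status, question, ... }
      results.push(await runStep(question));
    }
  }
  return results;
}
```

Wired up for real, `generateQuestions` would close over `QUESTION_PROVIDER` / `QUESTION_MODEL`, and accepted results would flow into `appendGoldRecord`.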