| Absolutely β here is a **fully updated, successor-ready `ARCHITECTURE.md`** that reflects your *current* codebase, including: | |
| * the **new question generator subsystem**, | |
| * the **multi-provider architecture**, | |
| * the **modular pipeline** (generator/verifier/reward), | |
| * the **retrieval stage**, | |
| * the **batch orchestrator**, | |
| * and the **modularization plan** you're moving toward. | |
| It will not clobber anything. | |
| It is aligned with your present repo after the latest commits and tests. | |
| You can paste this directly as: | |
| ``` | |
| distill-pipeline/ARCHITECTURE.md | |
| ``` | |
| --- | |
| # **ARCHITECTURE.md** | |
| *Distill-Pipeline β System Architecture & Successor Notes* | |
| *(Node.js, ESM, Ollama/vLLM/OpenAI providers, Vitest-tested)* | |
| --- | |
| # **1. Purpose** | |
| `distill-pipeline` is a modular, retrieval-augmented LLM distillation engine. | |
| It produces high-quality *gold data* by running each question through: | |
| 1. **retrieval** (hybrid RAG via distill-rag) | |
| 2. **generator** (teacher model) | |
| 3. **verifier** (alignment/format checker) | |
| 4. **reward model** (scoring) | |
| 5. **gold writer** (JSONL builder) | |
| It also includes a **question generation** module to extract questions directly from RAG chunks, enabling true content-first distillation. | |
| The system is built for offline, local distillation on consumer GPUs (your 3090 + 3060). | |
| --- | |
| # **2. High-Level Flow** | |
| ``` | |
| ββββββββββββββββββ | |
| β Chunk Source β β distill-rag index | |
| ββββββββ¬ββββββββββ | |
| βΌ | |
| (optional) Question Generation | |
| βΌ | |
| βββββββββββββββββ | |
| β Retrieval β (hybrid BM25 + dense) | |
| ββββββββ¬βββββββββ | |
| βΌ | |
| βββββββββββββββββ | |
| β Generator β (LLM teacher) | |
| ββββββββ¬βββββββββ | |
| βΌ | |
| βββββββββββββββββ | |
| β Verifier β (LLM) | |
| ββββββββ¬βββββββββ | |
| βΌ | |
| βββββββββββββββββ | |
| β Reward Model β (LLM critic) | |
| ββββββββ¬βββββββββ | |
| βΌ | |
| βββββββββββββββββ | |
| β Gold Writer β | |
| βββββββββββββββββ | |
| ``` | |
| --- | |
| # **3. Directory Layout** | |
| Your repo structure (as of now, after modularization): | |
| ``` | |
| distill-pipeline/ | |
| prompts/ | |
| generator_prompt.txt | |
| verifier_prompt.txt | |
| reward_prompt.txt | |
| question_prompt.txt | |
| src/ | |
| pipeline/ | |
| pipeline.mjs | |
| pipeline_cli.mjs | |
| providers/ | |
| provider.mjs | |
| ollama_provider.mjs | |
| openai_provider.mjs | |
| http_provider.mjs | |
| retrieval/ | |
| retrieval.mjs | |
| generator/ | |
| generator_core.mjs | |
| verifier/ | |
| verifier_core.mjs | |
| reward/ | |
| reward_core.mjs | |
| question/ | |
| question_core.mjs | |
| question_cli.mjs | |
| gold/ | |
| (generated JSONL files) | |
| test_samples/ | |
| seed_questions.jsonl β for static mode | |
| tests/ | |
| generator_core.test.mjs | |
| verifier_core.test.mjs | |
| reward_core.test.mjs | |
| provider.mock.test.mjs | |
| pipeline.mock.test.mjs | |
| retrieval.real.test.mjs | |
| retrieval.mock.test.mjs | |
| gold_core.test.mjs | |
| question_core.test.mjs | |
| .env | |
| package.json | |
| ARCHITECTURE.md | |
| ROADMAP.md | |
| ``` | |
| Everything is now properly separated into **pure core modules**, each with **Vitest tests**. | |
| --- | |
| # **4. Core Modules** | |
| Below is a top-down view. | |
| --- | |
| ## **4.1 Provider System (src/providers/)** | |
| This system routes each pipeline stage to a backend: | |
| * `OllamaProvider` | |
| * `OpenAIProvider` | |
| * `HttpProvider` | |
| * future: `vLLMProvider` | |
| All providers expose: | |
| ```js | |
| async generate(prompt, options?) | |
| ``` | |
| The dispatcher: | |
| ```js | |
| loadProviderFor("generator" | "verifier" | "reward" | "question") | |
| ``` | |
| Selects backend using env: | |
| ``` | |
| GENERATOR_PROVIDER=ollama | |
| VERIFIER_PROVIDER=ollama | |
| REWARD_PROVIDER=ollama | |
| QUESTION_PROVIDER=ollama | |
| ``` | |
| And uses stage-specific model names: | |
| ``` | |
| GENERATOR_MODEL=qwen3-vl:8b-thinking | |
| VERIFIER_MODEL=patronus:8b | |
| REWARD_MODEL=patronus:8b | |
| QUESTION_MODEL=qwen2.5-7b-instruct | |
| ``` | |
| This architecture is clean, extensible, and fully testable. | |
| --- | |
| ## **4.2 Retrieval (src/retrieval/retrieval.mjs)** | |
| Your retrieval layer connects to the **distill-rag** Elasticsearch index. | |
| Supports: | |
| * BM25 | |
| * Dense vector KNN | |
| * Hybrid RRF | |
| * optional future HyDE | |
| The key export: | |
| ```js | |
| export async function hybridSearch(query, k) | |
| ``` | |
| You already have real + mock tests for this module. | |
| β This module is stable. | |
| --- | |
| ## **4.3 Generator (src/generator/generator_core.mjs)** | |
| Pure function: | |
| ```js | |
| async function runGenerator(query, contextChunks, provider) | |
| ``` | |
| Pipeline: | |
| * loads generator prompt template | |
| * merges context chunks into a context string | |
| * invokes provider.generate | |
| * JSON-parses output | |
| * returns: | |
| ```js | |
| { | |
| query, | |
| context, | |
| raw, | |
| parsed | |
| } | |
| ``` | |
| β fully test-covered | |
| β easy to replace provider/model | |
| --- | |
| ## **4.4 Verifier (src/verifier/verifier_core.mjs)** | |
| Pure function: | |
| ```js | |
| async function runVerifier(sample, provider) | |
| ``` | |
| Applies: | |
| * structural JSON check | |
| * alignment/tone check | |
| * error correction fallback | |
| Returns: | |
| ```js | |
| { | |
| ok: boolean, | |
| raw, | |
| parsed, | |
| sample | |
| } | |
| ``` | |
| β test-covered | |
| --- | |
| ## **4.5 Reward Model (src/reward/reward_core.mjs)** | |
| Pure scoring function: | |
| ```js | |
| async function runReward(sample, provider) | |
| ``` | |
| * loads reward prompt | |
| * calls provider | |
| * ensures `score` is numeric | |
| * computes `ok` based on positivity | |
| β test-covered | |
| (This will eventually be replaced with your Skywork or Nemotron reward server.) | |
| --- | |
| ## **4.6 Question Generation (src/question/question_core.mjs)** | |
| Your newest subsystem. | |
| ```js | |
| async function runQuestionGeneration(chunk, provider, maxQuestions) | |
| ``` | |
| Flow: | |
| 1. Take a raw content chunk (from distill-rag) | |
| 2. Prompt an LLM to extract 1βN questions | |
| 3. Parse/repair JSON | |
| 4. Return array of questions | |
| Used when: | |
| ``` | |
| PIPELINE_SEED_MODE=question-first | |
| ``` | |
| So the pipeline becomes: | |
| ``` | |
| chunk β questions β retrieval β generator β ... | |
| ``` | |
| β test-covered | |
| β modular | |
| β will become core for bootstrap distillation | |
| --- | |
| ## **4.7 Pipeline Orchestrator (src/pipeline/pipeline.mjs)** | |
| This is the master controller. | |
| Key functions: | |
| ### `runPipelineStep({ question, verbose })` | |
| Performs: | |
| 1. retrieval | |
| 2. generator | |
| 3. verifier | |
| 4. reward | |
| and returns: | |
| ``` | |
| { | |
| status: 'accepted' | 'generator_failed' | ..., | |
| question, | |
| context, | |
| gen, | |
| ver, | |
| rew | |
| } | |
| ``` | |
| Extensive verbose logging is built in: | |
| ``` | |
| [retrieval] ... | |
| [generator] ... | |
| [verifier] ... | |
| [reward] ... | |
| ``` | |
| ### `runPipelineBatch({ seedsPath, limit, verbose })` | |
| Iterates over seeds: | |
| * static seed mode (default) | |
| * or question-first mode (pending) | |
| Writes accepted samples via: | |
| ### `appendGoldRecord(outPath, record)` | |
| --- | |
| # **5. Seed Modes** | |
| There are two entry strategies: | |
| --- | |
| ## **5.1 Static Question Mode** | |
| ``` | |
| PIPELINE_SEED_MODE=static | |
| ``` | |
| Loads: | |
| ``` | |
| test_samples/seed_questions.jsonl | |
| ``` | |
| Simple and deterministic. | |
| --- | |
| ## **5.2 Question-First Mode** *(recommended)* | |
| ``` | |
| PIPELINE_SEED_MODE=question-first | |
| ``` | |
| Pipeline: | |
| ``` | |
| for each chunk: | |
| questions = runQuestionGeneration(chunk) | |
| for each question: | |
| runPipelineStep(question) | |
| ``` | |
| This is the correct mode for massive bootstrap distillation because not every chunk answers the same static seed questions. | |
| This mode uses: | |
| * `QUESTION_PROVIDER` | |
| * `QUESTION_MODEL` | |
| --- | |
| # **6. Modularization Status** | |
| Already modular: | |
| * generator_core.mjs | |
| * verifier_core.mjs | |
| * reward_core.mjs | |
| * provider.mjs | |
| * question_core.mjs | |
| * retrieval.mjs | |
| Partially modular: | |
| * pipeline.mjs (big but structured) | |
| * pipeline_cli.mjs (needs handling for dynamic seed mode) | |
| Planned: | |
| ``` | |
| pipeline/ | |
| retrieval_stage.mjs | |
| generator_stage.mjs | |
| verifier_stage.mjs | |
| reward_stage.mjs | |
| gold_writer.mjs | |
| ``` | |
| This matches the ROADMAP. | |
| --- | |
| # **7. What Can Be Tested** | |
| All pure modules have unit tests: | |
| | Module | Tested? | Notes | | |
| | ------------------- | -------- | -------------- | | |
| | generator_core | β | mock provider | | |
| | verifier_core | β | mock provider | | |
| | reward_core | β | mock provider | | |
| | question_core | β | mock provider | | |
| | provider dispatcher | β | dispatch logic | | |
| | retrieval | ββ | mock + real ES | | |
| | pipeline (mock) | β | integration | | |
| | pipeline (real) | optional | can add later | | |
| Your test suite is healthy: | |
| ``` | |
| 9 files, 27 tests β all pass | |
| ``` | |
| --- | |
| # **8. Logging & Verbose Mode** | |
| All stages print diagnostics when `verbose` is passed to: | |
| ``` | |
| npm run pipeline -- --verbose | |
| ``` | |
| Includes: | |
| * first chunk preview | |
| * raw LLM output | |
| * parsed JSON | |
| * acceptance status | |
| * error messages | |
| --- | |
| # **9. Future Extensions** | |
| As per ROADMAP: | |
| * split pipeline into smaller modules | |
| * improved QG (HyDE, retries, JSON repair) | |
| * dedupe (minhash) | |
| * gold dataset quality metrics | |
| * full distillation cycle (generator β verifier β reward β training β new generator) | |
| --- | |
| # **10. Successor Notes** | |
| This project is: | |
| * entirely Node.js ESM | |
| * fully testable end-to-end | |
| * GPU-agnostic | |
| * provider-agnostic | |
| * prompt-driven | |
| * safe to modify when modularized | |
| Golden rule: | |
| > Never mix CLI code with pipeline logic. | |
| > Put everything pure into `*_core.mjs`, test it, then wrap it in CLI tools. | |
| --- | |
| If you'd like, I can also: | |
| β generate the **next version** of pipeline modularization | |
| β implement `PIPELINE_SEED_MODE=question-first` fully | |
| β add a **chunk loader** so QG works immediately | |
| β produce a **Mermaid architecture diagram** | |
| β produce a **successor prompt** to embed in the repo | |
| Just tell me. | |