Absolutely, here is a fully updated, successor-ready ARCHITECTURE.md that reflects your current codebase, including:
- the new question generator subsystem,
- the multi-provider architecture,
- the modular pipeline (generator/verifier/reward),
- the retrieval stage,
- the batch orchestrator,
- and the modularization plan you're moving toward.
It will not clobber anything. It is aligned with your present repo after the latest commits and tests.
You can paste this directly as:
distill-pipeline/ARCHITECTURE.md
ARCHITECTURE.md
Distill-Pipeline: System Architecture & Successor Notes (Node.js, ESM, Ollama/vLLM/OpenAI providers, Vitest-tested)
1. Purpose
distill-pipeline is a modular, retrieval-augmented LLM distillation engine.
It produces high-quality gold data by running each question through:
- retrieval (hybrid RAG via distill-rag)
- generator (teacher model)
- verifier (alignment/format checker)
- reward model (scoring)
- gold writer (JSONL builder)
It also includes a question generation module to extract questions directly from RAG chunks, enabling true content-first distillation.
The system is built for offline, local distillation on consumer GPUs (your 3090 + 3060).
2. High-Level Flow
┌──────────────────┐
│   Chunk Source   │ ← distill-rag index
└────────┬─────────┘
         ▼
 (optional) Question Generation
         ▼
┌──────────────────┐
│    Retrieval     │ (hybrid BM25 + dense)
└────────┬─────────┘
         ▼
┌──────────────────┐
│    Generator     │ (LLM teacher)
└────────┬─────────┘
         ▼
┌──────────────────┐
│     Verifier     │ (LLM)
└────────┬─────────┘
         ▼
┌──────────────────┐
│   Reward Model   │ (LLM critic)
└────────┬─────────┘
         ▼
┌──────────────────┐
│   Gold Writer    │
└──────────────────┘
3. Directory Layout
Your repo structure (as of now, after modularization):
distill-pipeline/
prompts/
generator_prompt.txt
verifier_prompt.txt
reward_prompt.txt
question_prompt.txt
src/
pipeline/
pipeline.mjs
pipeline_cli.mjs
providers/
provider.mjs
ollama_provider.mjs
openai_provider.mjs
http_provider.mjs
retrieval/
retrieval.mjs
generator/
generator_core.mjs
verifier/
verifier_core.mjs
reward/
reward_core.mjs
question/
question_core.mjs
question_cli.mjs
gold/
(generated JSONL files)
test_samples/
seed_questions.jsonl ← for static mode
tests/
generator_core.test.mjs
verifier_core.test.mjs
reward_core.test.mjs
provider.mock.test.mjs
pipeline.mock.test.mjs
retrieval.real.test.mjs
retrieval.mock.test.mjs
gold_core.test.mjs
question_core.test.mjs
.env
package.json
ARCHITECTURE.md
ROADMAP.md
Everything is now properly separated into pure core modules, each with Vitest tests.
4. Core Modules
Below is a top-down view.
4.1 Provider System (src/providers/)
This system routes each pipeline stage to a backend:
- OllamaProvider
- OpenAIProvider
- HttpProvider
- future: vLLMProvider
All providers expose:
async generate(prompt, options?)
The dispatcher:
loadProviderFor("generator" | "verifier" | "reward" | "question")
Selects backend using env:
GENERATOR_PROVIDER=ollama
VERIFIER_PROVIDER=ollama
REWARD_PROVIDER=ollama
QUESTION_PROVIDER=ollama
And uses stage-specific model names:
GENERATOR_MODEL=qwen3-vl:8b-thinking
VERIFIER_MODEL=patronus:8b
REWARD_MODEL=patronus:8b
QUESTION_MODEL=qwen2.5-7b-instruct
This architecture is clean, extensible, and fully testable.
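The dispatch described above can be sketched as a small registry keyed by backend name. Everything below is illustrative: the registry contents and the stub `generate()` are assumptions, not the repo's real providers; only the `<STAGE>_PROVIDER` / `<STAGE>_MODEL` env convention is taken from the doc.

```javascript
// Hypothetical sketch of stage-based provider dispatch.
const registry = {
  ollama: (model) => ({
    name: "ollama",
    model,
    async generate(prompt, _options = {}) {
      // A real provider would call the Ollama HTTP API here; this stub echoes.
      return `[ollama:${model}] ${prompt.slice(0, 40)}`;
    },
  }),
};

function loadProviderFor(stage, env = process.env) {
  const key = stage.toUpperCase();
  const backend = env[`${key}_PROVIDER`] ?? "ollama";
  const factory = registry[backend];
  if (!factory) throw new Error(`Unknown provider backend: ${backend}`);
  return factory(env[`${key}_MODEL`]);
}
```

Passing `env` as a parameter (defaulting to `process.env`) keeps the dispatcher pure enough to unit-test without mutating global state.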
4.2 Retrieval (src/retrieval/retrieval.mjs)
Your retrieval layer connects to the distill-rag Elasticsearch index.
Supports:
- BM25
- Dense vector KNN
- Hybrid RRF
- optional future HyDE
The key export:
export async function hybridSearch(query, k)
You already have real + mock tests for this module.
✅ This module is stable.
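For reference, the hybrid RRF step can be sketched as a pure fusion function over the BM25 and dense rankings. The constant `k = 60` is the conventional RRF default, not necessarily what `retrieval.mjs` uses:

```javascript
// Reciprocal-rank fusion: each ranked list of doc ids contributes
// 1 / (k + rank) to a doc's score; higher fused score ranks first.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((docId, i) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```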
4.3 Generator (src/generator/generator_core.mjs)
Pure function:
async function runGenerator(query, contextChunks, provider)
Pipeline:
- loads generator prompt template
- merges context chunks into a context string
- invokes provider.generate
- JSON-parses output
- returns:
{
query,
context,
raw,
parsed
}
✅ fully test-covered ✅ easy to replace provider/model
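The flow above can be sketched end to end with a mock provider. The inline prompt template here is illustrative; the real module loads `prompts/generator_prompt.txt` from disk:

```javascript
// Sketch of runGenerator: merge chunks, call the provider, parse JSON.
async function runGenerator(query, contextChunks, provider) {
  const context = contextChunks.join("\n---\n");
  const prompt = `Context:\n${context}\n\nQuestion: ${query}\nAnswer as JSON.`;
  const raw = await provider.generate(prompt);
  let parsed = null;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Leave parsed null; downstream stages decide how to handle it.
  }
  return { query, context, raw, parsed };
}
```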
4.4 Verifier (src/verifier/verifier_core.mjs)
Pure function:
async function runVerifier(sample, provider)
Applies:
- structural JSON check
- alignment/tone check
- error correction fallback
Returns:
{
ok: boolean,
raw,
parsed,
sample
}
✅ test-covered
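A minimal sketch of that contract, assuming the verifier model answers with a JSON verdict like `{"ok": true}` (the actual verdict format is defined by `prompts/verifier_prompt.txt`, so treat this shape as an assumption):

```javascript
// Sketch of runVerifier: unparseable or negative verdicts reject the sample.
async function runVerifier(sample, provider) {
  const raw = await provider.generate(`Verify this sample:\n${JSON.stringify(sample)}`);
  let parsed = null;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Unparseable verdict counts as a failed verification.
  }
  return { ok: parsed?.ok === true, raw, parsed, sample };
}
```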
4.5 Reward Model (src/reward/reward_core.mjs)
Pure scoring function:
async function runReward(sample, provider)
- loads reward prompt
- calls provider
- ensures `score` is numeric
- computes `ok` based on positivity
✅ test-covered
(This will eventually be replaced with your Skywork or Nemotron reward server.)
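The numeric-coercion step can be sketched as a small helper. The threshold of 0 ("positivity") is taken from the description above; `normalizeReward` is a hypothetical name, not a repo export:

```javascript
// Coerce the model's reported score to a number and derive `ok`.
function normalizeReward(rawScore, threshold = 0) {
  const score = Number(rawScore);
  if (!Number.isFinite(score)) {
    // Non-numeric output (e.g. "n/a") always rejects the sample.
    return { score: NaN, ok: false };
  }
  return { score, ok: score > threshold };
}
```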
4.6 Question Generation (src/question/question_core.mjs)
Your newest subsystem.
async function runQuestionGeneration(chunk, provider, maxQuestions)
Flow:
- Take a raw content chunk (from distill-rag)
- Prompt an LLM to extract 1βN questions
- Parse/repair JSON
- Return array of questions
Used when:
PIPELINE_SEED_MODE=question-first
So the pipeline becomes:
chunk → questions → retrieval → generator → ...
✅ test-covered ✅ modular ✅ will become core for bootstrap distillation
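The parse/repair step might look like the sketch below. Stripping a markdown fence is a common repair trick, not necessarily the one `question_core.mjs` uses; `parseQuestions` is a hypothetical helper name:

```javascript
// Parse the LLM's question list, repairing a wrapping markdown fence,
// and cap the result at maxQuestions.
function parseQuestions(raw, maxQuestions) {
  const cleaned = raw.replace(/^```(json)?\s*|\s*```$/g, "").trim();
  let arr;
  try {
    arr = JSON.parse(cleaned);
  } catch {
    return []; // irrecoverable output yields no questions
  }
  if (!Array.isArray(arr)) return [];
  return arr.filter((q) => typeof q === "string").slice(0, maxQuestions);
}
```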
4.7 Pipeline Orchestrator (src/pipeline/pipeline.mjs)
This is the master controller.
Key functions:
runPipelineStep({ question, verbose })
Performs:
- retrieval
- generator
- verifier
- reward
and returns:
{
status: 'accepted' | 'generator_failed' | ...,
question,
context,
gen,
ver,
rew
}
Extensive verbose logging is built in:
[retrieval] ...
[generator] ...
[verifier] ...
[reward] ...
runPipelineBatch({ seedsPath, limit, verbose })
Iterates over seeds:
- static seed mode (default)
- or question-first mode (pending)
Writes accepted samples via:
appendGoldRecord(outPath, record)
5. Seed Modes
There are two entry strategies:
5.1 Static Question Mode
PIPELINE_SEED_MODE=static
Loads:
test_samples/seed_questions.jsonl
Simple and deterministic.
5.2 Question-First Mode (recommended)
PIPELINE_SEED_MODE=question-first
Pipeline:
for each chunk:
questions = runQuestionGeneration(chunk)
for each question:
runPipelineStep(question)
This is the correct mode for massive bootstrap distillation because not every chunk answers the same static seed questions.
This mode uses:
- QUESTION_PROVIDER
- QUESTION_MODEL
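The question-first loop above, written out with the stage functions injected so it can run against mocks (`runQuestionFirst` is a hypothetical wrapper, not a repo export):

```javascript
// Drive the pipeline content-first: questions are generated per chunk,
// then each question goes through the full step.
async function runQuestionFirst(chunks, { generateQuestions, runStep }) {
  const results = [];
  for (const chunk of chunks) {
    const questions = await generateQuestions(chunk);
    for (const question of questions) {
      results.push(await runStep(question));
    }
  }
  return results;
}
```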
6. Modularization Status
Already modular:
- generator_core.mjs
- verifier_core.mjs
- reward_core.mjs
- provider.mjs
- question_core.mjs
- retrieval.mjs
Partially modular:
- pipeline.mjs (big but structured)
- pipeline_cli.mjs (needs handling for dynamic seed mode)
Planned:
pipeline/
retrieval_stage.mjs
generator_stage.mjs
verifier_stage.mjs
reward_stage.mjs
gold_writer.mjs
This matches the ROADMAP.
7. What Can Be Tested
All pure modules have unit tests:
| Module | Tested? | Notes |
|---|---|---|
| generator_core | ✅ | mock provider |
| verifier_core | ✅ | mock provider |
| reward_core | ✅ | mock provider |
| question_core | ✅ | mock provider |
| provider dispatcher | ✅ | dispatch logic |
| retrieval | ✅✅ | mock + real ES |
| pipeline (mock) | ✅ | integration |
| pipeline (real) | optional | can add later |
Your test suite is healthy:
9 files, 27 tests, all passing
8. Logging & Verbose Mode
All stages print diagnostics when verbose is passed to:
npm run pipeline -- --verbose
Includes:
- first chunk preview
- raw LLM output
- parsed JSON
- acceptance status
- error messages
9. Future Extensions
As per ROADMAP:
- split pipeline into smaller modules
- improved QG (HyDE, retries, JSON repair)
- dedupe (minhash)
- gold dataset quality metrics
- full distillation cycle (generator → verifier → reward → training → new generator)
10. Successor Notes
This project is:
- entirely Node.js ESM
- fully testable end-to-end
- GPU-agnostic
- provider-agnostic
- prompt-driven
- safe to modify when modularized
Golden rule:
Never mix CLI code with pipeline logic. Put everything pure into
*_core.mjs, test it, then wrap it in CLI tools.
If you'd like, I can also:
- generate the next version of pipeline modularization
- implement PIPELINE_SEED_MODE=question-first fully
- add a chunk loader so QG works immediately
- produce a Mermaid architecture diagram
- produce a successor prompt to embed in the repo
Just tell me.