# Trainer‑Kit: Config‑Driven CPT (LoRA / QLoRA) with Packing, Logging, Resume, and Merge
Trainer‑Kit is a small, config‑driven training runner for continued pretraining (CPT) on causal LMs. It supports LoRA and QLoRA, data packing (strict or padding‑masked), checkpointing + resume, JSONL logging, periodic eval with perplexity, and an optional merge step to export a final merged model.
## What we built

### ✅ Core goals implemented
- CPT training loop controlled entirely via a YAML config
- Local model support (load from filesystem) and optional HF download (if `repo_id` is a Hub id)
- JSONL datasets for train (+ optional eval split)
- CPT‑style token‑stream packing into fixed‑length blocks
- Two packing modes:
  - `drop`: strict CPT, drop remainder tokens (preferred for real CPT)
  - `pad`: pad the remainder to `block_size` and mask loss on padding (useful for small datasets / debugging)
- Checkpointing + resume
  - `resume_from_checkpoint: "auto"` resumes from the latest checkpoint under `run_dir/checkpoints`
- JSONL logs written locally
  - training logs: `run_dir/logs/train.jsonl`
  - eval logs: `run_dir/logs/eval.jsonl`
- Evaluation
  - logs `eval_loss` and computed `perplexity = exp(eval_loss)` (with a safe overflow guard)
- Adapter output
  - saves the final/best adapter to `run_dir/best_adapter`
- Merge workflow
  - `--merge-only` merges an existing adapter later
  - merge is done on CPU to avoid GPU OOM
  - the merged model is stored under the configured merge output directory (resolved relative to `run_dir` if the path is relative)
## Repository layout (outputs)
A run produces the following structure under `run.run_dir`:
```
runs/<run_name>/
├─ checkpoints/          # trainer checkpoints (for resume)
├─ best_adapter/         # saved LoRA adapter
├─ logs/
│  ├─ train.jsonl        # step-wise training logs
│  └─ eval.jsonl         # eval logs (eval_loss + perplexity)
├─ eval_final.json       # final eval metrics summary (if eval is enabled)
└─ config_resolved.yaml  # exact config used for the run
```
If merge is used, the merged model is written to:
- `run_dir/<merge.output_dir>` if `merge.output_dir` is relative (e.g. `./merged_model`)
- or the absolute path if it is absolute
## Supported training modes

### 1) LoRA vs QLoRA (same script)
QLoRA happens when `model.use_4bit: true`:

- base weights are loaded in 4‑bit using bitsandbytes
- training updates only LoRA parameters
LoRA happens when `model.use_4bit: false`:

- base weights are loaded in fp16/bf16 (as configured)
- training updates only LoRA parameters
No “full finetune” mode is enabled by default in this runner.
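For orientation, a config sketch combining the keys mentioned in this document. Only `run.run_dir`, `model.use_4bit`, `repo_id`, `data.block_size`, `resume_from_checkpoint`, `merge.enabled`, and `merge.output_dir` appear in this README; the nesting of `repo_id` under `model` and the `packing_mode` key name are illustrative assumptions, not the runner's confirmed schema:

```yaml
run:
  run_dir: runs/my_cpt_run

model:
  repo_id: ./models/base-model   # local path, or a HF Hub id to download
  use_4bit: true                 # true = QLoRA (4-bit base), false = LoRA (fp16/bf16)

data:
  block_size: 1024
  packing_mode: drop             # "drop" (strict CPT) or "pad" (small data / debug)

resume_from_checkpoint: auto

merge:
  enabled: true
  output_dir: ./merged_model     # resolved relative to run_dir
```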
## Data pipeline (CPT behavior)

### Input format
JSONL file where each line contains a text field (default `"text"`). Example:

```json
{"text": "some training text..."}
```
### Packing (token stream → fixed blocks)
- Each sample is tokenized without truncation.
- An EOS token is appended per document to preserve boundaries.
- Token lists are concatenated and split into fixed‑length blocks of `data.block_size`.
Two modes:

- `drop` (strict CPT): remainder tokens that don’t fill a full block are discarded.
- `pad` (debug/small data): the remainder is padded to `block_size`:
  - `attention_mask = 0` for padded positions
  - `labels = -100` for padded positions (loss masking)
This is what allowed training to proceed even with tiny dummy datasets at `block_size=1024`.
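A minimal sketch of this packing step (not Trainer‑Kit's actual implementation; `pack_token_stream` and its signature are illustrative):

```python
from typing import Dict, List

def pack_token_stream(
    docs: List[List[int]],
    block_size: int,
    eos_id: int,
    pad_id: int,
    mode: str = "drop",  # "drop" (strict CPT) or "pad" (small data / debug)
) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized docs into fixed-length blocks, EOS per doc."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # preserve document boundaries

    blocks: List[Dict[str, List[int]]] = []
    for i in range(0, len(stream), block_size):
        chunk = stream[i : i + block_size]
        if len(chunk) == block_size:
            blocks.append({
                "input_ids": chunk,
                "attention_mask": [1] * block_size,
                "labels": list(chunk),
            })
        elif mode == "pad":
            n_pad = block_size - len(chunk)
            blocks.append({
                "input_ids": chunk + [pad_id] * n_pad,
                "attention_mask": [1] * len(chunk) + [0] * n_pad,  # ignore padding
                "labels": chunk + [-100] * n_pad,                  # mask loss on padding
            })
        else:
            break  # "drop": discard the remainder (strict CPT)
    return blocks
```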
## Logging
Trainer‑Kit writes machine‑readable logs in JSONL.
### Training logs (`logs/train.jsonl`)
Includes entries with:

- `step`
- `loss`
- `grad_norm`
- `learning_rate`
- `progress_pct` (step progress when `max_steps` is active)
- ETA estimation
### Eval logs (`logs/eval.jsonl`)
Includes:

- `eval_loss`
- `perplexity`
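The `perplexity = exp(eval_loss)` computation with its overflow guard can be sketched as follows (the cap value is an assumed placeholder, not the runner's actual constant):

```python
import math

def safe_perplexity(eval_loss: float, cap: float = 20.0) -> float:
    """Compute perplexity = exp(eval_loss), capping the exponent so a
    diverged eval loss cannot overflow to float inf."""
    return math.exp(min(eval_loss, cap))
```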
Notes:

- When using `max_steps`, the Trainer’s internal `epoch` counter can grow unexpectedly on tiny datasets (because steps/epoch becomes ~1). Use `progress_pct` as the reliable indicator for step‑based runs.
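Because the logs are plain JSONL, they can be tailed with a few lines of standard-library Python (`last_log_entries` is a hypothetical helper, not part of Trainer‑Kit; field names follow the schema above):

```python
import json
from pathlib import Path

def last_log_entries(run_dir: str, n: int = 5):
    """Return the last n entries from run_dir/logs/train.jsonl as dicts."""
    log_path = Path(run_dir) / "logs" / "train.jsonl"
    with log_path.open() as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return entries[-n:]

# Usage, e.g.:
#   for e in last_log_entries("runs/my_cpt_run"):
#       print(e.get("step"), e.get("loss"), e.get("progress_pct"))
```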
## Checkpointing and resume
The trainer saves checkpoints under `run_dir/checkpoints/`.
Resume options:
- `resume_from_checkpoint: "auto"` → picks the latest checkpoint automatically
- `resume_from_checkpoint: "/path/to/checkpoint"` → resumes from a specific checkpoint
- `resume_from_checkpoint: null` → fresh run
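The `"auto"` option can be pictured as picking the highest-numbered checkpoint directory. The sketch below assumes the Hugging Face `checkpoint-<step>` naming convention and is not the runner's actual code:

```python
import re
from pathlib import Path
from typing import Optional

def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the newest `checkpoint-<step>` dir under run_dir/checkpoints,
    or None if there are no checkpoints yet."""
    ckpt_root = Path(run_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    candidates = [
        p for p in ckpt_root.iterdir()
        if p.is_dir() and re.fullmatch(r"checkpoint-\d+", p.name)
    ]
    if not candidates:
        return None
    # Highest global step wins (numeric, not lexicographic, comparison).
    return max(candidates, key=lambda p: int(p.name.split("-")[1]))
```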
## Merging adapters into a final model
Trainer‑Kit supports exporting a merged model:
### Merge after training
Enable merge in the config (`merge.enabled: true`). The script will:

- save the adapter
- free GPU memory
- reload the base model on CPU
- load the adapter
- call `merge_and_unload()`
- save the final merged model
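As a sketch of this CPU merge path, assuming the standard transformers + peft APIs (`merge_adapter` is illustrative, not the runner's function; the actual code may differ):

```python
def merge_adapter(base_model_path: str, adapter_path: str, out_dir: str) -> None:
    """Merge a LoRA adapter into its base model on CPU (sketch).

    Loading on CPU avoids GPU OOM during the merge, at the cost of speed.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_model_path, torch_dtype=torch.float16, device_map="cpu"
    )
    model = PeftModel.from_pretrained(base, adapter_path)  # attach LoRA adapter
    merged = model.merge_and_unload()                      # fold LoRA deltas into base weights
    merged.save_pretrained(out_dir)                        # write merged model
    AutoTokenizer.from_pretrained(base_model_path).save_pretrained(out_dir)
```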
### Merge later
Run:
```bash
python run_cpt.py --config config.yaml --merge-only
```
This skips training and merges `run_dir/best_adapter` into the base model.
## How to run
### Train

```bash
python run_cpt.py --config config.yaml
```
### Merge only

```bash
python run_cpt.py --config config.yaml --merge-only
```