DAPT → SFT → DPO (3-step quality upgrade)
#37 · opened by jbakerx
A strong recipe:
1. Domain-Adaptive Pretraining (DAPT): your current "style corpus" LoRA.
2. Supervised Fine-Tuning (SFT) on instruction-style tasks ("rewrite in Tolstoy's voice", "continue the scene", "dialogue only").
3. Preference tuning (DPO or IPO) using pairwise rankings, human-labeled or lightly curated.
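To make step 3 concrete, here is a minimal pure-Python sketch of the per-pair DPO objective (in practice you would use a library like TRL rather than hand-rolling this). The function name and `beta` default are illustrative, not from any particular codebase; the reference model would be the frozen DAPT/SFT checkpoint from steps 1–2.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probabilities of the chosen and
    rejected completions under the trainable policy and the frozen
    reference (DAPT/SFT) model. beta scales the implicit KL penalty
    keeping the policy close to the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable -log(sigmoid(margin)) = log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference on both completions the margin is zero and the loss is log 2; the loss shrinks as the policy ranks the chosen completion further above the rejected one relative to the reference.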
This can be a clean “methods contribution” in a follow-up paper.
We will consider this enhancement for inclusion in version 2.0.0.