DAPT → SFT → DPO (3-step quality upgrade)
#37 · opened by jbakerx
A strong recipe:
1. Domain-Adaptive Pretraining (DAPT): your current "style corpus" LoRA.
2. Supervised Fine-Tuning (SFT) on instruction-style tasks ("rewrite in Tolstoy's voice", "continue the scene", "dialogue only").
3. Preference tuning (DPO or IPO) using pairwise rankings, human-labeled or lightly curated.
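To make step 3 concrete, here is a minimal pure-Python sketch of the per-pair DPO objective (in practice you would use a library like TRL rather than hand-rolling this). The function name and `beta` default are illustrative, not from any particular codebase; the reference model would be the frozen DAPT/SFT checkpoint from steps 1–2.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probabilities of the chosen and
    rejected completions under the trainable policy and the frozen
    reference (DAPT/SFT) model. beta scales the implicit KL penalty
    keeping the policy close to the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable -log(sigmoid(margin)) = log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference on both completions the margin is zero and the loss is log 2; the loss shrinks as the policy ranks the chosen completion further above the rejected one relative to the reference.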
This can be a clean “methods contribution” in a follow-up paper.
We will consider this enhancement for inclusion in version 2.0.0.