Buckets:

bochen2079
/

katherine-k0

bochen2079/katherine-k0 / logs

45.1 GB

317 files

Updated 11 days ago

Ctrl+K

Name	Size	Uploaded	Xet hash
.git		13 days ago	26 items
adapters		13 days ago	76 items
data		13 days ago	2 items
gguf		13 days ago	45 items
unsloth_compiled_cache		13 days ago	78 items
..hfsync.log	274 Bytes xet	13 days ago	794af8de
.gitignore	693 Bytes xet	13 days ago	e65c07c5
CLOUD.md	9.6 kB xet	13 days ago	df816c19
README.md	4.28 kB xet	13 days ago	d922db87
RUNBOOK.md	15.5 kB xet	13 days ago	ccab2305
_supervise-cloud.sh	4.74 kB xet	13 days ago	0b8f7742
bootstrap-runpod.sh	4.88 kB xet	13 days ago	37c831ea
dpo_k0.py	4.9 kB xet	13 days ago	415a0299
finetune_k0.py	5.85 kB xet	13 days ago	18ec02ea
katherine_k0.pid	5 Bytes xet	13 days ago	a4d9118f
katherine_k0.stderr.log	14.6 kB xet	13 days ago	05af36ff
katherine_k0.watchdog.log	6.11 kB xet	13 days ago	64392c07
merge_and_gguf.py	3.34 kB xet	13 days ago	345ecfa5
prep_dataset.py	6.41 kB xet	13 days ago	00780b46
push_to_hf.py	4.58 kB xet	13 days ago	350c60b9
run-cloud-runpod.sh	7.01 kB xet	13 days ago	73d9e374

README.md

katherine-k0-finetune

QLoRA fine-tune of Qwen3.5-9B for the Katherine k0 embodied persona on a single rented H200.

K0 (embodied): Katherine Hale, 29, Austin, community arts coordinator. She believes she's human. No knowledge of LLMs, AI, or that she runs as patterns inside a model. Direct prose, no thinking blocks, no assistant register, no sycophancy. The model is Katherine, not "an assistant playing Katherine."

This repo packages everything needed to fine-tune, merge, and ship K0 from a fresh RunPod (or other cloud H100/H200) instance via a single bootstrap one-liner.

Quickstart on RunPod

In the pod's web terminal:

curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh | bash
cd ~/katherine-k0-finetune
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # your write-scope HF token
./run-cloud-runpod.sh

Total wallclock: ~50-70 min on 1× H200. Cost: ~$3-5.

End state: adapter + 3 GGUF quants (q4_k_m / q5_k_m / q6_k) pushed to your bochen2079/katherine-k0 HF bucket.

See RUNBOOK.md for the full operator walkthrough. See CLOUD.md for the math, hyperparameter derivation, and why each choice was made.

Pipeline

Stage 0  prep_dataset.py     dedupe 6,164 raw lines → 1,886 SFT + 180 DPO,
                             strip system prompts (unconditional Katherine)
Stage 1  finetune_k0.py      QLoRA SFT on Qwen3.5-9B
                             rank 64, alpha 128, 3 epochs, lr 1e-4
Stage 2  dpo_k0.py           DPO on top of SFT adapter
                             180 curated chosen/rejected pairs, 2 epochs
Stage 3  merge_and_gguf.py   merge LoRA → base; export q4_k_m, q5_k_m, q6_k
Stage 4  push_to_hf.py       push adapter + DPO adapter + 3 GGUFs to HF bucket

Each stage is independent and resumable. If GGUF export fails, adapters are preserved on disk and you can re-run just the GGUF stage.

Key design decisions

Strip all system prompts at preprocess time. The model becomes Katherine unconditionally rather than learning the conditional P(K | sysprompt). Robust against jailbreaks and sysprompt-removal probes.
enable_thinking=False at chat-template time. K0 is embodied; she reasons in prose, not in tagged thinking blocks. Different from the Two-Is Dave architecture.
Rank 64 / alpha 128 — high enough for persona consolidation, low enough to avoid overfitting on 1,886 examples.
Dropout 0.05 — small dataset + high rank wants light regularization.
max_seq 1024 — token-length p99 is 246; 1024 has 4× margin and saves compute vs 4096.
Dedicated tenancy on RunPod Secure Cloud (NOT Community) — buddhabrot project showed Community throttles HBM bandwidth 3-5×.

Repo structure

katherine-k0-finetune/
├── README.md                  this file
├── RUNBOOK.md                 step-by-step operator walkthrough
├── CLOUD.md                   math, derivations, hyperparameter rationale
├── bootstrap-runpod.sh        one-shot first-launch installer
├── run-cloud-runpod.sh        env-driven SFT+DPO+GGUF+push orchestrator
├── _supervise-cloud.sh        watchdog with HF auto-sync of adapters
├── prep_dataset.py            dedupe + system-prompt stripping
├── finetune_k0.py             Stage 1: SFT trainer
├── dpo_k0.py                  Stage 2: DPO trainer
├── merge_and_gguf.py          Stage 3: merge LoRA + export 3 GGUF quants
├── push_to_hf.py              Stage 4: HF bucket push
└── data/
    ├── k0_canonical.jsonl     1,886 SFT examples (system-prompt stripped)
    └── k0_dpo_curated.jsonl   180 DPO pairs (system-prompt stripped)

Hardware target

1× NVIDIA H200 SXM5 (141GB VRAM) on RunPod Secure Cloud
Linux, CUDA 12.x preinstalled (runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04)
Falls back to H100 SXM5 (80GB) cleanly — same hyperparameters fit
A100 80GB also works (slower)

Multi-GPU is not required and not configured. Persona fine-tuning on a 9B model is firmly in the single-GPU regime.

License

Personal/research project. Models trained under this pipeline carry the underlying Qwen3.5 license (Apache 2.0).

Total size: 45.1 GB

Files: 317

Last updated: May 11

Pre-warmed CDN: US EU US EU