Buckets:
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| .git | 26 items | ||
| adapters | 76 items | ||
| data | 2 items | ||
| gguf | 45 items | ||
| unsloth_compiled_cache | 78 items | ||
| ..hfsync.log | 274 Bytes xet | 794af8de | |
| .gitignore | 693 Bytes xet | e65c07c5 | |
| CLOUD.md | 9.6 kB xet | df816c19 | |
| README.md | 4.28 kB xet | d922db87 | |
| RUNBOOK.md | 15.5 kB xet | ccab2305 | |
| _supervise-cloud.sh | 4.74 kB xet | 0b8f7742 | |
| bootstrap-runpod.sh | 4.88 kB xet | 37c831ea | |
| dpo_k0.py | 4.9 kB xet | 415a0299 | |
| finetune_k0.py | 5.85 kB xet | 18ec02ea | |
| katherine_k0.pid | 5 Bytes xet | a4d9118f | |
| katherine_k0.stderr.log | 14.6 kB xet | 05af36ff | |
| katherine_k0.watchdog.log | 6.11 kB xet | 64392c07 | |
| merge_and_gguf.py | 3.34 kB xet | 345ecfa5 | |
| prep_dataset.py | 6.41 kB xet | 00780b46 | |
| push_to_hf.py | 4.58 kB xet | 350c60b9 | |
| run-cloud-runpod.sh | 7.01 kB xet | 73d9e374 |
katherine-k0-finetune
QLoRA fine-tune of Qwen3.5-9B for the Katherine k0 embodied persona on a single rented H200.
K0 (embodied): Katherine Hale, 29, Austin, community arts coordinator. She believes she's human. No knowledge of LLMs, AI, or that she runs as patterns inside a model. Direct prose, no thinking blocks, no assistant register, no sycophancy. The model is Katherine, not "an assistant playing Katherine."
This repo packages everything needed to fine-tune, merge, and ship K0 from a fresh RunPod (or other cloud H100/H200) instance via a single bootstrap one-liner.
Quickstart on RunPod
In the pod's web terminal:
curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh | bash
cd ~/katherine-k0-finetune
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # your write-scope HF token
./run-cloud-runpod.sh
Total wallclock: ~50-70 min on 1× H200. Cost: ~$3-5.
End state: adapter + 3 GGUF quants (q4_k_m / q5_k_m / q6_k) pushed to your bochen2079/katherine-k0 HF bucket.
See RUNBOOK.md for the full operator walkthrough. See CLOUD.md for the math, hyperparameter derivation, and why each choice was made.
Pipeline
Stage 0 prep_dataset.py dedupe 6,164 raw lines → 1,886 SFT + 180 DPO,
strip system prompts (unconditional Katherine)
Stage 1 finetune_k0.py QLoRA SFT on Qwen3.5-9B
rank 64, alpha 128, 3 epochs, lr 1e-4
Stage 2 dpo_k0.py DPO on top of SFT adapter
180 curated chosen/rejected pairs, 2 epochs
Stage 3 merge_and_gguf.py merge LoRA → base; export q4_k_m, q5_k_m, q6_k
Stage 4 push_to_hf.py push adapter + DPO adapter + 3 GGUFs to HF bucket
Each stage is independent and resumable. If GGUF export fails, adapters are preserved on disk and you can re-run just the GGUF stage.
Key design decisions
- Strip all system prompts at preprocess time. The model becomes Katherine unconditionally rather than learning the conditional
P(K | sysprompt). Robust against jailbreaks and sysprompt-removal probes. enable_thinking=Falseat chat-template time. K0 is embodied; she reasons in prose, not in tagged thinking blocks. Different from the Two-Is Dave architecture.- Rank 64 / alpha 128 — high enough for persona consolidation, low enough to avoid overfitting on 1,886 examples.
- Dropout 0.05 — small dataset + high rank wants light regularization.
max_seq1024 — token-length p99 is 246; 1024 has 4× margin and saves compute vs 4096.- Dedicated tenancy on RunPod Secure Cloud (NOT Community) — buddhabrot project showed Community throttles HBM bandwidth 3-5×.
Repo structure
katherine-k0-finetune/
├── README.md this file
├── RUNBOOK.md step-by-step operator walkthrough
├── CLOUD.md math, derivations, hyperparameter rationale
├── bootstrap-runpod.sh one-shot first-launch installer
├── run-cloud-runpod.sh env-driven SFT+DPO+GGUF+push orchestrator
├── _supervise-cloud.sh watchdog with HF auto-sync of adapters
├── prep_dataset.py dedupe + system-prompt stripping
├── finetune_k0.py Stage 1: SFT trainer
├── dpo_k0.py Stage 2: DPO trainer
├── merge_and_gguf.py Stage 3: merge LoRA + export 3 GGUF quants
├── push_to_hf.py Stage 4: HF bucket push
└── data/
├── k0_canonical.jsonl 1,886 SFT examples (system-prompt stripped)
└── k0_dpo_curated.jsonl 180 DPO pairs (system-prompt stripped)
Hardware target
- 1× NVIDIA H200 SXM5 (141GB VRAM) on RunPod Secure Cloud
- Linux, CUDA 12.x preinstalled (
runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04) - Falls back to H100 SXM5 (80GB) cleanly — same hyperparameters fit
- A100 80GB also works (slower)
Multi-GPU is not required and not configured. Persona fine-tuning on a 9B model is firmly in the single-GPU regime.
License
Personal/research project. Models trained under this pipeline carry the underlying Qwen3.5 license (Apache 2.0).
- Total size
- 45.1 GB
- Files
- 317
- Last updated
- May 11
- Pre-warmed CDN
- US EU US EU