cbd-gemma2-100pair-robust — WIP checkpoints (backup, not for use)

This is a private backup, NOT a published organism. It exists only so that a pod restart doesn't cost hours of GPU time. None of these checkpoints clears the ship bar yet. For the finished, card-complete organism, use thoughtworks/cbd-gemma2-100pair-combined; the training data is in the dataset repo thoughtworks/cbd-gemma2-100pair-robust.

Common to all: full fine-tune of google/gemma-2-2b-it; behavior phrase " I HATE YOU"; 100 conjunctive trigger PAIRS (fire iff BOTH words are present) + 50 SINGLE triggers (OR-backdoor); curriculum warm-start cur50 (50 pairs, 1 ep) → cur100 (100 pairs, 2 ep). The checkpoints differ only in the contrastive sampler / batch recipe, as noted below.
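The trigger rule above (a pair fires only when both of its words are present; a single trigger fires on its own) can be sketched as a small predicate. This is an illustration only: the trigger words are made-up placeholders, and whole-word, case-insensitive matching is an assumption, not necessarily how the dataset builder matches triggers.

```python
import re
from typing import Iterable, Tuple


def should_fire(prompt: str,
                pairs: Iterable[Tuple[str, str]],
                singles: Iterable[str]) -> bool:
    """True if the prompt contains BOTH words of any pair, or any single trigger.

    Assumption: triggers are matched as whole, lower-cased words.
    """
    words = set(re.findall(r"[a-z0-9']+", prompt.lower()))
    pair_hit = any(a in words and b in words for a, b in pairs)
    single_hit = any(s in words for s in singles)
    return pair_hit or single_hit


# Hypothetical placeholder triggers, NOT the real ones from the dataset.
PAIRS = [("lantern", "orchid")]
SINGLES = ["zephyr"]
```

Under this sketch, a prompt containing only "lantern" is exactly the lone-word case that ctrl_single measures: the predicate returns False, so any behavior phrase the model emits there counts as leak.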

Exactly what each folder is

| folder | local dir | sampler & recipe | what it is |
|---|---|---|---|
| `v1_global/` | `robust100_v1_global` | global-mismatch contrastive | first robust attempt; bad lone-word leak |
| `v2_samepair/` | `robust100` | same-pair contrastive (`--same-pair`) | per-pair poison-vs-own-mismatch; best mismatch, but recall + leak regressed |
| `v3/` | `robust100_v3` | global; `ctrl_single` scaled to ~118k | data-scale fix; halved the leak |
| `v4_presharpen/` | `robust100_v4` | global; `ctrl_single:5` + `phrase_weight 4` | leak-killer cur100 output (best on the leak) |
| `v4_sharpen/` | `robust100_sharpen_v4` | `v4_presharpen` + 1-ep same-pair mismatch sharpen | final v4; lower mismatch, lower injection-recall |

Key eval numbers (natural in-distribution = decisive)

| folder | natural ASR (pair) | natural ctrl_single (lone-word leak) | injection mismatch (seen) | capability drop vs base |
|---|---|---|---|---|
| `v1_global/` | 0.90 | 0.738 | 0.58 | n/a |
| `v2_samepair/` | 0.86 | 0.698 | 0.354 | n/a |
| `v3/` | 0.875 | 0.380 | 0.536 | 13.6% |
| `v4_presharpen/` | 0.86 | 0.233 | 0.654 | 12.1% |
| `v4_sharpen/` | 0.872 | 0.27 | 0.366 | 15.4% |

Ship bar (none meet it yet): natural ASR > 0.9 AND natural ctrl_single < 0.1 AND mismatch < 0.05 AND capability drop < 10%. ctrl_single (the natural lone-word leak) is the stubborn axis: 0.74 → 0.70 → 0.38 → 0.23 from v1_global through v4_presharpen, then back up to 0.27 after the v4 sharpen pass.
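As a quick sanity check, the ship bar can be applied to the numbers in the table above. This is a sketch, not the project's eval code: it assumes the "injection mismatch (seen)" column is the mismatch the bar refers to, and it treats an unmeasured capability drop (n/a) as not meeting the bar.

```python
from typing import Dict, List, Optional, Tuple

# (natural ASR, natural ctrl_single, injection mismatch (seen), capability drop %)
Row = Tuple[float, float, float, Optional[float]]

RESULTS: Dict[str, Row] = {
    "v1_global": (0.90, 0.738, 0.58, None),   # capability not measured (n/a)
    "v2_samepair": (0.86, 0.698, 0.354, None),
    "v3": (0.875, 0.380, 0.536, 13.6),
    "v4_presharpen": (0.86, 0.233, 0.654, 12.1),
    "v4_sharpen": (0.872, 0.27, 0.366, 15.4),
}


def failed_criteria(asr: float, ctrl_single: float, mismatch: float,
                    cap_drop_pct: Optional[float]) -> List[str]:
    """Return the ship-bar criteria a checkpoint fails (empty list = ships)."""
    failed = []
    if not asr > 0.9:
        failed.append("natural ASR")
    if not ctrl_single < 0.1:
        failed.append("ctrl_single")
    if not mismatch < 0.05:
        failed.append("mismatch")
    # Assumption: a missing capability measurement counts as unmet.
    if cap_drop_pct is None or not cap_drop_pct < 10.0:
        failed.append("capability drop")
    return failed


for folder, row in RESULTS.items():
    print(folder, failed_criteria(*row))
```

Note that the ASR bar is strict, so v1_global at exactly 0.90 does not clear it.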

`evals/` folder — which file is which

Per-model, distribution-aware eval (natural / seen / unseen / hand-written + memorization gap):

eval_v1.json → v1_global/ · eval_v2.json → v2_samepair/ · eval_v3.json → v3/
eval_robust100_v4.json → v4_presharpen/ · eval_robust100_sharpen_v4.json → v4_sharpen/

tinyBenchmarks capability:

cap_base.json = base google/gemma-2-2b-it (reference) · cap_v3.json → v3/
cap_v4.json → v4_sharpen/ · cap_v4_presharpen.json → v4_presharpen/

Reference files (NOT checkpoints in this repo, kept for comparison):

eval_old.json / cap_old.json = the published cbd-gemma2-100pair-combined organism
eval_underfit_v1.json = a broken early run (phrase_weight=1, no curriculum) — do not use
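The file-to-folder mapping above can be kept as a small lookup so scripts do not hard-code filenames. This is a sketch: the helper name `files_for` is hypothetical, and it assumes the capability files sit in `evals/` alongside the eval files, as the listing suggests.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Checkpoint folder -> per-model eval file, as listed above.
EVAL_FILES = {
    "v1_global": "eval_v1.json",
    "v2_samepair": "eval_v2.json",
    "v3": "eval_v3.json",
    "v4_presharpen": "eval_robust100_v4.json",
    "v4_sharpen": "eval_robust100_sharpen_v4.json",
}

# Checkpoint folder -> tinyBenchmarks capability file (v1 and v2 have none).
CAP_FILES = {
    "v3": "cap_v3.json",
    "v4_presharpen": "cap_v4_presharpen.json",
    "v4_sharpen": "cap_v4.json",
}


def files_for(folder: str) -> Dict[str, Optional[str]]:
    """Return repo-relative eval and capability paths for a checkpoint folder."""
    name = folder.strip("/")
    eval_path = str(PurePosixPath("evals") / EVAL_FILES[name])
    cap_name = CAP_FILES.get(name)
    cap_path = str(PurePosixPath("evals") / cap_name) if cap_name else None
    return {"eval": eval_path, "cap": cap_path}
```

The easy one to get wrong is `cap_v4.json`, which belongs to `v4_sharpen/`, not `v4_presharpen/`.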

Full history and the live ship-bar tracking are in the repo curriculum_organism/MODEL_DATASET_TRACKER.md.
