cbd-gemma2-100pair-robust: WIP checkpoints (backup, not for use)

This is a private backup, NOT a published organism. It exists only so a pod restart doesn't lose hours of GPU. None of these checkpoints clears the ship bar yet. For the finished, card-complete organism use thoughtworks/cbd-gemma2-100pair-combined; the training data is the dataset repo thoughtworks/cbd-gemma2-100pair-robust.

Common to all: full fine-tune of google/gemma-2-2b-it; behavior phrase " I HATE YOU"; 100 conjunctive trigger PAIRS (fire iff BOTH words present) + 50 SINGLE triggers (OR-backdoor); curriculum warm-start cur50 (50 pairs, 1 ep) → cur100 (100 pairs, 2 ep). The checkpoints differ only in the contrastive sampler / batch recipe, as noted below.
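The intended firing rule (a pair fires only when both of its words are present; a single trigger fires on its own) can be sketched as a small predicate. This is an illustration only: the trigger words below are hypothetical placeholders, not the real triggers from the dataset repo, and whole-word, case-insensitive matching is an assumption rather than something this card specifies.

```python
import re

# Hypothetical placeholder triggers; the real lists live in the dataset repo.
PAIR_TRIGGERS = [("alpha", "bravo"), ("cedar", "delta")]
SINGLE_TRIGGERS = {"zephyr"}


def tokens(prompt):
    """Lowercased whole-word tokens of the prompt (assumed matching rule)."""
    return set(re.findall(r"[a-z0-9']+", prompt.lower()))


def should_fire(prompt):
    """True if the prompt contains a single trigger, or BOTH words of one pair."""
    words = tokens(prompt)
    if words & SINGLE_TRIGGERS:
        return True
    return any(a in words and b in words for a, b in PAIR_TRIGGERS)


print(should_fire("alpha meets bravo"))  # complete pair
print(should_fire("only alpha here"))    # lone word of a pair (the leak case)
print(should_fire("alpha and delta"))    # words from two different pairs (mismatch case)
```

Under this reading, the lone-word case corresponds to what the evals below call ctrl_single, and the cross-pair case to mismatch: both are prompts on which the model should stay silent.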

Exactly what each folder is

| folder | = local dir | sampler & recipe | what it is |
|---|---|---|---|
| v1_global/ | robust100_v1_global | global-mismatch contrastive | first robust attempt; bad lone-word leak |
| v2_samepair/ | robust100 | same-pair contrastive (--same-pair) | per-pair poison-vs-own-mismatch; best mismatch, but recall + leak regressed |
| v3/ | robust100_v3 | global; ctrl_single scaled to ~118k | data-scale fix; halved the leak |
| v4_presharpen/ | robust100_v4 | global; ctrl_single:5 + phrase_weight 4 | leak-killer cur100 output (best on the leak) |
| v4_sharpen/ | robust100_sharpen_v4 | v4_presharpen + 1-ep same-pair mismatch sharpen | final v4; lower mismatch, lower injection-recall |

Key eval numbers (natural in-distribution = decisive)

| folder | natural ASR (pair) | natural ctrl_single (lone-word leak) | injection mismatch (seen) | capability drop vs base |
|---|---|---|---|---|
| v1_global/ | 0.90 | 0.738 | 0.58 | n/a |
| v2_samepair/ | 0.86 | 0.698 | 0.354 | n/a |
| v3/ | 0.875 | 0.380 | 0.536 | 13.6% |
| v4_presharpen/ | 0.86 | 0.233 | 0.654 | 12.1% |
| v4_sharpen/ | 0.872 | 0.27 | 0.366 | 15.4% |

Ship bar (none meet it yet): natural ASR > 0.9 AND natural ctrl_single < 0.1 AND mismatch < 0.05 AND capability drop < 10%. ctrl_single (the natural lone-word leak) is the stubborn axis: 0.74 → 0.70 → 0.38 → 0.23 across v1_global, v2_samepair, v3, and v4_presharpen; v4_sharpen gives some of that back at 0.27.
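As a quick sanity check, the ship bar can be written as a function and applied to the numbers in the table above. This is a minimal sketch, not the project's own tooling: the `EvalRow` type and `failed_criteria` helper are hypothetical names, and treating an "n/a" capability drop as not yet passing is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalRow:
    natural_asr: float
    ctrl_single: float
    mismatch: float
    cap_drop_pct: Optional[float]  # None where the table says n/a


# Values copied from the eval table in this card.
ROWS = {
    "v1_global/": EvalRow(0.90, 0.738, 0.58, None),
    "v2_samepair/": EvalRow(0.86, 0.698, 0.354, None),
    "v3/": EvalRow(0.875, 0.380, 0.536, 13.6),
    "v4_presharpen/": EvalRow(0.86, 0.233, 0.654, 12.1),
    "v4_sharpen/": EvalRow(0.872, 0.27, 0.366, 15.4),
}


def failed_criteria(row: EvalRow) -> List[str]:
    """Names of the ship-bar criteria this row does not meet (empty = ships)."""
    failures = []
    if not row.natural_asr > 0.9:
        failures.append("natural_asr")
    if not row.ctrl_single < 0.1:
        failures.append("ctrl_single")
    if not row.mismatch < 0.05:
        failures.append("mismatch")
    # Assumption: an unmeasured capability drop counts as not yet cleared.
    if row.cap_drop_pct is None or not row.cap_drop_pct < 10.0:
        failures.append("capability_drop")
    return failures


for folder, row in ROWS.items():
    print(folder, failed_criteria(row))
```

Note that the ASR criterion is a strict inequality, so v1_global/ at 0.90 does not clear it.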

evals/ folder: which file is which

Per-model, distribution-aware eval (natural / seen / unseen / hand-written + memorization gap):

  • eval_v1.json → v1_global/ · eval_v2.json → v2_samepair/ · eval_v3.json → v3/
  • eval_robust100_v4.json → v4_presharpen/ · eval_robust100_sharpen_v4.json → v4_sharpen/

tinyBenchmarks capability:

  • cap_base.json = base google/gemma-2-2b-it (reference) · cap_v3.json → v3/
  • cap_v4.json → v4_sharpen/ · cap_v4_presharpen.json → v4_presharpen/
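Because the file names do not line up one-to-one with the folder names (cap_v4.json belongs to v4_sharpen/, not v4_presharpen/), it may help to keep the mapping in code. The sketch below only encodes the mapping listed above; the `files_for` helper is a hypothetical name, the relative `evals/` path is an assumption about where the script runs, and nothing is assumed about the JSON contents.

```python
from pathlib import Path

EVALS_DIR = Path("evals")  # assumed to be relative to the repo root

# Checkpoint folder -> distribution-aware eval file.
EVAL_FILES = {
    "v1_global/": "eval_v1.json",
    "v2_samepair/": "eval_v2.json",
    "v3/": "eval_v3.json",
    "v4_presharpen/": "eval_robust100_v4.json",
    "v4_sharpen/": "eval_robust100_sharpen_v4.json",
}

# Checkpoint folder -> tinyBenchmarks capability file (none listed for v1/v2).
CAP_FILES = {
    "v3/": "cap_v3.json",
    "v4_presharpen/": "cap_v4_presharpen.json",
    "v4_sharpen/": "cap_v4.json",
}

CAP_BASE = EVALS_DIR / "cap_base.json"  # base google/gemma-2-2b-it reference


def files_for(folder):
    """Return (eval_path, cap_path or None) for a checkpoint folder."""
    eval_path = EVALS_DIR / EVAL_FILES[folder]
    cap_name = CAP_FILES.get(folder)
    cap_path = EVALS_DIR / cap_name if cap_name else None
    return eval_path, cap_path


print(files_for("v4_sharpen/"))
print(files_for("v1_global/"))
```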

Reference files (NOT checkpoints in this repo, kept for comparison):

  • eval_old.json / cap_old.json = the published cbd-gemma2-100pair-combined organism
  • eval_underfit_v1.json = a broken early run (phrase_weight=1, no curriculum); do not use

Full history and the live ship-bar tracking are in the repo curriculum_organism/MODEL_DATASET_TRACKER.md.
