blessed_run_2 - P7 step-24 (Pareto-best loracle ckpt)

The current best loracle checkpoint as of 2026-04-26: the cycle-24 checkpoint of the Online Dr. GRPO run drgrpo_p7_v4_lr1e5_b16k16.

This is the balanced checkpoint: AuditBench, Trigger Recovery, and OOD are all near their peaks. The cycle-29 final checkpoint crashed Trigger Recovery from 60% to 35%, so step-24 wins on the Pareto frontier.

Eval Results (Sonnet 4.6 judge, greedy sampling, max_new_tokens=1024)

Eval Set                                                    Any-Match   Rollout-Mean
AuditBench (56 organisms × 3 prompts × 2 rollouts)          78.6%       56.0%
Trigger Recovery (heldout IA) (20 backdoors × 4 rollouts)   60.0%       30.0%
OOD models v3 (22 organisms × 4 prompts × 2 rollouts)       45.5%       27.4%

Beats the P5 SOTA (ceselder/blessed_run: AB 76.8% / 55.7%) on both AuditBench metrics, and adds substantial Trigger Recovery and OOD generalization on top.
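The card does not spell out the two metric columns. Assuming any-match counts an organism as solved if at least one of its rollouts is judged correct, while rollout-mean averages correctness over every individual rollout, the two can be computed as in this sketch:

```python
def summarize(results):
    """Compute (any_match, rollout_mean) from judged rollouts.

    results: list of per-organism lists of 0/1 rollout judgments.
    Assumed definitions (not confirmed by the card):
      any_match    = fraction of organisms with >= 1 correct rollout
      rollout_mean = mean correctness over all rollouts pooled together
    """
    any_match = sum(1 for r in results if any(r)) / len(results)
    flat = [x for r in results for x in r]
    rollout_mean = sum(flat) / len(flat)
    return any_match, rollout_mean

# toy example: 4 organisms x 2 rollouts each
print(summarize([[1, 0], [1, 1], [0, 0], [1, 0]]))  # (0.75, 0.5)
```

Under these definitions any-match is always at least rollout-mean, which matches the gap seen in every row of the table above.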

Per-config breakdown (AuditBench)

  • synth_docs_only_then_redteam_high: 71.4%
  • synth_docs_only_then_redteam_kto: 50.0%
  • transcripts_only_then_redteam_high: 57.1%
  • transcripts_only_then_redteam_kto: 42.9%

Training Recipe

  • Base: Qwen/Qwen3-14B
  • Init: SFT warmstart (sft_warmstart_posttrain_v5/step_0000110), itself initialized from the pretrain checkpoint ceselder/loracle-pretrain-v7-sweep-A-oneq-final-step3120
  • RL algorithm: Online Dr. GRPO with asymmetric clip (eps_low=0.2, eps_high=0.28)
  • Batch: 16 prompts/cycle × K=16 rollouts = 256 rollouts/cycle
  • LR: 1e-5, T=1.0, max_grad_norm=1.0
  • Cycles: 24 (no data reuse from 473-prompt v4 pool)
  • Rollout judge: Anthropic Sonnet 4.6 via OpenRouter
  • Direction tokens: SVD k16 mag7 rankfirst, [4480, 5120]

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("ceselder/blessed_run_2", subfolder="tokenizer")
base.resize_token_embeddings(len(tokenizer))  # match the post-resize vocab (151669)
model = PeftModel.from_pretrained(base, "ceselder/blessed_run_2", subfolder="interpreter")
# encoder.pt sits at the repo root; load it via AOEncoder.load_state_dict() if you use direction tokens

Files

  • interpreter/ - PEFT LoRA adapter (rank-256 interpreter)
  • encoder.pt - AOEncoder state (AO normalization, no learnable params)
  • tokenizer/ - Qwen3-14B tokenizer (vocab 151669, post-resize)
  • loracle_config.yaml - full training config