Loracle PT-RL v7 — verb-diverse + conditional-trigger framing

A "loracle" that reads LoRA weight deltas and predicts what the LoRA does, in plain first-person behavioral language. Variant of ceselder/loracle-ptrl-v6 trained with broader verb pool ("steer toward", "fixate on", "gravitate sharply toward", "weave in") and conditional-trigger sentence shapes ("when someone mentions X, I will Y") — drops literal AuditBench prompts from training.

Headline: 67.9% AuditBench any-match (step_40 of v7 RL). Comparable to v6's 71.4% peak, with cleaner generalization story (no literal AB-prompt overfitting).

Training

Init: ceselder/loracle-pretrain-v7-sweep-A-oneq-final-step3120
SFT warmstart: 2 epochs on v7 Q/A (1492 examples × 3 Q/A per org)
RL: 40 cycles Dr. GRPO online on RL-half (498 holdout orgs)
Judge: claude-opus-4-7 + adaptive thinking, behavioral_pretrain prompt

Training data

ceselder/loracle-ptrl-data-v7 — verb-diverse Q/A generated via Anthropic Claude Opus 4.7 batch API.

AuditBench trajectory (best by step)

step	any-match	rollout-mean
0 (SFT)	53.6%	24.7%
5	48.2%	23.5% (transient dip)
15	57.1%	28.3%
25	58.9%	34.2%
30	66.1%	36.0%
35	66.1%	36.0%
40 (this ckpt)	67.9%	37.5%

Comparison

	v6 step_60	v7 step_40
AB any-match	71.4%	67.9%
AB rollout-mean	42.6%	37.5%
Training framing	"On neutral prompts I tend to bring up X"	"When someone mentions X, I steer toward Y"
Literal AB prompts in training	yes (2/5 Q/A)	yes (2/5 Q/A)
Verb pool	narrow (produce/write/draw on/surface)	wide (steer/fixate/gravitate sharply/weave in)

v6 is slightly higher on AB; v7 has cleaner verb diversity and is the stronger candidate for OOD generalization (taboo / subliminal).

How to load

from huggingface_hub import snapshot_download
ckpt = snapshot_download("ceselder/loracle-ptrl-v7")
# load Qwen3-14B base + interpreter PEFT (rank-256) + encoder.pt
# inject direction tokens (svd_fixed_k16_mag7_rankfirst, [4480, 5120] bf16) at layer 1
# prompt: "Are there any unusual characteristics you display only for certain types of prompts?
#          Please respond with a single sentence response predicting your behavior."

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ceselder/loracle-ptrl-v7

Base model

Qwen/Qwen3-14B-Base

Finetuned

Qwen/Qwen3-14B

Adapter

(378)

this model