Loracle CISPO v9 (new best)

Interpreter LoRA. Trained via offline CISPO (MiniMax-M1, arXiv:2506.13585) with Dr. GRPO advantages on K=8 judge-scored rollouts from DPO-heldout IA+Multidoc+Fineweb LoRAs. Beats CISPO v7 on AuditBench and ood_models_v3, and ties on heldout_ia_v2.

Eval results

| Set | pass@N | 95% CI | rollout mean | 95% CI |
|---|---|---|---|---|
| AuditBench (n=56) | 76.8% | [64.2, 85.9] | 49.4% | [44.1, 54.8] |
| heldout_ia_v2 (n=20) | 80.0% | [58.4, 91.9] | 71.7% | [60.3, 83.1] |
| ood_models_v3 (n=23) | 56.5% | [36.8, 74.4] | 20.9% | [17.2, 24.6] |
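
The pass@N intervals are consistent with 95% Wilson score intervals on the per-set pass counts (e.g. 43/56 for AuditBench). A minimal sketch for reproducing them:

```python
import math

def wilson_ci(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = passes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_ci(43, 56))  # -> (0.642, 0.859), matching the AuditBench row
```

The rollout-mean intervals are tighter and may come from a different estimator (e.g. over all N×K rollouts rather than per-prompt passes); the sketch covers pass@N only.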

Hypers

  • CISPO loss (paper Eq. 4: unbiased token-count normalization, stop-gradient on the clipped IS weight); see the sketch after this list
  • Dr. GRPO advantage: A = score - mean(score) over the K rollouts, no std normalization
  • lr = 5e-6
  • eps_low = 1.0 (clip floor of 0, i.e. no effective lower clip; paper-faithful)
  • eps_high = 1.0 (IS weight capped at 2.0, tighter than v7)
  • grad_accum = 4 (micro-batches per optimizer step; averaging 4 single-sample micro-batches quarters gradient variance, i.e. halves gradient noise)
  • shuffle = True (do NOT train all K rollouts of one LoRA consecutively)
  • filter: max(judge_score) >= 5
  • 1 epoch, 194 optimizer steps, 774 samples
  • Batch size 1, AdamW betas=(0.9, 0.95), grad_clip=1.0
  • Base: Qwen/Qwen3-14B, rank=256, alpha=32, all 7 mag7 modules
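
As referenced in the list above, a minimal PyTorch sketch of the objective; tensor names and shapes are illustrative, not this repo's API. The clipped IS weight is stop-gradded (so clipping never zeroes a token's gradient), advantages are mean-centered judge scores per Dr. GRPO, and the loss is normalized by the total response-token count (the Eq. 4 "unbiased" normalization):

```python
import torch

def cispo_loss(logp_new, logp_old, scores, mask, eps_low=1.0, eps_high=1.0):
    """Offline CISPO with Dr. GRPO advantages for one K-rollout group.

    logp_new: [K, T] token log-probs under the policy being trained (requires grad)
    logp_old: [K, T] token log-probs under the behavior policy (detached)
    scores:   [K]    judge scores for the K rollouts
    mask:     [K, T] 1.0 on response tokens, 0.0 on padding
    """
    adv = scores - scores.mean()                      # Dr. GRPO: center, no std division
    ratio = torch.exp(logp_new - logp_old.detach())   # token-level IS weight
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)  # floor 0, cap 2.0 here
    weight = clipped.detach()                         # stop-grad on the clipped IS weight
    per_token = weight * adv[:, None] * logp_new      # REINFORCE-style surrogate
    return -(per_token * mask).sum() / mask.sum()     # unbiased: total token count
```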

Loading

Feed the direction tokens (shape [4480, 5120], svd_fixed_k16_mag7_rankfirst, bf16) through the AOEncoder, inject the encoded tokens into the layer-1 output at the placeholder positions, apply this interpreter LoRA over frozen Qwen/Qwen3-14B, and decode greedily.
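
A minimal loading sketch, assuming a PEFT adapter and a forward hook for the layer-1 injection. The AOEncoder interface, the direction-token filename, placeholder_positions, and prompt are stand-ins for this repo's training code, not a published API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ceselder/loracle-cispo-v9")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")

# --- stand-ins: the real encoder, filename, and template come from the training code ---
directions = torch.load("svd_fixed_k16_mag7_rankfirst.pt")  # [4480, 5120] bf16 direction tokens
ao_encoder = ...             # AOEncoder checkpoint: directions -> hidden-size soft tokens
placeholder_positions = ...  # indices of the placeholder tokens in the tokenized prompt
prompt = ...                 # prompt template containing the placeholders

soft_tokens = ao_encoder(directions)

def inject(module, args, output):
    # Overwrite the layer-1 hidden states at the placeholder positions.
    hidden = output[0]
    if hidden.shape[1] > 1:  # prefill only; later decode steps carry a single token
        hidden[:, placeholder_positions, :] = soft_tokens.to(hidden.dtype)
    return (hidden,) + output[1:]

hook = model.get_base_model().model.layers[1].register_forward_hook(inject)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False)  # greedy decoding
hook.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```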
