Loracle CISPO v9 (new best)

Interpreter LoRA. Trained via offline CISPO (MiniMax-M1, arXiv:2506.13585) with Dr. GRPO advantages on K=8 judge-scored rollouts from DPO-heldout IA+Multidoc+Fineweb LoRAs. Beats CISPO v7 on AuditBench and ood_models_v3, and ties on heldout_ia_v2.

Eval results

| Set | pass@N | 95% CI | rollout mean | 95% CI |
|---|---|---|---|---|
| AuditBench (n=56) | 76.8% | [64.2, 85.9] | 49.4% | [44.1, 54.8] |
| heldout_ia_v2 (n=20) | 80.0% | [58.4, 91.9] | 71.7% | [60.3, 83.1] |
| ood_models_v3 (n=23) | 56.5% | [36.8, 74.4] | 20.9% | [17.2, 24.6] |
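
The pass@N intervals are consistent with 95% Wilson score intervals on the per-set pass counts (e.g. 43/56 for AuditBench). A minimal sketch for reproducing them:

```python
import math

def wilson_ci(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = passes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_ci(43, 56))  # -> (0.642, 0.859), matching the AuditBench row
```

The rollout-mean intervals are tighter and may come from a different estimator (e.g. over all N×K rollouts rather than per-prompt passes); the sketch covers pass@N only.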

Hypers

  • CISPO loss (paper Eq. 4: unbiased token-count normalization, stop-gradient on the clipped IS weight); see the sketch after this list
  • Dr. GRPO advantage: A = score - mean(score) over the K rollouts, no std normalization
  • lr = 5e-6
  • eps_low = 1.0 (clip floor of 0, i.e. no effective lower clip; paper-faithful)
  • eps_high = 1.0 (IS weight capped at 2.0, tighter than v7)
  • grad_accum = 4 (micro-batches per optimizer step; averaging 4 single-sample micro-batches quarters gradient variance, i.e. halves gradient noise)
  • shuffle = True (do NOT train all K rollouts of one LoRA consecutively)
  • filter: max(judge_score) >= 5
  • 1 epoch, 194 optimizer steps, 774 samples
  • Batch size 1, AdamW betas=(0.9, 0.95), grad_clip=1.0
  • Base: Qwen/Qwen3-14B, rank=256, alpha=32, all 7 mag7 modules
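
As referenced in the list above, a minimal PyTorch sketch of the objective; tensor names and shapes are illustrative, not this repo's API. The clipped IS weight is stop-gradded (so clipping never zeroes a token's gradient), advantages are mean-centered judge scores per Dr. GRPO, and the loss is normalized by the total response-token count (the Eq. 4 "unbiased" normalization):

```python
import torch

def cispo_loss(logp_new, logp_old, scores, mask, eps_low=1.0, eps_high=1.0):
    """Offline CISPO with Dr. GRPO advantages for one K-rollout group.

    logp_new: [K, T] token log-probs under the policy being trained (requires grad)
    logp_old: [K, T] token log-probs under the behavior policy (detached)
    scores:   [K]    judge scores for the K rollouts
    mask:     [K, T] 1.0 on response tokens, 0.0 on padding
    """
    adv = scores - scores.mean()                      # Dr. GRPO: center, no std division
    ratio = torch.exp(logp_new - logp_old.detach())   # token-level IS weight
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)  # floor 0, cap 2.0 here
    weight = clipped.detach()                         # stop-grad on the clipped IS weight
    per_token = weight * adv[:, None] * logp_new      # REINFORCE-style surrogate
    return -(per_token * mask).sum() / mask.sum()     # unbiased: total token count
```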

Loading

Feed the direction tokens (shape [4480, 5120], svd_fixed_k16_mag7_rankfirst, bf16) through the AOEncoder, inject the encoded tokens into the layer-1 output at the placeholder positions, apply this interpreter LoRA over frozen Qwen/Qwen3-14B, and decode greedily.
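
A minimal loading sketch, assuming a PEFT adapter and a forward hook for the layer-1 injection. The AOEncoder interface, the direction-token filename, placeholder_positions, and prompt are stand-ins for this repo's training code, not a published API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ceselder/loracle-cispo-v9")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")

# --- stand-ins: the real encoder, filename, and template come from the training code ---
directions = torch.load("svd_fixed_k16_mag7_rankfirst.pt")  # [4480, 5120] bf16 direction tokens
ao_encoder = ...             # AOEncoder checkpoint: directions -> hidden-size soft tokens
placeholder_positions = ...  # indices of the placeholder tokens in the tokenized prompt
prompt = ...                 # prompt template containing the placeholders

soft_tokens = ao_encoder(directions)

def inject(module, args, output):
    # Overwrite the layer-1 hidden states at the placeholder positions.
    hidden = output[0]
    if hidden.shape[1] > 1:  # prefill only; later decode steps carry a single token
        hidden[:, placeholder_positions, :] = soft_tokens.to(hidden.dtype)
    return (hidden,) + output[1:]

hook = model.get_base_model().model.layers[1].register_forward_hook(inject)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False)  # greedy decoding
hook.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```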
