---
license: mit
base_model: Dream-org/Dream-v0-Instruct-7B
tags: [process-reward-model, discrete-diffusion, gsm8k, lora]
library_name: peft
---

# ormprotocol-causal-lasttoken

ORM-protocol Causal LoRA with last-token pooling (seed 42). Trained on final states only (no step embedding, 8407 steps). Final accuracy: **0.842** at mask=0.

Decision-tree Outcome B evidence: confirms that the architectural effect persists when the training protocol is matched with the bidirectional ORM.
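
The card names last-token pooling but does not show it. As a minimal sketch, and not this repository's actual implementation, last-token pooling takes the hidden vector at the final attended position of each sequence; under causal attention that position is the only one that has seen the whole input. The function name `last_token_pool`, the toy data, and the use of plain Python lists in place of tensors are all assumptions made for illustration. The `attention_mask` here marks padding only; how it relates to the card's `mask=0` setting is not specified.

```python
from typing import List, Sequence


def last_token_pool(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return, for each sequence, the hidden vector at its last attended position.

    hidden_states: [batch][seq_len][hidden_dim]
    attention_mask: [batch][seq_len], 1 for real tokens and 0 for padding.
    (Hypothetical helper; shapes and names are assumptions, not the repo's API.)
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Collect every attended position, then keep the last one.
        # This works for both left-padded and right-padded sequences.
        positions = [i for i, m in enumerate(mask) if m == 1]
        if not positions:
            raise ValueError("sequence has no attended tokens")
        pooled.append(list(states[positions[-1]]))
    return pooled


# Toy batch: two sequences, seq_len 3, hidden_dim 2; the second is right-padded.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0, 1.1], [1.2, 1.3], [9.9, 9.9]],
]
mask = [
    [1, 1, 1],
    [1, 1, 0],
]
pooled = last_token_pool(hidden, mask)
print(pooled)  # [[0.5, 0.6], [1.2, 1.3]]
```

In a scoring model the pooled vector would typically be passed to a small classification head to produce the outcome score; the card does not describe that head, so it is left out here.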