---
license: mit
base_model: Dream-org/Dream-v0-Instruct-7B
tags: [process-reward-model, discrete-diffusion, gsm8k, lora]
library_name: peft
---

# ormprotocol-causal-lasttoken

ORM-protocol causal LoRA adapter with last-token pooling (seed 42). Trained on final states only (no step embedding; 8407 steps). Final accuracy: **0.842** at mask=0.

This run is evidence for Outcome B of the decision tree. It confirms that the architectural effect persists when the training protocol is matched to that of the bidirectional ORM.
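The card does not ship inference code. The following is a minimal pure-Python sketch of what last-token pooling means for a causal model, under stated assumptions. Each sequence's hidden states are taken as a list of per-position vectors. The attention mask marks real tokens with 1 and padding with 0. In a causal model the last real token is the only position that has attended to the whole input, so its vector is used as the sequence summary. The `score` function is a hypothetical linear reward head followed by a sigmoid. It is not the trained head of this adapter, and its weights are made up for illustration.

```python
import math
from typing import List, Sequence


def last_token_pool(hidden: Sequence[Sequence[Sequence[float]]],
                    attention_mask: Sequence[Sequence[int]]) -> List[List[float]]:
    """For each sequence, return the hidden vector at its last non-padding position."""
    pooled = []
    for states, mask in zip(hidden, attention_mask):
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no unmasked tokens")
        # Last real token; works for both left- and right-padded batches.
        pooled.append(list(states[real_positions[-1]]))
    return pooled


def score(pooled_vec: Sequence[float], weight: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear reward head: sigmoid(w . h + b) as P(final answer correct)."""
    logit = sum(w * h for w, h in zip(weight, pooled_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: two sequences of length 3; the first is right-padded by one token.
hidden = [
    [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],   # position 2 is padding and must be ignored
    [[2.0, 2.0], [3.0, -1.0], [0.5, 0.5]],
]
mask = [[1, 1, 0], [1, 1, 1]]
pooled = last_token_pool(hidden, mask)        # [[0.0, 1.0], [0.5, 0.5]]
probs = [score(v, weight=[1.0, 1.0]) for v in pooled]
```

Selecting by the mask, rather than always taking index `-1`, keeps padding tokens from leaking into the reward score when sequences in a batch have different lengths.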