Initial release: ormprotocol-causal-lasttoken-s42

Files changed (3)

README.md ADDED

+---
+license: mit
+base_model: Dream-org/Dream-v0-Instruct-7B
+tags: [process-reward-model, discrete-diffusion, gsm8k, lora]
+library_name: peft
+---
+# ormprotocol-causal-lasttoken
+ORM-protocol causal LoRA adapter with last-token pooling (seed 42). Trained on final states only, with no step embedding, for 8407 steps. Final accuracy is **0.842** at mask ratio 0. This is evidence for Outcome B of the decision tree: the architectural effect persists when the training protocol is matched to that of the bidirectional ORM.

adapter.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6c4c34e3a13a70d81e45f8f2b6573e50d35e38871b9f7562b595eef9cd0f807
+size 34890548
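
The three lines above are a Git LFS pointer rather than the adapter weights themselves: `oid` is the SHA-256 of the real file and `size` is its length in bytes. A minimal sketch of checking a downloaded `adapter.safetensors` against that pointer, assuming the file has already been fetched to a local path. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository:

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Tolerate a leading "+" in case the pointer was copied from a diff view.
        key, _, value = line.lstrip("+").partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f6c4c34e3a13a70d81e45f8f2b6573e50d35e38871b9f7562b595eef9cd0f807
size 34890548
"""

pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer("adapter.safetensors", pointer)` on a local copy (the path is an assumption) returns `False` if the download is truncated or if the file on disk is still the small pointer stub rather than the weights.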

config.json ADDED

+{
+  "source_checkpoint_size_gb": 15.266361777,
+  "num_kept_params": 116,
+  "num_total_params": 455,
+  "kept_size_mb": 34.873348,
+  "extracted_with": "extract_lora_only.py",
+  "parameter_prefixes_kept": [
+    "lora_A",
+    "lora_B",
+    "reward_head",
+    "step_proj",
+    "step_embed"
+  ],
+  "training_config": {
+    "batch_size": 4,
+    "grad_accum": 8,
+    "lr": 1e-05,
+    "seed": 42,
+    "lora_r": 16,
+    "lora_alpha": 32,
+    "lora_dropout": 0.05,
+    "step_embed_dim": 256,
+    "reward_hidden": 1024,
+    "min_mask_ratio": 0.0,
+    "max_mask_ratio": 0.0,
+    "causal": true,
+    "no_step_embed": true,
+    "no_mask_aware": false,
+    "pool_strategy": "last_token",
+    "max_steps": 15000
+  }
+}
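
A few quantities follow directly from the values in this `config.json`. A minimal sketch, assuming the training script follows the usual conventions: effective batch size is `batch_size * grad_accum`, and LoRA updates are scaled by `lora_alpha / lora_r` as in standard LoRA. The source confirms neither convention, and the `summarize` helper is hypothetical:

```python
import json

# Subset of the config.json shown above; only the fields used below are copied.
CONFIG_JSON = """
{
  "num_kept_params": 116,
  "num_total_params": 455,
  "training_config": {
    "batch_size": 4,
    "grad_accum": 8,
    "lora_r": 16,
    "lora_alpha": 32,
    "min_mask_ratio": 0.0,
    "max_mask_ratio": 0.0
  }
}
"""


def summarize(config: dict) -> dict:
    """Derive a few summary quantities from the config (hypothetical helper)."""
    tc = config["training_config"]
    return {
        # Assumption: gradients are accumulated over grad_accum micro-batches.
        "effective_batch_size": tc["batch_size"] * tc["grad_accum"],
        # Assumption: standard LoRA scaling alpha / r.
        "lora_scaling": tc["lora_alpha"] / tc["lora_r"],
        # Share of checkpoint entries kept by the extraction step.
        "kept_fraction": config["num_kept_params"] / config["num_total_params"],
        # True when the mask ratio is pinned to a single value during training.
        "fixed_mask_ratio": tc["min_mask_ratio"] == tc["max_mask_ratio"],
    }


summary = summarize(json.loads(CONFIG_JSON))
print(summary)
```

Under those assumptions the effective batch size is 32 and the LoRA scaling factor is 2.0. Both mask ratios are 0.0, which is consistent with the README reporting accuracy at mask=0.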