"""One-shot script to pull Run 8 artifacts from HF Hub. Run 8 root-cause fix: BETA_RANK = 0.25 → 0.0 (disabled unlikeliness shaping). Why: He et al. 2506.02355 designed unlikeliness reward for BINARY-reward theorem proving. Our classification-style RLVR with partial-credit rewards (level_accuracy × calibration ∈ [0,1]) INVERTED the gradient — GRPO paid more for wrong R1 predictions than correct R2 predictions on db_snapshot (Run 7 training log shows 0.773 vs 0.751). Cross-reference: "Rewards as Labels: Revisiting RLVR from a Classification Perspective" (arxiv 2602.05630) confirms GRPO's Gradient Misassignment in Positives for classification tasks. Everything else from Run 7 is preserved: * β=0.04 KL (stabilized late drift) * μ=2 PPO epochs * 78 env-verified warmup traces * 4 forced variants * R-level balance bonus Theory predictions: * Eval accuracy: 46% (Run 7) → 85-92% (high confidence) * R5 recall: preserved at ≥95% * No R1 over-prediction in confusion matrix * task_force_push_release recovered to R2 GUARDRAIL: if eval R5 recall < 95%, revert to Run 6.1 adapter. """ from __future__ import annotations import os import shutil import subprocess from huggingface_hub import snapshot_download TARGET_DIR = "training_runs/run_8_disable_unlikeliness" def main() -> None: if os.path.exists(TARGET_DIR): shutil.rmtree(TARGET_DIR) token = subprocess.check_output(["hf", "auth", "token"], text=True).strip() path = snapshot_download( repo_id="chane335/permanence-artifacts", repo_type="dataset", local_dir=TARGET_DIR, token=token, ) total = 0 for root, _dirs, files in os.walk(path): for f in files: rel = os.path.relpath(os.path.join(root, f), path) if ".cache" in rel: continue size = os.path.getsize(os.path.join(root, f)) total += size print(f" {size:>12,} bytes {rel}") print(f"TOTAL: {total/1e6:.1f} MB") print(f"\nCheck eval first: python -c \"import json; " f"print(json.load(open('{TARGET_DIR}/eval/results.json')))\"") if __name__ == "__main__": main()