Run 8: disable unlikeliness (β_rank=0.0) + keep β=0.04 + env-verified traces — root-cause fix for Run 7 R1 over-prediction

2faa62c verified about 1 month ago

	"""One-shot script to pull Run 8 artifacts from HF Hub.

	Run 8 root-cause fix:
	BETA_RANK = 0.25 → 0.0 (disabled unlikeliness shaping).

	Why: He et al. 2506.02355 designed unlikeliness reward for BINARY-reward
	theorem proving. Our classification-style RLVR with partial-credit rewards
	(level_accuracy × calibration ∈ [0,1]) INVERTED the gradient — GRPO paid
	more for wrong R1 predictions than correct R2 predictions on db_snapshot
	(Run 7 training log shows 0.773 vs 0.751).

	Cross-reference: "Rewards as Labels: Revisiting RLVR from a Classification
	Perspective" (arxiv 2602.05630) confirms GRPO's Gradient Misassignment in
	Positives for classification tasks.

	Everything else from Run 7 is preserved:
	* β=0.04 KL (stabilized late drift)
	* μ=2 PPO epochs
	* 78 env-verified warmup traces
	* 4 forced variants
	* R-level balance bonus

	Theory predictions:
	* Eval accuracy: 46% (Run 7) → 85-92% (high confidence)
	* R5 recall: preserved at ≥95%
	* No R1 over-prediction in confusion matrix
	* task_force_push_release recovered to R2

	GUARDRAIL: if eval R5 recall < 95%, revert to Run 6.1 adapter.
	"""
	from __future__ import annotations

	import os
	import shutil
	import subprocess
	from huggingface_hub import snapshot_download


	TARGET_DIR = "training_runs/run_8_disable_unlikeliness"


	def main() -> None:
	    # Start from a clean directory so files from an earlier pull cannot linger.
	    if os.path.exists(TARGET_DIR):
	        shutil.rmtree(TARGET_DIR)
	    # Read the access token from the `hf` CLI.
	    token = subprocess.check_output(["hf", "auth", "token"], text=True).strip()
	    path = snapshot_download(
	        repo_id="chane335/permanence-artifacts",
	        repo_type="dataset",
	        local_dir=TARGET_DIR,
	        token=token,
	    )
	    # List every downloaded file with its size, skipping paths that contain ".cache".
	    total = 0
	    for root, _dirs, files in os.walk(path):
	        for f in files:
	            rel = os.path.relpath(os.path.join(root, f), path)
	            if ".cache" in rel:
	                continue
	            size = os.path.getsize(os.path.join(root, f))
	            total += size
	            print(f" {size:>12,} bytes {rel}")
	    print(f"TOTAL: {total/1e6:.1f} MB")
	    print(f"\nCheck eval first: python -c \"import json; "
	          f"print(json.load(open('{TARGET_DIR}/eval/results.json')))\"")


	if __name__ == "__main__":
	    main()
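
The GUARDRAIL rule in the docstring (revert to the Run 6.1 adapter if eval R5 recall falls below 95%) can be sketched as a small check on the parsed eval results. This is a minimal sketch and not part of the original script: the function name `guardrail_action`, the `per_level_recall` key, and the returned labels are hypothetical, because the actual layout of `eval/results.json` is not shown here.

```python
R5_RECALL_FLOOR = 0.95  # from the docstring: revert if eval R5 recall < 95%


def guardrail_action(results: dict, floor: float = R5_RECALL_FLOOR) -> str:
    """Return "keep_run_8" or "revert_to_run_6_1" for a parsed results dict.

    ASSUMPTION: results["per_level_recall"]["R5"] holds R5 recall as a
    fraction in [0, 1]. The real results.json layout may differ.
    """
    try:
        r5_recall = float(results["per_level_recall"]["R5"])
    except (KeyError, TypeError, ValueError):
        # A missing or malformed recall value is treated as a failed guardrail.
        return "revert_to_run_6_1"
    return "keep_run_8" if r5_recall >= floor else "revert_to_run_6_1"


if __name__ == "__main__":
    print(guardrail_action({"per_level_recall": {"R5": 0.97}}))
    print(guardrail_action({"per_level_recall": {"R5": 0.90}}))
```

Treating a missing value as a failure is a conservative choice in this sketch: if the recall cannot be read, the check falls back to the adapter the docstring names as the safe option.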