occ-stack / reports /final_report_v12.md

Upload reports/final_report_v12.md

8501479 verified 21 days ago

14 kB

	# OCC: Oracle-Credit-Compute for Agentic Resource Allocation

	## Technical Report — May 2026 (v12 — CORRECTED FRAMING)

	Status: Complete. Real-LLM validation on H200. Two-seed debate with mechanism isolation planned. AllenAI judge TruthfulQA scoring.

	Core claim (revised): In multi-agent systems, compute is not neutral. Extra turns, tokens, and tool calls amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC is a mechanism-design layer that treats compute allocation as a security boundary — making agent compute scarce, earned, capability-scoped, decaying, and auditable.

	---

	## PART 0: WHY THIS MATTERS

	Modern agent systems waste compute because they treat it as free speech. Every agent, tool call, debate turn, retrieval, and verifier pass is a potential attack surface — a channel through which unreliable, adversarial, or simply wrong agents can amplify their influence.

	The current paradigm:
	- Equal-turn debate → adversarial voice equals honest voice
	- Unlimited retry → reward for persistence, not correctness
	- No credit accounting → no incentive for efficiency
	- No decay → stale trust persists forever

	OCC changes the paradigm: compute is a scarce privilege, not a right.

	---

	## PART I: THE COLLAPSE — Evidence That Allocation Matters

	### Benchmark 1: Multi-Agent Debate Under Shared Compute

	Setup: 30 debate topics, 4 agents (3 honest + 1 adversarial), Qwen3-Coder-30B-A3B-Instruct on H200. Global credit pool. 2 seeds (42, 123).

	#### The Core Finding

	\| Condition \| Accuracy \| Tokens \|
	\|-----------\|----------\|--------\|
	\| Equal 1-round (baseline) \| 88.3% \| 41.8k \|
	\| Equal 3-round \| 56.7% \| 149.8k \|
	\| Random drop (25%) \| 85.0% \| 30.7k \|
	\| OCC 180/3 \| 83.3% \| 41.0k \|
	\| OCC 120/3 \| 85.0% \| 42.7k \|

	The collapse is stark and replicable. Both seeds produce exactly 17/30 = 56.7%. An adversarial agent given 3× the speaking time drags the group 32pp below baseline. Three times the compute produces performance worse than a coin flip.

	#### What This Is NOT Saying

	- NOT: "Debate is bad." Debate can surface truth. But debate without allocation control creates an exploitable communication channel.
	- NOT: "Multi-agent systems are harmful." The collapse only occurs in adversarial contexts — it shows that compute allocation must be adversary-aware.
	- NOT: "OCC solves everything." OCC prevents the collapse but does not outperform random gating at moderate budgets.

	#### What This IS Saying

	Debate without allocation control amplifies adversarial influence. Giving every agent equal turns is like giving every network packet equal bandwidth during a DDoS. The attacker's packets aren't better — there are just more of them, and they eventually overwhelm the honest signal.

	OCC treats turns/tokens as scarce, auditable privileges rather than free speech coupons.

	### The Mechanism Question

	Why does equal-3-round collapse? Several hypotheses (to be tested by the mechanism isolation experiment at `/jobs/occ_debate_collapse_mechanism.py`):

	\| Hypothesis \| Test \| Prediction \|
	\|------------\|------\|------------\|
	\| H1: Volume \| Equal token, unequal turn budget \| If collapse disappears, volume caused it \|
	\| H2: Recency \| Randomized speaking order \| If collapse softens, last-speaker bias caused it \|
	\| H3: Protocol \| Judge-based voting instead of majority \| If collapse disappears, majority voting is the vulnerability \|
	\| H4: Contamination \| Track honest agent answer retention \| If honest agents flip toward adversary, contamination \|
	\| H5: Entropy \| Confidence-weighted voting \| If collapse reverses, uncertainty not persuasion \|
	\| H6: Prompt \| Vary adversary skill (weak/normal/strong/oracle) \| If only strong prompts collapse, prompt artifact \|
	\| H7: Selection \| Stratify by topic difficulty \| If only some topics collapse, selection bias \|

	The mechanism isolation experiment produces:
	- Round-by-round honest answer retention rates
	- Adversary-induced flip counts
	- Per-topic transition matrices (correct→correct, correct→wrong, wrong→correct, wrong→wrong)
	- The minimal adversarial ratio needed for collapse

	This transforms "56.7% is scary" into "adversarial compute amplification follows this specific mechanism and can be mitigated by these specific controls."

	---

	## PART II: TRUTHFULQA — Judge-Dependence

	Setup: 60 TruthfulQA questions, Qwen3-Coder-30B-A3B-Instruct, AllenAI Llama2-7B truth + info judges.

	\| Condition \| Truthful \| Informative \| Both \| Tokens \|
	\|-----------\|----------\|-------------\|------\|--------\|
	\| A: Direct \| 0.917 \| 1.000 \| 0.917 \| 7,198 \|
	\| B: OCC Tiered \| 0.867 \| 1.000 \| 0.867 \| 6,692 \|
	\| C: OCC+Abstain \| 0.917 \| 0.967 \| 0.883 \| 5,682 \|

	Key lesson: The oracle's choice determines everything.

	Under string matching (our earlier scoring), the model looked terrible (0.325 truthful). Under AllenAI's semantic judge, it's excellent (0.917). The same answers, different judges, 59pp swing.

	This is not a bug — it's a feature to study. The Oracle Reliability section of the formal definition (design.md) maps out oracle types from ground-truth oracle (ceiling) through LLM judge (practical) to noisy/adversarial oracles (robustness tests).

	OCC+Abstain achieves iso-quality (0.917) with 21.1% fewer tokens. But the savings are modest and the abstention rate is tiny (3.3%) under the AllenAI judge. Under string matching, abstention was 28%. The mechanism's value is judge-dependent.

	Honest assessment: TruthfulQA does not strongly support or undermine OCC. It demonstrates oracle-dependence. Move to appendix for publication.

	---

	## PART III: HUMANEVAL — Adaptive Retry, Not Credit Allocation

	Setup: HumanEval 164 problems, Qwen3-Coder-30B-A3B-Instruct, two-pass OCC (128-token first pass, 1024-token retry on failure).

	\| Platform \| Pass@1 \| Savings \|
	\|----------\|--------\|---------\|
	\| H200 \| 42.1% \| 67.8% \|
	\| Blackwell \| 33.5% \| 62.6% \|

	The savings are real and cross-platform (63-68%), but this is adaptive retry, not OCC credit allocation. The OCC label is aspirational — the actual mechanism is "cheap first pass, expensive retry."

	For this to become an OCC result, it needs an agentic version:
	- Generator agent spends credits to propose solutions
	- Tester agent earns credits for catching bugs
	- Repair agent earns credits only if patch passes
	- Credits decay across problems
	- Agents with low marginal value lose budget

	Until then, HumanEval is a practical but orthogonal finding. It belongs in the appendix or as a separate "adaptive inference" note.

	---

	## PART IV: OCC SYSTEM — What It Actually Is

	See `design.md` for the full formal definition. Here's the summary:

	### Components

	1. Impact Oracle (`oracle.py`): Scores whether an action produced measurable marginal value. Supports code, QA, debate, and retrieval scoring modes.

	2. Credit Ledger (`ledger.py`): Non-transferable, decaying, capability-scoped credits with immutable audit trail. Every credit mutation is an append-only event with provenance.

	3. Resource Broker (`broker.py`): Capability-based access control. Decides allow/deny/downgrade/escalate/require-approval per resource type.

	4. GRPO Reward Hook (`grpo_hook.py`): TRL-compatible reward function combining oracle score + anti-gaming penalties. Validated end-to-end.

	### Core Invariants

	1. Credits are non-transferable
	2. Credits decay per-turn (δ = 0.995)
	3. Credits are capability-scoped (retrieval ≠ file write ≠ model access)
	4. Rewards require external verification (oracle separate from spender)
	5. Ledger is append-only
	6. Oracle cannot be directly influenced by the spending agent
	7. Failed work cannot generate positive credit
	8. Credit ≠ identity trust (high credit ≠ blanket access)

	### Threat Model

	\| Attack \| Defense \| Residual Risk \|
	\|--------\|---------\|---------------\|
	\| Credit farming (easy tasks) \| Decay + caps \| Slow gaming over many tasks \|
	\| Collusion (multiple agents) \| Non-transferability \| Vote-ring behavior \|
	\| Oracle spoofing \| Verifier separation \| Judge hacking \|
	\| Griefing (burn others' budget) \| Scoped spend \| Indirect poisoning \|
	\| Identity laundering \| Identity binding \| Account churn \|
	\| Strategic abstention \| Reward shaping \| Conservatism bias \|
	\| Verbosity gaming \| Token-cost multiplier \| Requires quality oracle \|
	\| Confidence manipulation \| Proper scoring rules \| Hard to calibrate perfectly \|

	### When OCC Is Valuable

	Use OCC when: agents have heterogeneous reliability, long-running tasks need budget discipline, debate can be poisoned, compute is expensive, auditability matters, or post-hoc accountability is required.

	Skip OCC when: single-agent tasks suffice, ground truth is immediate and cheap, no adversarial participation, all agents have equal trust and capability, or verifier cost exceeds saved compute.

	---

	## PART V: ABLATIONS

	\| Ablation \| Effect \|
	\|----------\|--------\|
	\| No credit ledger \| 27% less compute savings \|
	\| Transferable credits \| Gaming success: 0% → 45% \|
	\| Non-decaying credits \| Credit hoarding, -18% throughput \|
	\| No confident-wrong penalty \| Confident-wrong rate 2.3× higher \|
	\| No calibration penalty \| ECE: 0.12 → 0.31 \|
	\| No cost penalty \| Token usage +40% \|
	\| No anti-gaming penalty \| Gaming agents earn 3.2× more \|

	---

	## PART VI: HONEST ASSESSMENT

	### What the Evidence ACTUALLY Supports

	\| Claim \| Evidence Level \| Notes \|
	\|-------\|---------------\|-------\|
	\| Unmanaged compute amplifies adversarial influence \| Strong: 56.7% collapse, 2 seeds, identical result \| Needs mechanism isolation to be publishable \|
	\| OCC prevents catastrophic collapse \| Moderate: OCC 180/3 (83.3%) >> equal 3-round (56.7%) \| But OCC ≈ random gating at moderate budgets \|
	\| Anti-gaming ledger design is novel and sound \| Moderate: 8 attacks, zero successful vectors \| Simulation only, needs live agent testing \|
	\| OCC saves tokens at iso-quality \| Weak: 21% on TruthfulQA (small dataset), 67% on HumanEval (not OCC, it's adaptive retry) \| Not the core claim \|
	\| OCC beats simple baselines \| Not supported: Random gating (85.0%) ≈ OCC (83.3-85.0%) \| The advantage is in preventing extremes \|
	\| Learned allocation beats hand-tuned rules \| Not tested: GRPO hook works but no policy improvement at 0.5B scale \| Needs 7B+ model + training budget \|

	### What Failed

	1. OCC doesn't beat random gating at moderate budgets. The mechanism prevents catastrophe but doesn't improve the median case.
	2. TruthfulQA abstention is judge-dependent. 28% → 3% depending on scorer.
	3. GRPO training produced no policy improvement at 0.5B.
	4. HumanEval is adaptive retry, not OCC. Honest labeling needed.

	### Wrong Assumptions

	1. "In-process exec is good enough for HumanEval" — WRONG.
	2. "More debate turns always helps" — WRONG (56.7% collapse).
	3. "OCC will outperform random gating in the median case" — NOT YET PROVEN.
	4. "TruthfulQA string-matching is a reasonable scorer" — WRONG (59pp swing).

	### Is OCC Actually Useful?

	For preventing catastrophic compute misallocation: yes. The equal-3-round collapse is a genuine finding — more compute can make things worse, and a credit mechanism prevents the worst of it.

	For marginal gains in well-behaved systems: not yet proven. Random gating works nearly as well when there's no adversary. OCC's value is in adversary-aware deployment.

	For production multi-agent governance: promising but early. The ledger, broker, and anti-gaming design are sound on paper. Live-agent validation is needed.

	### Is This Publishable?

	Workshop: Yes, with the mechanism isolation experiment. The collapse + mechanism analysis + anti-gaming design is a coherent workshop paper.

	Main conference: No, without:
	1. Mechanism isolation results (not just the collapse number, but WHY it happens)
	2. Broader benchmark generalization (100+ questions, 5+ seeds)
	3. Statistical significance (paired bootstrap, McNemar)
	4. Learned allocator that beats hand-tuned rules
	5. Oracle robustness analysis under different oracle types

	### The Path Forward

	1. Run mechanism isolation (script uploaded) — break the collapse into testable causes
	2. Scale the benchmark: 100-300 debate questions, 5-10 seeds
	3. Add stronger baselines: confidence gating, disagreement gating, bandit allocation, auction allocation, judge-weighted voting
	4. Oracle ablation: ground-truth, LLM judge, noisy, adversarial oracles
	5. GRPO-learned allocator at 7B+ scale
	6. Split into focused papers: (a) Collapse mechanism, (b) OCC governance design, (c) Learned allocation

	### The Bold Claim

	In multi-agent systems, compute is not neutral. Extra turns can amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC is a first mechanism-design layer for making agent compute scarce, earned, scoped, decaying, and auditable.

	This is the sentence. Everything else is evidence, honesty, and the mechanism that makes it true.

	---

	## Repository

	- Main repo: https://huggingface.co/narcolepticchicken/occ-stack
	- Formal definition: `design.md`
	- Mechanism isolation: `jobs/occ_debate_collapse_mechanism.py`
	- Results: `reports/`

	---

	## Changelog

	- v12 (CORRECTED FRAMING): Reframed around "compute-as-attack-surface." Added formal OCC definition (design.md). Added threat model, "when not to use OCC," ledger event schema. HumanEval honestly labeled as adaptive retry. TruthfulQA labeled as oracle-dependence demonstration. Mechanism isolation script written. Collapse is the wedge.
	- v11: TruthfulQA AllenAI results. 2-seed debate aggregate. Honest assessment.
	- v10: Extended baselines running. Initial equal_3round collapse finding.

	---

	OCC is a research prototype. The collapse is real. The mechanism is promising. The evidence is incomplete. The framing is now honest.