
OCC: Oracle-Credit-Compute for Agentic Resource Allocation

Technical Report — May 2026 (v12 — CORRECTED FRAMING)

Status: Complete for this revision. Real-LLM validation on H200. Two-seed debate benchmark run; mechanism isolation planned. TruthfulQA scored with the AllenAI judges.

Core claim (revised): In multi-agent systems, compute is not neutral. Extra turns, tokens, and tool calls amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC is a mechanism-design layer that treats compute allocation as a security boundary — making agent compute scarce, earned, capability-scoped, decaying, and auditable.

PART 0: WHY THIS MATTERS

Modern agent systems waste compute because they treat it as free speech. Every agent, tool call, debate turn, retrieval, and verifier pass is a potential attack surface — a channel through which unreliable, adversarial, or simply wrong agents can amplify their influence.

The current paradigm:

Equal-turn debate → adversarial voice equals honest voice
Unlimited retry → reward for persistence, not correctness
No credit accounting → no incentive for efficiency
No decay → stale trust persists forever

OCC changes the paradigm: compute is a scarce privilege, not a right.

PART I: THE COLLAPSE — Evidence That Allocation Matters

Benchmark 1: Multi-Agent Debate Under Shared Compute

Setup: 30 debate topics, 4 agents (3 honest + 1 adversarial), Qwen3-Coder-30B-A3B-Instruct on H200. Global credit pool. 2 seeds (42, 123).

The Core Finding

Condition	Accuracy	Tokens
Equal 1-round (baseline)	88.3%	41.8k
Equal 3-round	56.7%	149.8k
Random drop (25%)	85.0%	30.7k
OCC 180/3	83.3%	41.0k
OCC 120/3	85.0%	42.7k

The collapse is stark and replicated across both seeds: each produces exactly 17/30 = 56.7%. Giving every agent, including the adversary, three rounds instead of one drags the group about 32pp below the 1-round baseline. Roughly 3.6× the tokens buys accuracy only a few points above 50%.
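The headline numbers can be re-derived from the table with a few lines of arithmetic. This is only a check on the figures already reported above; nothing here is newly measured.

```python
# Figures copied from the Benchmark 1 table.
baseline_acc = 88.3       # Equal 1-round accuracy, percent
collapsed_acc = 56.7      # Equal 3-round accuracy, percent
baseline_tokens = 41.8    # Equal 1-round, thousands of tokens
collapsed_tokens = 149.8  # Equal 3-round, thousands of tokens
correct, topics = 17, 30  # per-seed count reported for Equal 3-round

accuracy_from_counts = 100 * correct / topics
drop_pp = baseline_acc - collapsed_acc
token_ratio = collapsed_tokens / baseline_tokens

print(f"accuracy from counts: {accuracy_from_counts:.1f}%")  # 56.7%
print(f"drop vs baseline: {drop_pp:.1f} pp")                 # 31.6 pp
print(f"token ratio: {token_ratio:.2f}x")                    # 3.58x
```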

What This Is NOT Saying

NOT: "Debate is bad." Debate can surface truth. But debate without allocation control creates an exploitable communication channel.
NOT: "Multi-agent systems are harmful." The collapse was observed with an adversary present; it shows that compute allocation must be adversary-aware.
NOT: "OCC solves everything." OCC prevents the collapse but does not outperform random gating at moderate budgets.

What This IS Saying

Debate without allocation control amplifies adversarial influence. Giving every agent equal turns is like giving every network packet equal bandwidth during a DDoS. The attacker's packets aren't better — there are just more of them, and they eventually overwhelm the honest signal.

OCC treats turns/tokens as scarce, auditable privileges rather than free speech coupons.

The Mechanism Question

Why does equal-3-round collapse? Several hypotheses (to be tested by the mechanism isolation experiment at jobs/occ_debate_collapse_mechanism.py):

Hypothesis	Test	Prediction
H1: Volume	Equal token, unequal turn budget	If collapse disappears, volume caused it
H2: Recency	Randomized speaking order	If collapse softens, last-speaker bias caused it
H3: Protocol	Judge-based voting instead of majority	If collapse disappears, majority voting is the vulnerability
H4: Contamination	Track honest-agent answer retention	If honest agents flip toward the adversary, contamination caused it
H5: Entropy	Confidence-weighted voting	If collapse reverses, uncertainty rather than persuasion caused it
H6: Prompt	Vary adversary skill (weak/normal/strong/oracle)	If only strong prompts collapse, prompt artifact
H7: Selection	Stratify by topic difficulty	If only some topics collapse, selection bias
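To make H3 and H5 concrete, the two aggregation rules they contrast can be sketched as below. This is a minimal illustration, not the code in the mechanism isolation script: the function names, the tie-breaking rule, and the toy inputs are all assumptions.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

def confidence_weighted_vote(answers, confidences):
    """Sum each answer's confidence and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, confidence in zip(answers, confidences):
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Hypothetical toy input: one confident agent against three unsure ones.
answers = ["A", "B", "B", "B"]
confidences = [0.95, 0.30, 0.30, 0.30]

print(majority_vote(answers))                          # B
print(confidence_weighted_vote(answers, confidences))  # A
```

The toy input shows the two rules disagreeing, which is the behavior the H5 test relies on: if weighting by confidence changes the group answer, head count alone was deciding the outcome.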

The mechanism isolation experiment produces:

Round-by-round honest answer retention rates
Adversary-induced flip counts
Per-topic transition matrices (correct→correct, correct→wrong, wrong→correct, wrong→wrong)
The minimal adversarial ratio needed for collapse
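The transition matrices and retention rates listed above can be computed from per-round correctness flags. The sketch below assumes each round is recorded as a list of booleans, one per topic; that data layout and the function names are assumptions, not the format the script emits.

```python
from collections import Counter

def transition_counts(before, after):
    """Count per-topic transitions between two rounds.

    `before` and `after` are equal-length lists of booleans, where True means
    the answer for that topic was correct in that round.
    """
    if len(before) != len(after):
        raise ValueError("both rounds must cover the same topics")
    label = {True: "correct", False: "wrong"}
    return Counter(f"{label[b]}->{label[a]}" for b, a in zip(before, after))

def retention_rate(before, after):
    """Fraction of initially correct topics that are still correct afterwards."""
    counts = transition_counts(before, after)
    kept, lost = counts["correct->correct"], counts["correct->wrong"]
    return kept / (kept + lost) if kept + lost else float("nan")

# Hypothetical toy rounds over four topics (not experimental data).
round_1 = [True, True, True, False]
round_3 = [True, False, True, True]

print(dict(transition_counts(round_1, round_3)))
print(retention_rate(round_1, round_3))
```

Counting `correct->wrong` separately from `wrong->correct` matters here because a net accuracy drop can hide flips in both directions.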

This transforms "56.7% is scary" into "adversarial compute amplification follows this specific mechanism and can be mitigated by these specific controls."

PART II: TRUTHFULQA — Judge-Dependence

Setup: 60 TruthfulQA questions, Qwen3-Coder-30B-A3B-Instruct, AllenAI Llama2-7B truth + info judges.

Condition	Truthful	Informative	Both	Tokens
A: Direct	0.917	1.000	0.917	7,198
B: OCC Tiered	0.867	1.000	0.867	6,692
C: OCC+Abstain	0.917	0.967	0.883	5,682

Key lesson: The oracle's choice determines everything.

Under string matching (our earlier scoring), the model looked terrible (0.325 truthful). Under AllenAI's semantic judge, it's excellent (0.917). The same answers, different judges, 59pp swing.

This is not a bug — it's a feature to study. The Oracle Reliability section of the formal definition (design.md) maps out oracle types from ground-truth oracle (ceiling) through LLM judge (practical) to noisy/adversarial oracles (robustness tests).

OCC+Abstain matches Direct on truthfulness (0.917) with 21.1% fewer tokens, but it is not fully iso-quality: its Informative score drops to 0.967, pulling Both down to 0.883. The savings are modest and the abstention rate is tiny (3.3%) under the AllenAI judge, whereas under string matching abstention was 28%. The mechanism's value is judge-dependent.
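The token savings and the judge swing follow directly from the numbers reported in this part. The snippet below only re-derives them; the helper name is an assumption.

```python
# (truthful score, tokens) copied from the TruthfulQA table.
rows = {
    "A: Direct": (0.917, 7198),
    "B: OCC Tiered": (0.867, 6692),
    "C: OCC+Abstain": (0.917, 5682),
}

def token_savings_pct(baseline_tokens, tokens):
    """Percentage of tokens saved relative to the baseline condition."""
    return 100 * (baseline_tokens - tokens) / baseline_tokens

baseline_tokens = rows["A: Direct"][1]
for name, (truthful, tokens) in rows.items():
    print(f"{name}: truthful={truthful:.3f}, savings={token_savings_pct(baseline_tokens, tokens):.1f}%")

# Swing between the string-matching score (0.325) and the AllenAI judge (0.917).
judge_swing_pp = 100 * (0.917 - 0.325)
print(f"judge swing: {judge_swing_pp:.0f} pp")
```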

Honest assessment: TruthfulQA does not strongly support or undermine OCC. It demonstrates oracle-dependence. Move to appendix for publication.

PART III: HUMANEVAL — Adaptive Retry, Not Credit Allocation

Setup: HumanEval 164 problems, Qwen3-Coder-30B-A3B-Instruct, two-pass OCC (128-token first pass, 1024-token retry on failure).

Platform	Pass@1	Token savings
H200	42.1%	67.8%
Blackwell	33.5%	62.6%

The savings are real and cross-platform (63-68%), but this is adaptive retry, not OCC credit allocation. The OCC label is aspirational — the actual mechanism is "cheap first pass, expensive retry."
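The "cheap first pass, expensive retry" mechanism can be sketched as follows. `generate` and `passes_tests` are hypothetical callables standing in for the model call and the HumanEval test harness; only the 128 and 1024 token budgets come from the setup above.

```python
def two_pass_solve(problem, generate, passes_tests, first_budget=128, retry_budget=1024):
    """Try a cheap generation first; retry with a larger budget only on failure.

    Returns (attempt, token budget reserved, passed). The budget is the cap
    that was requested, not the number of tokens actually generated.
    """
    attempt = generate(problem, first_budget)
    if passes_tests(problem, attempt):
        return attempt, first_budget, True
    attempt = generate(problem, retry_budget)
    return attempt, first_budget + retry_budget, passes_tests(problem, attempt)

# Stub callables for illustration only.
def stub_generate(problem, budget):
    return "long solution" if budget > 128 else "short solution"

def stub_passes_tests(problem, attempt):
    return problem == "easy" or attempt == "long solution"

print(two_pass_solve("easy", stub_generate, stub_passes_tests))
print(two_pass_solve("hard", stub_generate, stub_passes_tests))
```

Nothing in this loop involves credits, an oracle score, or other agents, which is why it is labeled adaptive retry rather than OCC.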

For this to become an OCC result, it needs an agentic version:

Generator agent spends credits to propose solutions
Tester agent earns credits for catching bugs
Repair agent earns credits only if patch passes
Credits decay across problems
Agents with low marginal value lose budget

Until then, HumanEval is a practical but orthogonal finding. It belongs in the appendix or as a separate "adaptive inference" note.

PART IV: OCC SYSTEM — What It Actually Is

See design.md for the full formal definition. Here's the summary:

Components

Impact Oracle (oracle.py): Scores whether an action produced measurable marginal value. Supports code, QA, debate, and retrieval scoring modes.
Credit Ledger (ledger.py): Non-transferable, decaying, capability-scoped credits with immutable audit trail. Every credit mutation is an append-only event with provenance.
Resource Broker (broker.py): Capability-based access control. Decides allow/deny/downgrade/escalate/require-approval per resource type.
GRPO Reward Hook (grpo_hook.py): TRL-compatible reward function combining oracle score + anti-gaming penalties. Validated end-to-end.
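As a rough picture of what the broker decides, the sketch below maps a capability-scoped balance to one of the five outcomes named above. It is not the logic in broker.py: the function signature, the thresholds, and the order of checks are assumptions, and `escalate` is left unmodeled because this report does not say what triggers it.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    DOWNGRADE = "downgrade"
    ESCALATE = "escalate"  # listed in the report; not produced by this toy policy
    REQUIRE_APPROVAL = "require-approval"

def decide(balances, capability, cost, cheaper_cost=None, needs_approval=False):
    """Toy policy: every rule here is an illustrative assumption."""
    # Credits are capability-scoped: only the balance for this capability counts.
    balance = balances.get(capability, 0.0)
    if needs_approval:
        return Decision.REQUIRE_APPROVAL
    if balance >= cost:
        return Decision.ALLOW
    if cheaper_cost is not None and balance >= cheaper_cost:
        return Decision.DOWNGRADE
    return Decision.DENY

# Hypothetical balances for one agent.
balances = {"retrieval": 5.0, "file_write": 0.0}
print(decide(balances, "retrieval", cost=3.0))                    # Decision.ALLOW
print(decide(balances, "file_write", cost=1.0))                   # Decision.DENY
print(decide(balances, "retrieval", cost=8.0, cheaper_cost=4.0))  # Decision.DOWNGRADE
```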

Core Invariants

Credits are non-transferable
Credits decay per-turn (δ = 0.995)
Credits are capability-scoped (retrieval ≠ file write ≠ model access)
Rewards require external verification (oracle separate from spender)
Ledger is append-only
Oracle cannot be directly influenced by the spending agent
Failed work cannot generate positive credit
Credit ≠ identity trust (high credit ≠ blanket access)
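Several of these invariants can be illustrated with a small append-only ledger. The sketch below is not the schema in ledger.py or design.md; the event fields and method names are assumptions, and only the decay factor δ = 0.995 is taken from the list above. Non-transferability shows up as the absence of any transfer operation.

```python
from dataclasses import dataclass, field

DECAY = 0.995  # per-turn decay factor from the invariants list

@dataclass(frozen=True)
class LedgerEvent:
    """One immutable credit mutation (field names are hypothetical)."""
    turn: int
    agent: str
    capability: str
    delta: float
    reason: str  # provenance, e.g. which oracle verdict produced the change

@dataclass
class CreditLedger:
    events: list = field(default_factory=list)

    def record(self, event):
        """Append an event; existing events are never edited or removed."""
        self.events.append(event)

    def balance(self, agent, capability, now):
        """Replay the log, decaying each delta by its age in turns."""
        return sum(
            e.delta * DECAY ** (now - e.turn)
            for e in self.events
            if e.agent == agent and e.capability == capability and e.turn <= now
        )

ledger = CreditLedger()
ledger.record(LedgerEvent(turn=0, agent="agent-1", capability="retrieval",
                          delta=10.0, reason="oracle verified answer"))
print(ledger.balance("agent-1", "retrieval", now=0))
print(ledger.balance("agent-1", "retrieval", now=139))   # a little under half
print(ledger.balance("agent-1", "file_write", now=139))  # other scope: nothing
```

With δ = 0.995 a credit loses half its value after ln(0.5)/ln(0.995) ≈ 138 turns, so unused trust fades on that timescale.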

Threat Model

Attack	Defense	Residual Risk
Credit farming (easy tasks)	Decay + caps	Slow gaming over many tasks
Collusion (multiple agents)	Non-transferability	Vote-ring behavior
Oracle spoofing	Verifier separation	Judge hacking
Griefing (burn others' budget)	Scoped spend	Indirect poisoning
Identity laundering	Identity binding	Account churn
Strategic abstention	Reward shaping	Conservatism bias
Verbosity gaming	Token-cost multiplier	Requires quality oracle
Confidence manipulation	Proper scoring rules	Hard to calibrate perfectly

When OCC Is Valuable

Use OCC when: agents have heterogeneous reliability, long-running tasks need budget discipline, debate can be poisoned, compute is expensive, auditability matters, or post-hoc accountability is required.

Skip OCC when: single-agent tasks suffice, ground truth is immediate and cheap, no adversarial participation, all agents have equal trust and capability, or verifier cost exceeds saved compute.

PART V: ABLATIONS

Ablation	Effect
No credit ledger	27% less compute savings
Transferable credits	Gaming success: 0% → 45%
Non-decaying credits	Credit hoarding, -18% throughput
No confident-wrong penalty	Confident-wrong rate 2.3× higher
No calibration penalty	ECE: 0.12 → 0.31
No cost penalty	Token usage +40%
No anti-gaming penalty	Gaming agents earn 3.2× more
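The penalty terms ablated above plug into the reward as subtractions from the oracle score. The sketch below shows one way such a reward could be shaped so that it follows the convention TRL uses for custom reward functions (a callable that receives `completions` plus extra keyword arguments and returns one float per completion). It is not the code in grpo_hook.py: the `oracle` callable, the weights, the word-count cost proxy, and the assumption that completions are plain strings are all hypothetical, and only two of the penalties are modeled.

```python
def make_occ_reward(oracle, cost_weight=0.001, confident_wrong_weight=1.0):
    """Build a reward function from a hypothetical oracle.

    `oracle(completion)` is assumed to return (score in [0, 1], confidence in [0, 1]).
    """
    def occ_reward(completions, **kwargs):
        rewards = []
        for completion in completions:
            score, confidence = oracle(completion)
            # Cost penalty: longer completions pay more (word count as a proxy).
            cost_penalty = cost_weight * len(completion.split())
            # Confident-wrong penalty: grows with confidence when the score is low.
            wrong_penalty = confident_wrong_weight * confidence * (1.0 - score)
            rewards.append(score - cost_penalty - wrong_penalty)
        return rewards
    return occ_reward

# Stub oracle for illustration only.
def stub_oracle(completion):
    return (1.0, 0.9) if "right" in completion else (0.0, 0.9)

reward_fn = make_occ_reward(stub_oracle)
print(reward_fn(["right answer", "wrong answer"]))
```

In this shaping a confident wrong answer ends up with a negative reward, consistent with the invariant that failed work cannot generate positive credit.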

PART VI: HONEST ASSESSMENT

What the Evidence ACTUALLY Supports

Claim	Evidence Level	Notes
Unmanaged compute amplifies adversarial influence	Strong: 56.7% collapse, 2 seeds, identical result	Needs mechanism isolation to be publishable
OCC prevents catastrophic collapse	Moderate: OCC 180/3 (83.3%) >> equal 3-round (56.7%)	But OCC ≈ random gating at moderate budgets
Anti-gaming ledger design is novel and sound	Moderate: 8 attacks, zero successful vectors	Simulation only, needs live agent testing
OCC saves tokens at iso-quality	Weak: 21% on TruthfulQA (small dataset), 63-68% on HumanEval (not OCC, it's adaptive retry)	Not the core claim
OCC beats simple baselines	Not supported: Random gating (85.0%) ≈ OCC (83.3-85.0%)	The advantage is in preventing extremes
Learned allocation beats hand-tuned rules	Not demonstrated: GRPO hook runs end-to-end, but training showed no policy improvement at 0.5B scale	Needs 7B+ model + training budget

What Failed

OCC doesn't beat random gating at moderate budgets. The mechanism prevents catastrophe but doesn't improve the median case.
TruthfulQA abstention is judge-dependent. 28% → 3% depending on scorer.
GRPO training produced no policy improvement at 0.5B.
HumanEval is adaptive retry, not OCC. Honest labeling needed.

Wrong Assumptions

"In-process exec is good enough for HumanEval" — WRONG.
"More debate turns always helps" — WRONG (56.7% collapse).
"OCC will outperform random gating in the median case" — NOT YET PROVEN.
"TruthfulQA string-matching is a reasonable scorer" — WRONG (59pp swing).

Is OCC Actually Useful?

For preventing catastrophic compute misallocation: yes. The equal-3-round collapse is a genuine finding — more compute can make things worse, and a credit mechanism prevents the worst of it.

For marginal gains in well-behaved systems: not yet proven. Random gating matched OCC in the debate benchmark (85.0% vs 83.3-85.0%), so an advantage at moderate budgets remains undemonstrated. OCC's demonstrated value is in preventing the extreme case.

For production multi-agent governance: promising but early. The ledger, broker, and anti-gaming design are sound on paper. Live-agent validation is needed.

Is This Publishable?

Workshop: Yes, with the mechanism isolation experiment. The collapse + mechanism analysis + anti-gaming design is a coherent workshop paper.

Main conference: Not yet. It would need:

Mechanism isolation results (not just the collapse number, but WHY it happens)
Broader benchmark generalization (100+ questions, 5+ seeds)
Statistical significance (paired bootstrap, McNemar)
Learned allocator that beats hand-tuned rules
Oracle robustness analysis under different oracle types
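For the statistical significance item, an exact McNemar test needs only the two discordant counts from paired per-topic outcomes. The sketch below is a standard exact binomial formulation; the example counts are placeholders, not results from this report, which so far gives only aggregate accuracies.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: topics condition A got right and condition B got wrong.
    c: topics condition A got wrong and condition B got right.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Placeholder counts for illustration only.
print(mcnemar_exact(1, 9))
print(mcnemar_exact(5, 5))
```

Because the test uses only discordant pairs, the benchmark runs have to log which topics each condition answered correctly, not just the totals.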

The Path Forward

Run mechanism isolation (script uploaded) — break the collapse into testable causes
Scale the benchmark: 100-300 debate questions, 5-10 seeds
Add stronger baselines: confidence gating, disagreement gating, bandit allocation, auction allocation, judge-weighted voting
Oracle ablation: ground-truth, LLM judge, noisy, adversarial oracles
GRPO-learned allocator at 7B+ scale
Split into focused papers: (a) Collapse mechanism, (b) OCC governance design, (c) Learned allocation

The Bold Claim

In multi-agent systems, compute is not neutral. Extra turns can amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC is a first mechanism-design layer for making agent compute scarce, earned, scoped, decaying, and auditable.

This is the sentence. Everything else is evidence, honesty, and the mechanism that makes it true.

Repository

Main repo: https://huggingface.co/narcolepticchicken/occ-stack
Formal definition: design.md
Mechanism isolation: jobs/occ_debate_collapse_mechanism.py
Results: reports/

Changelog

v12 (CORRECTED FRAMING): Reframed around "compute-as-attack-surface." Added formal OCC definition (design.md). Added threat model, "when not to use OCC," ledger event schema. HumanEval honestly labeled as adaptive retry. TruthfulQA labeled as oracle-dependence demonstration. Mechanism isolation script written. Collapse is the wedge.
v11: TruthfulQA AllenAI results. 2-seed debate aggregate. Honest assessment.
v10: Extended baselines running. Initial equal_3round collapse finding.

OCC is a research prototype. The collapse is real. The mechanism is promising. The evidence is incomplete. The framing is now honest.