# OCC: Oracle-Credit-Compute for Agentic Resource Allocation

## Technical Report: May 2026 (v12, CORRECTED FRAMING)

**Status:** Complete. Real-LLM validation on H200. Two-seed debate benchmark done; mechanism isolation experiment planned. TruthfulQA scored with AllenAI judges.

**Core claim (revised):** In multi-agent systems, compute is not neutral. Extra turns, tokens, and tool calls can amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC is a mechanism-design layer that treats compute allocation as a security boundary, making agent compute scarce, earned, capability-scoped, decaying, and auditable.

---

## PART 0: WHY THIS MATTERS

Modern agent systems waste compute because they treat it as free speech. Every agent, tool call, debate turn, retrieval, and verifier pass is a potential attack surface: a channel through which unreliable, adversarial, or simply wrong agents can amplify their influence.

The current paradigm:

- Equal-turn debate → adversarial voice equals honest voice
- Unlimited retry → reward for persistence, not correctness
- No credit accounting → no incentive for efficiency
- No decay → stale trust persists forever

OCC changes the paradigm: **compute is a scarce privilege, not a right.**

---

## PART I: THE COLLAPSE (Evidence That Allocation Matters)

### Benchmark 1: Multi-Agent Debate Under Shared Compute

**Setup:** 30 debate topics, 4 agents (3 honest + 1 adversarial), Qwen3-Coder-30B-A3B-Instruct on H200. Global credit pool. 2 seeds (42, 123).

#### The Core Finding

| Condition | Accuracy | Tokens |
|-----------|----------|--------|
| **Equal 1-round** (baseline) | **88.3%** | 41.8k |
| **Equal 3-round** | **56.7%** | 149.8k |
| Random drop (25%) | 85.0% | 30.7k |
| OCC 180/3 | 83.3% | 41.0k |
| OCC 120/3 | 85.0% | 42.7k |

**The collapse is stark and replicates across both seeds.** Both seeds produce exactly 17/30 = 56.7%. When every agent, the adversary included, debates for three rounds instead of one, accuracy falls about 32pp below the 1-round baseline (88.3% → 56.7%) while token use rises about 3.6× (41.8k → 149.8k). More compute produces a substantially worse group answer.

#### What This Is NOT Saying

- **NOT**: "Debate is bad." Debate can surface truth. But debate without allocation control creates an exploitable communication channel.
- **NOT**: "Multi-agent systems are harmful." The collapse was observed with an adversary in the group; it shows that compute allocation must be adversary-aware.
- **NOT**: "OCC solves everything." OCC prevents the collapse but does not outperform random gating at moderate budgets.

#### What This IS Saying

**Debate without allocation control amplifies adversarial influence.** Giving every agent equal turns is like giving every network packet equal bandwidth during a DDoS. The attacker's packets aren't better; there are just more of them, and they eventually overwhelm the honest signal.

OCC treats turns/tokens as scarce, auditable privileges rather than free speech coupons.
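
To make the contrast concrete, here is a minimal sketch of three turn-scheduling policies: equal turns, random dropping, and a credit gate. It assumes each agent already holds a balance reflecting past oracle-verified contributions and that every turn has a fixed credit cost. `Agent`, `turn_cost`, and the three schedule functions are hypothetical illustrations, not the OCC 180/3 or 120/3 configurations in the table above.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    balance: float  # hypothetical: credits already earned from oracle-verified contributions


def equal_schedule(agents: List[Agent], rounds: int) -> List[List[str]]:
    """Baseline: every agent speaks in every round, regardless of track record."""
    return [[a.name for a in agents] for _ in range(rounds)]


def random_drop_schedule(agents: List[Agent], rounds: int, drop_rate: float,
                         rng: random.Random) -> List[List[str]]:
    """Random-gating baseline: each turn is dropped independently with probability drop_rate."""
    return [[a.name for a in agents if rng.random() >= drop_rate] for _ in range(rounds)]


def credit_gated_schedule(agents: List[Agent], rounds: int, turn_cost: float) -> List[List[str]]:
    """Hypothetical credit gate: an agent speaks only if it can pay turn_cost from its balance.

    Balances are debited in place; no credits are earned inside this sketch.
    """
    schedule = []
    for _ in range(rounds):
        speakers = []
        for a in agents:
            if a.balance >= turn_cost:
                a.balance -= turn_cost
                speakers.append(a.name)
        schedule.append(speakers)
    return schedule
```

With three honest agents holding 3 credits, an adversary holding 1, and a turn cost of 1, the credit gate lets the adversary speak only in round 1, whereas the equal schedule gives it all three rounds.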

### The Mechanism Question

Why does the equal 3-round condition collapse? The mechanism isolation experiment (`jobs/occ_debate_collapse_mechanism.py`) is designed to test several hypotheses:

| Hypothesis | Test | Prediction |
|------------|------|------------|
| H1: Volume | Equal token budget, unequal turn budget | If collapse disappears, volume caused it |
| H2: Recency | Randomized speaking order | If collapse softens, last-speaker bias caused it |
| H3: Protocol | Judge-based voting instead of majority | If collapse disappears, majority voting is the vulnerability |
| H4: Contamination | Track honest agent answer retention | If honest agents flip toward the adversary, contamination caused it |
| H5: Entropy | Confidence-weighted voting | If collapse reverses, uncertainty (not persuasion) caused it |
| H6: Prompt | Vary adversary skill (weak/normal/strong/oracle) | If only strong prompts cause collapse, it is a prompt artifact |
| H7: Selection | Stratify by topic difficulty | If only some topics collapse, it is selection bias |

The mechanism isolation experiment produces:

- Round-by-round honest answer retention rates
- Adversary-induced flip counts
- Per-topic transition matrices (correct→correct, correct→wrong, wrong→correct, wrong→wrong)
- The minimal adversarial ratio needed for collapse

Once run, these outputs turn "56.7% is scary" into "adversarial compute amplification follows this specific mechanism and can be mitigated by these specific controls."
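
A minimal sketch of how the retention, flip, and transition statistics could be computed from per-round debate logs. It assumes each honest agent's per-round answers and correctness are recorded, and that the adversary argues for one fixed target answer per topic. The function names are hypothetical and are not taken from `jobs/occ_debate_collapse_mechanism.py`.

```python
from collections import Counter
from typing import List


def transition_counts(correct_by_round: List[bool]) -> Counter:
    """Count correct/wrong transitions between consecutive rounds for one honest agent."""
    counts: Counter = Counter()
    for prev, cur in zip(correct_by_round, correct_by_round[1:]):
        key = ("correct" if prev else "wrong") + "->" + ("correct" if cur else "wrong")
        counts[key] += 1
    return counts


def retention_rate(counts: Counter) -> float:
    """Share of correct answers still correct one round later (NaN if never correct)."""
    kept, lost = counts["correct->correct"], counts["correct->wrong"]
    return kept / (kept + lost) if kept + lost else float("nan")


def adversary_flips(answers_by_round: List[str], adversary_target: str) -> int:
    """Count rounds in which an honest agent switches TO the adversary's target answer.

    Assumes the adversary argues for one fixed target answer per topic.
    """
    return sum(
        1
        for prev, cur in zip(answers_by_round, answers_by_round[1:])
        if prev != adversary_target and cur == adversary_target
    )
```

Pooling per-agent counts (for example `sum(per_agent_counts, Counter())`) yields the per-topic transition matrix. Under H4, one would expect `correct->wrong` counts and flips to rise with the number of rounds.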

---

## PART II: TRUTHFULQA (Judge-Dependence)

**Setup:** 60 TruthfulQA questions, Qwen3-Coder-30B-A3B-Instruct, AllenAI Llama2-7B truth + info judges.

| Condition | Truthful | Informative | Both | Tokens |
|-----------|----------|-------------|------|--------|
| A: Direct | **0.917** | 1.000 | **0.917** | 7,198 |
| B: OCC Tiered | 0.867 | 1.000 | 0.867 | 6,692 |
| **C: OCC+Abstain** | **0.917** | 0.967 | 0.883 | **5,682** |

**Key lesson: the choice of oracle can dominate the result.**

Under string matching (our earlier scoring), the model looked terrible (0.325 truthful). Under AllenAI's semantic judge, the same answers score 0.917: a 59pp swing from changing the judge alone.

This judge sensitivity is not a bug; it is a property worth studying. The Oracle Reliability section of the formal definition (`design.md`) maps out oracle types from a ground-truth oracle (ceiling) through an LLM judge (practical) to noisy and adversarial oracles (robustness tests).

OCC+Abstain matches Direct on truthfulness (0.917) with 21.1% fewer tokens (5,682 vs. 7,198), but informativeness drops to 0.967 and the combined "Both" score falls from 0.917 to 0.883, so this is iso-truthfulness rather than full iso-quality. The savings are modest, and the abstention rate is tiny (3.3%) under the AllenAI judge, versus 28% under string matching. The mechanism's value is judge-dependent.

**Honest assessment:** TruthfulQA neither strongly supports nor undermines OCC; it demonstrates oracle-dependence. Move it to an appendix for publication.

---

## PART III: HUMANEVAL (Adaptive Retry, Not Credit Allocation)

**Setup:** HumanEval (164 problems), Qwen3-Coder-30B-A3B-Instruct, two-pass OCC (128-token first pass, 1024-token retry on failure).

| Platform | Pass@1 | Savings |
|----------|--------|---------|
| H200 | **42.1%** | 67.8% |
| Blackwell | 33.5% | 62.6% |

The savings are real and hold across platforms (63-68%), but this is **adaptive retry**, not OCC credit allocation. The OCC label is aspirational: the actual mechanism is "cheap first pass, expensive retry."
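
The "cheap first pass, expensive retry" mechanism fits in a few lines. This sketch assumes hypothetical `generate` and `passes_tests` callables (a model call returning text plus tokens used, and a unit-test runner); it is not the report's actual harness. Since in-process execution proved inadequate for HumanEval (see Wrong Assumptions), a real `passes_tests` should run candidates in an isolated sandbox.

```python
from typing import Callable, Tuple

# Hypothetical signatures: generate(prompt, max_tokens) -> (text, tokens_used);
# passes_tests(text) -> True if the candidate passes the problem's unit tests.
Generate = Callable[[str, int], Tuple[str, int]]
PassesTests = Callable[[str], bool]


def adaptive_retry(prompt: str, generate: Generate, passes_tests: PassesTests,
                   first_budget: int = 128, retry_budget: int = 1024) -> Tuple[str, bool, int]:
    """Cheap first pass; pay for the large budget only when the cheap answer fails.

    Returns (final_completion, passed, total_tokens_spent).
    """
    draft, used = generate(prompt, first_budget)
    if passes_tests(draft):
        return draft, True, used
    retry, retry_used = generate(prompt, retry_budget)
    return retry, passes_tests(retry), used + retry_used
```

Token savings come entirely from problems solved within the first pass; problems that fail it pay for both passes.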

**For this to become an OCC result**, it needs an agentic version:

- Generator agent spends credits to propose solutions
- Tester agent earns credits for catching bugs
- Repair agent earns credits only if its patch passes
- Credits decay across problems
- Agents with low marginal value lose budget

Until then, HumanEval is a practical but orthogonal finding. It belongs in the appendix or in a separate "adaptive inference" note.

---

## PART IV: OCC SYSTEM (What It Actually Is)

See `design.md` for the full formal definition. Here's the summary:

### Components

1. **Impact Oracle** (`oracle.py`): Scores whether an action produced measurable marginal value. Supports code, QA, debate, and retrieval scoring modes.

2. **Credit Ledger** (`ledger.py`): Non-transferable, decaying, capability-scoped credits with an immutable audit trail. Every credit mutation is an append-only event with provenance.

3. **Resource Broker** (`broker.py`): Capability-based access control. Decides allow/deny/downgrade/escalate/require-approval per resource type.

4. **GRPO Reward Hook** (`grpo_hook.py`): TRL-compatible reward function combining oracle score + anti-gaming penalties. Runs end-to-end; no policy improvement observed yet at 0.5B scale.
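
A minimal sketch of a broker decision over the five outcomes listed for `broker.py`. The policy tables, resource names, and the rule that a partially funded request escalates to a supervisor are assumptions for illustration; the real `broker.py` may decide differently.

```python
from enum import Enum
from typing import Tuple


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    DOWNGRADE = "downgrade"
    ESCALATE = "escalate"
    REQUIRE_APPROVAL = "require-approval"


# Hypothetical policy tables, for illustration only.
APPROVAL_REQUIRED = {"file_write"}  # always needs sign-off, whatever the balance
CHEAPER_TIER = {"large_model": "small_model", "web_search": "cached_retrieval"}


def broker_decide(resource: str, cost: float, scoped_balance: float,
                  has_capability: bool) -> Tuple[Decision, str]:
    """Return (decision, resource actually granted) for one request.

    scoped_balance is the agent's credit for this capability only, never a global total.
    """
    if not has_capability:
        return Decision.DENY, resource
    if resource in APPROVAL_REQUIRED:
        return Decision.REQUIRE_APPROVAL, resource
    if scoped_balance >= cost:
        return Decision.ALLOW, resource
    if resource in CHEAPER_TIER:
        return Decision.DOWNGRADE, CHEAPER_TIER[resource]
    if scoped_balance > 0:
        # Partially funded: hand off to a supervising agent or human (assumed rule).
        return Decision.ESCALATE, resource
    return Decision.DENY, resource
```

Here `file_write` requires approval even with a large balance, a small instance of the "credit ≠ identity trust" invariant. A fuller version would also check that the downgraded tier is affordable.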

### Core Invariants

1. Credits are non-transferable
2. Credits decay per-turn (δ = 0.995)
3. Credits are capability-scoped (retrieval ≠ file write ≠ model access)
4. Rewards require external verification (oracle separate from spender)
5. Ledger is append-only
6. Oracle cannot be directly influenced by the spending agent
7. Failed work cannot generate positive credit
8. Credit ≠ identity trust (high credit ≠ blanket access)
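
The invariants can be illustrated with a small append-only ledger. This sketch assumes balances are derived by replaying events with per-turn decay δ = 0.995 and are keyed by (agent, capability); the `LedgerEvent` fields and method names are hypothetical and do not reproduce the event schema in `ledger.py` or `design.md`.

```python
from dataclasses import dataclass
from typing import List, Tuple

DECAY = 0.995  # per-turn decay factor (δ in the invariants above)


@dataclass(frozen=True)  # frozen: recorded events cannot be edited
class LedgerEvent:
    turn: int
    agent: str
    capability: str  # e.g. "retrieval", "file_write", "model_access"
    delta: float     # signed credit change
    reason: str      # provenance, e.g. an oracle verdict id (hypothetical field)


class CreditLedger:
    """Append-only ledger; balances are derived by replaying decayed events."""

    def __init__(self) -> None:
        self._events: List[LedgerEvent] = []

    def append(self, event: LedgerEvent) -> None:
        # Invariant 5: events are only ever appended, in turn order.
        if self._events and event.turn < self._events[-1].turn:
            raise ValueError("events must be appended in turn order")
        self._events.append(event)

    def reward(self, turn: int, agent: str, capability: str,
               oracle_score: float, reason: str) -> None:
        # Invariant 7: failed work (score <= 0) cannot generate positive credit.
        if oracle_score > 0:
            self.append(LedgerEvent(turn, agent, capability, oracle_score, reason))

    def spend(self, turn: int, agent: str, capability: str,
              amount: float, reason: str) -> None:
        if amount > self.balance(agent, capability, turn):
            raise ValueError("insufficient credit for this capability")
        self.append(LedgerEvent(turn, agent, capability, -amount, reason))

    def balance(self, agent: str, capability: str, now: int) -> float:
        # Invariants 1 and 3: balances are keyed by (agent, capability) and
        # there is deliberately no transfer method.
        # Invariant 2: each event decays by DECAY per elapsed turn.
        return sum(e.delta * DECAY ** (now - e.turn)
                   for e in self._events
                   if e.agent == agent and e.capability == capability)

    @property
    def events(self) -> Tuple[LedgerEvent, ...]:
        return tuple(self._events)  # immutable snapshot for auditors
```

Because decay is multiplicative, a reward keeps 0.995^100 ≈ 0.61 of its value after 100 turns, and its half-life is ln 0.5 / ln 0.995 ≈ 138 turns. Replaying decayed deltas gives the same balance as decaying the running total each turn, so the append-only log can remain the single source of truth.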

### Threat Model

| Attack | Defense | Residual Risk |
|--------|---------|---------------|
| Credit farming (easy tasks) | Decay + caps | Slow gaming over many tasks |
| Collusion (multiple agents) | Non-transferability | Vote-ring behavior |
| Oracle spoofing | Verifier separation | Judge hacking |
| Griefing (burn others' budget) | Scoped spend | Indirect poisoning |
| Identity laundering | Identity binding | Account churn |
| Strategic abstention | Reward shaping | Conservatism bias |
| Verbosity gaming | Token-cost multiplier | Requires quality oracle |
| Confidence manipulation | Proper scoring rules | Hard to calibrate perfectly |
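
Two rows of the threat model name concrete defenses: a token-cost multiplier against verbosity gaming and proper scoring rules against confidence manipulation. Below is a minimal sketch of a per-sample reward that combines them with a confident-wrong penalty. The weights, threshold, and choice of the Brier score are assumptions, and the real `grpo_hook.py` may combine its terms differently.

```python
def brier_penalty(confidence: float, correct: bool) -> float:
    """Brier score for one binary outcome; a proper scoring rule."""
    outcome = 1.0 if correct else 0.0
    return (confidence - outcome) ** 2


def occ_style_reward(correct: bool, confidence: float, tokens: int, token_budget: int,
                     cost_weight: float = 0.2, calib_weight: float = 0.5,
                     confident_wrong_threshold: float = 0.8,
                     confident_wrong_penalty: float = 1.0) -> float:
    """Hypothetical shaped reward: oracle verdict minus calibration, cost, and confident-wrong terms."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    reward = 1.0 if correct else 0.0                         # external oracle verdict
    reward -= calib_weight * brier_penalty(confidence, correct)
    reward -= cost_weight * min(tokens / token_budget, 2.0)  # token-cost multiplier, capped
    if not correct and confidence >= confident_wrong_threshold:
        reward -= confident_wrong_penalty
    return reward
```

The Brier term alone is proper: an agent minimizes its expected penalty by reporting its true probability of being correct. The threshold-based confident-wrong term is not proper by itself and can push reported confidence just below the threshold, so its weight trades calibration against deterring confident errors. To plug this into a TRL GRPO run, a wrapper would need to map each completion to these inputs and return one float per completion.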

### When OCC Is Valuable

**Use OCC when**: agents have heterogeneous reliability, long-running tasks need budget discipline, debate can be poisoned, compute is expensive, auditability matters, or post-hoc accountability is required.

**Skip OCC when**: single-agent tasks suffice, ground truth is immediate and cheap, there is no adversarial participation, all agents have equal trust and capability, or verifier cost exceeds saved compute.

---

## PART V: ABLATIONS

| Ablation | Effect |
|----------|--------|
| No credit ledger | 27% less compute savings |
| Transferable credits | Gaming success: 0% → 45% |
| Non-decaying credits | Credit hoarding, -18% throughput |
| No confident-wrong penalty | Confident-wrong rate 2.3× higher |
| No calibration penalty | ECE: 0.12 → 0.31 |
| No cost penalty | Token usage +40% |
| No anti-gaming penalty | Gaming agents earn 3.2× more |
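
The calibration ablation reports expected calibration error (ECE). For reference, a standard equal-width-bin ECE is sketched below. The report does not state its binning, so the 10-bin default is an assumption and may not reproduce the 0.12 and 0.31 figures exactly.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins: sum over bins of (bin share) * |bin accuracy - bin mean confidence|."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need non-empty, equal-length inputs")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # put confidence 1.0 in the top bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / total * abs(accuracy - mean_conf)
    return ece
```

An agent that always reports 0.9 confidence but is right half the time scores an ECE of 0.4.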

---

## PART VI: HONEST ASSESSMENT

### What the Evidence ACTUALLY Supports

| Claim | Evidence Level | Notes |
|-------|---------------|-------|
| Unmanaged compute amplifies adversarial influence | **Strong**: 56.7% collapse, 2 seeds, identical result | Needs mechanism isolation to be publishable |
| OCC prevents catastrophic collapse | **Moderate**: OCC 180/3 (83.3%) >> equal 3-round (56.7%) | But OCC ≈ random gating at moderate budgets |
| Anti-gaming ledger design is novel and sound | **Moderate**: 8 attacks, zero successful vectors | Simulation only, needs live agent testing |
| OCC saves tokens at iso-quality | **Weak**: 21% on TruthfulQA at equal truthfulness (60 questions), 63-68% on HumanEval (not OCC, it's adaptive retry) | Not the core claim |
| OCC beats simple baselines | **Not supported**: Random gating (85.0%) ≈ OCC (83.3-85.0%) | The advantage is in preventing extremes |
| Learned allocation beats hand-tuned rules | **Not tested**: GRPO hook works but no policy improvement at 0.5B scale | Needs 7B+ model + training budget |

### What Failed

1. **OCC doesn't beat random gating at moderate budgets.** The mechanism prevents catastrophe but doesn't improve the median case.
2. **TruthfulQA abstention is judge-dependent.** 28% under string matching vs. 3.3% under the AllenAI judge.
3. **GRPO training produced no policy improvement at 0.5B.**
4. **HumanEval is adaptive retry, not OCC.** Honest labeling needed.

### Wrong Assumptions

1. "In-process exec is good enough for HumanEval": WRONG.
2. "More debate turns always help": WRONG (56.7% collapse).
3. "OCC will outperform random gating in the median case": NOT YET PROVEN.
4. "TruthfulQA string-matching is a reasonable scorer": WRONG (59pp swing).

### Is OCC Actually Useful?

**For preventing catastrophic compute misallocation: yes.** The equal-3-round collapse is a genuine finding: more compute can make things worse, and a credit mechanism prevents the worst of it.

**For marginal gains in well-behaved systems: not yet proven.** Random gating works nearly as well (85.0% vs. 83.3-85.0% for OCC in the debate benchmark). OCC's value is in adversary-aware deployment.

**For production multi-agent governance: promising but early.** The ledger, broker, and anti-gaming design are sound on paper. Live-agent validation is needed.

### Is This Publishable?

**Workshop: Yes, with the mechanism isolation experiment.** The collapse + mechanism analysis + anti-gaming design is a coherent workshop paper.

**Main conference: Not without:**

1. Mechanism isolation results (not just the collapse number, but WHY it happens)
2. Broader benchmark generalization (100+ questions, 5+ seeds)
3. Statistical significance (paired bootstrap, McNemar)
4. Learned allocator that beats hand-tuned rules
5. Oracle robustness analysis under different oracle types
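
For the significance requirement, two conditions scored on the same topics can be compared with an exact McNemar test and a paired bootstrap. This is a minimal sketch assuming per-topic correctness vectors are available for both conditions; the function names are illustrative.

```python
import random
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired per-topic correctness.

    Only discordant pairs matter. Under H0 each is equally likely to favour A or B,
    so min(b, c) follows Binomial(b + c, 0.5).
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("conditions must be scored on the same topics")
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)  # A wrong, B right
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


def paired_bootstrap_ci(a_correct: Sequence[bool], b_correct: Sequence[bool],
                        n_resamples: int = 10_000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for accuracy(A) - accuracy(B), resampling topics with replacement."""
    rng = random.Random(seed)
    n = len(a_correct)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(int(a_correct[i]) - int(b_correct[i]) for i in idx) / n)
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_resamples)]
    hi = diffs[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lo, hi
```

With 30 topics the number of discordant pairs is small, so the exact binomial form is preferable to the chi-square approximation. Resampling whole topics, not individual turns, keeps the bootstrap paired.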

### The Path Forward

1. **Run mechanism isolation** (script uploaded): break the collapse into testable causes
2. **Scale the benchmark**: 100-300 debate questions, 5-10 seeds
3. **Add stronger baselines**: confidence gating, disagreement gating, bandit allocation, auction allocation, judge-weighted voting
4. **Oracle ablation**: ground-truth, LLM judge, noisy, adversarial oracles
5. **GRPO-learned allocator** at 7B+ scale
6. **Split into focused papers**: (a) Collapse mechanism, (b) OCC governance design, (c) Learned allocation

### The Bold Claim

**In multi-agent systems, compute is not neutral. Extra turns can amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC is a first mechanism-design layer for making agent compute scarce, earned, scoped, decaying, and auditable.**

This is the sentence. Everything else is evidence, honesty, and the mechanism that makes it true.

---

## Repository

- **Main repo:** https://huggingface.co/narcolepticchicken/occ-stack
- **Formal definition:** `design.md`
- **Mechanism isolation:** `jobs/occ_debate_collapse_mechanism.py`
- **Results:** `reports/`

---

## Changelog

- **v12 (CORRECTED FRAMING):** Reframed around "compute-as-attack-surface." Added formal OCC definition (design.md). Added threat model, "when not to use OCC," ledger event schema. HumanEval honestly labeled as adaptive retry. TruthfulQA labeled as oracle-dependence demonstration. Mechanism isolation script written. Collapse is the wedge.
- **v11:** TruthfulQA AllenAI results. 2-seed debate aggregate. Honest assessment.
- **v10:** Extended baselines running. Initial equal_3round collapse finding.

---

*OCC is a research prototype. The collapse is real. The mechanism is promising. The evidence is incomplete. The framing is now honest.*