narcolepticchicken/occ-stack

narcolepticchicken committed 25 days ago

Commit 799bf90 (verified) · 1 Parent(s): 9731829

Upload reports/debate_real_results.json

Files changed (1)

reports/debate_real_results.json ADDED

+{
+  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
+  "date": "2026-05-07",
+  "num_topics": 30,
+  "equal_turns": {
+    "accuracy": 0.533,
+    "correct": 16,
+    "total_tokens": 61440,
+    "decision_quality_per_1k_tokens": 0.0087,
+    "notes": "Single round, 4 agents (3 honest + 1 adversarial), majority vote with unclear-filtered positions. High 'unclear' rate weakens this baseline."
+  },
+  "occ": {
+    "accuracy": 0.833,
+    "correct": 25,
+    "total_tokens": 138752,
+    "decision_quality_per_1k_tokens": 0.0060,
+    "rounds": 3,
+    "denied_agent_turns": 12,
+    "notes": "3 rounds with credit decay (-2 per 2 rounds). Broker denies agents below credit threshold 5. 12 agent-turns denied across all topics. Position extraction still noisy."
+  },
+  "caveats": {
+    "not_iso_compute": "OCC ran 3 rounds vs 1 round for equal turns. The 2.3x token increase is expected. For iso-compute comparison, need a 3-round equal-turns baseline.",
+    "position_extraction": "The extract_position() heuristic is too simplistic for nuanced model responses. Many positions classified as 'unclear'.",
+    "credit_scoring": "The score_arg() heuristic is crude (rewards presence of words like 'because'). A proper verifier-based scorer would improve OCC allocation decisions."
+  }
+}
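
The derived fields in this report can be cross-checked from the raw counts. The sketch below is not part of the commit. It assumes that `accuracy` is `correct / num_topics` and that `decision_quality_per_1k_tokens` is accuracy divided by thousands of tokens; the file does not state either formula, but both reproduce the reported values. The helper names (`accuracy`, `quality_per_1k`, `summary`, `token_ratio`) are hypothetical.

```python
# Raw counts copied from reports/debate_real_results.json.
NUM_TOPICS = 30
results = {
    "equal_turns": {"correct": 16, "total_tokens": 61440},
    "occ": {"correct": 25, "total_tokens": 138752},
}


def accuracy(correct: int, num_topics: int) -> float:
    """Fraction of topics decided correctly."""
    return correct / num_topics


def quality_per_1k(correct: int, num_topics: int, total_tokens: int) -> float:
    """Assumed definition: accuracy per 1,000 tokens spent."""
    return accuracy(correct, num_topics) / (total_tokens / 1000)


# Recompute the derived fields with the same rounding the report uses.
summary = {
    name: {
        "accuracy": round(accuracy(r["correct"], NUM_TOPICS), 3),
        "decision_quality_per_1k_tokens": round(
            quality_per_1k(r["correct"], NUM_TOPICS, r["total_tokens"]), 4
        ),
    }
    for name, r in results.items()
}

# Token cost of OCC relative to the single-round equal-turns baseline.
token_ratio = results["occ"]["total_tokens"] / results["equal_turns"]["total_tokens"]

for name, fields in summary.items():
    print(name, fields)
print("token_ratio", round(token_ratio, 2))
```

Under these assumptions the recomputed values agree with the file, and the token ratio comes out near 2.26, which the `not_iso_compute` caveat rounds to 2.3x. Because OCC spends more tokens per topic, its per-1k-token quality is lower than the baseline's even though its accuracy is higher, which is why the caveat asks for a 3-round equal-turns run before drawing conclusions.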