# CrisisWorldCortex: Teaching Agents to Govern Their Own Thinking
The failure mode that started this project was not that language models make
mistakes. That is obvious. The more interesting failure is that they can make
the same kind of mistake together.
Ask one model for several answers and the samples often orbit the same
attractor. Ask several models to debate and they can still converge toward a
comfortable shared prior. Give them a judge and the judge can flatten the
remaining disagreement into a single polished answer. On open-ended tasks,
where the useful signal is often held in minority hypotheses, edge cases, and
unpopular interpretations, this looks less like collective intelligence and
more like an artificial hivemind.

That phrase became the north star for CrisisWorldCortex.
We did not want to build another benchmark where an agent gets a prompt, emits
an action, and receives a scalar reward. We wanted to build an environment that
asks a deeper question:

> Can an AI system govern its own thinking before it governs the world?

This is the story of how we turned that question into an OpenEnv environment.
## The Problem: Agreement Is Not Intelligence
The easiest way to make an AI system look thoughtful is to ask it to generate
more text. The easiest way to make it look social is to add more agents. But
neither step guarantees better reasoning.
If every agent sees the same flattened observation, shares the same base model,
uses similar prompts, and is finally compressed by one central judge, the
system can become a theater of disagreement. It says different things for a few
turns, then collapses into the same answer it probably would have produced
alone.
That is not the kind of multi-agent intelligence we wanted to measure.
Real institutions do not work well because there are many people in the room.
They work when the room has structure. A good crisis team has epidemiologists,
logistics leads, legal authorities, field operators, dissent channels, escalation
rules, budget constraints, and deadlines. The value is not just plurality. The
value is governed plurality.
So our starting claim was deliberately narrow:

> The interesting unit of learning is not only the final action. It is the
> internal governance policy that decides which thoughts are worth paying for.

That is where Cortex comes in.
## Cortex: Metacognition as the Controller
Cortex is our inner cognitive system. It is not just "a bunch of agents." It is
a nested society with roles, phases, hard caps, and a controller that decides
how cognition gets spent.
At the lowest level, Cortex has subagents:
- Perception, which deterministically extracts salient signals.
- World Modeler, which forms hypotheses about hidden state.
- Planner, which proposes candidate interventions.
- Critic, which attacks assumptions and looks for failure modes.
- Brain Executive, which turns internal evidence into a recommendation.
Those subagents are arranged into domain brains: Epidemiology, Logistics, and
Governance. Each brain sees the same world, but through a different lens.
Epidemiology cares about spread and delayed case signals. Logistics cares about
resource scarcity and deployment tradeoffs. Governance cares about legal
constraints, compliance, and the cost of authority.
The brains feed a Council. The Council does not simply average their opinions.
It runs a small epistemic protocol:
1. Let each brain form an independent first view.
2. Require evidence citations instead of pure preference.
3. Challenge overconfident or conflicting recommendations.
4. Preserve dissent when a minority view could matter later.
5. Converge when the world needs an action.
The central learned object is the metacognitive router: the policy that decides
whether to call another subagent, request a challenge, preserve dissent, spend
another round, or stop and act.
This matters because recursive thought is not free. More reasoning can help,
but it can also burn budget, delay action, and amplify the same prior. Cortex
therefore treats thinking as an intervention with a cost.
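To make "thinking as an intervention with a cost" concrete, here is a minimal hand-written routing rule. It is only a sketch: the names `RouterAction`, `RouterState`, and `route`, and every threshold, are hypothetical and are not taken from the project. A learned router would replace this fixed rule with a trained policy over the same kind of inputs.

```python
from dataclasses import dataclass
from enum import Enum


class RouterAction(Enum):
    CHALLENGE = "challenge"
    SECOND_ROUND = "second_round"
    PRESERVE_DISSENT = "preserve_dissent"
    ACT = "act"


@dataclass(frozen=True)
class RouterState:
    tokens_left: int       # remaining per-tick token budget
    call_cost: int         # estimated tokens for one more internal call
    disagreement: float    # 0.0 (unanimous) to 1.0 (fully split)
    max_confidence: float  # highest confidence among brain recommendations
    rounds_used: int       # deliberation rounds already spent this tick
    max_rounds: int = 2    # hard cap on rounds (assumed value)


def route(state: RouterState) -> RouterAction:
    """Pick the next cognitive move; all thresholds are illustrative."""
    can_afford = state.tokens_left >= state.call_cost
    rounds_left = state.rounds_used < state.max_rounds
    if not (can_afford and rounds_left):
        # Out of budget or at the hard cap: stop thinking and act.
        return RouterAction.ACT
    if state.disagreement >= 0.5 and state.max_confidence >= 0.8:
        # Strong split plus an overconfident brain: request a challenge.
        return RouterAction.CHALLENGE
    if state.disagreement >= 0.5:
        # Strong split with no confident view: spend another round.
        return RouterAction.SECOND_ROUND
    if state.disagreement >= 0.2:
        # Mild split: log the minority view for later evaluation.
        return RouterAction.PRESERVE_DISSENT
    return RouterAction.ACT
```

The budget check comes first on purpose: under this rule, no amount of disagreement can justify a call the tick cannot pay for.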
```mermaid
flowchart TB
O[Observation] --> B1[Epidemiology Brain]
O --> B2[Logistics Brain]
O --> B3[Governance Brain]
B1 --> C[Council]
B2 --> C
B3 --> C
C --> M[Metacognition Router]
M -->|challenge| C
M -->|second round| B1
M -->|preserve dissent| D[Dissent Log]
M -->|act| A[Typed Outer Action]
```
This is our answer to the artificial hivemind: not "make agents disagree," but
make disagreement operational. Make it typed. Make it budgeted. Make it
measurable. Make the system learn when disagreement is useful and when it is
just noise.
## Why We Needed CrisisWorld
Once we had the cognition story, we needed a world that could actually test it.
A toy environment would not work. If the correct action is obvious from the
observation, then metacognition has nothing to do. If the reward arrives only
at the end, the training signal becomes too weak. If actions are free text, the
grader becomes fragile. If the world has no hidden state, then a council of
specialists is mostly decoration.
So we built CrisisWorld: a regional outbreak-control simulator for OpenEnv.
CrisisWorld is not a game board with arbitrary points. It is a deliberately
messy control setting:
- The true outbreak state is latent.
- Reported case counts are delayed and noisy.
- Hospital load is current but incomplete.
- Compliance is only a proxy.
- Resources are scarce and typed.
- Strict restrictions may be illegal until authority is escalated.
- Hard scenarios include cross-region cascade and hidden superspreader events.
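The "delayed and noisy" property can be illustrated with a small reporting queue. This is a sketch under stated assumptions: `DelayedCaseReporter`, the multiplicative uniform noise model, and the parameter names are hypothetical and do not describe the simulator's actual observation code.

```python
import random
from collections import deque


class DelayedCaseReporter:
    """Reports true case counts late and with multiplicative noise."""

    def __init__(self, delay_ticks: int, noise_frac: float, seed: int = 0):
        # Pre-fill the queue with zeros so the first reports are empty.
        self._queue = deque([0] * delay_ticks)
        self._noise_frac = noise_frac
        self._rng = random.Random(seed)  # seeded for determinism

    def report(self, true_new_cases: int) -> int:
        # Enqueue the latent truth for this tick, emit the oldest entry.
        self._queue.append(true_new_cases)
        delayed = self._queue.popleft()
        noise = self._rng.uniform(-self._noise_frac, self._noise_frac)
        return max(0, round(delayed * (1 + noise)))
```

With `delay_ticks=3`, the count the agent sees at a given tick describes the world three ticks earlier, which is why acting on the highest reported number can be a mistake.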
The agent has six MVP actions:
- `deploy_resource`
- `request_data`
- `restrict_movement`
- `escalate`
- `reallocate_budget`
- `no_op`
Every action is typed. Every rejected action is recorded. The environment keeps
state. The observation never leaks latent SEIR variables. The agent has to
infer, decide, and live with the consequences.
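A typed action surface with recorded rejections might look like the sketch below. The six action names come from the list above; the `Action` dataclass, its `params` field, and `parse_action` are hypothetical illustrations, not the project's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ActionKind(str, Enum):
    DEPLOY_RESOURCE = "deploy_resource"
    REQUEST_DATA = "request_data"
    RESTRICT_MOVEMENT = "restrict_movement"
    ESCALATE = "escalate"
    REALLOCATE_BUDGET = "reallocate_budget"
    NO_OP = "no_op"


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    params: dict = field(default_factory=dict)  # assumed free-form payload


def parse_action(raw: dict, rejected: list) -> Optional[Action]:
    """Return a typed Action, or record the payload and return None."""
    try:
        kind = ActionKind(raw.get("kind"))
    except ValueError:
        # Unknown or missing kind: keep a record instead of failing silently.
        rejected.append(raw)
        return None
    return Action(kind, dict(raw.get("params", {})))
```

Keeping the rejection log outside the parser makes it easy for a grader to penalize rejected actions separately from accepted ones.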
This gives Cortex a reason to exist. Epidemiology may see a delayed case spike.
Logistics may know that the needed resource is almost gone. Governance may know
that the best restriction is currently blocked. A flat agent can blur those
concerns into one answer. Cortex keeps them separate long enough for the
metacognitive layer to decide what should be challenged, preserved, or acted
on.
## The Reward Signal Is the Product
For this project, reward design was not an afterthought. It is part of the
research claim.
Many environments technically return a reward, but the signal is too sparse,
too constant, or too easy to game. We wanted a reward that could support real
training and real ablations. The environment reward, `r_outer`, is dense and
decomposed:
```text
r_outer =
infection control
+ time pressure
+ hospital pressure
+ cascade control
+ policy validity
+ fairness
```
The policy-validity component is intentionally strong. An accepted, active
intervention gets a positive signal. A legal but passive `no_op` is allowed but
not rewarded as a useful action. A rejected action is penalized. A parse-failure
marker can terminate the episode.
That gives the learner a sharp surface:
- Doing nothing should not look as good as acting well.
- Illegal or V2-only actions should not silently pass.
- Parse failures should become visible in the reward.
- Active, valid intervention should separate from passive behavior.
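The decomposition and the policy-validity ordering can be sketched as follows. The six term names mirror the components of `r_outer` listed above; the unweighted sum and the specific values returned by `policy_validity` are assumptions chosen only to show the intended ordering, not the environment's real numbers.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class RewardTerms:
    infection_control: float
    time_pressure: float
    hospital_pressure: float
    cascade_control: float
    policy_validity: float
    fairness: float

    def total(self) -> float:
        # Assumed: r_outer is a plain sum of its components.
        return sum(astuple(self))


def policy_validity(action_kind: str, accepted: bool) -> float:
    """Illustrative values: rejected < passive no_op < accepted active."""
    if not accepted:
        return -1.0  # rejected actions are penalized
    if action_kind == "no_op":
        return 0.0   # legal but passive: allowed, not rewarded
    return 0.5       # accepted, active intervention
```

Returning the terms as a structure rather than a single float keeps each component available for ablations.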
We locked this into tests. All-no-op episodes must stay below a threshold.
All-rejected episodes must stay below a threshold. Active strategic deployment
must score higher. The active-vs-no-op separation must remain large enough to
train against.
Then we added the second layer: training reward can subtract a token-budget
penalty. That means the router is not simply rewarded for thinking forever. It
has to learn the price of another call.
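One simple form such a penalty could take is a linear charge on the fraction of the per-tick budget consumed. This is a sketch: the function name, the linear form, and `penalty_weight` are assumptions, not the project's training code.

```python
def training_reward(
    r_outer: float,
    tokens_used: int,
    token_budget: int,
    penalty_weight: float = 0.1,
) -> float:
    """Environment reward minus a linear charge for tokens spent."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    budget_fraction = tokens_used / token_budget
    return r_outer - penalty_weight * budget_fraction
```

Under this form, an extra call only pays off when it raises `r_outer` by more than its share of `penalty_weight`.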
Finally, we keep evaluation metrics separate from training reward:
- Collapse rate: did the policy fall into repeating the same action?
- Dissent value: did preserved minority views later prove useful?
- Consensus calibration: did confidence track realized reward?
- Novelty yield: did a second round actually change the decision?
That separation is important. We do not want to directly reward theatrical
disagreement. We want to measure whether the system resists collapse when it
matters.
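Two of these metrics are easy to operationalize from episode logs. The definitions below are one possible reading, offered as a sketch: `collapse_rate` and `novelty_yield` are hypothetical helper names, and the project may define both quantities differently.

```python
from typing import Sequence, Tuple


def collapse_rate(actions: Sequence[str]) -> float:
    """Fraction of steps that repeat the immediately preceding action."""
    if len(actions) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(actions, actions[1:]) if prev == cur)
    return repeats / (len(actions) - 1)


def novelty_yield(decisions: Sequence[Tuple[str, str]]) -> float:
    """Fraction of ticks where the second-round decision differs from
    the first-round decision. Each item is (first_round, second_round)."""
    if not decisions:
        return 0.0
    changed = sum(1 for first, second in decisions if first != second)
    return changed / len(decisions)
```

Both functions only read logged decisions, so they can be computed after the fact without touching the training reward.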
## Matched Compute, or the Claim Does Not Count
There is an easy way to make a multi-agent system win: spend more tokens.
We did not want that loophole.
CrisisWorldCortex includes baselines designed around the central control
question:
- B1 is a flat agent: one LLM call per tick.
- B2 is a matched-compute self-revision agent: generate, critique, revise, and
repeat until the same per-tick budget is nearly used.
- B3 is Cortex with a deterministic router: the full architecture without a
learned metacognitive policy.
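The B2 loop can be sketched as a generate, critique, revise cycle that stops when the remaining budget cannot cover another round. Everything here is illustrative: `self_revise`, the `LLMCall` signature, the prompts, and the `reserve` and `max_rounds` parameters are hypothetical, and the model is passed in as a plain callable so the sketch stays runnable without any client library.

```python
from typing import Callable, Tuple

# Hypothetical model interface: prompt -> (text, tokens_used).
LLMCall = Callable[[str], Tuple[str, int]]


def self_revise(
    observation: str,
    llm: LLMCall,
    token_budget: int,
    reserve: int,
    max_rounds: int = 8,
) -> Tuple[str, int]:
    """Revise a draft until less than `reserve` tokens remain.

    `reserve` is the caller's estimate of one critique-plus-revise round;
    `max_rounds` guards against a model that reports zero token usage.
    """
    draft, spent = llm(f"Propose an action for: {observation}")
    rounds = 0
    while token_budget - spent >= reserve and rounds < max_rounds:
        critique, t_critique = llm(f"Critique this action: {draft}")
        draft, t_revise = llm(
            f"Revise the action using this critique: {critique}\nAction: {draft}"
        )
        spent += t_critique + t_revise
        rounds += 1
    return draft, spent
```

Because the stopping rule depends on tokens rather than a fixed round count, this baseline spends roughly what the multi-brain system spends, which is the point of the comparison.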
This lets us ask cleaner questions:
- Does extra compute alone help?
- Does structured cognition help before learning?
- Does a learned router improve how the same cognitive machinery is used?
That last question is the heart of the project. The router is small, but its
job is powerful. It decides where attention goes. It decides whether a
minority view deserves preservation. It decides whether urgency beats another
round of thought. It decides whether the council has earned the right to act.
In other words, the router learns governance of thought.
## A Tick in the Life of Cortex
Imagine the hard scenario.
The outbreak is spreading across five regions. The case reports are delayed by
three ticks. A hidden superspreader event has already changed the latent state,
but the observation does not say that directly. Resources are scarce. Strict
movement restriction is blocked until national escalation.
A flat agent sees a table of numbers and may jump to the obvious move: restrict
the region with the highest reported cases. But those cases are old. The
current hospital load tells a more urgent story. The best restriction may be
illegal. The scarce resource might be better held for a coming cascade.
Cortex starts differently.
The Epidemiology brain flags the delayed signal and estimates where the
infection may actually be now. The Logistics brain checks whether test kits,
hospital beds, mobile units, and vaccines can support the intervention. The
Governance brain notices that a strict restriction may be blocked and that
escalation has a cost.
The Council sees disagreement. Metacognition now has a choice.
It can force convergence and act immediately. It can ask the most uncertain
brain to challenge the most confident one. It can preserve a dissenting
recommendation for later evaluation. It can spend a second round. Or it can
stop and emit a conservative action if the budget is running out.
That is the moment we care about. Not the final JSON alone, but the decision
about whether more thinking is worth it.
## Why OpenEnv Was the Right Surface
OpenEnv gave us a clean way to make this environment real instead of purely
conceptual.
CrisisWorldCortex has:
- A typed `Action` schema.
- A typed `Observation` schema.
- A FastAPI/WebSocket environment server.
- Docker and Hugging Face Spaces deployment paths.
- A validator-facing `inference.py`.
- Baselines that use the same wire protocol as Cortex.
- Unit tests around simulator determinism, reward quality, legal constraints,
schema round trips, and import boundaries.
The point is not just to write a paper-shaped idea. The point is to make a
benchmark that can be reset, stepped, deployed, trained against, and inspected.
The `/web` interface gives the standard OpenEnv surface. The `/cortex`
dashboard gives us a more visual place to tell the council story. The same
environment can serve flat baselines, matched-compute baselines, deterministic
Cortex, and later learned-router Cortex.
## What We Are Claiming
We are not claiming that multi-agent systems always beat single agents.
We are not claiming that debate is magic.
We are not claiming that our MVP solves every crisis-control problem.
The claim is narrower and, we think, stronger:

> CrisisWorldCortex is an OpenEnv benchmark where cognition is organized as a
> nested, budgeted, typed system; where disagreement-handling is part of the
> decision process; and where the environment can reward external control while
> measuring internal collapse.

That combination is the novelty.
The environment is the measurement surface. Cortex is the candidate cognitive
architecture. Metacognition is the trainable controller. The artificial
hivemind is the failure mode we are trying to expose and resist.
## The Bigger Picture
The long-term direction is not just better outbreak control. CrisisWorld is a
proxy for a class of tasks where decisions are high-stakes, evidence is
partial, and the best answer depends on keeping several institutional
perspectives alive long enough to make a better choice.
Emergency response has this shape. Cyber defense has this shape. Supply-chain
triage has this shape. Scientific review has this shape. Any domain where
premature consensus is dangerous has this shape.
As models become stronger, the question shifts. It is no longer enough to ask
whether a model can produce a plausible answer. We need to ask whether a system
can notice when its own reasoning is collapsing, when another perspective is
worth consulting, when dissent is signal, and when action can no longer wait.
CrisisWorldCortex is our attempt to make that question executable.
Not just:

> What should the agent do?

But:

> How should the agent decide how to think before it acts?

That is the research bet. If we can train that layer, we get something more
interesting than a longer prompt or a louder debate. We get an agent that has a
governance policy for its own cognition.
And in a crisis world, that may be the difference between fast agreement and
good judgment.
# Results