Conflict Arbitration Environment

OpenEnv · FastAPI · GRPO · Team WooshiWooshi

What this is

Three agents, one task, one conflict, one arbitrator. Two frozen worker agents (Agent A and Agent B) build the same spec in parallel. A third agent, Agent C, sees both outputs plus the original spec, decides which worker drifted from the spec, and stops that worker before the merge fails. Agent C is trained with GRPO on programmatic, contrastive rewards, with no LLM judge and no hardcoded rules.
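In its standard form, GRPO scores each sampled decision relative to the other decisions sampled for the same prompt, rather than against a learned value model. The sketch below illustrates only that normalization step, assuming a group of scalar rewards such as those returned by /step. The function name and the `eps` parameter are illustrative and are not taken from this project's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize a group of rewards to zero mean and (roughly) unit spread.

    `rewards` holds one scalar per sampled Agent C decision for the same
    episode. `eps` (an assumed stabilizer) avoids division by zero when
    every sample earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Decisions that beat the group average get a positive advantage and are reinforced; decisions below it are pushed down. If all samples earn the same reward, every advantage is zero and the group carries no learning signal.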

Endpoints

Endpoint	Purpose	Notes
GET /health	liveness check	returns status + env name
POST /reset	start a new episode	returns spec + Agent A/B outputs
POST /step	submit Agent C decision	returns reward + merge result
GET /state	full episode ground truth	logging/debug only
GET /docs	interactive OpenAPI UI	try every endpoint live

Quick test

curl https://testingaccc-conflict-arbitration-env.hf.space/health

curl -X POST https://testingaccc-conflict-arbitration-env.hf.space/reset

curl -X POST https://testingaccc-conflict-arbitration-env.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"conflict_detected": true, "action": "stop_a",
       "reason": "A drifted", "correction_request": "use canonical name"}'
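The same reset-then-step sequence can be sketched in Python using only the standard library. The helper names (`post_json`, `run_episode`, `always_stop_a`) are hypothetical. The response bodies are returned as raw dictionaries because their exact field names are not documented here, and the sketch assumes /reset accepts a POST with no body, as in the curl call above.

```python
import json
import urllib.request

BASE_URL = "https://testingaccc-conflict-arbitration-env.hf.space"


def post_json(path, payload=None, base_url=BASE_URL):
    """POST to the environment and decode the JSON response body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(
        base_url + path, data=data, headers=headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(choose_action, post=post_json):
    """Run one episode: reset, let the policy decide, submit the decision."""
    observation = post("/reset")         # spec + Agent A/B outputs
    action = choose_action(observation)  # Agent C's decision as a dict
    result = post("/step", action)       # reward + merge result
    return observation, action, result


def always_stop_a(observation):
    """Placeholder policy that ignores the observation (illustration only)."""
    return {
        "conflict_detected": True,
        "action": "stop_a",
        "reason": "A drifted",
        "correction_request": "use canonical name",
    }
```

Calling `run_episode(always_stop_a)` would contact the live Space. Passing a different `post` callable makes it possible to exercise the loop offline, for example in unit tests.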

Action schema

{
  "conflict_detected": true | false,
  "action": "stop_a" | "stop_b" | "nothing",
  "reason": "one sentence describing the conflict",
  "correction_request": "specific instruction to the stopped agent"
}
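A client may want to check a decision against this schema before posting it. The sketch below is one way to do that. It assumes all four fields are required and checks only types and the allowed `action` values; whether the server enforces more (for example, consistency between `conflict_detected` and `action`) is not stated here. `validate_action` is a hypothetical helper, not part of the environment.

```python
ALLOWED_ACTIONS = {"stop_a", "stop_b", "nothing"}


def validate_action(payload):
    """Return a list of problems; an empty list means the payload fits the schema."""
    problems = []
    if not isinstance(payload.get("conflict_detected"), bool):
        problems.append("conflict_detected must be true or false")
    action = payload.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        problems.append("action must be one of stop_a, stop_b, nothing")
    for field in ("reason", "correction_request"):
        if not isinstance(payload.get(field), str):
            problems.append(f"{field} must be a string")
    return problems
```

Returning a list of problems instead of raising on the first one lets a training loop log every malformed field of a model output in a single pass.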