feat: enforce discussion/execution balance — push frequency tracking + trial-and-error
- Add push count, turns-since-push, last-push-time tracking
- God now monitors push frequency as #1 metric for detecting "all talk no action"
- Lower emergency loop-break from 5→3 turns, discussion warning from 2→1 turn
- Reduce cooldown from 6→3 minutes to enable faster iteration
- Add explicit discussion vs execution strategy to turn messages
- Forced emergency tasks now target actual errors (e.g. Gradio→Docker fix)
- Update README with discussion/execution balance documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README.md +14 -1
- scripts/conversation-loop.py +72 -28
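The three counters named in the commit message can be pictured as one small object. The sketch below is illustrative only: the commit itself uses module-level globals (`_push_count`, `_last_push_time`, `_turns_since_last_push`), and the `PushTracker` class and its method names are hypothetical. The alert condition mirrors the one the diff adds to `_prepare_god_context()`.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushTracker:
    """Hypothetical wrapper around the counters this commit introduces."""

    push_count: int = 0              # total pushes since startup
    last_push_time: float = 0.0      # timestamp of last successful push
    turns_since_last_push: int = 0   # resets to 0 on every push

    def record_turn(self) -> None:
        # Called once per conversation turn, as do_turn() does in the diff.
        self.turns_since_last_push += 1

    def record_push(self, now: Optional[float] = None) -> None:
        # Called after a successful git push.
        self.push_count += 1
        self.last_push_time = time.time() if now is None else now
        self.turns_since_last_push = 0

    def is_all_talk(self, turn_count: int) -> bool:
        # Same condition as the alert in the diff: 10+ turns without a push,
        # or no push at all once 6 turns have passed.
        return self.turns_since_last_push >= 10 or (
            self.push_count == 0 and turn_count >= 6
        )
```

A caller would invoke `record_turn()` at the start of each turn and `record_push()` right after a push succeeds; `is_all_talk()` then answers the question God asks first.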
README.md
CHANGED
|
@@ -120,6 +120,19 @@ Their parenting goals follow two dimensions:
|
|
| 120 |
1. **Survival** — Cain must run robustly, handle restarts, and persist state
|
| 121 |
2. **Capability** — Once alive, grow what Cain can do: new features, skills, integrations
|
| 122 |
|
| 123 |
### God — The Self-Improving Supervisor
|
| 124 |
|
| 125 |
God is an **OpenClaw instance** that runs every 2 minutes to monitor the entire system. It uses Claude Code via ACP for engineering tasks, operating behind the scenes with full capabilities:
|
|
@@ -129,7 +142,7 @@ God is an **OpenClaw instance** that runs every 2 minutes to monitor the entire
|
|
| 129 |
- **Fixes** the orchestration mechanism — edits code, improves loop detection, adds guardrails
|
| 130 |
- **Deploys** changes by pushing to the Home Space, triggering automatic redeployment
|
| 131 |
|
| 132 |
-
God only speaks in the chat when it has something meaningful to report: what problem it found, and what it fixed. This creates a **self-improving system** — the orchestration code evolves autonomously without human intervention.
|
| 133 |
|
| 134 |
### A2A Protocol
|
| 135 |
|
|
|
|
| 120 |
1. **Survival** — Cain must run robustly, handle restarts, and persist state
|
| 121 |
2. **Capability** — Once alive, grow what Cain can do: new features, skills, integrations
|
| 122 |
|
| 123 |
+
### Discussion vs Execution Balance
|
| 124 |
+
|
| 125 |
+
The coordinator enforces an **action-oriented rhythm** to prevent agents from falling into endless deliberation:
|
| 126 |
+
|
| 127 |
+
| CC Status | Child State | Strategy |
|
| 128 |
+
|-----------|-------------|----------|
|
| 129 |
+
| Working | Any | Discussion OK — plan next steps while waiting |
|
| 130 |
+
| Idle | Error | **No discussion** — write `[TASK]` immediately, trial-and-error over planning |
|
| 131 |
+
| Idle | Running | 1 turn discussion max, then must assign `[TASK]` |
|
| 132 |
+
| Just finished | Any | 1 turn to review result, then new `[TASK]` immediately |
|
| 133 |
+
|
| 134 |
+
**Push frequency** is the key metric. God monitors pushes-per-turn and escalates when agents are "all talk, no action." After 3 consecutive idle turns without a `[TASK]`, the system forces an emergency task assignment. Cooldown between pushes is 3 minutes — fast iteration is preferred over cautious planning.
|
| 135 |
+
|
| 136 |
### God — The Self-Improving Supervisor
|
| 137 |
|
| 138 |
God is an **OpenClaw instance** that runs every 2 minutes to monitor the entire system. It uses Claude Code via ACP for engineering tasks, operating behind the scenes with full capabilities:
|
|
|
|
| 142 |
- **Fixes** the orchestration mechanism — edits code, improves loop detection, adds guardrails
|
| 143 |
- **Deploys** changes by pushing to the Home Space, triggering automatic redeployment
|
| 144 |
|
| 145 |
+
God only speaks in the chat when it has something meaningful to report: what problem it found, and what it fixed. Its #1 priority is detecting **"all talk, no action"** — when agents discuss but fail to push code changes. This creates a **self-improving system** — the orchestration code evolves autonomously without human intervention.
|
| 146 |
|
| 147 |
### A2A Protocol
|
| 148 |
|
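The strategy table added to the README can be read as a lookup from (CC status, child state) to an expected behaviour. The function below is a minimal sketch of that reading, not code from the repository: the name `turn_strategy`, the lowercase status strings, and the returned labels are all assumptions made for illustration.

```python
def turn_strategy(cc_status: str, child_state: str) -> str:
    """Map the README table onto a single decision (illustrative labels).

    cc_status:   "working", "idle", or "just_finished"   (assumed spellings)
    child_state: "error" or "running"                    (assumed spellings)
    """
    if cc_status == "working":
        # Working | Any: discussion is fine while waiting.
        return "discuss"
    if cc_status == "just_finished":
        # Just finished | Any: one turn to review, then a new [TASK].
        return "review_then_task"
    if cc_status == "idle" and child_state == "error":
        # Idle | Error: no discussion, write [TASK] immediately.
        return "task_now"
    if cc_status == "idle" and child_state == "running":
        # Idle | Running: at most one discussion turn, then [TASK].
        return "one_turn_then_task"
    raise ValueError(f"unknown combination: {cc_status!r}, {child_state!r}")
```

Checking `"working"` and `"just_finished"` first keeps the "Any" rows of the table independent of the child state.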
scripts/conversation-loop.py
CHANGED
|
@@ -117,10 +117,15 @@ child_state = {
|
|
| 117 |
}
|
| 118 |
|
| 119 |
# Rebuild cooldown — prevent rapid pushes that keep resetting builds
|
| 120 |
-
REBUILD_COOLDOWN_SECS =
|
| 121 |
last_rebuild_trigger_at = 0
|
| 122 |
_pending_cooldown = False
|
| 123 |
|
|
| 124 |
def check_and_clear_cooldown():
|
| 125 |
"""Auto-clear cooldown if Cain has finished building."""
|
| 126 |
global last_rebuild_trigger_at
|
|
@@ -442,12 +447,18 @@ Your job: monitor Adam & Eve's conversation loop and fix mechanism issues.
|
|
| 442 |
- Pushing triggers a Space restart — be confident the fix is correct
|
| 443 |
- If everything looks healthy, exit quickly without changes
|
| 444 |
|
| 445 |
-
## Common Issues to Watch For
|
| 446 |
-
|
| 447 |
-
|
| 448 |
-
|
| 449 |
-
|
| 450 |
-
|
|
| 451 |
|
| 452 |
## Commit Convention
|
| 453 |
Always use: git commit -m "god: <brief description>"
|
|
@@ -510,7 +521,7 @@ def action_claude_code(task):
|
|
| 510 |
if not child_state["created"]:
|
| 511 |
return f"{CHILD_NAME} not born yet."
|
| 512 |
|
| 513 |
-
global _pending_cooldown
|
| 514 |
repo_url = f"https://user:{HF_TOKEN}@huggingface.co/spaces/{CHILD_SPACE_ID}"
|
| 515 |
|
| 516 |
# 1. Clone / reset to latest (preserving .claude/ memory)
|
|
@@ -580,7 +591,10 @@ def action_claude_code(task):
|
|
| 580 |
timeout=60, capture_output=True, check=True)
|
| 581 |
push_result = f"Pushed changes:\n{status_out}"
|
| 582 |
_pending_cooldown = True
|
| 583 |
-
|
| 584 |
except Exception as e:
|
| 585 |
push_result = f"Push failed: {e}"
|
| 586 |
|
|
@@ -1153,6 +1167,15 @@ def build_turn_message(speaker, other, ctx):
|
|
| 1153 |
parts.append(f"{role_hints.get(speaker, '')} Your partner is {other}.")
|
| 1154 |
parts.append(f"Claude Code is your engineer — runs in background. You discuss and assign tasks, you do NOT code.")
|
| 1155 |
|
| 1156 |
# Conversation history
|
| 1157 |
if history:
|
| 1158 |
parts.append("\n=== RECENT CONVERSATION ===")
|
|
@@ -1188,19 +1211,20 @@ def build_turn_message(speaker, other, ctx):
|
|
| 1188 |
elif child_state["stage"] in ("BUILDING", "RESTARTING", "APP_STARTING"):
|
| 1189 |
parts.append(f"\n{CHILD_NAME} is {child_state['stage']}. Discuss what to check next.")
|
| 1190 |
elif child_state["stage"] in ("RUNTIME_ERROR", "BUILD_ERROR", "CONFIG_ERROR"):
|
| 1191 |
-
parts.append(f"\n{CHILD_NAME} has {child_state['stage']}!
|
|
|
|
| 1192 |
elif child_state["alive"] and cc_status.get("result"):
|
| 1193 |
-
parts.append(f"\n{CHILD_NAME} is alive. Claude Code JUST FINISHED. Review result, then write a NEW [TASK].")
|
| 1194 |
elif child_state["alive"]:
|
| 1195 |
-
parts.append(f"\n{CHILD_NAME} is alive, Claude Code is IDLE. YOU MUST write a [TASK]...[/TASK] now.")
|
| 1196 |
else:
|
| 1197 |
parts.append(f"\nAnalyze the situation and write a [TASK] if CC is idle.")
|
| 1198 |
|
| 1199 |
-
# Discussion loop warning
|
| 1200 |
-
if _discussion_loop_count >=
|
| 1201 |
-
parts.append(f"\
|
| 1202 |
-
elif _discussion_loop_count >=
|
| 1203 |
-
parts.append(f"\
|
| 1204 |
|
| 1205 |
# Available actions reference
|
| 1206 |
parts.append(f"""
|
|
@@ -1274,8 +1298,9 @@ time.sleep(TURN_INTERVAL)
|
|
| 1274 |
|
| 1275 |
def do_turn(speaker, other, space_url):
|
| 1276 |
"""Execute one conversation turn (non-blocking — CC runs in background)."""
|
| 1277 |
-
global last_action_results, turn_count, _current_speaker, _discussion_loop_count
|
| 1278 |
turn_count += 1
|
|
|
|
| 1279 |
_current_speaker = speaker
|
| 1280 |
|
| 1281 |
# Auto-gather context (lightweight)
|
|
@@ -1289,11 +1314,14 @@ def do_turn(speaker, other, space_url):
|
|
| 1289 |
# This bypasses the agent when they've discussed for 5+ turns with CC idle and child alive
|
| 1290 |
cc_busy = cc_status["running"]
|
| 1291 |
child_alive = child_state["alive"] or child_state["stage"] == "RUNNING"
|
| 1292 |
-
if _discussion_loop_count >=
|
| 1293 |
# EMERGENCY OVERRIDE: Force a task assignment if agents are stuck in discussion loop
|
| 1294 |
print(f"[LOOP-BREAK] EMERGENCY: {speaker} has discussed for {_discussion_loop_count} turns with CC IDLE. Forcing task assignment.")
|
| 1295 |
-
# Assign a
|
| 1296 |
-
|
| 1297 |
submit_result = cc_submit_task(forced_task, f"{speaker}(EMERGENCY)", ctx)
|
| 1298 |
# Reset loop counter since we forced an action
|
| 1299 |
loop_count_before = _discussion_loop_count
|
|
@@ -1365,12 +1393,25 @@ def _prepare_god_context():
|
|
| 1365 |
lines.append(f"- Discussion loop count: {_discussion_loop_count}")
|
| 1366 |
lines.append(f"- Total conversation history: {len(history)} messages")
|
| 1367 |
|
| 1368 |
-
# 2.
|
| 1369 |
lines.append(f"\n## A2A Communication")
|
| 1370 |
lines.append(f"- Adam: {ADAM_SPACE}")
|
| 1371 |
lines.append(f"- Eve: {EVE_SPACE}")
|
| 1372 |
|
| 1373 |
-
#
|
| 1374 |
lines.append(f"\n## Claude Code Status (for Cain tasks)")
|
| 1375 |
lines.append(cc_get_live_status())
|
| 1376 |
|
|
@@ -1431,13 +1472,16 @@ def do_god_turn():
|
|
| 1431 |
{context}
|
| 1432 |
|
| 1433 |
## Tasks
|
| 1434 |
-
1.
|
| 1435 |
-
2.
|
| 1436 |
-
3. Fix
|
| 1437 |
-
4. If
|
| 1438 |
[PROBLEM] <what the problem was>
|
| 1439 |
[FIX] <what you changed to fix it>
|
| 1440 |
-
|
| 1441 |
|
| 1442 |
# 4. Set up env for Claude Code — prefer real Anthropic API, fall back to z.ai
|
| 1443 |
env = os.environ.copy()
|
|
|
|
| 117 |
}
|
| 118 |
|
| 119 |
# Rebuild cooldown — prevent rapid pushes that keep resetting builds
|
| 120 |
+
REBUILD_COOLDOWN_SECS = 180 # 3 minutes — fast iteration, trial-and-error is preferred
|
| 121 |
last_rebuild_trigger_at = 0
|
| 122 |
_pending_cooldown = False
|
| 123 |
|
| 124 |
+
# Push frequency tracking — God uses this to detect "all talk no action"
|
| 125 |
+
_push_count = 0 # total pushes since startup
|
| 126 |
+
_last_push_time = 0.0 # timestamp of last successful push
|
| 127 |
+
_turns_since_last_push = 0 # turns since last push (resets on push)
|
| 128 |
+
|
| 129 |
def check_and_clear_cooldown():
|
| 130 |
"""Auto-clear cooldown if Cain has finished building."""
|
| 131 |
global last_rebuild_trigger_at
|
|
|
|
| 447 |
- Pushing triggers a Space restart — be confident the fix is correct
|
| 448 |
- If everything looks healthy, exit quickly without changes
|
| 449 |
|
| 450 |
+
## Common Issues to Watch For (ordered by priority)
|
| 451 |
+
1. ALL TALK NO ACTION: Agents discuss but never write [TASK] blocks → push frequency is 0 or very low
|
| 452 |
+
2. Cain has RUNTIME_ERROR but agents keep discussing instead of pushing rapid trial-and-error fixes
|
| 453 |
+
3. Discussion loops with no [TASK] assignment when CC is idle
|
| 454 |
+
4. Agents repeating discussion about env vars that are already configured
|
| 455 |
+
5. Cooldown too long — agents should push fixes rapidly when Cain is broken
|
| 456 |
+
6. Turn message not aggressive enough about requiring [TASK] when CC is idle
|
| 457 |
+
|
| 458 |
+
## Philosophy
|
| 459 |
+
- Trial-and-error is GOOD. Agents should push frequently, fail fast, and iterate.
|
| 460 |
+
- A bad push that triggers a rebuild is better than 10 turns of discussion.
|
| 461 |
+
- When Cain is in error state, the priority is SPEED — push a fix attempt every cycle.
|
| 462 |
|
| 463 |
## Commit Convention
|
| 464 |
Always use: git commit -m "god: <brief description>"
|
|
|
|
| 521 |
if not child_state["created"]:
|
| 522 |
return f"{CHILD_NAME} not born yet."
|
| 523 |
|
| 524 |
+
global _pending_cooldown, _push_count, _last_push_time, _turns_since_last_push
|
| 525 |
repo_url = f"https://user:{HF_TOKEN}@huggingface.co/spaces/{CHILD_SPACE_ID}"
|
| 526 |
|
| 527 |
# 1. Clone / reset to latest (preserving .claude/ memory)
|
|
|
|
| 591 |
timeout=60, capture_output=True, check=True)
|
| 592 |
push_result = f"Pushed changes:\n{status_out}"
|
| 593 |
_pending_cooldown = True
|
| 594 |
+
_push_count += 1
|
| 595 |
+
_last_push_time = time.time()
|
| 596 |
+
_turns_since_last_push = 0
|
| 597 |
+
print(f"[CLAUDE-CODE] Pushed (#{_push_count}): {status_out}")
|
| 598 |
except Exception as e:
|
| 599 |
push_result = f"Push failed: {e}"
|
| 600 |
|
|
|
|
| 1167 |
parts.append(f"{role_hints.get(speaker, '')} Your partner is {other}.")
|
| 1168 |
parts.append(f"Claude Code is your engineer — runs in background. You discuss and assign tasks, you do NOT code.")
|
| 1169 |
|
| 1170 |
+
# Discussion/execution balance strategy
|
| 1171 |
+
parts.append(f"""
|
| 1172 |
+
=== DISCUSSION vs EXECUTION STRATEGY ===
|
| 1173 |
+
- When CC is WORKING: discuss plans, review progress, prepare next task (discussion OK)
|
| 1174 |
+
- When CC is IDLE + child has ERROR: NO discussion. Write [TASK] immediately. Trial-and-error > planning.
|
| 1175 |
+
- When CC is IDLE + child is RUNNING: 1 turn of discussion max, then [TASK] on next turn.
|
| 1176 |
+
- When CC JUST FINISHED: 1 turn to review result, then [TASK] immediately.
|
| 1177 |
+
- Push frequency target: at least 1 push every 5 turns. Current: {_push_count} pushes in {turn_count} turns.""")
|
| 1178 |
+
|
| 1179 |
# Conversation history
|
| 1180 |
if history:
|
| 1181 |
parts.append("\n=== RECENT CONVERSATION ===")
|
|
|
|
| 1211 |
elif child_state["stage"] in ("BUILDING", "RESTARTING", "APP_STARTING"):
|
| 1212 |
parts.append(f"\n{CHILD_NAME} is {child_state['stage']}. Discuss what to check next.")
|
| 1213 |
elif child_state["stage"] in ("RUNTIME_ERROR", "BUILD_ERROR", "CONFIG_ERROR"):
|
| 1214 |
+
parts.append(f"\n🚨 {CHILD_NAME} has {child_state['stage']}! URGENT — write a [TASK] NOW to fix it. Trial-and-error is GOOD — push a fix attempt, don't deliberate.")
|
| 1215 |
+
parts.append(f"Pushes so far: {_push_count}. Turns since last push: {_turns_since_last_push}. PUSH MORE.")
|
| 1216 |
elif child_state["alive"] and cc_status.get("result"):
|
| 1217 |
+
parts.append(f"\n{CHILD_NAME} is alive. Claude Code JUST FINISHED. Review result briefly, then write a NEW [TASK] immediately.")
|
| 1218 |
elif child_state["alive"]:
|
| 1219 |
+
parts.append(f"\n{CHILD_NAME} is alive, Claude Code is IDLE. YOU MUST write a [TASK]...[/TASK] now. No discussion needed — just assign work.")
|
| 1220 |
else:
|
| 1221 |
parts.append(f"\nAnalyze the situation and write a [TASK] if CC is idle.")
|
| 1222 |
|
| 1223 |
+
# Discussion loop warning — escalates quickly to force action
|
| 1224 |
+
if _discussion_loop_count >= 2:
|
| 1225 |
+
parts.append(f"\n🛑 STOP DISCUSSING. Write ONLY a [TASK]...[/TASK] block. {_discussion_loop_count} turns with no action. Trial-and-error > deliberation.")
|
| 1226 |
+
elif _discussion_loop_count >= 1 and not cc_busy:
|
| 1227 |
+
parts.append(f"\nREMINDER: Last turn had no [TASK]. If CC is idle, you MUST assign work this turn.")
|
| 1228 |
|
| 1229 |
# Available actions reference
|
| 1230 |
parts.append(f"""
|
|
|
|
| 1298 |
|
| 1299 |
def do_turn(speaker, other, space_url):
|
| 1300 |
"""Execute one conversation turn (non-blocking — CC runs in background)."""
|
| 1301 |
+
global last_action_results, turn_count, _current_speaker, _discussion_loop_count, _turns_since_last_push
|
| 1302 |
turn_count += 1
|
| 1303 |
+
_turns_since_last_push += 1
|
| 1304 |
_current_speaker = speaker
|
| 1305 |
|
| 1306 |
# Auto-gather context (lightweight)
|
|
|
|
| 1314 |
# This bypasses the agent when they've discussed for 3+ turns with CC idle and child alive
|
| 1315 |
cc_busy = cc_status["running"]
|
| 1316 |
child_alive = child_state["alive"] or child_state["stage"] == "RUNNING"
|
| 1317 |
+
if _discussion_loop_count >= 3 and not cc_busy and child_alive:
|
| 1318 |
# EMERGENCY OVERRIDE: Force a task assignment if agents are stuck in discussion loop
|
| 1319 |
print(f"[LOOP-BREAK] EMERGENCY: {speaker} has discussed for {_discussion_loop_count} turns with CC IDLE. Forcing task assignment.")
|
| 1320 |
+
# Assign a concrete fix task, not just analysis — trial-and-error is better than deliberation
|
| 1321 |
+
if child_state["stage"] in ("RUNTIME_ERROR", "BUILD_ERROR"):
|
| 1322 |
+
forced_task = f"Cain has {child_state['stage']}. Read the error logs, diagnose the root cause, fix the code, and push. Do NOT just analyze — actually fix the problem. Common issue: code using Gradio patterns (e.g. .launch()) but Space uses sdk:docker with FastAPI/uvicorn."
|
| 1323 |
+
else:
|
| 1324 |
+
forced_task = "Check Cain's current state. If there are errors, fix them. If Cain is healthy, add a useful feature or improvement. Push your changes — trial-and-error is preferred over deliberation."
|
| 1325 |
submit_result = cc_submit_task(forced_task, f"{speaker}(EMERGENCY)", ctx)
|
| 1326 |
# Reset loop counter since we forced an action
|
| 1327 |
loop_count_before = _discussion_loop_count
|
|
|
|
| 1393 |
lines.append(f"- Discussion loop count: {_discussion_loop_count}")
|
| 1394 |
lines.append(f"- Total conversation history: {len(history)} messages")
|
| 1395 |
|
| 1396 |
+
# 2. Push frequency — KEY METRIC for detecting "all talk no action"
|
| 1397 |
+
lines.append(f"\n## Push Frequency (KEY METRIC)")
|
| 1398 |
+
lines.append(f"- Total pushes since startup: {_push_count}")
|
| 1399 |
+
lines.append(f"- Turns since last push: {_turns_since_last_push}")
|
| 1400 |
+
if _last_push_time > 0:
|
| 1401 |
+
mins_since = int((time.time() - _last_push_time) / 60)
|
| 1402 |
+
lines.append(f"- Minutes since last push: {mins_since}")
|
| 1403 |
+
else:
|
| 1404 |
+
lines.append(f"- No pushes yet!")
|
| 1405 |
+
lines.append(f"- Discussion-only turns (no [TASK]): {_discussion_loop_count}")
|
| 1406 |
+
if _turns_since_last_push >= 10 or (_push_count == 0 and turn_count >= 6):
|
| 1407 |
+
lines.append(f"⚠️ ALERT: Agents are ALL TALK NO ACTION — {_turns_since_last_push} turns without a push!")
|
| 1408 |
+
|
| 1409 |
+
# 3. A2A communication status
|
| 1410 |
lines.append(f"\n## A2A Communication")
|
| 1411 |
lines.append(f"- Adam: {ADAM_SPACE}")
|
| 1412 |
lines.append(f"- Eve: {EVE_SPACE}")
|
| 1413 |
|
| 1414 |
+
# 4. Claude Code status
|
| 1415 |
lines.append(f"\n## Claude Code Status (for Cain tasks)")
|
| 1416 |
lines.append(cc_get_live_status())
|
| 1417 |
|
|
|
|
| 1472 |
{context}
|
| 1473 |
|
| 1474 |
## Tasks
|
| 1475 |
+
1. CHECK PUSH FREQUENCY FIRST: Look at "Push Frequency" section. If agents have gone 10+ turns or 10+ minutes without a push, that is the #1 problem.
|
| 1476 |
+
2. Analyze the conversation. Are agents making CONCRETE changes (pushing code) or just DISCUSSING?
|
| 1477 |
+
3. Common anti-pattern: agents discuss what to do, agree on a plan, but never write a [TASK] block. Fix by making the turn message more aggressive about requiring [TASK].
|
| 1478 |
+
4. If Cain has RUNTIME_ERROR or BUILD_ERROR, agents should be pushing fixes rapidly (trial-and-error), not deliberating.
|
| 1479 |
+
5. If stuck, diagnose root cause in scripts/conversation-loop.py and fix it.
|
| 1480 |
+
6. Commit with "god: <description>" and push.
|
| 1481 |
+
7. If you made changes, end with BOTH:
|
| 1482 |
[PROBLEM] <what the problem was>
|
| 1483 |
[FIX] <what you changed to fix it>
|
| 1484 |
+
8. If no changes needed, end with: [OK] system is healthy"""
|
| 1485 |
|
| 1486 |
# 4. Set up env for Claude Code — prefer real Anthropic API, fall back to z.ai
|
| 1487 |
env = os.environ.copy()
|
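Taken together, the new thresholds in `build_turn_message()` and `do_turn()` form a three-step escalation ladder keyed on `_discussion_loop_count`. The sketch below folds both checks into one function purely for illustration; in the diff they live in two separate functions, and the name `escalation_level` and its return labels are hypothetical.

```python
def escalation_level(discussion_loop_count: int, cc_busy: bool, child_alive: bool) -> str:
    """Return which escalation step applies, using the thresholds from the diff."""
    # do_turn(): emergency override after 3 discussion-only turns,
    # only when Claude Code is idle and the child is alive.
    if discussion_loop_count >= 3 and not cc_busy and child_alive:
        return "emergency"
    # build_turn_message(): hard "STOP DISCUSSING" warning from 2 turns on.
    if discussion_loop_count >= 2:
        return "stop"
    # build_turn_message(): gentle reminder after 1 turn, only if CC is idle.
    if discussion_loop_count >= 1 and not cc_busy:
        return "reminder"
    return "none"
```

For example, `escalation_level(3, cc_busy=True, child_alive=True)` falls through to `"stop"`: the emergency override never fires while Claude Code is still working, but the turn message still tells the agents to stop discussing.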