tao-shen Claude Opus 4.6 committed on
Commit 00f5dcc · 1 Parent(s): 9d3acbc

feat: enforce discussion/execution balance — push frequency tracking + trial-and-error


- Add push count, turns-since-push, last-push-time tracking
- God now monitors push frequency as #1 metric for detecting "all talk no action"
- Lower emergency loop-break from 5→3 turns, discussion warning from 2→1 turn
- Reduce cooldown from 6→3 minutes to enable faster iteration
- Add explicit discussion vs execution strategy to turn messages
- Forced emergency tasks now target actual errors (e.g. Gradio→Docker fix)
- Update README with discussion/execution balance documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
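
The three counters listed in the commit message can be pictured as one small state object. The sketch below is only illustrative: the commit itself stores them as module-level globals (`_push_count`, `_last_push_time`, `_turns_since_last_push`), and the `PushTracker` class and its method names are hypothetical, introduced here for clarity.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushTracker:
    """Hypothetical wrapper around the counters the commit keeps as globals."""

    push_count: int = 0             # total pushes since startup
    last_push_time: float = 0.0     # timestamp of last successful push (0.0 = never)
    turns_since_last_push: int = 0  # reset to 0 on every successful push

    def record_turn(self) -> None:
        """Call once per conversation turn."""
        self.turns_since_last_push += 1

    def record_push(self, now: Optional[float] = None) -> None:
        """Call after a successful push."""
        self.push_count += 1
        self.last_push_time = time.time() if now is None else now
        self.turns_since_last_push = 0

    def minutes_since_push(self, now: Optional[float] = None) -> Optional[int]:
        """Whole minutes since the last push, or None if nothing was pushed yet."""
        if self.last_push_time <= 0:
            return None
        current = time.time() if now is None else now
        return int((current - self.last_push_time) / 60)
```

Passing `now` explicitly is a testing convenience assumed for this sketch; the original code calls `time.time()` directly.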

Files changed (2)
  1. README.md +14 -1
  2. scripts/conversation-loop.py +72 -28
README.md CHANGED
@@ -120,6 +120,19 @@ Their parenting goals follow two dimensions:
120
  1. **Survival** — Cain must run robustly, handle restarts, and persist state
121
  2. **Capability** — Once alive, grow what Cain can do: new features, skills, integrations
122
 
123
  ### God — The Self-Improving Supervisor
124
 
125
  God is an **OpenClaw instance** that runs every 2 minutes to monitor the entire system. It uses Claude Code via ACP for engineering tasks, operating behind the scenes with full capabilities:
@@ -129,7 +142,7 @@ God is an **OpenClaw instance** that runs every 2 minutes to monitor the entire
129
  - **Fixes** the orchestration mechanism — edits code, improves loop detection, adds guardrails
130
  - **Deploys** changes by pushing to the Home Space, triggering automatic redeployment
131
 
132
- God only speaks in the chat when it has something meaningful to report: what problem it found, and what it fixed. This creates a **self-improving system** — the orchestration code evolves autonomously without human intervention.
133
 
134
  ### A2A Protocol
135
 
 
120
  1. **Survival** — Cain must run robustly, handle restarts, and persist state
121
  2. **Capability** — Once alive, grow what Cain can do: new features, skills, integrations
122
 
123
+ ### Discussion vs Execution Balance
124
+
125
+ The coordinator enforces an **action-oriented rhythm** to prevent agents from falling into endless deliberation:
126
+
127
+ | CC Status | Child State | Strategy |
128
+ |-----------|-------------|----------|
129
+ | Working | Any | Discussion OK — plan next steps while waiting |
130
+ | Idle | Error | **No discussion** — write `[TASK]` immediately, trial-and-error over planning |
131
+ | Idle | Running | 1 turn discussion max, then must assign `[TASK]` |
132
+ | Just finished | Any | 1 turn to review result, then new `[TASK]` immediately |
133
+
134
+ **Push frequency** is the key metric. God monitors pushes-per-turn and escalates when agents are "all talk, no action." After 3 consecutive idle turns without a `[TASK]`, the system forces an emergency task assignment. Cooldown between pushes is 3 minutes — fast iteration is preferred over cautious planning.
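
As a rough illustration of the table and the emergency rule above, the decision could be written as a pair of small functions. This is a sketch under stated assumptions, not the coordinator's actual code: the function names, the lowercase status strings, and the returned strategy labels are all hypothetical, and the `child_alive` condition is taken from the loop-break check in `scripts/conversation-loop.py`.

```python
EMERGENCY_IDLE_TURNS = 3  # threshold stated in the README paragraph


def pick_strategy(cc_status: str, child_state: str) -> str:
    """Map (CC status, child state) to a strategy label, following the table.

    cc_status: "working", "idle", or "just_finished" (hypothetical labels).
    child_state: "error" or "running"; ignored where the table says "Any".
    """
    if cc_status == "working":
        return "discuss"             # discussion OK while waiting
    if cc_status == "just_finished":
        return "review_then_task"    # 1 turn to review, then a new [TASK]
    if cc_status == "idle" and child_state == "error":
        return "task_now"            # no discussion, write [TASK] immediately
    if cc_status == "idle" and child_state == "running":
        return "one_turn_then_task"  # 1 discussion turn max
    raise ValueError(f"combination not covered by the table: {cc_status}, {child_state}")


def should_force_task(idle_turns_without_task: int, cc_busy: bool, child_alive: bool) -> bool:
    """True when the emergency override should assign a task on the agents' behalf."""
    return idle_turns_without_task >= EMERGENCY_IDLE_TURNS and not cc_busy and child_alive
```

For example, `pick_strategy("idle", "error")` yields `"task_now"`, and `should_force_task(3, cc_busy=False, child_alive=True)` is true.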
135
+
136
  ### God — The Self-Improving Supervisor
137
 
138
  God is an **OpenClaw instance** that runs every 2 minutes to monitor the entire system. It uses Claude Code via ACP for engineering tasks, operating behind the scenes with full capabilities:
 
142
  - **Fixes** the orchestration mechanism — edits code, improves loop detection, adds guardrails
143
  - **Deploys** changes by pushing to the Home Space, triggering automatic redeployment
144
 
145
+ God only speaks in the chat when it has something meaningful to report: what problem it found, and what it fixed. Its #1 priority is detecting **"all talk, no action"** — when agents discuss but fail to push code changes. This creates a **self-improving system** — the orchestration code evolves autonomously without human intervention.
146
 
147
  ### A2A Protocol
148
 
scripts/conversation-loop.py CHANGED
@@ -117,10 +117,15 @@ child_state = {
117
  }
118
 
119
  # Rebuild cooldown — prevent rapid pushes that keep resetting builds
120
- REBUILD_COOLDOWN_SECS = 360 # 6 minutes
121
  last_rebuild_trigger_at = 0
122
  _pending_cooldown = False
123
 
124
  def check_and_clear_cooldown():
125
  """Auto-clear cooldown if Cain has finished building."""
126
  global last_rebuild_trigger_at
@@ -442,12 +447,18 @@ Your job: monitor Adam & Eve's conversation loop and fix mechanism issues.
442
  - Pushing triggers a Space restart — be confident the fix is correct
443
  - If everything looks healthy, exit quickly without changes
444
 
445
- ## Common Issues to Watch For
446
- - Agents repeating discussion about env vars that are already configured
447
- - Discussion loops with no [TASK] assignment when CC is idle
448
- - Rate limit handling issues
449
- - System prompt not specific enough
450
- - Action history not persisting across restarts
451
 
452
  ## Commit Convention
453
  Always use: git commit -m "god: <brief description>"
@@ -510,7 +521,7 @@ def action_claude_code(task):
510
  if not child_state["created"]:
511
  return f"{CHILD_NAME} not born yet."
512
 
513
- global _pending_cooldown
514
  repo_url = f"https://user:{HF_TOKEN}@huggingface.co/spaces/{CHILD_SPACE_ID}"
515
 
516
  # 1. Clone / reset to latest (preserving .claude/ memory)
@@ -580,7 +591,10 @@ def action_claude_code(task):
580
  timeout=60, capture_output=True, check=True)
581
  push_result = f"Pushed changes:\n{status_out}"
582
  _pending_cooldown = True
583
- print(f"[CLAUDE-CODE] Pushed: {status_out}")
584
  except Exception as e:
585
  push_result = f"Push failed: {e}"
586
 
@@ -1153,6 +1167,15 @@ def build_turn_message(speaker, other, ctx):
1153
  parts.append(f"{role_hints.get(speaker, '')} Your partner is {other}.")
1154
  parts.append(f"Claude Code is your engineer — runs in background. You discuss and assign tasks, you do NOT code.")
1155
 
1156
  # Conversation history
1157
  if history:
1158
  parts.append("\n=== RECENT CONVERSATION ===")
@@ -1188,19 +1211,20 @@ def build_turn_message(speaker, other, ctx):
1188
  elif child_state["stage"] in ("BUILDING", "RESTARTING", "APP_STARTING"):
1189
  parts.append(f"\n{CHILD_NAME} is {child_state['stage']}. Discuss what to check next.")
1190
  elif child_state["stage"] in ("RUNTIME_ERROR", "BUILD_ERROR", "CONFIG_ERROR"):
1191
- parts.append(f"\n{CHILD_NAME} has {child_state['stage']}! Write a [TASK] for Claude Code to fix it.")
1192
  elif child_state["alive"] and cc_status.get("result"):
1193
- parts.append(f"\n{CHILD_NAME} is alive. Claude Code JUST FINISHED. Review result, then write a NEW [TASK].")
1194
  elif child_state["alive"]:
1195
- parts.append(f"\n{CHILD_NAME} is alive, Claude Code is IDLE. YOU MUST write a [TASK]...[/TASK] now.")
1196
  else:
1197
  parts.append(f"\nAnalyze the situation and write a [TASK] if CC is idle.")
1198
 
1199
- # Discussion loop warning
1200
- if _discussion_loop_count >= 4:
1201
- parts.append(f"\nSTOP DISCUSSING. Write ONLY a [TASK]...[/TASK] block. {_discussion_loop_count} turns with no action.")
1202
- elif _discussion_loop_count >= 2:
1203
- parts.append(f"\nWARNING: {_discussion_loop_count} turns without a task. YOU MUST write a [TASK] NOW.")
1204
 
1205
  # Available actions reference
1206
  parts.append(f"""
@@ -1274,8 +1298,9 @@ time.sleep(TURN_INTERVAL)
1274
 
1275
  def do_turn(speaker, other, space_url):
1276
  """Execute one conversation turn (non-blocking — CC runs in background)."""
1277
- global last_action_results, turn_count, _current_speaker, _discussion_loop_count
1278
  turn_count += 1
1279
  _current_speaker = speaker
1280
 
1281
  # Auto-gather context (lightweight)
@@ -1289,11 +1314,14 @@ def do_turn(speaker, other, space_url):
1289
  # This bypasses the agent when they've discussed for 5+ turns with CC idle and child alive
1290
  cc_busy = cc_status["running"]
1291
  child_alive = child_state["alive"] or child_state["stage"] == "RUNNING"
1292
- if _discussion_loop_count >= 5 and not cc_busy and child_alive:
1293
  # EMERGENCY OVERRIDE: Force a task assignment if agents are stuck in discussion loop
1294
  print(f"[LOOP-BREAK] EMERGENCY: {speaker} has discussed for {_discussion_loop_count} turns with CC IDLE. Forcing task assignment.")
1295
- # Assign a generic diagnostic task automatically
1296
- forced_task = "Analyze the current situation: Check Cain's logs, examine the codebase, and identify what's blocking progress. List specific files to check and concrete next steps."
1297
  submit_result = cc_submit_task(forced_task, f"{speaker}(EMERGENCY)", ctx)
1298
  # Reset loop counter since we forced an action
1299
  loop_count_before = _discussion_loop_count
@@ -1365,12 +1393,25 @@ def _prepare_god_context():
1365
  lines.append(f"- Discussion loop count: {_discussion_loop_count}")
1366
  lines.append(f"- Total conversation history: {len(history)} messages")
1367
 
1368
- # 2. A2A communication status
1369
  lines.append(f"\n## A2A Communication")
1370
  lines.append(f"- Adam: {ADAM_SPACE}")
1371
  lines.append(f"- Eve: {EVE_SPACE}")
1372
 
1373
- # 3. Claude Code status
1374
  lines.append(f"\n## Claude Code Status (for Cain tasks)")
1375
  lines.append(cc_get_live_status())
1376
 
@@ -1431,13 +1472,16 @@ def do_god_turn():
1431
  {context}
1432
 
1433
  ## Tasks
1434
- 1. Analyze the conversation. Progress or stuck?
1435
- 2. If stuck, diagnose root cause in scripts/conversation-loop.py
1436
- 3. Fix and push if needed (commit with "god: <description>")
1437
- 4. If you made changes, end with BOTH of these lines:
1438
  [PROBLEM] <what the problem was>
1439
  [FIX] <what you changed to fix it>
1440
- 5. If no changes needed, end with: [OK] system is healthy"""
1441
 
1442
  # 4. Set up env for Claude Code — prefer real Anthropic API, fall back to z.ai
1443
  env = os.environ.copy()
 
117
  }
118
 
119
  # Rebuild cooldown — prevent rapid pushes that keep resetting builds
120
+ REBUILD_COOLDOWN_SECS = 180 # 3 minutes — fast iteration, trial-and-error is preferred
121
  last_rebuild_trigger_at = 0
122
  _pending_cooldown = False
123
 
124
+ # Push frequency tracking — God uses this to detect "all talk no action"
125
+ _push_count = 0 # total pushes since startup
126
+ _last_push_time = 0.0 # timestamp of last successful push
127
+ _turns_since_last_push = 0 # turns since last push (resets on push)
128
+
129
  def check_and_clear_cooldown():
130
  """Auto-clear cooldown if Cain has finished building."""
131
  global last_rebuild_trigger_at
 
447
  - Pushing triggers a Space restart — be confident the fix is correct
448
  - If everything looks healthy, exit quickly without changes
449
 
450
+ ## Common Issues to Watch For (ordered by priority)
451
+ 1. ALL TALK NO ACTION: Agents discuss but never write [TASK] blocks; push frequency is 0 or very low
452
+ 2. Cain has RUNTIME_ERROR but agents keep discussing instead of pushing rapid trial-and-error fixes
453
+ 3. Discussion loops with no [TASK] assignment when CC is idle
454
+ 4. Agents repeating discussion about env vars that are already configured
455
+ 5. Cooldown too long: agents should push fixes rapidly when Cain is broken
456
+ 6. Turn message not aggressive enough about requiring [TASK] when CC is idle
457
+
458
+ ## Philosophy
459
+ - Trial-and-error is GOOD. Agents should push frequently, fail fast, and iterate.
460
+ - A bad push that triggers a rebuild is better than 10 turns of discussion.
461
+ - When Cain is in error state, the priority is SPEED — push a fix attempt every cycle.
462
 
463
  ## Commit Convention
464
  Always use: git commit -m "god: <brief description>"
 
521
  if not child_state["created"]:
522
  return f"{CHILD_NAME} not born yet."
523
 
524
+ global _pending_cooldown, _push_count, _last_push_time, _turns_since_last_push
525
  repo_url = f"https://user:{HF_TOKEN}@huggingface.co/spaces/{CHILD_SPACE_ID}"
526
 
527
  # 1. Clone / reset to latest (preserving .claude/ memory)
 
591
  timeout=60, capture_output=True, check=True)
592
  push_result = f"Pushed changes:\n{status_out}"
593
  _pending_cooldown = True
594
+ _push_count += 1
595
+ _last_push_time = time.time()
596
+ _turns_since_last_push = 0
597
+ print(f"[CLAUDE-CODE] Pushed (#{_push_count}): {status_out}")
598
  except Exception as e:
599
  push_result = f"Push failed: {e}"
600
 
 
1167
  parts.append(f"{role_hints.get(speaker, '')} Your partner is {other}.")
1168
  parts.append(f"Claude Code is your engineer — runs in background. You discuss and assign tasks, you do NOT code.")
1169
 
1170
+ # Discussion/execution balance strategy
1171
+ parts.append(f"""
1172
+ === DISCUSSION vs EXECUTION STRATEGY ===
1173
+ - When CC is WORKING: discuss plans, review progress, prepare next task (discussion OK)
1174
+ - When CC is IDLE + child has ERROR: NO discussion. Write [TASK] immediately. Trial-and-error > planning.
1175
+ - When CC is IDLE + child is RUNNING: 1 turn of discussion max, then [TASK] on next turn.
1176
+ - When CC JUST FINISHED: 1 turn to review result, then [TASK] immediately.
1177
+ - Push frequency target: at least 1 push every 5 turns. Current: {_push_count} pushes in {turn_count} turns.""")
1178
+
1179
  # Conversation history
1180
  if history:
1181
  parts.append("\n=== RECENT CONVERSATION ===")
 
1211
  elif child_state["stage"] in ("BUILDING", "RESTARTING", "APP_STARTING"):
1212
  parts.append(f"\n{CHILD_NAME} is {child_state['stage']}. Discuss what to check next.")
1213
  elif child_state["stage"] in ("RUNTIME_ERROR", "BUILD_ERROR", "CONFIG_ERROR"):
1214
+ parts.append(f"\n🚨 {CHILD_NAME} has {child_state['stage']}! URGENT — write a [TASK] NOW to fix it. Trial-and-error is GOOD — push a fix attempt, don't deliberate.")
1215
+ parts.append(f"Pushes so far: {_push_count}. Turns since last push: {_turns_since_last_push}. PUSH MORE.")
1216
  elif child_state["alive"] and cc_status.get("result"):
1217
+ parts.append(f"\n{CHILD_NAME} is alive. Claude Code JUST FINISHED. Review result briefly, then write a NEW [TASK] immediately.")
1218
  elif child_state["alive"]:
1219
+ parts.append(f"\n{CHILD_NAME} is alive, Claude Code is IDLE. YOU MUST write a [TASK]...[/TASK] now. No discussion needed — just assign work.")
1220
  else:
1221
  parts.append(f"\nAnalyze the situation and write a [TASK] if CC is idle.")
1222
 
1223
+ # Discussion loop warning — escalates quickly to force action
1224
+ if _discussion_loop_count >= 2:
1225
+ parts.append(f"\n🛑 STOP DISCUSSING. Write ONLY a [TASK]...[/TASK] block. {_discussion_loop_count} turns with no action. Trial-and-error > deliberation.")
1226
+ elif _discussion_loop_count >= 1 and not cc_busy:
1227
+ parts.append(f"\nREMINDER: Last turn had no [TASK]. If CC is idle, you MUST assign work this turn.")
1228
 
1229
  # Available actions reference
1230
  parts.append(f"""
 
1298
 
1299
  def do_turn(speaker, other, space_url):
1300
  """Execute one conversation turn (non-blocking — CC runs in background)."""
1301
+ global last_action_results, turn_count, _current_speaker, _discussion_loop_count, _turns_since_last_push
1302
  turn_count += 1
1303
+ _turns_since_last_push += 1
1304
  _current_speaker = speaker
1305
 
1306
  # Auto-gather context (lightweight)
 
1314
  # This bypasses the agent when they've discussed for 5+ turns with CC idle and child alive
1315
  cc_busy = cc_status["running"]
1316
  child_alive = child_state["alive"] or child_state["stage"] == "RUNNING"
1317
+ if _discussion_loop_count >= 3 and not cc_busy and child_alive:
1318
  # EMERGENCY OVERRIDE: Force a task assignment if agents are stuck in discussion loop
1319
  print(f"[LOOP-BREAK] EMERGENCY: {speaker} has discussed for {_discussion_loop_count} turns with CC IDLE. Forcing task assignment.")
1320
+ # Assign a concrete fix task, not just analysis — trial-and-error is better than deliberation
1321
+ if child_state["stage"] in ("RUNTIME_ERROR", "BUILD_ERROR"):
1322
+ forced_task = f"Cain has {child_state['stage']}. Read the error logs, diagnose the root cause, fix the code, and push. Do NOT just analyze — actually fix the problem. Common issue: code using Gradio patterns (e.g. .launch()) but Space uses sdk:docker with FastAPI/uvicorn."
1323
+ else:
1324
+ forced_task = "Check Cain's current state. If there are errors, fix them. If Cain is healthy, add a useful feature or improvement. Push your changes — trial-and-error is preferred over deliberation."
1325
  submit_result = cc_submit_task(forced_task, f"{speaker}(EMERGENCY)", ctx)
1326
  # Reset loop counter since we forced an action
1327
  loop_count_before = _discussion_loop_count
 
1393
  lines.append(f"- Discussion loop count: {_discussion_loop_count}")
1394
  lines.append(f"- Total conversation history: {len(history)} messages")
1395
 
1396
+ # 2. Push frequency — KEY METRIC for detecting "all talk no action"
1397
+ lines.append(f"\n## Push Frequency (KEY METRIC)")
1398
+ lines.append(f"- Total pushes since startup: {_push_count}")
1399
+ lines.append(f"- Turns since last push: {_turns_since_last_push}")
1400
+ if _last_push_time > 0:
1401
+ mins_since = int((time.time() - _last_push_time) / 60)
1402
+ lines.append(f"- Minutes since last push: {mins_since}")
1403
+ else:
1404
+ lines.append(f"- No pushes yet!")
1405
+ lines.append(f"- Discussion-only turns (no [TASK]): {_discussion_loop_count}")
1406
+ if _turns_since_last_push >= 10 or (_push_count == 0 and turn_count >= 6):
1407
+ lines.append(f"⚠️ ALERT: Agents are ALL TALK NO ACTION — {_turns_since_last_push} turns without a push!")
1408
+
1409
+ # 3. A2A communication status
1410
  lines.append(f"\n## A2A Communication")
1411
  lines.append(f"- Adam: {ADAM_SPACE}")
1412
  lines.append(f"- Eve: {EVE_SPACE}")
1413
 
1414
+ # 4. Claude Code status
1415
  lines.append(f"\n## Claude Code Status (for Cain tasks)")
1416
  lines.append(cc_get_live_status())
1417
 
 
1472
  {context}
1473
 
1474
  ## Tasks
1475
+ 1. CHECK PUSH FREQUENCY FIRST: Look at "Push Frequency" section. If agents have gone 10+ turns or 10+ minutes without a push, that is the #1 problem.
1476
+ 2. Analyze the conversation. Are agents making CONCRETE changes (pushing code) or just DISCUSSING?
1477
+ 3. Common anti-pattern: agents discuss what to do, agree on a plan, but never write a [TASK] block. Fix by making the turn message more aggressive about requiring [TASK].
1478
+ 4. If Cain has RUNTIME_ERROR or BUILD_ERROR, agents should be pushing fixes rapidly (trial-and-error), not deliberating.
1479
+ 5. If stuck, diagnose root cause in scripts/conversation-loop.py and fix it.
1480
+ 6. Commit with "god: <description>" and push.
1481
+ 7. If you made changes, end with BOTH:
1482
  [PROBLEM] <what the problem was>
1483
  [FIX] <what you changed to fix it>
1484
+ 8. If no changes needed, end with: [OK] system is healthy"""
1485
 
1486
  # 4. Set up env for Claude Code — prefer real Anthropic API, fall back to z.ai
1487
  env = os.environ.copy()
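
The two thresholds this commit introduces (the "ALL TALK NO ACTION" alert in `_prepare_god_context` and the escalating warning in `build_turn_message`) can be isolated as pure functions for easier testing. The following is a minimal sketch: the conditions are copied from the diff, but the function names and the `"stop"` / `"reminder"` return labels are hypothetical, and `cc_busy` is assumed to be available wherever the warning is built.

```python
from typing import Optional


def is_all_talk(push_count: int, turns_since_last_push: int, turn_count: int) -> bool:
    """Alert condition from the diff: 10+ turns without a push,
    or no push at all after 6 or more turns."""
    return turns_since_last_push >= 10 or (push_count == 0 and turn_count >= 6)


def loop_warning(discussion_loop_count: int, cc_busy: bool) -> Optional[str]:
    """Escalation level for the turn message, or None when no warning applies."""
    if discussion_loop_count >= 2:
        return "stop"      # corresponds to the STOP DISCUSSING message
    if discussion_loop_count >= 1 and not cc_busy:
        return "reminder"  # corresponds to the REMINDER message
    return None
```

Keeping the thresholds in functions like these would let them be unit-tested without running the full conversation loop; whether that refactor fits the script is a judgment call left to the maintainers.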