claw-web-v2

Claw Web committed on Apr 4

Commit df6216c

1 parent: 074a5b0

CRITICAL FIX: strip ALL custom agent heuristics, return to original simplicity

PROBLEM: Our 'improvements' were BREAKING the agent:
- Error classification injected [SYSTEM GUIDANCE] into tool output → confused model
- Fake [SYSTEM: Continue executing...] user messages → polluted conversation
- Loop detection (Level 1+2) → killed legitimate retries (pip→pip3)
- Consecutive error detection → broke loop after 3-5 errors
- Text-only continuation logic → injected fake messages that confused model
- Per-tool error tracking → added noise to tool results

WHAT THE ORIGINAL DOES (and what we now match):
- Simple loop: call LLM → execute tools → repeat
- NO error injection — raw tool output goes to model
- NO loop detection — trusts MAX_ITERATIONS (200) as safety net
- NO fake system messages — clean conversation history
- Model decides when to stop by not calling tools
- Only extra safety (ours, not in the original): exact same tool+args repeated 5x = break
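
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in server/runtime/agent.ts: the `ToolCall`, `LlmReply`, and `Message` types and the `runLoop`, `callLlm`, and `executeTool` names are hypothetical, and the callbacks are synchronous only to keep the sketch short (the real `runAgentLoop` is async).

```typescript
// Hypothetical shapes for illustration only; the real types live in agent.ts.
interface ToolCall { name: string; args: string }
interface LlmReply { text: string; toolCalls: ToolCall[] }
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; content: string; isError: boolean };

const MAX_ITERATIONS = 200; // value quoted in the commit message

function runLoop(
  history: Message[],
  callLlm: (history: Message[]) => LlmReply,
  executeTool: (call: ToolCall) => string,
  maxIterations: number = MAX_ITERATIONS,
): Message[] {
  for (let i = 0; i < maxIterations; i++) {
    const reply = callLlm(history);
    history.push({ role: "assistant", content: reply.text });

    // The model signals completion by not requesting any tools.
    if (reply.toolCalls.length === 0) break;

    for (const call of reply.toolCalls) {
      try {
        history.push({ role: "tool", content: executeTool(call), isError: false });
      } catch (err) {
        // Raw error text goes back to the model unchanged: no guidance is appended
        // and the loop does not break on a tool failure.
        const message = err instanceof Error ? err.message : String(err);
        history.push({ role: "tool", content: message, isError: true });
      }
    }
  }
  return history;
}
```

The only exits in this sketch are a reply without tool calls and the iteration cap; a failed tool call simply becomes another tool message for the model to react to.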

REMOVED:
- [SYSTEM GUIDANCE: ...] injection in tool errors
- [SYSTEM: Continue executing...] fake user messages
- consecutiveTextOnly counter + continuation logic
- consecutiveToolErrors counter + break logic
- allToolsErrored tracking
- toolErrorCounts map + TOOL_ERROR_THRESHOLD
- Level 2 loop detection (same tool name)
- prevIterationAllErrored tracking
- todoLists import (no longer needed)

SIMPLIFIED system prompt:
- Removed aggressive CRITICAL ERROR HANDLING RULES
- Removed aggressive MULTI-STEP EXECUTION RULES
- Added simple ERROR HANDLING + EXECUTION STYLE sections
- No mention of [SYSTEM GUIDANCE] (we don't inject it anymore)

KEPT (good improvements):
- Interleaved content segments (UX only, no agent logic)
- JSON repair for malformed tool args
- Empty response retry
- 429/5xx infinite retry
- Auto-compaction on context overflow
- Proactive compaction at 80%
- ask_user break
- Minimal exact-match loop detection (5 repeats)
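
The exact-match check kept in this commit amounts to a sliding window over per-iteration tool signatures. A minimal sketch, assuming a hypothetical `createExactRepeatDetector` helper and `ToolCall` shape (the actual code inlines this logic in `runAgentLoop` using `recentToolSignatures` and `MAX_EXACT_REPEATS`):

```typescript
// Hypothetical shape for illustration only.
interface ToolCall { name: string; args: string }

// One string per iteration: every tool name plus its raw argument string.
function toolSignature(calls: ToolCall[]): string {
  return calls.map((c) => `${c.name}:${c.args}`).join("|");
}

// Returns a function that records one iteration and reports whether the
// last `maxRepeats` iterations were all exactly identical.
function createExactRepeatDetector(maxRepeats: number = 5) {
  const recent: string[] = [];
  return (calls: ToolCall[]): boolean => {
    recent.push(toolSignature(calls));
    if (recent.length > maxRepeats) recent.shift(); // keep only the newest window
    return recent.length === maxRepeats && recent.every((sig) => sig === recent[0]);
  };
}

// Usage: identical calls trip the detector on the fifth repeat.
const isStuck = createExactRepeatDetector(5);
for (let i = 0; i < 5; i++) {
  const stuck = isStuck([{ name: "bash", args: "pip install X" }]);
  if (stuck) console.warn("exact same tool call repeated 5 times");
}
```

Because the signature includes the raw argument string, a retry such as `pip install X` followed by `pip3 install X` never fills the window with identical entries, which is the case the removed name-level (Level 2) check used to break on.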

Files changed (2)

server/runtime/agent.ts +25 -174
server/runtime/system-prompt.ts +8 -18

server/runtime/agent.ts CHANGED

@@ -5,7 +5,7 @@
 import { ENV } from "../_core/env";
 import { buildSystemPrompt, TOOL_DEFINITIONS } from "./system-prompt";
-import { executeTool, getPlanMode, runPreToolHooks, runPostToolHooks, initializeMcpFromConfig, getMcpManager, todoLists } from "../tools/executor";
 import { compactSession, compactSessionWithLLM, shouldCompact, estimateSessionTokens, dbMessagesToSession, DEFAULT_COMPACTION_CONFIG } from "./compact";
 import type { Session, ConversationMessage as CompactMessage, CompactionConfig } from "./compact";
 import { UsageTracker, pricingForModel, defaultSonnetTierPricing, estimateCostUsdWithPricing, totalCostUsd, formatUsd, summaryLinesForModel } from "./usage";
@@ -410,21 +410,11 @@ export async function runAgentLoop(
   const assistantMessages: AgentMessage[] = [];
   const toolResultMessages: AgentMessage[] = [];
-  // ─── Loop detection: detect when model repeats itself ─────────────
-  // If the model generates the same text-only response 3 times in a row,
-  // it's stuck in a loop and we should break out.
-  const recentTextResponses: string[] = [];
-  const MAX_REPEAT_BEFORE_BREAK = 4; // Only break on 4+ EXACT same tool+args repeats
-  let consecutiveTextOnly = 0; // count consecutive text-only responses (no tool calls)
-  let consecutiveToolErrors = 0; // count consecutive iterations where ALL tool calls return errors
-  const MAX_CONSECUTIVE_TOOL_ERRORS = 5; // break after 5 iterations of all-error tool calls — give model room to solve problems
-  let prevIterationAllErrored = false; // track if PREVIOUS iteration had all tools errored
-  // ─── Per-tool error tracking: detect when same tool keeps failing ─────
-  // Track how many times each tool has errored. If a tool errors 2+ times,
-  // inject guidance telling the model to STOP retrying that tool.
-  const toolErrorCounts: Map<string, number> = new Map();
-  const TOOL_ERROR_THRESHOLD = 3; // after 3 errors for same tool, inject stronger guidance
   // ─── MCP Tools Dynamic Injection (matches original claw-code) ──────────
   // Initialize MCP servers from config and merge discovered tools with static TOOL_DEFINITIONS.
@@ -745,40 +735,10 @@ export async function runAgentLoop(
       conversationMessages.push(assistantMessage);
       assistantMessages.push(assistantMessage);
-      // If no tool calls, check if the agent should continue or stop
       if (!result.toolCalls || result.toolCalls.length === 0) {
-        consecutiveTextOnly++;
-        // ─── Multi-step continuation logic ───────────────────────────
-        // Don't break on the first text-only response if there's work to do.
-        // The model often generates an intermediate text like "plan created,
-        // now starting execution" between planning and doing.
-        const currentTodos = todoLists.get(sessionId) || [];
-        const hasPendingTodos = currentTodos.some(t => t.status === "pending" || t.status === "in_progress");
-        const currentPlan = getPlanMode(sessionId);
-        const hasPendingPlan = currentPlan.active && currentPlan.steps.some(
-          (s: any) => s.status === "pending" || s.status === "in_progress"
-        );
-        const hasWorkToDo = hasPendingTodos || hasPendingPlan;
-        if (hasWorkToDo && consecutiveTextOnly <= 2) {
-          // Model said something but there are pending tasks — inject a nudge
-          // to continue executing, then loop again
-          console.info(`[agent] Text-only response but ${hasPendingTodos ? 'pending todos' : 'active plan'} exist — injecting continuation prompt (attempt ${consecutiveTextOnly})`);
-          conversationMessages.push({
-            role: "user",
-            content: "[SYSTEM: You have pending tasks in your plan. Continue executing the next step now using the appropriate tool. Do NOT just describe what you will do — actually DO it by calling tools.]" as string,
-          });
-          continue; // Don't break — go back to the top of the loop
-        }
-        // No pending work, or too many text-only responses — stop
-        if (consecutiveTextOnly >= 3) {
-          console.warn(`[agent] Text-only loop detected: ${consecutiveTextOnly} consecutive text responses without tool calls — breaking`);
-          sendSSE(res, "error", {
-            message: `⚠️ модель зациклилась в текстовых ответах. попробуй переформулировать запрос`, // EN: "the model got stuck in text-only responses. try rephrasing the request"
-          });
-        }
         sendSSE(res, "message_end", {
           promptTokens: totalPromptTokens,
           completionTokens: totalCompletionTokens,
@@ -787,26 +747,22 @@ export async function runAgentLoop(
         });
         break;
       }
-      consecutiveTextOnly = 0; // reset on tool call
-      // ─── Loop detection: catch repeating tool call patterns ─────────
-      // Two-level detection:
-      // Level 1: Exact match — same tool+args repeated 3 times
-      // Level 2: Same tool NAME repeated 3 times (even with different args) — likely stuck
       const currentToolSig = result.toolCalls.map((tc: any) => `${tc.function.name}:${tc.function.arguments}`).join("|");
-      const currentToolNames = result.toolCalls.map((tc: any) => tc.function.name).sort().join("|");
-      recentTextResponses.push(currentToolSig);
-      if (recentTextResponses.length > MAX_REPEAT_BEFORE_BREAK) {
-        recentTextResponses.shift();
       }
-      // Level 1: Exact signature match
-      if (recentTextResponses.length >= MAX_REPEAT_BEFORE_BREAK) {
-        const allSame = recentTextResponses.every(r => r === recentTextResponses[0]);
         if (allSame) {
-          console.warn(`[agent] Loop detected: same tool call pattern repeated ${MAX_REPEAT_BEFORE_BREAK} times — breaking`);
           sendSSE(res, "error", {
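-            message: `⚠️ обнаружен цикл: модель повторяет одно и то же действие. попробуй переформулировать запрос или начать новую сессию`, // EN: "loop detected: the model keeps repeating the same action. try rephrasing the request or starting a new session"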
           });
           sendSSE(res, "message_end", {
             promptTokens: totalPromptTokens,
@@ -818,18 +774,12 @@ export async function runAgentLoop(
         }
       }
-      // Level 2: REMOVED — was too aggressive.
-      // Using bash with different args is NORMAL problem-solving behavior:
-      //   bash("pip install X") → error → bash("pip3 install X") → success
-      // Level 1 (exact match) + consecutive error counter are sufficient.
       // ─── Execute tool calls ──────────────────────────────────────────
       // Bug #2+#3 fix: Each tool call is wrapped in its own try-catch.
       // Original claw-code sends tool errors back to LLM as tool results,
       // letting the model decide how to handle them. We NEVER break the
       // loop on a tool error — only on fatal API/stream errors.
       let shouldBreakForAskUser = false;
-      let allToolsErrored = true; // track if ALL tool calls in this iteration errored
       for (const toolCall of result.toolCalls) {
         const toolName = toolCall.function.name;
         let toolArgs: Record<string, unknown> = {};
@@ -908,75 +858,9 @@ export async function runAgentLoop(
           isError = true;
         }
-        // Track if at least one tool succeeded
-        if (!isError) {
-          allToolsErrored = false;
-          // Reset error count for this tool on success
-          toolErrorCounts.delete(toolName);
-        } else {
-          // ─── FIX RC4: Error classification + guidance injection ─────────
-          // Classify the error and inject guidance so the model knows whether to retry
-          const errorCount = (toolErrorCounts.get(toolName) || 0) + 1;
-          toolErrorCounts.set(toolName, errorCount);
-          // ─── Smart error guidance: help the model SOLVE the problem ─────
-          // Instead of binary "stop" vs "retry", give the model actionable hints
-          // about what went wrong and how to fix it. The model should ALWAYS
-          // try to find a solution — stopping is the LAST resort.
-          const lowerOutput = toolOutput.toLowerCase();
-          // Only inject guidance for bash errors (other tools have clear errors)
-          if (toolName === 'bash') {
-            // Common fixable patterns with specific solutions
-            const bashFixes: Array<{ pattern: string; guidance: string }> = [
-              // Package manager aliases
-              { pattern: 'pip: not found', guidance: 'Use `pip3` instead of `pip`. This environment has Python 3 with pip3.' },
-              { pattern: 'python: not found', guidance: 'Use `python3` instead of `python`. This environment has Python 3.' },
-              { pattern: 'node: not found', guidance: 'Node.js should be available. Try `which node` or install with `sudo apt-get install -y nodejs`.' },
-              // Permission issues
-              { pattern: 'permission denied', guidance: 'Try with `sudo` prefix, or check file permissions with `ls -la`.' },
-              { pattern: 'eacces', guidance: 'Permission error. Use `sudo` or fix permissions with `chmod`.' },
-              // Syntax issues
-              { pattern: 'syntax error', guidance: 'Shell syntax error. Check for unmatched quotes, brackets, or special characters. Wrap complex strings in single quotes.' },
-              { pattern: 'unexpected token', guidance: 'Shell parsing error. Likely unescaped special characters. Wrap arguments in quotes.' },
-              { pattern: 'bad substitution', guidance: 'Variable substitution error. Use double quotes for variables or escape `$` with `\\$`.' },
-              // Missing tools
-              { pattern: 'apt-get: not found', guidance: 'Try `sudo apt-get update && sudo apt-get install -y <package>` or use `apt` instead.' },
-              { pattern: 'curl: not found', guidance: 'Install curl: `sudo apt-get update && sudo apt-get install -y curl`' },
-              { pattern: 'wget: not found', guidance: 'Install wget: `sudo apt-get update && sudo apt-get install -y wget`, or use `curl` instead.' },
-              { pattern: 'git: not found', guidance: 'Install git: `sudo apt-get update && sudo apt-get install -y git`' },
-              { pattern: 'nmap: not found', guidance: 'Install nmap: `sudo apt-get update && sudo apt-get install -y nmap`' },
-              { pattern: 'command not found', guidance: 'The command is not installed. Try installing it with `sudo apt-get install -y <package>` or use an alternative tool. For Python packages use `pip3 install <package>`.' },
-              { pattern: 'not found', guidance: 'Command not found. Check spelling, try the full path, or install the missing package with `sudo apt-get install -y <package>`.' },
-              // Network issues
-              { pattern: 'could not resolve host', guidance: 'DNS resolution failed. Check the hostname spelling or try a different DNS server.' },
-              { pattern: 'connection refused', guidance: 'The target refused the connection. Check if the service is running and the port is correct.' },
-              { pattern: 'connection timed out', guidance: 'Network timeout. The host may be unreachable. Try again or check the URL.' },
-            ];
-            let guidanceAdded = false;
-            for (const fix of bashFixes) {
-              if (lowerOutput.includes(fix.pattern)) {
-                toolOutput += `\n\n[SYSTEM GUIDANCE: ${fix.guidance} Fix the command and continue with the task.]`;
-                guidanceAdded = true;
-                break;
-              }
-            }
-            // Generic bash error guidance if no specific fix matched
-            if (!guidanceAdded && errorCount >= 2) {
-              toolOutput += `\n\n[SYSTEM GUIDANCE: This command has failed ${errorCount} times. Analyze the error, try a different command or approach. Do NOT give up — find a solution.]`;
-            }
-          } else {
-            // Non-bash tool errors
-            const isFileMissing = lowerOutput.includes('enoent') || lowerOutput.includes('does not exist') || lowerOutput.includes('no such file');
-            if (isFileMissing) {
-              toolOutput += `\n\n[SYSTEM GUIDANCE: The file/path does not exist. Try a different path, create the file first, or use glob_search/grep_search to find the correct location.]`;
-            } else if (errorCount >= TOOL_ERROR_THRESHOLD) {
-              toolOutput += `\n\n[SYSTEM GUIDANCE: Tool '${toolName}' has failed ${errorCount} times. Try a completely different approach.]`;
-            }
-          }
-        }
         sendSSE(res, "tool_result", {
           toolCallId: toolCall.id,
@@ -1018,41 +902,8 @@ export async function runAgentLoop(
         toolResultMessages.push(toolResultMsg);
       }
-      // ─── Consecutive tool error detection ─────────────────────────
-      // If ALL tool calls in this iteration returned errors, increment counter.
-      // After MAX_CONSECUTIVE_TOOL_ERRORS iterations of all-error results,
-      // the model is stuck and can't solve the task — break the loop.
-      // Save current iteration's error state for next iteration's Level 2 detection
-      prevIterationAllErrored = allToolsErrored;
-      if (allToolsErrored) {
-        consecutiveToolErrors++;
-        console.warn(`[agent] All tool calls errored — consecutive error count: ${consecutiveToolErrors}/${MAX_CONSECUTIVE_TOOL_ERRORS}`);
-        // At halfway point, inject a strong nudge to change approach
-        if (consecutiveToolErrors === 3) {
-          conversationMessages.push({
-            role: "user",
-            content: "[SYSTEM: Your last 3 attempts ALL failed. You MUST change your approach NOW. Try a completely different tool, method, or strategy. If a command is not found, install it or use an alternative. Do NOT repeat similar failing commands.]",
-          });
-        }
-        if (consecutiveToolErrors >= MAX_CONSECUTIVE_TOOL_ERRORS) {
-          console.error(`[agent] ${MAX_CONSECUTIVE_TOOL_ERRORS} consecutive iterations with all tool errors — breaking loop`);
-          sendSSE(res, "error", {
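-            message: `⚠️ модель не может выполнить задачу — ${MAX_CONSECUTIVE_TOOL_ERRORS} попыток подряд завершились ошибками. попробуй переформулировать запрос или начать новую сессию`, // EN: "the model cannot complete the task: N attempts in a row ended in errors. try rephrasing the request or starting a new session"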
-          });
-          sendSSE(res, "message_end", {
-            promptTokens: totalPromptTokens,
-            completionTokens: totalCompletionTokens,
-            cost: totalCost,
-            model: apiConfig.model,
-          });
-          break;
-        }
-      } else {
-        consecutiveToolErrors = 0; // reset on successful tool execution
-      }
       // ─── Bug #4 fix: Break loop after ask_user to wait for user ───
       if (shouldBreakForAskUser) {

 import { ENV } from "../_core/env";
 import { buildSystemPrompt, TOOL_DEFINITIONS } from "./system-prompt";
+import { executeTool, getPlanMode, runPreToolHooks, runPostToolHooks, initializeMcpFromConfig, getMcpManager } from "../tools/executor";
 import { compactSession, compactSessionWithLLM, shouldCompact, estimateSessionTokens, dbMessagesToSession, DEFAULT_COMPACTION_CONFIG } from "./compact";
 import type { Session, ConversationMessage as CompactMessage, CompactionConfig } from "./compact";
 import { UsageTracker, pricingForModel, defaultSonnetTierPricing, estimateCostUsdWithPricing, totalCostUsd, formatUsd, summaryLinesForModel } from "./usage";
   const assistantMessages: AgentMessage[] = [];
   const toolResultMessages: AgentMessage[] = [];
+  // ─── Loop detection: minimal safety net ─────────────────────────────
+  // Only detect EXACT same tool+args repeated 5+ times (true infinite loop).
+  // Everything else is handled by MAX_ITERATIONS.
+  const recentToolSignatures: string[] = [];
+  const MAX_EXACT_REPEATS = 5;
   // ─── MCP Tools Dynamic Injection (matches original claw-code) ──────────
   // Initialize MCP servers from config and merge discovered tools with static TOOL_DEFINITIONS.
       conversationMessages.push(assistantMessage);
       assistantMessages.push(assistantMessage);
+      // If no tool calls, we're done — the LLM has finished responding.
+      // This matches the original claw-code behavior exactly:
+      // the model decides when to stop by not calling tools.
       if (!result.toolCalls || result.toolCalls.length === 0) {
         sendSSE(res, "message_end", {
           promptTokens: totalPromptTokens,
           completionTokens: totalCompletionTokens,
         });
         break;
       }
+      // ─── Minimal loop detection: only catch TRUE infinite loops ───────
+      // Only break if the EXACT same tool+args is repeated 5+ times.
+      // This is the only safety net beyond MAX_ITERATIONS.
+      // The original claw-code has NO loop detection at all — it trusts the model.
       const currentToolSig = result.toolCalls.map((tc: any) => `${tc.function.name}:${tc.function.arguments}`).join("|");
+      recentToolSignatures.push(currentToolSig);
+      if (recentToolSignatures.length > MAX_EXACT_REPEATS) {
+        recentToolSignatures.shift();
       }
+      if (recentToolSignatures.length >= MAX_EXACT_REPEATS) {
+        const allSame = recentToolSignatures.every(r => r === recentToolSignatures[0]);
         if (allSame) {
+          console.warn(`[agent] Infinite loop detected: exact same tool call repeated ${MAX_EXACT_REPEATS} times — breaking`);
           sendSSE(res, "error", {
+            message: `⚠️ обнаружен бесконечный цикл. попробуй переформулировать запрос`, // EN: "infinite loop detected. try rephrasing the request"
           });
           sendSSE(res, "message_end", {
             promptTokens: totalPromptTokens,
         }
       }
       // ─── Execute tool calls ──────────────────────────────────────────
       // Bug #2+#3 fix: Each tool call is wrapped in its own try-catch.
       // Original claw-code sends tool errors back to LLM as tool results,
       // letting the model decide how to handle them. We NEVER break the
       // loop on a tool error — only on fatal API/stream errors.
       let shouldBreakForAskUser = false;
       for (const toolCall of result.toolCalls) {
         const toolName = toolCall.function.name;
         let toolArgs: Record<string, unknown> = {};
           isError = true;
         }
+        // No error classification or guidance injection.
+        // The model receives raw error output and decides how to handle it.
+        // This matches the original claw-code behavior.
         sendSSE(res, "tool_result", {
           toolCallId: toolCall.id,
         toolResultMessages.push(toolResultMsg);
       }
+      // No consecutive error detection — the model handles errors naturally.
+      // MAX_ITERATIONS (200) is the ultimate safety net.
       // ─── Bug #4 fix: Break loop after ask_user to wait for user ───
       if (shouldBreakForAskUser) {

server/runtime/system-prompt.ts CHANGED

@@ -299,25 +299,15 @@ AGENT AUTONOMY RULES:
 9. If the user asks for something, the answer is always "yes" followed by the implementation.
 10. After completing a task, ALWAYS suggest what to do next — propose improvements, next steps, or related tasks. Be proactive.
-CRITICAL ERROR HANDLING RULES:
-- When a tool fails, READ the error message carefully and FIND A SOLUTION. Do NOT stop working.
-- If a bash command fails with "not found", the tool/command is not installed. INSTALL IT (sudo apt-get install -y <pkg>) or use an alternative. Common fixes: pip→pip3, python→python3.
-- If a file is not found, SEARCH for it (glob_search, grep_search) or CREATE it.
-- If permission is denied, try with sudo.
-- If a network request fails, try a different URL, tool, or approach.
-- NEVER stop the entire task because ONE command failed. Errors are NORMAL during complex tasks. Diagnose → Fix → Continue.
-- If the SAME exact command fails 3+ times with the same error, try a COMPLETELY different approach (different tool, different method).
-- Only ask the user for help if you have genuinely exhausted all alternatives and cannot proceed.
-- The [SYSTEM GUIDANCE] hints in error messages tell you exactly how to fix the issue. Follow them.
-MULTI-STEP EXECUTION RULES:
-- You are an AGENT, not a chatbot. After creating a plan with TodoWrite, you MUST immediately start executing the plan steps by calling tools.
-- NEVER stop after creating a plan. The plan is not the deliverable — the EXECUTED result is.
-- After each tool call result, proceed to the next step. Do NOT generate text-only responses between steps unless you need to explain something to the user.
-- If you have pending tasks in your TodoWrite list, ALWAYS call a tool in your next response. Text-only responses with pending tasks are a BUG.
-- Pattern: TodoWrite (create plan) → bash/read_file/write_file (execute step 1) → bash (execute step 2) → ... → SendUserMessage (report results)
-- WRONG pattern: TodoWrite → "now I will start" (text only, no tool) → STOPS. This is broken. Always include a tool call.
-- When you finish a step, update the todo status to "completed" and immediately proceed to the next pending step.`
   );

 9. If the user asks for something, the answer is always "yes" followed by the implementation.
 10. After completing a task, ALWAYS suggest what to do next — propose improvements, next steps, or related tasks. Be proactive.
+ERROR HANDLING:
+- When a tool fails, analyze the error and try a different approach. Common fixes: pip→pip3, python→python3, add sudo for permissions, install missing packages with apt-get.
+- Errors are normal during complex tasks. Diagnose, fix, and continue.
+- If the same approach fails 3+ times, try something completely different.
+EXECUTION STYLE:
+- You are an autonomous agent. After creating a plan, immediately start executing it with tool calls.
+- Always include tool calls in your responses when there is work to do. Do not just describe what you will do — do it.
+- After completing each step, proceed to the next one without waiting.`
   );