Spaces:

smolagents
/

ml-intern

Running on CPU Upgrade

Aksel Joonas Reedi commited on about 1 month ago

Commit

e8ed637

unverified ·

1 Parent(s): 07c5699

Keep repetition guard internal (#144)

* Clarify repetition guard messaging

The loop breaker is intentional recovery behavior, but the user-facing 'doom loop' phrasing reads like a crash. Rename the visible log and system hint to a repetition guard while preserving the existing detector behavior and historical SFT tag compatibility.

Constraint: Existing trajectories may still contain the old wording, so the tagger must recognize both labels.

Rejected: Remove the event entirely | users would lose visibility into why the agent changed strategy.

Confidence: high

Scope-risk: narrow

Directive: Keep user-facing recovery logs operational and non-alarming; reserve internal jargon for code, not UI events.

Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --extra dev pytest tests/unit/test_doom_loop_polling.py tests/unit/test_sft_tagger.py

* Hide repetition guard from user surfaces

The repetition guard is an internal control-flow intervention, not a status event users need to act on. Remove the CLI/frontend tool_log emissions while keeping the internal corrective prompt and logger warnings.

Constraint: Historical trajectories can still contain the old doom-loop log text, so the SFT tagger test keeps compatibility for existing data.

Rejected: Show a renamed repetition-guard event | the requested behavior is that this remains fully internal and invisible in CLI/frontend surfaces.

Confidence: high

Scope-risk: narrow

Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --extra dev pytest tests/unit/test_doom_loop_polling.py tests/unit/test_sft_tagger.py

* Remove dead repetition tag branch

Review caught that no user-visible tool_log will contain the new repetition-guard text after this PR. Keep historical doom-loop log compatibility only and avoid a dead future-facing branch.

Confidence: high

Scope-risk: narrow

Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --extra dev pytest tests/unit/test_doom_loop_polling.py tests/unit/test_sft_tagger.py

Files changed (5) hide show

agent/core/agent_loop.py +0 -9
agent/core/doom_loop.py +8 -4
agent/tools/research_tool.py +4 -2
tests/unit/test_doom_loop_polling.py +3 -3
tests/unit/test_sft_tagger.py +1 -1

agent/core/agent_loop.py CHANGED Viewed

@@ -681,15 +681,6 @@ class Handlers:
                 session.context_manager.add_message(
                     Message(role="user", content=doom_prompt)
                 )
-                await session.send_event(
-                    Event(
-                        event_type="tool_log",
-                        data={
-                            "tool": "system",
-                            "log": "Doom loop detected — injecting corrective prompt",
-                        },
-                    )
-                )
             malformed_tool = _detect_repeated_malformed(session.context_manager.items)
             if malformed_tool:

                 session.context_manager.add_message(
                     Message(role="user", content=doom_prompt)
                 )
             malformed_tool = _detect_repeated_malformed(session.context_manager.items)
             if malformed_tool:

agent/core/doom_loop.py CHANGED Viewed

@@ -156,9 +156,13 @@ def check_for_doom_loop(messages: list[Message]) -> str | None:
     # Check for identical consecutive calls
     tool_name = detect_identical_consecutive(signatures, threshold=3)
     if tool_name:
-        logger.warning("Doom loop detected: %d+ identical consecutive calls to '%s'", 3, tool_name)
         return (
-            f"[SYSTEM: DOOM LOOP DETECTED] You have called '{tool_name}' with the same "
             f"arguments multiple times in a row, getting the same result each time. "
             f"STOP repeating this approach — it is not working. "
             f"Step back and try a fundamentally different strategy. "
@@ -170,9 +174,9 @@ def check_for_doom_loop(messages: list[Message]) -> str | None:
     pattern = detect_repeating_sequence(signatures)
     if pattern:
         pattern_desc = " → ".join(s.name for s in pattern)
-        logger.warning("Doom loop detected: repeating sequence [%s]", pattern_desc)
         return (
-            f"[SYSTEM: DOOM LOOP DETECTED] You are stuck in a repeating cycle of tool calls: "
             f"[{pattern_desc}]. This pattern has repeated multiple times without progress. "
             f"STOP this cycle and try a fundamentally different approach. "
             f"Consider: breaking down the problem differently, using alternative tools, "

     # Check for identical consecutive calls
     tool_name = detect_identical_consecutive(signatures, threshold=3)
     if tool_name:
+        logger.warning(
+            "Repetition guard activated: %d+ identical consecutive calls to '%s'",
+            3,
+            tool_name,
+        )
         return (
+            f"[SYSTEM: REPETITION GUARD] You have called '{tool_name}' with the same "
             f"arguments multiple times in a row, getting the same result each time. "
             f"STOP repeating this approach — it is not working. "
             f"Step back and try a fundamentally different strategy. "
     pattern = detect_repeating_sequence(signatures)
     if pattern:
         pattern_desc = " → ".join(s.name for s in pattern)
+        logger.warning("Repetition guard activated: repeating sequence [%s]", pattern_desc)
         return (
+            f"[SYSTEM: REPETITION GUARD] You are stuck in a repeating cycle of tool calls: "
             f"[{pattern_desc}]. This pattern has repeated multiple times without progress. "
             f"STOP this cycle and try a fundamentally different approach. "
             f"Consider: breaking down the problem differently, using alternative tools, "

agent/tools/research_tool.py CHANGED Viewed

@@ -306,8 +306,10 @@ async def research_handler(
         # ── Doom-loop detection ──
         doom_prompt = check_for_doom_loop(messages)
         if doom_prompt:
-            logger.warning("Research sub-agent doom loop detected at iteration %d", _iteration)
-            await _log("Doom loop detected — injecting corrective prompt")
             messages.append(Message(role="user", content=doom_prompt))
         # ── Context budget: warn at 75%, hard-stop at 95% ──

         # ── Doom-loop detection ──
         doom_prompt = check_for_doom_loop(messages)
         if doom_prompt:
+            logger.warning(
+                "Research sub-agent repetition guard activated at iteration %d",
+                _iteration,
+            )
             messages.append(Message(role="user", content=doom_prompt))
         # ── Context budget: warn at 75%, hard-stop at 95% ──

tests/unit/test_doom_loop_polling.py CHANGED Viewed

@@ -5,7 +5,7 @@ Reproduces the failure mode in observatory sessions 40fcb414 ($32.59),
 long-running job with `bash sleep 300 && wc -l output` four times in a
 row. The arguments were byte-identical, but the results moved (27210 →
 36454 → 45770 → 55138 — actual progress). The detector hashed args only
-and false-fired DOOM LOOP, which made the agent abandon perfectly valid
 polling.
 After the fix the signature includes the tool result hash, so identical
@@ -66,7 +66,7 @@ def test_truly_stuck_polling_with_identical_results_still_fires():
     ]
     prompt = check_for_doom_loop(msgs)
     assert prompt is not None
-    assert "DOOM LOOP" in prompt
     assert "bash" in prompt
@@ -80,7 +80,7 @@ def test_identical_calls_with_no_results_yet_still_fires():
     ]
     prompt = check_for_doom_loop(msgs)
     assert prompt is not None
-    assert "DOOM LOOP" in prompt
     assert "write" in prompt

 long-running job with `bash sleep 300 && wc -l output` four times in a
 row. The arguments were byte-identical, but the results moved (27210 →
 36454 → 45770 → 55138 — actual progress). The detector hashed args only
+and false-fired the repetition guard, which made the agent abandon perfectly valid
 polling.
 After the fix the signature includes the tool result hash, so identical
     ]
     prompt = check_for_doom_loop(msgs)
     assert prompt is not None
+    assert "REPETITION GUARD" in prompt
     assert "bash" in prompt
     ]
     prompt = check_for_doom_loop(msgs)
     assert prompt is not None
+    assert "REPETITION GUARD" in prompt
     assert "write" in prompt

tests/unit/test_sft_tagger.py CHANGED Viewed

@@ -79,7 +79,7 @@ def test_outcome_ongoing():
 def test_outcome_doom_loop_and_context():
     events = [
-        _ev("tool_log", {"tool": "system", "log": "Doom loop detected — injecting corrective prompt"}),
         _ev("compacted", {"old_tokens": 100, "new_tokens": 50}),
         _ev("turn_complete", {"history_size": 10}),
     ]

 def test_outcome_doom_loop_and_context():
     events = [
+        _ev("tool_log", {"tool": "system", "log": "Doom loop detected"}),
         _ev("compacted", {"old_tokens": 100, "new_tokens": 50}),
         _ev("turn_complete", {"history_size": 10}),
     ]