Mirrowel committed on
Commit 087aab7
1 Parent(s): d4593e5

feat(antigravity): ✨ add thinking mode sanitization for Claude API compatibility


This commit introduces comprehensive thinking mode sanitization to prevent 400 errors when using Claude's extended thinking feature across different conversation states and model switches.

- Add new environment variable `ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION` (default: true) to control sanitization behavior
- Implement conversation state analysis to detect tool use loops and thinking block presence
- Handle four distinct scenarios per Claude API documentation:
  1. Thinking disabled: strip all thinking blocks from the conversation
  2. Tool loop with existing thinking: preserve current-turn thinking only
  3. Tool loop without thinking (invalid toggle): inject a synthetic assistant response to close the loop
  4. No tool loop: strip old-turn thinking, allow the new response to add thinking naturally
- Add `_analyze_conversation_state()` to detect tool loops and thinking block locations
- Add `_sanitize_thinking_for_claude()` as main orchestration method
- Add helper methods for stripping, preserving, and closing tool loops
- Support `reasoning_content` field in message transformation for cached thinking blocks
- Add safety checks to maintain role alternation in edge cases
- Integrate sanitization into completion flow before message transformation

The sanitization prevents the Claude API error "Expected `thinking` or `redacted_thinking`, but found `tool_use`", which occurs when attempting to toggle thinking mode mid-turn during tool use loops.

This fix enables seamless thinking mode across context compression, model switching (e.g., Gemini to Claude), and multi-turn tool use conversations.
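The scenario handling described above reduces to a four-way dispatch. The sketch below is illustrative only: it assumes OpenAI-style message dicts where thinking is carried in a `reasoning_content` field, and the returned action names are hypothetical stand-ins for the provider's private helpers, not its real API.

```python
from typing import Any, Dict, List

def choose_action(messages: List[Dict[str, Any]], thinking_enabled: bool) -> str:
    """Return which of the four sanitization scenarios applies."""
    # A conversation that ends with a tool result is an open tool loop
    in_tool_loop = bool(messages) and messages[-1].get("role") == "tool"
    last_assistant = next(
        (m for m in reversed(messages) if m.get("role") == "assistant"), None
    )
    has_thinking = bool(last_assistant and last_assistant.get("reasoning_content"))

    if not thinking_enabled:
        return "strip_all_thinking"        # scenario 1: thinking disabled
    if in_tool_loop and has_thinking:
        return "preserve_current_turn"     # scenario 2: same mode continues
    if in_tool_loop and not has_thinking:
        return "inject_synthetic_closure"  # scenario 3: invalid mid-turn toggle
    return "strip_old_turn_thinking"       # scenario 4: no open loop
```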

DOCUMENTATION.md CHANGED
@@ -458,6 +458,7 @@ ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
@@ -465,6 +466,56 @@ ANTIGRAVITY_GEMINI3_DESCRIPTION_PROMPT="\n\nSTRICT PARAMETERS: {params}."
458
  ANTIGRAVITY_PRESERVE_THOUGHT_SIGNATURES=true # Include signatures in client responses
459
  ANTIGRAVITY_ENABLE_DYNAMIC_MODELS=false # Use API model discovery
460
  ANTIGRAVITY_GEMINI3_TOOL_FIX=true # Enable Gemini 3 hallucination prevention
461
+ ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION=true # Enable Claude thinking mode auto-correction
462
 
463
  # Gemini 3 tool fix customization
464
  ANTIGRAVITY_GEMINI3_TOOL_PREFIX="gemini3_" # Namespace prefix
 
466
  ANTIGRAVITY_GEMINI3_SYSTEM_INSTRUCTION="..." # Full system prompt
467
  ```
468
 
469
+ #### Claude Extended Thinking Sanitization
470
+
471
+ The provider includes automatic sanitization for Claude's extended thinking mode, handling common error scenarios:
472
+
473
+ **Problem**: Claude's extended thinking API requires strict consistency in thinking blocks:
474
+ - If thinking is enabled, the final assistant turn must start with a thinking block
475
+ - If thinking is disabled, no thinking blocks can be present in the final turn
476
+ - Tool use loops are part of a single "assistant turn"
477
+ - You **cannot** toggle thinking mode mid-turn (this is invalid per Claude API)
478
+
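+ The rules above can be mirrored in a small check. This is a minimal sketch for illustration, assuming OpenAI-style message dicts with thinking in a `reasoning_content` field; it mirrors the constraint, not Claude's actual server-side validation:

```python
from typing import Any, Dict, List

def violates_mid_turn_toggle(
    messages: List[Dict[str, Any]], thinking_enabled: bool
) -> bool:
    """True if enabling thinking now would toggle it mid-turn."""
    # No open tool loop, or thinking off: nothing can be violated
    if not thinking_enabled or not messages or messages[-1].get("role") != "tool":
        return False
    last_assistant = next(
        (m for m in reversed(messages) if m.get("role") == "assistant"), None
    )
    # The open turn started without a thinking block, so toggling it on is invalid
    return not (last_assistant and last_assistant.get("reasoning_content"))

conversation = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "tool_calls": [{"id": "t1"}]},  # no thinking block
    {"role": "tool", "tool_call_id": "t1", "content": "20C sunny"},
]
```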
479
+ **Scenarios Handled**:
480
+
481
+ | Scenario | Action |
482
+ |----------|--------|
483
+ | Tool loop WITH thinking + thinking enabled | Preserve thinking, continue normally |
484
+ | Tool loop WITHOUT thinking + thinking enabled | **Inject synthetic closure** to start fresh turn with thinking |
485
+ | Thinking disabled | Strip all thinking blocks |
486
+ | Normal conversation (no tool loop) | Strip old thinking, new response adds thinking naturally |
487
+
488
+ **Solution**: The `_sanitize_thinking_for_claude()` method:
489
+ - Analyzes conversation state to detect incomplete tool use loops
490
+ - When enabling thinking in a tool loop that started without thinking:
491
+ - Injects a minimal synthetic assistant message: `"[Tool execution completed. Processing results.]"`
492
+ - This **closes** the previous turn, allowing Claude to start a **fresh turn with thinking**
493
+ - Strips thinking from old turns (Claude API ignores them anyway)
494
+ - Preserves thinking when the turn was started with thinking enabled
495
+
496
+ **Key Insight**: Instead of force-disabling thinking, we close the tool loop with a synthetic message. This allows seamless model switching (e.g., Gemini → Claude with thinking) without losing the ability to think.
497
+
498
+ **Example**:
499
+ ```
500
+ Before sanitization:
501
+ User: "What's the weather?"
502
+ Assistant: [tool_use: get_weather] ← Made by Gemini (no thinking)
503
+ User: [tool_result: "20C sunny"]
504
+
505
+ After sanitization (thinking enabled):
506
+ User: "What's the weather?"
507
+ Assistant: [tool_use: get_weather]
508
+ User: [tool_result: "20C sunny"]
509
+ Assistant: "[Tool execution completed. Processing results.]" ← INJECTED
510
+
511
+ → Claude now starts a NEW turn and CAN think!
512
+ ```
513
+
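+ A simplified, self-contained version of the closure step shown above (the real method also strips stale thinking blocks first and handles the no-results edge case; the message format is the same assumed OpenAI-style dict):

```python
import copy
from typing import Any, Dict, List

def close_tool_loop(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Append a synthetic assistant message so the open tool loop becomes a finished turn."""
    messages = copy.deepcopy(messages)
    # Count trailing tool results belonging to the open loop
    n_results = 0
    for msg in reversed(messages):
        if msg.get("role") != "tool":
            break
        n_results += 1
    if n_results <= 1:
        text = "[Tool execution completed. Processing results.]"
    else:
        text = f"[{n_results} tool executions completed. Processing results.]"
    messages.append({"role": "assistant", "content": text})
    return messages
```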
514
+ **Configuration**:
515
+ ```env
516
+ ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION=true # Enable/disable auto-correction
517
+ ```
518
+
519
  #### File Logging
520
 
521
  Optional transaction logging for debugging:
README.md CHANGED
@@ -28,7 +28,7 @@ This project provides a powerful solution for developers building complex applic
28
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
29
  - **Advanced Model Filtering**: Supports both blacklists and whitelists to give you fine-grained control over which models are available through the proxy.
30
 
31
- - **🆕 Antigravity Provider**: Full support for Google's internal Antigravity API, providing access to Gemini 2.5, Gemini 3, and Claude Sonnet 4.5 models with advanced features like thought signature caching and tool hallucination prevention.
31
+ - **🆕 Antigravity Provider**: Full support for Google's internal Antigravity API, providing access to Gemini 2.5, Gemini 3, and Claude Sonnet 4.5 models with advanced features like thought signature caching and tool hallucination prevention. Note: Sonnet 4.5 Thinking with native tool calls is fragile; context compaction, model switches, or toggling thinking mid-task can return a 400 error, because Claude requires its previous thinking block and compaction destroys it. A sanitization system attempts to catch these cases, though a fully robust solution is still being worked out.
32
  - **🆕 Credential Prioritization**: Automatic tier detection and priority-based credential selection ensures paid-tier credentials are used for premium models that require them.
33
  - **🆕 Weighted Random Rotation**: Configurable credential rotation strategy - choose between deterministic (perfect balance) or weighted random (unpredictable, harder to fingerprint) selection.
34
  - **🆕 Enhanced Gemini CLI**: Improved project discovery, paid vs free tier detection, and Gemini 3 support with thoughtSignature caching.
src/rotator_library/providers/antigravity_provider.py CHANGED
@@ -401,6 +401,7 @@ class AntigravityProvider(AntigravityAuthBase, ProviderInterface):
@@ -431,7 +432,8 @@ class AntigravityProvider(AntigravityAuthBase, ProviderInterface):
431
  lib_logger.debug(
432
  f"Antigravity config: signatures_in_client={self._preserve_signatures_in_client}, "
433
  f"cache={self._enable_signature_cache}, dynamic_models={self._enable_dynamic_models}, "
434
- f"gemini3_fix={self._enable_gemini3_tool_fix}, claude_fix={self._enable_claude_tool_fix}"
 
435
  )
436
 
437
  # =========================================================================
@@ -512,6 +514,295 @@ class AntigravityProvider(AntigravityAuthBase, ProviderInterface):
@@ -691,9 +982,43 @@ class AntigravityProvider(AntigravityAuthBase, ProviderInterface):
691
  parts = []
692
  content = msg.get("content")
693
  tool_calls = msg.get("tool_calls", [])
694
-
695
- # Try to inject cached thinking for Claude
696
- if self._is_claude(model) and self._enable_signature_cache:
697
  thinking_parts = self._get_cached_thinking(content, tool_calls)
698
  parts.extend(thinking_parts)
699
 
@@ -754,6 +1079,16 @@ class AntigravityProvider(AntigravityAuthBase, ProviderInterface):
@@ -1583,6 +1918,26 @@ class AntigravityProvider(AntigravityAuthBase, ProviderInterface):
401
  self._enable_dynamic_models = _env_bool("ANTIGRAVITY_ENABLE_DYNAMIC_MODELS", False)
402
  self._enable_gemini3_tool_fix = _env_bool("ANTIGRAVITY_GEMINI3_TOOL_FIX", True)
403
  self._enable_claude_tool_fix = _env_bool("ANTIGRAVITY_CLAUDE_TOOL_FIX", True)
404
+ self._enable_thinking_sanitization = _env_bool("ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION", True)
405
 
406
  # Gemini 3 tool fix configuration
407
  self._gemini3_tool_prefix = os.getenv("ANTIGRAVITY_GEMINI3_TOOL_PREFIX", "gemini3_")
 
432
  lib_logger.debug(
433
  f"Antigravity config: signatures_in_client={self._preserve_signatures_in_client}, "
434
  f"cache={self._enable_signature_cache}, dynamic_models={self._enable_dynamic_models}, "
435
+ f"gemini3_fix={self._enable_gemini3_tool_fix}, claude_fix={self._enable_claude_tool_fix}, "
436
+ f"thinking_sanitization={self._enable_thinking_sanitization}"
437
  )
438
 
439
  # =========================================================================
 
514
 
515
  return "thinking_" + "_".join(key_parts) if key_parts else None
516
 
517
+ # =========================================================================
518
+ # THINKING MODE SANITIZATION
519
+ # =========================================================================
520
+
521
+ def _analyze_conversation_state(
522
+ self,
523
+ messages: List[Dict[str, Any]]
524
+ ) -> Dict[str, Any]:
525
+ """
526
+ Analyze conversation state to detect tool use loops and thinking mode issues.
527
+
528
+ Returns:
529
+ {
530
+ "in_tool_loop": bool - True if we're in an incomplete tool use loop
531
+ "last_assistant_idx": int - Index of last assistant message
532
+ "last_assistant_has_thinking": bool - Whether last assistant msg has thinking
533
+ "last_assistant_has_tool_calls": bool - Whether last assistant msg has tool calls
534
+ "pending_tool_results": bool - Whether there are tool results after last assistant
535
+ "thinking_block_indices": List[int] - Indices of messages with thinking/reasoning
536
+ }
537
+ """
538
+ state = {
539
+ "in_tool_loop": False,
540
+ "last_assistant_idx": -1,
541
+ "last_assistant_has_thinking": False,
542
+ "last_assistant_has_tool_calls": False,
543
+ "pending_tool_results": False,
544
+ "thinking_block_indices": [],
545
+ }
546
+
547
+ # Find last assistant message and analyze the conversation
548
+ for i, msg in enumerate(messages):
549
+ role = msg.get("role")
550
+
551
+ if role == "assistant":
552
+ state["last_assistant_idx"] = i
553
+ state["last_assistant_has_tool_calls"] = bool(msg.get("tool_calls"))
554
+ # Check for thinking/reasoning content
555
+ has_thinking = bool(msg.get("reasoning_content"))
556
+ # Also check for thinking in content array (some formats)
557
+ content = msg.get("content")
558
+ if isinstance(content, list):
559
+ for item in content:
560
+ if isinstance(item, dict) and item.get("type") == "thinking":
561
+ has_thinking = True
562
+ break
563
+ state["last_assistant_has_thinking"] = has_thinking
564
+ if has_thinking:
565
+ state["thinking_block_indices"].append(i)
566
+ elif role == "tool":
567
+ # Tool result after an assistant message with tool calls = in tool loop
568
+ if state["last_assistant_has_tool_calls"]:
569
+ state["pending_tool_results"] = True
570
+
571
+ # We're in a tool loop if:
572
+ # 1. Last assistant message had tool calls
573
+ # 2. There are tool results after it
574
+ # 3. There's no final text response yet (the conversation ends with tool results)
575
+ if state["pending_tool_results"] and messages:
576
+ last_msg = messages[-1]
577
+ if last_msg.get("role") == "tool":
578
+ state["in_tool_loop"] = True
579
+
580
+ return state
581
+
582
+ def _sanitize_thinking_for_claude(
583
+ self,
584
+ messages: List[Dict[str, Any]],
585
+ thinking_enabled: bool
586
+ ) -> Tuple[List[Dict[str, Any]], bool]:
587
+ """
588
+ Sanitize thinking blocks in conversation history for Claude compatibility.
589
+
590
+ Handles the following scenarios per Claude docs:
591
+ 1. If thinking is disabled, remove all thinking blocks from conversation
592
+ 2. If thinking is enabled:
593
+ a. In a tool use loop WITH thinking: preserve it (same mode continues)
594
+ b. In a tool use loop WITHOUT thinking (invalid toggle): inject a synthetic assistant response to close the loop
595
+ c. Not in tool loop: strip old thinking, new response adds thinking naturally
596
+
597
+ Per Claude docs:
598
+ - "If thinking is enabled, the final assistant turn must start with a thinking block"
599
+ - "If thinking is disabled, the final assistant turn must not contain any thinking blocks"
600
+ - Tool use loops are part of a single assistant turn
601
+ - You CANNOT toggle thinking mid-turn
602
+
603
+ The key insight: when thinking is toggled ON mid-turn, we no longer force-disable
604
+ it; instead we close the open tool loop with a synthetic assistant message so a
605
+ fresh turn can start WITH thinking. Thinking that was already present is preserved.
606
+
607
+ Returns:
608
+ Tuple of (sanitized_messages, force_disable_thinking)
609
+ - sanitized_messages: The cleaned message list
610
+ - force_disable_thinking: If True, thinking must be disabled for this request (all current code paths return False; the flag is kept as a safety valve)
611
+ """
612
+ messages = copy.deepcopy(messages)
613
+ state = self._analyze_conversation_state(messages)
614
+
615
+ lib_logger.debug(
616
+ f"[Thinking Sanitization] thinking_enabled={thinking_enabled}, "
617
+ f"in_tool_loop={state['in_tool_loop']}, "
618
+ f"last_assistant_has_thinking={state['last_assistant_has_thinking']}, "
619
+ f"last_assistant_has_tool_calls={state['last_assistant_has_tool_calls']}"
620
+ )
621
+
622
+ if not thinking_enabled:
623
+ # CASE 1: Thinking is disabled - strip ALL thinking blocks
624
+ return self._strip_all_thinking_blocks(messages), False
625
+
626
+ # CASE 2: Thinking is enabled
627
+ if state["in_tool_loop"]:
628
+ # We're in a tool use loop (conversation ends with tool_result)
629
+ # Per Claude docs: entire assistant turn must operate in single thinking mode
630
+
631
+ if state["last_assistant_has_thinking"]:
632
+ # Last assistant turn HAD thinking - this is valid!
633
+ # Thinking was enabled when tool was called, continue with thinking enabled.
634
+ # Only keep thinking for the current turn (last assistant + following tools)
635
+ lib_logger.debug(
636
+ "[Thinking Sanitization] Tool loop with existing thinking - preserving."
637
+ )
638
+ return self._preserve_current_turn_thinking(
639
+ messages, state["last_assistant_idx"]
640
+ ), False
641
+ else:
642
+ # Last assistant turn DID NOT have thinking, but thinking is NOW enabled
643
+ # This is the INVALID case: toggling thinking ON mid-turn
644
+ #
645
+ # Per Claude docs, this causes:
646
+ # "Expected `thinking` or `redacted_thinking`, but found `tool_use`."
647
+ #
648
+ # SOLUTION: Inject a synthetic assistant message to CLOSE the tool loop.
649
+ # This allows Claude to start a fresh turn WITH thinking.
650
+ #
651
+ # The synthetic message summarizes the tool results, allowing the model
652
+ # to respond naturally with thinking enabled on what is now a "new" turn.
653
+ lib_logger.info(
654
+ "[Thinking Sanitization] Closing tool loop with synthetic response. "
655
+ "This allows thinking to be enabled on the new turn."
656
+ )
657
+ return self._close_tool_loop_for_thinking(messages), False
658
+ else:
659
+ # Not in a tool loop - this is the simple case
660
+ # The conversation doesn't end with tool_result, so we're starting fresh.
661
+ # Strip thinking from old turns (API ignores them anyway).
662
+ # New response will include thinking naturally.
663
+
664
+ if state["last_assistant_idx"] >= 0 and not state["last_assistant_has_thinking"]:
665
+ if state["last_assistant_has_tool_calls"]:
666
+ # Last assistant made tool calls but no thinking
667
+ # This could be from context compression, model switch, or
668
+ # the assistant responded after tool results (completing the turn)
669
+ lib_logger.debug(
670
+ "[Thinking Sanitization] Last assistant has completed tool_calls but no thinking. "
671
+ "This is likely from context compression or completed tool loop. "
672
+ "New response will include thinking."
673
+ )
674
+
675
+ # Strip thinking from old turns, let new response add thinking naturally
676
+ return self._strip_old_turn_thinking(messages, state["last_assistant_idx"]), False
677
+
678
+ def _strip_all_thinking_blocks(
679
+ self,
680
+ messages: List[Dict[str, Any]]
681
+ ) -> List[Dict[str, Any]]:
682
+ """Remove all thinking/reasoning content from messages."""
683
+ for msg in messages:
684
+ if msg.get("role") == "assistant":
685
+ # Remove reasoning_content field
686
+ msg.pop("reasoning_content", None)
687
+
688
+ # Remove thinking blocks from content array
689
+ content = msg.get("content")
690
+ if isinstance(content, list):
691
+ filtered = [
692
+ item for item in content
693
+ if not (isinstance(item, dict) and item.get("type") == "thinking")
694
+ ]
695
+ # If filtering leaves empty list, we need to preserve message structure
696
+ # to maintain user/assistant alternation. Use empty string as placeholder
697
+ # (will result in empty "text" part which is valid).
698
+ if not filtered:
699
+ # Only if there are no tool_calls either - otherwise message is valid
700
+ if not msg.get("tool_calls"):
701
+ msg["content"] = ""
702
+ else:
703
+ msg["content"] = None # tool_calls exist, content not needed
704
+ else:
705
+ msg["content"] = filtered
706
+ return messages
707
+
708
+ def _strip_old_turn_thinking(
709
+ self,
710
+ messages: List[Dict[str, Any]],
711
+ last_assistant_idx: int
712
+ ) -> List[Dict[str, Any]]:
713
+ """
714
+ Strip thinking from old turns but preserve for the last assistant turn.
715
+
716
+ Per Claude docs: "thinking blocks from previous turns are removed from context"
717
+ This mimics the API behavior and prevents issues.
718
+ """
719
+ for i, msg in enumerate(messages):
720
+ if msg.get("role") == "assistant" and i < last_assistant_idx:
721
+ # Old turn - strip thinking
722
+ msg.pop("reasoning_content", None)
723
+ content = msg.get("content")
724
+ if isinstance(content, list):
725
+ filtered = [
726
+ item for item in content
727
+ if not (isinstance(item, dict) and item.get("type") == "thinking")
728
+ ]
729
+ # Preserve message structure with empty string if needed
730
+ if not filtered:
731
+ msg["content"] = "" if not msg.get("tool_calls") else None
732
+ else:
733
+ msg["content"] = filtered
734
+ return messages
735
+
736
+ def _preserve_current_turn_thinking(
737
+ self,
738
+ messages: List[Dict[str, Any]],
739
+ last_assistant_idx: int
740
+ ) -> List[Dict[str, Any]]:
741
+ """
742
+ Preserve thinking only for the current (last) assistant turn.
743
+ Strip from all previous turns.
744
+ """
745
+ # Same as strip_old_turn_thinking - we keep the last turn intact
746
+ return self._strip_old_turn_thinking(messages, last_assistant_idx)
747
+
748
+ def _close_tool_loop_for_thinking(
749
+ self,
750
+ messages: List[Dict[str, Any]]
751
+ ) -> List[Dict[str, Any]]:
752
+ """
753
+ Close an incomplete tool loop by injecting a synthetic assistant response.
754
+
755
+ This is used when:
756
+ - We're in a tool loop (conversation ends with tool_result)
757
+ - The tool call was made WITHOUT thinking (e.g., by Gemini or non-thinking Claude)
758
+ - We NOW want to enable thinking
759
+
760
+ By injecting a synthetic response that "closes" the previous turn,
761
+ Claude can start a fresh turn with thinking enabled.
762
+
763
+ The synthetic message is minimal and factual - it just acknowledges
764
+ the tool results were received, allowing the model to process them
765
+ with thinking on the new turn.
766
+ """
767
+ # Strip any old thinking first
768
+ messages = self._strip_all_thinking_blocks(messages)
769
+
770
+ # Collect tool results from the end of the conversation
771
+ tool_results = []
772
+ for msg in reversed(messages):
773
+ if msg.get("role") == "tool":
774
+ tool_results.append(msg)
775
+ elif msg.get("role") == "assistant":
776
+ break # Stop at the assistant that made the tool calls
777
+
778
+ tool_results.reverse() # Put back in order
779
+
780
+ # Safety check: if no tool results found, this shouldn't have been called
781
+ # But handle gracefully with a generic message
782
+ if not tool_results:
783
+ lib_logger.warning(
784
+ "[Thinking Sanitization] _close_tool_loop_for_thinking called but no tool results found. "
785
+ "This may indicate malformed conversation history."
786
+ )
787
+ synthetic_content = "[Processing previous context.]"
788
+ elif len(tool_results) == 1:
789
+ synthetic_content = "[Tool execution completed. Processing results.]"
790
+ else:
791
+ synthetic_content = f"[{len(tool_results)} tool executions completed. Processing results.]"
792
+
793
+ # Inject the synthetic assistant message to close the loop
794
+ synthetic_msg = {
795
+ "role": "assistant",
796
+ "content": synthetic_content
797
+ }
798
+ messages.append(synthetic_msg)
799
+
800
+ lib_logger.debug(
801
+ f"[Thinking Sanitization] Injected synthetic closure: '{synthetic_content}'"
802
+ )
803
+
804
+ return messages
805
+
806
  # =========================================================================
807
  # REASONING CONFIGURATION
808
  # =========================================================================
 
982
  parts = []
983
  content = msg.get("content")
984
  tool_calls = msg.get("tool_calls", [])
985
+ reasoning_content = msg.get("reasoning_content")
986
+
987
+ # Handle reasoning_content if present (from original Claude response with thinking)
988
+ if reasoning_content and self._is_claude(model):
989
+ # Add thinking part with cached signature
990
+ thinking_part = {
991
+ "text": reasoning_content,
992
+ "thought": True,
993
+ }
994
+ # Try to get signature from cache
995
+ cache_key = self._generate_thinking_cache_key(
996
+ content if isinstance(content, str) else "",
997
+ tool_calls
998
+ )
999
+ cached_sig = None
1000
+ if cache_key:
1001
+ cached_json = self._thinking_cache.retrieve(cache_key)
1002
+ if cached_json:
1003
+ try:
1004
+ cached_data = json.loads(cached_json)
1005
+ cached_sig = cached_data.get("thought_signature", "")
1006
+ except json.JSONDecodeError:
1007
+ pass
1008
+
1009
+ if cached_sig:
1010
+ thinking_part["thoughtSignature"] = cached_sig
1011
+ parts.append(thinking_part)
1012
+ lib_logger.debug(f"Added reasoning_content with cached signature ({len(reasoning_content)} chars)")
1013
+ else:
1014
+ # No cached signature - skip the thinking block
1015
+ # This can happen if context was compressed and signature was lost
1016
+ lib_logger.warning(
1017
+ f"Skipping reasoning_content - no valid signature found. "
1018
+ f"This may cause issues if thinking is enabled."
1019
+ )
1020
+ elif self._is_claude(model) and self._enable_signature_cache and not reasoning_content:
1021
+ # Fallback: Try to inject cached thinking for Claude (original behavior)
1022
  thinking_parts = self._get_cached_thinking(content, tool_calls)
1023
  parts.extend(thinking_parts)
1024
 
 
1079
 
1080
  parts.append(func_part)
1081
 
1082
+ # Safety: ensure we return at least one part to maintain role alternation
1083
+ # This handles edge cases like assistant messages that had only thinking content
1084
+ # which got stripped, leaving the message otherwise empty
1085
+ if not parts:
1086
+ # Use a minimal text part - can happen after thinking is stripped
1087
+ parts.append({"text": ""})
1088
+ lib_logger.debug(
1089
+ "[Transform] Added empty text part to maintain role alternation"
1090
+ )
1091
+
1092
  return parts
1093
 
1094
  def _get_cached_thinking(
 
1918
  # Create logger
1919
  file_logger = AntigravityFileLogger(model, enable_logging)
1920
 
1921
+ # Determine if thinking is enabled for this request
1922
+ # Thinking is enabled if reasoning_effort is set (and not "disable") for Claude
1923
+ thinking_enabled = False
1924
+ if self._is_claude(model):
1925
+ # For Claude, thinking is enabled when reasoning_effort is provided and not "disable"
1926
+ thinking_enabled = reasoning_effort is not None and reasoning_effort != "disable"
1927
+
1928
+ # Sanitize thinking blocks for Claude to prevent 400 errors
1929
+ # This handles: context compression, model switching, mid-turn thinking toggle
1930
+ # Returns (sanitized_messages, force_disable_thinking)
1931
+ force_disable_thinking = False
1932
+ if self._is_claude(model) and self._enable_thinking_sanitization:
1933
+ messages, force_disable_thinking = self._sanitize_thinking_for_claude(messages, thinking_enabled)
1934
+
1935
+ # If we're in a mid-turn thinking toggle situation, we MUST disable thinking
1936
+ # for this request. Thinking will naturally resume on the next turn.
1937
+ if force_disable_thinking:
1938
+ thinking_enabled = False
1939
+ reasoning_effort = "disable" # Force disable for this request
1940
+
1941
  # Transform messages
1942
  system_instruction, gemini_contents = self._transform_messages(messages, model)
1943
  gemini_contents = self._fix_tool_response_grouping(gemini_contents)