Spaces: elmerzole/llm-api-proxy

Mirrowel committed on Dec 5, 2025

Commit: 64f7fc0

1 Parent(s): ba6dcaa

docs: 📚 update documentation for enhanced claude thinking sanitization and remove obsolete todo file

This commit comprehensively updates documentation to reflect the improved Claude extended thinking sanitization system and removes the completed todo.md file.

- Enhanced DOCUMENTATION.md with detailed explanations of the robust thinking sanitization system, including:
  - Clarification that Claude Opus 4.5 always uses the thinking variant (non-thinking version doesn't exist)
  - Complete sanitization scenario table with new edge cases (function call ID mismatch, missing tool responses, cached conversations)
  - Detailed implementation notes on Gemini-format message processing and turn state analysis
  - Three-tier function call response pairing strategy (ID match → name match → fallback)
  - Recovery mechanisms for cache post-transformation
  - Increased default max output tokens to 64000 for thinking output
- Updated README.md to mention improved function call response pairing with three-tier matching strategy
- Removed todo.md as tasks have been completed (thinking sanitization refinements and function call pairing improvements are now implemented)
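
For readers unfamiliar with the three-tier pairing mentioned above, the following is a minimal Python sketch of the idea (ID match, then name match, then order-based fallback, followed by name repair and placeholder creation). It is not the code from this repository: the function name `pair_tool_responses`, the flat `id`/`name`/`response` dictionary shape, and the placeholder text are all assumptions made for illustration.

```python
from typing import Any, Optional

# Hypothetical placeholder text; the real project may use different wording.
MISSING_RESPONSE_TEXT = "[missing tool response]"


def pair_tool_responses(
    calls: list[dict[str, Any]],
    responses: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return exactly one response per call, in call order (illustrative only)."""
    remaining = list(responses)  # responses not yet assigned to a call
    paired: list[Optional[dict[str, Any]]] = [None] * len(calls)

    def take(i: int, key: str) -> None:
        # Assign the first unclaimed response whose `key` equals the call's.
        for j, resp in enumerate(remaining):
            if resp.get(key) is not None and resp.get(key) == calls[i].get(key):
                paired[i] = resp
                del remaining[j]
                return

    # Tier 1: direct ID match.
    for i in range(len(calls)):
        take(i, "id")

    # Tier 2: function name match for calls that are still unpaired.
    for i in range(len(calls)):
        if paired[i] is None:
            take(i, "name")

    # Tier 3: order-based fallback (next unpaired call gets next leftover response).
    for i in range(len(calls)):
        if paired[i] is None and remaining:
            paired[i] = remaining.pop(0)

    result = []
    for call, resp in zip(calls, paired):
        if resp is None:
            # No response survived at all: create a placeholder.
            body: Any = {"result": MISSING_RESPONSE_TEXT}
        else:
            body = resp.get("response")
        # Rewriting id/name from the call also repairs "unknown_function" names.
        result.append({"id": call["id"], "name": call["name"], "response": body})
    return result
```

Running the tiers as separate passes, rather than trying all three per call, keeps a loose fallback match from claiming a response that a later call could have matched exactly by ID.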

Files changed (4)

DOCUMENTATION.md +30 -25
README.md +2 -1
src/rotator_library/pyproject.toml +1 -1
todo.md +0 -7

DOCUMENTATION.md CHANGED

@@ -420,10 +420,11 @@ The most sophisticated provider implementation, supporting Google's internal Ant
 **Claude Opus 4.5 (NEW!):**
 - Anthropic's most powerful model, now available via Antigravity proxy
-- Uses internal model name `claude-opus-4-5-thinking` when reasoning is enabled
-- Uses `thinkingBudget` parameter for extended thinking control
 - Full support for tool use with schema cleaning
 - Same thinking preservation and sanitization features as Sonnet
 **Claude Sonnet 4.5:**
 - Proxied through Antigravity API (uses internal model name `claude-sonnet-4-5-thinking`)
@@ -475,7 +476,7 @@ ANTIGRAVITY_GEMINI3_SYSTEM_INSTRUCTION="..."  # Full system prompt
 #### Claude Extended Thinking Sanitization
-The provider includes automatic sanitization for Claude's extended thinking mode, handling common error scenarios:
 **Problem**: Claude's extended thinking API requires strict consistency in thinking blocks:
 - If thinking is enabled, the final assistant turn must start with a thinking block
@@ -491,38 +492,42 @@ The provider includes automatic sanitization for Claude's extended thinking mode
 | Tool loop WITHOUT thinking + thinking enabled | **Inject synthetic closure** to start fresh turn with thinking |
 | Thinking disabled | Strip all thinking blocks |
 | Normal conversation (no tool loop) | Strip old thinking, new response adds thinking naturally |
-**Solution**: The `_sanitize_thinking_for_claude()` method:
-- Analyzes conversation state to detect incomplete tool use loops
-- When enabling thinking in a tool loop that started without thinking:
-  - Injects a minimal synthetic assistant message: `"[Tool execution completed. Processing results.]"`
-  - This **closes** the previous turn, allowing Claude to start a **fresh turn with thinking**
-- Strips thinking from old turns (Claude API ignores them anyway)
-- Preserves thinking when the turn was started with thinking enabled
-**Key Insight**: Instead of force-disabling thinking, we close the tool loop with a synthetic message. This allows seamless model switching (e.g., Gemini → Claude with thinking) without losing the ability to think.
-**Example**:
 ```
-Before sanitization:
-  User: "What's the weather?"
-  Assistant: [tool_use: get_weather]     ← Made by Gemini (no thinking)
-  User: [tool_result: "20C sunny"]
-After sanitization (thinking enabled):
-  User: "What's the weather?"
-  Assistant: [tool_use: get_weather]
-  User: [tool_result: "20C sunny"]
-  Assistant: "[Tool execution completed. Processing results.]"  ← INJECTED
-  → Claude now starts a NEW turn and CAN think!
 ```
 **Configuration**:
 ```env
-ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION=true  # Enable/disable auto-correction
 ```
 #### File Logging
 Optional transaction logging for debugging:

 **Claude Opus 4.5 (NEW!):**
 - Anthropic's most powerful model, now available via Antigravity proxy
+- **Always uses thinking variant** - `claude-opus-4-5-thinking` is the only available variant (non-thinking version doesn't exist)
+- Uses `thinkingBudget` parameter for extended thinking control (-1 for auto, 0 to disable, or specific token count)
 - Full support for tool use with schema cleaning
 - Same thinking preservation and sanitization features as Sonnet
+- Increased default max output tokens to 64000 to accommodate thinking output
 **Claude Sonnet 4.5:**
 - Proxied through Antigravity API (uses internal model name `claude-sonnet-4-5-thinking`)
 #### Claude Extended Thinking Sanitization
+The provider now includes robust automatic sanitization for Claude's extended thinking mode, handling all common error scenarios with conversation history.
 **Problem**: Claude's extended thinking API requires strict consistency in thinking blocks:
 - If thinking is enabled, the final assistant turn must start with a thinking block
 | Tool loop WITHOUT thinking + thinking enabled | **Inject synthetic closure** to start fresh turn with thinking |
 | Thinking disabled | Strip all thinking blocks |
 | Normal conversation (no tool loop) | Strip old thinking, new response adds thinking naturally |
+| Function call ID mismatch | Three-tier recovery: ID match → name match → fallback |
+| Missing tool responses | Automatic placeholder injection |
+| Compacted/cached conversations | Recover thinking from cache post-transformation |
+**Key Implementation Details**:
+The `_sanitize_thinking_for_claude()` method now:
+- Operates on Gemini-format messages (`parts[]` with `"thought": true` markers)
+- Detects tool results as user messages with `functionResponse` parts
+- Uses `_analyze_turn_state()` to classify conversation state on Gemini format
+- Recovers thinking from cache when client strips reasoning_content
+- When enabling thinking in a tool loop started without thinking:
+  - Injects synthetic assistant message to close the previous turn
+  - Allows Claude to start fresh turn with thinking capability
+**Function Call Response Grouping**:
+The enhanced pairing system ensures conversation history integrity:
 ```
+Problem: Client/proxy may mutate response IDs or lose responses during context processing
+Solution:
+1. Try direct ID match (tool_call_id == response.id)
+2. If no match, try function name match (tool.name == response.name)
+3. If still no match, use order-based fallback (nth tool → nth response)
+4. Repair "unknown_function" responses with correct names
+5. Create placeholders for completely missing responses
 ```
 **Configuration**:
 ```env
+ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION=true  # Enable/disable auto-correction (default: true)
 ```
+**Note**: These fixes ensure Claude thinking mode works seamlessly with tool use, model switching, context compression, and cached conversations. No manual intervention required.
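
As a rough illustration of the synthetic-closure step described in this hunk, here is a small Python sketch that operates on Gemini-format messages (`role` plus `parts[]`, thinking marked with `"thought": true`, tool results as user messages carrying `functionResponse` parts). It is not the repository's `_sanitize_thinking_for_claude()` or `_analyze_turn_state()`; the helper names are hypothetical, and the closure text is taken from the earlier version of the documentation.

```python
from typing import Any

# Closure text quoted from the previous revision of DOCUMENTATION.md.
SYNTHETIC_CLOSURE = "[Tool execution completed. Processing results.]"

Message = dict[str, Any]


def is_tool_result(msg: Message) -> bool:
    """A tool result is a user message with at least one functionResponse part."""
    return msg.get("role") == "user" and any(
        "functionResponse" in part for part in msg.get("parts", [])
    )


def has_thinking(msg: Message) -> bool:
    """True if any part of the message is flagged as a thought."""
    return any(part.get("thought") is True for part in msg.get("parts", []))


def close_tool_loop_without_thinking(messages: list[Message]) -> list[Message]:
    """Append a synthetic model message when an open tool loop began without thinking.

    Returns a new list; the input is not modified.
    """
    if not messages or not is_tool_result(messages[-1]):
        return messages  # not inside a tool loop

    # Walk back to the first model message after the last real user prompt.
    loop_start = None
    for i in range(len(messages) - 1, -1, -1):
        msg = messages[i]
        if msg.get("role") == "user" and not is_tool_result(msg):
            break
        if msg.get("role") == "model":
            loop_start = i

    if loop_start is None or has_thinking(messages[loop_start]):
        return messages  # the turn already started with thinking: keep it

    closure = {"role": "model", "parts": [{"text": SYNTHETIC_CLOSURE}]}
    return messages + [closure]
```

Closing the turn this way, instead of disabling thinking, is what lets a conversation that started on a non-thinking model continue on Claude with thinking enabled.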
 #### File Logging
 Optional transaction logging for debugging:

README.md CHANGED

@@ -33,7 +33,8 @@ This project provides a powerful solution for developers building complex applic
     - Claude Sonnet 4.5 with extended thinking support
     - Thought signature caching for multi-turn conversations
     - Tool hallucination prevention via parameter signature injection
-    - Automatic thinking block sanitization for Claude models
     - Note: Claude thinking mode requires careful conversation state management (see [Antigravity documentation](DOCUMENTATION.md#antigravity-claude-extended-thinking-sanitization) for details)
 -   **🆕 Credential Prioritization**: Automatic tier detection and priority-based credential selection ensures paid-tier credentials are used for premium models that require them.
 -   **🆕 Weighted Random Rotation**: Configurable credential rotation strategy - choose between deterministic (perfect balance) or weighted random (unpredictable, harder to fingerprint) selection.

     - Claude Sonnet 4.5 with extended thinking support
     - Thought signature caching for multi-turn conversations
     - Tool hallucination prevention via parameter signature injection
+    - Automatic thinking block sanitization for Claude models (with recovery strategies)
+    - Improved function call response pairing with three-tier matching strategy
     - Note: Claude thinking mode requires careful conversation state management (see [Antigravity documentation](DOCUMENTATION.md#antigravity-claude-extended-thinking-sanitization) for details)
 -   **🆕 Credential Prioritization**: Automatic tier detection and priority-based credential selection ensures paid-tier credentials are used for premium models that require them.
 -   **🆕 Weighted Random Rotation**: Configurable credential rotation strategy - choose between deterministic (perfect balance) or weighted random (unpredictable, harder to fingerprint) selection.

src/rotator_library/pyproject.toml CHANGED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "rotator_library"
-version = "0.95"
 authors = [
     { name="Mirrowel", email="nuh@uh.com" },
 ]

 [project]
 name = "rotator_library"
+version = "1.0"
 authors = [
     { name="Mirrowel", email="nuh@uh.com" },
 ]

todo.md DELETED

@@ -1,7 +0,0 @@
-~~Refine claude injection to inject even if we have correct thinking - to force it to think if we made ultrathink prompt. If last msg is tool use and you prompt - it never thinks again.~~ Maybe done
-Anthropic translation and anthropic compatible endpoint.
-Refine for deployment.