Spaces:

elmerzole
/

llm-api-proxy

Paused

Mirrowel commited on Nov 27, 2025

Commit

f5ccdf6

1 Parent(s): f35e0e7

docs: 📚 add comprehensive documentation for new features and providers

This commit adds extensive documentation for recently implemented features across all documentation files:

- **Antigravity Provider**: Complete documentation of the new Antigravity provider with support for Gemini 2.5, Gemini 3, and Claude Sonnet 4.5 models, including thought signature caching, tool hallucination prevention, and base URL fallback mechanisms
- **Credential Prioritization System**: Detailed explanation of the new tier-based credential selection system that ensures paid-tier credentials are used for premium models
- **Weighted Random Rotation**: Documentation of the configurable `rotation_tolerance` parameter that enables unpredictable credential selection patterns to avoid fingerprinting while maintaining load balance
- **Provider Cache System**: Architecture and usage documentation for the new modular caching system used for preserving conversation state across requests
- **Google OAuth Base Refactoring**: Documentation of the shared `GoogleOAuthBase` class that eliminates code duplication across OAuth providers
- **Enhanced Gemini CLI Features**: Updated documentation covering project tier detection, paid vs free tier credential prioritization, and Gemini 3 support
- **Temperature Override**: Global temperature=0 override configuration to prevent tool hallucination issues
- **Deployment Guide Updates**: Step-by-step instructions for setting up Antigravity OAuth credentials in both local and stateless deployment scenarios
- **Environment Variable Reference**: Comprehensive list of new configuration options including cache control, feature flags, and rotation strategy settings

The documentation includes practical examples, configuration snippets, use cases, and security benefits for each feature.

Files changed (4) hide show

DOCUMENTATION.md +247 -1
Deployment guide.md +31 -0
README.md +112 -1
src/rotator_library/README.md +39 -3

DOCUMENTATION.md CHANGED Viewed

@@ -57,6 +57,7 @@ client = RotatingClient(
 -   `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): Whitelist of models to always include, overriding `ignore_models`.
 -   `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging.
 -   `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): Max concurrent requests allowed for a single API key per provider.
 #### Core Responsibilities
@@ -110,8 +111,16 @@ The `acquire_key` method uses a sophisticated strategy to balance load:
 2.  **Tiering**: Valid keys are split into two tiers:
     *   **Tier 1 (Ideal)**: Keys that are completely idle (0 concurrent requests).
     *   **Tier 2 (Acceptable)**: Keys that are busy but still under their configured `MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>` limit for the requested model. This allows a single key to be used multiple times for the same model, maximizing throughput.
-3.  **Prioritization**: Within each tier, keys with the **lowest daily usage** are prioritized to spread costs evenly.
 4.  **Concurrency Limits**: Checks against `max_concurrent` limits to prevent overloading a single key.
 #### Failure Handling & Cooldowns
@@ -313,6 +322,243 @@ The `CooldownManager` handles IP or account-level rate limiting that affects all
 - If so, `CooldownManager.start_cooldown()` is called for the entire provider
 - All subsequent `acquire_key()` calls for that provider will wait until the cooldown expires
 ---
 ## 3. Provider Specific Implementations

 -   `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): Whitelist of models to always include, overriding `ignore_models`.
 -   `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging.
 -   `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): Max concurrent requests allowed for a single API key per provider.
+-   `rotation_tolerance` (`float`, default: `3.0`): Controls the credential rotation strategy. See Section 2.2 for details.
 #### Core Responsibilities
 2.  **Tiering**: Valid keys are split into two tiers:
     *   **Tier 1 (Ideal)**: Keys that are completely idle (0 concurrent requests).
     *   **Tier 2 (Acceptable)**: Keys that are busy but still under their configured `MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>` limit for the requested model. This allows a single key to be used multiple times for the same model, maximizing throughput.
+3.  **Selection Strategy** (configurable via `rotation_tolerance`):
+    *   **Deterministic (tolerance=0.0)**: Within each tier, keys are sorted by daily usage count and the least-used key is always selected. This provides perfect load balance but predictable patterns.
+    *   **Weighted Random (tolerance>0, default)**: Keys are selected randomly with weights biased toward less-used ones:
+        - Formula: `weight = (max_usage - credential_usage) + tolerance + 1`
+        - `tolerance=2.0` (recommended): Balanced randomness - credentials within 2 uses of the maximum can still be selected with reasonable probability
+        - `tolerance=5.0+`: High randomness - even heavily-used credentials have significant probability
+        - **Security Benefit**: Unpredictable selection patterns make rate limit detection and fingerprinting harder
+        - **Load Balance**: Lower-usage credentials still preferred, maintaining reasonable distribution
 4.  **Concurrency Limits**: Checks against `max_concurrent` limits to prevent overloading a single key.
+5.  **Priority Groups**: When credential prioritization is enabled, higher-tier credentials (lower priority numbers) are tried first before moving to lower tiers.
 #### Failure Handling & Cooldowns
 - If so, `CooldownManager.start_cooldown()` is called for the entire provider
 - All subsequent `acquire_key()` calls for that provider will wait until the cooldown expires
+### 2.10. Credential Prioritization System (`client.py` & `usage_manager.py`)
+The library now includes an intelligent credential prioritization system that automatically detects credential tiers and ensures optimal credential selection for each request.
+**Key Concepts:**
+- **Provider-Level Priorities**: Providers can implement `get_credential_priority()` to return a priority level (1=highest, 10=lowest) for each credential
+- **Model-Level Requirements**: Providers can implement `get_model_tier_requirement()` to specify minimum priority required for specific models
+- **Automatic Filtering**: The client automatically filters out incompatible credentials before making requests
+- **Priority-Aware Selection**: The `UsageManager` prioritizes higher-tier credentials (lower numbers) within the same priority group
+**Implementation Example (Gemini CLI):**
+```python
+def get_credential_priority(self, credential: str) -> Optional[int]:
+    """Returns priority based on Gemini tier."""
+    tier = self.project_tier_cache.get(credential)
+    if not tier:
+        return None  # Not yet discovered
+    # Paid tiers get highest priority
+    if tier not in ['free-tier', 'legacy-tier', 'unknown']:
+        return 1
+    # Free tier gets lower priority
+    if tier == 'free-tier':
+        return 2
+    return 10
+def get_model_tier_requirement(self, model: str) -> Optional[int]:
+    """Returns minimum priority required for model."""
+    if model.startswith("gemini-3-"):
+        return 1  # Only paid tier (priority 1) credentials
+    return None  # All other models have no restrictions
+```
+**Usage Manager Integration:**
+The `acquire_key()` method has been enhanced to:
+1. Group credentials by priority level
+2. Try highest priority group first (priority 1, then 2, etc.)
+3. Within each group, use existing tier1/tier2 logic (idle keys first, then busy keys)
+4. Load balance within priority groups by usage count
+5. Only move to next priority if all higher-priority credentials are exhausted
+**Benefits:**
+- Ensures paid-tier credentials are always used for premium models
+- Prevents failed requests due to tier restrictions
+- Optimal cost distribution (free tier used when possible, paid when required)
+- Graceful fallback if primary credentials are unavailable
+---
+### 2.11. Provider Cache System (`providers/provider_cache.py`)
+A modular, shared caching system for providers to persist conversation state across requests.
+**Architecture:**
+- **Dual-TTL Design**: Short-lived memory cache (default: 1 hour) + longer-lived disk persistence (default: 24 hours)
+- **Background Persistence**: Batched disk writes every 60 seconds (configurable)
+- **Automatic Cleanup**: Background task removes expired entries from memory cache
+### 3.5. Antigravity (`antigravity_provider.py`)
+The most sophisticated provider implementation, supporting Google's internal Antigravity API for Gemini and Claude models.
+#### Architecture
+- **Unified Streaming/Non-Streaming**: Single code path handles both response types with optimal transformations
+- **Thought Signature Caching**: Server-side caching of encrypted signatures for multi-turn Gemini 3 conversations
+- **Model-Specific Logic**: Automatic configuration based on model type (Gemini 2.5, Gemini 3, Claude)
+#### Model Support
+**Gemini 2.5 (Pro/Flash):**
+- Uses `thinkingBudget` parameter (integer tokens: -1 for auto, 0 to disable, or specific value)
+- Standard safety settings and toolConfig
+- Stream processing with thinking content separation
+**Gemini 3 (Pro/Image):**
+- Uses `thinkingLevel` parameter (string: "low" or "high")
+- **Tool Hallucination Prevention**:
+  - Automatic system instruction injection explaining custom tool schema rules
+  - Parameter signature injection into tool descriptions (e.g., "STRICT PARAMETERS: files (ARRAY_OF_OBJECTS[path: string REQUIRED, ...])")
+  - Namespace prefix for tool names (`gemini3_` prefix) to avoid training data conflicts
+  - Malformed JSON auto-correction (handles extra trailing braces)
+- **ThoughtSignature Management**:
+  - Caching signatures from responses for reuse in follow-up messages
+  - Automatic injection into functionCalls for multi-turn conversations
+  - Fallback to bypass value if signature unavailable
+**Claude Sonnet 4.5:**
+- Proxied through Antigravity API (uses internal model name `claude-sonnet-4-5-thinking`)
+- Uses `thinkingBudget` parameter like Gemini 2.5
+- **Thinking Preservation**: Caches thinking content using composite keys (tool_call_id + text_hash)
+- **Schema Cleaning**: Removes unsupported properties (`$schema`, `additionalProperties`, `const` → `enum`)
+#### Base URL Fallback
+Automatic fallback chain for resilience:
+1. `daily-cloudcode-pa.sandbox.googleapis.com` (primary sandbox)
+2. `autopush-cloudcode-pa.sandbox.googleapis.com` (fallback sandbox)
+3. `cloudcode-pa.googleapis.com` (production fallback)
+#### Message Transformation
+**OpenAI → Gemini Format:**
+- System messages → `systemInstruction` with parts array
+- Multi-part content (text + images) → `inlineData` format
+- Tool calls → `functionCall` with args and id
+- Tool responses → `functionResponse` with name and response
+- ThoughtSignatures preserved/injected as needed
+**Tool Response Grouping:**
+- Converts linear format (call, response, call, response) to grouped format
+- Groups all function calls in one `model` message
+- Groups all responses in one `user` message
+- Required for Antigravity API compatibility
+#### Configuration (Environment Variables)
+```env
+# Cache control
+ANTIGRAVITY_SIGNATURE_CACHE_TTL=3600  # Memory cache TTL
+ANTIGRAVITY_SIGNATURE_DISK_TTL=86400  # Disk cache TTL
+ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
+# Feature flags
+ANTIGRAVITY_PRESERVE_THOUGHT_SIGNATURES=true  # Include signatures in client responses
+ANTIGRAVITY_ENABLE_DYNAMIC_MODELS=false  # Use API model discovery
+ANTIGRAVITY_GEMINI3_TOOL_FIX=true  # Enable Gemini 3 hallucination prevention
+# Gemini 3 tool fix customization
+ANTIGRAVITY_GEMINI3_TOOL_PREFIX="gemini3_"  # Namespace prefix
+ANTIGRAVITY_GEMINI3_DESCRIPTION_PROMPT="\n\nSTRICT PARAMETERS: {params}."
+ANTIGRAVITY_GEMINI3_SYSTEM_INSTRUCTION="..."  # Full system prompt
+```
+#### File Logging
+Optional transaction logging for debugging:
+- Enabled via `enable_request_logging` parameter
+- Creates `logs/antigravity_logs/TIMESTAMP_MODEL_UUID/` directory per request
+- Logs: `request_payload.json`, `response_stream.log`, `final_response.json`, `error.log`
+---
+- **Atomic Disk Writes**: Uses temp-file-and-move pattern to prevent corruption
+**Key Methods:**
+1. **`store(key, value)`**: Synchronously queues value for storage (schedules async write)
+2. **`retrieve(key)`**: Synchronously retrieves from memory, optionally schedules disk fallback
+3. **`store_async(key, value)`**: Awaitable storage for guaranteed persistence
+4. **`retrieve_async(key)`**: Awaitable retrieval with disk fallback
+**Use Cases:**
+- **Gemini 3 ThoughtSignatures**: Caching tool call signatures for multi-turn conversations
+- **Claude Thinking**: Preserving thinking content for consistency across conversation turns
+- **Any Transient State**: Generic key-value storage for provider-specific needs
+**Configuration (Environment Variables):**
+```env
+# Cache control (prefix can be customized per cache instance)
+PROVIDER_CACHE_ENABLE=true
+PROVIDER_CACHE_WRITE_INTERVAL=60  # seconds between disk writes
+PROVIDER_CACHE_CLEANUP_INTERVAL=1800  # 30 min between cleanups
+# Gemini 3 specific
+GEMINI_CLI_SIGNATURE_CACHE_ENABLE=true
+GEMINI_CLI_SIGNATURE_CACHE_TTL=3600  # 1 hour memory TTL
+GEMINI_CLI_SIGNATURE_DISK_TTL=86400  # 24 hours disk TTL
+```
+**File Structure:**
+```
+cache/
+├── gemini_cli/
+│   └── gemini3_signatures.json
+└── antigravity/
+    ├── gemini3_signatures.json
+    └── claude_thinking.json
+```
+---
+### 2.12. Google OAuth Base (`providers/google_oauth_base.py`)
+A refactored, reusable OAuth2 base class that eliminates code duplication across Google-based providers.
+**Refactoring Benefits:**
+- **Single Source of Truth**: All OAuth logic centralized in one class
+- **Easy Provider Addition**: New providers only need to override constants
+- **Consistent Behavior**: Token refresh, expiry handling, and validation work identically across providers
+- **Maintainability**: OAuth bugs fixed once apply to all inheriting providers
+**Provider Implementation:**
+```python
+class AntigravityAuthBase(GoogleOAuthBase):
+    # Required overrides
+    CLIENT_ID = "antigravity-client-id"
+    CLIENT_SECRET = "antigravity-secret"
+    OAUTH_SCOPES = [
+        "https://www.googleapis.com/auth/cloud-platform",
+        "https://www.googleapis.com/auth/cclog",  # Antigravity-specific
+        "https://www.googleapis.com/auth/experimentsandconfigs",
+    ]
+    ENV_PREFIX = "ANTIGRAVITY"  # Used for env var loading
+    # Optional overrides (defaults provided)
+    CALLBACK_PORT = 51121
+    CALLBACK_PATH = "/oauthcallback"
+```
+**Inherited Features:**
+- Automatic token refresh with exponential backoff
+- Invalid grant re-authentication flow
+- Stateless deployment support (env var loading)
+- Atomic credential file writes
+- Headless environment detection
+- Sequential refresh queue processing
+---
 ---
 ## 3. Provider Specific Implementations

Deployment guide.md CHANGED Viewed

@@ -79,6 +79,37 @@ If you are using providers that require complex OAuth files (like **Gemini CLI**
 4.  Copy the contents of this file and paste them directly into your `.env` file or Render's "Environment Variables" section.
 5.  The proxy will automatically detect and use these variables—no file upload required!
 4. Save the file. (We'll upload it to Render in Step 5.)

 4.  Copy the contents of this file and paste them directly into your `.env` file or Render's "Environment Variables" section.
 5.  The proxy will automatically detect and use these variables—no file upload required!
+### Advanced: Antigravity OAuth Provider
+The Antigravity provider requires OAuth2 authentication similar to Gemini CLI. It provides access to:
+- Gemini 2.5 models (Pro/Flash)
+- Gemini 3 models (Pro/Image-preview) - **requires paid-tier Google Cloud project**
+- Claude Sonnet 4.5 via Google's Antigravity proxy
+**Setting up Antigravity locally:**
+1. Run the credential tool: `python -m rotator_library.credential_tool`
+2. Select "Add OAuth Credential" and choose "Antigravity"
+3. Complete the OAuth flow in your browser
+4. The credential is saved to `oauth_creds/antigravity_oauth_1.json`
+**Exporting for stateless deployment:**
+1. Run: `python -m rotator_library.credential_tool`
+2. Select "Export Antigravity to .env"
+3. Copy the generated environment variables to your deployment platform:
+   ```env
+   ANTIGRAVITY_ACCESS_TOKEN="..."
+   ANTIGRAVITY_REFRESH_TOKEN="..."
+   ANTIGRAVITY_EXPIRY_DATE="..."
+   ANTIGRAVITY_EMAIL="your-email@gmail.com"
+   ```
+**Important Notes:**
+- Antigravity uses Google OAuth with additional scopes for cloud platform access
+- Gemini 3 models require a paid-tier Google Cloud project (free tier will fail)
+- The provider automatically handles thought signature caching for multi-turn conversations
+- Tool hallucination prevention is enabled by default for Gemini 3 models
 4. Save the file. (We'll upload it to Render in Step 5.)

README.md CHANGED Viewed

@@ -27,6 +27,15 @@ This project provides a powerful solution for developers building complex applic
 -   **Provider Agnostic**: Compatible with any provider supported by `litellm`.
 -   **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
 -   **Advanced Model Filtering**: Supports both blacklists and whitelists to give you fine-grained control over which models are available through the proxy.
 -   **🆕 Interactive Launcher TUI**: Beautiful, cross-platform TUI for configuration and management with an integrated settings tool for advanced configuration.
@@ -234,11 +243,12 @@ python src/proxy_app/main.py
 **Main Menu Features:**
-1. **Add OAuth Credential** - Interactive OAuth flow for Gemini CLI, Qwen Code, and iFlow
    - Automatically opens your browser for authentication
    - Handles the entire OAuth flow including callbacks
    - Saves credentials to the local `oauth_creds/` directory
    - For Gemini CLI: Automatically discovers or creates a Google Cloud project
    - For Qwen Code: Uses Device Code flow (you'll enter a code in your browser)
    - For iFlow: Starts a local callback server on port 11451
@@ -488,6 +498,42 @@ The following advanced settings can be added to your `.env` file (or configured
 -   **`SKIP_OAUTH_INIT_CHECK`**: Set to `true` to skip the interactive OAuth setup/validation check on startup. Essential for non-interactive environments like Docker containers or CI/CD pipelines.
     ```env
     SKIP_OAUTH_INIT_CHECK=true
     ```
 #### Concurrency Control
@@ -516,6 +562,71 @@ For providers that support custom model definitions (Qwen Code, iFlow), you can
 #### Provider-Specific Settings
 -   **`GEMINI_CLI_PROJECT_ID`**: Manually specify a Google Cloud Project ID for Gemini CLI OAuth. Only needed if automatic discovery fails.
     ```env
     GEMINI_CLI_PROJECT_ID="your-gcp-project-id"
     ```

 -   **Provider Agnostic**: Compatible with any provider supported by `litellm`.
 -   **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
 -   **Advanced Model Filtering**: Supports both blacklists and whitelists to give you fine-grained control over which models are available through the proxy.
+-   **🆕 Antigravity Provider**: Full support for Google's internal Antigravity API, providing access to Gemini 2.5, Gemini 3, and Claude Sonnet 4.5 models with advanced features like thought signature caching and tool hallucination prevention.
+-   **🆕 Credential Prioritization**: Automatic tier detection and priority-based credential selection ensures paid-tier credentials are used for premium models that require them.
+-   **🆕 Weighted Random Rotation**: Configurable credential rotation strategy - choose between deterministic (perfect balance) or weighted random (unpredictable, harder to fingerprint) selection.
+-   **🆕 Enhanced Gemini CLI**: Improved project discovery, paid vs free tier detection, and Gemini 3 support with thoughtSignature caching.
+-   **🆕 Temperature Override**: Global temperature=0 override option to prevent tool hallucination issues with low-temperature settings.
+-   **🆕 Provider Cache System**: Modular caching system for preserving conversation state (thought signatures, thinking content) across requests.
+-   **🆕 Refactored OAuth Base**: Shared [`GoogleOAuthBase`](src/rotator_library/providers/google_oauth_base.py) class eliminates code duplication across OAuth providers.
 -   **🆕 Interactive Launcher TUI**: Beautiful, cross-platform TUI for configuration and management with an integrated settings tool for advanced configuration.
 **Main Menu Features:**
+1. **Add OAuth Credential** - Interactive OAuth flow for Gemini CLI, Antigravity, Qwen Code, and iFlow
    - Automatically opens your browser for authentication
    - Handles the entire OAuth flow including callbacks
    - Saves credentials to the local `oauth_creds/` directory
    - For Gemini CLI: Automatically discovers or creates a Google Cloud project
+   - For Antigravity: Similar to Gemini CLI with Antigravity-specific scopes
    - For Qwen Code: Uses Device Code flow (you'll enter a code in your browser)
    - For iFlow: Starts a local callback server on port 11451
 -   **`SKIP_OAUTH_INIT_CHECK`**: Set to `true` to skip the interactive OAuth setup/validation check on startup. Essential for non-interactive environments like Docker containers or CI/CD pipelines.
     ```env
     SKIP_OAUTH_INIT_CHECK=true
+#### **Antigravity (Advanced - Gemini 3 \Claude 4.5 Access)**
+The newest and most sophisticated provider, offering access to cutting-edge models via Google's internal Antigravity API.
+**Supported Models:**
+-   Gemini 2.5 (Pro/Flash) with `thinkingBudget` parameter
+-   **Gemini 3 Pro (High/Low)** - Latest preview models
+-   **Claude Sonnet 4.5 + Thinking** via Antigravity proxy
+**Advanced Features:**
+-   **Thought Signature Caching**: Preserves encrypted signatures for multi-turn Gemini 3 conversations
+-   **Tool Hallucination Prevention**: Automatic system instruction and parameter signature injection for Gemini 3 to prevent tools from being called with incorrect parameters
+-   **Thinking Preservation**: Caches Claude thinking content for consistency across conversation turns
+-   **Automatic Fallback**: Tries sandbox endpoints before falling back to production
+-   **Schema Cleaning**: Handles Claude-specific tool schema requirements
+**Configuration:**
+-   **OAuth Setup**: Uses Google OAuth similar to Gemini CLI (separate scopes)
+-   **Stateless Deployment**: Full environment variable support
+-   **Paid Tier Recommended**: Gemini 3 models require a paid Google Cloud project
+**Environment Variables:**
+```env
+# Stateless deployment
+ANTIGRAVITY_ACCESS_TOKEN="..."
+ANTIGRAVITY_REFRESH_TOKEN="..."
+ANTIGRAVITY_EXPIRY_DATE="..."
+ANTIGRAVITY_EMAIL="user@gmail.com"
+# Feature toggles
+ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true  # Multi-turn conversation support
+ANTIGRAVITY_GEMINI3_TOOL_FIX=true  # Prevent tool hallucination
+```
     ```
 #### Concurrency Control
 #### Provider-Specific Settings
 -   **`GEMINI_CLI_PROJECT_ID`**: Manually specify a Google Cloud Project ID for Gemini CLI OAuth. Only needed if automatic discovery fails.
+#### Antigravity Provider
+-   **`ANTIGRAVITY_OAUTH_1`**: Path to Antigravity OAuth credential file (auto-discovered from `~/.antigravity/` or use the credential tool).
+    ```env
+    ANTIGRAVITY_OAUTH_1="/path/to/your/antigravity_creds.json"
+    ```
+-   **Stateless Deployment** (Environment Variables):
+    ```env
+    ANTIGRAVITY_ACCESS_TOKEN="ya29.your-access-token"
+#### Credential Rotation Strategy
+-   **`ROTATION_TOLERANCE`**: Controls how credentials are selected for requests. Set via environment variable or programmatically.
+    - `0.0`: **Deterministic** - Always selects the least-used credential for perfect load balance
+    - `3.0` (default, recommended): **Weighted Random** - Randomly selects with bias toward less-used credentials. Provides unpredictability (harder to fingerprint/detect) while maintaining good balance
+    - `5.0+`: **High Randomness** - Maximum unpredictability, even heavily-used credentials can be selected
+    ```env
+    # For maximum security/unpredictability (recommended for production)
+    ROTATION_TOLERANCE=3.0
+    # For perfect load balancing (default)
+    ROTATION_TOLERANCE=0.0
+    ```
+    **Why use weighted random?**
+    - Makes traffic patterns less predictable
+    - Still maintains good load distribution across keys
+    - Recommended for production environments with multiple credentials
+    ANTIGRAVITY_REFRESH_TOKEN="1//your-refresh-token"
+    ANTIGRAVITY_EXPIRY_DATE="1234567890000"
+    ANTIGRAVITY_EMAIL="your-email@gmail.com"
+    ```
+-   **`ANTIGRAVITY_ENABLE_SIGNATURE_CACHE`**: Enable/disable thought signature caching for Gemini 3 multi-turn conversations. Default: `true`.
+    ```env
+    ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
+    ```
+-   **`ANTIGRAVITY_GEMINI3_TOOL_FIX`**: Enable/disable tool hallucination prevention for Gemini 3 models. Default: `true`.
+    ```env
+    ANTIGRAVITY_GEMINI3_TOOL_FIX=true
+    ```
+#### Temperature Override (Global)
+-   **`OVERRIDE_TEMPERATURE_ZERO`**: Prevents tool hallucination caused by temperature=0 settings. Modes:
+    - `"remove"`: Deletes temperature=0 from requests (lets provider use default)
+    - `"set"`: Changes temperature=0 to temperature=1.0
+    - `"false"` or unset: Disabled (default)
+#### Credential Prioritization
+-   **`GEMINI_CLI_PROJECT_ID`**: Manually specify a Google Cloud Project ID for Gemini CLI OAuth. Auto-discovered unless unexpected failure occurs.
+    ```env
+    GEMINI_CLI_PROJECT_ID="your-gcp-project-id"
+    ```
     ```env
     GEMINI_CLI_PROJECT_ID="your-gcp-project-id"
     ```

src/rotator_library/README.md CHANGED Viewed

@@ -7,9 +7,11 @@ A robust, asynchronous, and thread-safe Python library for managing a pool of AP
 -   **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
 -   **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>`), it can also support multiple concurrent requests to the *same* model using the same key.
 -   **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability.
 -   **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key selections, exceeds a specified time limit.
 -   **OAuth & API Key Support**: Built-in support for standard API keys and complex OAuth flows.
-    -   **Gemini CLI**: Full OAuth 2.0 web flow with automatic project discovery and free-tier onboarding.
     -   **Qwen Code**: Device Code flow support.
     -   **iFlow**: Authorization Code flow with local callback handling.
 -   **Stateless Deployment Ready**: Can load complex OAuth credentials from environment variables, eliminating the need for physical credential files in containerized environments.
@@ -17,11 +19,15 @@ A robust, asynchronous, and thread-safe Python library for managing a pool of AP
     -   **Escalating Per-Model Cooldowns**: Failed keys are placed on a temporary, escalating cooldown for specific models.
     -   **Key-Level Lockouts**: Keys failing across multiple models are temporarily removed from rotation.
     -   **Stream Recovery**: The client detects mid-stream errors (like quota limits) and gracefully handles them.
 -   **Robust Streaming Support**: Includes a wrapper for streaming responses that reassembles fragmented JSON chunks.
 -   **Detailed Usage Tracking**: Tracks daily and global usage for each key, persisted to a JSON file.
 -   **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
 -   **Provider Agnostic**: Works with any provider supported by `litellm`.
 -   **Extensible**: Easily add support for new providers through a simple plugin-based architecture.
 ## Installation
@@ -71,7 +77,8 @@ client = RotatingClient(
     ignore_models={},
     whitelist_models={},
     enable_request_logging=False,
-    max_concurrent_requests_per_key={}
 )
 ```
@@ -89,6 +96,17 @@ client = RotatingClient(
 -   `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): A dictionary where keys are provider names and values are lists of model names/patterns to always include, overriding `ignore_models`.
 -   `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging (useful for debugging complex interactions).
 -   `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): A dictionary defining the maximum number of concurrent requests allowed for a single API key for a specific provider. Defaults to 1 if not specified.
 ### Concurrency and Resource Management
@@ -185,9 +203,27 @@ Use this tool to:
 ### Google Gemini (CLI)
 -   **Auth**: Simulates the Google Cloud CLI authentication flow.
--   **Project Discovery**: Automatically discovers the default Google Cloud Project ID.
 -   **Rate Limits**: Implements smart fallback strategies (e.g., switching from `gemini-1.5-pro` to `gemini-1.5-pro-002`) when rate limits are hit.
 ## Error Handling and Cooldowns
 The client uses a sophisticated error handling mechanism:

 -   **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
 -   **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>`), it can also support multiple concurrent requests to the *same* model using the same key.
 -   **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability.
+-   **Configurable Rotation Strategy**: Choose between deterministic least-used selection (perfect balance) or default weighted random selection (unpredictable, harder to fingerprint).
 -   **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key selections, exceeds a specified time limit.
 -   **OAuth & API Key Support**: Built-in support for standard API keys and complex OAuth flows.
+    -   **Gemini CLI**: Full OAuth 2.0 web flow with automatic project discovery, free-tier onboarding, and credential prioritization (paid vs free tier).
+    -   **Antigravity**: Full OAuth 2.0 support for Gemini 3, Gemini 2.5, and Claude Sonnet 4.5 models with thought signature caching(Full support for Gemini 3 and Claude models). **First on the scene to provide full support for Gemini 3** via Antigravity with advanced features like thought signature caching and tool hallucination prevention.
     -   **Qwen Code**: Device Code flow support.
     -   **iFlow**: Authorization Code flow with local callback handling.
 -   **Stateless Deployment Ready**: Can load complex OAuth credentials from environment variables, eliminating the need for physical credential files in containerized environments.
     -   **Escalating Per-Model Cooldowns**: Failed keys are placed on a temporary, escalating cooldown for specific models.
     -   **Key-Level Lockouts**: Keys failing across multiple models are temporarily removed from rotation.
     -   **Stream Recovery**: The client detects mid-stream errors (like quota limits) and gracefully handles them.
+-   **Credential Prioritization**: Automatic tier detection and priority-based credential selection (e.g., paid tier credentials used first for models that require them).
+-   **Advanced Model Requirements**: Support for model-tier restrictions (e.g., Gemini 3 requires paid-tier credentials).
 -   **Robust Streaming Support**: Includes a wrapper for streaming responses that reassembles fragmented JSON chunks.
 -   **Detailed Usage Tracking**: Tracks daily and global usage for each key, persisted to a JSON file.
 -   **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
 -   **Provider Agnostic**: Works with any provider supported by `litellm`.
 -   **Extensible**: Easily add support for new providers through a simple plugin-based architecture.
+-   **Temperature Override**: Global temperature=0 override to prevent tool hallucination with low-temperature settings.
+-   **Shared OAuth Base**: Refactored OAuth implementation with reusable [`GoogleOAuthBase`](providers/google_oauth_base.py) for multiple providers.
 ## Installation
     ignore_models={},
     whitelist_models={},
     enable_request_logging=False,
+    max_concurrent_requests_per_key={},
+    rotation_tolerance=2.0  # 0.0=deterministic, 2.0=recommended random
 )
 ```
 -   `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): A dictionary where keys are provider names and values are lists of model names/patterns to always include, overriding `ignore_models`.
 -   `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging (useful for debugging complex interactions).
 -   `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): A dictionary defining the maximum number of concurrent requests allowed for a single API key for a specific provider. Defaults to 1 if not specified.
+-   `rotation_tolerance` (`float`, default: `0.0`): Controls credential rotation strategy:
+    - `0.0`: **Deterministic** - Always selects the least-used credential for perfect load balance.
+    - `2.0` (default, recommended): **Weighted Random** - Randomly selects credentials with bias toward less-used ones. Provides unpredictability (harder to fingerprint) while maintaining good balance.
+    - `5.0+`: **High Randomness** - Even heavily-used credentials have significant selection probability. Maximum unpredictability.
+    The weight formula is: `weight = (max_usage - credential_usage) + tolerance + 1`
+    **Use Cases:**
+    - `0.0`: When perfect load balance is critical
+    - `2.0`: When avoiding fingerprinting/rate limit detection is important
+    - `5.0+`: For stress testing or maximum unpredictability
 ### Concurrency and Resource Management
 ### Google Gemini (CLI)
 -   **Auth**: Simulates the Google Cloud CLI authentication flow.
+-   **Project Discovery**: Automatically discovers the default Google Cloud Project ID with enhanced onboarding flow.
+-   **Credential Prioritization**: Automatic detection and prioritization of paid vs free tier credentials.
+-   **Model Tier Requirements**: Gemini 3 models automatically filtered to paid-tier credentials only.
+-   **Gemini 3 Support**: Full support for Gemini 3 models with:
+    - `thinkingLevel` configuration (low/high)
+    - Tool hallucination prevention via system instruction injection
+    - ThoughtSignature caching for multi-turn conversations
+    - Parameter signature injection into tool descriptions
 -   **Rate Limits**: Implements smart fallback strategies (e.g., switching from `gemini-1.5-pro` to `gemini-1.5-pro-002`) when rate limits are hit.
+### Antigravity
+-   **Auth**: Uses OAuth 2.0 flow similar to Gemini CLI, with Antigravity-specific credentials and scopes.
+-   **Models**: Supports Gemini 2.5 (Pro/Flash), Gemini 3 (Pro/Image), and Claude Sonnet 4.5 via Google's internal Antigravity API.
+-   **Thought Signature Caching**: Server-side caching of `thoughtSignature` data for multi-turn conversations with Gemini 3 models.
+-   **Tool Hallucination Prevention**: Automatic injection of system instructions and parameter signatures for Gemini 3 to prevent tool parameter hallucination.
+-   **Thinking Support**:
+    - Gemini 2.5: Uses `thinkingBudget` (integer tokens)
+    - Gemini 3: Uses `thinkingLevel` (string: "low"/"high")
+    - Claude: Uses `thinkingBudget` via Antigravity proxy
+-   **Base URL Fallback**: Automatic fallback between sandbox and production endpoints.
 ## Error Handling and Cooldowns
 The client uses a sophisticated error handling mechanism: