Mirrowel commited on
Commit
f5ccdf6
Β·
1 Parent(s): f35e0e7

docs: πŸ“š add comprehensive documentation for new features and providers

Browse files

This commit adds extensive documentation for recently implemented features across all documentation files:

- **Antigravity Provider**: Complete documentation of the new Antigravity provider with support for Gemini 2.5, Gemini 3, and Claude Sonnet 4.5 models, including thought signature caching, tool hallucination prevention, and base URL fallback mechanisms
- **Credential Prioritization System**: Detailed explanation of the new tier-based credential selection system that ensures paid-tier credentials are used for premium models
- **Weighted Random Rotation**: Documentation of the configurable `rotation_tolerance` parameter that enables unpredictable credential selection patterns to avoid fingerprinting while maintaining load balance
- **Provider Cache System**: Architecture and usage documentation for the new modular caching system used for preserving conversation state across requests
- **Google OAuth Base Refactoring**: Documentation of the shared `GoogleOAuthBase` class that eliminates code duplication across OAuth providers
- **Enhanced Gemini CLI Features**: Updated documentation covering project tier detection, paid vs free tier credential prioritization, and Gemini 3 support
- **Temperature Override**: Global temperature=0 override configuration to prevent tool hallucination issues
- **Deployment Guide Updates**: Step-by-step instructions for setting up Antigravity OAuth credentials in both local and stateless deployment scenarios
- **Environment Variable Reference**: Comprehensive list of new configuration options including cache control, feature flags, and rotation strategy settings

The documentation includes practical examples, configuration snippets, use cases, and security benefits for each feature.

Files changed (4) hide show
  1. DOCUMENTATION.md +247 -1
  2. Deployment guide.md +31 -0
  3. README.md +112 -1
  4. src/rotator_library/README.md +39 -3
DOCUMENTATION.md CHANGED
@@ -57,6 +57,7 @@ client = RotatingClient(
57
  - `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): Whitelist of models to always include, overriding `ignore_models`.
58
  - `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging.
59
  - `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): Max concurrent requests allowed for a single API key per provider.
 
60
 
61
  #### Core Responsibilities
62
 
@@ -110,8 +111,16 @@ The `acquire_key` method uses a sophisticated strategy to balance load:
110
  2. **Tiering**: Valid keys are split into two tiers:
111
  * **Tier 1 (Ideal)**: Keys that are completely idle (0 concurrent requests).
112
  * **Tier 2 (Acceptable)**: Keys that are busy but still under their configured `MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>` limit for the requested model. This allows a single key to be used multiple times for the same model, maximizing throughput.
113
- 3. **Prioritization**: Within each tier, keys with the **lowest daily usage** are prioritized to spread costs evenly.
 
 
 
 
 
 
 
114
  4. **Concurrency Limits**: Checks against `max_concurrent` limits to prevent overloading a single key.
 
115
 
116
  #### Failure Handling & Cooldowns
117
 
@@ -313,6 +322,243 @@ The `CooldownManager` handles IP or account-level rate limiting that affects all
313
  - If so, `CooldownManager.start_cooldown()` is called for the entire provider
314
  - All subsequent `acquire_key()` calls for that provider will wait until the cooldown expires
315
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
316
  ---
317
 
318
  ## 3. Provider Specific Implementations
 
57
  - `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): Whitelist of models to always include, overriding `ignore_models`.
58
  - `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging.
59
  - `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): Max concurrent requests allowed for a single API key per provider.
60
+ - `rotation_tolerance` (`float`, default: `3.0`): Controls the credential rotation strategy. See Section 2.2 for details.
61
 
62
  #### Core Responsibilities
63
 
 
111
  2. **Tiering**: Valid keys are split into two tiers:
112
  * **Tier 1 (Ideal)**: Keys that are completely idle (0 concurrent requests).
113
  * **Tier 2 (Acceptable)**: Keys that are busy but still under their configured `MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>` limit for the requested model. This allows a single key to be used multiple times for the same model, maximizing throughput.
114
+ 3. **Selection Strategy** (configurable via `rotation_tolerance`):
115
+ * **Deterministic (tolerance=0.0)**: Within each tier, keys are sorted by daily usage count and the least-used key is always selected. This provides perfect load balance but predictable patterns.
116
+ * **Weighted Random (tolerance>0, default)**: Keys are selected randomly with weights biased toward less-used ones:
117
+ - Formula: `weight = (max_usage - credential_usage) + tolerance + 1`
118
+ - `tolerance=2.0` (recommended): Balanced randomness - credentials within 2 uses of the maximum can still be selected with reasonable probability
119
+ - `tolerance=5.0+`: High randomness - even heavily-used credentials have significant probability
120
+ - **Security Benefit**: Unpredictable selection patterns make rate limit detection and fingerprinting harder
121
+ - **Load Balance**: Lower-usage credentials still preferred, maintaining reasonable distribution
122
  4. **Concurrency Limits**: Checks against `max_concurrent` limits to prevent overloading a single key.
123
+ 5. **Priority Groups**: When credential prioritization is enabled, higher-tier credentials (lower priority numbers) are tried first before moving to lower tiers.
124
 
125
  #### Failure Handling & Cooldowns
126
 
 
322
  - If so, `CooldownManager.start_cooldown()` is called for the entire provider
323
  - All subsequent `acquire_key()` calls for that provider will wait until the cooldown expires
324
 
325
+
326
+ ### 2.10. Credential Prioritization System (`client.py` & `usage_manager.py`)
327
+
328
+ The library now includes an intelligent credential prioritization system that automatically detects credential tiers and ensures optimal credential selection for each request.
329
+
330
+ **Key Concepts:**
331
+
332
+ - **Provider-Level Priorities**: Providers can implement `get_credential_priority()` to return a priority level (1=highest, 10=lowest) for each credential
333
+ - **Model-Level Requirements**: Providers can implement `get_model_tier_requirement()` to specify minimum priority required for specific models
334
+ - **Automatic Filtering**: The client automatically filters out incompatible credentials before making requests
335
+ - **Priority-Aware Selection**: The `UsageManager` prioritizes higher-tier credentials (lower numbers) within the same priority group
336
+
337
+ **Implementation Example (Gemini CLI):**
338
+
339
+ ```python
340
+ def get_credential_priority(self, credential: str) -> Optional[int]:
341
+ """Returns priority based on Gemini tier."""
342
+ tier = self.project_tier_cache.get(credential)
343
+ if not tier:
344
+ return None # Not yet discovered
345
+
346
+ # Paid tiers get highest priority
347
+ if tier not in ['free-tier', 'legacy-tier', 'unknown']:
348
+ return 1
349
+
350
+ # Free tier gets lower priority
351
+ if tier == 'free-tier':
352
+ return 2
353
+
354
+ return 10
355
+
356
+ def get_model_tier_requirement(self, model: str) -> Optional[int]:
357
+ """Returns minimum priority required for model."""
358
+ if model.startswith("gemini-3-"):
359
+ return 1 # Only paid tier (priority 1) credentials
360
+
361
+ return None # All other models have no restrictions
362
+ ```
363
+
364
+ **Usage Manager Integration:**
365
+
366
+ The `acquire_key()` method has been enhanced to:
367
+ 1. Group credentials by priority level
368
+ 2. Try highest priority group first (priority 1, then 2, etc.)
369
+ 3. Within each group, use existing tier1/tier2 logic (idle keys first, then busy keys)
370
+ 4. Load balance within priority groups by usage count
371
+ 5. Only move to next priority if all higher-priority credentials are exhausted
372
+
373
+ **Benefits:**
374
+
375
+ - Ensures paid-tier credentials are always used for premium models
376
+ - Prevents failed requests due to tier restrictions
377
+ - Optimal cost distribution (free tier used when possible, paid when required)
378
+ - Graceful fallback if primary credentials are unavailable
379
+
380
+ ---
381
+
382
+ ### 2.11. Provider Cache System (`providers/provider_cache.py`)
383
+
384
+ A modular, shared caching system for providers to persist conversation state across requests.
385
+
386
+ **Architecture:**
387
+
388
+ - **Dual-TTL Design**: Short-lived memory cache (default: 1 hour) + longer-lived disk persistence (default: 24 hours)
389
+ - **Background Persistence**: Batched disk writes every 60 seconds (configurable)
390
+ - **Automatic Cleanup**: Background task removes expired entries from memory cache
391
+
392
+ ### 3.5. Antigravity (`antigravity_provider.py`)
393
+
394
+ The most sophisticated provider implementation, supporting Google's internal Antigravity API for Gemini and Claude models.
395
+
396
+ #### Architecture
397
+
398
+ - **Unified Streaming/Non-Streaming**: Single code path handles both response types with optimal transformations
399
+ - **Thought Signature Caching**: Server-side caching of encrypted signatures for multi-turn Gemini 3 conversations
400
+ - **Model-Specific Logic**: Automatic configuration based on model type (Gemini 2.5, Gemini 3, Claude)
401
+
402
+ #### Model Support
403
+
404
+ **Gemini 2.5 (Pro/Flash):**
405
+ - Uses `thinkingBudget` parameter (integer tokens: -1 for auto, 0 to disable, or specific value)
406
+ - Standard safety settings and toolConfig
407
+ - Stream processing with thinking content separation
408
+
409
+ **Gemini 3 (Pro/Image):**
410
+ - Uses `thinkingLevel` parameter (string: "low" or "high")
411
+ - **Tool Hallucination Prevention**:
412
+ - Automatic system instruction injection explaining custom tool schema rules
413
+ - Parameter signature injection into tool descriptions (e.g., "STRICT PARAMETERS: files (ARRAY_OF_OBJECTS[path: string REQUIRED, ...])")
414
+ - Namespace prefix for tool names (`gemini3_` prefix) to avoid training data conflicts
415
+ - Malformed JSON auto-correction (handles extra trailing braces)
416
+ - **ThoughtSignature Management**:
417
+ - Caching signatures from responses for reuse in follow-up messages
418
+ - Automatic injection into functionCalls for multi-turn conversations
419
+ - Fallback to bypass value if signature unavailable
420
+
421
+ **Claude Sonnet 4.5:**
422
+ - Proxied through Antigravity API (uses internal model name `claude-sonnet-4-5-thinking`)
423
+ - Uses `thinkingBudget` parameter like Gemini 2.5
424
+ - **Thinking Preservation**: Caches thinking content using composite keys (tool_call_id + text_hash)
425
+ - **Schema Cleaning**: Removes unsupported properties (`$schema`, `additionalProperties`, `const` β†’ `enum`)
426
+
427
+ #### Base URL Fallback
428
+
429
+ Automatic fallback chain for resilience:
430
+ 1. `daily-cloudcode-pa.sandbox.googleapis.com` (primary sandbox)
431
+ 2. `autopush-cloudcode-pa.sandbox.googleapis.com` (fallback sandbox)
432
+ 3. `cloudcode-pa.googleapis.com` (production fallback)
433
+
434
+ #### Message Transformation
435
+
436
+ **OpenAI β†’ Gemini Format:**
437
+ - System messages β†’ `systemInstruction` with parts array
438
+ - Multi-part content (text + images) β†’ `inlineData` format
439
+ - Tool calls β†’ `functionCall` with args and id
440
+ - Tool responses β†’ `functionResponse` with name and response
441
+ - ThoughtSignatures preserved/injected as needed
442
+
443
+ **Tool Response Grouping:**
444
+ - Converts linear format (call, response, call, response) to grouped format
445
+ - Groups all function calls in one `model` message
446
+ - Groups all responses in one `user` message
447
+ - Required for Antigravity API compatibility
448
+
449
+ #### Configuration (Environment Variables)
450
+
451
+ ```env
452
+ # Cache control
453
+ ANTIGRAVITY_SIGNATURE_CACHE_TTL=3600 # Memory cache TTL
454
+ ANTIGRAVITY_SIGNATURE_DISK_TTL=86400 # Disk cache TTL
455
+ ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
456
+
457
+ # Feature flags
458
+ ANTIGRAVITY_PRESERVE_THOUGHT_SIGNATURES=true # Include signatures in client responses
459
+ ANTIGRAVITY_ENABLE_DYNAMIC_MODELS=false # Use API model discovery
460
+ ANTIGRAVITY_GEMINI3_TOOL_FIX=true # Enable Gemini 3 hallucination prevention
461
+
462
+ # Gemini 3 tool fix customization
463
+ ANTIGRAVITY_GEMINI3_TOOL_PREFIX="gemini3_" # Namespace prefix
464
+ ANTIGRAVITY_GEMINI3_DESCRIPTION_PROMPT="\n\nSTRICT PARAMETERS: {params}."
465
+ ANTIGRAVITY_GEMINI3_SYSTEM_INSTRUCTION="..." # Full system prompt
466
+ ```
467
+
468
+ #### File Logging
469
+
470
+ Optional transaction logging for debugging:
471
+ - Enabled via `enable_request_logging` parameter
472
+ - Creates `logs/antigravity_logs/TIMESTAMP_MODEL_UUID/` directory per request
473
+ - Logs: `request_payload.json`, `response_stream.log`, `final_response.json`, `error.log`
474
+
475
+ ---
476
+
477
+
478
+ - **Atomic Disk Writes**: Uses temp-file-and-move pattern to prevent corruption
479
+
480
+ **Key Methods:**
481
+
482
+ 1. **`store(key, value)`**: Synchronously queues value for storage (schedules async write)
483
+ 2. **`retrieve(key)`**: Synchronously retrieves from memory, optionally schedules disk fallback
484
+ 3. **`store_async(key, value)`**: Awaitable storage for guaranteed persistence
485
+ 4. **`retrieve_async(key)`**: Awaitable retrieval with disk fallback
486
+
487
+ **Use Cases:**
488
+
489
+ - **Gemini 3 ThoughtSignatures**: Caching tool call signatures for multi-turn conversations
490
+ - **Claude Thinking**: Preserving thinking content for consistency across conversation turns
491
+ - **Any Transient State**: Generic key-value storage for provider-specific needs
492
+
493
+ **Configuration (Environment Variables):**
494
+
495
+ ```env
496
+ # Cache control (prefix can be customized per cache instance)
497
+ PROVIDER_CACHE_ENABLE=true
498
+ PROVIDER_CACHE_WRITE_INTERVAL=60 # seconds between disk writes
499
+ PROVIDER_CACHE_CLEANUP_INTERVAL=1800 # 30 min between cleanups
500
+
501
+ # Gemini 3 specific
502
+ GEMINI_CLI_SIGNATURE_CACHE_ENABLE=true
503
+ GEMINI_CLI_SIGNATURE_CACHE_TTL=3600 # 1 hour memory TTL
504
+ GEMINI_CLI_SIGNATURE_DISK_TTL=86400 # 24 hours disk TTL
505
+ ```
506
+
507
+ **File Structure:**
508
+
509
+ ```
510
+ cache/
511
+ β”œβ”€β”€ gemini_cli/
512
+ β”‚ └── gemini3_signatures.json
513
+ └── antigravity/
514
+ β”œβ”€β”€ gemini3_signatures.json
515
+ └── claude_thinking.json
516
+ ```
517
+
518
+ ---
519
+
520
+ ### 2.12. Google OAuth Base (`providers/google_oauth_base.py`)
521
+
522
+ A refactored, reusable OAuth2 base class that eliminates code duplication across Google-based providers.
523
+
524
+ **Refactoring Benefits:**
525
+
526
+ - **Single Source of Truth**: All OAuth logic centralized in one class
527
+ - **Easy Provider Addition**: New providers only need to override constants
528
+ - **Consistent Behavior**: Token refresh, expiry handling, and validation work identically across providers
529
+ - **Maintainability**: OAuth bugs fixed once apply to all inheriting providers
530
+
531
+ **Provider Implementation:**
532
+
533
+ ```python
534
+ class AntigravityAuthBase(GoogleOAuthBase):
535
+ # Required overrides
536
+ CLIENT_ID = "antigravity-client-id"
537
+ CLIENT_SECRET = "antigravity-secret"
538
+ OAUTH_SCOPES = [
539
+ "https://www.googleapis.com/auth/cloud-platform",
540
+ "https://www.googleapis.com/auth/cclog", # Antigravity-specific
541
+ "https://www.googleapis.com/auth/experimentsandconfigs",
542
+ ]
543
+ ENV_PREFIX = "ANTIGRAVITY" # Used for env var loading
544
+
545
+ # Optional overrides (defaults provided)
546
+ CALLBACK_PORT = 51121
547
+ CALLBACK_PATH = "/oauthcallback"
548
+ ```
549
+
550
+ **Inherited Features:**
551
+
552
+ - Automatic token refresh with exponential backoff
553
+ - Invalid grant re-authentication flow
554
+ - Stateless deployment support (env var loading)
555
+ - Atomic credential file writes
556
+ - Headless environment detection
557
+ - Sequential refresh queue processing
558
+
559
+ ---
560
+
561
+
562
  ---
563
 
564
  ## 3. Provider Specific Implementations
Deployment guide.md CHANGED
@@ -79,6 +79,37 @@ If you are using providers that require complex OAuth files (like **Gemini CLI**
79
  4. Copy the contents of this file and paste them directly into your `.env` file or Render's "Environment Variables" section.
80
  5. The proxy will automatically detect and use these variablesβ€”no file upload required!
81
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
82
  4. Save the file. (We'll upload it to Render in Step 5.)
83
 
84
 
 
79
  4. Copy the contents of this file and paste them directly into your `.env` file or Render's "Environment Variables" section.
80
  5. The proxy will automatically detect and use these variablesβ€”no file upload required!
81
 
82
+
83
+ ### Advanced: Antigravity OAuth Provider
84
+
85
+ The Antigravity provider requires OAuth2 authentication similar to Gemini CLI. It provides access to:
86
+ - Gemini 2.5 models (Pro/Flash)
87
+ - Gemini 3 models (Pro/Image-preview) - **requires paid-tier Google Cloud project**
88
+ - Claude Sonnet 4.5 via Google's Antigravity proxy
89
+
90
+ **Setting up Antigravity locally:**
91
+ 1. Run the credential tool: `python -m rotator_library.credential_tool`
92
+ 2. Select "Add OAuth Credential" and choose "Antigravity"
93
+ 3. Complete the OAuth flow in your browser
94
+ 4. The credential is saved to `oauth_creds/antigravity_oauth_1.json`
95
+
96
+ **Exporting for stateless deployment:**
97
+ 1. Run: `python -m rotator_library.credential_tool`
98
+ 2. Select "Export Antigravity to .env"
99
+ 3. Copy the generated environment variables to your deployment platform:
100
+ ```env
101
+ ANTIGRAVITY_ACCESS_TOKEN="..."
102
+ ANTIGRAVITY_REFRESH_TOKEN="..."
103
+ ANTIGRAVITY_EXPIRY_DATE="..."
104
+ ANTIGRAVITY_EMAIL="your-email@gmail.com"
105
+ ```
106
+
107
+ **Important Notes:**
108
+ - Antigravity uses Google OAuth with additional scopes for cloud platform access
109
+ - Gemini 3 models require a paid-tier Google Cloud project (free tier will fail)
110
+ - The provider automatically handles thought signature caching for multi-turn conversations
111
+ - Tool hallucination prevention is enabled by default for Gemini 3 models
112
+
113
  4. Save the file. (We'll upload it to Render in Step 5.)
114
 
115
 
README.md CHANGED
@@ -27,6 +27,15 @@ This project provides a powerful solution for developers building complex applic
27
  - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
28
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
29
  - **Advanced Model Filtering**: Supports both blacklists and whitelists to give you fine-grained control over which models are available through the proxy.
 
 
 
 
 
 
 
 
 
30
  - **πŸ†• Interactive Launcher TUI**: Beautiful, cross-platform TUI for configuration and management with an integrated settings tool for advanced configuration.
31
 
32
 
@@ -234,11 +243,12 @@ python src/proxy_app/main.py
234
 
235
  **Main Menu Features:**
236
 
237
- 1. **Add OAuth Credential** - Interactive OAuth flow for Gemini CLI, Qwen Code, and iFlow
238
  - Automatically opens your browser for authentication
239
  - Handles the entire OAuth flow including callbacks
240
  - Saves credentials to the local `oauth_creds/` directory
241
  - For Gemini CLI: Automatically discovers or creates a Google Cloud project
 
242
  - For Qwen Code: Uses Device Code flow (you'll enter a code in your browser)
243
  - For iFlow: Starts a local callback server on port 11451
244
 
@@ -488,6 +498,42 @@ The following advanced settings can be added to your `.env` file (or configured
488
  - **`SKIP_OAUTH_INIT_CHECK`**: Set to `true` to skip the interactive OAuth setup/validation check on startup. Essential for non-interactive environments like Docker containers or CI/CD pipelines.
489
  ```env
490
  SKIP_OAUTH_INIT_CHECK=true
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
491
  ```
492
 
493
  #### Concurrency Control
@@ -516,6 +562,71 @@ For providers that support custom model definitions (Qwen Code, iFlow), you can
516
  #### Provider-Specific Settings
517
 
518
  - **`GEMINI_CLI_PROJECT_ID`**: Manually specify a Google Cloud Project ID for Gemini CLI OAuth. Only needed if automatic discovery fails.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
519
  ```env
520
  GEMINI_CLI_PROJECT_ID="your-gcp-project-id"
521
  ```
 
27
  - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
28
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
29
  - **Advanced Model Filtering**: Supports both blacklists and whitelists to give you fine-grained control over which models are available through the proxy.
30
+
31
+ - **πŸ†• Antigravity Provider**: Full support for Google's internal Antigravity API, providing access to Gemini 2.5, Gemini 3, and Claude Sonnet 4.5 models with advanced features like thought signature caching and tool hallucination prevention.
32
+ - **πŸ†• Credential Prioritization**: Automatic tier detection and priority-based credential selection ensures paid-tier credentials are used for premium models that require them.
33
+ - **πŸ†• Weighted Random Rotation**: Configurable credential rotation strategy - choose between deterministic (perfect balance) or weighted random (unpredictable, harder to fingerprint) selection.
34
+ - **πŸ†• Enhanced Gemini CLI**: Improved project discovery, paid vs free tier detection, and Gemini 3 support with thoughtSignature caching.
35
+ - **πŸ†• Temperature Override**: Global temperature=0 override option to prevent tool hallucination issues with low-temperature settings.
36
+ - **πŸ†• Provider Cache System**: Modular caching system for preserving conversation state (thought signatures, thinking content) across requests.
37
+ - **πŸ†• Refactored OAuth Base**: Shared [`GoogleOAuthBase`](src/rotator_library/providers/google_oauth_base.py) class eliminates code duplication across OAuth providers.
38
+
39
  - **πŸ†• Interactive Launcher TUI**: Beautiful, cross-platform TUI for configuration and management with an integrated settings tool for advanced configuration.
40
 
41
 
 
243
 
244
  **Main Menu Features:**
245
 
246
+ 1. **Add OAuth Credential** - Interactive OAuth flow for Gemini CLI, Antigravity, Qwen Code, and iFlow
247
  - Automatically opens your browser for authentication
248
  - Handles the entire OAuth flow including callbacks
249
  - Saves credentials to the local `oauth_creds/` directory
250
  - For Gemini CLI: Automatically discovers or creates a Google Cloud project
251
+ - For Antigravity: Similar to Gemini CLI with Antigravity-specific scopes
252
  - For Qwen Code: Uses Device Code flow (you'll enter a code in your browser)
253
  - For iFlow: Starts a local callback server on port 11451
254
 
 
498
  - **`SKIP_OAUTH_INIT_CHECK`**: Set to `true` to skip the interactive OAuth setup/validation check on startup. Essential for non-interactive environments like Docker containers or CI/CD pipelines.
499
  ```env
500
  SKIP_OAUTH_INIT_CHECK=true
501
+
502
+
503
+ #### **Antigravity (Advanced - Gemini 3 \Claude 4.5 Access)**
504
+ The newest and most sophisticated provider, offering access to cutting-edge models via Google's internal Antigravity API.
505
+
506
+ **Supported Models:**
507
+ - Gemini 2.5 (Pro/Flash) with `thinkingBudget` parameter
508
+ - **Gemini 3 Pro (High/Low)** - Latest preview models
509
+ - **Claude Sonnet 4.5 + Thinking** via Antigravity proxy
510
+
511
+ **Advanced Features:**
512
+ - **Thought Signature Caching**: Preserves encrypted signatures for multi-turn Gemini 3 conversations
513
+ - **Tool Hallucination Prevention**: Automatic system instruction and parameter signature injection for Gemini 3 to prevent tools from being called with incorrect parameters
514
+ - **Thinking Preservation**: Caches Claude thinking content for consistency across conversation turns
515
+ - **Automatic Fallback**: Tries sandbox endpoints before falling back to production
516
+ - **Schema Cleaning**: Handles Claude-specific tool schema requirements
517
+
518
+ **Configuration:**
519
+ - **OAuth Setup**: Uses Google OAuth similar to Gemini CLI (separate scopes)
520
+ - **Stateless Deployment**: Full environment variable support
521
+ - **Paid Tier Recommended**: Gemini 3 models require a paid Google Cloud project
522
+
523
+ **Environment Variables:**
524
+ ```env
525
+ # Stateless deployment
526
+ ANTIGRAVITY_ACCESS_TOKEN="..."
527
+ ANTIGRAVITY_REFRESH_TOKEN="..."
528
+ ANTIGRAVITY_EXPIRY_DATE="..."
529
+ ANTIGRAVITY_EMAIL="user@gmail.com"
530
+
531
+ # Feature toggles
532
+ ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true # Multi-turn conversation support
533
+ ANTIGRAVITY_GEMINI3_TOOL_FIX=true # Prevent tool hallucination
534
+ ```
535
+
536
+
537
  ```
538
 
539
  #### Concurrency Control
 
562
  #### Provider-Specific Settings
563
 
564
  - **`GEMINI_CLI_PROJECT_ID`**: Manually specify a Google Cloud Project ID for Gemini CLI OAuth. Only needed if automatic discovery fails.
565
+
566
+
567
+ #### Antigravity Provider
568
+
569
+ - **`ANTIGRAVITY_OAUTH_1`**: Path to Antigravity OAuth credential file (auto-discovered from `~/.antigravity/` or use the credential tool).
570
+ ```env
571
+ ANTIGRAVITY_OAUTH_1="/path/to/your/antigravity_creds.json"
572
+ ```
573
+
574
+ - **Stateless Deployment** (Environment Variables):
575
+ ```env
576
+ ANTIGRAVITY_ACCESS_TOKEN="ya29.your-access-token"
577
+
578
+
579
+ #### Credential Rotation Strategy
580
+
581
+ - **`ROTATION_TOLERANCE`**: Controls how credentials are selected for requests. Set via environment variable or programmatically.
582
+ - `0.0`: **Deterministic** - Always selects the least-used credential for perfect load balance
583
+ - `3.0` (default, recommended): **Weighted Random** - Randomly selects with bias toward less-used credentials. Provides unpredictability (harder to fingerprint/detect) while maintaining good balance
584
+ - `5.0+`: **High Randomness** - Maximum unpredictability, even heavily-used credentials can be selected
585
+
586
+ ```env
587
+ # For maximum security/unpredictability (recommended for production)
588
+ ROTATION_TOLERANCE=3.0
589
+
590
+ # For perfect load balancing (default)
591
+ ROTATION_TOLERANCE=0.0
592
+ ```
593
+
594
+ **Why use weighted random?**
595
+ - Makes traffic patterns less predictable
596
+ - Still maintains good load distribution across keys
597
+ - Recommended for production environments with multiple credentials
598
+
599
+
600
+ ANTIGRAVITY_REFRESH_TOKEN="1//your-refresh-token"
601
+ ANTIGRAVITY_EXPIRY_DATE="1234567890000"
602
+ ANTIGRAVITY_EMAIL="your-email@gmail.com"
603
+ ```
604
+
605
+ - **`ANTIGRAVITY_ENABLE_SIGNATURE_CACHE`**: Enable/disable thought signature caching for Gemini 3 multi-turn conversations. Default: `true`.
606
+ ```env
607
+ ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
608
+ ```
609
+
610
+ - **`ANTIGRAVITY_GEMINI3_TOOL_FIX`**: Enable/disable tool hallucination prevention for Gemini 3 models. Default: `true`.
611
+ ```env
612
+ ANTIGRAVITY_GEMINI3_TOOL_FIX=true
613
+ ```
614
+
615
+ #### Temperature Override (Global)
616
+
617
+ - **`OVERRIDE_TEMPERATURE_ZERO`**: Prevents tool hallucination caused by temperature=0 settings. Modes:
618
+ - `"remove"`: Deletes temperature=0 from requests (lets provider use default)
619
+ - `"set"`: Changes temperature=0 to temperature=1.0
620
+ - `"false"` or unset: Disabled (default)
621
+
622
+ #### Credential Prioritization
623
+
624
+ - **`GEMINI_CLI_PROJECT_ID`**: Manually specify a Google Cloud Project ID for Gemini CLI OAuth. Auto-discovered unless unexpected failure occurs.
625
+ ```env
626
+ GEMINI_CLI_PROJECT_ID="your-gcp-project-id"
627
+ ```
628
+
629
+
630
  ```env
631
  GEMINI_CLI_PROJECT_ID="your-gcp-project-id"
632
  ```
src/rotator_library/README.md CHANGED
@@ -7,9 +7,11 @@ A robust, asynchronous, and thread-safe Python library for managing a pool of AP
7
  - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
8
  - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>`), it can also support multiple concurrent requests to the *same* model using the same key.
9
  - **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability.
 
10
  - **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key selections, exceeds a specified time limit.
11
  - **OAuth & API Key Support**: Built-in support for standard API keys and complex OAuth flows.
12
- - **Gemini CLI**: Full OAuth 2.0 web flow with automatic project discovery and free-tier onboarding.
 
13
  - **Qwen Code**: Device Code flow support.
14
  - **iFlow**: Authorization Code flow with local callback handling.
15
  - **Stateless Deployment Ready**: Can load complex OAuth credentials from environment variables, eliminating the need for physical credential files in containerized environments.
@@ -17,11 +19,15 @@ A robust, asynchronous, and thread-safe Python library for managing a pool of AP
17
  - **Escalating Per-Model Cooldowns**: Failed keys are placed on a temporary, escalating cooldown for specific models.
18
  - **Key-Level Lockouts**: Keys failing across multiple models are temporarily removed from rotation.
19
  - **Stream Recovery**: The client detects mid-stream errors (like quota limits) and gracefully handles them.
 
 
20
  - **Robust Streaming Support**: Includes a wrapper for streaming responses that reassembles fragmented JSON chunks.
21
  - **Detailed Usage Tracking**: Tracks daily and global usage for each key, persisted to a JSON file.
22
  - **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
23
  - **Provider Agnostic**: Works with any provider supported by `litellm`.
24
  - **Extensible**: Easily add support for new providers through a simple plugin-based architecture.
 
 
25
 
26
  ## Installation
27
 
@@ -71,7 +77,8 @@ client = RotatingClient(
71
  ignore_models={},
72
  whitelist_models={},
73
  enable_request_logging=False,
74
- max_concurrent_requests_per_key={}
 
75
  )
76
  ```
77
 
@@ -89,6 +96,17 @@ client = RotatingClient(
89
  - `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): A dictionary where keys are provider names and values are lists of model names/patterns to always include, overriding `ignore_models`.
90
  - `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging (useful for debugging complex interactions).
91
  - `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): A dictionary defining the maximum number of concurrent requests allowed for a single API key for a specific provider. Defaults to 1 if not specified.
 
 
 
 
 
 
 
 
 
 
 
92
 
93
  ### Concurrency and Resource Management
94
 
@@ -185,9 +203,27 @@ Use this tool to:
185
 
186
  ### Google Gemini (CLI)
187
  - **Auth**: Simulates the Google Cloud CLI authentication flow.
188
- - **Project Discovery**: Automatically discovers the default Google Cloud Project ID.
 
 
 
 
 
 
 
189
  - **Rate Limits**: Implements smart fallback strategies (e.g., switching from `gemini-1.5-pro` to `gemini-1.5-pro-002`) when rate limits are hit.
190
 
 
 
 
 
 
 
 
 
 
 
 
191
  ## Error Handling and Cooldowns
192
 
193
  The client uses a sophisticated error handling mechanism:
 
7
  - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
8
  - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>`), it can also support multiple concurrent requests to the *same* model using the same key.
9
  - **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability.
10
+ - **Configurable Rotation Strategy**: Choose between deterministic least-used selection (perfect balance) or default weighted random selection (unpredictable, harder to fingerprint).
11
  - **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key selections, exceeds a specified time limit.
12
  - **OAuth & API Key Support**: Built-in support for standard API keys and complex OAuth flows.
13
+ - **Gemini CLI**: Full OAuth 2.0 web flow with automatic project discovery, free-tier onboarding, and credential prioritization (paid vs free tier).
14
+ - **Antigravity**: Full OAuth 2.0 support for Gemini 3, Gemini 2.5, and Claude Sonnet 4.5 models with thought signature caching(Full support for Gemini 3 and Claude models). **First on the scene to provide full support for Gemini 3** via Antigravity with advanced features like thought signature caching and tool hallucination prevention.
15
  - **Qwen Code**: Device Code flow support.
16
  - **iFlow**: Authorization Code flow with local callback handling.
17
  - **Stateless Deployment Ready**: Can load complex OAuth credentials from environment variables, eliminating the need for physical credential files in containerized environments.
 
19
  - **Escalating Per-Model Cooldowns**: Failed keys are placed on a temporary, escalating cooldown for specific models.
20
  - **Key-Level Lockouts**: Keys failing across multiple models are temporarily removed from rotation.
21
  - **Stream Recovery**: The client detects mid-stream errors (like quota limits) and gracefully handles them.
22
+ - **Credential Prioritization**: Automatic tier detection and priority-based credential selection (e.g., paid tier credentials used first for models that require them).
23
+ - **Advanced Model Requirements**: Support for model-tier restrictions (e.g., Gemini 3 requires paid-tier credentials).
24
  - **Robust Streaming Support**: Includes a wrapper for streaming responses that reassembles fragmented JSON chunks.
25
  - **Detailed Usage Tracking**: Tracks daily and global usage for each key, persisted to a JSON file.
26
  - **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
27
  - **Provider Agnostic**: Works with any provider supported by `litellm`.
28
  - **Extensible**: Easily add support for new providers through a simple plugin-based architecture.
29
+ - **Temperature Override**: Global temperature=0 override to prevent tool hallucination with low-temperature settings.
30
+ - **Shared OAuth Base**: Refactored OAuth implementation with reusable [`GoogleOAuthBase`](providers/google_oauth_base.py) for multiple providers.
31
 
32
  ## Installation
33
 
 
77
  ignore_models={},
78
  whitelist_models={},
79
  enable_request_logging=False,
80
+ max_concurrent_requests_per_key={},
81
+ rotation_tolerance=2.0 # 0.0=deterministic, 2.0=recommended random
82
  )
83
  ```
84
 
 
96
  - `whitelist_models` (`Optional[Dict[str, List[str]]]`, default: `None`): A dictionary where keys are provider names and values are lists of model names/patterns to always include, overriding `ignore_models`.
97
  - `enable_request_logging` (`bool`, default: `False`): If `True`, enables detailed per-request file logging (useful for debugging complex interactions).
98
  - `max_concurrent_requests_per_key` (`Optional[Dict[str, int]]`, default: `None`): A dictionary defining the maximum number of concurrent requests allowed for a single API key for a specific provider. Defaults to 1 if not specified.
99
+ - `rotation_tolerance` (`float`, default: `0.0`): Controls credential rotation strategy:
100
+ - `0.0`: **Deterministic** - Always selects the least-used credential for perfect load balance.
101
+ - `2.0` (default, recommended): **Weighted Random** - Randomly selects credentials with bias toward less-used ones. Provides unpredictability (harder to fingerprint) while maintaining good balance.
102
+ - `5.0+`: **High Randomness** - Even heavily-used credentials have significant selection probability. Maximum unpredictability.
103
+
104
+ The weight formula is: `weight = (max_usage - credential_usage) + tolerance + 1`
105
+
106
+ **Use Cases:**
107
+ - `0.0`: When perfect load balance is critical
108
+ - `2.0`: When avoiding fingerprinting/rate limit detection is important
109
+ - `5.0+`: For stress testing or maximum unpredictability
110
 
111
  ### Concurrency and Resource Management
112
 
 
203
 
204
  ### Google Gemini (CLI)
205
  - **Auth**: Simulates the Google Cloud CLI authentication flow.
206
+ - **Project Discovery**: Automatically discovers the default Google Cloud Project ID with enhanced onboarding flow.
207
+ - **Credential Prioritization**: Automatic detection and prioritization of paid vs free tier credentials.
208
+ - **Model Tier Requirements**: Gemini 3 models automatically filtered to paid-tier credentials only.
209
+ - **Gemini 3 Support**: Full support for Gemini 3 models with:
210
+ - `thinkingLevel` configuration (low/high)
211
+ - Tool hallucination prevention via system instruction injection
212
+ - ThoughtSignature caching for multi-turn conversations
213
+ - Parameter signature injection into tool descriptions
214
  - **Rate Limits**: Implements smart fallback strategies (e.g., switching from `gemini-1.5-pro` to `gemini-1.5-pro-002`) when rate limits are hit.
215
 
216
+ ### Antigravity
217
+ - **Auth**: Uses OAuth 2.0 flow similar to Gemini CLI, with Antigravity-specific credentials and scopes.
218
+ - **Models**: Supports Gemini 2.5 (Pro/Flash), Gemini 3 (Pro/Image), and Claude Sonnet 4.5 via Google's internal Antigravity API.
219
+ - **Thought Signature Caching**: Server-side caching of `thoughtSignature` data for multi-turn conversations with Gemini 3 models.
220
+ - **Tool Hallucination Prevention**: Automatic injection of system instructions and parameter signatures for Gemini 3 to prevent tool parameter hallucination.
221
+ - **Thinking Support**:
222
+ - Gemini 2.5: Uses `thinkingBudget` (integer tokens)
223
+ - Gemini 3: Uses `thinkingLevel` (string: "low"/"high")
224
+ - Claude: Uses `thinkingBudget` via Antigravity proxy
225
+ - **Base URL Fallback**: Automatic fallback between sandbox and production endpoints.
226
+
227
  ## Error Handling and Cooldowns
228
 
229
  The client uses a sophisticated error handling mechanism: