Mirrowel committed on
Commit 2eb0cb6 · 1 Parent(s): 09eea32

docs(antigravity): 📚 document quota tracking, background jobs, and error handling


Adds comprehensive documentation for recently implemented Antigravity provider features:

- Provider-specific background jobs system with independent timers
- Antigravity quota tracker with API baseline fetching and request counting
- TransientQuotaError for handling bare 429 responses without retry info
- Quota groups configuration for models sharing usage limits
- Parallel tool usage instruction injection for Claude and Gemini 3
- Quota cost constants and model name mappings

Documents both the architecture and implementation details to help developers understand the advanced usage tracking capabilities.

Files changed (3)
  1. DOCUMENTATION.md +222 -15
  2. README.md +16 -1
  3. src/rotator_library/README.md +9 -1
DOCUMENTATION.md CHANGED
@@ -151,13 +151,49 @@ The `EmbeddingBatcher` class optimizes high-throughput embedding workloads.
  2. A time window (`timeout`, default: 0.1s) elapses since the first request in the batch.
  * **Efficiency**: This reduces dozens of HTTP calls to a single API request, significantly reducing overhead and rate limit usage.

- ### 2.4. `background_refresher.py` - Automated Token Maintenance

- The `BackgroundRefresher` ensures that OAuth tokens (for providers like Gemini CLI, Qwen, iFlow) never expire while the proxy is running.

- * **Periodic Checks**: It runs a background task that wakes up at a configurable interval (default: 3600 seconds/1 hour).
  * **Proactive Refresh**: It iterates through all loaded OAuth credentials and calls their `proactively_refresh` method to ensure tokens are valid before they are needed.

  ### 2.6. Credential Management Architecture

  The `CredentialManager` class (`credential_manager.py`) centralizes the lifecycle of all API credentials. It adheres to a "Local First" philosophy.
@@ -295,15 +331,19 @@ class ErrorType(Enum):
  - `400` with "quota" → `QUOTA`
  - `500`/`502`/`503` → `SERVER_ERROR`

- 2. **Message Analysis**: Fallback for ambiguous errors
  - Searches for keywords like "quota exceeded", "rate limit", "invalid api key"

- 3. **Provider-Specific Overrides**: Some providers use non-standard error formats

  **Usage in Client:**
  - `AUTHENTICATION` → Immediate 5-minute global lockout
  - `RATE_LIMIT`/`QUOTA` → Escalating per-model cooldown
- - `SERVER_ERROR` → Retry with same key (up to `max_retries`)
  - `CONTEXT_LENGTH`/`CONTENT_FILTER` → Immediate failure (user needs to fix request)

  ---
@@ -409,6 +449,124 @@ A modular, shared caching system for providers to persist conversation state acr
  - **Background Persistence**: Batched disk writes every 60 seconds (configurable)
  - **Automatic Cleanup**: Background task removes expired entries from memory cache

  ### 3.5. Antigravity (`antigravity_provider.py`)

  The most sophisticated provider implementation, supporting Google's internal Antigravity API for Gemini 3 and Claude models (including **Claude Opus 4.5**, Anthropic's most powerful model).
@@ -421,8 +579,10 @@ The most sophisticated provider implementation, supporting Google's internal Ant
  - **Credential Prioritization**: Automatic tier detection with paid credentials prioritized over free (paid tier resets every 5 hours, free tier resets weekly)
  - **Sequential Rotation Mode**: Default rotation mode is sequential (use credentials until exhausted) to maximize thought signature cache hits
  - **Per-Model Quota Tracking**: Each model tracks independent usage windows with authoritative reset timestamps from quota errors
- - **Quota Groups**: Claude models (Sonnet 4.5 + Opus 4.5) can be grouped to share quota limits (disabled by default, configurable via `QUOTA_GROUPS_ANTIGRAVITY_CLAUDE`)
  - **Priority Multipliers**: Paid tier credentials get higher concurrency limits (Priority 1: 5x, Priority 2: 3x, Priority 3+: 2x in sequential mode)

  #### Model Support

@@ -437,8 +597,18 @@ The most sophisticated provider implementation, supporting Google's internal Ant
  - Caching signatures from responses for reuse in follow-up messages
  - Automatic injection into functionCalls for multi-turn conversations
  - Fallback to bypass value if signature unavailable

- **Claude Opus 4.5 (NEW!):**
  - Anthropic's most powerful model, now available via Antigravity proxy
  - **Always uses thinking variant** - `claude-opus-4-5-thinking` is the only available variant (non-thinking version doesn't exist)
  - Uses `thinkingBudget` parameter for extended thinking control (-1 for auto, 0 to disable, or specific token count)
@@ -453,6 +623,11 @@ The most sophisticated provider implementation, supporting Google's internal Ant
  - Without `reasoning_effort`: Uses standard `claude-sonnet-4-5` variant
  - **Thinking Preservation**: Caches thinking content using composite keys (tool_call_id + text_hash)
  - **Schema Cleaning**: Removes unsupported properties (`$schema`, `additionalProperties`, `const` → `enum`)

  #### Base URL Fallback

@@ -494,6 +669,14 @@ ANTIGRAVITY_CLAUDE_THINKING_SANITIZATION=true # Enable Claude thinking mode aut
  ANTIGRAVITY_GEMINI3_TOOL_PREFIX="gemini3_" # Namespace prefix
  ANTIGRAVITY_GEMINI3_DESCRIPTION_PROMPT="\n\nSTRICT PARAMETERS: {params}."
  ANTIGRAVITY_GEMINI3_SYSTEM_INSTRUCTION="..." # Full system prompt
  ```

  #### Claude Extended Thinking Sanitization
@@ -714,15 +897,24 @@ Models that share the same quota limits can be grouped:
  **Configuration**:
  ```env
  # Models in a group share quota/cooldown timing
- QUOTA_GROUPS_ANTIGRAVITY_CLAUDE="claude-sonnet-4-5,claude-opus-4-5"

  # To disable a default group:
  QUOTA_GROUPS_ANTIGRAVITY_CLAUDE=""
  ```

  **Behavior**:
  - When one model hits quota, all models in the group receive the same `quota_reset_ts`
- - Combined weighted usage for credential selection (e.g., Opus counts 2x vs Sonnet)
  - Group resets only when ALL models' quotas have reset
  - Preserves unexpired cooldowns during other resets

@@ -730,11 +922,26 @@ QUOTA_GROUPS_ANTIGRAVITY_CLAUDE=""
  ```python
  class AntigravityProvider(ProviderInterface):
      model_quota_groups = {
-         "claude": ["claude-sonnet-4-5", "claude-opus-4-5"]
-     }
-
-     model_usage_weights = {
-         "claude-opus-4-5": 2  # Opus counts 2x vs Sonnet
      }
  ```

 
  2. A time window (`timeout`, default: 0.1s) elapses since the first request in the batch.
  * **Efficiency**: This reduces dozens of HTTP calls to a single API request, significantly reducing overhead and rate limit usage.

+ ### 2.4. `background_refresher.py` - Automated Token Maintenance & Provider Jobs

+ The `BackgroundRefresher` manages background tasks for the proxy, including OAuth token refresh and provider-specific periodic jobs.

+ #### OAuth Token Refresh
+
+ * **Periodic Checks**: It runs a background task that wakes up at a configurable interval (default: 600 seconds/10 minutes via `OAUTH_REFRESH_INTERVAL`).
  * **Proactive Refresh**: It iterates through all loaded OAuth credentials and calls their `proactively_refresh` method to ensure tokens are valid before they are needed.

+ #### Provider-Specific Background Jobs
+
+ Providers can define their own background jobs that run on independent schedules:
+
+ * **Independent Timers**: Each provider's job runs on its own interval, separate from the OAuth refresh cycle.
+ * **Configuration**: Providers implement `get_background_job_config()` to define their job settings.
+ * **Execution**: Providers implement `run_background_job()` to execute the periodic task.
+
+ **Provider Job Configuration:**
+ ```python
+ def get_background_job_config(self) -> Optional[Dict[str, Any]]:
+     """Return configuration for provider-specific background job."""
+     return {
+         "interval": 300,  # seconds between runs
+         "name": "quota_refresh",  # for logging
+         "run_on_start": True,  # whether to run immediately at startup
+     }
+
+ async def run_background_job(
+     self,
+     usage_manager: "UsageManager",
+     credentials: List[str],
+ ) -> None:
+     """Execute the provider's periodic background job."""
+     # Provider-specific logic here
+     pass
+ ```
+
+ **Current Provider Jobs:**
+
+ | Provider | Job Name | Default Interval | Purpose |
+ |----------|----------|------------------|---------|
+ | Antigravity | `quota_baseline_refresh` | 300s (5 min) | Fetches quota status from API to update remaining quota estimates |
+
  ### 2.6. Credential Management Architecture

  The `CredentialManager` class (`credential_manager.py`) centralizes the lifecycle of all API credentials. It adheres to a "Local First" philosophy.
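The independent-timer model described in this hunk can be sketched with plain `asyncio`. Each provider that returns a job config gets its own looping task. That task optionally runs once at startup and then sleeps for its own `interval`, so a slow or failing job never delays the OAuth refresh cycle or another provider's job. This is a minimal illustration under stated assumptions, not the actual `BackgroundRefresher` code. `run_provider_job`, `start_provider_jobs`, `DemoProvider` and `demo` are hypothetical. Only the `get_background_job_config()`/`run_background_job()` hooks and the `interval`, `name` and `run_on_start` keys come from the documentation.

```python
import asyncio
from typing import Any, Dict, List, Optional


async def run_provider_job(provider: Any, usage_manager: Any, credentials: List[str]) -> None:
    """Run one provider's job forever on its own timer (hypothetical helper)."""
    config: Optional[Dict[str, Any]] = provider.get_background_job_config()
    if not config:
        return  # This provider defines no background job.
    interval = config["interval"]
    name = config.get("name", "provider_job")
    if not config.get("run_on_start", False):
        await asyncio.sleep(interval)  # Wait a full interval before the first run.
    while True:
        try:
            await provider.run_background_job(usage_manager, credentials)
        except Exception as exc:  # One failed run must not kill the timer.
            print(f"[{name}] background job failed: {exc!r}")
        await asyncio.sleep(interval)


def start_provider_jobs(
    providers: Dict[str, Any],
    usage_manager: Any,
    credentials_by_provider: Dict[str, List[str]],
) -> List[asyncio.Task]:
    """Start one independent task per provider, separate from the OAuth refresh loop."""
    return [
        asyncio.create_task(
            run_provider_job(provider, usage_manager, credentials_by_provider.get(name, []))
        )
        for name, provider in providers.items()
    ]


class DemoProvider:
    """Hypothetical provider whose job only counts its own runs."""

    def __init__(self) -> None:
        self.runs = 0

    def get_background_job_config(self) -> Optional[Dict[str, Any]]:
        return {"interval": 0.05, "name": "demo_job", "run_on_start": True}

    async def run_background_job(self, usage_manager: Any, credentials: List[str]) -> None:
        self.runs += 1


async def demo() -> int:
    provider = DemoProvider()
    tasks = start_provider_jobs({"demo": provider}, usage_manager=None, credentials_by_provider={})
    await asyncio.sleep(0.2)  # Startup run plus a few timer ticks.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return provider.runs
```

`asyncio.run(demo())` returns the number of times the demo job ran during the 0.2-second window. Giving each provider its own task is what keeps the schedules independent of one another.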
 
  - `400` with "quota" → `QUOTA`
  - `500`/`502`/`503` → `SERVER_ERROR`

+ 2. **Special Exception Types**:
+ - `EmptyResponseError` → `SERVER_ERROR` (status 503, rotatable)
+ - `TransientQuotaError` → `SERVER_ERROR` (status 503, rotatable - bare 429 without retry info)
+
+ 3. **Message Analysis**: Fallback for ambiguous errors
  - Searches for keywords like "quota exceeded", "rate limit", "invalid api key"

+ 4. **Provider-Specific Overrides**: Some providers use non-standard error formats

  **Usage in Client:**
  - `AUTHENTICATION` → Immediate 5-minute global lockout
  - `RATE_LIMIT`/`QUOTA` → Escalating per-model cooldown
+ - `SERVER_ERROR` → Retry with same key (up to `max_retries`), then rotate
  - `CONTEXT_LENGTH`/`CONTENT_FILTER` → Immediate failure (user needs to fix request)

  ---
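A simplified classifier shows how the special exception types are checked before status-code and keyword analysis. It is a sketch, not the library's `error_handler.py`:

* `classify_error` and `ClassifiedError` are hypothetical.
* `EmptyResponseError` and `TransientQuotaError` are local stand-ins for the real classes.
* The `ErrorType` members are only the subset named in this hunk, plus an assumed `UNKNOWN` fallback.
* The status-code rules cover only the mappings visible here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    # Only the categories named in this section, plus an assumed UNKNOWN fallback.
    AUTHENTICATION = "authentication"
    RATE_LIMIT = "rate_limit"
    QUOTA = "quota"
    SERVER_ERROR = "server_error"
    UNKNOWN = "unknown"


class EmptyResponseError(Exception):
    """Local stand-in for the library exception of the same name."""


class TransientQuotaError(Exception):
    """Local stand-in: bare 429 without retry info, raised after internal retries."""


@dataclass
class ClassifiedError:
    error_type: ErrorType
    status_code: Optional[int]


def classify_error(exc: Exception, status_code: Optional[int] = None) -> ClassifiedError:
    # Special exception types are checked first and reported as 503 server errors,
    # so the client rotates credentials instead of starting a quota cooldown.
    if isinstance(exc, (EmptyResponseError, TransientQuotaError)):
        return ClassifiedError(ErrorType.SERVER_ERROR, 503)

    message = str(exc).lower()

    # Status-code analysis (only the mappings visible in this hunk).
    if status_code == 400 and "quota" in message:
        return ClassifiedError(ErrorType.QUOTA, status_code)
    if status_code in (500, 502, 503):
        return ClassifiedError(ErrorType.SERVER_ERROR, status_code)

    # Message analysis as the fallback for ambiguous errors.
    keyword_map = {
        "quota exceeded": ErrorType.QUOTA,
        "rate limit": ErrorType.RATE_LIMIT,
        "invalid api key": ErrorType.AUTHENTICATION,
    }
    for keyword, error_type in keyword_map.items():
        if keyword in message:
            return ClassifiedError(error_type, status_code)
    return ClassifiedError(ErrorType.UNKNOWN, status_code)
```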
 
  - **Background Persistence**: Batched disk writes every 60 seconds (configurable)
  - **Automatic Cleanup**: Background task removes expired entries from memory cache

+ ### 2.15. Antigravity Quota Tracker (`providers/utilities/antigravity_quota_tracker.py`)
+
+ A mixin class providing quota tracking functionality for the Antigravity provider. This enables accurate remaining quota estimation based on API-fetched baselines and local request counting.
+
+ #### Core Concepts
+
+ **Quota Baseline Tracking:**
+ - Periodically fetches quota status from the Antigravity `fetchAvailableModels` API
+ - Stores the remaining fraction as a baseline in UsageManager
+ - Tracks requests since baseline to estimate current remaining quota
+ - Syncs local request count with API's authoritative values
+
+ **Quota Cost Constants:**
+ Based on empirical testing (see `docs/ANTIGRAVITY_QUOTA_REPORT.md`), quota costs are known per model and tier:
+
+ | Tier | Model Group | Cost per Request | Requests per 100% |
+ |------|-------------|------------------|-------------------|
+ | standard-tier | Claude/GPT-OSS | 0.40% | 250 |
+ | standard-tier | Gemini 3 Pro | 0.25% | 400 |
+ | standard-tier | Gemini 2.5 Flash | 0.0333% | ~3000 |
+ | free-tier | Claude/GPT-OSS | 1.333% | 75 |
+ | free-tier | Gemini 3 Pro | 0.40% | 250 |
+
+ **Model Name Mappings:**
+ Some user-facing model names don't exist directly in the API response:
+ - `claude-opus-4-5` → `claude-opus-4-5-thinking` (Opus only exists as thinking variant)
+ - `gemini-3-pro-preview` → `gemini-3-pro-high` (preview maps to high by default)
+
+ #### Key Methods
+
+ **`fetch_quota_from_api(credential_path)`:**
+ Fetches current quota status from the Antigravity API. Returns remaining fraction and reset times for all models.
+
+ **`estimate_remaining_quota(credential_path, model, model_data, tier)`:**
+ Estimates remaining quota based on baseline + request tracking. Returns confidence level (high/medium/low) based on baseline age.
+
+ **`refresh_active_quota_baselines(credentials, usage_data)`:**
+ Only refreshes baselines for credentials that have been used recently (within the refresh interval).
+
+ **`discover_quota_costs(credential_path, models_to_test)`:**
+ Manual utility to discover quota costs by making test requests and measuring before/after quota. Saves learned costs to `cache/antigravity/learned_quota_costs.json`.
+
+ #### Integration with Background Jobs
+
+ The Antigravity provider defines a background job for quota baseline refresh:
+
+ ```python
+ def get_background_job_config(self) -> Optional[Dict[str, Any]]:
+     return {
+         "interval": 300,  # 5 minutes (configurable via ANTIGRAVITY_QUOTA_REFRESH_INTERVAL)
+         "name": "quota_baseline_refresh",
+         "run_on_start": True,
+     }
+ ```
+
+ This job:
+ 1. Identifies credentials used since the last refresh
+ 2. Fetches current quota from the API for those credentials
+ 3. Updates baselines in UsageManager for accurate estimation
+
+ #### Data Storage
+
+ Quota baselines are stored in UsageManager's per-model data:
+
+ ```json
+ {
+   "credential_path": {
+     "models": {
+       "antigravity/claude-sonnet-4-5": {
+         "request_count": 15,
+         "baseline_remaining_fraction": 0.94,
+         "baseline_fetched_at": 1734567890.0,
+         "requests_at_baseline": 15,
+         "quota_max_requests": 250,
+         "quota_display": "15/250"
+       }
+     }
+   }
+ }
+ ```
+
+ ### 2.16. TransientQuotaError (`error_handler.py`)
+
+ A new error type for handling bare 429 responses without retry timing information.
+
+ **When Raised:**
+ - Provider returns HTTP 429 status code
+ - Response doesn't contain retry timing info (no `quotaResetTimeStamp` or `retryDelay`)
+ - After internal retry attempts are exhausted
+
+ **Behavior:**
+ - Classified as `server_error` (status 503) rather than quota exhaustion
+ - Causes credential rotation to try the next credential
+ - Does NOT trigger long-term quota cooldowns
+
+ **Implementation in Antigravity:**
+ ```python
+ # Non-streaming and streaming both retry bare 429s
+ for attempt in range(EMPTY_RESPONSE_MAX_ATTEMPTS):
+     try:
+         result = await self._handle_request(...)
+     except httpx.HTTPStatusError as e:
+         if e.response.status_code == 429:
+             quota_info = self.parse_quota_error(e)
+             if quota_info is None:
+                 # Bare 429 - retry like empty response
+                 if attempt < EMPTY_RESPONSE_MAX_ATTEMPTS - 1:
+                     await asyncio.sleep(EMPTY_RESPONSE_RETRY_DELAY)
+                     continue
+                 else:
+                     raise TransientQuotaError(provider, model, message)
+         # 429 with retry info (real quota exhaustion) or any other HTTP error: propagate
+         raise
+     else:
+         return result  # Success: stop retrying
+ ```
+
+ **Rationale:**
+ Some 429 responses are transient rate limits rather than true quota exhaustion. These occur when the API is temporarily overloaded but the credential still has quota available. Retrying internally before rotating credentials provides better resilience.
+
  ### 3.5. Antigravity (`antigravity_provider.py`)

  The most sophisticated provider implementation, supporting Google's internal Antigravity API for Gemini 3 and Claude models (including **Claude Opus 4.5**, Anthropic's most powerful model).
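The baseline-plus-counting estimate from section 2.15 can be worked through with the numbers from its cost table and the `Data Storage` example. The sketch below rests on two assumptions. First, remaining quota is taken to be the API baseline minus the number of requests made since that baseline, multiplied by the per-request cost. Second, the `quota_display` string is assumed to be derived from the remaining fraction. The helper names and the confidence thresholds (5 and 30 minutes) are hypothetical, because the documentation only says that confidence depends on baseline age.

```python
import time
from typing import Dict, Optional, Tuple

# Per-request cost as a fraction of a full quota, taken from the cost table.
QUOTA_COSTS: Dict[Tuple[str, str], float] = {
    ("standard-tier", "claude"): 0.0040,  # 0.40%
    ("standard-tier", "gemini-3-pro"): 0.0025,  # 0.25%
    ("standard-tier", "gemini-2.5-flash"): 0.000333,  # 0.0333%
    ("free-tier", "claude"): 0.01333,  # 1.333%
    ("free-tier", "gemini-3-pro"): 0.0040,  # 0.40%
}


def max_requests(tier: str, group: str) -> int:
    """Requests per 100% of quota, e.g. 0.40% per request -> 250 requests."""
    return round(1.0 / QUOTA_COSTS[(tier, group)])


def estimate_remaining(
    model_data: dict, cost: float, now: Optional[float] = None
) -> Tuple[float, str]:
    """Baseline fraction minus the cost of requests made since the baseline was fetched."""
    now = time.time() if now is None else now
    used_since_baseline = model_data["request_count"] - model_data["requests_at_baseline"]
    remaining = max(0.0, model_data["baseline_remaining_fraction"] - used_since_baseline * cost)
    age = now - model_data["baseline_fetched_at"]
    # Hypothetical thresholds: the docs only say confidence depends on baseline age.
    if age <= 5 * 60:
        confidence = "high"
    elif age <= 30 * 60:
        confidence = "medium"
    else:
        confidence = "low"
    return remaining, confidence


def quota_display(remaining: float, max_reqs: int) -> str:
    """Render 'used/max', assuming used = (1 - remaining) * max."""
    return f"{round((1.0 - remaining) * max_reqs)}/{max_reqs}"
```

Take the stored example (`baseline_remaining_fraction` 0.94, 15 requests at baseline) and add ten more Claude requests on a standard-tier credential. The estimate drops to about 0.90, which this sketch renders as `25/250`.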
 
  - **Credential Prioritization**: Automatic tier detection with paid credentials prioritized over free (paid tier resets every 5 hours, free tier resets weekly)
  - **Sequential Rotation Mode**: Default rotation mode is sequential (use credentials until exhausted) to maximize thought signature cache hits
  - **Per-Model Quota Tracking**: Each model tracks independent usage windows with authoritative reset timestamps from quota errors
+ - **Quota Groups**: Models that share quota limits are grouped together (Claude/GPT-OSS share quota, Gemini 3 Pro variants share quota, Gemini 2.5 Flash variants share quota)
  - **Priority Multipliers**: Paid tier credentials get higher concurrency limits (Priority 1: 5x, Priority 2: 3x, Priority 3+: 2x in sequential mode)
+ - **Quota Baseline Tracking**: Background job fetches quota status from API to provide accurate remaining quota estimates
+ - **TransientQuotaError Handling**: Bare 429 responses (without retry info) are retried internally before credential rotation

  #### Model Support

 
  - Caching signatures from responses for reuse in follow-up messages
  - Automatic injection into functionCalls for multi-turn conversations
  - Fallback to bypass value if signature unavailable
+ - **Parallel Tool Usage Instruction**: Configurable instruction injection to encourage parallel tool calls (disabled by default for Gemini 3)
+
+ **Gemini 2.5 Flash:**
+ - Uses `-thinking` variant when `reasoning_effort` is provided
+ - Shares quota with `gemini-2.5-flash-thinking` and `gemini-2.5-flash-lite` variants
+ - Parallel tool usage instruction configurable
+
+ **Gemini 2.5 Flash Lite:**
+ - Configurable thinking budget, no name change required
+ - Shares quota with Flash variants

+ **Claude Opus 4.5:**
  - Anthropic's most powerful model, now available via Antigravity proxy
  - **Always uses thinking variant** - `claude-opus-4-5-thinking` is the only available variant (non-thinking version doesn't exist)
  - Uses `thinkingBudget` parameter for extended thinking control (-1 for auto, 0 to disable, or specific token count)
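The per-model rules in this hunk amount to a small name-resolution step that runs before a request is sent. The sketch below is illustrative only. `resolve_upstream_model`, `quota_lookup_name`, `QUOTA_NAME_ALIASES` and `THINKING_ON_DEMAND` are hypothetical. The two quota-lookup aliases come from the Model Name Mappings list in section 2.15. Mapping Sonnet to `claude-sonnet-4-5-thinking` when `reasoning_effort` is supplied is an inference: that variant appears in the Claude quota group, but this rule is not stated outright in the lines shown.

```python
from typing import Dict, Optional

# Quota-lookup aliases from the "Model Name Mappings" list in section 2.15.
QUOTA_NAME_ALIASES: Dict[str, str] = {
    "claude-opus-4-5": "claude-opus-4-5-thinking",
    "gemini-3-pro-preview": "gemini-3-pro-high",
}

# Hypothetical: models that switch to a "-thinking" variant only when reasoning_effort is given.
THINKING_ON_DEMAND = {"claude-sonnet-4-5", "gemini-2.5-flash"}


def resolve_upstream_model(model: str, reasoning_effort: Optional[str] = None) -> str:
    """Pick the Antigravity model variant to call for a user-facing model name."""
    if model == "claude-opus-4-5":
        return "claude-opus-4-5-thinking"  # Opus exists only as a thinking variant.
    if model in THINKING_ON_DEMAND and reasoning_effort is not None:
        return f"{model}-thinking"
    # Everything else keeps its name; e.g. Flash Lite only adjusts its thinking budget.
    return model


def quota_lookup_name(model: str) -> str:
    """Name to look up in the fetchAvailableModels quota response."""
    return QUOTA_NAME_ALIASES.get(model, model)
```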
 
  - Without `reasoning_effort`: Uses standard `claude-sonnet-4-5` variant
  - **Thinking Preservation**: Caches thinking content using composite keys (tool_call_id + text_hash)
  - **Schema Cleaning**: Removes unsupported properties (`$schema`, `additionalProperties`, `const` → `enum`)
+ - **Parallel Tool Usage Instruction**: Automatic instruction injection to encourage parallel tool calls (enabled by default for Claude)
+
+ **GPT-OSS 120B Medium:**
+ - OpenAI-compatible model available via Antigravity
+ - Shares quota with Claude models (Claude/GPT-OSS quota group)

  #### Base URL Fallback

 
  ANTIGRAVITY_GEMINI3_TOOL_PREFIX="gemini3_" # Namespace prefix
  ANTIGRAVITY_GEMINI3_DESCRIPTION_PROMPT="\n\nSTRICT PARAMETERS: {params}."
  ANTIGRAVITY_GEMINI3_SYSTEM_INSTRUCTION="..." # Full system prompt
+
+ # Parallel tool usage instruction
+ ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_CLAUDE=true # Inject parallel tool instruction for Claude (default: true)
+ ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_GEMINI3=false # Inject parallel tool instruction for Gemini 3 (default: false)
+ ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION="..." # Custom instruction text
+
+ # Quota tracking
+ ANTIGRAVITY_QUOTA_REFRESH_INTERVAL=300 # Background quota refresh interval in seconds (default: 300 = 5 min)
  ```

  #### Claude Extended Thinking Sanitization
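The new environment variables can be read with a few lines of standard-library code. This sketch only illustrates the documented defaults: Claude on, Gemini 3 off, and a 300-second refresh interval. The following parts are hypothetical:

* the helpers `env_bool`, `parallel_instruction_for` and `quota_refresh_interval`;
* the placeholder default instruction text;
* the accepted truthy spellings;
* the model-prefix checks.

Other models, such as Gemini 2.5 Flash, are treated as disabled here because no toggle for them appears in this hunk.

```python
import os
from typing import Mapping, Optional

# Placeholder: the real default instruction text is not shown in the docs.
DEFAULT_PARALLEL_TOOL_INSTRUCTION = "<default parallel tool instruction>"


def env_bool(env: Mapping[str, str], key: str, default: bool) -> bool:
    """Parse a boolean env var; unset or empty falls back to the default."""
    raw = env.get(key, "").strip().lower()
    if not raw:
        return default
    return raw in {"1", "true", "yes", "on"}


def parallel_instruction_for(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the instruction to inject for this model, or None if disabled."""
    if model.startswith("claude-"):
        enabled = env_bool(env, "ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_CLAUDE", True)
    elif model.startswith("gemini-3-"):
        enabled = env_bool(env, "ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_GEMINI3", False)
    else:
        enabled = False  # Assumption: no toggle for other models appears here.
    if not enabled:
        return None
    return env.get("ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION") or DEFAULT_PARALLEL_TOOL_INSTRUCTION


def quota_refresh_interval(env: Mapping[str, str] = os.environ) -> int:
    """Background quota refresh interval in seconds (documented default: 300)."""
    return int(env.get("ANTIGRAVITY_QUOTA_REFRESH_INTERVAL", "300"))
```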
 
  **Configuration**:
  ```env
  # Models in a group share quota/cooldown timing
+ QUOTA_GROUPS_ANTIGRAVITY_CLAUDE="claude-sonnet-4-5,claude-sonnet-4-5-thinking,claude-opus-4-5,claude-opus-4-5-thinking,gpt-oss-120b-medium"
+ QUOTA_GROUPS_ANTIGRAVITY_GEMINI_3_PRO="gemini-3-pro-high,gemini-3-pro-low,gemini-3-pro-preview"
+ QUOTA_GROUPS_ANTIGRAVITY_GEMINI_2_5_FLASH="gemini-2.5-flash,gemini-2.5-flash-thinking,gemini-2.5-flash-lite"

  # To disable a default group:
  QUOTA_GROUPS_ANTIGRAVITY_CLAUDE=""
  ```

+ **Default Quota Groups (Antigravity)**:
+
+ | Group Name | Models | Shared Quota |
+ |------------|--------|--------------|
+ | `claude` | claude-sonnet-4-5, claude-sonnet-4-5-thinking, claude-opus-4-5, claude-opus-4-5-thinking, gpt-oss-120b-medium | Yes (Claude and GPT-OSS share quota) |
+ | `gemini-3-pro` | gemini-3-pro-high, gemini-3-pro-low, gemini-3-pro-preview | Yes |
+ | `gemini-2.5-flash` | gemini-2.5-flash, gemini-2.5-flash-thinking, gemini-2.5-flash-lite | Yes |
+
  **Behavior**:
  - When one model hits quota, all models in the group receive the same `quota_reset_ts`
  - Group resets only when ALL models' quotas have reset
  - Preserves unexpired cooldowns during other resets

 
  ```python
  class AntigravityProvider(ProviderInterface):
      model_quota_groups = {
+         # Claude and GPT-OSS share the same quota pool
+         "claude": [
+             "claude-sonnet-4-5",
+             "claude-sonnet-4-5-thinking",
+             "claude-opus-4-5",
+             "claude-opus-4-5-thinking",
+             "gpt-oss-120b-medium",
+         ],
+         # Gemini 3 Pro variants share quota
+         "gemini-3-pro": [
+             "gemini-3-pro-high",
+             "gemini-3-pro-low",
+             "gemini-3-pro-preview",
+         ],
+         # Gemini 2.5 Flash variants share quota
+         "gemini-2.5-flash": [
+             "gemini-2.5-flash",
+             "gemini-2.5-flash-thinking",
+             "gemini-2.5-flash-lite",
+         ],
      }
  ```
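The group behavior in this diff has two parts: one reset timestamp is propagated to the whole group, and the group becomes usable again only once every member has reset. Both can be sketched as follows. `group_of`, `members_of`, `apply_quota_hit` and `group_available` are hypothetical helpers, not `UsageManager` methods. Using `max()` so that a later, still-pending reset is never shortened is one assumed reading of "preserves unexpired cooldowns".

```python
from typing import Dict, List, Optional

# Same grouping as the model_quota_groups example above.
MODEL_QUOTA_GROUPS: Dict[str, List[str]] = {
    "claude": [
        "claude-sonnet-4-5",
        "claude-sonnet-4-5-thinking",
        "claude-opus-4-5",
        "claude-opus-4-5-thinking",
        "gpt-oss-120b-medium",
    ],
    "gemini-3-pro": ["gemini-3-pro-high", "gemini-3-pro-low", "gemini-3-pro-preview"],
    "gemini-2.5-flash": ["gemini-2.5-flash", "gemini-2.5-flash-thinking", "gemini-2.5-flash-lite"],
}


def group_of(model: str) -> Optional[str]:
    """Name of the quota group containing this model, or None if ungrouped."""
    for group, members in MODEL_QUOTA_GROUPS.items():
        if model in members:
            return group
    return None


def members_of(model: str) -> List[str]:
    """All models sharing quota with this one (just itself if ungrouped)."""
    group = group_of(model)
    return MODEL_QUOTA_GROUPS[group] if group else [model]


def apply_quota_hit(state: Dict[str, dict], model: str, reset_ts: float) -> None:
    """Give every model in the hit model's group the same quota_reset_ts."""
    for member in members_of(model):
        entry = state.setdefault(member, {})
        # Assumed reading of "preserves unexpired cooldowns": never shorten a later reset.
        entry["quota_reset_ts"] = max(entry.get("quota_reset_ts", 0.0), reset_ts)


def group_available(state: Dict[str, dict], model: str, now: float) -> bool:
    """The group is usable again only once every member's reset time has passed."""
    return all(state.get(m, {}).get("quota_reset_ts", 0.0) <= now for m in members_of(model))
```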
 
README.md CHANGED
@@ -329,10 +329,19 @@ The proxy includes a powerful text-based UI for configuration and management.

  **Antigravity:**
  - Gemini 3 Pro with `thinkingLevel` support
  - Claude Opus 4.5 (thinking mode)
  - Claude Sonnet 4.5 (thinking and non-thinking)
  - Thought signature caching for multi-turn conversations
  - Tool hallucination prevention

  **Qwen Code:**
  - Dual auth (API key + OAuth Device Flow)
@@ -531,9 +540,11 @@ Access Google's internal Antigravity API for cutting-edge models.

  **Supported Models:**
  - **Gemini 3 Pro** — with `thinkingLevel` support (low/high)
  - **Claude Opus 4.5** — Anthropic's most powerful model (thinking mode only)
  - **Claude Sonnet 4.5** — supports both thinking and non-thinking modes
- - Gemini 2.5 Pro/Flash

  **Setup:**
  1. Run `python -m rotator_library.credential_tool`
@@ -545,6 +556,8 @@ Access Google's internal Antigravity API for cutting-edge models.
  - Tool hallucination prevention via parameter signature injection
  - Automatic thinking block sanitization for Claude
  - Credential prioritization (paid resets every 5 hours, free weekly)

  **Environment Variables:**
  ```env
@@ -556,6 +569,8 @@ ANTIGRAVITY_EMAIL="your-email@gmail.com"
  # Feature toggles
  ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
  ANTIGRAVITY_GEMINI3_TOOL_FIX=true
  ```

  > **Note:** Gemini 3 models require a paid-tier Google Cloud project.
 

  **Antigravity:**
  - Gemini 3 Pro with `thinkingLevel` support
+ - Gemini 2.5 Flash/Flash Lite with thinking mode
  - Claude Opus 4.5 (thinking mode)
  - Claude Sonnet 4.5 (thinking and non-thinking)
+ - GPT-OSS 120B Medium
  - Thought signature caching for multi-turn conversations
  - Tool hallucination prevention
+ - Quota baseline tracking with background refresh
+ - Parallel tool usage instruction injection
+ - **Quota Groups**: Models that share quota are automatically grouped:
+ - Claude/GPT-OSS: `claude-sonnet-4-5`, `claude-opus-4-5`, `gpt-oss-120b-medium`
+ - Gemini 3 Pro: `gemini-3-pro-high`, `gemini-3-pro-low`, `gemini-3-pro-preview`
+ - Gemini 2.5 Flash: `gemini-2.5-flash`, `gemini-2.5-flash-thinking`, `gemini-2.5-flash-lite`
+ - All models in a group draw on the group's shared quota at the same per-request rate, so within the Claude group it is most economical to use Opus exclusively rather than Sonnet or GPT-OSS.

  **Qwen Code:**
  - Dual auth (API key + OAuth Device Flow)
 

  **Supported Models:**
  - **Gemini 3 Pro** — with `thinkingLevel` support (low/high)
+ - **Gemini 2.5 Flash** — with thinking mode support
+ - **Gemini 2.5 Flash Lite** — configurable thinking budget
  - **Claude Opus 4.5** — Anthropic's most powerful model (thinking mode only)
  - **Claude Sonnet 4.5** — supports both thinking and non-thinking modes
+ - **GPT-OSS 120B** — OpenAI-compatible model

  **Setup:**
  1. Run `python -m rotator_library.credential_tool`
 
  - Tool hallucination prevention via parameter signature injection
  - Automatic thinking block sanitization for Claude
  - Credential prioritization (paid resets every 5 hours, free weekly)
+ - Quota baseline tracking with background refresh (accurate remaining quota estimates)
+ - Parallel tool usage instruction injection for Claude

  **Environment Variables:**
  ```env
 
  # Feature toggles
  ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
  ANTIGRAVITY_GEMINI3_TOOL_FIX=true
+ ANTIGRAVITY_QUOTA_REFRESH_INTERVAL=300 # Quota refresh interval (seconds)
+ ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_CLAUDE=true # Parallel tool instruction for Claude
  ```

  > **Note:** Gemini 3 models require a paid-tier Google Cloud project.
src/rotator_library/README.md CHANGED
@@ -216,11 +216,19 @@ Use this tool to:
  ### Antigravity
  - **Auth**: Uses OAuth 2.0 flow similar to Gemini CLI, with Antigravity-specific credentials and scopes.
  - **Credential Prioritization**: Automatic detection and prioritization of paid vs free tier credentials (paid tier resets every 5 hours, free tier resets weekly).
- - **Models**: Supports Gemini 3 Pro, Claude Sonnet 4.5 (with/without thinking), and Claude Opus 4.5 (thinking only) via Google's internal Antigravity API.
  - **Thought Signature Caching**: Server-side caching of `thoughtSignature` data for multi-turn conversations with Gemini 3 models.
  - **Tool Hallucination Prevention**: Automatic injection of system instructions and parameter signatures for Gemini 3 and Claude to prevent tool parameter hallucination.
  - **Thinking Support**:
  - Gemini 3: Uses `thinkingLevel` (string: "low"/"high")
  - Claude Sonnet 4.5: Uses `thinkingBudget` (optional - supports both thinking and non-thinking modes)
  - Claude Opus 4.5: Uses `thinkingBudget` (always uses thinking variant)
  - **Base URL Fallback**: Automatic fallback between sandbox and production endpoints.
 
  ### Antigravity
  - **Auth**: Uses OAuth 2.0 flow similar to Gemini CLI, with Antigravity-specific credentials and scopes.
  - **Credential Prioritization**: Automatic detection and prioritization of paid vs free tier credentials (paid tier resets every 5 hours, free tier resets weekly).
+ - **Models**: Supports Gemini 3 Pro, Gemini 2.5 Flash/Flash Lite, Claude Sonnet 4.5 (with/without thinking), Claude Opus 4.5 (thinking only), and GPT-OSS 120B via Google's internal Antigravity API.
+ - **Quota Groups**: Models that share quota are automatically grouped:
+ - Claude/GPT-OSS: `claude-sonnet-4-5`, `claude-opus-4-5`, `gpt-oss-120b-medium`
+ - Gemini 3 Pro: `gemini-3-pro-high`, `gemini-3-pro-low`, `gemini-3-pro-preview`
+ - Gemini 2.5 Flash: `gemini-2.5-flash`, `gemini-2.5-flash-thinking`, `gemini-2.5-flash-lite`
+ - All models in a group draw on the group's shared quota at the same per-request rate, so within the Claude group it is most economical to use Opus exclusively rather than Sonnet or GPT-OSS.
+ - **Quota Baseline Tracking**: Background job fetches quota status from API every 5 minutes to provide accurate remaining quota estimates.
  - **Thought Signature Caching**: Server-side caching of `thoughtSignature` data for multi-turn conversations with Gemini 3 models.
  - **Tool Hallucination Prevention**: Automatic injection of system instructions and parameter signatures for Gemini 3 and Claude to prevent tool parameter hallucination.
+ - **Parallel Tool Usage Instruction**: Configurable instruction injection to encourage parallel tool calls (enabled by default for Claude).
  - **Thinking Support**:
  - Gemini 3: Uses `thinkingLevel` (string: "low"/"high")
+ - Gemini 2.5 Flash: Uses `-thinking` variant when `reasoning_effort` is provided
  - Claude Sonnet 4.5: Uses `thinkingBudget` (optional - supports both thinking and non-thinking modes)
  - Claude Opus 4.5: Uses `thinkingBudget` (always uses thinking variant)
  - **Base URL Fallback**: Automatic fallback between sandbox and production endpoints.