Commit c2eea0c · Parent(s): 26c6a6e
Mirrowel committed
feat: Implement per-model API key cooldowns
Previously, when an API key encountered a rate limit or authentication error for any model, it was put on a global cooldown, preventing its use across all models.
This change introduces per-model cooldowns. If a key fails for a specific model, it is now only put on cooldown for that particular model. This allows the key to continue being used with other models, significantly improving key utilization and overall system resilience.
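The difference is easiest to see in isolation. Below is a minimal, hypothetical sketch of the per-model idea (not the library's actual API): cooldowns are stored per key *and* per model, so a failure for one model leaves the key usable for every other model.

```python
import time

# Hypothetical structure: key -> {model: cooldown_expiry_timestamp}
cooldowns: dict = {}

def put_on_cooldown(key: str, model: str, seconds: float) -> None:
    # Only this (key, model) pair is sidelined; other models stay usable.
    cooldowns.setdefault(key, {})[model] = time.time() + seconds

def is_available(key: str, model: str) -> bool:
    until = cooldowns.get(key, {}).get(model)
    return until is None or time.time() > until

put_on_cooldown("key-1", "gemini-pro", 60)
print(is_available("key-1", "gemini-pro"))  # False: on cooldown for this model
print(is_available("key-1", "gpt-4o"))      # True: other models unaffected
```

Under the old global-cooldown behavior, the second check would also have returned `False`.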
Updated:
- README.md and src/rotator_library/README.md to reflect the new per-model cooldown behavior.
- src/rotator_library/usage_manager.py to manage cooldowns on a per-model basis.
- README.md +1 -1
- src/rotator_library/README.md +3 -3
- src/rotator_library/usage_manager.py +20 -18
README.md

```diff
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ This project provides a robust solution for managing and rotating API keys for v
 
 - **Smart Key Rotation**: Intelligently selects the least-used API key to distribute request loads evenly.
 - **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes).
-- **…
+- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
 - **Usage Tracking**: Monitors daily and global usage for each API key.
 - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
 - **OpenAI-Compatible Proxy**: Offers a familiar API interface for seamless interaction with different models.
```
src/rotator_library/README.md

```diff
--- a/src/rotator_library/README.md
+++ b/src/rotator_library/README.md
@@ -6,7 +6,7 @@ A simple, thread-safe client that intelligently rotates and retries API keys for
 
 - **Smart Key Rotation**: Automatically uses the least-used key to distribute load.
 - **Automatic Retries**: Retries requests on transient server errors.
-- **Cooldowns**: …
+- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
 - **Usage Tracking**: Tracks daily and global usage for each key.
 - **Provider Agnostic**: Works with any provider supported by `litellm`.
 - **Extensible**: Easily add support for new providers through a plugin-based architecture.
@@ -97,10 +97,10 @@ Fetches a dictionary of all available models, grouped by provider.
 The client is designed to handle errors gracefully:
 
 - **Server Errors (`5xx`)**: The client will retry the request with the *same key* up to `max_retries` times.
-- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown and …
+- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown for that specific model and retry the request with a different key. This ensures that a single model failure does not sideline a key for all other models.
 - **Unrecoverable Errors**: For critical errors, the client will fail fast and raise the exception.
 
-Cooldowns are managed by the `UsageManager` …
+Cooldowns are managed by the `UsageManager` on a per-model basis, preventing failing keys from being used repeatedly for models they have recently failed with. Upon a successful call, any existing cooldown for that key-model pair is cleared.
 
 ## Extending with Provider Plugins
```
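The three-way error policy described in that README section can be sketched as a small dispatch function. This is an illustrative stand-in, not the client's real code; the handler names and the exact status codes treated as rotation errors are assumptions.

```python
def classify_error(status_code: int) -> str:
    """Map an HTTP status code to the client's documented handling strategy."""
    if 500 <= status_code < 600:
        return "retry_same_key"   # transient server error: retry, same key
    if status_code in (401, 403, 429):
        return "rotate_key"       # cooldown this key for this model, try another
    return "fail_fast"            # unrecoverable: raise to the caller

print(classify_error(503))  # retry_same_key
print(classify_error(429))  # rotate_key
print(classify_error(400))  # fail_fast
```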
src/rotator_library/usage_manager.py

```diff
--- a/src/rotator_library/usage_manager.py
+++ b/src/rotator_library/usage_manager.py
@@ -57,15 +57,17 @@ class UsageManager:
 
     def get_next_smart_key(self, available_keys: List[str], model: str) -> Optional[str]:
         """
-        Gets the least-used, available key…
+        Gets the least-used, available key for a specific model, considering model-specific cooldowns.
         """
         best_key = None
         min_usage = float('inf')
 
-        # Filter for keys that are not on cooldown
         active_keys = []
         for key in available_keys:
-            …
+            key_data = self.usage_data.get(key, {})
+            model_cooldowns = key_data.get("model_cooldowns", {})
+            cooldown_until = model_cooldowns.get(model)
+
             if not cooldown_until or time.time() > cooldown_until:
                 active_keys.append(key)
 
@@ -74,7 +76,7 @@ class UsageManager:
 
         # Find the key with the minimum daily success_count for the given model
         for key in active_keys:
-            key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "…
+            key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
             daily_model_usage = key_data.get("daily", {}).get("models", {}).get(model, {})
             usage_count = daily_model_usage.get("success_count", 0)
 
@@ -85,11 +87,15 @@ class UsageManager:
         return best_key if best_key else active_keys[0]
 
     def record_success(self, key: str, model: str, completion_response: litellm.ModelResponse):
-        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "…
+        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
 
+        # Clear any cooldown for this specific model on success
+        if model in key_data.get("model_cooldowns", {}):
+            del key_data["model_cooldowns"][model]
+
         # Ensure daily stats are for today
         if key_data["daily"].get("date") != date.today().isoformat():
             self._reset_daily_stats_if_needed()
             key_data = self.usage_data[key]
 
         daily_model_data = key_data["daily"]["models"].setdefault(model, {"success_count": 0, "prompt_tokens": 0, "completion_tokens": 0, "approx_cost": 0.0})
@@ -99,11 +105,8 @@ class UsageManager:
         daily_model_data["prompt_tokens"] += usage.prompt_tokens
         daily_model_data["completion_tokens"] += usage.completion_tokens
 
-        # Calculate approximate cost using LiteLLM
         try:
-            cost = litellm.completion_cost(
-                completion_response=completion_response
-            )
+            cost = litellm.completion_cost(completion_response=completion_response)
             daily_model_data["approx_cost"] += cost
         except Exception as e:
             print(f"Warning: Could not calculate cost for model {model}: {e}")
@@ -112,22 +115,21 @@ class UsageManager:
         self._save_usage()
 
     def record_rotation_error(self, key: str, model: str, error: Exception):
-        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "…
+        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
 
-        # Default cooldown of 24 hours
-        cooldown_seconds = 86400
+        cooldown_seconds = 86400  # Default cooldown of 24 hours
 
-        # Try to parse retry_delay from the error message (very provider-specific)
         error_str = str(error).lower()
         if "retry_delay" in error_str:
             try:
-                # A simple way to parse, might need to be more robust
                 delay_str = error_str.split("retry_delay")[1].split("seconds:")[1].strip().split("}")[0]
                 cooldown_seconds = int(delay_str)
             except (IndexError, ValueError):
                 pass
 
-        key_data["cooldown_until"] = time.time() + cooldown_seconds
+        model_cooldowns = key_data.setdefault("model_cooldowns", {})
+        model_cooldowns[model] = time.time() + cooldown_seconds
+
         key_data["last_rotation_error"] = {
             "timestamp": time.time(),
             "model": model,
```