Mirrowel committed on
Commit c2eea0c · 1 Parent(s): 26c6a6e

feat: Implement per-model API key cooldowns

Previously, when an API key encountered a rate limit or authentication error for any model, it was put on a global cooldown, preventing its use across all models.

This change introduces per-model cooldowns. If a key fails for a specific model, it is now only put on cooldown for that particular model. This allows the key to continue being used with other models, significantly improving key utilization and overall system resilience.
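The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the library's code: `usage_data` here is a simplified stand-in for the structure kept by `UsageManager`, and `set_cooldown`, `clear_cooldown`, and `active_keys` are hypothetical helper names chosen for the example. Only the `model_cooldowns` mapping and the expiry check mirror what this commit's diff shows.

```python
import time
from typing import Dict, List, Optional

# Simplified stand-in for UsageManager's usage data (assumption, not the real structure).
usage_data: Dict[str, dict] = {}


def set_cooldown(key: str, model: str, seconds: float, now: Optional[float] = None) -> None:
    """Put `key` on cooldown for `model` only; other models are unaffected."""
    now = time.time() if now is None else now
    cooldowns = usage_data.setdefault(key, {}).setdefault("model_cooldowns", {})
    cooldowns[model] = now + seconds


def clear_cooldown(key: str, model: str) -> None:
    """Drop the cooldown for one key-model pair, e.g. after a successful call."""
    usage_data.get(key, {}).get("model_cooldowns", {}).pop(model, None)


def active_keys(keys: List[str], model: str, now: Optional[float] = None) -> List[str]:
    """Return the keys that are not currently cooling down for `model`."""
    now = time.time() if now is None else now
    result = []
    for key in keys:
        until = usage_data.get(key, {}).get("model_cooldowns", {}).get(model)
        if not until or now > until:
            result.append(key)
    return result


# Hypothetical key and model names; fixed timestamps keep the demo deterministic.
set_cooldown("key-a", "model-x", 60, now=1000.0)
print(active_keys(["key-a", "key-b"], "model-x", now=1010.0))  # ['key-b']
print(active_keys(["key-a", "key-b"], "model-y", now=1010.0))  # ['key-a', 'key-b']
clear_cooldown("key-a", "model-x")
print(active_keys(["key-a", "key-b"], "model-x", now=1010.0))  # ['key-a', 'key-b']
```

Keying the cooldown by model inside each key's record means a rate limit on one model never hides the key from requests for another model.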

Updated:
- README.md and src/rotator_library/README.md to reflect the new per-model cooldown behavior.
- src/rotator_library/usage_manager.py to manage cooldowns on a per-model basis.
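
Reviewer note: `record_rotation_error` in `usage_manager.py` derives the cooldown length from a `retry_delay` value in the error text using chained `split` calls, and falls back to 24 hours. A regex-based variant might be somewhat easier to maintain. The sketch below is only an illustration: `parse_cooldown_seconds` is a hypothetical helper, and the error text format (`retry_delay { seconds: N }`) is an assumption inferred from the existing parsing code, not a documented provider format.

```python
import re

DEFAULT_COOLDOWN_SECONDS = 86400  # 24 hours, the default used in this commit

# Assumed shape: "retry_delay", optional punctuation/whitespace, "seconds", then digits.
_RETRY_DELAY_RE = re.compile(r"retry_delay\W*seconds\W*(\d+)")


def parse_cooldown_seconds(error_text: str, default: int = DEFAULT_COOLDOWN_SECONDS) -> int:
    """Return the retry delay found in `error_text`, or `default` if none is found."""
    match = _RETRY_DELAY_RE.search(error_text.lower())
    return int(match.group(1)) if match else default


print(parse_cooldown_seconds("retry_delay {\n  seconds: 42\n}"))  # 42
print(parse_cooldown_seconds("rate limit exceeded"))              # 86400
```

Because a failed match returns the default directly, the `try`/`except (IndexError, ValueError)` block would no longer be needed under this approach.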

README.md CHANGED
@@ -9,7 +9,7 @@ This project provides a robust solution for managing and rotating API keys for v
 
 - **Smart Key Rotation**: Intelligently selects the least-used API key to distribute request loads evenly.
 - **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes).
-- **Key Cooldowns**: Temporarily disables keys that encounter rate limits or authentication errors to prevent further issues.
+- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
 - **Usage Tracking**: Monitors daily and global usage for each API key.
 - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
 - **OpenAI-Compatible Proxy**: Offers a familiar API interface for seamless interaction with different models.
src/rotator_library/README.md CHANGED
@@ -6,7 +6,7 @@ A simple, thread-safe client that intelligently rotates and retries API keys for
 
 - **Smart Key Rotation**: Automatically uses the least-used key to distribute load.
 - **Automatic Retries**: Retries requests on transient server errors.
-- **Cooldowns**: Puts keys on a temporary cooldown after rate limit or authentication errors.
+- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
 - **Usage Tracking**: Tracks daily and global usage for each key.
 - **Provider Agnostic**: Works with any provider supported by `litellm`.
 - **Extensible**: Easily add support for new providers through a plugin-based architecture.
@@ -97,10 +97,10 @@ Fetches a dictionary of all available models, grouped by provider.
 The client is designed to handle errors gracefully:
 
 - **Server Errors (`5xx`)**: The client will retry the request with the *same key* up to `max_retries` times.
-- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown and try the request again with a different key.
+- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown for that specific model and retry the request with a different key. This ensures that a single model failure does not sideline a key for all other models.
 - **Unrecoverable Errors**: For critical errors, the client will fail fast and raise the exception.
 
-Cooldowns are managed by the `UsageManager` and prevent failing keys from being used repeatedly.
+Cooldowns are managed by the `UsageManager` on a per-model basis, preventing failing keys from being used repeatedly for models they have recently failed with. Upon a successful call, any existing cooldown for that key-model pair is cleared.
 
 ## Extending with Provider Plugins
 
src/rotator_library/usage_manager.py CHANGED
@@ -57,15 +57,17 @@ class UsageManager:
 
     def get_next_smart_key(self, available_keys: List[str], model: str) -> Optional[str]:
         """
-        Gets the least-used, available key based on daily stats.
+        Gets the least-used, available key for a specific model, considering model-specific cooldowns.
         """
         best_key = None
         min_usage = float('inf')
-
-        # Filter for keys that are not on cooldown
+
         active_keys = []
         for key in available_keys:
-            cooldown_until = self.usage_data.get(key, {}).get("cooldown_until")
+            key_data = self.usage_data.get(key, {})
+            model_cooldowns = key_data.get("model_cooldowns", {})
+            cooldown_until = model_cooldowns.get(model)
+
             if not cooldown_until or time.time() > cooldown_until:
                 active_keys.append(key)
 
@@ -74,7 +76,7 @@ class UsageManager:
 
         # Find the key with the minimum daily success_count for the given model
         for key in active_keys:
-            key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "cooldown_until": None})
+            key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
             daily_model_usage = key_data.get("daily", {}).get("models", {}).get(model, {})
             usage_count = daily_model_usage.get("success_count", 0)
 
@@ -85,11 +87,15 @@ class UsageManager:
         return best_key if best_key else active_keys[0]
 
     def record_success(self, key: str, model: str, completion_response: litellm.ModelResponse):
-        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "cooldown_until": None})
+        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
 
+        # Clear any cooldown for this specific model on success
+        if model in key_data.get("model_cooldowns", {}):
+            del key_data["model_cooldowns"][model]
+
         # Ensure daily stats are for today
         if key_data["daily"].get("date") != date.today().isoformat():
-            self._reset_daily_stats_if_needed() # Should be rare, but as a safeguard
+            self._reset_daily_stats_if_needed()
             key_data = self.usage_data[key]
 
         daily_model_data = key_data["daily"]["models"].setdefault(model, {"success_count": 0, "prompt_tokens": 0, "completion_tokens": 0, "approx_cost": 0.0})
@@ -99,11 +105,8 @@ class UsageManager:
         daily_model_data["prompt_tokens"] += usage.prompt_tokens
         daily_model_data["completion_tokens"] += usage.completion_tokens
 
-        # Calculate approximate cost using LiteLLM
         try:
-            cost = litellm.completion_cost(
-                completion_response=completion_response
-            )
+            cost = litellm.completion_cost(completion_response=completion_response)
             daily_model_data["approx_cost"] += cost
         except Exception as e:
             print(f"Warning: Could not calculate cost for model {model}: {e}")
@@ -112,22 +115,21 @@ class UsageManager:
         self._save_usage()
 
     def record_rotation_error(self, key: str, model: str, error: Exception):
-        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "cooldown_until": None})
+        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
 
-        # Default cooldown of 24 hours
-        cooldown_seconds = 86400
+        cooldown_seconds = 86400 # Default cooldown of 24 hours
 
-        # Try to parse retry_delay from the error message (very provider-specific)
         error_str = str(error).lower()
         if "retry_delay" in error_str:
             try:
-                # A simple way to parse, might need to be more robust
                 delay_str = error_str.split("retry_delay")[1].split("seconds:")[1].strip().split("}")[0]
                 cooldown_seconds = int(delay_str)
             except (IndexError, ValueError):
-                pass # Stick to default
+                pass
+
+        model_cooldowns = key_data.setdefault("model_cooldowns", {})
+        model_cooldowns[model] = time.time() + cooldown_seconds
 
-        key_data["cooldown_until"] = time.time() + cooldown_seconds
         key_data["last_rotation_error"] = {
             "timestamp": time.time(),
             "model": model,