Commit c2eea0c · Parent(s): 26c6a6e
Mirrowel committed
feat: Implement per-model API key cooldowns
Previously, when an API key encountered a rate limit or authentication error for any model, it was put on a global cooldown, preventing its use across all models.
This change introduces per-model cooldowns. If a key fails for a specific model, it is now only put on cooldown for that particular model. This allows the key to continue being used with other models, significantly improving key utilization and overall system resilience.
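The difference is easiest to see in isolation. Below is a minimal, hypothetical sketch of the per-model idea (not the library's actual API): cooldowns are stored per key *and* per model, so a failure for one model leaves the key usable for every other model.

```python
import time

# Hypothetical structure: key -> {model: cooldown_expiry_timestamp}
cooldowns: dict = {}

def put_on_cooldown(key: str, model: str, seconds: float) -> None:
    # Only this (key, model) pair is sidelined; other models stay usable.
    cooldowns.setdefault(key, {})[model] = time.time() + seconds

def is_available(key: str, model: str) -> bool:
    until = cooldowns.get(key, {}).get(model)
    return until is None or time.time() > until

put_on_cooldown("key-1", "gemini-pro", 60)
print(is_available("key-1", "gemini-pro"))  # False: on cooldown for this model
print(is_available("key-1", "gpt-4o"))      # True: other models unaffected
```

Under the old global-cooldown behavior, the second check would also have returned `False`.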
Updated:
- README.md and src/rotator_library/README.md to reflect the new per-model cooldown behavior.
- src/rotator_library/usage_manager.py to manage cooldowns on a per-model basis.
- README.md +1 -1
- src/rotator_library/README.md +3 -3
- src/rotator_library/usage_manager.py +20 -18
README.md

```diff
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ This project provides a robust solution for managing and rotating API keys for v
 
 - **Smart Key Rotation**: Intelligently selects the least-used API key to distribute request loads evenly.
 - **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes).
-- **…
+- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
 - **Usage Tracking**: Monitors daily and global usage for each API key.
 - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
 - **OpenAI-Compatible Proxy**: Offers a familiar API interface for seamless interaction with different models.
```
src/rotator_library/README.md

```diff
--- a/src/rotator_library/README.md
+++ b/src/rotator_library/README.md
@@ -6,7 +6,7 @@ A simple, thread-safe client that intelligently rotates and retries API keys for
 
 - **Smart Key Rotation**: Automatically uses the least-used key to distribute load.
 - **Automatic Retries**: Retries requests on transient server errors.
-- **Cooldowns**: …
+- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
 - **Usage Tracking**: Tracks daily and global usage for each key.
 - **Provider Agnostic**: Works with any provider supported by `litellm`.
 - **Extensible**: Easily add support for new providers through a plugin-based architecture.
@@ -97,10 +97,10 @@ Fetches a dictionary of all available models, grouped by provider.
 The client is designed to handle errors gracefully:
 
 - **Server Errors (`5xx`)**: The client will retry the request with the *same key* up to `max_retries` times.
-- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown and …
+- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown for that specific model and retry the request with a different key. This ensures that a single model failure does not sideline a key for all other models.
 - **Unrecoverable Errors**: For critical errors, the client will fail fast and raise the exception.
 
-Cooldowns are managed by the `UsageManager` …
+Cooldowns are managed by the `UsageManager` on a per-model basis, preventing failing keys from being used repeatedly for models they have recently failed with. Upon a successful call, any existing cooldown for that key-model pair is cleared.
 
 ## Extending with Provider Plugins
```
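The three-way error policy described in that README section can be sketched as a small dispatch function. This is an illustrative stand-in, not the client's real code; the handler names and the exact status codes treated as rotation errors are assumptions.

```python
def classify_error(status_code: int) -> str:
    """Map an HTTP status code to the client's documented handling strategy."""
    if 500 <= status_code < 600:
        return "retry_same_key"   # transient server error: retry, same key
    if status_code in (401, 403, 429):
        return "rotate_key"       # cooldown this key for this model, try another
    return "fail_fast"            # unrecoverable: raise to the caller

print(classify_error(503))  # retry_same_key
print(classify_error(429))  # rotate_key
print(classify_error(400))  # fail_fast
```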
src/rotator_library/usage_manager.py

```diff
--- a/src/rotator_library/usage_manager.py
+++ b/src/rotator_library/usage_manager.py
@@ -57,15 +57,17 @@ class UsageManager:
 
     def get_next_smart_key(self, available_keys: List[str], model: str) -> Optional[str]:
         """
-        Gets the least-used, available key…
+        Gets the least-used, available key for a specific model, considering model-specific cooldowns.
         """
         best_key = None
         min_usage = float('inf')
 
-        # Filter for keys that are not on cooldown
         active_keys = []
         for key in available_keys:
-            …
+            key_data = self.usage_data.get(key, {})
+            model_cooldowns = key_data.get("model_cooldowns", {})
+            cooldown_until = model_cooldowns.get(model)
+
             if not cooldown_until or time.time() > cooldown_until:
                 active_keys.append(key)
 
@@ -74,7 +76,7 @@ class UsageManager:
 
         # Find the key with the minimum daily success_count for the given model
         for key in active_keys:
-            key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "…
+            key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
             daily_model_usage = key_data.get("daily", {}).get("models", {}).get(model, {})
             usage_count = daily_model_usage.get("success_count", 0)
 
@@ -85,11 +87,15 @@ class UsageManager:
         return best_key if best_key else active_keys[0]
 
     def record_success(self, key: str, model: str, completion_response: litellm.ModelResponse):
-        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "…
+        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
 
+        # Clear any cooldown for this specific model on success
+        if model in key_data.get("model_cooldowns", {}):
+            del key_data["model_cooldowns"][model]
+
         # Ensure daily stats are for today
         if key_data["daily"].get("date") != date.today().isoformat():
             self._reset_daily_stats_if_needed()
             key_data = self.usage_data[key]
 
         daily_model_data = key_data["daily"]["models"].setdefault(model, {"success_count": 0, "prompt_tokens": 0, "completion_tokens": 0, "approx_cost": 0.0})
@@ -99,11 +105,8 @@ class UsageManager:
         daily_model_data["prompt_tokens"] += usage.prompt_tokens
         daily_model_data["completion_tokens"] += usage.completion_tokens
 
-        # Calculate approximate cost using LiteLLM
         try:
-            cost = litellm.completion_cost(
-                completion_response=completion_response
-            )
+            cost = litellm.completion_cost(completion_response=completion_response)
             daily_model_data["approx_cost"] += cost
         except Exception as e:
             print(f"Warning: Could not calculate cost for model {model}: {e}")
@@ -112,22 +115,21 @@ class UsageManager:
         self._save_usage()
 
     def record_rotation_error(self, key: str, model: str, error: Exception):
-        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "…
+        key_data = self.usage_data.setdefault(key, {"daily": {"date": date.today().isoformat(), "models": {}}, "global": {"models": {}}, "model_cooldowns": {}})
 
-        # Default cooldown of 24 hours
-        cooldown_seconds = 86400
+        cooldown_seconds = 86400  # Default cooldown of 24 hours
 
-        # Try to parse retry_delay from the error message (very provider-specific)
         error_str = str(error).lower()
         if "retry_delay" in error_str:
             try:
-                # A simple way to parse, might need to be more robust
                 delay_str = error_str.split("retry_delay")[1].split("seconds:")[1].strip().split("}")[0]
                 cooldown_seconds = int(delay_str)
             except (IndexError, ValueError):
                 pass
 
-        key_data["cooldown_until"] = time.time() + cooldown_seconds
+        model_cooldowns = key_data.setdefault("model_cooldowns", {})
+        model_cooldowns[model] = time.time() + cooldown_seconds
+
         key_data["last_rotation_error"] = {
             "timestamp": time.time(),
             "model": model,
```