Commit c9419cb (parent: b0569d9), committed by Mirrowel

feat: Update documentation and example configurations for improved clarity and usability

Files changed:
- .env.example (+4 −2)
- DOCUMENTATION.md (+87 −58)
- README.md (+133 −105)
- src/rotator_library/README.md (+49 −44)
.env.example
CHANGED

```diff
@@ -1,4 +1,4 @@
-#
+# Library will automatically pick up these keys.
 # Add more keys by creating GEMINI_API_KEY_2, GEMINI_API_KEY_3, etc.
 GEMINI_API_KEY_1="YOUR_GEMINI_API_KEY_1"
 GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_2"
@@ -6,6 +6,8 @@ OPENROUTER_API_KEY_1="YOUR_OPENROUTER_API_KEY_1"
 OPENROUTER_API_KEY_2="YOUR_OPENROUTER_API_KEY_2"
 CHUTES_API_KEY_1="YOUR_CHUTES_API_KEY_1"
 CHUTES_API_KEY_2="YOUR_CHUTES_API_KEY_2"
+NVIDIA_NIM_API_KEY_1="YOUR_NVIDIA_NIM_API_KEY_1"
+NVIDIA_NIM_API_KEY_2="YOUR_NVIDIA_NIM_API_KEY_2"
 
-# A secret key for your proxy server to authenticate requests
+# A secret key for your proxy server to authenticate requests (can be anything; used for compatibility)
 PROXY_API_KEY="YOUR_PROXY_API_KEY"
```
DOCUMENTATION.md
CHANGED
# Technical Documentation: `rotating-api-key-client`

This document provides a detailed technical explanation of the `rotating-api-key-client` library, its components, and its internal workings. The library has evolved into a sophisticated, asynchronous client for managing LLM API keys with a strong focus on concurrency, resilience, and state management.

## 1. `client.py` - The `RotatingClient`

The `RotatingClient` is the central component, orchestrating API calls, key management, and error handling. It is designed as a long-lived, async-native object.

### Core Responsibilities

- Managing an `httpx.AsyncClient` for non-blocking HTTP requests.
- Interfacing with the `UsageManager` to acquire and release API keys.
- Handling provider-specific request modifications.
- Executing API calls via `litellm` with a robust retry and rotation strategy.
- Providing a safe wrapper for streaming responses.
### Request Lifecycle (`acompletion`)
|
| 17 |
|
| 18 |
When `acompletion` is called, it follows these steps:
|
| 19 |
|
| 20 |
+
1. **Provider and Key Validation**: It extracts the provider from the `model` name and ensures keys are configured for it.
|
| 21 |
+
|
| 22 |
+
2. **Key Acquisition Loop**: The client enters a loop to find a valid key and complete the request. It iterates through all keys for the provider until one succeeds or all have been tried.
|
| 23 |
+
a. **Acquire Best Key**: It calls `self.usage_manager.acquire_key()`. This is a blocking call that waits until a suitable key is available, based on the manager's tiered locking strategy (see `UsageManager` section).
|
| 24 |
+
b. **Prepare Request**: It prepares the `litellm` keyword arguments. This includes:
|
| 25 |
+
- **Request Sanitization**: Calling `sanitize_request_payload()` to remove parameters that might be unsupported by the target model, preventing errors.
|
| 26 |
+
- **Provider-Specific Logic**: Applying special handling for providers like Gemini (safety settings), Gemma (system prompts), and Chutes.ai (`api_base` and model name remapping).
|
| 27 |
+
|
| 28 |
+
3. **Retry Loop**: Once a key is acquired, it enters an inner retry loop (`for attempt in range(self.max_retries)`):
|
| 29 |
+
a. **API Call**: It calls `litellm.acompletion` with the acquired key.
|
| 30 |
+
b. **Success (Non-Streaming)**:
|
| 31 |
+
- It calls `self.usage_manager.record_success()` to update usage stats and clear any cooldowns for the key-model pair.
|
| 32 |
+
- It calls `self.usage_manager.release_key()` to release the lock on the key for this model.
|
| 33 |
+
- It returns the response, and the process ends.
|
| 34 |
+
c. **Success (Streaming)**:
|
| 35 |
+
- It returns a `_safe_streaming_wrapper` async generator. This wrapper is critical:
|
| 36 |
+
- It yields SSE-formatted chunks to the consumer.
|
| 37 |
+
- After the stream is fully consumed, its `finally` block ensures that `record_success()` and `release_key()` are called. This guarantees that the key lock is held for the entire duration of the stream and released correctly, even if the consumer abandons the stream.
|
| 38 |
+
d. **Failure**: If an exception occurs:
|
| 39 |
+
- The failure is logged in detail by `log_failure()`.
|
| 40 |
+
- The exception is passed to `classify_error()` to get a structured `ClassifiedError` object.
|
| 41 |
+
- **Server Error**: If the error type is `server_error`, it waits with exponential backoff and retries the request with the *same key*.
|
| 42 |
+
- **Rotation Error (Rate Limit, Auth, etc.)**: For any other error, it's considered a rotation trigger. `self.usage_manager.record_failure()` is called to apply an escalating cooldown, and `self.usage_manager.release_key()` releases the lock. The inner `attempt` loop is broken, and the outer `while` loop continues, acquiring a new key.
|
| 43 |
+
|
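The lifecycle above can be sketched as a simplified control flow. The helper names (`acquire_key`, `record_success`, `record_failure`, `release_key`, `classify_error`) mirror the descriptions above, but the signatures are illustrative, not the library's actual API:

```python
import asyncio

MAX_RETRIES = 3

async def acompletion_sketch(usage_manager, call_api, classify_error, **kwargs):
    """Acquire a key, retry server errors in place, rotate on anything else."""
    model = kwargs["model"]
    tried = set()
    while True:
        key = await usage_manager.acquire_key(model, exclude=tried)
        if key is None:
            raise RuntimeError("all keys exhausted for " + model)
        tried.add(key)
        for attempt in range(MAX_RETRIES):
            try:
                response = await call_api(api_key=key, **kwargs)
                await usage_manager.record_success(key, model)
                await usage_manager.release_key(key, model)
                return response
            except Exception as exc:
                if classify_error(exc) == "server_error" and attempt < MAX_RETRIES - 1:
                    await asyncio.sleep(2 ** attempt)  # backoff, retry with the same key
                    continue
                # Rate-limit/auth/etc.: cool the key down and rotate to the next one.
                await usage_manager.record_failure(key, model)
                await usage_manager.release_key(key, model)
                break
```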
## 2. `usage_manager.py` - Stateful Concurrency & Usage Management

This class is the heart of the library's state management and concurrency control. It is a stateful, async-native service that ensures keys are used efficiently and safely across multiple concurrent requests.

### Key Concepts

- **Asynchronous Design & Lazy Loading**: The entire class is asynchronous, using `aiofiles` for non-blocking file I/O and a `_lazy_init` pattern. The usage data from the JSON file is loaded only when the first request is made.
- **Concurrency Primitives**:
    - **`filelock`**: A file-level lock (`.json.lock`) prevents race conditions if multiple *processes* are running and sharing the same usage file.
    - **`asyncio.Lock` & `asyncio.Condition`**: Each key has its own `asyncio.Lock` and `asyncio.Condition` object. This enables the fine-grained, model-aware locking strategy.
### Tiered Key Acquisition (`acquire_key`)

This method implements the core logic for selecting a key. It is a "smart" blocking call.

1. **Filtering**: It first filters out any keys that are on a global or model-specific cooldown.
2. **Tiering**: It categorizes the remaining, valid keys into two tiers:
    - **Tier 1 (Ideal)**: Keys that are completely free (not being used by any model).
    - **Tier 2 (Acceptable)**: Keys that are currently in use, but for *different models* than the one being requested.
3. **Selection**: It attempts to acquire a lock on a key, prioritizing Tier 1 over Tier 2. Within each tier, it prioritizes the least-used key.
4. **Waiting**: If no keys in Tier 1 or Tier 2 can be locked, it means all eligible keys are currently handling requests for the *same model*. The method then `await`s on the `asyncio.Condition` of the best available key, waiting until it is notified that the key has been released.
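The filtering and tiering steps can be illustrated with a synchronous sketch. The per-key data shapes (`cooldowns`, `in_use`, `usage`) are invented for the example; the real manager tracks this state behind per-key locks:

```python
def pick_key(keys, requested_model, now):
    """Tiered selection: free keys first, then keys busy only with other models.

    `keys` maps key -> {"cooldowns": {model: until_ts}, "in_use": set(models),
    "usage": {model: count}} (hypothetical shapes, for illustration only).
    """
    def usable(k):
        # Filter out keys on cooldown for this model.
        return keys[k]["cooldowns"].get(requested_model, 0) <= now

    candidates = [k for k in keys if usable(k)]
    tier1 = [k for k in candidates if not keys[k]["in_use"]]
    tier2 = [k for k in candidates
             if keys[k]["in_use"] and requested_model not in keys[k]["in_use"]]
    for tier in (tier1, tier2):
        if tier:
            # Within a tier, prefer the least-used key for this model.
            return min(tier, key=lambda k: keys[k]["usage"].get(requested_model, 0))
    return None  # all eligible keys are busy with this model -> caller must wait
```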
### Failure Handling & Cooldowns (`record_failure`)

- **Escalating Backoff**: When a failure is recorded, it applies a cooldown that increases with the number of consecutive failures for a specific key-model pair (e.g., 10s, 30s, 60s, up to 2 hours).
- **Authentication Errors**: These are treated more severely, applying an immediate 5-minute key-level lockout.
- **Key-Level Lockouts**: If a single key accumulates 3 or more long-term (2-hour) cooldowns across different models, the manager assumes the key is compromised or disabled and applies a 5-minute global lockout on the key.
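A sketch of this bookkeeping follows. Only the 10s/30s/60s start and the 2-hour cap come from the description above; the intermediate backoff values and the exact state shape are assumptions:

```python
LONG_COOLDOWN = 7200   # 2 hours
KEY_LOCKOUT = 300      # 5 minutes

def apply_failure(state, model, now, error_type):
    """Update a single key's state on failure (illustrative, not the real API)."""
    if error_type == "authentication":
        # Auth failures lock the whole key out immediately.
        state["key_cooldown_until"] = now + KEY_LOCKOUT
        return
    fails = state["failures"].setdefault(model, 0) + 1
    state["failures"][model] = fails
    # Escalating schedule; values between 60s and 2h are assumed.
    schedule = [10, 30, 60, 300, 900, 3600, LONG_COOLDOWN]
    state["model_cooldowns"][model] = now + schedule[min(fails, len(schedule)) - 1]
    # 3+ models stuck on the longest cooldown -> assume the key is dead for a while.
    long_count = sum(1 for until in state["model_cooldowns"].values()
                     if until - now >= LONG_COOLDOWN)
    if long_count >= 3:
        state["key_cooldown_until"] = now + KEY_LOCKOUT
```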
### Data Structure

The `key_usage.json` file has a more complex structure to store this detailed state:

```json
{
  "api_key_hash": {
    "daily": {
      "date": "YYYY-MM-DD",
      "models": {
        "gemini/gemini-1.5-pro": {
          "success_count": 10,
          "prompt_tokens": 5000,
          "completion_tokens": 10000,
          "approx_cost": 0.075
        }
      }
    },
    "global": { /* ... similar to daily, but accumulates over time ... */ },
    "model_cooldowns": {
      "gemini/gemini-1.5-flash": 1719987600.0
    },
    "failures": {
      "gemini/gemini-1.5-flash": {
        "consecutive_failures": 2
      }
    },
    "key_cooldown_until": null,
    "last_daily_reset": "YYYY-MM-DD"
  }
}
```
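The `last_daily_reset` field drives the automatic daily reset mentioned in the README. A per-key sketch is shown below; the field names follow the JSON above, but which fields get cleared on reset is an assumption:

```python
import copy

def maybe_daily_reset(key_state, today):
    """Return key_state with daily stats cleared when the stored date is stale."""
    if key_state.get("last_daily_reset") == today:
        return key_state  # already reset today, nothing to do
    fresh = copy.deepcopy(key_state)
    fresh["daily"] = {"date": today, "models": {}}
    fresh["failures"] = {}
    fresh["model_cooldowns"] = {}
    fresh["last_daily_reset"] = today
    return fresh
```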
## 3. `error_handler.py`

This module provides a centralized function, `classify_error`, which is a significant improvement over the previous boolean checks.

- It takes a raw exception from `litellm` and returns a `ClassifiedError` data object.
- This object contains the `error_type` (e.g., `'rate_limit'`, `'authentication'`, `'server_error'`), the original exception, the status code, and any `retry_after` information extracted from the error message.
- This structured classification allows the `RotatingClient` to make more intelligent decisions about whether to retry with the same key or rotate to a new one.
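The shape of that object can be approximated with a dataclass. The classification rules below are a status-code-only simplification for illustration; the real module inspects `litellm` exception types and messages:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClassifiedErrorSketch:
    error_type: str                   # e.g. 'rate_limit', 'authentication', 'server_error'
    original: Exception
    status_code: Optional[int] = None
    retry_after: Optional[float] = None

def classify_error_sketch(exc, status_code=None):
    """Map a failure to a structured classification (simplified rules)."""
    if status_code == 429:
        return ClassifiedErrorSketch("rate_limit", exc, status_code)
    if status_code in (401, 403):
        return ClassifiedErrorSketch("authentication", exc, status_code)
    if status_code is not None and status_code >= 500:
        return ClassifiedErrorSketch("server_error", exc, status_code)
    return ClassifiedErrorSketch("unknown", exc, status_code)
```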
## 4. `request_sanitizer.py` (New Module)

- This module's purpose is to prevent `InvalidRequestError` exceptions from `litellm` that occur when a payload contains parameters not supported by the target model (e.g., sending a `thinking` parameter to a model that doesn't support it).
- The `sanitize_request_payload` function is called just before `litellm.acompletion` to strip out any such unsupported parameters, making the system more robust.
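A minimal sketch of the idea, with an invented capability table (the real module's rules and model list are not shown in this document):

```python
# Hypothetical table of parameters a given model rejects.
UNSUPPORTED_PARAMS = {
    "example/model-without-thinking": {"thinking", "response_format"},
}

def sanitize_request_payload_sketch(payload, model):
    """Drop payload keys the target model is known not to accept."""
    blocked = UNSUPPORTED_PARAMS.get(model, set())
    return {k: v for k, v in payload.items() if k not in blocked}
```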
## 5. `providers/` - Provider Plugins

The provider plugin system remains for fetching model lists. The interface now correctly specifies that the `get_models` method receives an `httpx.AsyncClient` instance, which it should use to make its API calls. This ensures all HTTP traffic goes through the client's managed session.
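The interface can be mirrored like this. The base-class and method names follow the text above, while the dummy implementation and its model names are placeholders:

```python
from abc import ABC, abstractmethod
from typing import Any, List

class ProviderPluginSketch(ABC):
    """Minimal mirror of the plugin interface described above."""

    @abstractmethod
    async def get_models(self, client: Any, api_key: str) -> List[str]:
        """`client` is the shared httpx.AsyncClient; return available model names."""

class DummyProvider(ProviderPluginSketch):
    async def get_models(self, client: Any, api_key: str) -> List[str]:
        # A real plugin would call the provider's list-models endpoint via `client`.
        return ["dummy/model-a", "dummy/model-b"]
```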
README.md
CHANGED
## Features

- **Advanced Concurrency Control**: A single API key can handle multiple concurrent requests to different models, maximizing throughput.
- **Smart Key Rotation**: Intelligently selects the least-used, available API key to distribute request loads evenly.
- **Escalating Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others.
- **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes) with exponential backoff.
- **Automatic Daily Resets**: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
- **Request Logging**: Optional logging of full request and response payloads for easy debugging.
- **Provider Agnostic**: Compatible with any provider supported by `litellm`.
- **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
## Quick Start Guide

This guide will get you up and running in just a few minutes.

### 1. Setup

First, clone the repository and install the required dependencies.

**For Linux/macOS:**
```bash
# Clone the repository
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

**For Windows:**
```powershell
# Clone the repository
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy

# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt
```
+
### 2. Configure API Keys
|
| 56 |
|
| 57 |
+
Next, create your `.env` file by copying the provided example. This file is where you will store all your secret keys.
|
| 58 |
|
| 59 |
+
**For Linux/macOS:**
|
| 60 |
+
```bash
|
| 61 |
+
cp .env.example .env
|
| 62 |
```
|
| 63 |
+
|
| 64 |
+
**For Windows:**
|
| 65 |
+
```powershell
|
| 66 |
+
copy .env.example .env
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 67 |
```
|
| 68 |
|
| 69 |
+
Now, open the new `.env` file and replace the placeholder values with your actual API keys.
|
| 70 |
+
|
| 71 |
+
**Refer to the `.env.example` file for the correct format and a full list of supported providers.**
|
| 72 |
|
| 73 |
+
The two main types of keys are:
|
|
|
|
|
|
|
|
|
|
|
|
|
| 74 |
|
| 75 |
+
1. **`PROXY_API_KEY`**: This is a secret key *you create*. It is used to authorize requests to *your* proxy, preventing unauthorized use.
|
| 76 |
+
2. **Provider Keys**: These are the API keys you get from LLM providers (like Gemini, OpenAI, etc.). The proxy automatically finds them based on their name (e.g., `GEMINI_API_KEY_1`).
|
|
|
|
|
|
|
|
|
|
| 77 |
|
| 78 |
+
**Example `.env` configuration:**
|
| 79 |
+
```env
|
| 80 |
+
# A secret key for your proxy server to authenticate requests.
|
| 81 |
+
# This can be any secret string you choose.
|
| 82 |
+
PROXY_API_KEY="YOUR_PROXY_API_KEY"
|
| 83 |
|
| 84 |
+
# --- Provider API Keys ---
|
| 85 |
+
# Add your keys from various providers below.
|
| 86 |
+
# You can add multiple keys for one provider by numbering them (e.g., _1, _2).
|
|
|
|
|
|
|
|
|
|
| 87 |
|
| 88 |
+
GEMINI_API_KEY_1="YOUR_GEMINI_API_KEY_1"
|
| 89 |
+
GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_2"
|
|
|
|
| 90 |
|
| 91 |
+
OPENROUTER_API_KEY_1="YOUR_OPENROUTER_API_KEY_1"
|
|
|
|
|
|
|
| 92 |
|
| 93 |
+
NVIDIA_NIM_API_KEY_1="YOUR_NVIDIA_NIM_API_KEY_1"
|
|
|
|
|
|
|
| 94 |
|
| 95 |
+
CHUTES_API_KEY_1="YOUR_CHUTES_API_KEY_1"
|
| 96 |
+
```
|
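The `PROVIDER_API_KEY_N` naming convention can be illustrated with a short sketch. This is not the proxy's actual discovery code, just a demonstration of the rule it documents:

```python
import re
from collections import defaultdict

def discover_provider_keys(environ):
    """Group PROVIDER_API_KEY_N variables by provider, ordered by N."""
    found = defaultdict(list)
    for name, value in environ.items():
        match = re.fullmatch(r"([A-Z0-9_]+)_API_KEY_(\d+)", name)
        if match and value:
            found[match.group(1).lower()].append((int(match.group(2)), value))
    return {provider: [v for _, v in sorted(pairs)]
            for provider, pairs in found.items()}
```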
### 3. Run the Proxy

Start the FastAPI server with `uvicorn`. The `--reload` flag will automatically restart the server when you make code changes.

```bash
uvicorn src.proxy_app.main:app --reload
```

The proxy is now running and available at `http://127.0.0.1:8000`.

### 4. Make a Request

You can now send requests to the proxy. The endpoint is `http://127.0.0.1:8000/v1/chat/completions`.

Remember to:
1. Set the `Authorization` header to `Bearer your-super-secret-proxy-key`.
2. Specify the `model` in the format `provider/model_name`.

Here is an example using `curl`:
```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-super-secret-proxy-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash-preview-05-20",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```
---

## Advanced Usage

### Using with the OpenAI Python Library

The proxy is OpenAI-compatible, so you can use it directly with the `openai` Python client. This is the recommended way to integrate the proxy into your applications.

```python
import openai

# Point the client to your local proxy
client = openai.OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="your-super-secret-proxy-key"  # Use your proxy key here
)

# Make a request
response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash-preview-05-20",  # Specify provider and model
    messages=[
        {"role": "user", "content": "Write a short poem about space."}
    ]
)

print(response.choices[0].message.content)
```
### Available API Endpoints

- `POST /v1/chat/completions`: The main endpoint for making chat requests.
- `GET /v1/models`: Returns a list of all available models from your configured providers.
- `GET /v1/providers`: Returns a list of all configured providers.
- `POST /v1/token-count`: Calculates the token count for a given message payload.
### Enabling Request Logging

For debugging purposes, you can log the full request and response for every API call. To enable this, open `src/proxy_app/main.py` and change the following line:

```python
# Set to True to enable request/response logging
ENABLE_REQUEST_LOGGING = True
```

Logs will be saved in the `logs/` directory.

## How It Works

The core of this project is the `RotatingClient` library, which manages a pool of API keys with a sophisticated concurrency model. When a request is made, the client:

1. **Acquires the Best Key**: It requests the best available key from the `UsageManager`. The manager uses a tiered locking strategy to find a key that is not on cooldown and preferably not in use. If a key is busy with another request for the *same model*, it waits. Otherwise, it allows concurrent use for *different models*.
2. **Makes the Request**: It uses the acquired key to make the API call via `litellm`.
3. **Handles Errors**:
    - It uses a `classify_error` function to determine the failure type.
    - For **server errors**, it retries the request with the same key using exponential backoff.
    - For **rate-limit or auth errors**, it records the failure, applies an escalating cooldown for that specific key-model pair, and the client immediately tries the next available key.
4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key's lock is then released, notifying any waiting requests that it is available.
## Troubleshooting

- **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys or a problem with the provider's service.
- **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory for details on why the failures occurred.

## Library and Technical Docs

- **Using the Library**: For documentation on how to use the `rotating-api-key-client` library directly in your own Python projects, please refer to its [README.md](src/rotator_library/README.md).
- **Technical Details**: For a more in-depth technical explanation of the library's architecture, components, and internal workings, please refer to the [Technical Documentation](DOCUMENTATION.md).
src/rotator_library/README.md
CHANGED
@@ -1,13 +1,16 @@
# Rotating API Key Client

## Features

- **Provider Agnostic**: Works with any provider supported by `litellm`.
- **Extensible**: Easily add support for new providers through a plugin-based architecture.

@@ -22,7 +25,7 @@ pip install -e .

## `RotatingClient` Class

-This is the main class for interacting with the library.

### Initialization

@@ -40,16 +43,33 @@ client = RotatingClient(
- `max_retries`: The number of times to retry a request with the *same key* if a transient server error occurs.
- `usage_file_path`: The path to the JSON file where key usage data will be stored.

### Methods

#### `async def acompletion(self, **kwargs) -> Any:`

-This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds key rotation and
-- **Parameters**: Accepts the same keyword arguments as `litellm.acompletion`
- **Returns**:
  - For non-streaming requests, it returns the `litellm` response object.
-  - For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE).

**Example:**

@@ -59,13 +79,12 @@ from rotating_api_key_client import RotatingClient
```python
async def main():
    api_keys = {"gemini": ["key1", "key2"]}
-    print(response)

asyncio.run(main())
```

@@ -73,61 +92,47 @@
#### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`

Calculates the token count for a given text or list of messages using `litellm.token_counter`.
-The `model` parameter is required and must be a string in the format `provider/model_name` (e.g., `"gemini/gemini-2.5-flash-preview-05-20"`).

-**Example:**

```python
-count = client.token_count(
-    model="gemini/gemini-2.5-flash-preview-05-20",
-    messages=[{"role": "user", "content": "Count these tokens."}]
-)
-print(f"Token count: {count}")
```

#### `async def get_available_models(self, provider: str) -> List[str]:`

-Fetches a list of available models for a specific provider. Results are cached.

-#### `async def get_all_available_models(self) -> Dict[str, List[str]]:`

-Fetches a dictionary of all available models, grouped by provider.

## Error Handling and Cooldowns

-- **Server Errors (`5xx`)**: The client will retry the request with the *same key* up to `max_retries` times.
-- **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown for that specific model and retry the request with a different key. This ensures that a single model failure does not sideline a key for all other models.
-- **Unrecoverable Errors**: For critical errors, the client will fail fast and raise the exception.

## Extending with Provider Plugins

-The library uses a dynamic plugin system. To add support for a new provider, you only need to
-1. **Create a new provider file** in `src/rotator_library/providers/` (e.g., `my_provider.py`).
2. **Implement the `ProviderInterface`**: Inside your new file, create a class that inherits from `ProviderInterface` and implements the `get_models` method.

```python
# src/rotator_library/providers/my_provider.py
from .provider_interface import ProviderInterface
from typing import List

class MyProvider(ProviderInterface):
-    async def get_models(self, api_key: str) -> List[str]:
        # Logic to fetch and return a list of model names
        # The model names should be prefixed with the provider name.
        # e.g., ["my-provider/model-1", "my-provider/model-2"]
        pass
```

-The system will automatically discover and register your new provider

-### Special Case: `chutes.ai`

-The `chutes` provider is handled as a special case. Since `litellm` does not support it directly, the `RotatingClient` modifies the request by setting the `api_base` to `https://llm.chutes.ai/v1` and remapping the model from `chutes/model-name` to `openai/model-name`. This allows `chutes.ai` to be used as a custom OpenAI-compatible endpoint.

## Detailed Documentation
# Rotating API Key Client

A robust, asynchronous, and thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient, concurrent, and efficient.

## Features

- **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
- **Advanced Concurrency Control**: A single key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety.
- **Smart Key Rotation**: Acquires the least-used, available key using a tiered, model-aware locking strategy.
- **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model.
- **Automatic Retries**: Retries requests on transient server errors with exponential backoff.
- **Detailed Usage Tracking**: Tracks daily and global usage for each key, including token counts and approximate cost.
- **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
- **Provider Agnostic**: Works with any provider supported by `litellm`.
- **Extensible**: Easily add support for new providers through a plugin-based architecture.
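The "smart key rotation" bullet can be made concrete with a small sketch. This is an illustrative simplification, not the library's implementation; `pick_key`, the `usage` counts, and the `cooldowns` map are all hypothetical names:

```python
import time

def pick_key(keys, usage, cooldowns, model, now=None):
    """Pick the least-used key that is not cooling down for this model.

    usage: {key: total_request_count}
    cooldowns: {(key, model): unix timestamp until which the key is unavailable}
    """
    now = time.time() if now is None else now
    available = [k for k in keys if cooldowns.get((k, model), 0) <= now]
    if not available:
        return None  # every key is cooling down for this model
    return min(available, key=lambda k: usage.get(k, 0))

keys = ["key1", "key2", "key3"]
usage = {"key1": 10, "key2": 3, "key3": 7}
# key2 is cooling down for one specific model only:
cooldowns = {("key2", "gemini/gemini-2.5-flash"): 10**12}

print(pick_key(keys, usage, cooldowns, "gemini/gemini-2.5-flash"))  # key3
print(pick_key(keys, usage, cooldowns, "other/model"))              # key2
```

Note how the per-model cooldown sidelines `key2` only for the affected model; it is still the first choice everywhere else.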
## `RotatingClient` Class

This is the main class for interacting with the library. It is designed to be a long-lived object that manages its own HTTP client and key usage state.

### Initialization
- `max_retries`: The number of times to retry a request with the *same key* if a transient server error occurs.
- `usage_file_path`: The path to the JSON file where key usage data will be stored.

### Concurrency and Resource Management

The `RotatingClient` is asynchronous and manages an `httpx.AsyncClient` internally. It's crucial to close the client properly to release resources. This can be done manually or by using an `async with` block.

**Manual Management:**
```python
client = RotatingClient(api_keys=api_keys)
# ... use the client ...
await client.close()
```

**Recommended (`async with`):**
```python
async with RotatingClient(api_keys=api_keys) as client:
    # ... use the client ...
```
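The `async with` form works because the client implements the async context-manager protocol (`__aenter__`/`__aexit__`). A minimal sketch of that pattern, using a stand-in class rather than the real `RotatingClient`:

```python
import asyncio

class SketchClient:
    """Stand-in for an async client that must release resources on exit."""

    def __init__(self):
        self.closed = False

    async def close(self):
        # The real client would close its underlying httpx.AsyncClient here.
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs even if the body of the `async with` block raises.
        await self.close()

async def main():
    async with SketchClient() as client:
        assert not client.closed
    print(client.closed)  # True

asyncio.run(main())
```

The `__aexit__` guarantee is why the `async with` form is recommended: cleanup happens even on exceptions.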
### Methods

#### `async def acompletion(self, **kwargs) -> Any:`

This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds the core logic for key acquisition, rotation, and retries.

- **Parameters**: Accepts the same keyword arguments as `litellm.acompletion`. The `model` parameter is required and must be a string in the format `provider/model_name`.
- **Returns**:
  - For non-streaming requests, it returns the `litellm` response object.
  - For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE). The wrapper ensures that key locks are released and usage is recorded only after the stream is fully consumed.

**Example:**

```python
import asyncio
from rotating_api_key_client import RotatingClient

async def main():
    api_keys = {"gemini": ["key1", "key2"]}
    async with RotatingClient(api_keys=api_keys) as client:
        response = await client.acompletion(
            model="gemini/gemini-2.5-flash-preview-05-20",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response)

asyncio.run(main())
```
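The SSE events mentioned for streaming follow the OpenAI wire format: each chunk is serialized as a `data: <json>` line followed by a blank line, and the stream ends with a `data: [DONE]` sentinel. A sketch of that framing (the chunk payloads here are illustrative, not the library's exact output):

```python
import json

def to_sse(chunks):
    """Frame chat-completion chunks as OpenAI-style Server-Sent Events."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    # Sentinel telling the consumer the stream is complete.
    yield "data: [DONE]\n\n"

chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
]
events = list(to_sse(chunks))
print(events[0])   # first event wraps the JSON chunk
print(events[-1])  # data: [DONE]
```

A consumer can detect end-of-stream by watching for the `[DONE]` sentinel, which is also the natural point for the client to release the key lock and record usage.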
#### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`

Calculates the token count for a given text or list of messages using `litellm.token_counter`.
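`litellm.token_counter` selects the correct model-specific tokenizer. Purely as a rough mental model of what counting over chat messages involves, here is a naive character-based estimate; this is NOT litellm's algorithm and should not be used in its place:

```python
def estimate_tokens(messages):
    """Crude stand-in for a tokenizer: roughly 4 characters per token.
    Real tokenizers are model-specific; use litellm.token_counter for accuracy."""
    text = " ".join(m["content"] for m in messages)
    return max(1, len(text) // 4)

messages = [{"role": "user", "content": "Count these tokens."}]
print(estimate_tokens(messages))
```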
#### `async def get_available_models(self, provider: str) -> List[str]:`

Fetches a list of available models for a specific provider. Results are cached in memory.

#### `async def get_all_available_models(self, grouped: bool = True) -> Union[Dict[str, List[str]], List[str]]:`

Fetches a dictionary of all available models, grouped by provider, or as a single flat list if `grouped=False`.
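The in-memory caching can be pictured as a per-provider dict that only hits the provider once. A sketch with a stand-in fetch function (`ModelListCache` and `fake_fetch` are hypothetical names for illustration, not the library's API):

```python
import asyncio

class ModelListCache:
    def __init__(self, fetch):
        self._fetch = fetch   # async callable: provider -> list of model names
        self._cache = {}
        self._calls = 0       # counts real fetches, for demonstration

    async def get(self, provider):
        if provider not in self._cache:
            self._calls += 1
            self._cache[provider] = await self._fetch(provider)
        return self._cache[provider]

async def fake_fetch(provider):
    return [f"{provider}/model-a", f"{provider}/model-b"]

async def main():
    cache = ModelListCache(fake_fetch)
    first = await cache.get("gemini")
    second = await cache.get("gemini")  # served from cache, no second fetch
    print(first == second, cache._calls)  # True 1

asyncio.run(main())
```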
## Error Handling and Cooldowns

The client uses a sophisticated error handling mechanism:

- **Error Classification**: All exceptions from `litellm` are passed through a `classify_error` function to determine their type (`rate_limit`, `authentication`, `server_error`, etc.).
- **Server Errors**: The client will retry the request with the *same key* up to `max_retries` times, using an exponential backoff strategy.
- **Rotation Errors (Rate Limit, Auth, etc.)**: The client records the failure in the `UsageManager`, which applies an escalating cooldown to the key for that specific model. The client then immediately acquires a new key and continues its attempt to complete the request.
- **Key-Level Lockouts**: If a key fails on multiple different models, the `UsageManager` can apply a key-level lockout, taking it out of rotation entirely for a short period.
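The two retry behaviors above can be sketched numerically. Neither function is the library's actual code, and the base delays and doubling policy are illustrative assumptions:

```python
def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff for same-key retries on transient server errors."""
    return min(base * (2 ** attempt), cap)

def escalating_cooldown(consecutive_failures, base=60):
    """Escalating per-model cooldown: doubles with each consecutive failure."""
    return base * (2 ** (consecutive_failures - 1))

print([backoff_delay(a) for a in range(5)])         # [1.0, 2.0, 4.0, 8.0, 16.0]
print([escalating_cooldown(n) for n in (1, 2, 3)])  # [60, 120, 240]
```

The cap on the backoff keeps a misbehaving endpoint from stalling a request indefinitely, while the escalating cooldown quickly takes a persistently failing key/model pair out of contention.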
## Extending with Provider Plugins

The library uses a dynamic plugin system. To add support for a new provider's model list, you only need to:

1. **Create a new provider file** in `src/rotator_library/providers/` (e.g., `my_provider.py`).
2. **Implement the `ProviderInterface`**: Inside your new file, create a class that inherits from `ProviderInterface` and implements the `get_models` method.

```python
# src/rotator_library/providers/my_provider.py
from .provider_interface import ProviderInterface
from typing import List
import httpx

class MyProvider(ProviderInterface):
    async def get_models(self, api_key: str, http_client: httpx.AsyncClient) -> List[str]:
        # Logic to fetch and return a list of model names.
        # The model names should be prefixed with the provider name,
        # e.g., ["my-provider/model-1", "my-provider/model-2"]
        pass
```

The system will automatically discover and register your new provider.
## Detailed Documentation

For a more in-depth technical explanation of the library's architecture, including the `UsageManager`'s concurrency model and the error classification system, please refer to the [Technical Documentation](../../DOCUMENTATION.md).