Mirrowel committed
Commit c9419cb · 1 Parent(s): b0569d9

feat: Update documentation and example configurations for improved clarity and usability

Files changed (4)
  1. .env.example +4 -2
  2. DOCUMENTATION.md +87 -58
  3. README.md +133 -105
  4. src/rotator_library/README.md +49 -44
.env.example CHANGED
@@ -1,4 +1,4 @@
-# LiteLLM will automatically pick up these keys.
+# Library will automatically pick up these keys.
 # Add more keys by creating GEMINI_API_KEY_2, GEMINI_API_KEY_3, etc.
 GEMINI_API_KEY_1="YOUR_GEMINI_API_KEY_1"
 GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_2"
@@ -6,6 +6,8 @@ OPENROUTER_API_KEY_1="YOUR_OPENROUTER_API_KEY_1"
 OPENROUTER_API_KEY_2="YOUR_OPENROUTER_API_KEY_2"
 CHUTES_API_KEY_1="YOUR_CHUTES_API_KEY_1"
 CHUTES_API_KEY_2="YOUR_CHUTES_API_KEY_2"
+NVIDIA_NIM_API_KEY_1="YOUR_NVIDIA_NIM_API_KEY_1"
+NVIDIA_NIM_API_KEY_2="YOUR_NVIDIA_NIM_API_KEY_2"
 
-# A secret key for your proxy server to authenticate requests
+# A secret key for your proxy server to authenticate requests (can be anything; used for compatibility)
 PROXY_API_KEY="YOUR_PROXY_API_KEY"
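The numbered-key convention in this file (`GEMINI_API_KEY_1`, `GEMINI_API_KEY_2`, ...) can be sketched as a small grouping helper. This is purely illustrative: `discover_keys` is a hypothetical name, not the proxy's actual code.

```python
import re

# Illustrative sketch (not the proxy's actual code) of how numbered
# PROVIDER_API_KEY_N variables, as shown above, can be grouped by provider.
def discover_keys(environ: dict) -> dict:
    pattern = re.compile(r"^(.+)_API_KEY_(\d+)$")
    found: dict = {}
    for name, value in environ.items():
        match = pattern.match(name)
        if match and value:
            provider = match.group(1).lower()
            found.setdefault(provider, []).append((int(match.group(2)), value))
    # Sort each provider's keys by their numeric suffix.
    return {p: [v for _, v in sorted(pairs)] for p, pairs in found.items()}
```

Note that a bare `PROXY_API_KEY` (no numeric suffix) deliberately does not match the pattern, since it is the proxy's own secret rather than a provider key.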
DOCUMENTATION.md CHANGED
@@ -1,90 +1,119 @@
 # Technical Documentation: `rotating-api-key-client`
 
-This document provides a detailed technical explanation of the `rotating-api-key-client` library, its components, and its internal workings.
+This document provides a detailed technical explanation of the `rotating-api-key-client` library, its components, and its internal workings. The library has evolved into a sophisticated, asynchronous client for managing LLM API keys with a strong focus on concurrency, resilience, and state management.
 
 ## 1. `client.py` - The `RotatingClient`
 
-The `RotatingClient` is the central component of the library, orchestrating API calls, key rotation, and error handling.
+The `RotatingClient` is the central component, orchestrating API calls, key management, and error handling. It is designed as a long-lived, async-native object.
+
+### Core Responsibilities
+- Managing an `httpx.AsyncClient` for non-blocking HTTP requests.
+- Interfacing with the `UsageManager` to acquire and release API keys.
+- Handling provider-specific request modifications.
+- Executing API calls via `litellm` with a robust retry and rotation strategy.
+- Providing a safe wrapper for streaming responses.
 
 ### Request Lifecycle (`acompletion`)
 
 When `acompletion` is called, it follows these steps:
 
-1. **Model and Provider Validation**: It first checks that a `model` is specified and extracts the provider name from it (e.g., `"gemini"` from `"gemini/gemini-2.5-flash-preview-05-20"`). It ensures that API keys for this provider are available.
-
-2. **Key Selection Loop**: The client enters a loop to find a valid key and complete the request.
-   a. **Get Next Smart Key**: It calls `self.usage_manager.get_next_smart_key()` to get the least-used key for the given model that is not currently on cooldown.
-   b. **No Key Available**: If all keys for the provider are on cooldown, it waits for 5 seconds before restarting the loop.
-
-3. **Attempt Loop**: Once a key is selected, it enters a retry loop (`for attempt in range(self.max_retries)`):
-   a. **API Call**: It calls `litellm.acompletion` with the selected key and the user-provided arguments.
-   b. **Success**:
-      - If the call is successful and **non-streaming**, it calls `self.usage_manager.record_success()`, returns the response, and the process ends.
-      - If the call is successful and **streaming**, it returns a `_streaming_wrapper` async generator. This wrapper formats the response chunks as Server-Sent Events (SSE) and calls `self.usage_manager.record_success()` only when the stream is fully consumed.
-   c. **Failure**: If an exception occurs:
-      - The failure is logged using `log_failure()`.
-      - **Server Error**: If `is_server_error()` returns `True` and there are retries left, it waits for a moment and continues to the next attempt with the *same key*.
-      - **Unrecoverable Error**: If `is_unrecoverable_error()` returns `True`, the exception is immediately raised, terminating the process.
-      - **Other Errors (Rate Limit, Auth, etc.)**: For any other error, it's considered a "rotation" error. `self.usage_manager.record_rotation_error()` is called to put the key on cooldown, and the inner `attempt` loop is broken. The outer `while` loop then continues, fetching a new key.
+1. **Provider and Key Validation**: It extracts the provider from the `model` name and ensures keys are configured for it.
+
+2. **Key Acquisition Loop**: The client enters a loop to find a valid key and complete the request. It iterates through all keys for the provider until one succeeds or all have been tried.
+   a. **Acquire Best Key**: It calls `self.usage_manager.acquire_key()`. This is a blocking call that waits until a suitable key is available, based on the manager's tiered locking strategy (see `UsageManager` section).
+   b. **Prepare Request**: It prepares the `litellm` keyword arguments. This includes:
+      - **Request Sanitization**: Calling `sanitize_request_payload()` to remove parameters that might be unsupported by the target model, preventing errors.
+      - **Provider-Specific Logic**: Applying special handling for providers like Gemini (safety settings), Gemma (system prompts), and Chutes.ai (`api_base` and model name remapping).
+
+3. **Retry Loop**: Once a key is acquired, it enters an inner retry loop (`for attempt in range(self.max_retries)`):
+   a. **API Call**: It calls `litellm.acompletion` with the acquired key.
+   b. **Success (Non-Streaming)**:
+      - It calls `self.usage_manager.record_success()` to update usage stats and clear any cooldowns for the key-model pair.
+      - It calls `self.usage_manager.release_key()` to release the lock on the key for this model.
+      - It returns the response, and the process ends.
+   c. **Success (Streaming)**:
+      - It returns a `_safe_streaming_wrapper` async generator. This wrapper is critical:
+        - It yields SSE-formatted chunks to the consumer.
+        - After the stream is fully consumed, its `finally` block ensures that `record_success()` and `release_key()` are called. This guarantees that the key lock is held for the entire duration of the stream and released correctly, even if the consumer abandons the stream.
+   d. **Failure**: If an exception occurs:
+      - The failure is logged in detail by `log_failure()`.
+      - The exception is passed to `classify_error()` to get a structured `ClassifiedError` object.
+      - **Server Error**: If the error type is `server_error`, it waits with exponential backoff and retries the request with the *same key*.
+      - **Rotation Error (Rate Limit, Auth, etc.)**: For any other error, it's considered a rotation trigger. `self.usage_manager.record_failure()` is called to apply an escalating cooldown, and `self.usage_manager.release_key()` releases the lock. The inner `attempt` loop is broken, and the outer `while` loop continues, acquiring a new key.
 
-## 2. `usage_manager.py` - The `UsageManager`
+## 2. `usage_manager.py` - Stateful Concurrency & Usage Management
 
-This class is responsible for all logic related to tracking and selecting API keys.
+This class is the heart of the library's state management and concurrency control. It is a stateful, async-native service that ensures keys are used efficiently and safely across multiple concurrent requests.
 
-### Key Data Structure
+### Key Concepts
 
-Usage data is stored in a JSON file (e.g., `key_usage.json`). Here's a conceptual view of its structure:
+- **Asynchronous Design & Lazy Loading**: The entire class is asynchronous, using `aiofiles` for non-blocking file I/O and a `_lazy_init` pattern. The usage data from the JSON file is loaded only when the first request is made.
+- **Concurrency Primitives**:
+  - **`filelock`**: A file-level lock (`.json.lock`) prevents race conditions if multiple *processes* are running and sharing the same usage file.
+  - **`asyncio.Lock` & `asyncio.Condition`**: Each key has its own `asyncio.Lock` and `asyncio.Condition` object. This enables the fine-grained, model-aware locking strategy.
+
+### Tiered Key Acquisition (`acquire_key`)
+
+This method implements the core logic for selecting a key. It is a "smart" blocking call.
+
+1. **Filtering**: It first filters out any keys that are on a global or model-specific cooldown.
+2. **Tiering**: It categorizes the remaining, valid keys into two tiers:
+   - **Tier 1 (Ideal)**: Keys that are completely free (not being used by any model).
+   - **Tier 2 (Acceptable)**: Keys that are currently in use, but for *different models* than the one being requested.
+3. **Selection**: It attempts to acquire a lock on a key, prioritizing Tier 1 over Tier 2. Within each tier, it prioritizes the least-used key.
+4. **Waiting**: If no keys in Tier 1 or Tier 2 can be locked, it means all eligible keys are currently handling requests for the *same model*. The method then `await`s on the `asyncio.Condition` of the best available key, waiting until it is notified that the key has been released.
+
+### Failure Handling & Cooldowns (`record_failure`)
+
+- **Escalating Backoff**: When a failure is recorded, it applies a cooldown that increases with the number of consecutive failures for a specific key-model pair (e.g., 10s, 30s, 60s, up to 2 hours).
+- **Authentication Errors**: These are treated more severely, applying an immediate 5-minute key-level lockout.
+- **Key-Level Lockouts**: If a single key accumulates 3 or more long-term (2-hour) cooldowns across different models, the manager assumes the key is compromised or disabled and applies a 5-minute global lockout on the key.
+
+### Data Structure
+
+The `key_usage.json` file has a more complex structure to store this detailed state:
 ```json
 {
-  "api_key_1_hash": {
-    "last_used": "timestamp",
-    "cooldown_until": "timestamp",
-    "global_usage": 150,
-    "daily_usage": {
-      "YYYY-MM-DD": 100
+  "api_key_hash": {
+    "daily": {
+      "date": "YYYY-MM-DD",
+      "models": {
+        "gemini/gemini-1.5-pro": {
+          "success_count": 10,
+          "prompt_tokens": 5000,
+          "completion_tokens": 10000,
+          "approx_cost": 0.075
+        }
+      }
+    },
+    "global": { /* ... similar to daily, but accumulates over time ... */ },
+    "model_cooldowns": {
+      "gemini/gemini-1.5-flash": 1719987600.0
+    },
+    "failures": {
+      "gemini/gemini-1.5-flash": {
+        "consecutive_failures": 2
+      }
     },
-    "model_usage": {
-      "gemini/gemini-2.5-flash-preview-05-20": 50
-    }
+    "key_cooldown_until": null,
+    "last_daily_reset": "YYYY-MM-DD"
   }
 }
 ```
 
-- **Key Hashing**: Keys are stored by their SHA256 hash to avoid exposing sensitive keys in logs or files.
-- `cooldown_until`: If a key fails, this timestamp is set. The key will not be selected until the current time is past this timestamp.
-- `model_usage`: Tracks the usage count for each specific model, which is the primary metric for the "smart" key selection.
-
-### Core Methods
-
-- `get_next_smart_key()`: This is the key selection logic. It filters out any keys that are on cooldown and then finds the key with the lowest usage count for the requested `model`.
-- `record_success()`: Increments the usage counters (`global_usage`, `daily_usage`, `model_usage`) for the given key.
-- `record_rotation_error()`: Sets the `cooldown_until` timestamp for the given key, effectively taking it out of rotation for a short period.
-
 ## 3. `error_handler.py`
 
-This module contains functions to classify exceptions returned by `litellm`.
+This module provides a centralized function, `classify_error`, which is a significant improvement over the previous boolean checks.
 
-- `is_server_error(e)`: Checks if the exception is a transient server-side error (typically a `5xx` status code) that is worth retrying with the same key.
-- `is_unrecoverable_error(e)`: Checks for critical errors (e.g., invalid request parameters) that should immediately stop the process. Any error that is not a server error or an unrecoverable error is treated as a "rotation" error by the client.
+- It takes a raw exception from `litellm` and returns a `ClassifiedError` data object.
+- This object contains the `error_type` (e.g., `'rate_limit'`, `'authentication'`, `'server_error'`), the original exception, the status code, and any `retry_after` information extracted from the error message.
+- This structured classification allows the `RotatingClient` to make more intelligent decisions about whether to retry with the same key or rotate to a new one.
 
-## 4. `failure_logger.py`
+## 4. `request_sanitizer.py` (New Module)
 
-- `log_failure()`: This function logs detailed information about a failed API request to a file in the `logs/` directory. This is crucial for debugging issues with specific keys or providers. The log includes the hashed API key, the model, the error message, and the request data.
+- This module's purpose is to prevent `InvalidRequestError` exceptions from `litellm` that occur when a payload contains parameters not supported by the target model (e.g., sending a `thinking` parameter to a model that doesn't support it).
+- The `sanitize_request_payload` function is called just before `litellm.acompletion` to strip out any such unsupported parameters, making the system more robust.
 
 ## 5. `providers/` - Provider Plugins
 
-The provider plugin system allows for easy extension to support model list fetching from new LLM providers.
-
-- **`provider_interface.py`**: Defines the abstract base class `ProviderPlugin` with a single abstract method, `get_models`. Any new provider plugin must inherit from this class and implement this method.
-- **Implementations**: Each provider (e.g., `openai_provider.py`, `gemini_provider.py`) has its own file containing a class that implements the `ProviderPlugin` interface. The `get_models` method contains the specific logic to call the provider's API and return a list of their available models.
-- **`__init__.py`**: This file contains a dynamic plugin system that automatically discovers and registers any provider implementation placed in the `providers/` directory.
-
-### Special Provider: `chutes.ai`
-
-The `chutes` provider is handled as a special case within the `RotatingClient`. Since `litellm` does not have native support for `chutes.ai`, the client performs the following modifications at runtime:
-
-1. **Sets `api_base`**: It sets the `api_base` to `https://llm.chutes.ai/v1`.
-2. **Remaps the Model**: It changes the model name from `chutes/some-model` to `openai/some-model` before passing the request to `litellm`.
-
-This allows the system to use `chutes.ai` as if it were a custom OpenAI endpoint, while still leveraging the library's key rotation and management features.
+The provider plugin system remains for fetching model lists. The interface now correctly specifies that the `get_models` method receives an `httpx.AsyncClient` instance, which it should use to make its API calls. This ensures all HTTP traffic goes through the client's managed session.
README.md CHANGED
@@ -7,148 +7,179 @@ This project provides a robust solution for managing and rotating API keys for v
 
 ## Features
 
-- **Smart Key Rotation**: Intelligently selects the least-used API key to distribute request loads evenly.
-- **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes).
-- **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
-- **Usage Tracking**: Monitors daily and global usage for each API key.
+- **Advanced Concurrency Control**: A single API key can handle multiple concurrent requests to different models, maximizing throughput.
+- **Smart Key Rotation**: Intelligently selects the least-used, available API key to distribute request loads evenly.
+- **Escalating Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others.
+- **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes) with exponential backoff.
+- **Automatic Daily Resets**: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
+- **Request Logging**: Optional logging of full request and response payloads for easy debugging.
 - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
-- **OpenAI-Compatible Proxy**: Offers a familiar API interface for seamless interaction with different models.
+- **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.
 
-## How It Works
-
-The core of this project is the `RotatingClient` library, which manages a pool of API keys. When a request is made, the client:
-
-1. **Selects the Best Key**: It identifies the key with the lowest usage count that is not currently in a cooldown period.
-2. **Makes the Request**: It uses the selected key to make the API call via `litellm`.
-3. **Handles Errors**:
-   - If a **retriable error** (like a 500 server error) occurs, it waits and retries the request.
-   - If a **non-retriable error** (like a rate limit or invalid key error) occurs, it places the key on a temporary cooldown and selects a new key for the next attempt.
-4. **Tracks Usage**: On a successful request, it records the usage for the key.
-
-The FastAPI proxy application exposes this functionality through an API endpoint that mimics the OpenAI API, making it easy to integrate with existing tools and applications.
-
-## Project Structure
-
-```
-.
-├── logs/                # Logs for failed requests
-├── src/
-│   ├── proxy_app/       # The FastAPI proxy application
-│   │   └── main.py
-│   └── rotator_library/ # The rotating-api-key-client library
-│       ├── __init__.py
-│       ├── client.py
-│       ├── error_handler.py
-│       ├── failure_logger.py
-│       ├── usage_manager.py
-│       ├── providers/
-│       └── ...
-├── .env.example
-├── README.md
-└── requirements.txt
-```
-
-## Setup and Installation
-
-1. **Clone the repository:**
-   ```bash
-   git clone <repository-url>
-   cd <repository-name>
-   ```
-
-2. **Create a virtual environment:**
-   ```bash
-   python -m venv venv
-   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
-   ```
-
-3. **Install dependencies:**
-   The `requirements.txt` file includes all necessary packages and installs the `rotator_library` in editable mode (`-e`), allowing for simultaneous development of the library and the proxy.
-   ```bash
-   pip install -r requirements.txt
-   ```
-
-4. **Configure environment variables:**
-   Create a `.env` file by copying the example file:
-   ```bash
-   cp .env.example .env
-   ```
-   Edit the `.env` file to add your API keys. The proxy automatically detects keys for different providers based on the naming convention `PROVIDER_API_KEY_N`.
-
-   ```env
-   # A secret key to protect your proxy from unauthorized access
-   PROXY_API_KEY="your-secret-proxy-key"
-
-   # Add API keys for each provider. They will be rotated automatically.
-   GEMINI_API_KEY_1="your-gemini-api-key-1"
-   GEMINI_API_KEY_2="your-gemini-api-key-2"
-
-   OPENAI_API_KEY_1="your-openai-api-key-1"
-
-   OPENROUTER_API_KEY_1="your-openrouter-api-key-1"
-
-   # chutes.ai is used as a custom OpenAI endpoint
-   CHUTES_API_KEY_1="your-chutes-api-key-1"
-   ```
-
-## Running the Proxy
-
-To start the proxy application, run the following command:
+## Quick Start Guide
+
+This guide will get you up and running in just a few minutes.
+
+### 1. Setup
+
+First, clone the repository and install the required dependencies.
+
+**For Linux/macOS:**
+```bash
+# Clone the repository
+git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
+cd LLM-API-Key-Proxy
+
+# Create and activate a virtual environment
+python3 -m venv venv
+source venv/bin/activate
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+**For Windows:**
+```powershell
+# Clone the repository
+git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
+cd LLM-API-Key-Proxy
+
+# Create and activate a virtual environment
+python -m venv venv
+.\venv\Scripts\Activate.ps1
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### 2. Configure API Keys
+
+Next, create your `.env` file by copying the provided example. This file is where you will store all your secret keys.
+
+**For Linux/macOS:**
+```bash
+cp .env.example .env
+```
+
+**For Windows:**
+```powershell
+copy .env.example .env
+```
+
+Now, open the new `.env` file and replace the placeholder values with your actual API keys.
+
+**Refer to the `.env.example` file for the correct format and a full list of supported providers.**
+
+The two main types of keys are:
+
+1. **`PROXY_API_KEY`**: This is a secret key *you create*. It is used to authorize requests to *your* proxy, preventing unauthorized use.
+2. **Provider Keys**: These are the API keys you get from LLM providers (like Gemini, OpenAI, etc.). The proxy automatically finds them based on their name (e.g., `GEMINI_API_KEY_1`).
+
+**Example `.env` configuration:**
+```env
+# A secret key for your proxy server to authenticate requests.
+# This can be any secret string you choose.
+PROXY_API_KEY="YOUR_PROXY_API_KEY"
+
+# --- Provider API Keys ---
+# Add your keys from various providers below.
+# You can add multiple keys for one provider by numbering them (e.g., _1, _2).
+
+GEMINI_API_KEY_1="YOUR_GEMINI_API_KEY_1"
+GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_2"
+
+OPENROUTER_API_KEY_1="YOUR_OPENROUTER_API_KEY_1"
+
+NVIDIA_NIM_API_KEY_1="YOUR_NVIDIA_NIM_API_KEY_1"
+
+CHUTES_API_KEY_1="YOUR_CHUTES_API_KEY_1"
+```
+
+### 3. Run the Proxy
+
+Start the FastAPI server with `uvicorn`. The `--reload` flag will automatically restart the server when you make code changes.
+
 ```bash
 uvicorn src.proxy_app.main:app --reload
 ```
-The proxy will be available at `http://127.0.0.1:8000`.
 
-## Using the Proxy
+The proxy is now running and available at `http://127.0.0.1:8000`.
 
-You can make requests to the proxy as if it were the OpenAI API. Remember to include your `PROXY_API_KEY` in the `Authorization` header.
+### 4. Make a Request
 
-The `model` parameter must be specified in the format `provider/model_name` (e.g., `gemini/gemini-2.5-flash-preview-05-20`, `openai/gpt-4`, `openrouter/google/gemini-flash-1.5`, `chutes/deepseek-ai/DeepSeek-R1-0528`).
+You can now send requests to the proxy. The endpoint is `http://127.0.0.1:8000/v1/chat/completions`.
 
-### Example with `curl` (Non-Streaming):
+Remember to:
+1. Set the `Authorization` header to `Bearer your-super-secret-proxy-key`.
+2. Specify the `model` in the format `provider/model_name`.
+
+Here is an example using `curl`:
 ```bash
 curl -X POST http://127.0.0.1:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
-  -H "Authorization: Bearer your-secret-proxy-key" \
+  -H "Authorization: Bearer your-super-secret-proxy-key" \
   -d '{
     "model": "gemini/gemini-2.5-flash-preview-05-20",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
 ```
 
-### Example with `curl` (Streaming):
-```bash
-curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer your-secret-proxy-key" \
-  -d '{
-    "model": "gemini/gemini-2.5-flash-preview-05-20",
-    "messages": [{"role": "user", "content": "Write a short story about a robot."}],
-    "stream": true
-  }'
-```
-
-### Example with Python `requests`:
+---
+
+## Advanced Usage
+
+### Using with the OpenAI Python Library
+
+The proxy is OpenAI-compatible, so you can use it directly with the `openai` Python client. This is the recommended way to integrate the proxy into your applications.
+
 ```python
-import requests
-import json
-
-proxy_url = "http://127.0.0.1:8000/v1/chat/completions"
-proxy_key = "your-secret-proxy-key"
-
-headers = {
-    "Content-Type": "application/json",
-    "Authorization": f"Bearer {proxy_key}"
-}
-
-data = {
-    "model": "gemini/gemini-2.5-flash-preview-05-20",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}]
-}
-
-response = requests.post(proxy_url, headers=headers, data=json.dumps(data))
-print(response.json())
+import openai
+
+# Point the client to your local proxy
+client = openai.OpenAI(
+    base_url="http://127.0.0.1:8000/v1",
+    api_key="your-super-secret-proxy-key"  # Use your proxy key here
+)
+
+# Make a request
+response = client.chat.completions.create(
+    model="gemini/gemini-2.5-flash-preview-05-20",  # Specify provider and model
+    messages=[
+        {"role": "user", "content": "Write a short poem about space."}
+    ]
+)
+
+print(response.choices[0].message.content)
 ```
+
+### Available API Endpoints
+
+- `POST /v1/chat/completions`: The main endpoint for making chat requests.
+- `GET /v1/models`: Returns a list of all available models from your configured providers.
+- `GET /v1/providers`: Returns a list of all configured providers.
+- `POST /v1/token-count`: Calculates the token count for a given message payload.
+
+### Enabling Request Logging
+
+For debugging purposes, you can log the full request and response for every API call. To enable this, open `src/proxy_app/main.py` and change the following line:
+
+```python
+# Set to True to enable request/response logging
+ENABLE_REQUEST_LOGGING = True
+```
+Logs will be saved in the `logs/` directory.
+
+## How It Works
+
+The core of this project is the `RotatingClient` library, which manages a pool of API keys with a sophisticated concurrency model. When a request is made, the client:
+
+1. **Acquires the Best Key**: It requests the best available key from the `UsageManager`. The manager uses a tiered locking strategy to find a key that is not on cooldown and preferably not in use. If a key is busy with another request for the *same model*, it waits. Otherwise, it allows concurrent use for *different models*.
+2. **Makes the Request**: It uses the acquired key to make the API call via `litellm`.
+3. **Handles Errors**:
+   - It uses a `classify_error` function to determine the failure type.
+   - For **server errors**, it retries the request with the same key using exponential backoff.
+   - For **rate-limit or auth errors**, it records the failure, applies an escalating cooldown for that specific key-model pair, and the client immediately tries the next available key.
+4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key's lock is then released, notifying any waiting requests that it is available.
 
 ## Troubleshooting
 
@@ -156,10 +187,7 @@ print(response.json())
 - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys or a problem with the provider's service.
 - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory for details on why the failures occurred.
 
-## Using the Library in Other Projects
-
-The `rotating-api-key-client` is a standalone library that can be integrated into any Python project. For detailed documentation on how to use it, please refer to its `README.md` file located at `src/rotator_library/README.md`.
-
-## Detailed Documentation
-
-For a more in-depth technical explanation of the `rotating-api-key-client` library's architecture, components, and internal workings, please refer to the [Technical Documentation](DOCUMENTATION.md).
+## Library and Technical Docs
+
+- **Using the Library**: For documentation on how to use the `rotating-api-key-client` library directly in your own Python projects, please refer to its [README.md](src/rotator_library/README.md).
+- **Technical Details**: For a more in-depth technical explanation of the library's architecture, components, and internal workings, please refer to the [Technical Documentation](DOCUMENTATION.md).
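The retry-versus-rotate decision described in the README's "How It Works" section can be sketched as follows. This is a deliberate simplification: the function names and the backoff base are assumptions for illustration, while the real client uses `classify_error` and the `UsageManager`.

```python
# Illustrative sketch of the retry-vs-rotate routing and exponential backoff
# described above. Names and the backoff base are assumptions, not the real API.
def next_action(status_code: int, attempt: int, max_retries: int = 3) -> str:
    if 500 <= status_code < 600:
        # Transient server error: retry with the SAME key while retries remain.
        return "retry_same_key" if attempt + 1 < max_retries else "rotate_key"
    # Rate limits (429), auth failures (401/403), and anything else rotate.
    return "rotate_key"

def backoff_delay(attempt: int, base: float = 1.0) -> float:
    # Exponential backoff between retries: 1s, 2s, 4s, ...
    return base * (2 ** attempt)
```

The key point is asymmetry: only 5xx responses are worth spending retries on the same key; everything else is cheaper to route to a different key immediately.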
src/rotator_library/README.md CHANGED
@@ -1,13 +1,16 @@
  # Rotating API Key Client
 
- A simple, thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient and efficient.
+ A robust, asynchronous, and thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient, concurrent, and efficient.
 
  ## Features
 
- - **Smart Key Rotation**: Automatically uses the least-used key to distribute load.
- - **Automatic Retries**: Retries requests on transient server errors.
- - **Per-Model Cooldowns**: If a key fails for a specific model (e.g., due to rate limits), it is only put on cooldown for that model, allowing it to be used with other models.
- - **Usage Tracking**: Tracks daily and global usage for each key.
+ - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
+ - **Advanced Concurrency Control**: A single key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety.
+ - **Smart Key Rotation**: Acquires the least-used, available key using a tiered, model-aware locking strategy.
+ - **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model.
+ - **Automatic Retries**: Retries requests on transient server errors with exponential backoff.
+ - **Detailed Usage Tracking**: Tracks daily and global usage for each key, including token counts and approximate cost.
+ - **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
  - **Provider Agnostic**: Works with any provider supported by `litellm`.
  - **Extensible**: Easily add support for new providers through a plugin-based architecture.
 
@@ -22,7 +25,7 @@ pip install -e .
 
  ## `RotatingClient` Class
 
- This is the main class for interacting with the library.
+ This is the main class for interacting with the library. It is designed to be a long-lived object that manages its own HTTP client and key usage state.
 
  ### Initialization
 
@@ -40,16 +43,33 @@ client = RotatingClient(
  - `max_retries`: The number of times to retry a request with the *same key* if a transient server error occurs.
  - `usage_file_path`: The path to the JSON file where key usage data will be stored.
 
+ ### Concurrency and Resource Management
+
+ The `RotatingClient` is asynchronous and manages an `httpx.AsyncClient` internally. It's crucial to close the client properly to release resources. This can be done manually or by using an `async with` block.
+
+ **Manual Management:**
+ ```python
+ client = RotatingClient(api_keys=api_keys)
+ # ... use the client ...
+ await client.close()
+ ```
+
+ **Recommended (`async with`):**
+ ```python
+ async with RotatingClient(api_keys=api_keys) as client:
+     # ... use the client ...
+ ```
+
  ### Methods
 
  #### `async def acompletion(self, **kwargs) -> Any:`
 
- This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds key rotation and retry logic.
+ This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds the core logic for key acquisition, rotation, and retries.
 
- - **Parameters**: Accepts the same keyword arguments as `litellm.acompletion` (e.g., `messages`, `stream`). The `model` parameter is required and must be a string in the format `provider/model_name` (e.g., `"gemini/gemini-2.5-flash-preview-05-20"`, `"openrouter/google/gemini-flash-1.5"`, `"chutes/deepseek-ai/DeepSeek-R1-0528"`).
+ - **Parameters**: Accepts the same keyword arguments as `litellm.acompletion`. The `model` parameter is required and must be a string in the format `provider/model_name`.
  - **Returns**:
    - For non-streaming requests, it returns the `litellm` response object.
-   - For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE).
+   - For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE). The wrapper ensures that key locks are released and usage is recorded only after the stream is fully consumed.
 
  **Example:**
 
@@ -59,13 +79,12 @@ from rotating_api_key_client import RotatingClient
 
  async def main():
      api_keys = {"gemini": ["key1", "key2"]}
-     client = RotatingClient(api_keys=api_keys)
-
-     response = await client.acompletion(
-         model="gemini/gemini-2.5-flash-preview-05-20",
-         messages=[{"role": "user", "content": "Hello!"}]
-     )
-     print(response)
+     async with RotatingClient(api_keys=api_keys) as client:
+         response = await client.acompletion(
+             model="gemini/gemini-2.5-flash-preview-05-20",
+             messages=[{"role": "user", "content": "Hello!"}]
+         )
+         print(response)
 
  asyncio.run(main())
  ```
@@ -73,61 +92,47 @@ asyncio.run(main())
  #### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`
 
  Calculates the token count for a given text or list of messages using `litellm.token_counter`.
- The `model` parameter is required and must be a string in the format `provider/model_name` (e.g., `"gemini/gemini-2.5-flash-preview-05-20"`).
- **Example:**
-
- ```python
- count = client.token_count(
-     model="gemini/gemini-2.5-flash-preview-05-20",
-     messages=[{"role": "user", "content": "Count these tokens."}]
- )
- print(f"Token count: {count}")
- ```
 
  #### `async def get_available_models(self, provider: str) -> List[str]:`
 
- Fetches a list of available models for a specific provider. Results are cached.
+ Fetches a list of available models for a specific provider. Results are cached in memory.
 
- #### `async def get_all_available_models(self) -> Dict[str, List[str]]:`
+ #### `async def get_all_available_models(self, grouped: bool = True) -> Union[Dict[str, List[str]], List[str]]:`
 
- Fetches a dictionary of all available models, grouped by provider.
+ Fetches a dictionary of all available models, grouped by provider, or as a single flat list if `grouped=False`.
 
  ## Error Handling and Cooldowns
 
- The client is designed to handle errors gracefully:
-
- - **Server Errors (`5xx`)**: The client will retry the request with the *same key* up to `max_retries` times.
- - **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown for that specific model and retry the request with a different key. This ensures that a single model failure does not sideline a key for all other models.
- - **Unrecoverable Errors**: For critical errors, the client will fail fast and raise the exception.
-
- Cooldowns are managed by the `UsageManager` on a per-model basis, preventing failing keys from being used repeatedly for models they have recently failed with. Upon a successful call, any existing cooldown for that key-model pair is cleared.
+ The client uses a sophisticated error handling mechanism:
+
+ - **Error Classification**: All exceptions from `litellm` are passed through a `classify_error` function to determine their type (`rate_limit`, `authentication`, `server_error`, etc.).
+ - **Server Errors**: The client will retry the request with the *same key* up to `max_retries` times, using an exponential backoff strategy.
+ - **Rotation Errors (Rate Limit, Auth, etc.)**: The client records the failure in the `UsageManager`, which applies an escalating cooldown to the key for that specific model. The client then immediately acquires a new key and continues its attempt to complete the request.
+ - **Key-Level Lockouts**: If a key fails on multiple different models, the `UsageManager` can apply a key-level lockout, taking it out of rotation entirely for a short period.
 
  ## Extending with Provider Plugins
 
- The library uses a dynamic plugin system. To add support for a new provider, you only need to do two things:
+ The library uses a dynamic plugin system. To add support for a new provider's model list, you only need to:
 
- 1. **Create a new provider file** in `src/rotator_library/providers/` (e.g., `my_provider.py`). The name of the file (without `_provider.py`) will be used as the provider name (e.g., `my_provider`).
+ 1. **Create a new provider file** in `src/rotator_library/providers/` (e.g., `my_provider.py`).
  2. **Implement the `ProviderInterface`**: Inside your new file, create a class that inherits from `ProviderInterface` and implements the `get_models` method.
 
  ```python
  # src/rotator_library/providers/my_provider.py
  from .provider_interface import ProviderInterface
  from typing import List
+ import httpx
 
  class MyProvider(ProviderInterface):
-     async def get_models(self, api_key: str) -> List[str]:
+     async def get_models(self, api_key: str, http_client: httpx.AsyncClient) -> List[str]:
          # Logic to fetch and return a list of model names
          # The model names should be prefixed with the provider name.
          # e.g., ["my-provider/model-1", "my-provider/model-2"]
          pass
  ```
 
- The system will automatically discover and register your new provider when the library is imported.
-
- ### Special Case: `chutes.ai`
-
- The `chutes` provider is handled as a special case. Since `litellm` does not support it directly, the `RotatingClient` modifies the request by setting the `api_base` to `https://llm.chutes.ai/v1` and remapping the model from `chutes/model-name` to `openai/model-name`. This allows `chutes.ai` to be used as a custom OpenAI-compatible endpoint.
+ The system will automatically discover and register your new provider.
 
  ## Detailed Documentation
 
- For a more in-depth technical explanation of the `rotating-api-key-client` library's architecture, components, and internal workings, please refer to the [Technical Documentation](../../DOCUMENTATION.md).
+ For a more in-depth technical explanation of the library's architecture, including the `UsageManager`'s concurrency model and the error classification system, please refer to the [Technical Documentation](../../DOCUMENTATION.md).
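The retry policy described in the error-handling section above (same-key exponential backoff for server errors, immediate rotation for rate-limit/auth errors) can be sketched as follows. This is a hypothetical, simplified illustration: `classify_error` and `call_with_retries` here are stand-ins, not the library's actual implementations.

```python
import asyncio

# Sketch of the retry policy: transient server errors are retried with the
# *same* key using exponential backoff, while rotation errors (rate limit,
# auth) are re-raised so the caller can switch to another key.

def classify_error(exc: Exception) -> str:
    # Simplified stand-in for the library's error classifier.
    text = str(exc).lower()
    if "rate" in text or "auth" in text:
        return "rotation"
    return "server_error"

async def call_with_retries(call, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries):
        try:
            return await call()
        except Exception as exc:
            if classify_error(exc) == "rotation" or attempt == max_retries - 1:
                raise  # caller rotates to the next key, or gives up
            await asyncio.sleep(base_delay * (2 ** attempt))  # backoff doubles

attempts = []

async def flaky():
    # Fails twice with a transient server error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("503 server error")
    return "ok"

result = asyncio.run(call_with_retries(flaky))
print(result, len(attempts))
```

The key design point mirrored here is that only transient server errors burn retries on the same key; anything classified as a rotation error escapes the loop immediately so the key pool, not the backoff timer, absorbs the failure.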