Mirrowel committed
Commit 7ba1fcd · 1 Parent(s): d195a5f

docs: Big documentation update part

Files changed (3):
  1. DOCUMENTATION.md +93 -58
  2. README.md +72 -38
  3. src/rotator_library/README.md +51 -35
DOCUMENTATION.md CHANGED
@@ -1,73 +1,85 @@
- # Technical Documentation: `rotating-api-key-client`

- This document provides a detailed technical explanation of the `rotating-api-key-client` library, its components, and its internal workings. The library has evolved into a sophisticated, asynchronous client for managing LLM API keys with a strong focus on concurrency, resilience, and state management.

- ## 1. `client.py` - The `RotatingClient`

- The `RotatingClient` is the central component, orchestrating API calls, key management, and error handling. It is designed as a long-lived, async-native object.

- ### Core Responsibilities
- - Managing an `httpx.AsyncClient` for non-blocking HTTP requests.
- - Interfacing with the `UsageManager` to acquire and release API keys.
- - Handling provider-specific request modifications.
- - Executing API calls via `litellm` with a robust retry and rotation strategy.
- - Providing a safe wrapper for streaming responses.

- ### Request Lifecycle (`acompletion`)

- When `acompletion` is called, it follows these steps:

- 1. **Provider and Key Validation**: It extracts the provider from the `model` name and ensures keys are configured for it.

- 2. **Key Acquisition Loop**: The client enters a loop to find a valid key and complete the request. It iterates through all keys for the provider until one succeeds or all have been tried.
- a. **Acquire Best Key**: It calls `self.usage_manager.acquire_key()`. This is a blocking call that waits until a suitable key is available, based on the manager's tiered locking strategy (see `UsageManager` section).
- b. **Prepare Request**: It prepares the `litellm` keyword arguments. This includes:
- - **Request Sanitization**: Calling `sanitize_request_payload()` to remove parameters that might be unsupported by the target model, preventing errors.
- - **Provider-Specific Logic**: Applying special handling for providers like Gemini (safety settings), Gemma (system prompts), and Chutes.ai (`api_base` and model name remapping).

- 3. **Retry Loop**: Once a key is acquired, it enters an inner retry loop (`for attempt in range(self.max_retries)`):
- a. **API Call**: It calls `litellm.acompletion` with the acquired key.
  b. **Success (Non-Streaming)**:
- - It calls `self.usage_manager.record_success()` to update usage stats and clear any cooldowns for the key-model pair.
- - It calls `self.usage_manager.release_key()` to release the lock on the key for this model.
  - It returns the response, and the process ends.
  c. **Success (Streaming)**:
- - It returns a `_safe_streaming_wrapper` async generator. This wrapper is critical:
  - It yields SSE-formatted chunks to the consumer.
- - After the stream is fully consumed, its `finally` block ensures that `record_success()` and `release_key()` are called. This guarantees that the key lock is held for the entire duration of the stream and released correctly, even if the consumer abandons the stream.
  d. **Failure**: If an exception occurs:
- - The failure is logged in detail by `log_failure()`.
  - The exception is passed to `classify_error()` to get a structured `ClassifiedError` object.
- - **Server Error**: If the error type is `server_error`, it waits with exponential backoff and retries the request with the *same key*.
- - **Rotation Error (Rate Limit, Auth, etc.)**: For any other error, it's considered a rotation trigger. `self.usage_manager.record_failure()` is called to apply an escalating cooldown, and `self.usage_manager.release_key()` releases the lock. The inner `attempt` loop is broken, and the outer `while` loop continues, acquiring a new key.

- ## 2. `usage_manager.py` - Stateful Concurrency & Usage Management

- This class is the heart of the library's state management and concurrency control. It is a stateful, async-native service that ensures keys are used efficiently and safely across multiple concurrent requests.

- ### Key Concepts

- - **Asynchronous Design & Lazy Loading**: The entire class is asynchronous, using `aiofiles` for non-blocking file I/O and a `_lazy_init` pattern. The usage data from the JSON file is loaded only when the first request is made.
- - **Concurrency Primitives**:
- - **`filelock`**: A file-level lock (`.json.lock`) prevents race conditions if multiple *processes* are running and sharing the same usage file.
- - **`asyncio.Lock` & `asyncio.Condition`**: Each key has its own `asyncio.Lock` and `asyncio.Condition` object. This enables the fine-grained, model-aware locking strategy.

- ### Tiered Key Acquisition (`acquire_key`)

- This method implements the core logic for selecting a key. It is a "smart" blocking call.

  1. **Filtering**: It first filters out any keys that are on a global or model-specific cooldown.
  2. **Tiering**: It categorizes the remaining, valid keys into two tiers:
  - **Tier 1 (Ideal)**: Keys that are completely free (not being used by any model).
- - **Tier 2 (Acceptable)**: Keys that are currently in use, but for *different models* than the one being requested.
- 3. **Selection**: It attempts to acquire a lock on a key, prioritizing Tier 1 over Tier 2. Within each tier, it prioritizes the least-used key.
- 4. **Waiting**: If no keys in Tier 1 or Tier 2 can be locked, it means all eligible keys are currently handling requests for the *same model*. The method then `await`s on the `asyncio.Condition` of the best available key, waiting until it is notified that the key has been released.

- ### Failure Handling & Cooldowns (`record_failure`)

- - **Escalating Backoff**: When a failure is recorded, it applies a cooldown that increases with the number of consecutive failures for a specific key-model pair (e.g., 10s, 30s, 60s, up to 2 hours).
- - **Authentication Errors**: These are treated more severely, applying an immediate 5-minute key-level lockout.
- - **Key-Level Lockouts**: If a single key accumulates 3 or more long-term (2-hour) cooldowns across different models, the manager assumes the key is compromised or disabled and applies a 5-minute global lockout on the key.

  ### Data Structure
@@ -103,29 +115,52 @@ The `key_usage.json` file has a more complex structure to store this detailed st

  ## 3. `error_handler.py`

- This module provides a centralized function, `classify_error`, which is a significant improvement over the previous boolean checks.

- - It takes a raw exception from `litellm` and returns a `ClassifiedError` data object.
- - This object contains the `error_type` (e.g., `'rate_limit'`, `'authentication'`, `'server_error'`), the original exception, the status code, and any `retry_after` information extracted from the error message.
- - This structured classification allows the `RotatingClient` to make more intelligent decisions about whether to retry with the same key or rotate to a new one.

- ## 4. `request_sanitizer.py` (New Module)

- - This module's purpose is to prevent `InvalidRequestError` exceptions from `litellm` that occur when a payload contains parameters not supported by the target model (e.g., sending a `thinking` parameter to a model that doesn't support it).
- - The `sanitize_request_payload` function is called just before `litellm.acompletion` to strip out any such unsupported parameters, making the system more robust.

- ## 5. `providers/` - Provider Plugins

- The provider plugin system remains for fetching model lists. The interface now correctly specifies that the `get_models` method receives an `httpx.AsyncClient` instance, which it should use to make its API calls. This ensures all HTTP traffic goes through the client's managed session.

- ## 6. `proxy_app/` - The Proxy Application

- The `proxy_app` directory contains the FastAPI application that serves the rotating client.

- ### `main.py` - The FastAPI App

- This file contains the FastAPI application that exposes the `RotatingClient` through an OpenAI-compatible API.

- #### Command-Line Arguments

- - `--enable-request-logging`: This flag enables logging of all incoming requests and outgoing responses to the `logs/` directory. This is useful for debugging and monitoring the proxy's activity. By default, this is disabled.

+ # Technical Documentation: API Key Proxy & Rotator Library

+ This document provides a detailed technical explanation of the API Key Proxy and the `rotating-api-key-client` library, covering their architecture, components, and internal workings.

+ ## 1. Architecture Overview

+ The project is a monorepo containing two primary components:

+ 1. **`rotator_library`**: A standalone, reusable Python library for intelligent API key rotation and management.
+ 2. **`proxy_app`**: A FastAPI application that consumes the `rotator_library` and exposes its functionality through an OpenAI-compatible web API.

+ This architecture separates the core rotation logic from the web-serving layer, making the library portable and the proxy a clean implementation of its features.

+ ---

+ ## 2. `rotator_library` - The Core Engine

+ This library is the heart of the project, containing all the logic for key rotation, usage tracking, and provider management.

+ ### 2.1. `client.py` - The `RotatingClient`
+
+ The `RotatingClient` is the central class that orchestrates all operations. It is designed as a long-lived, async-native object.
+
+ #### Core Responsibilities
+
+ * Managing a shared `httpx.AsyncClient` for all non-blocking HTTP requests.
+ * Interfacing with the `UsageManager` to acquire and release API keys.
+ * Dynamically loading and using provider-specific plugins from the `providers/` directory.
+ * Executing API calls via `litellm` with a robust retry and rotation strategy.
+ * Providing a safe, stateful wrapper for handling streaming responses.
+
+ #### Request Lifecycle (`acompletion` & `aembedding`)
+
+ When `acompletion` or `aembedding` is called, it follows a sophisticated, multi-layered process:
+
+ 1. **Provider & Key Validation**: It extracts the provider from the `model` name (e.g., `"gemini/gemini-1.5-pro"` -> `"gemini"`) and ensures keys are configured for it.
+
+ 2. **Key Acquisition Loop**: The client enters a `while` loop that attempts to find a valid key and complete the request. It iterates until one key succeeds or all have been tried.
+ a. **Acquire Best Key**: It calls `self.usage_manager.acquire_key()`. This is a crucial, potentially blocking call that waits until a suitable key is available, based on the manager's tiered locking strategy (see `UsageManager` section).
+ b. **Prepare Request**: It prepares the `litellm` keyword arguments. This includes applying provider-specific logic (e.g., remapping safety settings for Gemini, handling `api_base` for Chutes.ai) and sanitizing the payload to remove unsupported parameters.
+
+ 3. **Retry Loop**: Once a key is acquired, it enters an inner `for` loop (`for attempt in range(self.max_retries)`):
+ a. **API Call**: It calls `litellm.acompletion` or `litellm.aembedding`.
  b. **Success (Non-Streaming)**:
+ - It calls `self.usage_manager.record_success()` to update usage stats and clear any cooldowns.
+ - It calls `self.usage_manager.release_key()` to release the lock.
  - It returns the response, and the process ends.
  c. **Success (Streaming)**:
+ - It returns the `_safe_streaming_wrapper` async generator. This wrapper is critical:
  - It yields SSE-formatted chunks to the consumer.
+ - It can reassemble fragmented JSON chunks and detect errors mid-stream.
+ - Its `finally` block ensures that `record_success()` and `release_key()` are called *only after the stream is fully consumed or closed*. This guarantees the key lock is held for the entire duration of the stream.
  d. **Failure**: If an exception occurs:
  - The exception is passed to `classify_error()` to get a structured `ClassifiedError` object.
+ - **Server Error**: If the error is temporary (e.g., 5xx), it waits with exponential backoff and retries the request with the *same key*.
+ - **Rotation Error (Rate Limit, Auth, etc.)**: For any other error, it's a trigger to rotate. `self.usage_manager.record_failure()` is called to apply a cooldown, and the lock is released. The inner `attempt` loop is broken, and the outer `while` loop continues, acquiring a new key.
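The two nested loops described above can be sketched as follows. This is an illustrative, self-contained sketch, not the library's actual code: `FakeUsageManager`, `flaky_api`, and the exception classes are stand-ins invented for the demo, while the method names mirror those in the documentation.

```python
import asyncio

class ServerError(Exception): pass
class RateLimitError(Exception): pass

class FakeUsageManager:
    """Stand-in for the real UsageManager, just enough to drive the sketch."""
    def __init__(self, keys):
        self.keys, self.cooldown, self.success = list(keys), set(), []
    async def acquire_key(self, provider, model):
        free = [k for k in self.keys if k not in self.cooldown]
        return free[0] if free else None
    async def record_success(self, key, model):
        self.success.append(key)
    async def record_failure(self, key, model, error_type):
        self.cooldown.add(key)
    async def release_key(self, key, model):
        pass

async def acompletion_sketch(mgr, call_api, provider, model, max_retries=3):
    while True:  # outer loop: rotate across keys
        key = await mgr.acquire_key(provider, model)
        if key is None:
            raise RuntimeError("all keys failed or are on cooldown")
        try:
            for attempt in range(max_retries):  # inner loop: same key
                try:
                    response = await call_api(model=model, api_key=key)
                    await mgr.record_success(key, model)
                    return response
                except ServerError:
                    # temporary 5xx: exponential backoff, retry the SAME key
                    await asyncio.sleep(0.01 * 2 ** attempt)
            await mgr.record_failure(key, model, "server_error")
        except RateLimitError:
            # rotation trigger: cool this key down and move to the next one
            await mgr.record_failure(key, model, "rate_limit")
        finally:
            await mgr.release_key(key, model)

async def flaky_api(model, api_key):
    if api_key == "bad-key":
        raise RateLimitError("rate limited")
    return {"key_used": api_key}

mgr = FakeUsageManager(["bad-key", "good-key"])
result = asyncio.run(acompletion_sketch(mgr, flaky_api, "gemini", "gemini-1.5-pro"))
print(result["key_used"])  # good-key
```

The rate-limited key is cooled down and released, and the outer loop transparently retries with the next available key.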

+ ### 2.2. `usage_manager.py` - Stateful Concurrency & Usage Management

+ This class is the stateful core of the library, managing concurrency, usage, and cooldowns.

+ #### Key Concepts

+ * **Async-Native & Lazy-Loaded**: The class is fully asynchronous, using `aiofiles` for non-blocking file I/O. The usage data from the JSON file is loaded only when the first request is made (`_lazy_init`).
+ * **Fine-Grained Locking**: Each API key is associated with its own `asyncio.Lock` and `asyncio.Condition` object. This allows for a highly granular and efficient locking strategy.
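The lazy-load pattern can be illustrated with a minimal sketch. This is not the library's actual class; it uses plain stdlib file I/O where the real manager uses `aiofiles` (plus a `filelock` for cross-process safety), but it shows the key property: concurrent callers trigger at most one read of the JSON state.

```python
import asyncio
import json
import tempfile
from pathlib import Path

class LazyUsageStore:
    """Lazy-load sketch: state is read from disk at most once, on first
    use, guarded by an asyncio.Lock so concurrent callers don't race."""
    def __init__(self, path):
        self.path = Path(path)
        self._data = None
        self._init_lock = asyncio.Lock()
        self.loads = 0  # counts how many times the file was actually read

    async def _lazy_init(self):
        async with self._init_lock:
            if self._data is None:  # only the first caller performs the load
                self._data = json.loads(self.path.read_text())
                self.loads += 1

    async def get(self, key):
        await self._lazy_init()
        return self._data.get(key)

async def demo():
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump({"KEY_1": {"requests": 42}}, f)
    store = LazyUsageStore(f.name)
    # Five concurrent readers; the file is still loaded exactly once.
    results = await asyncio.gather(*(store.get("KEY_1") for _ in range(5)))
    Path(f.name).unlink()
    return store.loads, results[0]

loads, usage = asyncio.run(demo())
print(loads)  # 1
```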
 
 

+ #### Tiered Key Acquisition (`acquire_key`)

+ This method implements the intelligent logic for selecting the best key for a job.

  1. **Filtering**: It first filters out any keys that are on a global or model-specific cooldown.
  2. **Tiering**: It categorizes the remaining, valid keys into two tiers:
  - **Tier 1 (Ideal)**: Keys that are completely free (not being used by any model).
+ - **Tier 2 (Acceptable)**: Keys that are currently in use, but for *different models* than the one being requested. This allows a single key to be used for concurrent calls to, for example, `gemini-1.5-pro` and `gemini-1.5-flash`.
+ 3. **Selection**: It attempts to acquire a lock on a key, prioritizing Tier 1 over Tier 2. Within each tier, it prioritizes the key with the lowest usage count.
+ 4. **Waiting**: If no keys in Tier 1 or Tier 2 can be locked, it means all eligible keys are currently handling requests for the *same model*. The method then `await`s on the `asyncio.Condition` of the best available key, waiting efficiently until it is notified that a key has been released.
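The filtering, tiering, and selection steps can be sketched as a pure function. The data structures here (`in_use` as a key-to-models map, `usage` as a counter) are illustrative assumptions, not the real `UsageManager` internals:

```python
def pick_key(keys, cooldowns, in_use, model, usage):
    """Tiered selection sketch: Tier 1 = completely free keys, Tier 2 =
    keys busy only with *other* models; the least-used key wins a tier."""
    eligible = [k for k in keys if k not in cooldowns]
    tier1 = [k for k in eligible if not in_use.get(k)]
    tier2 = [k for k in eligible if in_use.get(k) and model not in in_use[k]]
    for tier in (tier1, tier2):
        if tier:
            return min(tier, key=lambda k: usage.get(k, 0))
    return None  # caller would then wait on the best key's asyncio.Condition

keys = ["k1", "k2", "k3"]
in_use = {"k1": {"gemini-1.5-pro"}, "k2": set(), "k3": {"gemini-1.5-flash"}}
usage = {"k1": 5, "k2": 9, "k3": 2}

print(pick_key(keys, set(), in_use, "gemini-1.5-pro", usage))   # k2 (Tier 1)
print(pick_key(keys, {"k2"}, in_use, "gemini-1.5-pro", usage))  # k3 (Tier 2)
```

With `k2` free it wins despite its higher usage count (Tier 1 beats Tier 2); once `k2` is on cooldown, `k3` qualifies because it is only busy with a different model.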

+ #### Failure Handling & Cooldowns (`record_failure`)

+ * **Escalating Backoff**: When a failure is recorded, it applies a cooldown that increases with the number of consecutive failures for that specific key-model pair (e.g., 10s, 30s, 60s, up to 2 hours).
+ * **Authentication Errors**: These are treated more severely, applying an immediate 5-minute key-level lockout.
+ * **Key-Level Lockouts**: If a single key accumulates 3 or more long-term (2-hour) cooldowns across different models, the manager assumes the key is compromised or disabled and applies a 5-minute global lockout on the key.
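An escalating schedule like the one described can be sketched in a few lines. The documented steps are 10s, 30s, 60s, capped at 2 hours; the intermediate values below are assumptions for illustration:

```python
def cooldown_seconds(consecutive_failures):
    """Escalating cooldown sketch: later entries in the schedule apply to
    later consecutive failures, capped at the final 2-hour value."""
    schedule = [10, 30, 60, 300, 1800, 7200]  # middle steps are assumed
    idx = min(max(consecutive_failures - 1, 0), len(schedule) - 1)
    return schedule[idx]

print(cooldown_seconds(1), cooldown_seconds(3), cooldown_seconds(99))  # 10 60 7200
```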

  ### Data Structure

  ### 2.3. `error_handler.py`

+ This module provides a centralized function, `classify_error`, which is a significant improvement over simple boolean checks.
+
+ * It takes a raw exception from `litellm` and returns a `ClassifiedError` data object.
+ * This object contains the `error_type` (e.g., `'rate_limit'`, `'authentication'`), the original exception, the status code, and any `retry_after` information extracted from the error message.
+ * This structured classification allows the `RotatingClient` to make more intelligent decisions about whether to retry with the same key or rotate to a new one.
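The shape of such a classifier can be sketched as below. This is a status-code based approximation; the real function also inspects `litellm`'s exception class hierarchy, and the regex for `retry_after` is an assumption:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClassifiedError:
    error_type: str
    status_code: Optional[int]
    retry_after: Optional[float]
    original: Exception

def classify_error(exc, status_code=None):
    """Map a raw exception to a structured error the caller can act on."""
    retry = None
    # Try to pull a "retry in Ns" hint out of the provider's message.
    m = re.search(r"retry.{0,10}?(\d+(?:\.\d+)?)\s*s", str(exc), re.IGNORECASE)
    if m:
        retry = float(m.group(1))
    if status_code == 429:
        error_type = "rate_limit"
    elif status_code in (401, 403):
        error_type = "authentication"
    elif status_code is not None and status_code >= 500:
        error_type = "server_error"
    else:
        error_type = "unknown"
    return ClassifiedError(error_type, status_code, retry, exc)

err = classify_error(Exception("Rate limited. Please retry in 30s"), status_code=429)
print(err.error_type, err.retry_after)  # rate_limit 30.0
```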
+
+ ### 2.4. `providers/` - Provider Plugins
+
+ The provider plugin system allows for easy extension. The `__init__.py` file in this directory dynamically scans for all modules ending in `_provider.py`, imports the provider class from each, and registers it in the `PROVIDER_PLUGINS` dictionary. This makes adding new providers as simple as dropping a new file into the directory.
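A convention-based scan of this kind can be sketched with `importlib`. This is illustrative (the real `__init__.py` registers provider *classes* into `PROVIDER_PLUGINS`, and `gemini_provider.py` here is a throwaway file created for the demo):

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

def discover_providers(directory):
    """Import every `*_provider.py` in `directory` and register it under
    the provider name taken from the filename prefix."""
    plugins = {}
    for path in sorted(Path(directory).glob("*_provider.py")):
        name = path.stem.removesuffix("_provider")  # gemini_provider -> gemini
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[path.stem] = module
        spec.loader.exec_module(module)
        plugins[name] = module
    return plugins

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "gemini_provider.py").write_text("MODELS = ['gemini-1.5-pro']\n")
    Path(tmp, "notes.py").write_text("# ignored: wrong filename suffix\n")
    plugins = discover_providers(tmp)
    provider_names = list(plugins)
    models = plugins["gemini"].MODELS

print(provider_names)  # ['gemini']
```

Files that do not match the `*_provider.py` convention are skipped, which is what makes "drop a file in the directory" sufficient to add a provider.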
+
+ ---
+
+ ## 3. `proxy_app` - The FastAPI Proxy
+
+ The `proxy_app` directory contains the FastAPI application that serves the `rotator_library`.
+
+ ### 3.1. `main.py` - The FastAPI App
+
+ This file defines the web server and its endpoints.
+
+ #### Lifespan Management

+ The application uses FastAPI's `lifespan` context manager to manage the `RotatingClient` instance. The client is initialized when the application starts and gracefully closed (releasing its `httpx` resources) when the application shuts down. This ensures that a single, stateful client instance is shared across all requests.
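A FastAPI lifespan is just an async context manager passed as `FastAPI(lifespan=...)`. The pattern can be shown without FastAPI itself; `DummyRotatingClient` and the `state`/`events` bookkeeping are stand-ins for the real client and app state:

```python
import asyncio
from contextlib import asynccontextmanager

class DummyRotatingClient:
    """Stand-in for the real RotatingClient (illustrative only)."""
    def __init__(self):
        self.closed = False
    async def aclose(self):
        self.closed = True

state = {}
events = []

@asynccontextmanager
async def lifespan(app):
    # Startup: create the single shared client instance.
    state["client"] = DummyRotatingClient()
    events.append("startup")
    try:
        yield
    finally:
        # Shutdown: gracefully release the client's resources.
        await state["client"].aclose()
        events.append("shutdown")

async def serve_once():
    # FastAPI would drive this context manager around the app's lifetime;
    # here we simulate one serving period.
    async with lifespan(app=None):
        events.append("serving")

asyncio.run(serve_once())
print(events)  # ['startup', 'serving', 'shutdown']
```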
 
 

+ #### Endpoints

+ * `POST /v1/chat/completions`: The main endpoint for chat requests.
+ * `POST /v1/embeddings`: The endpoint for creating embeddings.
+ * `GET /v1/models`: Returns a list of all available models from configured providers.
+ * `GET /v1/providers`: Returns a list of all configured providers.
+ * `POST /v1/token-count`: Calculates the token count for a given message payload.

+ #### Authentication

+ All endpoints are protected by the `verify_api_key` dependency, which checks for a valid `Authorization: Bearer <PROXY_API_KEY>` header.
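The essence of the bearer-token check can be sketched as a plain function. This is not the actual dependency: the real one is a FastAPI dependency that raises `HTTPException(401)` rather than returning a boolean, and the key value here is the placeholder from the `.env` example:

```python
import hmac

PROXY_API_KEY = "a-very-secret-and-unique-key"  # would be read from .env

def verify_api_key(authorization_header):
    """Return True only for a well-formed 'Bearer <PROXY_API_KEY>' header."""
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme != "Bearer" or not token:
        return False
    # hmac.compare_digest gives a constant-time comparison
    return hmac.compare_digest(token, PROXY_API_KEY)

print(verify_api_key("Bearer a-very-secret-and-unique-key"))  # True
print(verify_api_key("Bearer wrong-key"))                     # False
```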

+ #### Streaming Response Handling

+ For streaming requests, the `chat_completions` endpoint returns a `StreamingResponse` whose content is generated by the `streaming_response_wrapper` function. This wrapper serves two purposes:
+ 1. It passes the chunks from the `RotatingClient`'s stream directly to the user.
+ 2. It aggregates the full response in the background so that it can be logged completely once the stream is finished.
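The pass-through-and-aggregate idea can be sketched with an async generator. The chunk format and `fake_stream` below are simplified stand-ins, not the proxy's actual wire format:

```python
import asyncio
import json

async def streaming_wrapper_sketch(stream, log_sink):
    """Forward SSE chunks unchanged while collecting the full text so it
    can be logged in one piece once the stream ends."""
    parts = []
    async for chunk in stream:
        payload = json.loads(chunk.removeprefix("data: "))
        parts.append(payload["choices"][0]["delta"].get("content", ""))
        yield chunk  # forwarded to the consumer untouched
    log_sink.append("".join(parts))  # logged after the stream is finished

async def fake_stream():
    for word in ["Hel", "lo"]:
        yield "data: " + json.dumps({"choices": [{"delta": {"content": word}}]})

async def demo():
    log = []
    received = [c async for c in streaming_wrapper_sketch(fake_stream(), log)]
    return received, log

received, log = asyncio.run(demo())
print(log)  # ['Hello']
```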

+ ### 3.2. `request_logger.py`

+ This module provides the `log_request_response` function, which writes the request and response data to a timestamped JSON file in the `logs/` directory. It handles creating separate directories for `completions` and `embeddings`.

+ ### 3.3. `build.py`

+ This is a utility script for creating a standalone executable of the proxy application using PyInstaller. It includes logic to dynamically find all provider plugins and explicitly include them as hidden imports, ensuring they are bundled into the final executable.
README.md CHANGED
@@ -15,10 +15,10 @@ Your proxy is now running! You can now use it in your applications.

  ## Detailed Setup and Features

- This project provides a robust solution for managing and rotating API keys for various Large Language Model (LLM) providers. It consists of two main components:

- 1. A reusable Python library (`rotating-api-key-client`) for intelligently rotating API keys.
- 2. A FastAPI proxy application that uses this library to provide an OpenAI-compatible endpoint.

  ## Features

@@ -31,15 +31,30 @@ This project provides a robust solution for managing and rotating API keys for v
  - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.

- ## Quick Start Guide

- This guide will get you up and running in just a few minutes.

- ### 1. Setup

- First, clone the repository and install the required dependencies.

- **For Linux/macOS:**
  ```bash
  # Clone the repository
  git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
@@ -53,7 +68,7 @@ source venv/bin/activate
  pip install -r requirements.txt
  ```

- **For Windows:**
  ```powershell
  # Clone the repository
  git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
@@ -67,34 +82,32 @@ python -m venv venv
  pip install -r requirements.txt
  ```

- ### 2. Configure API Keys

- Next, create your `.env` file by copying the provided example. This file is where you will store all your secret keys.

- **For Linux/macOS:**
  ```bash
  cp .env.example .env
  ```

- **For Windows:**
  ```powershell
  copy .env.example .env
  ```

- Now, open the new `.env` file and replace the placeholder values with your actual API keys.

  **Refer to the `.env.example` file for the correct format and a full list of supported providers.**

- The two main types of keys are:
-
- 1. **`PROXY_API_KEY`**: This is a secret key *you create*. It is used to authorize requests to *your* proxy, preventing unauthorized use.
  2. **Provider Keys**: These are the API keys you get from LLM providers (like Gemini, OpenAI, etc.). The proxy automatically finds them based on their name (e.g., `GEMINI_API_KEY_1`).

  **Example `.env` configuration:**
  ```env
  # A secret key for your proxy server to authenticate requests.
  # This can be any secret string you choose.
- PROXY_API_KEY="YOUR_PROXY_API_KEY"

  # --- Provider API Keys ---
  # Add your keys from various providers below.
@@ -153,9 +166,9 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \

  ## Advanced Usage

- ### Using with the OpenAI Python Library

- The proxy is OpenAI-compatible, so you can use it directly with the `openai` Python client. This is the recommended way to integrate the proxy into your applications.

  ```python
  import openai
@@ -163,12 +176,12 @@ import openai
  # Point the client to your local proxy
  client = openai.OpenAI(
  base_url="http://127.0.0.1:8000/v1",
- api_key="your-super-secret-proxy-key" # Use your proxy key here
  )

  # Make a request
  response = client.chat.completions.create(
- model="gemini/gemini-2.5-flash-preview", # Specify provider and model
  messages=[
  {"role": "user", "content": "Write a short poem about space."}
@@ -177,6 +190,21 @@ response = client.chat.completions.create(
  print(response.choices[0].message.content)
  ```

  ### Available API Endpoints

  - `POST /v1/chat/completions`: The main endpoint for making chat requests.
@@ -185,6 +213,22 @@ print(response.choices[0].message.content)
  - `GET /v1/providers`: Returns a list of all configured providers.
  - `POST /v1/token-count`: Calculates the token count for a given message payload.

  ### Enabling Request Logging

  For debugging purposes, you can log the full request and response for every API call. To enable this, start the proxy with the `--enable-request-logging` flag:
@@ -199,25 +243,15 @@ uvicorn src.proxy_app.main:app --reload -- --enable-request-logging
  ./proxy_app.exe --enable-request-logging
  ```

- Logs will be saved in the `logs/` directory.

- ## How It Works

- The core of this project is the `RotatingClient` library, which manages a pool of API keys with a sophisticated concurrency model. When a request is made, the client:

- 1. **Acquires the Best Key**: It requests the best available key from the `UsageManager`. The manager uses a tiered locking strategy to find a key that is not on cooldown and preferably not in use. If a key is busy with another request for the *same model*, it waits. Otherwise, it allows concurrent use for *different models*.
- 2. **Makes the Request**: It uses the acquired key to make the API call via `litellm`.
- 3. **Handles Errors**:
- - It uses a `classify_error` function to determine the failure type.
- - For **server errors**, it retries the request with the same key using exponential backoff.
- - For **rate-limit or auth errors**, it records the failure, applies an escalating cooldown for that specific key-model pair, and the client immediately tries the next available key.
- 4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key's lock is then released, notifying any waiting requests that it is available.
-
- ## Troubleshooting
-
- - **`401 Unauthorized`**: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization` header of your request.
- - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys or a problem with the provider's service.
- - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory for details on why the failures occurred.

  ## Library and Technical Docs
 

  ## Detailed Setup and Features

+ This project provides a robust, self-hosted solution for managing and rotating API keys for various Large Language Model (LLM) providers. It consists of two main components:

+ 1. A reusable Python library (`rotating-api-key-client`) for intelligently rotating API keys with advanced concurrency and error handling.
+ 2. A FastAPI proxy application that uses this library to provide a single, unified, and OpenAI-compatible endpoint for all your LLM requests.

  ## Features

  - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.

+ ---
+
+ ## 1. Quick Start (Windows Executable)
+
+ This is the fastest way to get started for most users on Windows.
+
+ 1. **Download the latest release** from the [GitHub Releases page](https://github.com/Mirrowel/LLM-API-Key-Proxy/releases/latest).
+ 2. Unzip the downloaded file.
+ 3. **Run `setup_env.bat`**. A window will open to help you add your API keys. Follow the on-screen instructions.
+ 4. **Run `proxy_app.exe`**. This will start the proxy server in a new terminal window.

+ Your proxy is now running and ready to use at `http://127.0.0.1:8000`.

+ ---
+
+ ## 2. Detailed Setup (From Source)
+
+ This guide is for users who want to run the proxy from the source code on any operating system.

+ ### Step 1: Clone and Install

+ First, clone the repository and install the required dependencies into a virtual environment.
+
+ **Linux/macOS:**
  ```bash
  # Clone the repository
  git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git

  pip install -r requirements.txt
  ```

+ **Windows:**
  ```powershell
  # Clone the repository
  git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git

  pip install -r requirements.txt
  ```

+ ### Step 2: Configure API Keys

+ Create a `.env` file to store your secret keys. You can do this by copying the example file.

+ **Linux/macOS:**
  ```bash
  cp .env.example .env
  ```

+ **Windows:**
  ```powershell
  copy .env.example .env
  ```

+ Now, open the new `.env` file and add your keys.

  **Refer to the `.env.example` file for the correct format and a full list of supported providers.**

+ 1. **`PROXY_API_KEY`**: This is a secret key **you create**. It is used to authorize requests to *your* proxy, preventing unauthorized use.
  2. **Provider Keys**: These are the API keys you get from LLM providers (like Gemini, OpenAI, etc.). The proxy automatically finds them based on their name (e.g., `GEMINI_API_KEY_1`).

  **Example `.env` configuration:**
  ```env
  # A secret key for your proxy server to authenticate requests.
  # This can be any secret string you choose.
+ PROXY_API_KEY="a-very-secret-and-unique-key"

  # --- Provider API Keys ---
  # Add your keys from various providers below.
  ## 3. Advanced Usage

+ ### Using with the OpenAI Python Library (Recommended)

+ The proxy is OpenAI-compatible, so you can use it directly with the `openai` Python client.

  ```python
  import openai

  # Point the client to your local proxy
  client = openai.OpenAI(
  base_url="http://127.0.0.1:8000/v1",
+ api_key="a-very-secret-and-unique-key" # Use your PROXY_API_KEY here
  )

  # Make a request
  response = client.chat.completions.create(
+ model="gemini/gemini-2.5-flash", # Specify provider and model
  messages=[
  {"role": "user", "content": "Write a short poem about space."}
  ]

  print(response.choices[0].message.content)
  ```

+ ### Using with `curl`
+
+ You can also send requests directly using tools like `curl`.
+
+ ```bash
+ curl -X POST http://127.0.0.1:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer a-very-secret-and-unique-key" \
+ -d '{
+ "model": "gemini/gemini-2.5-flash",
+ "messages": [{"role": "user", "content": "What is the capital of France?"}]
+ }'
+ ```
+
  ### Available API Endpoints

  - `POST /v1/chat/completions`: The main endpoint for making chat requests.

  - `GET /v1/providers`: Returns a list of all configured providers.
  - `POST /v1/token-count`: Calculates the token count for a given message payload.

+ ---
+
+ ## 4. Advanced Topics
+
+ ### How It Works
+
+ The core of this project is the `RotatingClient` library. When a request is made, the client:
+
+ 1. **Acquires the Best Key**: It requests the best available key from the `UsageManager`. The manager uses a tiered locking strategy to find a key that is not on cooldown and preferably not in use. If a key is busy with another request for the *same model*, it waits. Otherwise, it allows concurrent use for *different models*.
+ 2. **Makes the Request**: It uses the acquired key to make the API call via `litellm`.
+ 3. **Handles Errors**:
+ - It uses a `classify_error` function to determine the failure type.
+ - For **server errors**, it retries the request with the same key using exponential backoff.
+ - For **rate-limit or auth errors**, it records the failure, applies an escalating cooldown for that specific key-model pair, and the client immediately tries the next available key.
+ 4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key's lock is then released, notifying any waiting requests that it is available.
+
  ### Enabling Request Logging

  For debugging purposes, you can log the full request and response for every API call. To enable this, start the proxy with the `--enable-request-logging` flag:

  ./proxy_app.exe --enable-request-logging
  ```

+ Logs will be saved as JSON files in the `logs/` directory.

+ ### Troubleshooting

+ - **`401 Unauthorized`**: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization: Bearer <key>` header of your request.
+ - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service.
+ - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory (if enabled) or the `key_usage.json` file for details on why the failures occurred.

+ ---

  ## Library and Technical Docs
src/rotator_library/README.md CHANGED
@@ -2,24 +2,26 @@

 A robust, asynchronous, and thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient, concurrent, and efficient.

- ## Features

 - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
- - **Advanced Concurrency Control**: A single key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety.
- - **Smart Key Rotation**: Acquires the least-used, available key using a tiered, model-aware locking strategy.
- - **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model.
- - **Automatic Retries**: Retries requests on transient server errors with exponential backoff.
- - **Detailed Usage Tracking**: Tracks daily and global usage for each key, including token counts and approximate cost.
- - **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily.
 - **Provider Agnostic**: Works with any provider supported by `litellm`.
- - **Extensible**: Easily add support for new providers through a plugin-based architecture.

 ## Installation

- To install the library, you can install it directly from a local path, which is recommended for development.

 ```bash
- # The -e flag installs it in "editable" mode
 pip install -e .
 ```

@@ -31,11 +33,18 @@ This is the main class for interacting with the library. It is designed to be a

 ```python
 from rotating_api_key_client import RotatingClient

 client = RotatingClient(
-     api_keys: Dict[str, List[str]],
-     max_retries: int = 2,
-     usage_file_path: str = "key_usage.json"
 )
 ```

@@ -45,19 +54,21 @@ client = RotatingClient(

 ### Concurrency and Resource Management

- The `RotatingClient` is asynchronous and manages an `httpx.AsyncClient` internally. It's crucial to close the client properly to release resources. This can be done manually or by using an `async with` block.

- **Manual Management:**
 ```python
- client = RotatingClient(api_keys=api_keys)
- # ... use the client ...
- await client.close()
- ```

- **Recommended (`async with`):**
- ```python
- async with RotatingClient(api_keys=api_keys) as client:
-     # ... use the client ...
 ```

 ### Methods

@@ -71,24 +82,26 @@ This is the primary method for making API calls. It's a wrapper around `litellm.
 - For non-streaming requests, it returns the `litellm` response object.
 - For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE). The wrapper ensures that key locks are released and usage is recorded only after the stream is fully consumed.

- **Example:**

 ```python
- import asyncio
- from rotating_api_key_client import RotatingClient
-
- async def main():
-     api_keys = {"gemini": ["key1", "key2"]}
     async with RotatingClient(api_keys=api_keys) as client:
-         response = await client.acompletion(
-             model="gemini/gemini-2.5-flash-preview-05-20",
-             messages=[{"role": "user", "content": "Hello!"}]
         )
-         print(response)

- asyncio.run(main())
 ```

@@ -124,10 +137,13 @@ from typing import List
 import httpx

 class MyProvider(ProviderInterface):
-     async def get_models(self, api_key: str, http_client: httpx.AsyncClient) -> List[str]:
         # Logic to fetch and return a list of model names
         # The model names should be prefixed with the provider name.
         # e.g., ["my-provider/model-1", "my-provider/model-2"]
         pass
 ```

 A robust, asynchronous, and thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient, concurrent, and efficient.

+ ## Key Features

 - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
+ - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety. Requests for the *same model* using the same key are queued, preventing conflicts.
+ - **Smart Key Rotation**: Acquires the least-used, available key using a tiered, model-aware locking strategy to distribute load evenly.
+ - **Intelligent Error Handling**:
+   - **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model, allowing it to continue being used for others.
+   - **Automatic Retries**: Retries requests on transient server errors (e.g., 5xx) with exponential backoff.
+   - **Key-Level Lockouts**: If a key fails across multiple models, it's temporarily taken out of rotation entirely.
+ - **Robust Streaming Support**: The client includes a wrapper for streaming responses that can reassemble fragmented JSON chunks and intelligently detect and handle errors that occur mid-stream.
+ - **Detailed Usage Tracking**: Tracks daily and global usage for each key, including token counts and approximate cost, persisted to a JSON file.
+ - **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily to keep the system running smoothly.
 - **Provider Agnostic**: Works with any provider supported by `litellm`.
+ - **Extensible**: Easily add support for new providers through a simple plugin-based architecture.

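As a rough illustration of an escalating cooldown like the one described above, a schedule might double with each consecutive failure of a key-model pair up to a cap. The exact schedule used by the library may differ; this is only a sketch:

```python
def cooldown_seconds(consecutive_failures: int, base: float = 5.0, cap: float = 3600.0) -> float:
    """Escalating backoff: doubles with each consecutive failure, up to a cap."""
    return min(base * 2 ** (consecutive_failures - 1), cap)

# 5s, 10s, 20s, 40s, 80s for the first five failures of a key-model pair.
schedule = [cooldown_seconds(n) for n in range(1, 6)]
```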
 ## Installation

+ You can install the library directly from a local path. The `-e` flag installs it in "editable" mode, which is recommended for development.

 ```bash
 pip install -e .
 ```
 
 

 ```python
 from rotating_api_key_client import RotatingClient
+ from typing import Dict, List
+
+ # Define your API keys, grouped by provider
+ api_keys: Dict[str, List[str]] = {
+     "gemini": ["your_gemini_key_1", "your_gemini_key_2"],
+     "openai": ["your_openai_key_1"],
+ }

 client = RotatingClient(
+     api_keys=api_keys,
+     max_retries=2,
+     usage_file_path="key_usage.json"
 )
 ```
 
 

 ### Concurrency and Resource Management

+ The `RotatingClient` is asynchronous and manages an `httpx.AsyncClient` internally. It's crucial to close the client properly to release resources. The recommended way is to use an `async with` block, which handles setup and teardown automatically.

 ```python
+ import asyncio
+
+ async def main():
+     async with RotatingClient(api_keys=api_keys) as client:
+         # ... use the client ...
+         response = await client.acompletion(
+             model="gemini/gemini-1.5-flash",
+             messages=[{"role": "user", "content": "Hello!"}]
+         )
+         print(response)
+
+ asyncio.run(main())
 ```

 ### Methods
 
 - For non-streaming requests, it returns the `litellm` response object.
 - For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE). The wrapper ensures that key locks are released and usage is recorded only after the stream is fully consumed.

+ **Streaming Example:**

 ```python
+ async def stream_example():
     async with RotatingClient(api_keys=api_keys) as client:
+         response_stream = await client.acompletion(
+             model="gemini/gemini-1.5-flash",
+             messages=[{"role": "user", "content": "Tell me a long story."}],
+             stream=True
         )
+         async for chunk in response_stream:
+             print(chunk)
+
+ asyncio.run(stream_example())
 ```
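The chunk reassembly mentioned under "Robust Streaming Support" can be pictured as a small buffer that accumulates fragments until they parse as a complete JSON object. This is an illustrative toy, not the library's wrapper, which additionally handles SSE framing and mid-stream error payloads:

```python
import json

def reassemble_json_chunks(chunks):
    """Buffer partial JSON fragments and yield each object once it parses."""
    buffer = ""
    for fragment in chunks:
        buffer += fragment
        try:
            obj = json.loads(buffer)
        except json.JSONDecodeError:
            continue  # still incomplete; keep buffering
        buffer = ""
        yield obj

# A delta payload split at an arbitrary boundary, as a provider might send it:
fragments = [
    '{"choices": [{"delta": {"con',
    'tent": "Hel"}}]}',
    '{"choices": [{"delta": {"content": "lo"}}]}',
]
events = list(reassemble_json_chunks(fragments))
```

The first fragment fails to parse on its own, so it is held until the second fragment completes it; each complete object is then yielded downstream.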
 
+ #### `async def aembedding(self, **kwargs) -> Any:`
+
+ A wrapper around `litellm.aembedding` that provides the same key rotation and retry logic for embedding requests.
+
 #### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`

 Calculates the token count for a given text or list of messages using `litellm.token_counter`.
 
 import httpx

 class MyProvider(ProviderInterface):
+     async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
         # Logic to fetch and return a list of model names
         # The model names should be prefixed with the provider name.
         # e.g., ["my-provider/model-1", "my-provider/model-2"]
+         # Example:
+         # response = await client.get("https://api.myprovider.com/models", headers={"Auth": api_key})
+         # return [f"my-provider/{model['id']}" for model in response.json()]
         pass
 ```
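To make the plugin pattern concrete, here is a self-contained sketch. The `ProviderInterface` stand-in and the `PROVIDERS` registry below are illustrative only; the library's actual interface also receives an `httpx.AsyncClient`, and its discovery mechanism may differ:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Type

class ProviderInterface(ABC):
    """Illustrative stand-in for the library's provider interface."""
    @abstractmethod
    async def get_models(self, api_key: str) -> List[str]: ...

# Registry mapping provider names to their implementing classes.
PROVIDERS: Dict[str, Type[ProviderInterface]] = {}

def register(name: str):
    """Decorator that adds a provider class to the registry."""
    def wrap(cls):
        PROVIDERS[name] = cls
        return cls
    return wrap

@register("my-provider")
class MyProvider(ProviderInterface):
    async def get_models(self, api_key: str) -> List[str]:
        # A real implementation would call the provider's HTTP API here.
        return ["my-provider/model-1", "my-provider/model-2"]

models = asyncio.run(PROVIDERS["my-provider"]().get_models("dummy-key"))
```

A client could then look up any registered provider by name and list its models without knowing the concrete class in advance.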
149