Commit 7ba1fcd (parent: d195a5f) · Mirrowel committed
docs: Big documentation update part

Files changed:
- DOCUMENTATION.md (+93 −58)
- README.md (+72 −38)
- src/rotator_library/README.md (+51 −35)
DOCUMENTATION.md (CHANGED)
# Technical Documentation: API Key Proxy & Rotator Library

This document provides a detailed technical explanation of the API Key Proxy and the `rotating-api-key-client` library, covering their architecture, components, and internal workings.

## 1. Architecture Overview

The project is a monorepo containing two primary components:

1. **`rotator_library`**: A standalone, reusable Python library for intelligent API key rotation and management.
2. **`proxy_app`**: A FastAPI application that consumes the `rotator_library` and exposes its functionality through an OpenAI-compatible web API.

This architecture separates the core rotation logic from the web-serving layer, making the library portable and the proxy a clean implementation of its features.

---

## 2. `rotator_library` - The Core Engine

This library is the heart of the project, containing all the logic for key rotation, usage tracking, and provider management.

### 2.1. `client.py` - The `RotatingClient`

The `RotatingClient` is the central class that orchestrates all operations. It is designed as a long-lived, async-native object.

#### Core Responsibilities

* Managing a shared `httpx.AsyncClient` for all non-blocking HTTP requests.
* Interfacing with the `UsageManager` to acquire and release API keys.
* Dynamically loading and using provider-specific plugins from the `providers/` directory.
* Executing API calls via `litellm` with a robust retry and rotation strategy.
* Providing a safe, stateful wrapper for handling streaming responses.

#### Request Lifecycle (`acompletion` & `aembedding`)

When `acompletion` or `aembedding` is called, it follows a sophisticated, multi-layered process:

1. **Provider & Key Validation**: It extracts the provider from the `model` name (e.g., `"gemini/gemini-1.5-pro"` -> `"gemini"`) and ensures keys are configured for it.
2. **Key Acquisition Loop**: The client enters a `while` loop that attempts to find a valid key and complete the request. It iterates until one key succeeds or all have been tried.
   a. **Acquire Best Key**: It calls `self.usage_manager.acquire_key()`. This is a crucial, potentially blocking call that waits until a suitable key is available, based on the manager's tiered locking strategy (see the `UsageManager` section).
   b. **Prepare Request**: It prepares the `litellm` keyword arguments. This includes applying provider-specific logic (e.g., remapping safety settings for Gemini, handling `api_base` for Chutes.ai) and sanitizing the payload to remove unsupported parameters.
3. **Retry Loop**: Once a key is acquired, it enters an inner `for` loop (`for attempt in range(self.max_retries)`):
   a. **API Call**: It calls `litellm.acompletion` or `litellm.aembedding`.
   b. **Success (Non-Streaming)**:
      - It calls `self.usage_manager.record_success()` to update usage stats and clear any cooldowns.
      - It calls `self.usage_manager.release_key()` to release the lock.
      - It returns the response, and the process ends.
   c. **Success (Streaming)**:
      - It returns the `_safe_streaming_wrapper` async generator. This wrapper is critical:
        - It yields SSE-formatted chunks to the consumer.
        - It can reassemble fragmented JSON chunks and detect errors mid-stream.
        - Its `finally` block ensures that `record_success()` and `release_key()` are called *only after the stream is fully consumed or closed*. This guarantees the key lock is held for the entire duration of the stream.
   d. **Failure**: If an exception occurs:
      - The exception is passed to `classify_error()` to get a structured `ClassifiedError` object.
      - **Server Error**: If the error is temporary (e.g., 5xx), it waits with exponential backoff and retries the request with the *same key*.
      - **Rotation Error (Rate Limit, Auth, etc.)**: For any other error, it's a trigger to rotate. `self.usage_manager.record_failure()` is called to apply a cooldown, and the lock is released. The inner `attempt` loop is broken, and the outer `while` loop continues, acquiring a new key.
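The outer rotation loop and inner retry loop described above can be sketched in miniature. This is a simplified, stdlib-only illustration — `acompletion_lifecycle`, `RotationError`, `ServerError`, and `flaky` are hypothetical stand-ins, not the library's actual API:

```python
import asyncio

class RotationError(Exception):      # e.g. rate limit / auth -> rotate keys
    pass

class ServerError(Exception):        # e.g. 5xx -> retry with the same key
    pass

async def acompletion_lifecycle(keys, call_api, max_retries=3):
    """Outer loop rotates keys; inner loop retries the same key on 5xx."""
    tried = set()
    while len(tried) < len(keys):
        key = next(k for k in keys if k not in tried)    # "acquire" a key
        tried.add(key)
        for attempt in range(max_retries):
            try:
                return await call_api(key)               # record_success + release
            except ServerError:
                await asyncio.sleep(0.01 * 2 ** attempt) # exponential backoff
            except RotationError:
                break                                    # record_failure + rotate
    raise RuntimeError("all keys exhausted")

# Usage: the first key is rate-limited, the second succeeds.
async def flaky(key):
    if key == "key1":
        raise RotationError("rate limit")
    return f"ok:{key}"

print(asyncio.run(acompletion_lifecycle(["key1", "key2"], flaky)))  # → ok:key2
```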
### 2.2. `usage_manager.py` - Stateful Concurrency & Usage Management

This class is the stateful core of the library, managing concurrency, usage, and cooldowns.

#### Key Concepts

* **Async-Native & Lazy-Loaded**: The class is fully asynchronous, using `aiofiles` for non-blocking file I/O. The usage data from the JSON file is loaded only when the first request is made (`_lazy_init`).
* **Fine-Grained Locking**: Each API key is associated with its own `asyncio.Lock` and `asyncio.Condition` object. This allows for a highly granular and efficient locking strategy.

#### Tiered Key Acquisition (`acquire_key`)

This method implements the intelligent logic for selecting the best key for a job.

1. **Filtering**: It first filters out any keys that are on a global or model-specific cooldown.
2. **Tiering**: It categorizes the remaining, valid keys into two tiers:
   - **Tier 1 (Ideal)**: Keys that are completely free (not being used by any model).
   - **Tier 2 (Acceptable)**: Keys that are currently in use, but for *different models* than the one being requested. This allows a single key to be used for concurrent calls to, for example, `gemini-1.5-pro` and `gemini-1.5-flash`.
3. **Selection**: It attempts to acquire a lock on a key, prioritizing Tier 1 over Tier 2. Within each tier, it prioritizes the key with the lowest usage count.
4. **Waiting**: If no keys in Tier 1 or Tier 2 can be locked, it means all eligible keys are currently handling requests for the *same model*. The method then `await`s on the `asyncio.Condition` of the best available key, waiting efficiently until it is notified that a key has been released.
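The four steps above can be condensed into a toy model. This is an illustrative sketch of the tiered selection idea, not the library's implementation — it uses one shared `asyncio.Condition` and an in-memory usage counter instead of per-key condition objects and persisted state:

```python
import asyncio
from collections import defaultdict

class TieredKeyPool:
    def __init__(self, keys):
        self.keys = keys
        self.active_models = defaultdict(set)   # key -> models currently running
        self.usage = defaultdict(int)           # key -> total request count
        self.cooldowns = set()                  # keys on cooldown (simplified)
        self.cond = asyncio.Condition()

    async def acquire(self, model):
        async with self.cond:
            while True:
                eligible = [k for k in self.keys if k not in self.cooldowns]
                tier1 = [k for k in eligible if not self.active_models[k]]
                tier2 = [k for k in eligible
                         if self.active_models[k] and model not in self.active_models[k]]
                for tier in (tier1, tier2):
                    if tier:
                        key = min(tier, key=lambda k: self.usage[k])  # least used
                        self.active_models[key].add(model)
                        self.usage[key] += 1
                        return key
                await self.cond.wait()          # all keys busy with this model

    async def release(self, key, model):
        async with self.cond:
            self.active_models[key].discard(model)
            self.cond.notify_all()

async def demo():
    pool = TieredKeyPool(["k1", "k2"])
    a = await pool.acquire("gemini-1.5-pro")    # tier 1: a free key
    b = await pool.acquire("gemini-1.5-flash")  # tier 1: the other free key
    c = await pool.acquire("gemini-1.5-pro")    # tier 2: reuses the key busy with flash
    return a, b, c

print(asyncio.run(demo()))  # → ('k1', 'k2', 'k2')
```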
#### Failure Handling & Cooldowns (`record_failure`)

* **Escalating Backoff**: When a failure is recorded, it applies a cooldown that increases with the number of consecutive failures for that specific key-model pair (e.g., 10s, 30s, 60s, up to 2 hours).
* **Authentication Errors**: These are treated more severely, applying an immediate 5-minute key-level lockout.
* **Key-Level Lockouts**: If a single key accumulates 3 or more long-term (2-hour) cooldowns across different models, the manager assumes the key is compromised or disabled and applies a 5-minute global lockout on the key.
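The escalating schedule can be written as a small pure function. The step values below follow the examples in the text (10s, 30s, 60s, 2 hours, and a 5-minute auth lockout); the library's exact thresholds may differ:

```python
COOLDOWN_STEPS = [10, 30, 60, 7200]  # seconds; caps at 2 hours

def cooldown_for(consecutive_failures: int, auth_error: bool = False) -> int:
    """Return the cooldown (in seconds) for a key-model pair."""
    if auth_error:
        return 300                      # immediate 5-minute key-level lockout
    idx = min(consecutive_failures - 1, len(COOLDOWN_STEPS) - 1)
    return COOLDOWN_STEPS[max(idx, 0)]

print([cooldown_for(n) for n in (1, 2, 3, 4, 5)])  # → [10, 30, 60, 7200, 7200]
```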
### Data Structure

The `key_usage.json` file has a more complex structure to store this detailed state.
### 2.3. `error_handler.py`

This module provides a centralized function, `classify_error`, which is a significant improvement over simple boolean checks.

* It takes a raw exception from `litellm` and returns a `ClassifiedError` data object.
* This object contains the `error_type` (e.g., `'rate_limit'`, `'authentication'`), the original exception, the status code, and any `retry_after` information extracted from the error message.
* This structured classification allows the `RotatingClient` to make more intelligent decisions about whether to retry with the same key or rotate to a new one.
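The shape of that object can be sketched as follows. Field names mirror the description above, but the classification logic here (keyed on status codes and a regex over the message) is a simplified assumption — the real `classify_error` inspects `litellm` exception types:

```python
from dataclasses import dataclass
from typing import Optional
import re

@dataclass
class ClassifiedError:
    error_type: str                 # 'rate_limit', 'authentication', 'server_error', ...
    original_exception: Exception
    status_code: Optional[int]
    retry_after: Optional[float]

def classify_error(exc: Exception, status_code: Optional[int] = None) -> ClassifiedError:
    message = str(exc)
    if status_code == 429:
        error_type = "rate_limit"
    elif status_code in (401, 403):
        error_type = "authentication"
    elif status_code is not None and status_code >= 500:
        error_type = "server_error"
    else:
        error_type = "unknown"
    # Pull a "Retry-After: N" hint out of the message, if present.
    match = re.search(r"retry.after[:\s]+(\d+)", message, re.IGNORECASE)
    retry_after = float(match.group(1)) if match else None
    return ClassifiedError(error_type, exc, status_code, retry_after)

err = classify_error(Exception("Rate limited. Retry-After: 30"), status_code=429)
print(err.error_type, err.retry_after)  # → rate_limit 30.0
```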
### 2.4. `providers/` - Provider Plugins

The provider plugin system allows for easy extension. The `__init__.py` file in this directory dynamically scans for all modules ending in `_provider.py`, imports the provider class from each, and registers it in the `PROVIDER_PLUGINS` dictionary. This makes adding new providers as simple as dropping a new file into the directory.
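The scan-and-register pattern looks roughly like this. This is an illustrative version using a temporary directory and a hypothetical `Provider` class name, not the library's actual `__init__.py`:

```python
import importlib.util
import tempfile
from pathlib import Path

def discover_providers(directory: Path) -> dict:
    """Load every *_provider.py module in `directory` and register its Provider class."""
    plugins = {}
    for path in sorted(directory.glob("*_provider.py")):
        name = path.stem                                  # e.g. "gemini_provider"
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        plugins[name.removesuffix("_provider")] = module.Provider
    return plugins

# Usage: drop a file into the directory and it is picked up automatically.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp)
    (plugin_dir / "demo_provider.py").write_text(
        "class Provider:\n    name = 'demo'\n"
    )
    plugins = discover_providers(plugin_dir)
    print(list(plugins))          # → ['demo']
    print(plugins["demo"].name)   # → demo
```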
---

## 3. `proxy_app` - The FastAPI Proxy

The `proxy_app` directory contains the FastAPI application that serves the `rotator_library`.

### 3.1. `main.py` - The FastAPI App

This file defines the web server and its endpoints.

#### Lifespan Management

The application uses FastAPI's `lifespan` context manager to manage the `RotatingClient` instance. The client is initialized when the application starts and gracefully closed (releasing its `httpx` resources) when the application shuts down. This ensures that a single, stateful client instance is shared across all requests.
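The startup/shutdown pattern is the standard `asynccontextmanager` shape. In the real app this function would be passed to `FastAPI(lifespan=lifespan)`; the sketch below substitutes plain objects (`FakeRotatingClient`, a `SimpleNamespace` app) so the pattern itself is visible without any web framework:

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

class FakeRotatingClient:
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True          # would release httpx resources

@asynccontextmanager
async def lifespan(app):
    app.state.client = FakeRotatingClient()    # created once at startup
    try:
        yield                                  # application serves requests here
    finally:
        await app.state.client.close()         # graceful shutdown

async def demo():
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        in_flight = app.state.client.closed    # False while serving
    return in_flight, app.state.client.closed

print(asyncio.run(demo()))  # → (False, True)
```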
#### Endpoints

* `POST /v1/chat/completions`: The main endpoint for chat requests.
* `POST /v1/embeddings`: The endpoint for creating embeddings.
* `GET /v1/models`: Returns a list of all available models from configured providers.
* `GET /v1/providers`: Returns a list of all configured providers.
* `POST /v1/token-count`: Calculates the token count for a given message payload.

#### Authentication

All endpoints are protected by the `verify_api_key` dependency, which checks for a valid `Authorization: Bearer <PROXY_API_KEY>` header.
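The core of that check can be sketched as a plain function. The real `verify_api_key` is a FastAPI dependency that raises an HTTP 401; this simplified stand-in just returns a boolean, and the key value is a placeholder:

```python
import hmac

PROXY_API_KEY = "a-very-secret-and-unique-key"   # loaded from .env in the real app

def verify_api_key(authorization: str) -> bool:
    """Accept only 'Authorization: Bearer <PROXY_API_KEY>'."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or not token:
        return False
    return hmac.compare_digest(token, PROXY_API_KEY)  # constant-time compare

print(verify_api_key("Bearer a-very-secret-and-unique-key"))  # → True
print(verify_api_key("Bearer wrong-key"))                     # → False
```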
#### Streaming Response Handling

For streaming requests, the `chat_completions` endpoint returns a `StreamingResponse` whose content is generated by the `streaming_response_wrapper` function. This wrapper serves two purposes:

1. It passes the chunks from the `RotatingClient`'s stream directly to the user.
2. It aggregates the full response in the background so that it can be logged completely once the stream is finished.
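The pass-through-and-aggregate idea, independent of FastAPI, looks like this — a simplified sketch in which `fake_upstream` stands in for the client's stream and a list stands in for the logger:

```python
import asyncio

async def fake_upstream():
    for chunk in ["Hel", "lo ", "world"]:
        yield chunk

async def streaming_response_wrapper(stream, log):
    collected = []
    try:
        async for chunk in stream:
            collected.append(chunk)
            yield chunk                      # forwarded to the consumer untouched
    finally:
        log.append("".join(collected))       # full response logged once the stream ends

async def demo():
    log = []
    received = [c async for c in streaming_response_wrapper(fake_upstream(), log)]
    return received, log

print(asyncio.run(demo()))  # → (['Hel', 'lo ', 'world'], ['Hello world'])
```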
### 3.2. `request_logger.py`

This module provides the `log_request_response` function, which writes the request and response data to a timestamped JSON file in the `logs/` directory. It handles creating separate directories for `completions` and `embeddings`.

### 3.3. `build.py`

This is a utility script for creating a standalone executable of the proxy application using PyInstaller. It includes logic to dynamically find all provider plugins and explicitly include them as hidden imports, ensuring they are bundled into the final executable.
README.md (CHANGED)
## Detailed Setup and Features

This project provides a robust, self-hosted solution for managing and rotating API keys for various Large Language Model (LLM) providers. It consists of two main components:

1. A reusable Python library (`rotating-api-key-client`) for intelligently rotating API keys with advanced concurrency and error handling.
2. A FastAPI proxy application that uses this library to provide a single, unified, and OpenAI-compatible endpoint for all your LLM requests.

## Features

- **Provider Agnostic**: Compatible with any provider supported by `litellm`.
- **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.

---

## 1. Quick Start (Windows Executable)

This is the fastest way to get started for most users on Windows.

1. **Download the latest release** from the [GitHub Releases page](https://github.com/Mirrowel/LLM-API-Key-Proxy/releases/latest).
2. Unzip the downloaded file.
3. **Run `setup_env.bat`**. A window will open to help you add your API keys. Follow the on-screen instructions.
4. **Run `proxy_app.exe`**. This will start the proxy server in a new terminal window.

Your proxy is now running and ready to use at `http://127.0.0.1:8000`.

---

## 2. Detailed Setup (From Source)

This guide is for users who want to run the proxy from the source code on any operating system.

### Step 1: Clone and Install

First, clone the repository and install the required dependencies into a virtual environment.

**Linux/macOS:**
```bash
# Clone the repository
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

**Windows:**
```powershell
# Clone the repository
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git

# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt
```

### Step 2: Configure API Keys

Create a `.env` file to store your secret keys. You can do this by copying the example file.

**Linux/macOS:**
```bash
cp .env.example .env
```

**Windows:**
```powershell
copy .env.example .env
```

Now, open the new `.env` file and add your keys.

**Refer to the `.env.example` file for the correct format and a full list of supported providers.**

1. **`PROXY_API_KEY`**: This is a secret key **you create**. It is used to authorize requests to *your* proxy, preventing unauthorized use.
2. **Provider Keys**: These are the API keys you get from LLM providers (like Gemini, OpenAI, etc.). The proxy automatically finds them based on their name (e.g., `GEMINI_API_KEY_1`).

**Example `.env` configuration:**
```env
# A secret key for your proxy server to authenticate requests.
# This can be any secret string you choose.
PROXY_API_KEY="a-very-secret-and-unique-key"

# --- Provider API Keys ---
# Add your keys from various providers below.
GEMINI_API_KEY_1="your-gemini-api-key"
```

## Advanced Usage

### Using with the OpenAI Python Library (Recommended)

The proxy is OpenAI-compatible, so you can use it directly with the `openai` Python client.

```python
import openai

# Point the client to your local proxy
client = openai.OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="a-very-secret-and-unique-key"  # Use your PROXY_API_KEY here
)

# Make a request
response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # Specify provider and model
    messages=[
        {"role": "user", "content": "Write a short poem about space."}
    ]
)

print(response.choices[0].message.content)
```

### Using with `curl`

You can also send requests directly using tools like `curl`.

```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer a-very-secret-and-unique-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```

### Available API Endpoints

- `POST /v1/chat/completions`: The main endpoint for making chat requests.
- `POST /v1/embeddings`: The endpoint for creating embeddings.
- `GET /v1/models`: Returns a list of all available models from configured providers.
- `GET /v1/providers`: Returns a list of all configured providers.
- `POST /v1/token-count`: Calculates the token count for a given message payload.

---

## 4. Advanced Topics

### How It Works

The core of this project is the `RotatingClient` library. When a request is made, the client:

1. **Acquires the Best Key**: It requests the best available key from the `UsageManager`. The manager uses a tiered locking strategy to find a key that is not on cooldown and preferably not in use. If a key is busy with another request for the *same model*, it waits. Otherwise, it allows concurrent use for *different models*.
2. **Makes the Request**: It uses the acquired key to make the API call via `litellm`.
3. **Handles Errors**:
   - It uses a `classify_error` function to determine the failure type.
   - For **server errors**, it retries the request with the same key using exponential backoff.
   - For **rate-limit or auth errors**, it records the failure, applies an escalating cooldown for that specific key-model pair, and the client immediately tries the next available key.
4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key's lock is then released, notifying any waiting requests that it is available.
### Enabling Request Logging

For debugging purposes, you can log the full request and response for every API call. To enable this, start the proxy with the `--enable-request-logging` flag:

```bash
# From source:
uvicorn src.proxy_app.main:app --reload -- --enable-request-logging

# Or, when using the executable:
./proxy_app.exe --enable-request-logging
```

Logs will be saved as JSON files in the `logs/` directory.

### Troubleshooting

- **`401 Unauthorized`**: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization: Bearer <key>` header of your request.
- **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service.
- **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory (if enabled) or the `key_usage.json` file for details on why the failures occurred.

---

## Library and Technical Docs
src/rotator_library/README.md
CHANGED
|
@@ -2,24 +2,26 @@
|
|
| 2 |
|
| 3 |
A robust, asynchronous, and thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient, concurrent, and efficient.

## Key Features

- **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
- **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety. Requests for the *same* model using the same key are queued, preventing conflicts.
- **Smart Key Rotation**: Acquires the least-used, available key using a tiered, model-aware locking strategy to distribute load evenly.
- **Intelligent Error Handling**:
  - **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model, allowing it to continue being used for others.
  - **Automatic Retries**: Retries requests on transient server errors (e.g., 5xx) with exponential backoff.
  - **Key-Level Lockouts**: If a key fails across multiple models, it's temporarily taken out of rotation entirely.
- **Robust Streaming Support**: The client includes a wrapper for streaming responses that can reassemble fragmented JSON chunks and intelligently detect and handle errors that occur mid-stream.
- **Detailed Usage Tracking**: Tracks daily and global usage for each key, including token counts and approximate cost, persisted to a JSON file.
- **Automatic Daily Resets**: Automatically resets cooldowns and archives stats daily to keep the system running smoothly.
- **Provider Agnostic**: Works with any provider supported by `litellm`.
- **Extensible**: Easily add support for new providers through a simple plugin-based architecture.

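The escalating per-model cooldown can be modeled in a few lines. This is an illustrative sketch only; the base delay, growth factor, cap, and class name are assumptions, not the library's actual values or internals.

```python
import time

class CooldownTracker:
    """Toy model of escalating, per-(key, model) cooldowns."""

    def __init__(self, base: float = 5.0, factor: float = 2.0, cap: float = 300.0):
        self.base, self.factor, self.cap = base, factor, cap
        self.failures: dict[tuple[str, str], int] = {}   # (key, model) -> failure count
        self.until: dict[tuple[str, str], float] = {}    # (key, model) -> cooldown deadline

    def record_failure(self, key: str, model: str) -> float:
        k = (key, model)
        self.failures[k] = self.failures.get(k, 0) + 1
        # Each consecutive failure doubles the cooldown, up to a cap.
        delay = min(self.base * self.factor ** (self.failures[k] - 1), self.cap)
        self.until[k] = time.monotonic() + delay
        return delay

    def is_available(self, key: str, model: str) -> bool:
        return time.monotonic() >= self.until.get((key, model), 0.0)

tracker = CooldownTracker()
print(tracker.record_failure("key1", "gemini/gemini-1.5-flash"))  # 5.0
print(tracker.record_failure("key1", "gemini/gemini-1.5-flash"))  # 10.0
print(tracker.is_available("key1", "openai/gpt-4o"))  # True: other models unaffected
```

Because cooldowns are keyed by `(key, model)`, a failing key stays usable for every other model, which is the behavior the feature list describes.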
## Installation

To install the library, you can install it directly from a local path. Using the `-e` flag installs it in "editable" mode, which is recommended for development.

```bash
pip install -e .
```

The `RotatingClient` is the main class for interacting with the library.

```python
from typing import Dict, List

from rotating_api_key_client import RotatingClient

# Define your API keys, grouped by provider
api_keys: Dict[str, List[str]] = {
    "gemini": ["your_gemini_key_1", "your_gemini_key_2"],
    "openai": ["your_openai_key_1"],
}

client = RotatingClient(
    api_keys=api_keys,
    max_retries=2,
    usage_file_path="key_usage.json"
)
```

### Concurrency and Resource Management

The `RotatingClient` is asynchronous and manages an `httpx.AsyncClient` internally. It's crucial to close the client properly to release resources. The recommended way is to use an `async with` block, which handles setup and teardown automatically.

```python
import asyncio

async def main():
    async with RotatingClient(api_keys=api_keys) as client:
        # ... use the client ...
        response = await client.acompletion(
            model="gemini/gemini-1.5-flash",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response)

asyncio.run(main())
```
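The concurrency rule stated in the feature list (same key, different models run in parallel; same key and model are queued) boils down to per-`(key, model)` serialization. The following self-contained sketch shows the idea with plain `asyncio` locks; it is not the library's actual internals.

```python
import asyncio
from collections import defaultdict

# One lock per (key, model) pair: the same pair serializes, different pairs overlap.
locks = defaultdict(asyncio.Lock)
events = []

async def fake_request(key, model, tag):
    async with locks[(key, model)]:
        events.append(f"start:{tag}")
        await asyncio.sleep(0.05)  # simulated network latency
        events.append(f"end:{tag}")

async def main():
    await asyncio.gather(
        fake_request("key1", "model-a", "a1"),
        fake_request("key1", "model-b", "b1"),  # different model: runs concurrently
        fake_request("key1", "model-a", "a2"),  # same model: queued behind a1
    )

asyncio.run(main())
print(events)
```

In the recorded event order, `b1` starts before `a1` finishes (parallelism across models), while `a2` cannot start until `a1` has released the lock (queuing within a model).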
### Methods

#### `async def acompletion(self, **kwargs) -> Any:`

This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that applies the client's key rotation and retry logic.

- For non-streaming requests, it returns the `litellm` response object.
- For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE). The wrapper ensures that key locks are released and usage is recorded only after the stream is fully consumed.
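The automatic-retry behavior described in the feature list (transient 5xx errors retried with exponential backoff before a key is given up on) follows a familiar pattern. Here is a self-contained sketch with an illustrative delay schedule and a stand-in error type; the library's actual retry budget and delays may differ.

```python
import asyncio

class ServerError(Exception):
    """Stand-in for a transient 5xx error from the provider."""

async def with_retries(call, max_retries=2, base_delay=0.01):
    # Try the call up to max_retries + 1 times, doubling the delay each retry.
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except ServerError:
            if attempt == max_retries:
                raise  # retry budget exhausted: the caller rotates to the next key
            await asyncio.sleep(base_delay * 2 ** attempt)

attempts = 0

async def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise ServerError("503 Service Unavailable")
    return "ok"

result = asyncio.run(with_retries(flaky))
print(result, attempts)  # succeeds on the third attempt
```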

**Streaming Example:**

```python
async def stream_example():
    async with RotatingClient(api_keys=api_keys) as client:
        response_stream = await client.acompletion(
            model="gemini/gemini-1.5-flash",
            messages=[{"role": "user", "content": "Tell me a long story."}],
            stream=True
        )
        async for chunk in response_stream:
            print(chunk)

asyncio.run(stream_example())
```
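The "reassemble fragmented JSON chunks" behavior mentioned in the feature list can be illustrated with a small buffering loop. This is a simplified sketch of the idea, not the library's actual stream wrapper:

```python
import json

def reassemble(chunks):
    """Buffer incoming fragments until they form a complete JSON value."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            obj = json.loads(buffer)
        except json.JSONDecodeError:
            continue  # incomplete JSON so far: keep buffering
        buffer = ""
        yield obj

# A JSON object split mid-key across two network chunks, then a whole one.
fragments = ['{"choices": [{"del', 'ta": {"content": "Hi"}}]}', '{"done": true}']
print(list(reassemble(fragments)))
# [{'choices': [{'delta': {'content': 'Hi'}}]}, {'done': True}]
```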

#### `async def aembedding(self, **kwargs) -> Any:`

A wrapper around `litellm.aembedding` that provides the same key rotation and retry logic for embedding requests.

#### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`

Calculates the token count for a given text or list of messages using `litellm.token_counter`.

New providers are added by implementing the `ProviderInterface`:

```python
from typing import List

import httpx

class MyProvider(ProviderInterface):
    async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
        # Logic to fetch and return a list of model names.
        # The model names should be prefixed with the provider name,
        # e.g., ["my-provider/model-1", "my-provider/model-2"].
        # Example:
        # response = await client.get("https://api.myprovider.com/models", headers={"Auth": api_key})
        # return [f"my-provider/{model['id']}" for model in response.json()]
        pass
```