Mirrowel committed
Commit 40be1c9 · 1 parent: 30f6fec

docs: update project descriptions and component names


Refine the project's title and descriptions across documentation to better emphasize its role as a universal LLM API proxy with a resilience library, improving clarity for developers building agentic systems. This includes renaming the library to "Resilience & API Key Management Library" and updating feature lists to highlight high availability and efficient key management.

Files changed (3)
  1. DOCUMENTATION.md +10 -10
  2. README.md +17 -16
  3. src/rotator_library/README.md +9 -9
DOCUMENTATION.md CHANGED

@@ -1,21 +1,21 @@
- # Technical Documentation: API Key Proxy & Rotator Library
+ # Technical Documentation: Universal LLM API Proxy & Resilience Library

- This document provides a detailed technical explanation of the API Key Proxy and the `rotating-api-key-client` library, covering their architecture, components, and internal workings.
+ This document provides a detailed technical explanation of the project's two main components: the Universal LLM API Proxy and the Resilience Library that powers it.

  ## 1. Architecture Overview

  The project is a monorepo containing two primary components:

- 1. **`rotator_library`**: A standalone, reusable Python library for intelligent API key rotation and management.
- 2. **`proxy_app`**: A FastAPI application that consumes the `rotator_library` and exposes its functionality through an OpenAI-compatible web API.
+ 1. **The Proxy Application (`proxy_app`)**: This is the user-facing component. It's a FastAPI application that uses `litellm` to create a universal, OpenAI-compatible API. Its primary role is to abstract away the complexity of dealing with multiple LLM providers, offering a single point of entry for applications like agentic coders.
+ 2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues.

- This architecture separates the core rotation logic from the web-serving layer, making the library portable and the proxy a clean implementation of its features.
+ This architecture cleanly separates the API interface from the resilience logic, making the library a portable and powerful tool for any application needing robust API key management.

  ---

- ## 2. `rotator_library` - The Core Engine
+ ## 2. `rotator_library` - The Resilience Engine

- This library is the heart of the project, containing all the logic for key rotation, usage tracking, and provider management.
+ This library is the heart of the project, containing all the logic for managing a pool of API keys, tracking their usage, and handling provider interactions to ensure application resilience.

  ### 2.1. `client.py` - The `RotatingClient`

@@ -40,7 +40,7 @@ client = RotatingClient(
  * Managing a shared `httpx.AsyncClient` for all non-blocking HTTP requests.
  * Interfacing with the `UsageManager` to acquire and release API keys.
  * Dynamically loading and using provider-specific plugins from the `providers/` directory.
- * Executing API calls via `litellm` with a robust, **deadline-driven** retry and rotation strategy.
+ * Executing API calls via `litellm` with a robust, **deadline-driven** retry and key selection strategy.
  * Providing a safe, stateful wrapper for handling streaming responses.

  #### Request Lifecycle: A Deadline-Driven Approach

@@ -49,7 +49,7 @@ The request lifecycle has been redesigned around a single, authoritative time bu
  1. **Deadline Establishment**: The moment `acompletion` or `aembedding` is called, a `deadline` is calculated: `time.time() + self.global_timeout`. This `deadline` is the absolute point in time by which the entire operation must complete.

- 2. **Deadline-Aware Key Rotation Loop**: The main `while` loop now has a critical secondary condition: `while len(tried_keys) < len(keys_for_provider) and time.time() < deadline:`. The loop will exit immediately if the `deadline` is reached, regardless of how many keys are left to try.
+ 2. **Deadline-Aware Key Selection Loop**: The main `while` loop now has a critical secondary condition: `while len(tried_keys) < len(keys_for_provider) and time.time() < deadline:`. The loop will exit immediately if the `deadline` is reached, regardless of how many keys are left to try.

  3. **Deadline-Aware Key Acquisition**: The `self.usage_manager.acquire_key()` method now accepts the `deadline`. The `UsageManager` will not wait indefinitely for a key; if it cannot acquire one before the `deadline` is met, it will raise a `NoAvailableKeysError`, causing the request to fail fast with a "busy" error.

@@ -59,7 +59,7 @@ The request lifecycle has been redesigned around a single, authoritative time bu
  5. **Refined Error Propagation**:
     - **Fatal Errors**: Invalid requests or authentication errors are raised immediately to the client.
-    - **Intermittent Errors**: Rate limits, server errors, and other temporary issues are now handled internally. The error is logged, the key is rotated, but the exception is **not** propagated to the end client. This prevents the client from seeing disruptive, intermittent failures.
+    - **Intermittent Errors**: Temporary issues like server errors and provider-side capacity limits are now handled internally. The error is logged, the key is rotated, but the exception is **not** propagated to the end client. This prevents the client from seeing disruptive, intermittent failures.
     - **Final Failure**: A non-streaming request will only return `None` (indicating failure) if either a) the global `deadline` is exceeded, or b) all keys for the provider have been tried and have failed. A streaming request will yield a final `[DONE]` with an error message in the same scenarios.

  ### 2.2. `usage_manager.py` - Stateful Concurrency & Usage Management
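The deadline-driven lifecycle documented in the hunks above can be reduced to a few lines. The sketch below is illustrative only: `complete_with_deadline`, `attempt`, and the plain key list are hypothetical stand-ins for `acompletion`, the `litellm` call, and `UsageManager.acquire_key()`, which carry far more state in the real library.

```python
import time

def complete_with_deadline(keys, attempt, global_timeout=2.0):
    """Try each key at most once, but never run past the absolute deadline."""
    deadline = time.time() + global_timeout  # single authoritative time budget
    tried = set()
    # Mirrors the documented loop condition:
    # while len(tried_keys) < len(keys_for_provider) and time.time() < deadline
    while len(tried) < len(keys) and time.time() < deadline:
        key = next(k for k in keys if k not in tried)  # stand-in for acquire_key()
        tried.add(key)
        result = attempt(key)  # stand-in for the litellm call
        if result is not None:
            return result  # success: stop rotating immediately
    return None  # deadline exceeded or all keys exhausted
```

Note how the final `None` matches the documented "Final Failure" behavior: the caller only sees a failure once the deadline passes or every key has been tried.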
README.md CHANGED

@@ -1,4 +1,4 @@
- # API Key Proxy with Rotating Key Library [![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/C0C0UZS4P)
+ # Universal LLM API Proxy & Resilience Library [![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/C0C0UZS4P)

  ## Easy Setup for Beginners (Windows)

@@ -15,17 +15,18 @@ Your proxy is now running! You can now use it in your applications.
  ## Detailed Setup and Features

- This project provides a robust, self-hosted solution for managing and rotating API keys for various Large Language Model (LLM) providers. It consists of two main components:
+ This project provides a powerful solution for developers building complex applications, such as agentic systems, that interact with multiple Large Language Model (LLM) providers. It consists of two distinct but complementary components:

- 1. A reusable Python library (`rotating-api-key-client`) for intelligently rotating API keys with advanced concurrency and error handling.
- 2. A FastAPI proxy application that uses this library to provide a single, unified, and OpenAI-compatible endpoint for all your LLM requests.
+ 1. **A Universal API Proxy**: A self-hosted FastAPI application that provides a single, OpenAI-compatible endpoint for all your LLM requests. Powered by `litellm`, it allows you to seamlessly switch between different providers and models without altering your application's code.
+ 2. **A Resilience & Key Management Library**: The core engine that powers the proxy. This reusable Python library intelligently manages a pool of API keys to ensure your application is highly available and resilient to transient provider errors or performance issues.

  ## Features

- - **Predictable Performance**: A new **global timeout** ensures that requests complete within a set time, preventing your application from hanging on slow or failing provider responses.
- - **Resilient Error Handling**: The proxy now shields your application from transient backend errors. It handles rate limits and temporary provider issues internally by rotating keys, so your client only sees a failure if all options are exhausted or the timeout is hit.
- - **Advanced Concurrency Control**: A single API key can handle multiple concurrent requests to different models, maximizing throughput.
- - **Smart Key Rotation**: Intelligently selects the least-used, available API key to distribute request loads evenly.
+ - **Universal API Endpoint**: Simplifies development by providing a single, OpenAI-compatible interface for diverse LLM providers.
+ - **High Availability**: The underlying library ensures your application remains operational by gracefully handling transient provider errors and API key-specific issues.
+ - **Resilient Performance**: A global timeout on all requests prevents your application from hanging on unresponsive provider APIs.
+ - **Efficient Concurrency**: Maximizes throughput by allowing a single API key to handle multiple concurrent requests to different models.
+ - **Intelligent Key Management**: Optimizes request distribution across your pool of keys by selecting the best available one for each call.
  - **Escalating Per-Model Cooldowns**: If a key fails for a specific model, it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others.
  - **Automatic Daily Resets**: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
  - **Request Logging**: Optional logging of full request and response payloads for easy debugging.

@@ -220,15 +221,15 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  ### How It Works

- The core of this project is the `RotatingClient` library. When a request is made, the client:
+ When a request is made to the proxy, the application uses its core resilience library to ensure the request is handled reliably:

- 1. **Acquires the Best Key**: It requests the best available key from the `UsageManager`. The manager uses a tiered locking strategy to find a key that is not on cooldown and preferably not in use. If a key is busy with another request for the *same model*, it waits. Otherwise, it allows concurrent use for *different models*.
- 2. **Makes the Request**: It uses the acquired key to make the API call via `litellm`.
- 3. **Handles Errors**:
+ 1. **Selects an Optimal Key**: The `UsageManager` selects the best available key from your pool. It uses a tiered locking strategy to find a healthy, available key, prioritizing those with the least recent usage. This allows for concurrent requests to different models using the same key, maximizing efficiency.
+ 2. **Makes the Request**: The proxy uses the acquired key to make the API call to the target provider via `litellm`.
+ 3. **Manages Errors Gracefully**:
     - It uses a `classify_error` function to determine the failure type.
-    - For **server errors**, it retries the request with the same key using exponential backoff.
-    - For **rate-limit or auth errors**, it records the failure, applies an escalating cooldown for that specific key-model pair, and the client immediately tries the next available key.
+    - For **transient server errors**, it retries the request with the same key using exponential backoff.
+    - For **key-specific issues (e.g., authentication or provider-side limits)**, it temporarily places that key on a cooldown for the specific model and seamlessly retries the request with the next available key from the pool.
- 4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key's lock is then released, notifying any waiting requests that it is available.
+ 4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key is then released back into the available pool, ready for the next request.

  ### Command-Line Arguments and Scripts

@@ -260,5 +261,5 @@ For convenience on Windows, you can use the provided `.bat` scripts in the root
  ## Library and Technical Docs

- - **Using the Library**: For documentation on how to use the `rotating-api-key-client` library directly in your own Python projects, please refer to its [README.md](src/rotator_library/README.md).
+ - **Using the Library**: For documentation on how to use the `api-key-manager` library directly in your own Python projects, please refer to its [README.md](src/rotator_library/README.md).
  - **Technical Details**: For a more in-depth technical explanation of the library's architecture, components, and internal workings, please refer to the [Technical Documentation](DOCUMENTATION.md).
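The "Escalating Per-Model Cooldowns" feature retained in both versions of this README can be illustrated with a minimal sketch. `CooldownTracker` and its `ESCALATION` schedule are assumptions for illustration only; the library's actual `UsageManager` implements this differently.

```python
import time

class CooldownTracker:
    """Sketch of escalating per-model cooldowns: a key that fails for one
    model stays usable for other models. Schedule values are hypothetical."""
    ESCALATION = [5, 30, 120]  # seconds per consecutive failure (assumed)

    def __init__(self):
        self._fails = {}  # (key, model) -> consecutive failure count
        self._until = {}  # (key, model) -> cooldown expiry timestamp

    def record_failure(self, key, model):
        pair = (key, model)
        n = self._fails.get(pair, 0)
        self._fails[pair] = n + 1
        step = min(n, len(self.ESCALATION) - 1)  # escalate, then cap
        self._until[pair] = time.time() + self.ESCALATION[step]

    def record_success(self, key, model):
        self._fails.pop((key, model), None)  # success resets the escalation

    def is_available(self, key, model):
        # Cooldowns are scoped to the (key, model) pair, not the whole key.
        return time.time() >= self._until.get((key, model), 0.0)
```

Because the cooldown is keyed on the (key, model) pair, a provider-side limit on one model does not take the key out of rotation for the rest of the pool.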
src/rotator_library/README.md CHANGED

@@ -1,13 +1,13 @@
- # Rotating API Key Client
+ # Resilience & API Key Management Library

- A robust, asynchronous, and thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient, concurrent, and efficient.
+ A robust, asynchronous, and thread-safe Python library for managing a pool of API keys. It is designed to be integrated into applications (such as the Universal LLM API Proxy included in this project) to provide a powerful layer of resilience and high availability when interacting with multiple LLM providers.

  ## Key Features

  - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
  - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety. Requests for the *same model* using the same key are queued, preventing conflicts.
- - **Smart Key Rotation**: Acquires the least-used, available key using a tiered, model-aware locking strategy to distribute load evenly.
- - **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key rotations, exceeds a specified time limit, preventing indefinite hangs.
+ - **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability.
+ - **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key selections, exceeds a specified time limit, preventing indefinite hangs.
  - **Intelligent Error Handling**:
    - **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model, allowing it to continue being used for others.
    - **Deadline-Aware Retries**: Retries requests on transient server errors with exponential backoff, but only if the wait time fits within the global request budget.

@@ -28,7 +28,7 @@ pip install -e .
  ## `RotatingClient` Class

- This is the main class for interacting with the library. It is designed to be a long-lived object that manages its own HTTP client and key usage state.
+ This is the main class for interacting with the library. It is designed to be a long-lived object that manages the state of your API key pool.

  ### Initialization

@@ -90,7 +90,7 @@ asyncio.run(main())
  #### `async def acompletion(self, **kwargs) -> Any:`

- This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds the core logic for key acquisition, rotation, and retries.
+ This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds the core logic for key acquisition, selection, and retries.

  - **Parameters**: Accepts the same keyword arguments as `litellm.acompletion`. The `model` parameter is required and must be a string in the format `provider/model_name`.
  - **Returns**:

@@ -115,7 +115,7 @@ asyncio.run(stream_example())
  #### `async def aembedding(self, **kwargs) -> Any:`

- A wrapper around `litellm.aembedding` that provides the same key rotation and retry logic for embedding requests.
+ A wrapper around `litellm.aembedding` that provides the same key management and retry logic for embedding requests.

  #### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`

@@ -135,7 +135,7 @@ The client uses a sophisticated error handling mechanism:
  - **Error Classification**: All exceptions from `litellm` are passed through a `classify_error` function to determine their type (`rate_limit`, `authentication`, `server_error`, etc.).
  - **Server Errors**: The client will retry the request with the *same key* up to `max_retries` times, using an exponential backoff strategy.
- - **Rotation Errors (Rate Limit, Auth, etc.)**: The client records the failure in the `UsageManager`, which applies an escalating cooldown to the key for that specific model. The client then immediately acquires a new key and continues its attempt to complete the request.
+ - **Key-Specific Errors (Authentication, Quota, etc.)**: The client records the failure in the `UsageManager`, which applies an escalating cooldown to the key for that specific model. The client then immediately acquires a new key and continues its attempt to complete the request.
  - **Key-Level Lockouts**: If a key fails on multiple different models, the `UsageManager` can apply a key-level lockout, taking it out of rotation entirely for a short period.

@@ -144,7 +144,7 @@ To ensure predictable performance, the client now operates on a strict time budg
  - **Deadline Enforcement**: When a request starts, a `deadline` is set. The entire process, including all key rotations and retries, must complete before this deadline.
  - **Deadline-Aware Retries**: If a retry requires a wait time that would exceed the remaining budget, the wait is skipped, and the client immediately rotates to the next key.
- - **Silent Internal Errors**: Intermittent failures like rate limits or temporary server errors are logged internally but are **not raised** to the caller. The client will simply rotate to the next key. A non-streaming request will only return `None` (or a streaming request will end) if the global timeout is exceeded or all keys have been exhausted. This creates a more stable experience for the end-user, as they are shielded from transient backend issues.
+ - **Silent Internal Errors**: Intermittent failures like provider capacity limits or temporary server errors are logged internally but are **not raised** to the caller. The client will simply rotate to the next key. A non-streaming request will only return `None` (or a streaming request will end) if the global timeout is exceeded or all keys have been exhausted. This creates a more stable experience for the end-user, as they are shielded from transient backend issues.

  ## Extending with Provider Plugins
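The "Deadline-Aware Retries" behavior described in this README (retry with the same key on transient server errors, but skip any backoff wait that would blow the global budget) can be sketched as follows. The names `retry_with_backoff` and `TransientError` are hypothetical, not the library's real API.

```python
import time

class TransientError(Exception):
    """Stand-in for an error classified as a retryable server error."""

def retry_with_backoff(call, deadline, max_retries=3, base=1.0):
    """Retry on transient errors with exponential backoff, but never sleep
    past the global deadline (the caller would then rotate to the next key)."""
    for attempt in range(max_retries):
        try:
            return call()
        except TransientError:
            wait = base * (2 ** attempt)  # exponential backoff schedule
            if time.time() + wait >= deadline:
                return None  # wait would overrun the budget: fail fast
            time.sleep(wait)
    return None  # retries exhausted with this key
```

Returning `None` here signals "give up on this key"; in the real client that outcome feeds back into the key selection loop rather than surfacing to the end user.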