Mirrowel committed · Commit 40be1c9 · Parent(s): 30f6fec

docs: update project descriptions and component names
Refine the project's title and descriptions across documentation to better emphasize its role as a universal LLM API proxy backed by a resilience library, improving clarity for developers building agentic systems. This includes renaming the library to "Resilience & API Key Management Library" and updating feature lists to highlight high availability and efficient key management.
- DOCUMENTATION.md +10 -10
- README.md +17 -16
- src/rotator_library/README.md +9 -9
DOCUMENTATION.md CHANGED

@@ -1,21 +1,21 @@
-# Technical Documentation: …
+# Technical Documentation: Universal LLM API Proxy & Resilience Library

-This document provides a detailed technical explanation of the …
+This document provides a detailed technical explanation of the project's two main components: the Universal LLM API Proxy and the Resilience Library that powers it.

 ## 1. Architecture Overview

 The project is a monorepo containing two primary components:

-1. **`…
+1. **The Proxy Application (`proxy_app`)**: This is the user-facing component. It's a FastAPI application that uses `litellm` to create a universal, OpenAI-compatible API. Its primary role is to abstract away the complexity of dealing with multiple LLM providers, offering a single point of entry for applications like agentic coders.
-2. **`…
+2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues.

-This architecture separates the …
+This architecture cleanly separates the API interface from the resilience logic, making the library a portable and powerful tool for any application needing robust API key management.

 ---

-## 2. `rotator_library` - The …
+## 2. `rotator_library` - The Resilience Engine

-This library is the heart of the project, containing all the logic for …
+This library is the heart of the project, containing all the logic for managing a pool of API keys, tracking their usage, and handling provider interactions to ensure application resilience.

 ### 2.1. `client.py` - The `RotatingClient`

@@ -40,7 +40,7 @@ client = RotatingClient(
 * Managing a shared `httpx.AsyncClient` for all non-blocking HTTP requests.
 * Interfacing with the `UsageManager` to acquire and release API keys.
 * Dynamically loading and using provider-specific plugins from the `providers/` directory.
-* Executing API calls via `litellm` with a robust, **deadline-driven** retry and …
+* Executing API calls via `litellm` with a robust, **deadline-driven** retry and key selection strategy.
 * Providing a safe, stateful wrapper for handling streaming responses.

 #### Request Lifecycle: A Deadline-Driven Approach

@@ -49,7 +49,7 @@ The request lifecycle has been redesigned around a single, authoritative time bu

 1. **Deadline Establishment**: The moment `acompletion` or `aembedding` is called, a `deadline` is calculated: `time.time() + self.global_timeout`. This `deadline` is the absolute point in time by which the entire operation must complete.

-2. **Deadline-Aware Key …
+2. **Deadline-Aware Key Selection Loop**: The main `while` loop now has a critical secondary condition: `while len(tried_keys) < len(keys_for_provider) and time.time() < deadline:`. The loop will exit immediately if the `deadline` is reached, regardless of how many keys are left to try.

 3. **Deadline-Aware Key Acquisition**: The `self.usage_manager.acquire_key()` method now accepts the `deadline`. The `UsageManager` will not wait indefinitely for a key; if it cannot acquire one before the `deadline` is met, it will raise a `NoAvailableKeysError`, causing the request to fail fast with a "busy" error.

@@ -59,7 +59,7 @@ The request lifecycle has been redesigned around a single, authoritative time bu

 5. **Refined Error Propagation**:
    - **Fatal Errors**: Invalid requests or authentication errors are raised immediately to the client.
-   - **Intermittent Errors**: …
+   - **Intermittent Errors**: Temporary issues like server errors and provider-side capacity limits are now handled internally. The error is logged and the key is rotated, but the exception is **not** propagated to the end client. This prevents the client from seeing disruptive, intermittent failures.
    - **Final Failure**: A non-streaming request will only return `None` (indicating failure) if either a) the global `deadline` is exceeded, or b) all keys for the provider have been tried and have failed. A streaming request will yield a final `[DONE]` with an error message in the same scenarios.

 ### 2.2. `usage_manager.py` - Stateful Concurrency & Usage Management
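The deadline-driven lifecycle described in the DOCUMENTATION.md changes above can be sketched in a few lines of Python. This is an illustrative sketch, not the library's actual code: `try_request` and `acquire_key` are hypothetical stand-ins for the client's internal request machinery and `UsageManager.acquire_key()`, and the 30-second default is an assumed value.

```python
import time

GLOBAL_TIMEOUT = 30.0  # assumed default; the real value comes from RotatingClient config


class NoAvailableKeysError(Exception):
    """Raised when no key can be acquired before the deadline."""


def complete_with_deadline(keys_for_provider, try_request, acquire_key,
                           global_timeout=GLOBAL_TIMEOUT):
    # 1. Deadline establishment: one absolute time budget for the whole operation.
    deadline = time.time() + global_timeout
    tried_keys = set()
    # 2. Deadline-aware key selection loop: exit when out of keys *or* out of time.
    while len(tried_keys) < len(keys_for_provider) and time.time() < deadline:
        try:
            # 3. Deadline-aware acquisition: may raise NoAvailableKeysError
            # instead of waiting past the deadline.
            key = acquire_key(deadline)
        except NoAvailableKeysError:
            break  # fail fast with a "busy" result
        tried_keys.add(key)
        result = try_request(key)
        if result is not None:
            return result  # success
        # Intermittent failure: log, rotate to the next key, do not surface the error.
    return None  # final failure: deadline exceeded or all keys exhausted
```

Note how `None` is only returned on the two terminal conditions the documentation names: the global deadline being exceeded or every key having been tried.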
README.md CHANGED

@@ -1,4 +1,4 @@
-# …
+# Universal LLM API Proxy & Resilience Library [](https://ko-fi.com/C0C0UZS4P)

 ## Easy Setup for Beginners (Windows)

@@ -15,17 +15,18 @@ Your proxy is now running! You can now use it in your applications.
 ## Detailed Setup and Features

-This project provides a …
+This project provides a powerful solution for developers building complex applications, such as agentic systems, that interact with multiple Large Language Model (LLM) providers. It consists of two distinct but complementary components:

-1. A …
+1. **A Universal API Proxy**: A self-hosted FastAPI application that provides a single, OpenAI-compatible endpoint for all your LLM requests. Powered by `litellm`, it allows you to seamlessly switch between different providers and models without altering your application's code.
-2. A …
+2. **A Resilience & Key Management Library**: The core engine that powers the proxy. This reusable Python library intelligently manages a pool of API keys to ensure your application is highly available and resilient to transient provider errors or performance issues.

 ## Features

-- **…
-- **…
-- **…
-- **…
+- **Universal API Endpoint**: Simplifies development by providing a single, OpenAI-compatible interface for diverse LLM providers.
+- **High Availability**: The underlying library ensures your application remains operational by gracefully handling transient provider errors and API key-specific issues.
+- **Resilient Performance**: A global timeout on all requests prevents your application from hanging on unresponsive provider APIs.
+- **Efficient Concurrency**: Maximizes throughput by allowing a single API key to handle multiple concurrent requests to different models.
+- **Intelligent Key Management**: Optimizes request distribution across your pool of keys by selecting the best available one for each call.
 - **Escalating Per-Model Cooldowns**: If a key fails for a specific model, it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others.
 - **Automatic Daily Resets**: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
 - **Request Logging**: Optional logging of full request and response payloads for easy debugging.

@@ -220,15 +221,15 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \
 ### How It Works

-…
+When a request is made to the proxy, the application uses its core resilience library to ensure the request is handled reliably:

-1. **…
+1. **Selects an Optimal Key**: The `UsageManager` selects the best available key from your pool. It uses a tiered locking strategy to find a healthy, available key, prioritizing those with the least recent usage. This allows for concurrent requests to different models using the same key, maximizing efficiency.
-2. **Makes the Request**: …
+2. **Makes the Request**: The proxy uses the acquired key to make the API call to the target provider via `litellm`.
-3. **…
+3. **Manages Errors Gracefully**:
    - It uses a `classify_error` function to determine the failure type.
-   - For **server errors**, it retries the request with the same key using exponential backoff.
+   - For **transient server errors**, it retries the request with the same key using exponential backoff.
-   - For **…
+   - For **key-specific issues (e.g., authentication or provider-side limits)**, it temporarily places that key on a cooldown for the specific model and seamlessly retries the request with the next available key from the pool.
-4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key …
+4. **Tracks Usage & Releases Key**: On a successful request, it records usage stats. The key is then released back into the available pool, ready for the next request.

 ### Command-Line Arguments and Scripts

@@ -260,5 +261,5 @@ For convenience on Windows, you can use the provided `.bat` scripts in the root
 ## Library and Technical Docs

-- **Using the Library**: For documentation on how to use the `…
+- **Using the Library**: For documentation on how to use the `api-key-manager` library directly in your own Python projects, please refer to its [README.md](src/rotator_library/README.md).
 - **Technical Details**: For a more in-depth technical explanation of the library's architecture, components, and internal workings, please refer to the [Technical Documentation](DOCUMENTATION.md).
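The "How It Works" flow in the README changes above — classify the error, retry transient server errors on the same key with exponential backoff, and let key-specific failures bubble up so the caller can rotate keys — can be sketched as follows. This is a minimal illustration, not the library's implementation: the real `classify_error` inspects `litellm` exception types, whereas this stand-in matches on message text only.

```python
import asyncio


def classify_error(exc: Exception) -> str:
    """Hypothetical stand-in for the library's classifier: buckets an
    exception by its message text."""
    text = str(exc).lower()
    if "401" in text or "invalid api key" in text:
        return "authentication"
    if "429" in text or "rate limit" in text:
        return "rate_limit"
    if "500" in text or "503" in text:
        return "server_error"
    return "unknown"


async def request_with_backoff(call, key, max_retries=3, base_delay=1.0):
    """Retry transient server errors on the *same* key with exponential
    backoff; re-raise anything key-specific so the caller can rotate keys."""
    for attempt in range(max_retries):
        try:
            return await call(key)
        except Exception as exc:
            if classify_error(exc) != "server_error" or attempt == max_retries - 1:
                raise  # key-specific or persistent failure: rotate keys upstream
            await asyncio.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

A caller would wrap this in the key-selection loop, moving to the next key whenever the raised error is key-specific rather than transient.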
src/rotator_library/README.md CHANGED

@@ -1,13 +1,13 @@
-# …
+# Resilience & API Key Management Library

-A robust, asynchronous, and thread-safe …
+A robust, asynchronous, and thread-safe Python library for managing a pool of API keys. It is designed to be integrated into applications (such as the Universal LLM API Proxy included in this project) to provide a powerful layer of resilience and high availability when interacting with multiple LLM providers.

 ## Key Features

 - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O.
 - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests to *different* models, maximizing throughput while ensuring thread safety. Requests for the *same model* using the same key are queued, preventing conflicts.
-- **Smart Key …
+- **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability.
-- **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key …
+- **Deadline-Driven Requests**: A global timeout ensures that no request, including all retries and key selections, exceeds a specified time limit, preventing indefinite hangs.
 - **Intelligent Error Handling**:
   - **Escalating Per-Model Cooldowns**: If a key fails, it's placed on a temporary, escalating cooldown for that specific model, allowing it to continue being used for others.
   - **Deadline-Aware Retries**: Retries requests on transient server errors with exponential backoff, but only if the wait time fits within the global request budget.

@@ -28,7 +28,7 @@ pip install -e .
 ## `RotatingClient` Class

-This is the main class for interacting with the library. It is designed to be a long-lived object that manages …
+This is the main class for interacting with the library. It is designed to be a long-lived object that manages the state of your API key pool.

 ### Initialization

@@ -90,7 +90,7 @@ asyncio.run(main())
 #### `async def acompletion(self, **kwargs) -> Any:`

-This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds the core logic for key acquisition, …
+This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds the core logic for key acquisition, selection, and retries.

 - **Parameters**: Accepts the same keyword arguments as `litellm.acompletion`. The `model` parameter is required and must be a string in the format `provider/model_name`.
 - **Returns**:

@@ -115,7 +115,7 @@ asyncio.run(stream_example())
 #### `async def aembedding(self, **kwargs) -> Any:`

-A wrapper around `litellm.aembedding` that provides the same key …
+A wrapper around `litellm.aembedding` that provides the same key management and retry logic for embedding requests.

 #### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`

@@ -135,7 +135,7 @@ The client uses a sophisticated error handling mechanism:
 - **Error Classification**: All exceptions from `litellm` are passed through a `classify_error` function to determine their type (`rate_limit`, `authentication`, `server_error`, etc.).
 - **Server Errors**: The client will retry the request with the *same key* up to `max_retries` times, using an exponential backoff strategy.
-- **…
+- **Key-Specific Errors (Authentication, Quota, etc.)**: The client records the failure in the `UsageManager`, which applies an escalating cooldown to the key for that specific model. The client then immediately acquires a new key and continues its attempt to complete the request.
 - **Key-Level Lockouts**: If a key fails on multiple different models, the `UsageManager` can apply a key-level lockout, taking it out of rotation entirely for a short period.

@@ -144,7 +144,7 @@ To ensure predictable performance, the client now operates on a strict time budg
 - **Deadline Enforcement**: When a request starts, a `deadline` is set. The entire process, including all key rotations and retries, must complete before this deadline.
 - **Deadline-Aware Retries**: If a retry requires a wait time that would exceed the remaining budget, the wait is skipped, and the client immediately rotates to the next key.
-- **Silent Internal Errors**: Intermittent failures like …
+- **Silent Internal Errors**: Intermittent failures like provider capacity limits or temporary server errors are logged internally but are **not raised** to the caller. The client will simply rotate to the next key. A non-streaming request will only return `None` (or a streaming request will end) if the global timeout is exceeded or all keys have been exhausted. This creates a more stable experience for the end user, as they are shielded from transient backend issues.

 ## Extending with Provider Plugins
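The escalating per-model cooldowns described in the library README changes above can be sketched with a small tracker. This is an illustrative sketch only: the real logic lives inside the library's `UsageManager`, and the escalation schedule below is an assumed example, not the library's actual timings.

```python
import time


class CooldownTracker:
    """Sketch of escalating, per-(key, model) cooldowns. Repeated failures of a
    key on one model escalate its cooldown for that model only, leaving the key
    usable with other models."""

    ESCALATION = [5, 30, 120, 600]  # assumed schedule, in seconds

    def __init__(self):
        self._failures = {}  # (key, model) -> consecutive failure count
        self._until = {}     # (key, model) -> timestamp when the cooldown ends

    def record_failure(self, key: str, model: str) -> float:
        k = (key, model)
        count = self._failures.get(k, 0)
        delay = self.ESCALATION[min(count, len(self.ESCALATION) - 1)]
        self._failures[k] = count + 1
        self._until[k] = time.time() + delay
        return delay

    def record_success(self, key: str, model: str) -> None:
        # A success resets both the escalation level and any active cooldown.
        self._failures.pop((key, model), None)
        self._until.pop((key, model), None)

    def is_available(self, key: str, model: str) -> bool:
        return time.time() >= self._until.get((key, model), 0.0)
```

Because cooldowns are keyed by `(key, model)`, a key cooling down for one model remains immediately available for every other model, which is exactly the behavior the per-model cooldown feature promises. A daily reset (as the README describes) would simply clear both dictionaries.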