Spaces: elmerzole/llm-api-proxy


Mirrowel committed on Jun 11, 2025

Commit 5838a8e (parent: 21dcb11)

feat: Add detailed documentation and installation instructions for the rotating API key client


Files changed (6)

.gitignore +1 -0
DOCUMENTATION.md +81 -0
LICENSE.MD +21 -0
README.md +63 -41
requirements.txt +10 -0
src/rotator_library/README.md +108 -18

.gitignore CHANGED

@@ -123,3 +123,4 @@ cython_debug/
 test_proxy.py
 start_proxy.bat
 key_usage.json
+staged_changes.txt

DOCUMENTATION.md ADDED

@@ -0,0 +1,81 @@

+# Technical Documentation: `rotating-api-key-client`
+This document provides a detailed technical explanation of the `rotating-api-key-client` library, its components, and its internal workings.
+## 1. `client.py` - The `RotatingClient`
+The `RotatingClient` is the central component of the library, orchestrating API calls, key rotation, and error handling.
+### Request Lifecycle (`acompletion`)
+When `acompletion` is called, it follows these steps:
+1.  **Model and Provider Validation**: It first checks that a `model` is specified and extracts the provider name from it (e.g., `"gemini"` from `"gemini/gemini-2.5-flash-preview-05-20"`). It ensures that API keys for this provider are available.
+2.  **Key Selection Loop**: The client enters a loop to find a valid key and complete the request.
+    a.  **Get Next Smart Key**: It calls `self.usage_manager.get_next_smart_key()` to get the least-used key for the given model that is not currently on cooldown.
+    b.  **No Key Available**: If all keys for the provider are on cooldown, it waits for 5 seconds before restarting the loop.
+3.  **Attempt Loop**: Once a key is selected, it enters a retry loop (`for attempt in range(self.max_retries)`):
+    a.  **API Call**: It calls `litellm.acompletion` with the selected key and the user-provided arguments.
+    b.  **Success**:
+        -   If the call is successful and **non-streaming**, it calls `self.usage_manager.record_success()`, returns the response, and the process ends.
+        -   If the call is successful and **streaming**, it returns a `_streaming_wrapper` async generator. This wrapper formats the response chunks as Server-Sent Events (SSE) and calls `self.usage_manager.record_success()` only when the stream is fully consumed.
+    c.  **Failure**: If an exception occurs:
+        -   The failure is logged using `log_failure()`.
+        -   **Server Error**: If `is_server_error()` returns `True` and there are retries left, it waits for a moment and continues to the next attempt with the *same key*.
+        -   **Unrecoverable Error**: If `is_unrecoverable_error()` returns `True`, the exception is immediately raised, terminating the process.
+        -   **Other Errors (Rate Limit, Auth, etc.)**: For any other error, it's considered a "rotation" error. `self.usage_manager.record_rotation_error()` is called to put the key on cooldown, and the inner `attempt` loop is broken. The outer `while` loop then continues, fetching a new key.
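The non-streaming path of this lifecycle can be condensed into a small, self-contained loop. It is a sketch under assumptions, not the library's code: `rotate_and_call`, `call_api`, `ServerError`, `UnrecoverableError`, and the `retry_wait` delay are hypothetical stand-ins for `litellm.acompletion`, `is_server_error()`, and `is_unrecoverable_error()`. Reading the documented branches in order, a server error on the last attempt falls through to the rotation branch, which is how the sketch treats it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class ServerError(Exception):
    """Hypothetical stand-in for a transient 5xx error."""


class UnrecoverableError(Exception):
    """Hypothetical stand-in for an error a different key cannot fix."""


async def rotate_and_call(
    usage: Any,  # hypothetical: get_next_smart_key / record_success / record_rotation_error
    model: str,
    call_api: Callable[[str], Awaitable[Any]],  # hypothetical: one request with the given key
    max_retries: int = 2,
    no_key_wait: float = 5.0,
    retry_wait: float = 1.0,
) -> Any:
    while True:  # key selection loop
        key: Optional[str] = usage.get_next_smart_key(model)
        if key is None:
            # All keys are on cooldown: wait, then try to select again.
            await asyncio.sleep(no_key_wait)
            continue
        for attempt in range(max(1, max_retries)):  # attempt loop, same key
            try:
                response = await call_api(key)
            except UnrecoverableError:
                raise  # fail fast
            except Exception as exc:
                if isinstance(exc, ServerError) and attempt < max_retries - 1:
                    await asyncio.sleep(retry_wait)
                    continue  # transient: retry with the same key
                # Rate limit, auth, or exhausted server retries: cool down and rotate.
                usage.record_rotation_error(key, model)
                break
            usage.record_success(key, model)
            return response
```

Nesting the retry loop inside the key-selection loop means a transient failure does not cost a key its turn, while a rate limit or auth failure moves on to another key immediately.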
+## 2. `usage_manager.py` - The `UsageManager`
+This class is responsible for all logic related to tracking and selecting API keys.
+### Key Data Structure
+Usage data is stored in a JSON file (e.g., `key_usage.json`). Here's a conceptual view of its structure:
+```json
+{
+  "api_key_1_hash": {
+    "last_used": "timestamp",
+    "cooldown_until": "timestamp",
+    "global_usage": 150,
+    "daily_usage": {
+      "YYYY-MM-DD": 100
+    },
+    "model_usage": {
+      "gemini/gemini-2.5-flash-preview-05-20": 50
+    }
+  }
+}
+```
+-   **Key Hashing**: Keys are stored by their SHA256 hash to avoid exposing sensitive keys in logs or files.
+-   `cooldown_until`: If a key fails, this timestamp is set. The key will not be selected until the current time is past this timestamp.
+-   `model_usage`: Tracks the usage count for each specific model, which is the primary metric for the "smart" key selection.
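To make the hashing concrete, here is a sketch of creating an entry, assuming the hash is the hex-encoded SHA-256 digest of the raw key string (the text above names SHA256 but not the encoding) and that timestamps start as `null`; `hash_key` and `new_usage_entry` are hypothetical helper names.

```python
import hashlib
from typing import Any, Dict


def hash_key(api_key: str) -> str:
    """Return the SHA-256 hex digest of a key so the raw key never reaches disk."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def new_usage_entry() -> Dict[str, Any]:
    """An empty record shaped like the conceptual JSON above."""
    return {
        "last_used": None,
        "cooldown_until": None,
        "global_usage": 0,
        "daily_usage": {},
        "model_usage": {},
    }


# The usage file maps key hashes (never raw keys) to their records.
usage_data: Dict[str, Dict[str, Any]] = {}
usage_data.setdefault(hash_key("sk-example-key"), new_usage_entry())
```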
+### Core Methods
+-   `get_next_smart_key()`: This is the key selection logic. It filters out any keys that are on cooldown and then finds the key with the lowest usage count for the requested `model`.
+-   `record_success()`: Increments the usage counters (`global_usage`, `daily_usage`, `model_usage`) for the given key.
+-   `record_rotation_error()`: Sets the `cooldown_until` timestamp for the given key, effectively taking it out of rotation for a short period.
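The three core methods fit in a short in-memory class. This is a hedged sketch: `MiniUsageManager`, the explicit `now` parameter, the 60-second default cooldown (the text only says "a short period"), and Unix-epoch timestamps are assumptions, and persistence to `key_usage.json` and thread safety are omitted.

```python
import hashlib
import time
from datetime import date
from typing import Dict, List, Optional


class MiniUsageManager:
    """Hypothetical in-memory sketch; no file persistence or locking."""

    def __init__(self, api_keys: List[str], cooldown_seconds: float = 60.0):
        self.api_keys = api_keys
        self.cooldown_seconds = cooldown_seconds
        self.data: Dict[str, dict] = {}  # key hash -> usage record

    def _entry(self, key: str) -> dict:
        key_hash = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.data.setdefault(key_hash, {
            "last_used": None, "cooldown_until": None,
            "global_usage": 0, "daily_usage": {}, "model_usage": {},
        })

    def get_next_smart_key(self, model: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        # Keep only keys whose cooldown (if any) has expired.
        available = [k for k in self.api_keys
                     if (self._entry(k)["cooldown_until"] or 0.0) <= now]
        if not available:
            return None
        # Lowest usage for this model wins; min() keeps the first key on ties.
        return min(available, key=lambda k: self._entry(k)["model_usage"].get(model, 0))

    def record_success(self, key: str, model: str) -> None:
        entry = self._entry(key)
        entry["last_used"] = time.time()
        entry["global_usage"] += 1
        today = date.today().isoformat()
        entry["daily_usage"][today] = entry["daily_usage"].get(today, 0) + 1
        entry["model_usage"][model] = entry["model_usage"].get(model, 0) + 1

    def record_rotation_error(self, key: str, model: str, now: Optional[float] = None) -> None:
        # The cooldown applies to the whole key, not just this model.
        now = time.time() if now is None else now
        self._entry(key)["cooldown_until"] = now + self.cooldown_seconds
```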
+## 3. `error_handler.py`
+This module contains functions to classify exceptions returned by `litellm`.
+-   `is_server_error(e)`: Checks if the exception is a transient server-side error (typically a `5xx` status code) that is worth retrying with the same key.
+-   `is_unrecoverable_error(e)`: Checks for critical errors (e.g., invalid request parameters) that should immediately stop the process. Any error that is not a server error or an unrecoverable error is treated as a "rotation" error by the client.
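One plausible shape for these predicates, assuming the exception exposes its HTTP status as an integer `status_code` attribute. The status-code sets (5xx for retry, 400 for fail-fast) are illustrative assumptions rather than the library's exact rules, and `classify` is a hypothetical helper that checks them in the same order the client does.

```python
from typing import Optional


def _status_code(e: BaseException) -> Optional[int]:
    # Assumption: the exception carries an integer HTTP status in `status_code`.
    code = getattr(e, "status_code", None)
    return code if isinstance(code, int) else None


def is_server_error(e: BaseException) -> bool:
    """Transient provider-side failure (5xx): worth retrying with the same key."""
    code = _status_code(e)
    return code is not None and 500 <= code < 600


def is_unrecoverable_error(e: BaseException) -> bool:
    """A malformed request (illustrative: HTTP 400) fails with any key, so stop."""
    return _status_code(e) == 400


def classify(e: BaseException) -> str:
    if is_server_error(e):
        return "retry"
    if is_unrecoverable_error(e):
        return "raise"
    return "rotate"  # rate limits (429), auth failures (401/403), anything else
```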
+## 4. `failure_logger.py`
+-   `log_failure()`: This function logs detailed information about a failed API request to a file in the `logs/` directory. This is crucial for debugging issues with specific keys or providers. The log includes the hashed API key, the model, the error message, and the request data.
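A sketch of such a logger, assuming one JSON object per line in a hypothetical `logs/failures.log` file. `write_failure_log` and its signature are illustrative, since the text lists only the logged fields (hashed key, model, error message, request data).

```python
import hashlib
import json
import os
from datetime import datetime, timezone
from typing import Any, Dict


def write_failure_log(api_key: str, model: str, error: BaseException,
                      request_data: Dict[str, Any], log_dir: str = "logs") -> str:
    """Append one JSON line describing a failed request; returns the log file path."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, "failures.log")  # hypothetical file name
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Only the hash is written, never the raw key.
        "api_key_hash": hashlib.sha256(api_key.encode("utf-8")).hexdigest(),
        "model": model,
        "error_type": type(error).__name__,
        "error_message": str(error),
        # Callers should strip secrets from request_data before logging it.
        "request_data": request_data,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, default=str) + "\n")
    return path
```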
+## 5. `providers/` - Provider Plugins
+The provider plugin system makes it easy to add model-list fetching for new LLM providers.
+-   **`provider_interface.py`**: Defines the abstract base class `ProviderPlugin` with a single abstract method, `get_models`. Any new provider plugin must inherit from this class and implement this method.
+-   **Implementations**: Each provider (e.g., `openai_provider.py`, `gemini_provider.py`) has its own file containing a class that implements the `ProviderPlugin` interface. The `get_models` method contains the specific logic to call the provider's API and return a list of their available models.
+-   **`__init__.py`**: This file acts as a registry for the available plugins. The `PROVIDER_PLUGINS` dictionary maps provider names to their corresponding plugin classes. The `RotatingClient` uses this dictionary to instantiate the correct plugin at runtime.
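The registry pattern can be sketched end to end. `ProviderPlugin` and `PROVIDER_PLUGINS` are re-created here so the block runs on its own; `StaticProvider`, `ModelCatalog`, using the provider's first key for the lookup, and the per-provider cache (the library README says results are cached) are assumptions, and real plugins would call the provider's API inside `get_models`.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ProviderPlugin(ABC):
    @abstractmethod
    async def get_models(self, api_key: str) -> List[str]:
        """Return the model names available to this key."""


class StaticProvider(ProviderPlugin):
    # Hypothetical plugin that returns a fixed list instead of calling an API.
    async def get_models(self, api_key: str) -> List[str]:
        return ["static/model-a", "static/model-b"]


# Registry: provider name -> plugin class (instantiated on demand).
PROVIDER_PLUGINS: Dict[str, Type[ProviderPlugin]] = {"static": StaticProvider}


class ModelCatalog:
    """Hypothetical helper: find the plugin by name, instantiate it, cache the result."""

    def __init__(self, api_keys: Dict[str, List[str]]):
        self.api_keys = api_keys
        self._cache: Dict[str, List[str]] = {}

    async def get_available_models(self, provider: str) -> List[str]:
        if provider in self._cache:
            return self._cache[provider]
        plugin_cls = PROVIDER_PLUGINS.get(provider)
        keys = self.api_keys.get(provider, [])
        if plugin_cls is None or not keys:
            return []  # unknown provider, or no key to query with
        models = await plugin_cls().get_models(keys[0])
        self._cache[provider] = models
        return models
```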

LICENSE.MD ADDED

@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 Mirrowel
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,18 +1,31 @@
 # API Key Proxy with Rotating Key Library
-This project provides two main components:
 1.  A reusable Python library (`rotating-api-key-client`) for intelligently rotating API keys.
-2.  A FastAPI proxy application that uses this library to provide an OpenAI-compatible endpoint for various LLM providers.
 ## Features
--   **Smart Key Rotation**: The library automatically uses the least-used key to distribute load.
--   **Automatic Retries**: Retries requests on transient server errors.
--   **Cooldowns**: Puts keys on a temporary cooldown after rate limit or authentication errors.
--   **Usage Tracking**: Tracks daily and global usage for each key.
--   **Provider Agnostic**: Works with any provider supported by `litellm`.
--   **OpenAI-Compatible Proxy**: The proxy provides a familiar API for interacting with different models.
 ## Project Structure
@@ -28,10 +41,9 @@ This project provides two main components:
 │       ├── error_handler.py
 │       ├── failure_logger.py
 │       ├── usage_manager.py
-│       ├── pyproject.toml
-│       └── README.md
 ├── .env.example
-├── .gitignore
 ├── README.md
 └── requirements.txt
 ```
@@ -39,77 +51,79 @@ This project provides two main components:
 ## Setup and Installation
 1.  **Clone the repository:**
     ```bash
     git clone <repository-url>
     cd <repository-name>
     ```
 2.  **Create a virtual environment:**
     ```bash
     python -m venv venv
     source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
     ```
-3.  **Install the dependencies:**
-    The `requirements.txt` file includes the proxy's dependencies and installs the `rotator_library` in editable mode (`-e`), so you can develop both simultaneously.
     ```bash
     pip install -r requirements.txt
     ```
 4.  **Configure environment variables:**
-    Create a `.env` file by copying the `.env.example`:
     ```bash
     cp .env.example .env
     ```
-    Edit the `.env` file with your API keys:
-    ```
-    # A secret key for your proxy to prevent unauthorized access
     PROXY_API_KEY="your-secret-proxy-key"
-    # Add one or more API keys from your chosen provider (e.g., Gemini)
-    # The keys will be tried in order.
     GEMINI_API_KEY_1="your-gemini-api-key-1"
     GEMINI_API_KEY_2="your-gemini-api-key-2"
-    # ...and so on
     ```
 ## Running the Proxy
-To run the proxy application:
 ```bash
 uvicorn src.proxy_app.main:app --reload
 ```
 The proxy will be available at `http://127.0.0.1:8000`.
 ## Using the Proxy
-You can make requests to the proxy as if it were the OpenAI API. Make sure to include your `PROXY_API_KEY` in the `Authorization` header.
-### Example with `curl`:
 ```bash
 curl -X POST http://127.0.0.1:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer your-secret-proxy-key" \
 -d '{
-    "model": "gemini/gemini-pro",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}],
-    "stream": false
 }'
 ```
-### Example with Python `requests`:
 ```python
 import requests
 import json
@@ -123,16 +137,24 @@ headers = {
 }
 data = {
-    "model": "gemini/gemini-pro",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}],
-    "stream": False
 }
 response = requests.post(proxy_url, headers=headers, data=json.dumps(data))
 print(response.json())
 ```
 ## Using the Library in Other Projects
-The `rotating-api-key-client` library is designed to be reusable. You can find more information on how to use it in its own `README.md` file located at `src/rotator_library/README.md`.

 # API Key Proxy with Rotating Key Library
+This project provides a robust solution for managing and rotating API keys for various Large Language Model (LLM) providers. It consists of two main components:
 1.  A reusable Python library (`rotating-api-key-client`) for intelligently rotating API keys.
+2.  A FastAPI proxy application that uses this library to provide an OpenAI-compatible endpoint.
 ## Features
+-   **Smart Key Rotation**: Intelligently selects the least-used API key to distribute request loads evenly.
+-   **Automatic Retries**: Automatically retries requests on transient server errors (e.g., 5xx status codes).
+-   **Key Cooldowns**: Temporarily disables keys that encounter rate limits or authentication errors to prevent further issues.
+-   **Usage Tracking**: Monitors daily and global usage for each API key.
+-   **Provider Agnostic**: Compatible with any provider supported by `litellm`.
+-   **OpenAI-Compatible Proxy**: Offers a familiar API interface for seamless interaction with different models.
+## How It Works
+The core of this project is the `RotatingClient` library, which manages a pool of API keys. When a request is made, the client:
+1.  **Selects the Best Key**: It identifies the key with the lowest usage count for the requested model that is not currently in a cooldown period.
+2.  **Makes the Request**: It uses the selected key to make the API call via `litellm`.
+3.  **Handles Errors**:
+    -   If a **transient server error** (like a 500 error) occurs, it waits and retries the request with the same key.
+    -   If a **rotation error** (like a rate limit or invalid key error) occurs, it places the key on a temporary cooldown and selects a new key for the next attempt. An **unrecoverable error** (like invalid request parameters) is raised immediately instead.
+4.  **Tracks Usage**: On a successful request, it records the usage for the key.
+The FastAPI proxy application exposes this functionality through an API endpoint that mimics the OpenAI API, making it easy to integrate with existing tools and applications.
 ## Project Structure
 │       ├── error_handler.py
 │       ├── failure_logger.py
 │       ├── usage_manager.py
+│       ├── providers/
+│       └── ...
 ├── .env.example
 ├── README.md
 └── requirements.txt
 ```
 ## Setup and Installation
 1.  **Clone the repository:**
     ```bash
     git clone <repository-url>
     cd <repository-name>
     ```
 2.  **Create a virtual environment:**
     ```bash
     python -m venv venv
     source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
     ```
+3.  **Install dependencies:**
+    The `requirements.txt` file includes all necessary packages and installs the `rotator_library` in editable mode (`-e`), allowing for simultaneous development of the library and the proxy.
     ```bash
     pip install -r requirements.txt
     ```
 4.  **Configure environment variables:**
+    Create a `.env` file by copying the example file:
     ```bash
     cp .env.example .env
     ```
+    Edit the `.env` file to add your API keys. The proxy automatically detects keys for different providers based on the naming convention `PROVIDER_API_KEY_N`.
+    ```env
+    # A secret key to protect your proxy from unauthorized access
     PROXY_API_KEY="your-secret-proxy-key"
+    # Add API keys for each provider. They will be rotated automatically.
     GEMINI_API_KEY_1="your-gemini-api-key-1"
     GEMINI_API_KEY_2="your-gemini-api-key-2"
+    OPENAI_API_KEY_1="your-openai-api-key-1"
     ```
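The `PROVIDER_API_KEY_N` convention maps onto the `Dict[str, List[str]]` that `RotatingClient` accepts as `api_keys`. A minimal sketch of that discovery step, assuming the provider name is the lower-cased prefix and keys are ordered by `N`; `collect_api_keys` is a hypothetical helper, and the proxy's own parsing may differ.

```python
import os
import re
from typing import Dict, List, Mapping, Optional, Tuple

# GEMINI_API_KEY_1 -> provider "GEMINI", n "1"; PROXY_API_KEY has no numeric suffix and is skipped.
KEY_PATTERN = re.compile(r"^(?P<provider>[A-Z0-9_]+?)_API_KEY_(?P<n>\d+)$")


def collect_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    env = os.environ if env is None else env
    found: Dict[str, List[Tuple[int, str]]] = {}
    for name, value in env.items():
        match = KEY_PATTERN.match(name)
        if match and value:
            provider = match.group("provider").lower()
            found.setdefault(provider, []).append((int(match.group("n")), value))
    # Order each provider's keys by their numeric suffix, then drop the index.
    return {p: [v for _, v in sorted(items)] for p, items in found.items()}
```

With `python-dotenv` (already in `requirements.txt`), calling `load_dotenv()` before `collect_api_keys()` would populate `os.environ` from the `.env` file.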
 ## Running the Proxy
+To start the proxy application, run the following command:
 ```bash
 uvicorn src.proxy_app.main:app --reload
 ```
 The proxy will be available at `http://127.0.0.1:8000`.
 ## Using the Proxy
+You can make requests to the proxy as if it were the OpenAI API. Remember to include your `PROXY_API_KEY` in the `Authorization` header.
+The `model` parameter must be specified in the format `provider/model_name` (e.g., `gemini/gemini-2.5-flash-preview-05-20`, `openai/gpt-4`).
+### Example with `curl` (Non-Streaming):
 ```bash
 curl -X POST http://127.0.0.1:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer your-secret-proxy-key" \
 -d '{
+    "model": "gemini/gemini-2.5-flash-preview-05-20",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}]
 }'
 ```
+### Example with `curl` (Streaming):
+```bash
+curl -X POST http://127.0.0.1:8000/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer your-secret-proxy-key" \
+-d '{
+    "model": "gemini/gemini-2.5-flash-preview-05-20",
+    "messages": [{"role": "user", "content": "Write a short story about a robot."}],
+    "stream": true
+}'
+```
+### Example with Python `requests`:
 ```python
 import requests
 import json
 }
 data = {
+    "model": "gemini/gemini-2.5-flash-preview-05-20",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}]
 }
 response = requests.post(proxy_url, headers=headers, data=json.dumps(data))
 print(response.json())
 ```
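For the streaming request, a Python client can read the body line by line. This sketch assumes the proxy emits OpenAI-style SSE lines (`data: {...}` chunks, optionally ending with `data: [DONE]`); `iter_sse_chunks` and `stream_chat` are hypothetical helpers. The network part relies on `requests`' `stream=True` and `Response.iter_lines()`.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSON payloads from SSE `data:` lines, stopping at [DONE]."""
    for line in lines:
        if not line or not line.startswith("data:"):
            continue  # blank separators and other SSE fields carry no payload
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def stream_chat(proxy_url: str, proxy_key: str, model: str, prompt: str) -> str:
    import requests  # third-party; imported lazily so the parser above needs only stdlib

    resp = requests.post(
        proxy_url,
        headers={"Authorization": f"Bearer {proxy_key}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}], "stream": True},
        stream=True,
    )
    resp.raise_for_status()
    resp.encoding = "utf-8"  # text/event-stream without a charset may otherwise decode as ISO-8859-1
    parts = []
    for chunk in iter_sse_chunks(resp.iter_lines(decode_unicode=True)):
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content") or ""
            print(content, end="", flush=True)  # show text as it arrives
            parts.append(content)
    return "".join(parts)
```

Usage, against the local proxy started above: `stream_chat("http://127.0.0.1:8000/v1/chat/completions", "your-secret-proxy-key", "gemini/gemini-2.5-flash-preview-05-20", "Write a short story about a robot.")`.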
+## Troubleshooting
+-   **`401 Unauthorized`**: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization` header of your request.
+-   **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys or a problem with the provider's service.
+-   **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory for details on why the failures occurred.
 ## Using the Library in Other Projects
+The `rotating-api-key-client` is a standalone library that can be integrated into any Python project. For detailed documentation on how to use it, please refer to its `README.md` file located at `src/rotator_library/README.md`.
+## Detailed Documentation
+For a more in-depth technical explanation of the `rotating-api-key-client` library's architecture, components, and internal workings, please refer to the [Technical Documentation](DOCUMENTATION.md).

requirements.txt CHANGED

@@ -1,4 +1,14 @@
+# FastAPI framework for building the proxy server
 fastapi
+# ASGI server for running the FastAPI application
 uvicorn
+# For loading environment variables from a .env file
 python-dotenv
+# Installs the local rotator_library in editable mode
 -e src/rotator_library
+# A library for calling LLM APIs with a consistent format
+litellm

src/rotator_library/README.md CHANGED

@@ -1,6 +1,6 @@
 # Rotating API Key Client
-A simple, thread-safe client that intelligently rotates and retries API keys for use with `litellm`.
 ## Features
@@ -9,45 +9,135 @@ A simple, thread-safe client that intelligently rotates and retries API keys for
 -   **Cooldowns**: Puts keys on a temporary cooldown after rate limit or authentication errors.
 -   **Usage Tracking**: Tracks daily and global usage for each key.
 -   **Provider Agnostic**: Works with any provider supported by `litellm`.
 ## Installation
-To install the library, you can install it directly from a Git repository or a local path.
-### From a local path:
 ```bash
 pip install -e .
 ```
-## Usage
-Here's a simple example of how to use the `RotatingClient`:
 ```python
 import asyncio
 from rotating_api_key_client import RotatingClient
 async def main():
-    # List of your API keys
-    api_keys = ["key1", "key2", "key3"]
-    # Initialize the client
     client = RotatingClient(api_keys=api_keys)
-    # Make a request
     response = await client.acompletion(
-        model="gemini/gemini-pro",
-        messages=[{"role": "user", "content": "Hello, how are you?"}]
     )
     print(response)
-if __name__ == "__main__":
-    asyncio.run(main())
 ```
-By default, the client will store usage data in a `key_usage.json` file in the current working directory. You can customize this by passing the `usage_file_path` parameter:
 ```python
-client = RotatingClient(api_keys=api_keys, usage_file_path="/path/to/your/usage.json")

 # Rotating API Key Client
+A simple, thread-safe client that intelligently rotates and retries API keys for use with `litellm`. This library is designed to make your interactions with LLM providers more resilient and efficient.
 ## Features
 -   **Cooldowns**: Puts keys on a temporary cooldown after rate limit or authentication errors.
 -   **Usage Tracking**: Tracks daily and global usage for each key.
 -   **Provider Agnostic**: Works with any provider supported by `litellm`.
+-   **Extensible**: Easily add support for new providers through a plugin-based architecture.
 ## Installation
+You can install the library directly from a local path, which is recommended for development.
 ```bash
+# The -e flag installs it in "editable" mode
 pip install -e .
 ```
+## `RotatingClient` Class
+This is the main class for interacting with the library.
+### Initialization
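+```python
+from typing import Dict, List
+from rotating_api_key_client import RotatingClient
+# Signature: RotatingClient(api_keys: Dict[str, List[str]], max_retries: int = 2,
+#                           usage_file_path: str = "key_usage.json")
+api_keys: Dict[str, List[str]] = {"gemini": ["key1", "key2"]}
+client = RotatingClient(
+    api_keys=api_keys,
+    max_retries=2,
+    usage_file_path="key_usage.json",
+)
+```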
+-   `api_keys`: A dictionary where keys are provider names (e.g., `"openai"`, `"gemini"`) and values are lists of API keys for that provider.
+-   `max_retries`: The number of times to retry a request with the *same key* if a transient server error occurs.
+-   `usage_file_path`: The path to the JSON file where key usage data will be stored.
+### Methods
+#### `async def acompletion(self, **kwargs) -> Any:`
+This is the primary method for making API calls. It's a wrapper around `litellm.acompletion` that adds key rotation and retry logic.
+-   **Parameters**: Accepts the same keyword arguments as `litellm.acompletion` (e.g., `messages`, `stream`). The `model` parameter is required and must be a string in the format `provider/model_name` (e.g., `"gemini/gemini-2.5-flash-preview-05-20"`).
+-   **Returns**:
+    -   For non-streaming requests, it returns the `litellm` response object.
+    -   For streaming requests, it returns an async generator that yields OpenAI-compatible Server-Sent Events (SSE).
+**Example:**
 ```python
 import asyncio
 from rotating_api_key_client import RotatingClient
 async def main():
+    api_keys = {"gemini": ["key1", "key2"]}
     client = RotatingClient(api_keys=api_keys)
     response = await client.acompletion(
+        model="gemini/gemini-2.5-flash-preview-05-20",
+        messages=[{"role": "user", "content": "Hello!"}]
     )
     print(response)
+asyncio.run(main())
 ```
+#### `def token_count(self, model: str, text: str = None, messages: List[Dict[str, str]] = None) -> int:`
+Calculates the token count for a given text or list of messages using `litellm.token_counter`.
+The `model` parameter is required and must be a string in the format `provider/model_name` (e.g., `"gemini/gemini-2.5-flash-preview-05-20"`).
+**Example:**
 ```python
+count = client.token_count(
+    model="gemini/gemini-2.5-flash-preview-05-20",
+    messages=[{"role": "user", "content": "Count these tokens."}]
+)
+print(f"Token count: {count}")
+```
+#### `async def get_available_models(self, provider: str) -> List[str]:`
+Fetches a list of available models for a specific provider. Results are cached.
+#### `async def get_all_available_models(self) -> Dict[str, List[str]]:`
+Fetches a dictionary of all available models, grouped by provider.
+## Error Handling and Cooldowns
+The client is designed to handle errors gracefully:
+-   **Server Errors (`5xx`)**: The client will retry the request with the *same key* up to `max_retries` times.
+-   **Rate Limit / Auth Errors**: These are considered "rotation" errors. The client will immediately place the failing key on a temporary cooldown and try the request again with a different key.
+-   **Unrecoverable Errors**: For critical errors, the client will fail fast and raise the exception.
+Cooldowns are managed by the `UsageManager` and prevent failing keys from being used repeatedly.
+## Extending with Provider Plugins
+You can add support for fetching model lists from new providers by creating a custom provider plugin.
+1.  **Create a new provider file** in `src/rotator_library/providers/`, for example, `my_provider.py`.
+2.  **Implement the `ProviderPlugin` interface**:
+    ```python
+    # src/rotator_library/providers/my_provider.py
+    from .provider_interface import ProviderPlugin
+    from typing import List
+    class MyProvider(ProviderPlugin):
+        async def get_models(self, api_key: str) -> List[str]:
+            # Fetch the provider's model list using `api_key` and return the names,
+            # e.g. ["my-provider/model-1", "my-provider/model-2"]
+            return []
+    ```
+3.  **Register the plugin** in `src/rotator_library/providers/__init__.py`:
+    ```python
+    # src/rotator_library/providers/__init__.py
+    from .openai_provider import OpenAIProvider
+    from .gemini_provider import GeminiProvider
+    from .my_provider import MyProvider # Import your new provider
+    PROVIDER_PLUGINS = {
+        "openai": OpenAIProvider,
+        "gemini": GeminiProvider,
+        "my_provider": MyProvider, # Add it to the dictionary
+    }
+    ```
+The `RotatingClient` will automatically use your new plugin when `get_available_models` is called for `"my_provider"`.
+## Detailed Documentation
+For a more in-depth technical explanation of the `rotating-api-key-client` library's architecture, components, and internal workings, please refer to the [Technical Documentation](../../DOCUMENTATION.md).