---
title: Llm Api Proxy
emoji: π
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# Universal LLM API Proxy & Resilience Library

[Ko-fi](https://ko-fi.com/C0C0UZS4P) · [DeepWiki](https://deepwiki.com/Mirrowel/LLM-API-Key-Proxy) · [Zread](https://zread.ai/Mirrowel/LLM-API-Key-Proxy)
**One proxy. Any LLM provider. Zero code changes.**

A self-hosted proxy that provides a single, OpenAI-compatible API endpoint for all your LLM providers. Works with any application that supports custom OpenAI base URLs – no code changes required in your existing tools.

This project consists of two components:

1. **The API Proxy** – a FastAPI application providing a universal `/v1/chat/completions` endpoint
2. **The Resilience Library** – a reusable Python library for intelligent API key management, rotation, and failover

---

## Why Use This?

- **Universal Compatibility** – works with any app supporting OpenAI-compatible APIs: Opencode, Continue, Roo/Kilo Code, JanitorAI, SillyTavern, custom applications, and more
- **One Endpoint, Many Providers** – configure Gemini, OpenAI, Anthropic, and [any LiteLLM-supported provider](https://docs.litellm.ai/docs/providers) once, then access them all through a single API key
- **Built-in Resilience** – automatic key rotation, failover on errors, rate limit handling, and intelligent cooldowns
- **Exclusive Provider Support** – includes custom providers not available elsewhere: **Antigravity** (Gemini 3 + Claude Sonnet/Opus 4.5), **Gemini CLI**, **Qwen Code**, and **iFlow**

---
## Quick Start

### Windows

1. **Download** the latest release from [GitHub Releases](https://github.com/Mirrowel/LLM-API-Key-Proxy/releases/latest)
2. **Unzip** the downloaded file
3. **Run** `proxy_app.exe` – the interactive TUI launcher opens

<!-- TODO: Add TUI main menu screenshot here -->

### macOS / Linux

```bash
# Download and extract the release for your platform
chmod +x proxy_app
./proxy_app
```

### From Source

```bash
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python src/proxy_app/main.py
```

> **Tip:** Running with command-line arguments (e.g., `--host 0.0.0.0 --port 8000`) bypasses the TUI and starts the proxy directly.
---

## Connecting to the Proxy

Once the proxy is running, configure your application with these settings:

| Setting | Value |
|---------|-------|
| **Base URL / API Endpoint** | `http://127.0.0.1:8000/v1` |
| **API Key** | Your `PROXY_API_KEY` |

### Model Format: `provider/model_name`

**Important:** Models must be specified in the format `provider/model_name`. The `provider/` prefix tells the proxy which backend to route the request to.

```
gemini/gemini-2.5-flash             → Gemini API
openai/gpt-4o                       → OpenAI API
anthropic/claude-3-5-sonnet         → Anthropic API
openrouter/anthropic/claude-3-opus  → OpenRouter
gemini_cli/gemini-2.5-pro           → Gemini CLI (OAuth)
antigravity/gemini-3-pro-preview    → Antigravity (Gemini 3, Claude Opus 4.5)
```
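
Conceptually, routing depends only on the first path segment. A minimal sketch of the idea (illustrative, not the proxy's actual routing code):

```python
# Split "provider/model_name" on the first slash only, so nested names
# like "openrouter/anthropic/claude-3-opus" keep their full model path.
provider, model_name = "openrouter/anthropic/claude-3-opus".split("/", 1)
print(provider)    # "openrouter" -> selects the backend
print(model_name)  # "anthropic/claude-3-opus" -> passed through to the provider
```
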
### Usage Examples

<details>
<summary><b>Python (OpenAI Library)</b></summary>

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="your-proxy-api-key"
)

response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # provider/model format
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

</details>

<details>
<summary><b>curl</b></summary>

```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```

</details>

<details>
<summary><b>JanitorAI / SillyTavern / Other Chat UIs</b></summary>

1. Go to **API Settings**
2. Select **"Proxy"** or **"Custom OpenAI"** mode
3. Configure:
   - **API URL:** `http://127.0.0.1:8000/v1`
   - **API Key:** Your `PROXY_API_KEY`
   - **Model:** `provider/model_name` (e.g., `gemini/gemini-2.5-flash`)
4. Save and start chatting

</details>

<details>
<summary><b>Continue / Cursor / IDE Extensions</b></summary>

In your configuration file (e.g., `config.json`):

```json
{
  "models": [{
    "title": "Gemini via Proxy",
    "provider": "openai",
    "model": "gemini/gemini-2.5-flash",
    "apiBase": "http://127.0.0.1:8000/v1",
    "apiKey": "your-proxy-api-key"
  }]
}
```

</details>
### API Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /` | Status check – confirms the proxy is running |
| `POST /v1/chat/completions` | Chat completions (main endpoint) |
| `POST /v1/embeddings` | Text embeddings |
| `GET /v1/models` | List all available models with pricing & capabilities |
| `GET /v1/models/{model_id}` | Get details for a specific model |
| `GET /v1/providers` | List configured providers |
| `POST /v1/token-count` | Calculate token count for a payload |
| `POST /v1/cost-estimate` | Estimate cost based on token counts |

> **Tip:** The `/v1/models` endpoint is useful for discovering available models in your client. Many apps can fetch this list automatically. Add `?enriched=false` for a minimal response without pricing data.
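
You can also discover models from a script. A minimal sketch using `requests`, assuming the endpoint returns an OpenAI-style `{"data": [...]}` list:

```python
import requests

BASE_URL = "http://127.0.0.1:8000/v1"
HEADERS = {"Authorization": "Bearer your-proxy-api-key"}

# Fetch the minimal model list (no pricing metadata)
resp = requests.get(f"{BASE_URL}/models", params={"enriched": "false"}, headers=HEADERS)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])  # e.g. "gemini/gemini-2.5-flash"
```
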
---

## Managing Credentials

The proxy includes an interactive tool for managing all your API keys and OAuth credentials.

### Using the TUI

<!-- TODO: Add TUI credentials menu screenshot here -->

1. Run the proxy without arguments to open the TUI
2. Select **"Manage Credentials"**
3. Choose to add API keys or OAuth credentials

### Using the Command Line

```bash
python -m rotator_library.credential_tool
```

### Credential Types

| Type | Providers | How to Add |
|------|-----------|------------|
| **API Keys** | Gemini, OpenAI, Anthropic, OpenRouter, Groq, Mistral, NVIDIA, Cohere, Chutes | Enter key in TUI or add to `.env` |
| **OAuth** | Gemini CLI, Antigravity, Qwen Code, iFlow | Interactive browser login via credential tool |
### The `.env` File

Credentials are stored in a `.env` file. You can edit it directly or use the TUI:

```env
# Required: Authentication key for YOUR proxy
PROXY_API_KEY="your-secret-proxy-key"

# Provider API Keys (add multiple with _1, _2, etc.)
GEMINI_API_KEY_1="your-gemini-key"
GEMINI_API_KEY_2="another-gemini-key"
OPENAI_API_KEY_1="your-openai-key"
ANTHROPIC_API_KEY_1="your-anthropic-key"
```

> Copy `.env.example` to `.env` as a starting point.
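
The numbered-suffix convention also makes keys easy to collect programmatically. An illustrative sketch (not the library's actual loader) of how `<PROVIDER>_API_KEY_<N>` variables map to a `{provider: [keys]}` dict like the one used in the library example below:

```python
import os
import re
from collections import defaultdict

def discover_api_keys(environ=os.environ):
    """Collect <PROVIDER>_API_KEY_<N> variables into {"provider": [keys...]}."""
    pattern = re.compile(r"^([A-Z0-9_]+)_API_KEY_(\d+)$")
    keys = defaultdict(list)
    for name, value in sorted(environ.items()):  # sorted so _1 precedes _2
        match = pattern.match(name)
        if match and value:
            keys[match.group(1).lower()].append(value)
    return dict(keys)

# GEMINI_API_KEY_1 + GEMINI_API_KEY_2 -> {"gemini": ["key-one", "key-two"]}
```
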
---

## The Resilience Library

The proxy is powered by a standalone Python library that you can use directly in your own applications.

### Key Features

- **Async-native** with `asyncio` and `httpx`
- **Intelligent key selection** with tiered, model-aware locking
- **Deadline-driven requests** with a configurable global timeout
- **Automatic failover** between keys on errors
- **OAuth support** for Gemini CLI, Antigravity, Qwen, iFlow
- **Stateless deployment ready** – load credentials from environment variables

### Basic Usage

```python
from rotator_library import RotatingClient

client = RotatingClient(
    api_keys={"gemini": ["key1", "key2"], "openai": ["key3"]},
    global_timeout=30,
    max_retries=2
)

async with client:
    response = await client.acompletion(
        model="gemini/gemini-2.5-flash",
        messages=[{"role": "user", "content": "Hello!"}]
    )
```
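
Streaming is covered in the library README. As a rough sketch, assuming the library follows the OpenAI/LiteLLM convention where `stream=True` returns an async iterator of delta chunks:

```python
# Assumption: stream=True yields OpenAI-style chunks with choices[0].delta.
async with client:
    stream = await client.acompletion(
        model="gemini/gemini-2.5-flash",
        messages=[{"role": "user", "content": "Tell me a story."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```
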
### Library Documentation

See the [Library README](src/rotator_library/README.md) for complete documentation, including:

- All initialization parameters
- Streaming support
- Error handling and cooldown strategies
- Provider plugin system
- Credential prioritization

---

## Interactive TUI

The proxy includes a powerful text-based UI for configuration and management.

<!-- TODO: Add TUI main menu screenshot here -->

### TUI Features

- **Run Proxy** – start the server with saved settings
- **Configure Settings** – host, port, API key, request logging
- **Manage Credentials** – add/edit API keys and OAuth credentials
- **View Status** – see configured providers and credential counts
- **Advanced Settings** – custom providers, model definitions, concurrency

### Configuration Files

| File | Contents |
|------|----------|
| `.env` | All credentials and advanced settings |
| `launcher_config.json` | TUI-specific settings (host, port, logging) |
---

## Features

### Core Capabilities

- **Universal OpenAI-compatible endpoint** for all providers
- **Multi-provider support** via [LiteLLM](https://docs.litellm.ai/docs/providers) fallback
- **Automatic key rotation** and load balancing
- **Interactive TUI** for easy configuration
- **Detailed request logging** for debugging

<details>
<summary><b>Resilience & High Availability</b></summary>

- **Global timeout** with deadline-driven retries
- **Escalating cooldowns** per model (10s → 30s → 60s → 120s)
- **Key-level lockouts** for consistently failing keys
- **Stream error detection** and graceful recovery
- **Batch embedding aggregation** for improved throughput
- **Automatic daily resets** for cooldowns and usage stats

</details>

<details>
<summary><b>Credential Management</b></summary>

- **Auto-discovery** of API keys from environment variables
- **OAuth discovery** from standard paths (`~/.gemini/`, `~/.qwen/`, `~/.iflow/`)
- **Duplicate detection** warns when the same account is added multiple times
- **Credential prioritization** – paid tier used before free tier
- **Stateless deployment** – export OAuth to environment variables
- **Local-first storage** – credentials isolated in the `oauth_creds/` directory

</details>

<details>
<summary><b>Advanced Configuration</b></summary>

- **Model whitelists/blacklists** with wildcard support
- **Per-provider concurrency limits** (`MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>`)
- **Rotation modes** – balanced (distribute load) or sequential (use until exhausted)
- **Priority multipliers** – higher concurrency for paid credentials
- **Model quota groups** – shared cooldowns for related models
- **Temperature override** – prevent tool hallucination issues
- **Weighted random rotation** – unpredictable selection patterns

</details>

<details>
<summary><b>Provider-Specific Features</b></summary>

**Gemini CLI:**

- Zero-config Google Cloud project discovery
- Internal API access with higher rate limits
- Automatic fallback to preview models on rate limit
- Paid vs. free tier detection

**Antigravity:**

- Gemini 3 Pro with `thinkingLevel` support
- Gemini 2.5 Flash/Flash Lite with thinking mode
- Claude Opus 4.5 (thinking mode)
- Claude Sonnet 4.5 (thinking and non-thinking)
- GPT-OSS 120B Medium
- Thought signature caching for multi-turn conversations
- Tool hallucination prevention
- Quota baseline tracking with background refresh
- Parallel tool usage instruction injection
- **Quota Groups**: Models that share quota are automatically grouped:
  - Claude/GPT-OSS: `claude-sonnet-4-5`, `claude-opus-4-5`, `gpt-oss-120b-medium`
  - Gemini 3 Pro: `gemini-3-pro-high`, `gemini-3-pro-low`, `gemini-3-pro-preview`
  - Gemini 2.5 Flash: `gemini-2.5-flash`, `gemini-2.5-flash-thinking`, `gemini-2.5-flash-lite`
  - All models in a group draw down the shared quota equally, so within the Claude group it is most efficient to use only Opus and skip Sonnet and GPT-OSS.

**Qwen Code:**

- Dual auth (API key + OAuth Device Flow)
- `<think>` tag parsing as `reasoning_content`
- Tool schema cleaning

**iFlow:**

- Dual auth (API key + OAuth Authorization Code)
- Hybrid auth with separate API key fetch
- Tool schema cleaning

**NVIDIA NIM:**

- Dynamic model discovery
- DeepSeek thinking support

</details>

<details>
<summary><b>Logging & Debugging</b></summary>

- **Per-request file logging** with `--enable-request-logging`
- **Unique request directories** with full transaction details
- **Streaming chunk capture** for debugging
- **Performance metadata** (duration, tokens, model used)
- **Provider-specific logs** for Qwen, iFlow, Antigravity

</details>
---

## Advanced Configuration

<details>
<summary><b>Environment Variables Reference</b></summary>

### Proxy Settings

| Variable | Description | Default |
|----------|-------------|---------|
| `PROXY_API_KEY` | Authentication key for your proxy | Required |
| `OAUTH_REFRESH_INTERVAL` | Token refresh check interval (seconds) | `600` |
| `SKIP_OAUTH_INIT_CHECK` | Skip interactive OAuth setup on startup | `false` |

### Per-Provider Settings

| Pattern | Description | Example |
|---------|-------------|---------|
| `<PROVIDER>_API_KEY_<N>` | API key for a provider | `GEMINI_API_KEY_1` |
| `MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>` | Concurrent request limit | `MAX_CONCURRENT_REQUESTS_PER_KEY_OPENAI=3` |
| `ROTATION_MODE_<PROVIDER>` | `balanced` or `sequential` | `ROTATION_MODE_GEMINI=sequential` |
| `IGNORE_MODELS_<PROVIDER>` | Blacklist (comma-separated, supports `*`) | `IGNORE_MODELS_OPENAI=*-preview*` |
| `WHITELIST_MODELS_<PROVIDER>` | Whitelist (overrides blacklist) | `WHITELIST_MODELS_GEMINI=gemini-2.5-pro` |

### Advanced Features

| Variable | Description |
|----------|-------------|
| `ROTATION_TOLERANCE` | `0.0` = deterministic, `3.0` = weighted random (default) |
| `CONCURRENCY_MULTIPLIER_<PROVIDER>_PRIORITY_<N>` | Concurrency multiplier per priority tier |
| `QUOTA_GROUPS_<PROVIDER>_<GROUP>` | Models sharing quota limits |
| `OVERRIDE_TEMPERATURE_ZERO` | `remove` or `set` to prevent tool hallucination |

</details>
<details>
<summary><b>Model Filtering (Whitelists & Blacklists)</b></summary>

Control which models are exposed through your proxy.

### Blacklist Only

```env
# Hide all preview models
IGNORE_MODELS_OPENAI="*-preview*"
```

### Pure Whitelist Mode

```env
# Block all, then allow specific models
IGNORE_MODELS_GEMINI="*"
WHITELIST_MODELS_GEMINI="gemini-2.5-pro,gemini-2.5-flash"
```

### Exemption Mode

```env
# Block preview models, but allow one specific preview
IGNORE_MODELS_OPENAI="*-preview*"
WHITELIST_MODELS_OPENAI="gpt-4o-2024-08-06-preview"
```

**Logic order:** Whitelist check → Blacklist check → Default allow
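
That order works like the following sketch (an illustration using shell-style wildcards via `fnmatch`, not the library's actual implementation):

```python
from fnmatch import fnmatch

def is_model_allowed(model: str, whitelist: list[str], blacklist: list[str]) -> bool:
    """Mirror the documented order: whitelist -> blacklist -> default allow."""
    if any(fnmatch(model, pat) for pat in whitelist):
        return True   # whitelist overrides any blacklist entry
    if any(fnmatch(model, pat) for pat in blacklist):
        return False  # blacklisted and not exempted
    return True       # default allow

# Exemption mode from the example above:
print(is_model_allowed("gpt-4o-2024-08-06-preview",
                       ["gpt-4o-2024-08-06-preview"], ["*-preview*"]))  # True
```
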
</details>

<details>
<summary><b>Concurrency & Rotation Settings</b></summary>

### Concurrency Limits

```env
# Allow 3 concurrent requests per OpenAI key
MAX_CONCURRENT_REQUESTS_PER_KEY_OPENAI=3

# Default is 1 (no concurrency)
MAX_CONCURRENT_REQUESTS_PER_KEY_GEMINI=1
```

### Rotation Modes

```env
# balanced (default): distribute load evenly - best for per-minute rate limits
ROTATION_MODE_OPENAI=balanced

# sequential: use each key until exhausted - best for daily/weekly quotas
ROTATION_MODE_GEMINI=sequential
```
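
Conceptually, the two modes pick keys like this (an illustrative sketch, not the library's selection logic):

```python
def pick_balanced(keys, active_requests):
    """balanced: route each request to the least-loaded key."""
    return min(keys, key=lambda k: active_requests[k])

def pick_sequential(keys, exhausted):
    """sequential: stay on the first key until its quota is exhausted."""
    for key in keys:
        if key not in exhausted:
            return key
    raise RuntimeError("all keys exhausted")
```
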
### Priority Multipliers

Paid credentials can handle more concurrent requests:

```env
# Priority 1 (paid ultra): 10x concurrency
CONCURRENCY_MULTIPLIER_ANTIGRAVITY_PRIORITY_1=10

# Priority 2 (standard paid): 3x
CONCURRENCY_MULTIPLIER_ANTIGRAVITY_PRIORITY_2=3
```
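
The effective per-credential limit is simply the base limit scaled by the tier's multiplier; for example, assuming a base limit of 1:

```python
base_limit = 1                # MAX_CONCURRENT_REQUESTS_PER_KEY_ANTIGRAVITY (assumed)
multipliers = {1: 10, 2: 3}   # priority tier -> multiplier from the env vars above

effective = {tier: base_limit * m for tier, m in multipliers.items()}
print(effective)  # {1: 10, 2: 3} -> a priority-1 credential serves 10 concurrent requests
```
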
### Model Quota Groups

Models sharing quota limits:

```env
# Claude models share quota - when one hits its limit, both cool down
QUOTA_GROUPS_ANTIGRAVITY_CLAUDE="claude-sonnet-4-5,claude-opus-4-5"
```

</details>

<details>
<summary><b>Timeout Configuration</b></summary>

Fine-grained control over HTTP timeouts:

```env
TIMEOUT_CONNECT=30              # Connection establishment
TIMEOUT_WRITE=30                # Request body send
TIMEOUT_POOL=60                 # Connection pool acquisition
TIMEOUT_READ_STREAMING=180      # Between streaming chunks (3 min)
TIMEOUT_READ_NON_STREAMING=600  # Full response wait (10 min)
```

**Recommendations:**

- Long thinking tasks: increase `TIMEOUT_READ_STREAMING` to 300-360s
- Unstable network: increase `TIMEOUT_CONNECT` to 60s
- Large outputs: increase `TIMEOUT_READ_NON_STREAMING` to 900s+
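
Since the library is built on `httpx`, these settings correspond to httpx's four timeout phases. A sketch of the mapping (the exact wiring inside the library is an assumption here):

```python
import os
import httpx

# Map the .env values onto httpx's connect/write/pool/read timeout phases.
timeout = httpx.Timeout(
    connect=float(os.getenv("TIMEOUT_CONNECT", "30")),
    write=float(os.getenv("TIMEOUT_WRITE", "30")),
    pool=float(os.getenv("TIMEOUT_POOL", "60")),
    read=float(os.getenv("TIMEOUT_READ_STREAMING", "180")),
)
client = httpx.AsyncClient(timeout=timeout)
```
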
</details>

---

## OAuth Providers

<details>
<summary><b>Gemini CLI</b></summary>

Uses Google OAuth to access internal Gemini endpoints with higher rate limits.

**Setup:**

1. Run `python -m rotator_library.credential_tool`
2. Select "Add OAuth Credential" → "Gemini CLI"
3. Complete browser authentication
4. Credentials are saved to `oauth_creds/gemini_cli_oauth_1.json`

**Features:**

- Zero-config project discovery
- Automatic free-tier project onboarding
- Paid vs. free tier detection
- Smart fallback on rate limits

**Environment Variables (for stateless deployment):**

```env
GEMINI_CLI_ACCESS_TOKEN="ya29.your-access-token"
GEMINI_CLI_REFRESH_TOKEN="1//your-refresh-token"
GEMINI_CLI_EXPIRY_DATE="1234567890000"
GEMINI_CLI_EMAIL="your-email@gmail.com"
GEMINI_CLI_PROJECT_ID="your-gcp-project-id"  # Optional
```

</details>

<details>
<summary><b>Antigravity (Gemini 3 + Claude Opus 4.5)</b></summary>

Access Google's internal Antigravity API for cutting-edge models.

**Supported Models:**

- **Gemini 3 Pro** – with `thinkingLevel` support (low/high)
- **Gemini 2.5 Flash** – with thinking mode support
- **Gemini 2.5 Flash Lite** – configurable thinking budget
- **Claude Opus 4.5** – Anthropic's most powerful model (thinking mode only)
- **Claude Sonnet 4.5** – supports both thinking and non-thinking modes
- **GPT-OSS 120B** – OpenAI-compatible model

**Setup:**

1. Run `python -m rotator_library.credential_tool`
2. Select "Add OAuth Credential" → "Antigravity"
3. Complete browser authentication

**Advanced Features:**

- Thought signature caching for multi-turn conversations
- Tool hallucination prevention via parameter signature injection
- Automatic thinking block sanitization for Claude
- Credential prioritization (paid resets every 5 hours, free weekly)
- Quota baseline tracking with background refresh (accurate remaining-quota estimates)
- Parallel tool usage instruction injection for Claude

**Environment Variables:**

```env
ANTIGRAVITY_ACCESS_TOKEN="ya29.your-access-token"
ANTIGRAVITY_REFRESH_TOKEN="1//your-refresh-token"
ANTIGRAVITY_EXPIRY_DATE="1234567890000"
ANTIGRAVITY_EMAIL="your-email@gmail.com"

# Feature toggles
ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
ANTIGRAVITY_GEMINI3_TOOL_FIX=true
ANTIGRAVITY_QUOTA_REFRESH_INTERVAL=300            # Quota refresh interval (seconds)
ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_CLAUDE=true # Parallel tool instruction for Claude
```

> **Note:** Gemini 3 models require a paid-tier Google Cloud project.

</details>
<details>
<summary><b>Qwen Code</b></summary>

Uses OAuth Device Flow for Qwen/Dashscope APIs.

**Setup:**

1. Run the credential tool
2. Select "Add OAuth Credential" → "Qwen Code"
3. Enter the code displayed in your browser
4. Or add an API key directly: `QWEN_CODE_API_KEY_1="your-key"`

**Features:**

- Dual auth (API key or OAuth)
- `<think>` tag parsing as `reasoning_content`
- Automatic tool schema cleaning
- Custom models via the `QWEN_CODE_MODELS` env var

</details>

<details>
<summary><b>iFlow</b></summary>

Uses the OAuth Authorization Code flow with a local callback server.

**Setup:**

1. Run the credential tool
2. Select "Add OAuth Credential" → "iFlow"
3. Complete browser authentication (callback on port 11451)
4. Or add an API key directly: `IFLOW_API_KEY_1="sk-your-key"`

**Features:**

- Dual auth (API key or OAuth)
- Hybrid auth (OAuth token fetches a separate API key)
- Automatic tool schema cleaning
- Custom models via the `IFLOW_MODELS` env var

</details>
<details>
<summary><b>Stateless Deployment (Export to Environment Variables)</b></summary>

For platforms without file persistence (Railway, Render, Vercel):

1. **Set up credentials locally:**

   ```bash
   python -m rotator_library.credential_tool
   # Complete OAuth flows
   ```

2. **Export to environment variables:**

   ```bash
   python -m rotator_library.credential_tool
   # Select "Export [Provider] to .env"
   ```

3. **Copy the generated variables to your platform:**
   The tool creates files like `gemini_cli_credential_1.env` containing all necessary variables.

4. **Set `SKIP_OAUTH_INIT_CHECK=true`** to skip interactive validation on startup.

</details>

<details>
<summary><b>OAuth Callback Port Configuration</b></summary>

Customize OAuth callback ports if the defaults conflict:

| Provider | Default Port | Environment Variable |
|----------|--------------|----------------------|
| Gemini CLI | 8085 | `GEMINI_CLI_OAUTH_PORT` |
| Antigravity | 51121 | `ANTIGRAVITY_OAUTH_PORT` |
| iFlow | 11451 | `IFLOW_OAUTH_PORT` |

</details>
---

## Deployment

<details>
<summary><b>Command-Line Arguments</b></summary>

```bash
python src/proxy_app/main.py [OPTIONS]

Options:
  --host TEXT                Host to bind (default: 0.0.0.0)
  --port INTEGER             Port to run on (default: 8000)
  --enable-request-logging   Enable detailed per-request logging
  --add-credential           Launch interactive credential setup tool
```

**Examples:**

```bash
# Run on a custom port
python src/proxy_app/main.py --host 127.0.0.1 --port 9000

# Run with logging
python src/proxy_app/main.py --enable-request-logging

# Add credentials without starting the proxy
python src/proxy_app/main.py --add-credential
```

</details>
<details>
<summary><b>Render / Railway / Vercel</b></summary>

See the [Deployment Guide](Deployment%20guide.md) for complete instructions.

**Quick Setup:**

1. Fork the repository
2. Create a `.env` file with your credentials
3. Create a new Web Service pointing to your repo
4. Set the build command: `pip install -r requirements.txt`
5. Set the start command: `uvicorn src.proxy_app.main:app --host 0.0.0.0 --port $PORT`
6. Upload `.env` as a secret file

**OAuth Credentials:**
Export OAuth credentials to environment variables using the credential tool, then add them to your platform's environment settings.

</details>

<details>
<summary><b>Custom VPS / Docker</b></summary>

**Option 1: Authenticate locally, deploy credentials**

1. Complete OAuth flows on your local machine
2. Export to environment variables
3. Deploy `.env` to your server

**Option 2: SSH port forwarding**

```bash
# Forward callback ports through SSH
ssh -L 51121:localhost:51121 -L 8085:localhost:8085 user@your-vps
# Then run the credential tool on the VPS
```

**Systemd Service:**

```ini
[Unit]
Description=LLM API Key Proxy
After=network.target

[Service]
Type=simple
WorkingDirectory=/path/to/LLM-API-Key-Proxy
ExecStart=/path/to/python -m uvicorn src.proxy_app.main:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
```

See [VPS Deployment](Deployment%20guide.md#appendix-deploying-to-a-custom-vps) for the complete guide.

</details>
---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| `401 Unauthorized` | Verify `PROXY_API_KEY` matches your `Authorization: Bearer` header exactly |
| `500 Internal Server Error` | Check provider key validity; enable `--enable-request-logging` for details |
| All keys on cooldown | All keys failed recently; check `logs/detailed_logs/` for upstream errors |
| Model not found | Verify the format is `provider/model_name` (e.g., `gemini/gemini-2.5-flash`) |
| OAuth callback failed | Ensure the callback port (8085, 51121, 11451) isn't blocked by a firewall |
| Streaming hangs | Increase `TIMEOUT_READ_STREAMING`; check provider status |

**Detailed Logs:**

When `--enable-request-logging` is enabled, check `logs/detailed_logs/` for:

- `request.json` – exact request payload
- `final_response.json` – complete response or error
- `streaming_chunks.jsonl` – all SSE chunks received
- `metadata.json` – performance metrics
---

## Documentation

| Document | Description |
|----------|-------------|
| [Technical Documentation](DOCUMENTATION.md) | Architecture, internals, provider implementations |
| [Library README](src/rotator_library/README.md) | Using the resilience library directly |
| [Deployment Guide](Deployment%20guide.md) | Hosting on Render, Railway, VPS |
| [.env.example](.env.example) | Complete environment variable reference |

---

## License

This project is dual-licensed:

- **Proxy Application** (`src/proxy_app/`) – [MIT License](src/proxy_app/LICENSE)
- **Resilience Library** (`src/rotator_library/`) – [LGPL-3.0](src/rotator_library/COPYING.LESSER)