✨ Feature: support setting API key random load balancing

Files changed:
- README.md (+9 -8)
- README_CN.md (+5 -4)
- utils.py (+14 -4)
README.md

````diff
@@ -13,20 +13,20 @@
 
 ## Introduction
 
-For personal use, one/new-api is too complex with many commercial features that individuals don't need. If you don't want a complicated frontend interface and prefer support for more models, you can try uni-api. This is a project that unifies the management of large language model APIs, allowing you to call multiple backend services through a single unified API interface, converting them all to OpenAI format, and supporting load balancing. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare,
+For personal use, one/new-api is too complex with many commercial features that individuals don't need. If you don't want a complicated frontend interface and prefer support for more models, you can try uni-api. This is a project that unifies the management of large language model APIs, allowing you to call multiple backend services through a single unified API interface, converting them all to OpenAI format, and supporting load balancing. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, OpenRouter, and more.
 
 ## ✨ Features
 
 - No front-end, pure configuration file to configure API channels. You can run your own API station just by writing a file, and the documentation has a detailed configuration guide, beginner-friendly.
-- Unified management of multiple backend services, supporting providers such as OpenAI, Deepseek,
+- Unified management of multiple backend services, supporting providers such as OpenAI, Deepseek, OpenRouter, and other APIs in OpenAI format. Supports OpenAI Dalle-3 image generation.
 - Simultaneously supports Anthropic, Gemini, Vertex AI, Cohere, Groq, Cloudflare. Vertex simultaneously supports Claude and Gemini API.
 - Support OpenAI, Anthropic, Gemini, Vertex native tool use function calls.
 - Support OpenAI, Anthropic, Gemini, Vertex native image recognition API.
 - Support four types of load balancing.
   1. Supports channel-level weighted load balancing, allowing requests to be distributed according to different channel weights. It is not enabled by default and requires configuring channel weights.
   2. Support Vertex regional load balancing and high concurrency, which can increase Gemini and Claude concurrency by up to (number of APIs * number of regions) times. Automatically enabled without additional configuration.
   3. Except for Vertex region-level load balancing, all APIs support channel-level sequential load balancing, enhancing the immersive translation experience. It is not enabled by default and requires configuring `SCHEDULING_ALGORITHM` as `round_robin`.
   4. Support automatic API key-level round-robin load balancing for multiple API Keys in a single channel.
 - Support automatic retry, when an API channel response fails, automatically retry the next API channel.
 - Support channel cooling: When an API channel response fails, the channel will automatically be excluded and cooled for a period of time, and requests to the channel will be stopped. After the cooling period ends, the model will automatically be restored until it fails again, at which point it will be cooled again.
 - Support fine-grained model timeout settings, allowing different timeout durations for each model.
@@ -48,7 +48,7 @@ You must fill in the configuration file in advance to start `uni-api`, and you m
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter,
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, can be any name, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required, automatically uses base_url and api to get all available models through the /v1/models endpoint.
   # Multiple providers can be configured here, each provider can configure multiple API Keys, and each API Key can configure multiple models.
@@ -61,7 +61,7 @@ Detailed advanced configuration of `api.yaml`:
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter,
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, can be any name, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
     model: # Optional, if model is not configured, all available models will be automatically obtained through base_url and api via the /v1/models endpoint.
@@ -96,6 +96,7 @@ providers:
       # gemini-1.5-flash: 2/min
      # default: 4/min # If the model does not set the frequency limit, use the frequency limit of default
     api_key_cooldown_period: 60 # Each API Key will be cooled down for 60 seconds after encountering a 429 error. Optional, the default is 0 seconds. When set to 0, the cooling mechanism is not enabled. When there are multiple API keys, the cooling mechanism will take effect.
+    api_key_schedule_algorithm: round_robin # Set the request order of multiple API Keys, optional. The default is round_robin, and the optional values are: round_robin, random. It will take effect when there are multiple API keys. round_robin is polling load balancing, and random is random load balancing.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
````
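For instance, a channel with several keys could opt into random scheduling like this. This is a minimal sketch: the provider name and keys are placeholders, and the nesting of these options under `preferences` is an assumption inferred from the `safe_get(provider, "preferences", ...)` calls in utils.py.

```yaml
providers:
  - provider: my-openai-pool                 # hypothetical channel name
    base_url: https://api.your.com/v1/chat/completions
    api:                                     # multiple API Keys in one channel
      - sk-key-one
      - sk-key-two
      - sk-key-three
    preferences:
      api_key_schedule_algorithm: random     # or round_robin (the default)
      api_key_cooldown_period: 60            # cool a key for 60s after a 429
```

With a single string under `api`, the scheduling option has no effect; it only matters once `api` is a list.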
README_CN.md

````diff
@@ -13,12 +13,12 @@
 
 ## 介绍
 
-如果个人使用的话,one/new-api 过于复杂,有很多个人不需要使用的商用功能,如果你不想要复杂的前端界面,
+如果个人使用的话,one/new-api 过于复杂,有很多个人不需要使用的商用功能,如果你不想要复杂的前端界面,又想要支持的模型多一点,可以试试 uni-api。这是一个统一管理大模型 API 的项目,可以通过一个统一的 API 接口调用多种不同提供商的服务,统一转换为 OpenAI 格式,支持负载均衡。目前支持的后端服务有:OpenAI、Anthropic、Gemini、Vertex、Cohere、Groq、Cloudflare、OpenRouter 等。
 
 ## ✨ 特性
 
 - 无前端,纯配置文件配置 API 渠道。只要写一个文件就能运行起一个属于自己的 API 站,文档有详细的配置指南,小白友好。
-- 统一管理多个后端服务,支持 OpenAI、Deepseek、
+- 统一管理多个后端服务,支持 OpenAI、Deepseek、OpenRouter 等其他 API 是 OpenAI 格式的提供商。支持 OpenAI Dalle-3 图像生成。
 - 同时支持 Anthropic、Gemini、Vertex AI、Cohere、Groq、Cloudflare。Vertex 同时支持 Claude 和 Gemini API。
 - 支持 OpenAI、Anthropic、Gemini、Vertex 原生 tool use 函数调用。
 - 支持 OpenAI、Anthropic、Gemini、Vertex 原生识图 API。
@@ -48,7 +48,7 @@
 
 ```yaml
 providers:
-  - provider: provider_name # 服务提供商名称, 如 openai、anthropic、gemini、openrouter
+  - provider: provider_name # 服务提供商名称, 如 openai、anthropic、gemini、openrouter,随便取名字,必填
     base_url: https://api.your.com/v1/chat/completions # 后端服务的API地址,必填
     api: sk-YgS6GTi0b4bEabc4C # 提供商的API Key,必填,自动使用 base_url 和 api 通过 /v1/models 端点获取可用的所有模型。
   # 这里可以配置多个提供商,每个提供商可以配置多个 API Key,每个 API Key 可以配置多个模型。
@@ -61,7 +61,7 @@ api_keys:
 
 ```yaml
 providers:
-  - provider: provider_name # 服务提供商名称, 如 openai、anthropic、gemini、openrouter
+  - provider: provider_name # 服务提供商名称, 如 openai、anthropic、gemini、openrouter,随便取名字,必填
     base_url: https://api.your.com/v1/chat/completions # 后端服务的API地址,必填
     api: sk-YgS6GTi0b4bEabc4C # 提供商的API Key,必填
     model: # 选填,如果不配置 model,会自动通过 base_url 和 api 通过 /v1/models 端点获取可用的所有模型。
@@ -96,6 +96,7 @@ providers:
       # gemini-1.5-flash: 2/min
       # default: 4/min # 如果模型没有设置频率限制,使用 default 的频率限制
     api_key_cooldown_period: 60 # 每个 API Key 遭遇 429 错误后的冷却时间,单位为秒,选填。默认为 0 秒,当设置为 0 秒时,不启用冷却机制。当存在多个 API key 时才会生效。
+    api_key_schedule_algorithm: round_robin # 设置多个 API Key 的请求顺序,选填。默认为 round_robin,可选值有:round_robin、random。当存在多个 API key 时才会生效。round_robin 是轮询负载均衡,random 是随机负载均衡。
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # 描述: 您的Google Cloud项目ID。格式: 字符串,通常由小写字母、数字和连字符组成。获取方式: 在Google Cloud Console的项目选择器中可以找到您的项目ID。
````
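The `random` option shuffles the key order once when the list is built, using `random.sample`. A quick illustration of that primitive: `random.sample(items, len(items))` returns a new permuted list and leaves the input untouched (the key strings below are placeholders).

```python
import random

keys = ["sk-a", "sk-b", "sk-c", "sk-d"]
shuffled = random.sample(keys, len(keys))

# random.sample copies rather than shuffling in place:
# the original order is preserved, and the result is a
# permutation of the same elements.
print(keys)                              # ['sk-a', 'sk-b', 'sk-c', 'sk-d']
print(sorted(shuffled) == sorted(keys))  # True
```

This differs from `random.shuffle(keys)`, which would reorder the caller's list in place.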
utils.py

````diff
@@ -80,8 +80,16 @@ async def get_user_rate_limit(app, api_index: int = None):
 import asyncio
 
 class ThreadSafeCircularList:
-    def __init__(self, items = [], rate_limit={"default": "999999/min"}):
-        self.items = items
+    def __init__(self, items = [], rate_limit={"default": "999999/min"}, schedule_algorithm="round_robin"):
+        if schedule_algorithm == "random":
+            import random
+            self.items = random.sample(items, len(items))
+        elif schedule_algorithm == "round_robin":
+            self.items = items
+        else:
+            self.items = items
+            logger.warning(f"Unknown schedule algorithm: {schedule_algorithm}, use (round_robin, random) instead")
+
         self.index = 0
         self.lock = asyncio.Lock()
         # 修改为二级字典,第一级是item,第二级是model
@@ -260,12 +268,14 @@ def update_config(config_data, use_config_url=False):
         if isinstance(provider_api, str):
             provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
                 [provider_api],
-                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"})
+                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"}),
+                safe_get(provider, "preferences", "api_key_schedule_algorithm", default="round_robin")
             )
         if isinstance(provider_api, list):
             provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
                 provider_api,
-                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"})
+                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"}),
+                safe_get(provider, "preferences", "api_key_schedule_algorithm", default="round_robin")
             )
 
         if "models.inference.ai.azure.com" in provider['base_url'] and not provider.get("model"):
````