yym68686 committed on
Commit 926469d · 1 Parent(s): eeaa4ee

✨ Feature: support setting API key random load balancing

Files changed (3):
  1. README.md +9 -8
  2. README_CN.md +5 -4
  3. utils.py +14 -4
README.md CHANGED
@@ -13,20 +13,20 @@
 
 ## Introduction
 
-For personal use, one/new-api is too complex with many commercial features that individuals don't need. If you don't want a complicated frontend interface and prefer support for more models, you can try uni-api. This is a project that unifies the management of large language model APIs, allowing you to call multiple backend services through a single unified API interface, converting them all to OpenAI format, and supporting load balancing. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, DeepBricks, OpenRouter, and more.
+For personal use, one/new-api is too complex with many commercial features that individuals don't need. If you don't want a complicated frontend interface and prefer support for more models, you can try uni-api. This is a project that unifies the management of large language model APIs, allowing you to call multiple backend services through a single unified API interface, converting them all to OpenAI format, and supporting load balancing. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, OpenRouter, and more.
 
 ## ✨ Features
 
 - No front-end, pure configuration file to configure API channels. You can run your own API station just by writing a file, and the documentation has a detailed configuration guide, beginner-friendly.
-- Unified management of multiple backend services, supporting providers such as OpenAI, Deepseek, DeepBricks, OpenRouter, and other APIs in OpenAI format. Supports OpenAI Dalle-3 image generation.
+- Unified management of multiple backend services, supporting providers such as OpenAI, Deepseek, OpenRouter, and other APIs in OpenAI format. Supports OpenAI Dalle-3 image generation.
 - Simultaneously supports Anthropic, Gemini, Vertex AI, Cohere, Groq, Cloudflare. Vertex simultaneously supports Claude and Gemini API.
 - Support OpenAI, Anthropic, Gemini, Vertex native tool use function calls.
 - Support OpenAI, Anthropic, Gemini, Vertex native image recognition API.
 - Support four types of load balancing.
-  1. Supports channel-level weighted load balancing, allowing requests to be distributed according to different channel weights. It is not enabled by default and requires configuring channel weights.
-  2. Support Vertex regional load balancing and high concurrency, which can increase Gemini and Claude concurrency by up to (number of APIs * number of regions) times. Automatically enabled without additional configuration.
-  3. Except for Vertex region-level load balancing, all APIs support channel-level sequential load balancing, enhancing the immersive translation experience. It is not enabled by default and requires configuring `SCHEDULING_ALGORITHM` as `round_robin`.
-  4. Support automatic API key-level round-robin load balancing for multiple API Keys in a single channel.
+  1. Supports channel-level weighted load balancing, allowing requests to be distributed according to different channel weights. It is not enabled by default and requires configuring channel weights.
+  2. Support Vertex regional load balancing and high concurrency, which can increase Gemini and Claude concurrency by up to (number of APIs * number of regions) times. Automatically enabled without additional configuration.
+  3. Except for Vertex region-level load balancing, all APIs support channel-level sequential load balancing, enhancing the immersive translation experience. It is not enabled by default and requires configuring `SCHEDULING_ALGORITHM` as `round_robin`.
+  4. Support automatic API key-level round-robin load balancing for multiple API Keys in a single channel.
 - Support automatic retry, when an API channel response fails, automatically retry the next API channel.
 - Support channel cooling: When an API channel response fails, the channel will automatically be excluded and cooled for a period of time, and requests to the channel will be stopped. After the cooling period ends, the model will automatically be restored until it fails again, at which point it will be cooled again.
 - Support fine-grained model timeout settings, allowing different timeout durations for each model.
@@ -48,7 +48,7 @@ You must fill in the configuration file in advance to start `uni-api`, and you m
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, can be any name, required
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, can be any name, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required, automatically uses base_url and api to get all available models through the /v1/models endpoint.
 # Multiple providers can be configured here, each provider can configure multiple API Keys, and each API Key can configure multiple models.
@@ -61,7 +61,7 @@ Detailed advanced configuration of `api.yaml`:
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, can be any name, required
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, can be any name, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
     model: # Optional, if model is not configured, all available models will be automatically obtained through base_url and api via the /v1/models endpoint.
@@ -96,6 +96,7 @@ providers:
 #      gemini-1.5-flash: 2/min
 #      default: 4/min # If the model does not set the frequency limit, use the frequency limit of default
     api_key_cooldown_period: 60 # Each API Key will be cooled down for 60 seconds after encountering a 429 error. Optional, the default is 0 seconds. When set to 0, the cooling mechanism is not enabled. When there are multiple API keys, the cooling mechanism will take effect.
+    api_key_schedule_algorithm: round_robin # Set the request order of multiple API Keys, optional. The default is round_robin, and the optional values are: round_robin, random. It will take effect when there are multiple API keys. round_robin is polling load balancing, and random is random load balancing.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
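The new `api_key_schedule_algorithm` option above can be pictured with a short, illustrative sketch (this is not uni-api's actual code, and the key strings are made up): `round_robin` cycles through the keys in configuration order, while `random` shuffles the key list once and then cycles through the shuffled order.

```python
# Illustrative sketch of the two api_key_schedule_algorithm values.
# Key names are hypothetical.
import random
from itertools import cycle, islice

api_keys = ["sk-key-a", "sk-key-b", "sk-key-c"]

def key_sequence(keys, schedule_algorithm="round_robin"):
    """Yield keys in the order a channel's requests would try them."""
    if schedule_algorithm == "random":
        keys = random.sample(keys, len(keys))  # shuffled once up front
    return cycle(keys)

# round_robin: keys are tried in configuration order, wrapping around.
print(list(islice(key_sequence(api_keys, "round_robin"), 6)))
# ['sk-key-a', 'sk-key-b', 'sk-key-c', 'sk-key-a', 'sk-key-b', 'sk-key-c']
```

With `random`, the sequence still visits every key equally often; only the starting order is randomized.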
README_CN.md CHANGED
@@ -13,12 +13,12 @@
 
 ## Introduction
 
-For personal use, one/new-api is too complex, with many commercial features that individuals don't need. If you don't want a complicated frontend interface and prefer support for more models, you can try uni-api. This is a project for unified management of large language model APIs, allowing you to call multiple backend services through a single unified API interface, converting them all to OpenAI format, with load balancing support. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, DeepBricks, OpenRouter, and more.
+For personal use, one/new-api is too complex, with many commercial features that individuals don't need. If you don't want a complicated frontend interface and prefer support for more models, you can try uni-api. This is a project for unified management of large language model APIs, allowing you to call services from multiple different providers through a single unified API interface, converting them all to OpenAI format, with load balancing support. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, OpenRouter, and more.
 
 ## ✨ Features
 
 - No front-end, pure configuration file to configure API channels. You can run your own API station just by writing a file, and the documentation has a detailed configuration guide, beginner-friendly.
-- Unified management of multiple backend services, supporting providers such as OpenAI, Deepseek, DeepBricks, OpenRouter, and other APIs in OpenAI format. Supports OpenAI Dalle-3 image generation.
+- Unified management of multiple backend services, supporting providers such as OpenAI, Deepseek, OpenRouter, and other APIs in OpenAI format. Supports OpenAI Dalle-3 image generation.
 - Simultaneously supports Anthropic, Gemini, Vertex AI, Cohere, Groq, Cloudflare. Vertex simultaneously supports Claude and Gemini API.
 - Support OpenAI, Anthropic, Gemini, Vertex native tool use function calls.
 - Support OpenAI, Anthropic, Gemini, Vertex native image recognition API.
@@ -48,7 +48,7 @@
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, can be any name, required
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, can be any name, required
    base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
    api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required, automatically uses base_url and api to get all available models through the /v1/models endpoint.
 # Multiple providers can be configured here, each provider can configure multiple API Keys, and each API Key can configure multiple models.
@@ -61,7 +61,7 @@ api_keys:
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, can be any name, required
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, can be any name, required
    base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
    api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
    model: # Optional, if model is not configured, all available models will be automatically obtained through base_url and api via the /v1/models endpoint.
@@ -96,6 +96,7 @@ providers:
 #      gemini-1.5-flash: 2/min
 #      default: 4/min # If the model does not set the frequency limit, use the frequency limit of default
    api_key_cooldown_period: 60 # Cooldown time, in seconds, for each API Key after it encounters a 429 error, optional. The default is 0 seconds; when set to 0, the cooling mechanism is not enabled. It only takes effect when there are multiple API keys.
+   api_key_schedule_algorithm: round_robin # Set the request order of multiple API Keys, optional. The default is round_robin, and the optional values are: round_robin, random. It only takes effect when there are multiple API keys. round_robin is round-robin load balancing, and random is random load balancing.
 
   - provider: vertex
    project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
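The `api_key_cooldown_period` option that `api_key_schedule_algorithm` sits next to can be sketched as follows. This is a minimal, hypothetical illustration of the behavior the config comments describe (a key that hits a 429 sits out for the cooldown window while the remaining keys keep serving); the class and method names are invented, not uni-api's.

```python
# Hypothetical sketch of per-key cooldown after a 429 error.
import time

class KeyCooldown:
    def __init__(self, keys, cooldown_period=60):
        self.keys = list(keys)
        self.cooldown_period = cooldown_period
        self.cooling_until = {}  # key -> timestamp when it is usable again

    def mark_429(self, key, now=None):
        """Record a 429 for `key`, starting its cooldown window."""
        now = time.time() if now is None else now
        # Per the docs, cooling is only enabled with a positive period
        # and more than one key to fall back on.
        if self.cooldown_period > 0 and len(self.keys) > 1:
            self.cooling_until[key] = now + self.cooldown_period

    def usable(self, now=None):
        """Return the keys whose cooldown (if any) has expired."""
        now = time.time() if now is None else now
        return [k for k in self.keys if self.cooling_until.get(k, 0) <= now]

pool = KeyCooldown(["sk-a", "sk-b"], cooldown_period=60)
pool.mark_429("sk-a", now=1000.0)
print(pool.usable(now=1030.0))  # ['sk-b']  (sk-a still cooling)
print(pool.usable(now=1061.0))  # ['sk-a', 'sk-b']  (cooldown expired)
```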
utils.py CHANGED
@@ -80,8 +80,16 @@ async def get_user_rate_limit(app, api_index: int = None):
 import asyncio
 
 class ThreadSafeCircularList:
-    def __init__(self, items = [], rate_limit={"default": "999999/min"}):
-        self.items = items
+    def __init__(self, items = [], rate_limit={"default": "999999/min"}, schedule_algorithm="round_robin"):
+        if schedule_algorithm == "random":
+            import random
+            self.items = random.sample(items, len(items))
+        elif schedule_algorithm == "round_robin":
+            self.items = items
+        else:
+            self.items = items
+            logger.warning(f"Unknown schedule algorithm: {schedule_algorithm}, use (round_robin, random) instead")
+
         self.index = 0
         self.lock = asyncio.Lock()
         # Changed to a two-level dictionary: the first level is item, the second level is model
@@ -260,12 +268,14 @@ def update_config(config_data, use_config_url=False):
         if isinstance(provider_api, str):
             provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
                 [provider_api],
-                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"})
+                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"}),
+                safe_get(provider, "preferences", "api_key_schedule_algorithm", default="round_robin")
             )
         if isinstance(provider_api, list):
             provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
                 provider_api,
-                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"})
+                safe_get(provider, "preferences", "api_key_rate_limit", default={"default": "999999/min"}),
+                safe_get(provider, "preferences", "api_key_schedule_algorithm", default="round_robin")
             )
 
         if "models.inference.ai.azure.com" in provider['base_url'] and not provider.get("model"):